Removing empty lines after removing HTML tags
Posted: Wed Oct 15, 2014 10:16 pm
by Tribblehunter
Hi all.
Have successfully created my first couple of programs, which have proven useful (and not just something for learning!).
Now I need to finesse them and am stuck on an issue with parsing data from a web site.
I have the source from the website in a field. I have removed the HTML tags (which gives me the text and other things).
I am trying to remove all the blank lines so it is easier to work with the text.
Filter field with empty does not seem to do it.
I am guessing there are some 'hidden characters' which are left from the removal of the HTML.
Any pointers? I have searched the internet but cannot seem to find exactly what is happening.
Re: Removing empty lines after removing HTML tags
Posted: Wed Oct 15, 2014 10:25 pm
by Simon
Hi Tribblehunter,
You should not be manipulating text in a field, as this is a very slow (processor-intensive) process. Dump the field into a variable, do all your work on the variable, then stuff the result back into the field.
See if filter without empty works for you then.
Simon
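A minimal sketch of the approach Simon describes, assuming the stripped text sits in a field named "IN", the result should go to a field named "OUT", and the handler lives in a button script (the field names and the variable tText are placeholders, not from the original post):

```livecode
on mouseUp
   local tText
   -- copy the field contents into a variable; working on a variable is much faster
   put the text of field "IN" into tText
   -- drop every line that is completely empty
   filter tText without empty
   -- write the result back in a single step
   put tText into field "OUT"
end mouseUp
```

If blank-looking lines survive this, they are probably not truly empty and still hold spaces, tabs or other invisible characters.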
Re: Removing empty lines after removing HTML tags
Posted: Thu Oct 16, 2014 11:46 pm
by jiml
Re: Removing empty lines after removing HTML tags
Posted: Fri Oct 17, 2014 12:20 am
by Tribblehunter
Thanks guys.
I will try the suggestions out.
Re: Removing empty lines after removing HTML tags
Posted: Tue Oct 21, 2014 10:35 pm
by Tribblehunter
Neither worked.
Managed to use regex "\s" to remove all spaces, but this left all text on one line.
Re: Removing empty lines after removing HTML tags
Posted: Tue Oct 21, 2014 10:44 pm
by Simon
Hi Tribblehunter,
Since it doesn't see the empty lines as empty, they could have something like tabs in them or one of the unprintable characters.
You should do a charToNum on the blank line to see what is there.
Simon
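One way to run the charToNum check on a whole line at once, as a rough sketch. The function name charCodes is made up for illustration:

```livecode
function charCodes pLine
   local tCodes
   -- collect the numeric code of every character in the line
   repeat for each char tChar in pLine
      put charToNum(tChar) & space after tCodes
   end repeat
   -- "word 1 to -1" trims the trailing space
   return word 1 to -1 of tCodes
end charCodes
```

For example, typing `put charCodes(line 2 of field "IN")` into the message box lists the codes found on line 2 (the line number and field name are assumptions). A 32 is a space and a 9 is a tab; any other number on a blank-looking line points to a character that needs its own replace.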
Re: Removing empty lines after removing HTML tags
Posted: Tue Oct 21, 2014 11:16 pm
by [-hh]
Hi all,
Perhaps our problem is that we don't know whether you are working
= with the text from a field or
= with the htmltext from a field
For example, setting the htmlText of a field to the contents of an HTML file and then reading the text of that field gives you the plain text of the file, so you don't have to remove the HTML tags yourself (provided LC's conversion suits your needs).
Then you could do, as Simon suggested:
Code:
-- A bit cumbersome; this could be a single regex line (forbidden in
-- the beginners subforum). Here you can easily control the details:
put the text of fld "IN" into S
-- treat tabs as spaces
replace tab with space in S
-- collapse runs of spaces into a single space
repeat while space & space is in S
   replace space & space with space in S
end repeat
-- strip spaces at the start of lines
repeat while cr & space is in S
   replace cr & space with cr in S
end repeat
-- strip spaces at the end of lines
repeat while space & cr is in S
   replace space & cr with cr in S
end repeat
-- collapse consecutive line breaks, which removes the empty lines
repeat while cr & cr is in S
   replace cr & cr with cr in S
end repeat
put S into fld "OUT"
This should remove nearly all unwanted whitespace from your field.
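A variation on the same idea, sketched under the assumption that only the blank lines should go and the spacing inside the remaining lines should stay untouched. The function name stripBlankLines is made up for illustration. It relies on the fact that a line holding only spaces and tabs contains no words:

```livecode
function stripBlankLines pText
   local tOut
   repeat for each line tLine in pText
      -- keep the line only if it contains at least one word
      if word 1 of tLine is not empty then
         put tLine & cr after tOut
      end if
   end repeat
   -- remove the trailing line break added by the loop
   delete last char of tOut
   return tOut
end stripBlankLines
```

It could be called as `put stripBlankLines(the text of fld "IN") into fld "OUT"`. Other invisible characters, such as a non-breaking space left over from the HTML, would probably still count as content here, so checking a stubborn line with charToNum remains worthwhile.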
Re: Removing empty lines after removing HTML tags
Posted: Wed Oct 22, 2014 6:51 am
by Tribblehunter
I understand more now! Thank you very much.
I will experiment this evening. Off to do normal job now!
Re: Removing empty lines after removing HTML tags
Posted: Thu Oct 23, 2014 10:23 pm
by Tribblehunter
Thanks for all the help.
With a bit of playing around it worked. Ish. But good enough for the testing phase. Little program for organising workshop manuals and checking internet for manuals via serial number entry.
My first proper program that is actually being used!! lol