Regex to remove multiple return characters
Posted: Thu Apr 06, 2023 3:30 pm
by jameshale
Hi,
I have fields with multiple empty lines (return characters) preceding the actual text.
I would like to use the replacetext function to remove them.
as in
Code: Select all
put "^\r" into rt
put replaceText(fld "f1", rt, "") into fld "f1"
I have tried a few different expressions in rt but the most I can accomplish affects only the first occurrence.
I know I could accomplish this with a repeat loop removing empty lines but the replace text function should also be able to do it surely.
any thoughts?
Re: Regex to remove multiple return characters
Posted: Thu Apr 06, 2023 3:41 pm
by Klaus
Hi James,
sorry I have no idea of REGEX, but doesn't:
do the trick?
Best
Klaus
Re: Regex to remove multiple return characters
Posted: Thu Apr 06, 2023 4:30 pm
by dunbarx
My instant reaction was "What Klaus said".
I then stopped for just one second thinking that maybe one could have "empty" somehow embedded inside a line, but then thought that I have no idea if that even makes sense. Spaces ain't empty.
The only "place" that empty can "live" (apart from being the entirety of the contents of a container as a whole) is the entirety of a line. I cannot see any other chunk, apart from a line, having empty as a member.
I think this is so. (?)
Craig
Re: Regex to remove multiple return characters
Posted: Thu Apr 06, 2023 4:40 pm
by richmond62
Code: Select all
on mouseUp
   put fld "xyz" into XYZ
   put empty into fld "xyz"
   repeat until XYZ is empty
      if char 1 of XYZ is cr then
         --- do nothing
      else
         put char 1 of XYZ after fld "xyz"
      end if
      delete char 1 of XYZ
   end repeat
end mouseUp
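Note that the loop above deletes every return it meets, so the remaining lines end up joined together. A minimal sketch of a variant that strips only the empty lines at the start of the field, assuming the same field name "xyz" (tText is a hypothetical variable name):

```livecode
on mouseUp
   local tText
   put fld "xyz" into tText
   -- Delete leading returns only; the loop stops at the first
   -- character that is not a return (or when tText is empty).
   repeat while char 1 of tText is return
      delete char 1 of tText
   end repeat
   put tText into fld "xyz"
end mouseUp
```

Working on a variable and writing back to the field once avoids updating the field on every pass through the loop.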
Re: Regex to remove multiple return characters
Posted: Thu Apr 06, 2023 4:47 pm
by Cairoo
Klaus wrote: ↑Thu Apr 06, 2023 3:41 pm
Hi James,
sorry I have no idea of REGEX, but doesn't:
do the trick?
Best
Klaus
What Klaus said, except you may have to use the form "filter lines of".
Gerrie
Re: Regex to remove multiple return characters
Posted: Thu Apr 06, 2023 5:06 pm
by Klaus
Will also work without "... lines of ..."!
Re: Regex to remove multiple return characters
Posted: Thu Apr 06, 2023 5:54 pm
by dunbarx
Correct.
If you read my rant, that "empty" exists only as the explicit contents of a container OR as a "character" in a container that contains lines, then eliminating empty is identical to eliminating lines that contain empty.
One cannot have a line (or any other chunk) with both empty and anything else at all. That is what made me hesitate for just a second before posting the usual "What Klaus said"
Craig
Re: Regex to remove multiple return characters
Posted: Thu Apr 06, 2023 5:55 pm
by stam
I regularly use regex (excuse the pun!) but wouldn't use regex for this personally -
filter is an excellent fit.
Regarding syntax, this is:
Code: Select all
filter lines of <container> without empty [into <container2>]
The [ ] denotes optional (ie if you want to store the filtered result in a different container)
HTH
S.
PS: If you're set on using regex, the correct expression would probably be something like (\R\s*\R) where \R is CR, LF or CRLF, \s is any whitespace (including tab etc) and * denotes zero or more of the previous char - so you'd search for return + optional whitespaces + return and replace it with a single return, ie:
Code: Select all
put replaceText(<container>, "\R\s*\R", return) into <container>
but I'm not really sure this is a better solution than filter above.
I always recommend https://regex101.com when designing/testing regex.
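For the case in the original question (empty lines in front of the text), the same idea can be anchored to the start of the text. This is only a sketch: stripLeadingBlankLines is a hypothetical function name, and it assumes LiveCode's regex engine accepts \A (start of text) alongside the \R token described above.

```livecode
function stripLeadingBlankLines pText
   -- \A anchors the match at the very start of pText.
   -- \s*\R then consumes leading whitespace up to and including the
   -- last line ending before the first visible character.
   -- If the text starts with a visible character, nothing matches
   -- and pText comes back unchanged.
   return replaceText(pText, "\A\s*\R", empty)
end stripLeadingBlankLines
```

It could be called as: put stripLeadingBlankLines(fld "f1") into fld "f1". Blank lines further down in the text are left alone, which is the difference from the "\R\s*\R" expression.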
Re: Regex to remove multiple return characters
Posted: Thu Apr 06, 2023 6:08 pm
by dunbarx
@Stam.
Cannot hurt.
@ All
If one has three lines in a container (field 2 in this test), the second of them empty:
a

b
and one does this:
Code: Select all
on mouseUp
   repeat with y = 1 to 10
      put charToNum (char y of fld 2) & return after temp
   end repeat
end mouseUp
One gets:
97
10
10
98
Empty has no ASCII value, even though it does have a reality. When LC deletes the lines of text that are empty, it "knows" that even though those lines do indeed contain a valid character (ASCII 10) it deletes them anyway. LC just knows that this is how humans see it, and accommodates.
But otherwise I am not sure how to find empty except by other means, such as finding the length of each line. In the above example, there are three lines, but the length of line 2 is 0.
This is why "return" (CR) is considered a "control" character, whereas "a" is a real character.
Craig
Re: Regex to remove multiple return characters
Posted: Thu Apr 06, 2023 6:20 pm
by stam
Craig, you're assuming empty is indeed empty. But one should guard against there being a space or a tab in there. Or maybe even some silly ASCII code that doesn't have a char, like ASCII 7 or 27.
Whitespace chars (like the above) are considered 'empty' in the filter statement, but also by the \s regex token (as in my amended post above).
Personally I avoid looping like the plague

Re: Regex to remove multiple return characters
Posted: Thu Apr 06, 2023 6:22 pm
by Cairoo
Klaus wrote: ↑Thu Apr 06, 2023 5:06 pm
Will also work without "... lines of ..."!
Yes, Klaus, I stand corrected.
Something I've noticed while looking for an actual regex solution is that LiveCode wrongly interprets the regex "\r" as "\n".
So the following regex-based code should work, but doesn't:
Code: Select all
put replaceText(replaceText(replaceText(fld "f1","^\r",""),"\r\r",""),"\r$","") into fld "f1"
and the following shouldn't work, but does:
Code: Select all
put replaceText(replaceText(replaceText(fld "f1","^\n",""),"\n\n",""),"\n$","") into fld "f1"
Perhaps it warrants a bug report?
Gerrie
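One possible explanation, consistent with the charToNum test earlier in the thread: LiveCode's return constant is a linefeed (ASCII 10), so the regex token \n matches it while \r (ASCII 13) does not. A small sketch to check this, using only built-in functions (tText and tCode are hypothetical variable names):

```livecode
on mouseUp
   local tText, tCode
   put "a" & return & return & "b" into tText
   -- ASCII code of LiveCode's line delimiter
   put charToNum(return) into tCode
   -- Collapse each run of linefeeds into a single return
   put replaceText(tText, "\n+", return) into tText
   answer "return is ASCII" && tCode & return & tText
end mouseUp
```

If the dialog reports 10, the behaviour described above is the regex engine treating \r and \n as two different characters rather than confusing them.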
Re: Regex to remove multiple return characters
Posted: Thu Apr 06, 2023 6:25 pm
by stam
Cairoo wrote: ↑Thu Apr 06, 2023 6:22 pm
Perhaps it warrants a bug report?
I posted an explanation above: the correct regex - and I say correct because it caters for any 'whitespace' characters (like space or tab) in the 'empty' line, so you can capture both return & return as well as return & space & return:
Code: Select all
put replaceText(<container>, "\R\s*\R", return) into <container>
If you want to capture all forms of a newline char, use \R, not \r or \n. More specifically, \R captures: \r\n|\n|\x0b|\f|\r|\x85
Also, unless you want to concatenate lines you still have to replace the double returns with a single return...
HTH
S.
Re: Regex to remove multiple return characters
Posted: Thu Apr 06, 2023 6:37 pm
by Cairoo
stam wrote: ↑Thu Apr 06, 2023 6:25 pm
...
if you want to capture all forms of a newLine char, use \R, not \r or \n....
Indeed the correct regex. It still bugs me that LiveCode wrongly interprets "\r" as "\n", though.
Re: Regex to remove multiple return characters
Posted: Thu Apr 06, 2023 6:50 pm
by stam
Cairoo wrote: ↑Thu Apr 06, 2023 6:37 pm
Indeed the correct regex. It still bugs me that LiveCode wrongly interprets "\r" as "\n", though.
Could this have something to do with the fact that the line ending could be \r, \n, or \r\n?
Are you sure your \r isn't picking up \r\n?
If you are convinced your regex is correct, test it out on https://regex101.com and you'll be able to see exactly what is being captured. It also has a handy set of features like a quick reference, and hovering over tokens in your expression tells you what they do.
For what it's worth I practically never need nested replaceText or findText messages - regex is an incredibly flexible language that can manage all of that, but not easy to use; this website lets me tinker with regex until I get it to do what I want with a single statement...
S.
PS: in your example you nest two replaceText commands but that's unnecessary - all you need to do is replace 2 consecutive line endings with one. There is a scenario where the 'empty' line may have, for example, tabs in it - if you have records in TSV format but the fields of one record are all empty. The text will just show line->empty line->line but in reality it's line->(tab tab tab)->line, so it is important to guard for that.
Filter lines does this for you but the regex I posted above will as well.
Re: Regex to remove multiple return characters
Posted: Thu Apr 06, 2023 7:12 pm
by dunbarx
Stam.
I see the ghostlike issue here, but characters like "tab" are "real".
If you have a field with what looks like an empty line, but it in fact contains a tab char, then:
Code: Select all
filter lines of fld 2 without empty
leaves those lines alone. They are not, er, empty.
I am not sure we are in disagreement overall. The issue really is that there are "real" chars and "control" chars. One has to jump through loops to really delete lines that read empty, as opposed to lines that actually are empty.
Craig
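To also drop lines that only look empty, filter can take a regex pattern instead of the default wildcard. A minimal sketch, assuming field 2 from the test above and a LiveCode version whose filter command supports the "regex pattern" form (tText is a hypothetical variable name):

```livecode
on mouseUp
   local tText
   put fld 2 into tText
   -- Remove lines that are truly empty as well as lines holding
   -- nothing but whitespace such as spaces or tabs.
   -- ^ and $ anchor the pattern to the whole line.
   filter lines of tText without regex pattern "^\s*$"
   put tText into fld 2
end mouseUp
```

Lines with any visible character are kept as they are, including their own leading or trailing whitespace.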