
Regex for LiveCode

Posted: Tue Jul 16, 2024 12:02 pm
by stam
Hi all,
mainly for my own benefit I created a regex cheatsheet and then decided to flesh it out a bit, so it's become a mini-primer.
I'm an intermediate, not expert, regex user, and find this kind of thing helpful for my own learning.
Might be useful for those new to regex...

I've also added a handler to work around the missing 'global match' mode, which doesn't exist in PCRE regex/LiveCode (in global mode all matches are returned, not just the first one), and will add 1-2 more useful workarounds in the near future.


I've made it into a wiki using GitHub: https://github.com/stam66/regexPrimerForLivecode/wiki

I'm planning to make a stack out of this as well for interactive testing, although as always I recommend a 'proper' online IDE for this: https://regex101.com

If anyone has any interest, have a look at the wiki and please raise an issue on GitHub if you want to make corrections or recommendations.

Re: Regex for LiveCode

Posted: Tue Jul 16, 2024 12:54 pm
by bn
Hi Stam,

Thank you for making this Regex primer available. I always struggle with Regex. This is very helpful.

Kind regards
Bernd

Re: Regex for LiveCode

Posted: Tue Jul 16, 2024 5:13 pm
by dunbarx
I always struggle with Regex
Why? It has a plain english syntax, just like LC.

Craig

Re: Regex for LiveCode

Posted: Tue Jul 16, 2024 5:14 pm
by Klaus
dunbarx wrote:
Tue Jul 16, 2024 5:13 pm
I always struggle with Regex
Why? It has a plain english syntax, just like LC.
Craig
LOL! :D :D :D
A really good one! :)

Re: Regex for LiveCode

Posted: Tue Jul 16, 2024 6:03 pm
by stam
In truth you couldn’t have a more un-livecode-ish syntax… but what it does it does well… for the odd occasion this is needed.

Sometimes you can kludge it with multiple nested loops to achieve the same result. Not sure that’s more elegant…

Re: Regex for LiveCode

Posted: Tue Jul 16, 2024 6:16 pm
by Klaus
You overlooked the irony tags! :-D

Re: Regex for LiveCode

Posted: Tue Jul 16, 2024 6:36 pm
by dunbarx
I use regex now and then, when it seems that it can do something better than a bunch of "filter" or "match-xxx" stuff can. But I always have to go to the "filter" command in the dictionary for a quick refresher tutorial. And that is only a small subset of regex.

Always.

Craig

Re: Regex for LiveCode

Posted: Tue Jul 16, 2024 7:00 pm
by stam
dunbarx wrote:
Tue Jul 16, 2024 6:36 pm
I use regex now and then, when it seems that it can do something better than a bunch of "filter" or "match-xxx" stuff can. But I always have to go to the "filter" command in the dictionary for a quick refresher tutorial. And that is only a small subset of regex.

Always.

Craig
I started this because I needed just a little bit more and really just wanted to create a cheat sheet as a quick reference.

Here’s a shortcut: https://github.com/stam66/regexPrimerFo ... cheatsheet

As for testing the regex I would wholeheartedly recommend https://regex101.com

Re: Regex for LiveCode

Posted: Tue Jul 16, 2024 7:28 pm
by dunbarx
Stam.

Thanks, that is a pretty compact cheatSheet. I can only recommend (and ask) that you give an example with each entry. From the dictionary, in the "filter" entry:
[chars] : Matches any one of the characters inside the brackets. The filterPattern A[BC]D matches "ABD" or "ACD", but not "AD" or "ABCD"
Craig

Re: Regex for LiveCode

Posted: Tue Jul 16, 2024 7:31 pm
by SparkOut
I have not been able to look yet but I will say a huge thank you already.

Having a Thierry at your beck and call would be miles better, of course. But this will be enormously valuable without that possibility.

Re: Regex for LiveCode

Posted: Tue Jul 16, 2024 8:06 pm
by stam
dunbarx wrote:
Tue Jul 16, 2024 7:28 pm
Stam.

Thanks, that is a pretty compact cheatSheet. I can only recommend (and ask) that you give an example with each entry. From the dictionary, in the "filter" entry:
[chars] : Matches any one of the characters inside the brackets. The filterPattern A[BC]D matches "ABD" or "ACD", but not "AD" or "ABCD"
Craig
Erm... there are some minimal examples here and there - and an explanation for almost all metacharacters. The cheatsheet is not the wiki, it's only 1 page of 14.
There is a navigation table to the right (on desktop - on mobile it appears below) and each header of a group is also a link to each section.
I could add more examples if needed. I'm open to suggestions with a source text and what to match - otherwise can just take inspiration from some of the tutorial sites that go through most of these things piecemeal.

I do not claim to be a regex authority like Thierry, but I have been able to use regex extensively (and for what it's worth, I've already messaged him with a query but he has his hands full by the sound of it).

I've had issues here and there, and this wiki is my way of putting knowledge into order for my own use - but it will likely help others. It is not a full tutorial by any stretch and there are large segments like conditional statements and POSIX commands that are entirely missing. If there is a need I can look into it and add more 'chapters'.

I'm also including some handlers to work around some issues with LC - the one handler I've added so far is to emulate the 'global match' mode in PHP / javascript (where all matches are returned, not just the first one as with PCRE / LiveCode regex).

I'm going to add a second handler to improve the matchText() function (the limitation of matchText is that you need to manually add variables to hold each capture group, but that should be easy enough to work around).
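
To illustrate the idea of returning every capture group in one container instead of one variable per group, here is a minimal sketch in Python's `re` module (not LiveCode, and not the handler planned for the wiki). The helper name `match_groups` and the date pattern are hypothetical and only serve the example.

```python
import re

def match_groups(pattern, text):
    """Return all capture groups of the first match as a list, or None if nothing matches."""
    m = re.search(pattern, text)
    return list(m.groups()) if m else None

# Three capture groups come back as one list; no per-group variables are needed.
parts = match_groups(r"(\d{4})-(\d{2})-(\d{2})", "Posted 2024-07-16")
print(parts)
```

A LiveCode version would presumably return an array keyed by group number, but that is an assumption about a handler that has not been written yet.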

Re: Regex for LiveCode

Posted: Thu Jul 18, 2024 2:23 pm
by stam
dunbarx wrote:
Tue Jul 16, 2024 7:28 pm
Stam.

Thanks, that is a pretty compact cheatSheet. I can only recommend (and ask) that you give an example with each entry. From the dictionary, in the "filter" entry:
[chars] : Matches any one of the characters inside the brackets. The filterPattern A[BC]D matches "ABD" or "ACD", but not "AD" or "ABCD"
Craig
Well, I'm not sure that explains to the reader what [BC] actually means and why it doesn't match ABCD - or how to make it match ABCD as well: A[BC]+D will do, while A[BC]*D will match AD, ABD, ACD and ABCD (and any other string that starts with A, has zero or more Bs or Cs and ends with D). To make it match only ABD, ACD and AD: A[BC]?D.
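
The four variants can be checked side by side. This is a small sketch in Python's `re` module rather than LiveCode; the sample list and the helper `full_matches` are made up for the illustration, and `re.fullmatch` is used so that each pattern has to cover the whole string.

```python
import re

# Candidate strings to test against each pattern.
samples = ["AD", "ABD", "ACD", "ABCD", "ABC"]

def full_matches(pattern, candidates):
    """Return the candidates that the pattern matches from first to last character."""
    return [s for s in candidates if re.fullmatch(pattern, s)]

for pattern in ["A[BC]D", "A[BC]+D", "A[BC]*D", "A[BC]?D"]:
    print(pattern, full_matches(pattern, samples))
```

"ABC" is never matched because every pattern requires a final D.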

A common problem with pretty much all learning material from LC is they never do the basics, so while the example shown may make sense, it's rare that this will allow the reader to expand on this, just based on the tutorial provided.
Another great example of this deficiency is the responsiveLayout tutorial - yes it makes sense, but doesn't touch the many other options available and the library is pretty much unusable by mere mortals.

Taking onboard your suggestion for examples though, I've added some for Character Classes to start with, having shamelessly borrowed from tutorial sites: https://github.com/stam66/regexPrimerFo ... s-examples

Do you think this will be good enough? If so, I will add more example pages for the other groupings when I have time... (although as mentioned a number of the grouped categories already have examples in them) - but not character classes, which I've added now.

If something requires clarification let me know and I'll rephrase...

Re: Regex for LiveCode

Posted: Sat Aug 31, 2024 12:00 pm
by kaveh1000
Hi Stam

I am a bit late to the party, but interested in what you are doing, and happy to contribute. You are aware that LiveCode can search for regex patterns but cannot use backreferencing, e.g. \1\2, etc. This is a major gap. Someone mentioned Thierry, who has implemented full regex replace and it works well, but as it is not open, I cannot be dependent on it. So I have written my own (basic) replace with backreferencing.

Something I am trying to work out right now is whether LiveCode can use backreferencing in the search pattern. For instance if I want to find duplicated lines in this string:

Code:

...
one
two
two
three
...
I would search for

Code:

^(.+)\n\1
I think not. Again something I would do very quickly in another system needs extra work in LiveCode.

It would be wonderful if LiveCode could have full built in regex search and replace, and I am happy to contribute if I can.

Finally, I agree wholeheartedly with regex101. I use it all the time and the fact that you can save each pattern is amazing, e.g. my example here: https://regex101.com/r/bf0xuA/1

Re: Regex for LiveCode

Posted: Sat Aug 31, 2024 12:50 pm
by stam
kaveh1000 wrote:
Sat Aug 31, 2024 12:00 pm
Something I am trying to work out right now is whether LiveCode can use backreferencing in the search pattern. For instance if I want to find duplicated lines in this string:

Code:

...
one
two
two
three
...
I would search for

Code:

^(.+)\n\1
I think not. Again something I would do very quickly in another system needs extra work in LiveCode.
Your regex will only find a duplicate if it is immediately followed by an identical value - perhaps that's what you specifically need, and you can always sort the data so it ends up like that.
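
To make the difference concrete, here is a rough sketch in Python's `re` module (not LiveCode). The sample text is invented, and the trailing `$` is an addition to the pattern from the quoted post: without it, a line such as "two" followed by "twofold" would also count as a duplicate.

```python
import re
from collections import Counter

# "two" is repeated on adjacent lines; "one" is repeated, but not adjacently.
text = "one\ntwo\ntwo\nthree\none"

# Backreference approach: only finds a line directly followed by a copy of itself.
adjacent = re.findall(r"^(.+)\n\1$", text, flags=re.MULTILINE)

# Counting approach: finds lines repeated anywhere, with no sorting required.
counts = Counter(text.split("\n"))
anywhere = [line for line, n in counts.items() if n > 1]

print(adjacent)
print(anywhere)
```

The backreference version reports only "two", while the counting version reports both "one" and "two".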

However, if you're just searching for duplicates you'd ideally use global mode /g, which isn't available in PCRE regex but is in php/javascript flavours. Hence of course it's available in regex101.

You can code around this issue with LiveCode. I posted a workaround for this on the wiki. It returns an array with the found text and the start and end chars of each match. It's fairly quick. Perhaps that can help with your particular issue? (I know it's not the same, but may be helpful...)
The code is here: https://github.com/stam66/regexPrimerFo ... workaround
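
For readers who just want the shape of the result, here is a minimal sketch of the same idea in Python's `re` module. It is not the wiki handler: the function name `all_matches` and the dictionary keys are hypothetical, and positions are reported as 1-based inclusive character numbers on the assumption that this mirrors how LiveCode counts chars.

```python
import re

def all_matches(pattern, text):
    """Collect every non-overlapping match with its text and 1-based start/end positions."""
    results = []
    for m in re.finditer(pattern, text):
        results.append({
            "text": m.group(0),
            "start": m.start() + 1,  # convert Python's 0-based index to 1-based
            "end": m.end(),          # the exclusive 0-based end equals the inclusive 1-based end
        })
    return results

hits = all_matches(r"\d+", "a1 bb22 ccc333")
for hit in hits:
    print(hit)
```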

PS: I wasn't aware I couldn't use the \# functionality in LiveCode - thanks for pointing that out. I must never have had to use this in LC :-/ I also now see that backreferencing by name also doesn't work - the following is equivalent, but doesn't work in LC:

Code:

^(?'duplicate'.+)\n\k'duplicate'
It is not great that they advertise PCRE compatibility but don't include all PCRE features :/
Regarding the wiki, I'll leave that in as a reference, but will include a warning that these are not supported in LiveCode. Grrrrrr....
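
For comparison only, this is how the named form looks where it is supported. The sketch uses Python's `re` module, which spells named groups as `(?P<name>...)` and named backreferences as `(?P=name)` instead of the PCRE syntax above; the sample text is made up, and the trailing `$` is an addition so that only whole-line repeats count.

```python
import re

text = "one\ntwo\ntwo\nthree"

# Named group "duplicate" captures a line; (?P=duplicate) requires the next line to be identical.
named = re.compile(r"^(?P<duplicate>.+)\n(?P=duplicate)$", re.MULTILINE)

m = named.search(text)
print(m.group("duplicate") if m else None)
```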

I'll be honest, I haven't tested these to see if they work in LC. I shamelessly used stuff from other sites and just presumed that since LiveCode uses PCRE regex, these should work. Can you see any other issues?
Please do feel free to contribute - might be easier to raise an issue on GitHub

Stam

Re: Regex for LiveCode

Posted: Sat Aug 31, 2024 2:04 pm
by kaveh1000
Thanks for the very quick reply.

My present requirement is very simple. I have the following text:

Code:

Collection	47
Collection	●
CollectionIdentifier	50
CollectionIdentifier	●
CollectionIDType	51
CollectionIDType	●
CollectionSequence	52
CollectionSequenceNumber	55
CollectionSequenceType	53
CollectionSequenceTypeName	54
CollectionType	48
CollectionType	●
I want to find duplicate lines with numbers and bullets. Here is my regex:

Code:

^(.+)\t[0-9]+\n\1\t●
and on regex101: https://regex101.com/r/dLvXoE/1

It's a shame I can't use that directly in LiveCode but I will certainly look at your workaround. Ping me any time!
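
As a cross-check outside regex101, the pattern can be run against the sample above in Python's `re` module. This is only a sketch, not a LiveCode solution, and it assumes the two columns are separated by a single tab character, as the `\t` in the pattern implies.

```python
import re

# The sample data from the post, rebuilt with explicit tab separators.
sample = "\n".join([
    "Collection\t47",
    "Collection\t●",
    "CollectionIdentifier\t50",
    "CollectionIdentifier\t●",
    "CollectionIDType\t51",
    "CollectionIDType\t●",
    "CollectionSequence\t52",
    "CollectionSequenceNumber\t55",
    "CollectionSequenceType\t53",
    "CollectionSequenceTypeName\t54",
    "CollectionType\t48",
    "CollectionType\t●",
])

# MULTILINE lets ^ anchor at the start of every line, not just the start of the text.
pattern = re.compile(r"^(.+)\t[0-9]+\n\1\t●", re.MULTILINE)

# Group 1 is the name that appears once with a number and again with a bullet.
duplicated = [m.group(1) for m in pattern.finditer(sample)]
print(duplicated)
```

The "CollectionSequence..." lines are skipped because the backreference must be followed directly by a tab, so a name that is merely a prefix of the next line does not count.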