Regex not working

Posted: Thu Dec 17, 2020 5:26 pm
by micro04
I cannot get the regex expression to work correctly.
I am trying to find lines matching the pattern X12345Y12345 in a very long list. The pattern [X] works, but I want to make sure I match
X followed by a 5-digit number, then Y followed by a 5-digit number. I found a regex test website, but the LiveCode version of regex seems different.


Code:

filter lines of field "fld_text" matching "[X]\d{5}" into field "fld_results"

Re: Regex not working

Posted: Thu Dec 17, 2020 6:39 pm
by dunbarx
Hi.

I am no regex guy at all, but the old-fashioned way would be:

Code:

on mouseUp
   repeat with y = 1 to 10000
      put "X" &  random(88888) + 10000 & "Y" &  random(88888) + 10000  into line y of temp
   end repeat
   put "X12345Y12345" into line random(10000) of temp
   
   ---actual routine starts here
   
   put 1 into tIndex
   repeat for each line tLine in temp
      if tLine= "X12345Y12345" then
         exit repeat
      end if
      add 1 to tIndex
   end repeat
   answer tIndex
end mouseUp
Not sure how long your list is. The list above is 10,000 lines, and only a second or two was needed to find the line of interest.

The entire first section is just to create a list with one valid entry somewhere in it. You can simply substitute (pseudo):

Code:

put yourData into temp
for that section.

Craig
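For the case in the handler above, where one known string is being searched for, LiveCode's built-in lineOffset function can replace the counting loop. A minimal sketch, assuming the list has already been put into the variable temp and that only an exact whole-line match should count:

```livecode
on mouseUp
   local temp, tIndex
   -- assumption: fill temp with your data first, e.g. put field "fld_text" into temp
   -- wholeMatches makes lineOffset match only when the entire line equals the string;
   -- it is a local property, so it resets when this handler ends
   set the wholeMatches to true
   put lineOffset("X12345Y12345", temp) into tIndex
   -- lineOffset returns 0 when no line matches
   if tIndex = 0 then
      answer "Not found"
   else
      answer tIndex
   end if
end mouseUp
```

Like the loop above, lineOffset compares case-insensitively unless the caseSensitive property is set to true.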

Re: Regex not working

Posted: Thu Dec 17, 2020 11:08 pm
by micro04
Thanks for replying.
I do not think this would work for what I am trying to do.
I need to find lines of data that match this general pattern. I am not trying to find X12345Y12345 specifically. I need to find any line that contains only
X{5-digit number}Y{5-digit number}.
examples:
X89888Y45678
X45898Y34833


I need regex to find all lines that match the pattern. The field I am reading also contains other information that I do not want in the final list.

Re: Regex not working

Posted: Thu Dec 17, 2020 11:47 pm
by dunbarx
Hi.

Piece of cake. Before all this fancy regex stuff, we actually worked for a living.

This handler finds the first properly formatted line and reports its line number.

Code:

on mouseUp
   --insert your data into temp
   
   put 1 into tIndex
   repeat for each line tLine in temp
      if char 1 of tLine = "X" and char 2 to 6 of tLine is a number and char 7 of tLine = "Y" and char 8 to 12 of tLine is a number\
      and the length of tLine = 12 then
         exit repeat
      end if
      add 1 to tIndex
   end repeat
   answer tIndex
end mouseUp
Craig
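The handler above stops at the first hit and reports its line number, while the question asks for every matching line. The sketch below keeps the chunk-expression approach but collects all hits, assuming the data is passed in as a string. Two changes are deliberate assumptions about what is wanted: digits are checked one character at a time (because "is a number" also accepts text such as "1.234" or "-1234"), and caseSensitive is turned on so a lowercase "x" does not count. The names isFiveDigits and matchingLines are hypothetical.

```livecode
-- hypothetical helper: true only if pText is exactly five characters 0-9
function isFiveDigits pText
   local tChar
   if the length of pText is not 5 then return false
   repeat for each char tChar in pText
      -- a single character is "in" the digit string only if it is a digit
      if tChar is not in "0123456789" then return false
   end repeat
   return true
end isFiveDigits

-- hypothetical function: returns every line shaped exactly like X#####Y#####
function matchingLines pData
   local tLine, tHits
   -- local property: "X" and "Y" comparisons are case-sensitive in this handler only
   set the caseSensitive to true
   repeat for each line tLine in pData
      if the length of tLine is 12 and char 1 of tLine is "X" \
            and char 7 of tLine is "Y" \
            and isFiveDigits(char 2 to 6 of tLine) \
            and isFiveDigits(char 8 to 12 of tLine) then
         put tLine & return after tHits
      end if
   end repeat
   -- drop the trailing return if any hits were found
   if tHits is not empty then delete the last char of tHits
   return tHits
end matchingLines
```

Usage, assuming the field names from the first post: put matchingLines(field "fld_text") into field "fld_results"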

Re: Regex not working

Posted: Fri Dec 18, 2020 12:05 am
by dunbarx
Here is a test handler that you can step through to see how it was done in the old days. To be fair, the "filter" command and regex are much more modern and faster.

Code:

on mouseUp
   put "X123Y123" into line 1 of temp
   put "X1234567Y123" into line 2 of temp
   put "X12345Y12345" into line 3 of temp
   put "X123Y1234567" into line 4 of temp
   
   put 1 into tIndex
   breakpoint
   repeat for each line tLine in temp
      if char 1 of tLine = "X" and char 2 to 6 of tLine is a number and char 7 of tLine = "Y" and \
      char 8 to 12 of tLine is a number and the length of tLine = 12 then exit repeat
      add 1 to tIndex
   end repeat
   answer tIndex
end mouseUp
Craig

Re: Regex not working

Posted: Fri Dec 18, 2020 12:41 am
by micro04
That should help.
I still want to find out why the regex was not working. LiveCode's regex must differ from that of other programming languages.
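One likely explanation for the original line: without the word regex, LiveCode's filter command treats its pattern as a wildcard pattern, not a regular expression. In a wildcard pattern the special forms are *, ? and [chars], and the pattern must match the whole line; \d and {5} are regex syntax and have no digit-matching meaning there. If regex is not needed, a wildcard pattern can express the same shape. A sketch, assuming the field names from the first post and that the [0-9] range form is accepted by your LiveCode version's wildcard matcher:

```livecode
-- wildcard form: no "regex" keyword, and the pattern must match the entire line
-- assumption: [0-9] is accepted as a character range in wildcard patterns
filter lines of field "fld_text" matching \
      "X[0-9][0-9][0-9][0-9][0-9]Y[0-9][0-9][0-9][0-9][0-9]" \
      into field "fld_results"
```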

Re: Regex not working

Posted: Fri Dec 18, 2020 1:23 am
by dunbarx
A guy named "Thierry" might just chime in soon.

Craig

Re: Regex not working

Posted: Fri Dec 18, 2020 4:41 am
by hpsh
This seems to work for me:

Code:

-- Sent when the mouse is released after clicking
-- pMouseButton specifies which mouse button was pressed
on mouseUp pMouseButton
   put empty into field "result"
   repeat for each line tLine in field "input"
      if matchtext(tLine,"[X]\d{5}") then
         put tLine&cr after field "result"
      end if
   end repeat
end mouseUp
This uses two scrolling fields named "input" and "result". Hope it helps, but this was written at 4 am :-)

Edited because I still don't get the difference between am and pm at age 52 LOL
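matchText returns true if the regex matches anywhere in the line, so "[X]\d{5}" also accepts a line such as "abcX123456" and never checks the Y part. Below is a variant sketch, assuming the same "input" and "result" fields, that anchors the full pattern with ^ and $ and builds the result in a variable before writing the field once. Backslashes in LiveCode string literals are not escape characters, so "\d" reaches the regex engine unchanged.

```livecode
on mouseUp
   local tLine, tHits
   repeat for each line tLine in field "input"
      -- ^ and $ require the whole line to be X, 5 digits, Y, 5 digits
      if matchText(tLine, "^X\d{5}Y\d{5}$") then
         put tLine & return after tHits
      end if
   end repeat
   -- drop the trailing return, then update the field once instead of once per hit
   if tHits is not empty then delete the last char of tHits
   put tHits into field "result"
end mouseUp
```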

Re: Regex not working

Posted: Fri Dec 18, 2020 5:11 am
by hpsh
Darn it, I had to check this filter thingy:

Code:

-- Sent when the mouse is released after clicking
-- pMouseButton specifies which mouse button was pressed
on mouseUp pMouseButton
   local mText
   put field "input" into mText
   filter lines of mText matching regex "[X]\d{5}"
   put mText into field "result"
end mouseUp
happy coding folks :-)

Re: Regex not working

Posted: Fri Dec 18, 2020 11:27 am
by micro04
Thanks,
Got it working. The line must include the word regex:

Code:

 filter lines of field "fld_text" matching regex "[X]\d{5}[Y]\d{5}" into field "fld_results"
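That pattern is unanchored, so (assuming filter's regex mode, like matchText, accepts a match anywhere in the line) it would also keep a line such as "noteX12345Y123456". Since the goal is lines that contain only the code, anchoring with ^ and $ is a safer sketch. [X] and [Y] are one-character classes and can be written simply as X and Y:

```livecode
-- ^ and $ anchor the match to the start and end of each line
filter lines of field "fld_text" matching regex "^X\d{5}Y\d{5}$" into field "fld_results"
```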

Re: Regex not working

Posted: Fri Dec 18, 2020 2:47 pm
by dunbarx
How long does the regex solution take to find a line among 10,000?

Craig

Re: Regex not working

Posted: Fri Dec 18, 2020 6:36 pm
by FourthWorld
dunbarx wrote:
Fri Dec 18, 2020 2:47 pm
How long does the regex solution take to find a line among 10,000?
Comparative benchmarks with real-world uses would be very interesting.

Many years ago I ran one which favored looping chunk expressions, though I don't imagine that would be any sort of universal rule.

With enough comparisons we may be able to discern patterns that can guide us to the most efficient option for a given type of task.

Re: Regex not working

Posted: Fri Dec 18, 2020 7:04 pm
by jacque
Mark Waddingham once told me there's no specific answer. Execution time depends on the content of the data, length of the data, and structure of the regex. If timing is important, you'd need to test both methods per each example.

Re: Regex not working

Posted: Fri Dec 18, 2020 7:38 pm
by FourthWorld
jacque wrote:
Fri Dec 18, 2020 7:04 pm
Mark Waddingham once told me there's no specific answer. Execution time depends on the content of the data, length of the data, and structure of the regex. If timing is important, you'd need to test both methods per each example.
Exactly. It's not the algo, but the application of the algo to a certain type of problem. But they aren't random, they follow patterns.

I've done enough comparative benchmarking between arrays and chunks to have a fairly useful sense of when to use each. With enough benchmarking of regex vs chunks, similarly useful guidance may emerge.

Re: Regex not working

Posted: Fri Dec 18, 2020 8:26 pm
by hpsh
For me, 10,000 lines of random text with some hits and misses take 5 ticks (about 83 ms) with the filter variation, and something like 160 ticks (about 2.7 seconds) with the repeat for each loop.
But if the hits are collected in a variable and put into the result field once at the end, the two are pretty much the same.

So it seems pretty fast to me, but yes, the more lines, the slower it will go.
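Since, as noted above, timings depend on the data and the regex, here is a sketch of a harness that measures both approaches on the same input with the milliseconds. The generated data (one valid line in every hundred, the rest filler text) and the helper name buildTestData are hypothetical; substitute a copy of your real data for meaningful numbers. The chunk loop uses the same checks as Craig's handler but collects every hit instead of stopping at the first.

```livecode
-- hypothetical helper: pCount lines, every 100th one shaped like X#####Y#####
function buildTestData pCount
   local tData, i
   repeat with i = 1 to pCount
      if i mod 100 = 0 then
         put "X" & random(89999) + 10000 & "Y" & random(89999) + 10000 & return after tData
      else
         put "filler line" && i & return after tData
      end if
   end repeat
   return tData
end buildTestData

on mouseUp
   local tData, tFiltered, tLooped, tLine, tStart, tFilterMs, tLoopMs
   put buildTestData(10000) into tData

   -- approach 1: filter with an anchored regex, on a copy of the data
   put the milliseconds into tStart
   put tData into tFiltered
   filter lines of tFiltered matching regex "^X\d{5}Y\d{5}$"
   put the milliseconds - tStart into tFilterMs

   -- approach 2: chunk checks from Craig's handler, collecting every hit
   put the milliseconds into tStart
   repeat for each line tLine in tData
      if char 1 of tLine = "X" and char 2 to 6 of tLine is a number \
            and char 7 of tLine = "Y" and char 8 to 12 of tLine is a number \
            and the length of tLine = 12 then
         put tLine & return after tLooped
      end if
   end repeat
   put the milliseconds - tStart into tLoopMs

   answer "filter:" && tFilterMs && "ms, chunk loop:" && tLoopMs && "ms"
end mouseUp
```

Both approaches append to variables rather than fields, so the comparison is not skewed by screen updates.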