Determining Matching Words In A String.

Googie85
Posts: 226
Joined: Tue Aug 05, 2014 10:07 am

Post by Googie85 » Wed Apr 16, 2025 7:18 am

Hiii Guys!!

I am trying to find a string within a variable and determine the lines on which the matching string occurs. I will try to explain myself clearly. See the following code for an example:

Code: Select all

put "StringX" into line 1 in TempVar
put "StringY" into line 2 in TempVar
put "StringX" into line 3 in TempVar
put "StringY" into line 4 in TempVar
put "StringX" into line 5 in TempVar
put "StringY" into line 6 in TempVar
put "StringX" into line 7 in TempVar
I use:

Code: Select all

put lineoffset("StringX",TempVar) into AnotherVar
answer AnotherVar
When I use the above line, it says "4". What would be the best way to determine the exact lines that contain the matching string in "TempVar"? For example:

Line 1
line 3
Line 5
line 7

I hope I have described this post enough for people to understand!

Many Thanks,

Googie.

Klaus
Posts: 14177
Joined: Sat Apr 08, 2006 8:41 am

Re: Determining Matching Words In A String.

Post by Klaus » Wed Apr 16, 2025 9:05 am

Hi Googie,

not sure what you want exactly.
Get all the lines in one script, with a repeat loop?
Or do you additionally want to have the string "Line" in TempVar, too?

Best

Klaus

SparkOut
Posts: 2943
Joined: Sun Sep 23, 2007 4:58 pm

Re: Determining Matching Words In A String.

Post by SparkOut » Wed Apr 16, 2025 12:15 pm

I think Googie is looking for a function to return all the line numbers at once.
Pseudocode:

Code: Select all

put 0 into tSkip // number of lines already searched; start at the beginning
put empty into AnotherVar
repeat forever
   put lineOffset("StringX", TempVar, tSkip) into tLine
   if tLine is 0 then exit repeat // no further match
   add tLine to tSkip // lineOffset counts from the first unskipped line
   put tSkip & cr after AnotherVar
end repeat
answer AnotherVar
Needs tidying, not tested, on a phone to view forum
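
One detail to keep in mind with the three-argument form: as far as I understand the dictionary entry, the value lineOffset returns is counted from the first line after the skipped ones, not from the start of the variable, so the skip count has to be added back to get an absolute line number. A minimal sketch (the handler and the variable names tData and tRelative are only for illustration):

```livecode
on mouseUp
   local tData, tRelative
   put "StringX" & return & "StringY" & return & "StringX" into tData
   -- skip line 1, so counting starts again at line 2
   put lineOffset("StringX", tData, 1) into tRelative
   -- tRelative is relative to the skipped lines; add the skip count back
   answer "relative:" && tRelative & return & "absolute:" && (1 + tRelative)
end mouseUp
```

With that three-line variable the relative value should be 2 and the absolute line number 3, which is why a loop built on lineOffset needs to keep a running total of the lines it has skipped.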

dunbarx
VIP Livecode Opensource Backer
Posts: 10305
Joined: Wed May 06, 2009 2:28 pm

Re: Determining Matching Words In A String.

Post by dunbarx » Wed Apr 16, 2025 1:54 pm

Try this:

Code: Select all

on mouseup
   put "StringX" into line 1 in TempVar
   put "StringY" into line 2 in TempVar
   put "StringX" into line 3 in TempVar
   put "StringY" into line 4 in TempVar
   put "StringX" into line 5 in TempVar
   put "StringY" into line 6 in TempVar
   put "StringX" into line 7 in TempVar
   
   repeat with y = 1 to the number of lines of tempVar
      if "StringX" is in line y of tempVar then put y & comma after accum
   end repeat
   answer accum
end mouseup
There are lots of ways to do this, like using the "lineOffset" function.

Craig

stam
Posts: 3061
Joined: Sun Jun 04, 2006 9:39 pm

Re: Determining Matching Words In A String.

Post by stam » Wed Apr 16, 2025 3:26 pm

I prefer not using loops, so my preferred way would be to filter arrays.

Code: Select all

function searchAllLines pSource, pFilter
    split pSource by return
    filter elements of pSource with "*" & pFilter & "*"
    return the keys of pSource
end searchAllLines


So for example for a text in a variable 'sourceText':
Line 1
Line 2
Lne 3 // should not be found
Line 4
If we want to search for "Line", the following code:

Code: Select all

get searchAllLines(sourceText, "line") 
will return in the 'it' variable:
1
2
4
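
A small caveat on the order of the result: as far as I know, "the keys" makes no promise about the order in which the keys come back, so for larger inputs the line numbers may not arrive sorted. A hedged variant that sorts them numerically (the name searchAllLinesSorted is made up for this sketch and is not part of the function above):

```livecode
function searchAllLinesSorted pSource, pFilter
   local tKeys
   -- after the split, each line number becomes an array key
   split pSource by return
   -- keep only the elements whose text contains pFilter
   filter elements of pSource with "*" & pFilter & "*"
   put the keys of pSource into tKeys
   -- the order of the keys is not guaranteed, so sort them as numbers
   sort lines of tKeys ascending numeric
   return tKeys
end searchAllLinesSorted
```

Because pFilter is dropped into a wildcard pattern, characters such as "*" or "?" inside it would be treated as wildcards rather than literal text; this sketch assumes plain search words.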

dunbarx
VIP Livecode Opensource Backer
Posts: 10305
Joined: Wed May 06, 2009 2:28 pm

Re: Determining Matching Words In A String.

Post by dunbarx » Wed Apr 16, 2025 4:51 pm

Stam.

I ran a timing test between this:

Code: Select all

on mouseup
   put "StringX" into line 1 in TempVar
   put "StringY" into line 2 in TempVar
   put "StringX" into line 3 in TempVar
   put "StringY" into line 4 in TempVar
   put "StringX" into line 5 in TempVar
   put "StringY" into line 6 in TempVar
   put "StringX" into line 7 in TempVar
   
   repeat 1000000  -- create a 7 million line variable
      put TempVar & return after temp
   end repeat
   
   put the ticks into ff
   repeat with y = 1 to the number of lines of temp
      if "StringX" is in line y of tempVar then put y & comma after accum
   end repeat
   
   answer the ticks - ff  -- 61 ticks
end mouseup
and this:

Code: Select all

on mouseup
   put "StringX" into line 1 in TempVar
   put "StringY" into line 2 in TempVar
   put "StringX" into line 3 in TempVar
   put "StringY" into line 4 in TempVar
   put "StringX" into line 5 in TempVar
   put "StringY" into line 6 in TempVar
   put "StringX" into line 7 in TempVar
   
   repeat 1000000  -- create a 7 million line variable
      put TempVar & return after temp
   end repeat
   
   put the ticks into ff
   get searchAllLines(temp, "StringX")
   
   answer the ticks - ff -- 738 ticks
end mouseup

function searchAllLines pSource, pFilter
    split pSource by return
    filter elements of pSource with "*" & pFilter & "*"
    return the keys of pSource
end searchAllLines
61 ticks vs. 738 ticks. Did I compare the two methods, a loop and a regex-based "filter" method fairly? If so, the loop is 12 times faster. I would have thought the opposite, and to boot, the loop is a "repeat with..." type.

Craig
Last edited by dunbarx on Thu Apr 17, 2025 4:27 pm, edited 1 time in total.

dunbarx
VIP Livecode Opensource Backer
Posts: 10305
Joined: Wed May 06, 2009 2:28 pm

Re: Determining Matching Words In A String.

Post by dunbarx » Wed Apr 16, 2025 8:19 pm

Stam.

Also, I got a colored spinning beachball in the "Filter" version, though it was erratic, which usually means it is not fatal, just LC working hard.

I thought seeing that object indicated that a handler was running hard, usually a loop of some sort. I did not expect it to appear during the "lower level" execution of a single line of "upper level" code.

I may not know what I am talking about.

Craig

stam
Posts: 3061
Joined: Sun Jun 04, 2006 9:39 pm

Re: Determining Matching Words In A String.

Post by stam » Fri Apr 18, 2025 10:29 pm

Craig.
dunbarx wrote:
Wed Apr 16, 2025 4:51 pm
61 ticks vs. 738 ticks. Did I compare the two methods, a loop and a regex-based "filter" method fairly? If so, the loop is 12 times faster. I would have thought the opposite, and to boot, the loop is a "repeat with..." type.
I'm afraid you've made an error in your script

your code says:
if "StringX" is in line y of tempVar then put y & comma after accum
It should be:
if "StringX" is in line y of temp then put y & comma after accum
Your code queries tempVar, which only has 7 lines. Since there is no line 8, 9, 10, etc. in tempVar, every test past line 7 finds nothing and the loop just skips swiftly to the next iteration, which is why it takes only 61 ticks: in effect it only searches the 7 lines.

In my testing, the Filter method (which is not regex btw) takes 11 seconds to search the 7 million lines. Yes you get a beach ball for a few seconds (I presume this is the conversion of string to array), but it is vastly quicker than the loop.

After correcting your loop, I interrupted the loop method after a few minutes as I got bored... the "y" counter variable had only reached 449,249.
I later tried again and let it run while I was watching stand up comedy on Netflix and after about 30 mins the counter had only reached about 1.5 million of the 7 million lines.
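
A likely reason the corrected "repeat with" loop is so slow (an assumption on my part, not something measured in the test stack): a chunk expression such as "line y of temp" appears to be resolved by counting line delimiters from the start of the variable every time, so the later the line, the longer each lookup takes. A rough sketch to check this on your own machine; the sizes and the variable names are arbitrary choices for illustration:

```livecode
on mouseUp
   local tData, tStart, tEarly, tLate, tLine
   -- build a 1,000,000 line test variable
   repeat 500000
      put "StringX" & return & "StringY" & return after tData
   end repeat

   -- fetch the first line 100 times
   put the milliseconds into tStart
   repeat 100
      put line 1 of tData into tLine
   end repeat
   put the milliseconds - tStart into tEarly

   -- fetch the last line 100 times
   put the milliseconds into tStart
   repeat 100
      put line 1000000 of tData into tLine
   end repeat
   put the milliseconds - tStart into tLate

   answer "line 1:" && tEarly && "ms" & return & "line 1000000:" && tLate && "ms"
end mouseUp
```

If the second figure comes out much larger than the first, the total cost of walking all 7 million lines by index grows roughly with the square of the line count, which would fit the run times described above.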

I'm afraid as yet I've not had the patience to wait for your method to finish searching the 7 million lines, so I'll stick with arrays and filter personally ;)

Stam

stam
Posts: 3061
Joined: Sun Jun 04, 2006 9:39 pm

Re: Determining Matching Words In A String.

Post by stam » Sat Apr 19, 2025 12:22 am

A test stack is attached.

The 7 million line string to search is created as global variable on clicking the only enabled button at startup.

The Filter button uses the method I proposed - it takes around 11 seconds to check on my system.
The Loop button uses the method Craig proposed - I've not had the patience to wait for it to finish - after 30 mins it had searched around 20% of the 7 million line string.

Or maybe I too have made a mistake - please do post a correction if needed.
Attachments
loop vs filter array.livecode.zip
(1.46 KiB) Downloaded 207 times

strongbow
VIP Livecode Opensource Backer
Posts: 146
Joined: Mon Jul 31, 2006 1:39 am

Re: Determining Matching Words In A String.

Post by strongbow » Sat Apr 19, 2025 10:27 am

I modified your stack a little, Stam, to speed up the loop by using "repeat for each" with an index, and also to break down the filter timing to get some info on how long split and filter each take.

Got the results below:

loop found: 4000000 in 5419 milliseconds

split took: 10471 millisecs
filter and split took: 25337 millisecs
filter found: 4000000 in 31501 milliseconds

On a 2019 Macbook Pro.
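
For anyone who wants to reproduce that kind of breakdown without the stack, here is a minimal sketch of timing the split and the filter separately with "the milliseconds". It is not the code from the attached stack; the handler name timeSplitAndFilter and its variables are assumptions made for this example:

```livecode
command timeSplitAndFilter pSource, pFilter
   local tStart, tAfterSplit, tAfterFilter
   put the milliseconds into tStart
   -- step 1: turn the text into an array keyed by line number
   split pSource by return
   put the milliseconds into tAfterSplit
   -- step 2: keep only the elements that contain pFilter
   filter elements of pSource with "*" & pFilter & "*"
   put the milliseconds into tAfterFilter
   answer "split took:" && (tAfterSplit - tStart) && "millisecs" & return & \
         "filter took:" && (tAfterFilter - tAfterSplit) && "millisecs" & return & \
         "found:" && (the number of lines of (the keys of pSource))
end timeSplitAndFilter
```

It could be called as timeSplitAndFilter temp, "StringX", where temp is the 7 million line variable built in the earlier scripts.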

HTH.
Attachments
loop vs filter array 2.livecode.zip
(1.6 KiB) Downloaded 196 times

dunbarx
VIP Livecode Opensource Backer
Posts: 10305
Joined: Wed May 06, 2009 2:28 pm

Re: Determining Matching Words In A String.

Post by dunbarx » Sat Apr 19, 2025 7:53 pm

Stam.
I'm afraid you've made an error in your script
Yep :oops: Sloppy. Anyway, I told you I assumed it should not have been so.

Craig

stam
Posts: 3061
Joined: Sun Jun 04, 2006 9:39 pm

Re: Determining Matching Words In A String.

Post by stam » Sat Apr 19, 2025 10:12 pm

strongbow wrote:
Sat Apr 19, 2025 10:27 am
I modified your stack a little Stam to improve the speed a little
Thanks Strongbow,
in truth I wasn't setting out to prove the fastest method, I had just responded because Craig's comparison seemed weird.

I knew the repeat for each type of loop was much faster than the repeat with loop, but having the latter come out faster than array methods didn't sound right.

Having said that, I'm extremely impressed by the performance of the repeat for each loop - it is consistently faster although there is a palpable difference only in the millions of lines.

To make a fairer comparison, I made both into functions:

Code: Select all

function searchAllLines pSource, pFilter
   split pSource by return   
   filter elements of pSource with "*" & pFilter & "*"   
   return the keys of pSource
end searchAllLines

Code: Select all

function loopAllLInes pSource, pFilter
    local tFound, y
    repeat for each line tLine in pSource
        add 1 to y
        if pFilter is in tLine then put y & return after tFound
    end repeat
    return tFound
end loopAllLInes
At 7 million lines, the repeat for each loop takes 3 seconds and the filter (as previously) 11 seconds on average.
At an order of magnitude less, the difference is much smaller but the repeat for each loop version is consistently faster... if only by 0.5-3 seconds depending on the size of the text to search.

Thanks
Stam
