Determining Matching Words In A String.
Posted: Wed Apr 16, 2025 7:18 am
by Googie85
Hiii Guys!!
I am trying to find a string within a variable and determine the lines on which the matching string occurs. I will try to explain myself clearly. See the following code for an example:
Code: Select all
put "StringX" into line 1 in TempVar
put "StringY" into line 2 in TempVar
put "StringX" into line 3 in TempVar
put "StringY" into line 4 in TempVar
put "StringX" into line 5 in TempVar
put "StringY" into line 6 in TempVar
put "StringX" into line 7 in TempVar
I use:
Code: Select all
put lineoffset("StringX",TempVar) into AnotherVar
answer AnotherVar
When I use the above line, it says "4". What would be the best way to determine the exact lines of the matching string in "TempVar"? For example:
Line 1
Line 3
Line 5
Line 7
I hope I have described this post enough for people to understand!
Many Thanks,
Googie.
Re: Determining Matching Words In A String.
Posted: Wed Apr 16, 2025 9:05 am
by Klaus
Hi Googie,
not sure what you want exactly.
Get all the lines in one script, with a repeat loop?
Or do you additionally want to have the string "Line" in TempVar, too?
Best
Klaus
Re: Determining Matching Words In A String.
Posted: Wed Apr 16, 2025 12:15 pm
by SparkOut
I think Googie is looking for a function to return all the line numbers at once
Pseudocode
Code: Select all
put 0 into tSkip // number of lines to skip; start at the beginning
put empty into AnotherVar
repeat
put lineOffset("StringX",TempVar,tSkip) into tLine // result is relative to the skipped lines
if tLine is 0 then exit repeat // no further matches
add tLine to tSkip // convert to an absolute line number
put tSkip & cr after AnotherVar
end repeat
answer AnotherVar
Needs tidying, not tested, on a phone to view forum
Re: Determining Matching Words In A String.
Posted: Wed Apr 16, 2025 1:54 pm
by dunbarx
Try this:
Code: Select all
on mouseup
put "StringX" into line 1 in TempVar
put "StringY" into line 2 in TempVar
put "StringX" into line 3 in TempVar
put "StringY" into line 4 in TempVar
put "StringX" into line 5 in TempVar
put "StringY" into line 6 in TempVar
put "StringX" into line 7 in TempVar
repeat with y = 1 to the number of lines of tempVar
if "StringX" is in line y of tempVar then put y & comma after accum
end repeat
answer accum
end mouseup
There are lots of ways to do this, like using the "lineOffset" function.
Craig
Re: Determining Matching Words In A String.
Posted: Wed Apr 16, 2025 3:26 pm
by stam
I prefer not using loops, so my preferred way would be to filter arrays.
Code: Select all
function searchAllLines pSource, pFilter
split pSource by return
filter elements of pSource with "*" & pFilter & "*"
return the keys of pSource
end searchAllLines
So for example for a text in a variable 'sourceText':
Line 1
Line 2
Lne 3 // should not be found
Line 4
If we want to search for "Line", the following code:
Code: Select all
get searchAllLines(sourceText, "line")
will return in the 'it' variable:
1
2
4
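One caveat about the function above: the order of the list returned by "the keys" is not guaranteed, so the line numbers may not come back in ascending order. A minimal sketch of a variant that sorts them follows. The name searchAllLinesSorted is made up for illustration and is not part of the original post. Note also that the filter pattern is a wildcard pattern, so characters such as "*" or "?" inside pFilter are treated as wildcards rather than literal text.

```livecode
function searchAllLinesSorted pSource, pFilter
   local tKeys
   -- one array element per line, keyed by line number
   split pSource by return
   -- keep only the elements whose text contains pFilter
   filter elements of pSource with "*" & pFilter & "*"
   put the keys of pSource into tKeys
   -- the keys come back in no particular order, so sort them numerically
   sort lines of tKeys ascending numeric
   return tKeys
end searchAllLinesSorted
```

It is called the same way as the original, for example: get searchAllLinesSorted(sourceText, "line")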
Re: Determining Matching Words In A String.
Posted: Wed Apr 16, 2025 4:51 pm
by dunbarx
Stam.
I ran a timing test between this:
Code: Select all
on mouseup
put "StringX" into line 1 in TempVar
put "StringY" into line 2 in TempVar
put "StringX" into line 3 in TempVar
put "StringY" into line 4 in TempVar
put "StringX" into line 5 in TempVar
put "StringY" into line 6 in TempVar
put "StringX" into line 7 in TempVar
repeat 1000000 -- create a 7 million line variable
put TempVar & return after temp
end repeat
put the ticks into ff
repeat with y = 1 to the number of lines of temp
if "StringX" is in line y of tempVar then put y & comma after accum
end repeat
answer the ticks - ff -- 61 ticks
end mouseup
and this:
Code: Select all
on mouseup
put "StringX" into line 1 in TempVar
put "StringY" into line 2 in TempVar
put "StringX" into line 3 in TempVar
put "StringY" into line 4 in TempVar
put "StringX" into line 5 in TempVar
put "StringY" into line 6 in TempVar
put "StringX" into line 7 in TempVar
repeat 1000000 -- create a 7 million line variable
put TempVar & return after temp
end repeat
put the ticks into ff
get searchAllLines(temp, "StringX")
answer the ticks - ff -- 738 ticks
end mouseup
function searchAllLines pSource, pFilter
split pSource by return
filter elements of pSource with "*" & pFilter & "*"
return the keys of pSource
end searchAllLines
61 ticks vs. 738 ticks. Did I compare the two methods, a loop and a regex-based "filter" method fairly? If so, the loop is 12 times faster. I would have thought the opposite, and to boot, the loop is a "repeat with..." type.
Craig
Re: Determining Matching Words In A String.
Posted: Wed Apr 16, 2025 8:19 pm
by dunbarx
Stam.
Also, I got a colored spinning beachball in the "Filter" version, though it was erratic, which usually means it is not fatal, just LC working hard.
I thought seeing that object indicated that a handler was running hard, usually a loop of some sort. I did not expect it to appear during the "lower level" execution of a single line of "upper level" code.
I may not know what I am talking about.
Craig
Re: Determining Matching Words In A String.
Posted: Fri Apr 18, 2025 10:29 pm
by stam
Craig.
dunbarx wrote: ↑Wed Apr 16, 2025 4:51 pm
61 ticks vs. 738 ticks. Did I compare the two methods, a loop and a regex-based "filter" method fairly? If so, the loop is 12 times faster. I would have thought the opposite, and to boot, the loop is a "repeat with..." type.
I'm afraid you've made an error in your script.
Your code says:
if "StringX" is in line y of tempVar then put y & comma after accum
It should be:
if "StringX" is in line y of temp then put y & comma after accum
Your code queries tempVar, which only has 7 lines. Since there is no line 8, 9, 10, etc. in tempVar, every later iteration tests an empty line and skips swiftly to the next one, which is why the whole run takes a whopping 61 ticks: only 7 lines are really being searched.
In my testing, the Filter method (which is not regex btw) takes 11 seconds to search the 7 million lines. Yes, you get a beach ball for a few seconds (I presume this is the conversion of string to array), but it is vastly quicker than the loop.
After correcting your loop, I interrupted the loop method after a few minutes as I got bored... the "y" counter variable had only reached 449,249.
I later tried again and let it run while I was watching stand up comedy on Netflix and after about 30 mins the counter had only reached about 1.5 million of the 7 million lines.
I'm afraid as yet I've not had the patience to wait for your method to finish searching the 7 million lines, so I'll stick with arrays and filter personally
Stam
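The effect described in the post above can be seen in a small sketch. The variable names are made up for illustration. Asking for a line past the end of a variable simply yields empty, so the "is in" test fails straight away. The corrected loop is slow for a different reason: as far as the chunk expressions are generally understood to work, "line y of temp" has to count lines from the top of the variable on every pass, which gets expensive as y grows.

```livecode
on mouseUp
   local tShort
   -- a variable with only two lines
   put "StringX" & return & "StringY" into tShort
   -- line 5 does not exist, so it evaluates to empty and the test is false
   answer ("StringX" is in line 5 of tShort)
end mouseUp
```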
Re: Determining Matching Words In A String.
Posted: Sat Apr 19, 2025 12:22 am
by stam
A test stack is attached.
The 7 million line string to search is created as a global variable on clicking the only button that is enabled at startup.
The Filter button uses the method I proposed - it takes around 11 seconds to check on my system.
The Loop button uses the method Craig proposes. I've not had the patience to wait for it to finish: after 30 mins it had searched around 20% of the 7 million line string.
Or maybe I too have made a mistake - please do post a correction if needed.
Re: Determining Matching Words In A String.
Posted: Sat Apr 19, 2025 10:27 am
by strongbow
I modified your stack a little, Stam, to improve the speed by using "repeat for each" with an index for the loop, and I also broke down the filter timing to get some info on how long split and filter take.
Got the results below:
loop found: 4000000 in 5419 milliseconds
split took: 10471 millisecs
filter and split took: 25337 millisecs
filter found: 4000000 in 31501 milliseconds
On a 2019 Macbook Pro.
HTH.
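For anyone wanting to reproduce this kind of measurement, here is a minimal sketch of a timing harness built on "the milliseconds". It is not the code from the modified stack, which is not shown in the thread. The sample size and variable names are assumptions chosen to keep the run short.

```livecode
on mouseUp
   local tText, tStart, tCount, tLine
   -- build a small sample: 1000 pairs of alternating lines (assumed size)
   repeat 1000
      put "StringX" & return & "StringY" & return after tText
   end repeat
   put 0 into tCount
   put the milliseconds into tStart
   -- "repeat for each" walks the text once instead of re-counting lines
   repeat for each line tLine in tText
      if "StringX" is in tLine then add 1 to tCount
   end repeat
   answer "loop found:" && tCount && "in" && (the milliseconds - tStart) && "milliseconds"
end mouseUp
```

Timing the split and the filter separately would follow the same pattern: store the milliseconds before each step and subtract afterwards.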
Re: Determining Matching Words In A String.
Posted: Sat Apr 19, 2025 7:53 pm
by dunbarx
Stam.
I'm afraid you've made an error in your script
Yep

Sloppy. Anyway, I told you I assumed it should not have been so.
Craig
Re: Determining Matching Words In A String.
Posted: Sat Apr 19, 2025 10:12 pm
by stam
strongbow wrote: ↑Sat Apr 19, 2025 10:27 am
I modified your stack a little Stam to improve the speed a little
Thanks Strongbow,
in truth I wasn't setting out to prove the fastest method, I had just responded because Craig's comparison seemed weird.
I knew the repeat for each type of loop was much faster than the repeat with loop, so having the latter come out faster than array methods didn't sound right.
Having said that, I'm extremely impressed by the performance of the repeat for each loop - it is consistently faster although there is a palpable difference only in the millions of lines.
To make a fairer comparison, I made both into functions:
Code: Select all
function searchAllLines pSource, pFilter
split pSource by return
filter elements of pSource with "*" & pFilter & "*"
return the keys of pSource
end searchAllLines
Code: Select all
function loopAllLInes pSource, pFilter
local tFound, y
repeat for each line tLine in pSource
add 1 to y
if pFilter is in tLine then put y & return after tFound
end repeat
return tFound
end loopAllLInes
At 7 million lines, the repeat for each loop takes 3 seconds and the filter (as previously) 11 seconds on average.
At an order of magnitude fewer lines, the difference is much smaller, but the repeat for each loop version is consistently faster, if only by 0.5-3 seconds depending on the size of the text to search.
Thanks
Stam