Re: [SOLVED] Getting Index of Characters
Posted: Sun Dec 04, 2022 10:31 pm
by Xero
Thanks Stam.
When I revised the code I ended up taking the same approach with less elegant code.
Thanks for the explanation, Thierry.
When I revised the solution, I worked out that the offset kept counting from the start of the block of text I was searching, so I had to separate the paragraphs. Now I understand what you were saying!
Thanks all, great learning experience for working with text!
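For anyone reading along, the offset issue described above can be illustrated with a small sketch. It is written in Python rather than LiveCode purely for illustration, and the function names and sample text are hypothetical, not taken from the stack discussed in this thread. It contrasts positions counted from the start of the whole text with positions counted within each paragraph.

```python
def absolute_offsets(text, ch):
    """1-based positions of ch, counted from the start of the whole text."""
    return [i for i, c in enumerate(text, start=1) if c == ch]


def per_paragraph_offsets(text, ch):
    """(paragraph_number, char_number) pairs, both 1-based.

    The character count restarts at 1 in every paragraph.
    """
    return [
        (p_no, c_no)
        for p_no, paragraph in enumerate(text.split("\n"), start=1)
        for c_no, c in enumerate(paragraph, start=1)
        if c == ch
    ]


text = "cat\ntack"  # two paragraphs: "cat" and "tack"
print(absolute_offsets(text, "t"))       # [3, 5]
print(per_paragraph_offsets(text, "t"))  # [(1, 3), (2, 1)]
```

The second "t" is character 5 of the whole block but character 1 of its own paragraph, which is why splitting into paragraphs first gives the expected indexes.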
Re: [SOLVED] Getting Index of Characters
Posted: Wed Dec 07, 2022 3:38 am
by rkriesel
Xero wrote: Sun Dec 04, 2022 10:31 pm
Thanks all, great learning experience for working with text!
There's more learning nearby: reconsider the algorithm. It scans the whole text once for each given letter. For a thousand input lines of 100 characters each and 26 letters, that means reading about 2.6 million characters (1,000 × 100 × 26). An alternative approach reads only enough lines to find 3 of each letter, which is likely far under a thousand characters.
Try it yourself, or see this:
function samples pText, pChars, pSampleCountLimit
   -- returns an array keyed by character; each element lists "paragraph,char" positions, one per line
   local tParagraphIndex, tCharIndex, tSamples
   repeat for each paragraph tParagraph in pText
      add 1 to tParagraphIndex
      put 0 into tCharIndex -- character positions restart in every paragraph
      repeat for each char tChar in tParagraph
         add 1 to tCharIndex
         if tChar is in pChars then
            -- record the position unless this character already has enough samples
            if pSampleCountLimit is empty or number of lines in tSamples[ tChar ] < pSampleCountLimit then
               put tParagraphIndex, tCharIndex & cr after tSamples[ tChar ]
            end if
         end if
      end repeat
      -- stop reading once no collected character is still short of the limit
      filter keys of tSamples where number of lines in tSamples[ each ] < pSampleCountLimit into it
      if it is empty then
         exit repeat
      end if
   end repeat
   return tSamples
end samples
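To put rough numbers on the difference, here is a minimal sketch that counts how many characters each approach examines. It is in Python rather than LiveCode, purely for illustration; the function names and the synthetic input are assumptions, not part of the stack discussed here. One design choice to note: this sketch treats every requested letter as pending from the start, including letters that have not appeared yet, and it checks for completion once per line.

```python
LETTERS = "abcdefghijklmnopqrstuvwxyz"


def full_scan_reads(lines, letters):
    """Characters examined if each letter triggers its own pass over all the text."""
    return len(letters) * sum(len(line) for line in lines)


def early_exit_samples(lines, letters, limit):
    """Single pass that stops once every requested letter has `limit` samples.

    Returns (samples, chars_read); samples maps each letter to a list of
    (line_number, char_number) pairs, both 1-based.
    """
    samples = {ch: [] for ch in letters}
    pending = set(letters)  # letters still short of `limit`, seen or not
    chars_read = 0
    for line_no, line in enumerate(lines, start=1):
        for char_no, ch in enumerate(line, start=1):
            chars_read += 1
            if ch in pending:
                samples[ch].append((line_no, char_no))
                if len(samples[ch]) >= limit:
                    pending.discard(ch)
        if not pending:
            break  # completion is checked once per line
    return samples, chars_read


# Synthetic input (an assumption): 1000 identical lines of 100 characters each.
lines = [(LETTERS * 4)[:100]] * 1000
samples, chars_read = early_exit_samples(lines, LETTERS, 3)
print(full_scan_reads(lines, LETTERS))  # 2600000
print(chars_read)                       # 100
```

Real text is far less uniform than this input, so the early-exit count will usually be higher than 100; the point is only that it depends on how soon the letters turn up, not on the total length of the text.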
If you post your version, I'll post a performance comparison for input line counts 10, 100, 1000, and so forth. How many lines do you expect? How long are they?
-- Dick
Re: [SOLVED] Getting Index of Characters
Posted: Wed Dec 07, 2022 6:27 am
by Xero
Thanks Dick,
Part of the original idea was to limit the number of searches, as your code does. I have a nominated number in my stack so you can change the number of returned results, but I'm still working out how to integrate it into my code.
I can imagine this code being used on large texts (entire books possibly), so limiting it was definitely a requirement to make sure time didn't blow out.
X