Occurrences
Posted: Tue Jul 07, 2009 7:33 pm
by ivanw
I'm trying to find the max number of occurrences of each keyword in a list of keywords. So far I've been thinking of looping over each line and then checking whether that line is contained within each of the other lines in the list.
e.g.
fox
brown fox
quick brown fox
quick brown fox
after calculations would be:
fox 3
brown fox 2
quick brown fox 2
Would this be possible with arrays? Any other suggestions would be greatly appreciated.
Thanks,
Ivan
Posted: Thu Jul 09, 2009 8:49 am
by Klaus
Hi Ivan,
yep, arrays are a good way to solve this!
Like this:
Code: Select all
...
put fld "keywords" into tList
put empty into tArray
## Count how many times each complete line appears in the list
repeat for each line i in tList
   add 1 to tArray[i]
end repeat
## Build a new list with: Name of string TAB number of occurrences
put the keys of tArray into tKeys
repeat for each line k in tKeys
   put k & TAB & tArray[k] & CR after list_of_occurrences
end repeat
delete char -1 of list_of_occurrences
### Do what you want with list_of_occurrences
...
Should be pretty fast ("repeat for each" is insanely fast!), even for looooong lists
Best
Klaus
Posted: Thu Jul 09, 2009 11:40 am
by SparkOut
But I'm not sure that's exactly what's wanted, is it? It counts the number of times a whole line matches, but not the times it appears as a substring of each line.
I'm not really clear on the instructions but I thought that the list:
fox
brown fox
quick brown fox
quick brown fox
would contain the line "fox" four times (once in each line). The "brown fox" line appears once on its own and twice more in the subsequent lines. The "quick brown fox" line appears twice. So by my interpretation the results should be 4, 3, 2, rather than 3, 2, 2. So here's an amended version that will do what I thought it should, but I'm not at all certain that it's what is desired.
Code: Select all
put fld "keywords" into tList
put empty into tArray
repeat for each line i in tList
   add 0 to tArray[i] -- initialise the array with the right keys but don't count the lines yet
end repeat
## Build a new list with: Name of string TAB number of occurrences
put the keys of tArray into tKeys
repeat for each line k in tKeys
   -- count every line of the list that contains the keyword k as a substring
   repeat for each line i in tList
      if k is in i then
         add 1 to tArray[k]
      end if
   end repeat
   put k & TAB & tArray[k] & CR after list_of_occurrences
end repeat
delete char -1 of list_of_occurrences
### Do what you want with list_of_occurrences
-- or ignore it completely and just use the array keys and values to represent the count of each keyword/phrase
Oh, you also ought to do some whitespace trimming and error checking so that you don't get "duplicate" array keys created because there's a trailing space at the end of one of the lines, for example. And maybe sort the results list or the keys of the array that you'll be using to work with.
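A rough sketch of that tidying-up, wrapped in a function so it can be reused. The names countOccurrences, pList, tClean and tResult are made up for illustration and are not from the code above. It assumes "word 1 to -1" is enough trimming for your data, and that plain substring matching is what you want (so "fox" would also be counted inside "foxes").

```livecode
function countOccurrences pList
   local tClean, tArray, tKeys, tKey, tResult
   -- trim leading/trailing whitespace and skip blank lines,
   -- so "fox" and "fox " end up as the same array key
   repeat for each line tLine in pList
      put word 1 to -1 of tLine into tKey
      if tKey is empty then next repeat
      put tKey & CR after tClean
      put 0 into tArray[tKey]
   end repeat
   delete char -1 of tClean
   -- count the lines that contain each keyword as a substring
   put the keys of tArray into tKeys
   repeat for each line k in tKeys
      repeat for each line tLine in tClean
         if k is in tLine then add 1 to tArray[k]
      end repeat
      put k & TAB & tArray[k] & CR after tResult
   end repeat
   delete char -1 of tResult
   -- highest count first; the count is the second TAB-separated item
   set the itemDelimiter to TAB
   sort lines of tResult descending numeric by item 2 of each
   return tResult
end countOccurrences
```

You could call it with something like put countOccurrences(fld "keywords") into fld "results", where the "results" field is just an assumed name. For the four-line fox list above, that should give fox 4, brown fox 3, quick brown fox 2.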
Posted: Thu Jul 09, 2009 12:18 pm
by Klaus
Oh yes, after reading this again it looks like you are correct, SparkOut.
Sorry Ivan, take SparkOut's solution.

Posted: Fri Jul 10, 2009 5:19 am
by ivanw
Many thanks Klaus & SparkOut
Indeed there was a typo in my original example - I'll test this solution and let you know how it goes.