I'm trying to find the maximum number of occurrences of each keyword in a list of keywords. So far I've been thinking of looping over each line and checking whether that line is contained within each of the other lines in the list.
e.g.
fox
brown fox
quick brown fox
quick brown fox
After the calculations, the result would be:
fox 3
brown fox 2
quick brown fox 2
Would this be possible with arrays? Any other suggestions would be greatly appreciated.
Thanks,
Ivan
Occurrences
Hi Ivan,
yep, arrays are a good way to solve this!
This should be pretty fast ("repeat for each" is insanely fast!), even for very long lists.
Like this:
Code:
...
put fld "keywords" into tList
put empty into tArray
put empty into list_of_occurrences
## Count how often each line occurs (exact, whole-line matches only)
repeat for each line i in tList
  add 1 to tArray[i]
end repeat
## Build a new list with: Name of string TAB number of occurrences
put the keys of tArray into tKeys
repeat for each line k in tKeys
  put k & TAB & tArray[k] & CR after list_of_occurrences
end repeat
delete char -1 of list_of_occurrences
### Do what you want with list_of_occurrences
...

Best
Klaus
But I'm not sure that's exactly what's wanted, is it? It counts the number of times a whole line matches, but not the number of lines that contain it as a substring.
I'm not really clear on the instructions but I thought that the list:
fox
brown fox
quick brown fox
quick brown fox
would contain the line "fox" four times (once in each line). The "brown fox" line appears once on its own and twice more in the subsequent lines. The "quick brown fox" line appears twice. So by my interpretation the results should be 4, 3, 2, rather than 3, 2, 2.
You also ought to do some whitespace trimming and error checking, so that you don't get "duplicate" array keys created because there's a trailing space at the end of one of the lines, for example. And maybe sort the results list, or the keys of the array that you'll be using to work with.
So here's an amended version that will do what I thought it should, but I'm not at all certain that it's what is desired.
Code:
put fld "keywords" into tList
put empty into tArray
put empty into list_of_occurrences
repeat for each line i in tList
  add 0 to tArray[i] --initialise the array with the right keys but don't count the lines yet
end repeat
## Count the lines containing each key, then build a new list with: Name of string TAB number of occurrences
put the keys of tArray into tKeys
repeat for each line k in tKeys
  repeat for each line i in tList
    if k is in i then
      add 1 to tArray[k]
    end if
  end repeat
  put k & TAB & tArray[k] & CR after list_of_occurrences
end repeat
delete char -1 of list_of_occurrences
### Do what you want with list_of_occurrences
-- or ignore it completely and just use the array keys and value to represent the count of each keyword/phrase
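The whitespace trimming and sorting mentioned above could be sketched roughly like this. It is only one possible way to do it, under the "4, 3, 2" interpretation. The function name countKeywordOccurrences and the field "results" in the usage comment are made up for illustration, and the "word 1 to -1" idiom is used here as a simple trim that assumes the keywords contain no unbalanced double quotes.

```livecode
-- Hypothetical helper (the name is an assumption, not from the thread).
-- Trims each line, counts how many lines contain each keyword,
-- and returns "keyword TAB count" lines sorted by count, highest first.
function countKeywordOccurrences pList
   local tClean, tArray, tLine, tTrimmed, tKey, tResult
   
   -- Strip leading/trailing whitespace and skip empty lines,
   -- so "fox" and "fox " do not end up as two different array keys
   repeat for each line tLine in pList
      put word 1 to -1 of tLine into tTrimmed
      if tTrimmed is not empty then put tTrimmed & CR after tClean
   end repeat
   delete char -1 of tClean
   
   -- Create one array key per distinct keyword, starting at zero
   repeat for each line tLine in tClean
      put 0 into tArray[tLine]
   end repeat
   
   -- Count every line that contains the keyword as a substring
   repeat for each key tKey in tArray
      repeat for each line tLine in tClean
         if tKey is in tLine then add 1 to tArray[tKey]
      end repeat
   end repeat
   
   -- Build the result list: keyword TAB count
   repeat for each key tKey in tArray
      put tKey & TAB & tArray[tKey] & CR after tResult
   end repeat
   delete char -1 of tResult
   
   -- Sort by the count (item 2), largest first
   set the itemDelimiter to TAB
   sort lines of tResult descending numeric by item 2 of each
   return tResult
end countKeywordOccurrences

-- Possible usage (the field "results" is an assumption):
-- put countKeywordOccurrences(fld "keywords") into fld "results"
```

For the four-line example list this should give "fox" 4, "brown fox" 3 and "quick brown fox" 2. Note that "is in" is a plain substring test, so a keyword like "fox" would also be counted in a line containing "foxes". If only whole words should match, the comparison would need to be stricter.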