fastest way to do a count
Posted: Fri Dec 09, 2011 8:37 pm
by adventuresofgreg
Hi. I have a very large list of numbers, and for each number I would like to count how many times it occurs in the list.
I am currently using lineOffset(theNumber, theList), then deleting the matching line from the list and repeating until lineOffset returns 0.
But this can be quite slow over tens of thousands of lines.
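For reference, the approach described above might look roughly like the sketch below. This is not the poster's actual script: the function name countByLineOffset and its parameters are made up for illustration, and setting wholeMatches to true is an assumption so that 1 does not match inside 11.

```livecode
-- hypothetical helper: counts how often pNumber occurs as a whole line of pList
function countByLineOffset pNumber, pList
   set the wholeMatches to true -- assumption: match entire lines only
   put 0 into tCount
   repeat forever
      put lineOffset(pNumber, pList) into tLine
      if tLine is 0 then exit repeat
      -- pList is passed by value, so the caller's list is left untouched
      delete line tLine of pList
      add 1 to tCount
   end repeat
   return tCount
end countByLineOffset
```

Every lineOffset call scans the list from the top again, and this is repeated for each number, which is probably why it gets slow on long lists.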
Re: fastest way to do a count
Posted: Fri Dec 09, 2011 9:36 pm
by bn
Hi Greg,
Try this on a field that contains your numbers, assuming each line holds one number.
Make a second field for the result. It will display each number, a tab, and the number of occurrences of that number.
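Code:
on mouseUp
   put field 1 into tData
   if tData is empty then exit mouseUp
   -- tally each line; an array element is created the first time a number is seen
   repeat for each line aLine in tData
      add 1 to tArray[aLine]
   end repeat
   -- flatten the array into lines of "number <tab> count"
   combine tArray by return and tab
   set the itemDelimiter to tab
   -- sort by the count; numeric so that 10 sorts after 9
   sort lines of tArray numeric by item 2 of each
   put tArray into field 2
end mouseUp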
This only works if each line consists of a single number. It could be changed for a different data format, but you would have to say what that format is.
I have attached a tiny stack that creates 20,000 lines of random numbers in the range 1 to 30 and puts them into field 1; you can then count the number of occurrences of each number in field 1.
Kind regards
Bernd
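The attached stack is not reproduced in this thread. A minimal sketch of how such test data could be generated, assuming a button script and a field 1 as in the post above:

```livecode
on mouseUp
   put empty into tData
   -- build the list in a variable; appending to a field inside the loop would be much slower
   repeat 20000 times
      put random(30) & return after tData -- random(30) yields an integer from 1 to 30
   end repeat
   delete the last char of tData -- drop the trailing return
   put tData into field 1
end mouseUp
```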
Re: fastest way to do a count
Posted: Fri Dec 09, 2011 9:49 pm
by adventuresofgreg
Thanks Bernd. That looks like it would work, but my data is a bit more complicated than I described.
Each "number" is actually a comma-delimited list of numbers, and each line has a second word, a number that needs to be averaged.
That is, the list would look like:
4,3,5,6,8,1,2,9 .085
3,1,4,6,1,1,8,9 .07
5,7,3,6,4,5,2,9 -.623
2,9,5,6,5,6,7,4 .543
3,3,9,4,8,1,2,1 -.023
For each line, I need to count the occurrences of the first word and then, across all lines with a matching first word, calculate the average of the second words.
Re: fastest way to do a count
Posted: Fri Dec 09, 2011 9:55 pm
by bn
Hi Greg,
I am not sure I get what you mean.
Could you give an example not only of the data structure but also of the averaging bit? What do you mean by first word: the first item, i.e. the first number?
Kind regards
bernd
Re: fastest way to do a count
Posted: Fri Dec 09, 2011 10:17 pm
by adventuresofgreg
Hi Bernd:
Here is a sample list. Each line consists of 2 words. Word 1 is a comma-delimited group of numbers, and word 2 is a single number.
word 1 word 2
4,3,5,6,8,1,2,9 .085
3,1,4,6,1,1,8,9 .07
4,3,5,6,8,1,2,9 -.623
4,3,5,6,8,1,2,9 .543
3,3,9,4,8,1,2,1 -.023
So, for the first line, I want to count the number of times word 1 appears in the entire list. The answer = 3 in this case. And, I want to calculate an average for the second words for all matching 1st words - like for this example: average(.085,-.623,.543)
Re: fastest way to do a count
Posted: Sat Dec 10, 2011 12:04 am
by bn
Hi Greg,
Try the stack I have attached.
It still uses arrays and computes the arithmetic mean (the sum of word 2 divided by the number of occurrences). In my testing it worked. It does the calculation for every word 1, even if it occurs only once. You could exclude those in the code.
Tell me how it goes and how fast it is.
Kind regards
Bernd
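The attached stack is not shown here. As a rough sketch of the idea described (one array for the counts, one for the sums, both keyed by word 1), assuming the data is in field 1 and the result goes to field 2; the variable names are made up for illustration and this is not necessarily how the stack does it:

```livecode
on mouseUp
   put field 1 into tData
   if tData is empty then exit mouseUp
   repeat for each line aLine in tData
      if (the number of words of aLine) < 2 then next repeat -- skip blank or malformed lines
      put word 1 of aLine into tKey -- the whole comma-delimited group is the array key
      add 1 to tCount[tKey]
      add word 2 of aLine to tSum[tKey]
   end repeat
   put empty into tResult
   repeat for each key tKey in tCount
      -- arithmetic mean: sum of word 2 divided by the number of occurrences
      put tKey & tab & tCount[tKey] & tab & (tSum[tKey] / tCount[tKey]) & return after tResult
   end repeat
   delete the last char of tResult
   put tResult into field 2
end mouseUp
```

For the sample list above, 4,3,5,6,8,1,2,9 should come out with a count of 3 and an average of about 0.0017.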
Re: fastest way to do a count
Posted: Sat Dec 10, 2011 1:00 am
by adventuresofgreg
Hi Bernd: Yes, that looks really good. Thanks a ton! I'll run a test on my 100,000 line file and time it. It should be much faster than my script. One problem: before counting the occurrences of word 1 for a given line, we need to delete that line from the list so that it doesn't count itself. I could just subtract 1 from the count, but that line's word 2 must not be included in the average either. I'm not sure how to do that aside from deleting the line from the list before running the count script.
Re: fastest way to do a count
Posted: Sat Dec 10, 2011 1:15 am
by bn
Hi Greg,
in the example you gave you do count the first occurrence:
word 1 word 2
4,3,5,6,8,1,2,9 .085
3,1,4,6,1,1,8,9 .07
4,3,5,6,8,1,2,9 -.623
4,3,5,6,8,1,2,9 .543
3,3,9,4,8,1,2,1 -.023
So, for the first line, I want to count the number of times word 1 appears in the entire list. The answer = 3 in this case. And, I want to calculate an average for the second words for all matching 1st words - like for this example: average(.085,-.623,.543)
I am a little confused. Do you want to exclude word 2 of every first occurrence of word 1? In your example you used all 3 word 2 values for the average, so you actually took the first occurrence of 4,3,5,6,8,1,2,9 into account.
Kind regards
Bernd
Re: fastest way to do a count
Posted: Sat Dec 10, 2011 1:16 am
by adventuresofgreg
Correct. Sorry - In my example, I forgot to delete it from the list before counting and averaging.
Re: fastest way to do a count
Posted: Sat Dec 10, 2011 1:52 am
by bn
Hi Greg,
I gave it a try.
Now the count will be 0 if a word 1 occurred only once, 1 if it occurred twice, etc.
The averages are based on word 2 of the second through nth occurrences, divided by the number of occurrences minus 1.
If a word 1 shows up only once, the average will be its word 2 (you could change that).
Please test extensively before using in "production". It has gotten a bit more complicated.
Edit: I cleaned up the attachment and tested it, and it seems to work all right.
Kind regards
Bernd
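The rules listed above could be sketched as follows. This is only an illustration under the same assumptions as before (data in field 1, result in field 2) and not the code from the attachment; the array names tSeen, tFirst, tCount and tSum are made up.

```livecode
on mouseUp
   put field 1 into tData
   if tData is empty then exit mouseUp
   repeat for each line aLine in tData
      if (the number of words of aLine) < 2 then next repeat -- skip blank or malformed lines
      put word 1 of aLine into tKey
      if tSeen[tKey] is not true then
         -- first occurrence: remember its word 2, but keep it out of count and sum
         put true into tSeen[tKey]
         put word 2 of aLine into tFirst[tKey]
         put 0 into tCount[tKey]
         put 0 into tSum[tKey]
      else
         add 1 to tCount[tKey]
         add word 2 of aLine to tSum[tKey]
      end if
   end repeat
   put empty into tResult
   repeat for each key tKey in tCount
      if tCount[tKey] is 0 then
         put tFirst[tKey] into tAverage -- seen only once: fall back to its own word 2
      else
         put tSum[tKey] / tCount[tKey] into tAverage
      end if
      put tKey & tab & tCount[tKey] & tab & tAverage & return after tResult
   end repeat
   delete the last char of tResult
   put tResult into field 2
end mouseUp
```

With the sample list from earlier in the thread, 4,3,5,6,8,1,2,9 should get a count of 2 and an average of about -0.04.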
Re: fastest way to do a count
Posted: Sat Dec 10, 2011 5:02 pm
by adventuresofgreg
Thanks Bernd. I'll take a look. I think a slightly less complicated way would be to just include the word 1 and its value to be averaged, then subtract it out from the final sum before calculating the average. I'll play around with it. Thanks!
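One way to read this suggestion, as a hedged sketch: total everything in a first pass, then for each line take that line's own contribution back out before dividing. The field numbers and variable names are assumptions for illustration, and lines whose word 1 has no other match get an empty average here, which is just one possible choice.

```livecode
on mouseUp
   put field 1 into tData
   if tData is empty then exit mouseUp
   -- pass 1: count and sum over all lines, keyed by word 1
   repeat for each line aLine in tData
      put word 1 of aLine into tKey
      add 1 to tCount[tKey]
      add word 2 of aLine to tSum[tKey]
   end repeat
   -- pass 2: for each line, leave that line itself out of count and average
   put empty into tResult
   repeat for each line aLine in tData
      put word 1 of aLine into tKey
      put tCount[tKey] - 1 into tOthers
      if tOthers > 0 then
         put (tSum[tKey] - word 2 of aLine) / tOthers into tAverage
      else
         put empty into tAverage -- no other line shares this word 1
      end if
      put aLine & tab & tOthers & tab & tAverage & return after tResult
   end repeat
   delete the last char of tResult
   put tResult into field 2
end mouseUp
```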
Re: fastest way to do a count
Posted: Sat Dec 10, 2011 5:04 pm
by bn
Hi Greg,
I just edited my post and uploaded a cleaned up version of the stack.
You may want to have a look.
Kind regards
Bernd
Re: fastest way to do a count
Posted: Sat Dec 10, 2011 5:13 pm
by bn
Hi Greg,
apparently we were online at the same time. I just wanted to point you to the cleaned-up version, which I recommend (countOccOfNumbersAndAveragesIIII.livecode.zip).
Since you did not really describe your use case, I had to guess at what you wanted. I think you can easily change the code to suit your needs. If not, just describe exactly what you want to achieve and what you want changed, and I will see what I can do.
Kind regards
Bernd
Re: fastest way to do a count
Posted: Sat Dec 10, 2011 5:25 pm
by adventuresofgreg
Hi Bernd: I incorporated your new version and ran it - BLINDINGLY fast! I compared the results to my script and they match. Nice work. Thanks again.
Re: fastest way to do a count
Posted: Sat Dec 10, 2011 5:39 pm
by bn
Hi Greg,
glad the results are the same.
Would you care to estimate or measure how long your version of the script takes and how long the new version takes on your data?
I know that my version takes around a second for 100,000 lines.
Kind regards
Bernd
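A simple way to take such a measurement is to read the milliseconds before and after the loop. The sketch below times only the counting loop and assumes the data is in field 1 and that field 2 is free for the report; it is an illustration, not the code from the attached stack.

```livecode
on mouseUp
   put field 1 into tData
   put the milliseconds into tStart
   repeat for each line aLine in tData
      add 1 to tCount[word 1 of aLine]
   end repeat
   put the milliseconds - tStart into tElapsed
   put (the number of lines of tData) && "lines counted in" && tElapsed && "ms" into field 2
end mouseUp
```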