fastest way to do a count
- Posts: 349
- Joined: Tue Oct 28, 2008 1:23 am
fastest way to do a count
Hi. I have a very large list of numbers, and for each number I would like to count the number of times it repeats in the list.
I am currently using lineOffset(thenumber,thelist), then deleting that line from the list and repeating until lineOffset returns 0.
But this can be quite slow over tens of thousands of lines.
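For comparison, the approach described above might look roughly like the following sketch (the function name `countByLineOffset` is hypothetical, not Greg's actual script). Each `lineOffset` call scans the list from the top again and each `delete` shifts the remaining text, so doing this for every distinct number grows roughly quadratically with the list length. Setting `the wholeMatches` to true makes `lineOffset` match whole lines only; otherwise searching for 3 would also find 13.

```livecode
-- Hypothetical helper: count how often pNumber occurs as a whole line in pList
-- by repeatedly finding the line and deleting it.
function countByLineOffset pNumber, pList
   local tCount, tLineNum
   set the wholeMatches to true -- match whole lines only, so "3" does not hit "13"
   put 0 into tCount
   put lineOffset(pNumber, pList) into tLineNum
   repeat while tLineNum > 0
      add 1 to tCount
      -- pList is a copy (parameters are passed by value), so the caller's list is untouched
      delete line tLineNum of pList
      put lineOffset(pNumber, pList) into tLineNum
   end repeat
   return tCount
end countByLineOffset
```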
Re: fastest way to do a count
Hi Greg,
Try this on a field that contains your numbers, assuming each line holds one number.
Make a second field for the result. It will display each number, a tab, and the number of occurrences of that number.
This only works if each line consists of a single number. It could be changed if the data format is different, but you would have to say so.
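```livecode
on mouseUp
   put field 1 into tData
   if tData is "" then exit mouseUp
   -- use each line (one number) as an array key and count its occurrences
   repeat for each line aLine in tData
      add 1 to tArray[aLine]
   end repeat
   -- turn the array into lines of "number<tab>count"
   combine tArray by return and tab
   set the itemDelimiter to tab
   -- sort by the count; numeric so that e.g. 9 sorts before 10
   sort tArray numeric by item 2 of each
   put tArray into field 2
end mouseUp
```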
I attach a tiny stack that creates 20,000 lines of random numbers in the range 1 to 30 and puts them into field 1; you can then count the occurrences of each number in field 1.
Kind regards
Bernd
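The attached stack is not reproduced here. Assuming field 1 exists on the card, a minimal button script that builds comparable test data might look like this:

```livecode
-- Sketch: fill field 1 with 20,000 random integers from 1 to 30, one per line
on mouseUp
   local tData
   repeat 20000 times
      put random(30) & return after tData -- random(30) returns an integer from 1 to 30
   end repeat
   delete the last char of tData -- drop the trailing return
   put tData into field 1
end mouseUp
```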
Re: fastest way to do a count
Thanks, Bernd. That looks like it would work, but my case is a bit more complicated than I described:
Each number is actually a list of comma-delimited numbers, and each line has a second word, a number that needs to be averaged.
For example, the list would look like:
4,3,5,6,8,1,2,9 .085
3,1,4,6,1,1,8,9 .07
5,7,3,6,4,5,2,9 -.623
2,9,5,6,5,6,7,4 .543
3,3,9,4,8,1,2,1 -.023
For each line, I need to count the occurrences of the first word and then, for all lines with a matching first word, calculate the average of the second words.
Re: fastest way to do a count
Hi Greg,
I am not sure I understand what you mean.
Could you give an example not only of the data structure but also of the averaging? By first word, do you mean the first item, i.e. the first number?
Kind regards
Bernd
Re: fastest way to do a count
Hi Bernd:
Here is a sample list. Each line consists of two words: word 1 is a comma-delimited group of numbers, and word 2 is a number.
word 1 word 2
4,3,5,6,8,1,2,9 .085
3,1,4,6,1,1,8,9 .07
4,3,5,6,8,1,2,9 -.623
4,3,5,6,8,1,2,9 .543
3,3,9,4,8,1,2,1 -.023
So, for the first line, I want to count the number of times word 1 appears in the entire list; the answer is 3 in this case. I also want to calculate the average of the second words for all lines with a matching word 1, which for this example is average(.085,-.623,.543).
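Worked through for the sample, the key `4,3,5,6,8,1,2,9` appears on lines 1, 3 and 4, so its count n and average are:

```latex
n = 3, \qquad \bar{x} = \frac{0.085 + (-0.623) + 0.543}{3} = \frac{0.005}{3} \approx 0.00167
```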
Re: fastest way to do a count
Hi Greg,
Try the attached stack.
It still uses arrays and computes the arithmetic mean (the sum of word 2 divided by the number of occurrences). In my testing it worked. It does the calculation for every word 1, even one that occurs only once; you could exclude that in the code.
Tell me how it goes and how fast it is.
Kind regards
Bernd
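The attached stack is not shown in the thread. A minimal sketch of the idea described above (one array for the counts and one for the sums, both keyed by word 1) might look like this. It assumes each line of field 1 is `word1 word2` separated by a space, and it writes key, count and average, tab-separated, into field 2.

```livecode
-- Sketch (not the attached stack): count each word 1 and average its word 2 values.
-- Assumes lines like "4,3,5,6,8,1,2,9 .085" in field 1.
on mouseUp
   local tData, tLine, tKey, tCount, tSum, tResult
   put field 1 into tData
   if tData is empty then exit mouseUp
   repeat for each line tLine in tData
      if tLine is empty then next repeat
      put word 1 of tLine into tKey
      add 1 to tCount[tKey] -- occurrences of this word 1
      add word 2 of tLine to tSum[tKey] -- running total of its word 2 values
   end repeat
   -- one output line per distinct word 1: key, count, average
   repeat for each key tKey in tCount
      put tKey & tab & tCount[tKey] & tab & (tSum[tKey] / tCount[tKey]) & return after tResult
   end repeat
   delete the last char of tResult -- drop the trailing return
   put tResult into field 2
end mouseUp
```

Each line is visited once and array lookups replace the repeated `lineOffset` scans, which is where the speedup over the original approach comes from.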
Re: fastest way to do a count
Hi Bernd: Yes, that looks really good. Thanks a ton! I'll run a test on my 100,000-line file and time it. It should be much faster than my script. One problem: before counting the occurrences of word 1, we need to delete the current line from the list so that it doesn't count itself. I could just subtract 1 from the count, but this line's word 2 must not be included in the average either, and I'm not sure how to do that other than deleting the line from the list before running the count script.
Re: fastest way to do a count
Hi Gregg,
In the example you gave, you do count the first occurrence: you actually took into account the first occurrence of 4,3,5,6,8,1,2,9.
I am a little confused. Do you want to exclude word 2 of every first occurrence of word 1? In your example you did use all three word 2 values for the average:
word 1 word 2
4,3,5,6,8,1,2,9 .085
3,1,4,6,1,1,8,9 .07
4,3,5,6,8,1,2,9 -.623
4,3,5,6,8,1,2,9 .543
3,3,9,4,8,1,2,1 -.023
So, for the first line, I want to count the number of times word 1 appears in the entire list. The answer = 3 in this case. And, I want to calculate an average for the second words for all matching 1st words - like for this example: average(.085,-.623,.543)
```livecode
average(.085,-.623,.543)
```
Kind regards
Bernd
Re: fastest way to do a count
Correct. Sorry, in my example I forgot to delete it from the list before counting and averaging.
Re: fastest way to do a count
Hi Gregg,
I gave it a try.
Now the count will be 0 if a word 1 occurs only once, 1 if it occurs twice, and so on.
The average is based on the word 2 values of the second through nth occurrences, divided by (number of occurrences - 1).
If a word 1 occurs only once, the average will be its own word 2 (you could change that).
Please test extensively before using this in production; it has gotten a bit more complicated.
Edit: I cleaned up the attachment and tested it, and it seems to work all right.
Kind regards
Bernd
Last edited by bn on Sat Dec 10, 2011 5:03 pm, edited 1 time in total.
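A sketch of the revised rules as described in the post above (not the attached stack): the first occurrence of each word 1 is remembered but not counted, so the count is occurrences minus 1 and the average covers occurrences 2 through n. Field numbers and the tab-separated output layout are assumptions.

```livecode
-- Sketch: leave out the first occurrence of each word 1.
-- count = occurrences - 1; average = mean of word 2 over occurrences 2..n;
-- a word 1 seen only once reports its own word 2 as the average.
on mouseUp
   local tData, tLine, tKey, tFirst, tCount, tSum, tAvg, tResult
   put field 1 into tData
   if tData is empty then exit mouseUp
   repeat for each line tLine in tData
      if tLine is empty then next repeat
      put word 1 of tLine into tKey
      if tCount[tKey] is empty then
         -- first occurrence: remember its word 2 but do not count it
         put 0 into tCount[tKey]
         put word 2 of tLine into tFirst[tKey]
      else
         add 1 to tCount[tKey]
         add word 2 of tLine to tSum[tKey]
      end if
   end repeat
   repeat for each key tKey in tCount
      if tCount[tKey] = 0 then
         put tFirst[tKey] into tAvg
      else
         put tSum[tKey] / tCount[tKey] into tAvg
      end if
      put tKey & tab & tCount[tKey] & tab & tAvg & return after tResult
   end repeat
   delete the last char of tResult -- drop the trailing return
   put tResult into field 2
end mouseUp
```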
Re: fastest way to do a count
Thanks, Bernd. I'll take a look. I think a slightly simpler way would be to include the line itself in the count and the sum, then subtract its word 2 from the final sum (and 1 from the count) before calculating the average. I'll play around with it. Thanks!
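A sketch of this subtract-it-out idea, assuming the same field layout as above: build the totals over all lines in one pass, then in a second pass let each line remove its own word 2 from the sum and itself from the count. The `n/a` placeholder for a word 1 with no other matching lines is an arbitrary choice.

```livecode
-- Sketch: per line, report how many OTHER lines share its word 1
-- and the average of their word 2 values.
on mouseUp
   local tData, tLine, tKey, tCount, tSum, tOthers, tResult
   put field 1 into tData
   if tData is empty then exit mouseUp
   -- pass 1: totals per word 1, including every line
   repeat for each line tLine in tData
      if tLine is empty then next repeat
      add 1 to tCount[word 1 of tLine]
      add word 2 of tLine to tSum[word 1 of tLine]
   end repeat
   -- pass 2: subtract the current line's own contribution
   repeat for each line tLine in tData
      if tLine is empty then next repeat
      put word 1 of tLine into tKey
      put tCount[tKey] - 1 into tOthers
      if tOthers = 0 then
         put tKey & tab & 0 & tab & "n/a" & return after tResult -- nothing left to average
      else
         put tKey & tab & tOthers & tab & ((tSum[tKey] - word 2 of tLine) / tOthers) & return after tResult
      end if
   end repeat
   delete the last char of tResult -- drop the trailing return
   put tResult into field 2
end mouseUp
```

Unlike deleting lines, this never modifies the data, so both passes stay linear in the number of lines.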
Re: fastest way to do a count
Hi Gregg,
I just edited my post and uploaded a cleaned-up version of the stack.
You may want to have a look.
Kind regards
Bernd
Re: fastest way to do a count
Hi Gregg,
Apparently we were online at the same time; I just wanted to point you to the cleaned-up version, which I recommend (countOccOfNumbersAndAveragesIIII.livecode.zip).
Since you did not really describe your use case, I had to guess at what you wanted. I think you can easily change the code to suit your needs. If not, just describe exactly what you want to achieve and what you want changed, and I'll see what I can do.
Kind regards
Bernd
Re: fastest way to do a count
Hi Bernd: I incorporated your new version and ran it. BLINDINGLY fast! I compared the results with those of my script and they match. Nice work. Thanks again.
Re: fastest way to do a count
Hi Gregg,
Glad the results are the same.
Would you care to estimate or measure how long your version of the script takes and how long the new version takes on your data?
I know that my version takes around a second for 100,000 lines.
Kind regards
Bernd
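One way to take that measurement is with `the milliseconds`. In this sketch, `countLines` is a hypothetical function that repeats the counting loop from Bernd's first script; substitute whichever handler you want to time.

```livecode
-- Sketch: time a piece of work with the milliseconds.
on mouseUp
   local tStart, tElapsed, tResult
   put the milliseconds into tStart
   put countLines(field 1) into tResult
   put the milliseconds - tStart into tElapsed
   put tResult into field 2
   answer "Counted" && the number of lines of field 1 && "lines in" && tElapsed && "ms"
end mouseUp

-- hypothetical work being timed: count occurrences of each line
function countLines pData
   local tArray, tLine
   repeat for each line tLine in pData
      add 1 to tArray[tLine]
   end repeat
   combine tArray by return and tab -- lines of "value<tab>count"
   return tArray
end countLines
```

Running the same timing around the old lineOffset script on the same data gives a like-for-like comparison.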