Repeat and count ... but fastest! (='.'=)

Posted: Thu May 04, 2017 6:43 pm
by Mariasole
Hello to all!!!
I'm doing a little experiment. :shock:

I have a list of words and phrases:

hello
hello
hello
ciao
miao
miao
miao
miao
uurgh!
uurg!
song
song and sing
song and sing




I would like to count the duplicates in this list and put them, ordered by count, in another list.


4|miao
3|hello
2|song and sing
1|ciao
1|uurgh!


Simple, I said! But in fact, as the number of elements in the list to be analyzed increases, the job becomes very slow. :o

Is there an alternative to my algorithm? :roll:

Thanks to anyone who wants to give me some help!

Here is the code I am so proud of:

Code:

on mouseUp
   
   // a. initialize...
   put empty into LineMatchCount
   put empty into field "RepeatList"
   
   // b, load field into variable in memory for fastest ;)
   put field "OriginalList" into tOriginalList
   
   // c. Repeat for each line
   repeat for each line tLine in tOriginalList
      
      -- copy tLine into tMatch
      put tLine into tMatch
      
      -- repeat in list for count duplicate line
      repeat for each line yLine in tOriginalList
         
         -- if match then add 1 to LineMatchCount
         if yLine is tMatch then
            add 1 to LineMatchCount
         end if
       
      end repeat
      
     -- put the number of repeat and the key line & return
      put LineMatchCount & "|" & tLine after field "RepeatList"
      put return  after field "RepeatList" 
      
      -- initialize variable
      put empty into LineMatchCount
      
   end repeat
   
   // d. Dedupe RepeatList
   -- still to be done!
   
   // e. Sort RepeatList
   -- still to be done!
   
end mouseUp
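
Besides the nested loop itself, one likely contributor to the slowdown is that the handler above appends to field "RepeatList" twice for every line, and field access is generally much slower than working in a variable. The following is only a minimal sketch, assuming the same field names as above; tResult and tCount are names introduced here for illustration. It keeps the nested loops but builds the output in a variable and writes the field once. It still compares every line with every other line, so on its own it would not be expected to make large lists fast.

```livecode
on mouseUp
   put field "OriginalList" into tOriginalList
   put empty into tResult -- hypothetical name: output collected in memory
   repeat for each line tLine in tOriginalList
      put 0 into tCount -- hypothetical name: matches for this line
      repeat for each line yLine in tOriginalList
         if yLine is tLine then add 1 to tCount
      end repeat
      put tCount & "|" & tLine & return after tResult
   end repeat
   delete the last char of tResult -- drop the trailing return
   put tResult into field "RepeatList" -- single write to the field
end mouseUp
```

The output is still one line per input line (duplicates included), exactly as in the original handler; steps d and e remain open.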





Peace and love to all!

(='.'=)
Mariasole

Re: Repeat and count ... but fastest! (='.'=)

Posted: Thu May 04, 2017 7:52 pm
by SparkOut
Cara Maria

I am not certain whether it is the fastest way but my preferred duplicate count/stripping method is like this:

Code:

on mouseUp
   put field "OriginalList" into tOriginalList
   repeat for each line tLine in tOriginalList
      add 1 to tMatchChecker[(tLine)]
   end repeat
   combine tMatchChecker using return and comma
   sort tMatchChecker numeric descending by item 2 of each
   put tMatchChecker into field "RepeatList"
end mouseUp

It would need a small tweak to make the output match your format (it produces lines like miao,4), but it should be fairly efficient.
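
One caveat with the handler above: it sorts by item 2 of each line with the default item delimiter (comma), so a list line that itself contains a comma would be sorted on the wrong item. The following is a sketch of one way around that, assuming that tab never occurs in the list lines (an assumption, not something stated in this thread). It uses the same combine command, so any version-specific combine problems would apply here too.

```livecode
on mouseUp
   put field "OriginalList" into tOriginalList
   repeat for each line tLine in tOriginalList
      add 1 to tMatchChecker[tLine]
   end repeat
   -- assumption: tab does not appear inside any line of the list
   combine tMatchChecker using return and tab
   set the itemDelimiter to tab
   -- item 2 is now always the count, even if a line contains commas
   sort lines of tMatchChecker numeric descending by item 2 of each
   put tMatchChecker into field "RepeatList"
end mouseUp
```

The output then has the form line, tab, count, so it still needs the same reordering tweak to match the count|line format.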

Re: Repeat and count ... but fastest! (='.'=)

Posted: Thu May 04, 2017 7:59 pm
by FourthWorld
This doesn't produce the exact same result, but should be much faster:

Code:

on mouseUp
   put fld "OriginalList" into tList
   repeat for each line tLine in tList
      add 1 to tCountsA[tLine]
   end repeat
   combine tCountsA with cr and "|"
   put tCountsA into fld "RepeatList"
end mouseUp

Perhaps the output format would be useful for what you're doing? It may even be more useful, since it removes the duplicates. If nothing else, I hope it is of some value as an introduction to arrays, which manage this sort of thing more efficiently than delimited strings.
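
To illustrate the point about arrays, here is a small sketch that moves the counting loop into a function and then reads one count straight out of the array by key. The function name countLines is hypothetical, introduced only for this example; the field name is the one used in this thread.

```livecode
-- hypothetical helper: returns an array keyed by line text, valued by count
function countLines pList
   local tCountsA
   repeat for each line tLine in pList
      add 1 to tCountsA[tLine]
   end repeat
   return tCountsA
end countLines

on mouseUp
   put countLines(field "OriginalList") into tCountsA
   -- each distinct line is a key, so the duplicates are already collapsed
   put the number of lines of the keys of tCountsA into tDistinct
   -- a single count is a direct lookup by key, with no second pass over the list
   answer tDistinct && "distinct lines; miao appears" && tCountsA["miao"] && "time(s)"
end mouseUp
```

The list is walked once, and each line costs one array update, instead of one full pass over the list per line.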

Re: Repeat and count ... but fastest! (='.'=)

Posted: Sat May 06, 2017 1:32 am
by jiml
Mariasole,

If you want the form of the output to match your original post:
4|miao
3|hello
2|song and sing
1|ciao
1|uurgh!
then you can tweak SparkOut and Richard's code to this:

Code:

on mouseUp
   put field "OriginalList" into tOriginalList
   repeat for each line tLine in tOriginalList
      add 1 to tMatchChecker[(tLine)]
   end repeat
   combine tMatchChecker by cr and tab
   split tMatchChecker by column
   put tMatchChecker[2] into tMatchCheckerOut[1]
   put tMatchChecker[1] into tMatchCheckerOut[2]
   set the columnDelimiter to "|"
   combine tMatchCheckerOut using column
   sort tMatchCheckerOut numeric descending by word 1 of each
   put tMatchCheckerOut into field "RepeatList"
end mouseUp

So, with your original input:
hello
hello
hello
ciao
miao
miao
miao
miao
uurgh!
uurg!
song
song and sing
song and sing
That script will produce this output:
4|miao
3|hello
2|song and sing
1|ciao
1|song
1|uurg!
1|uurgh!
NOTE: there is a "combine" bug, introduced in LC 9.0 DP-2, which won't be fixed until LC 9.0 DP-7.
So, for now, that code works in LC versions earlier than 9.0 DP-2.

BUG http://quality.livecode.com/show_bug.cgi?id=19411

Jim Lambert

Re: Repeat and count ... but fastest! (='.'=)

Posted: Sat May 06, 2017 1:35 am
by jiml
Actually, this shorter line is all that is necessary for the sort:

Code:

 sort tMatchCheckerOut numeric descending
JimL

Re: Repeat and count ... but fastest! (='.'=)

Posted: Sat May 06, 2017 1:39 am
by FourthWorld
Column split - nice work, Jim!

Re: Repeat and count ... but fastest! (='.'=)

Posted: Mon May 08, 2017 1:44 pm
by Mariasole
Thank you SparkOut, Richard and Jim!
Thank you very much, everyone!
Now I will test the different solutions and above all I will study them! :D
And of course, if I find something interesting I'll let you know! 8)

Mariasole
Peace and love!

(='.'=)
Mariasole

Re: Repeat and count ... but fastest! (='.'=)

Posted: Mon May 08, 2017 11:20 pm
by [-hh]
Nothing new, only a simpler 'reversed combine':

The only problem here is that combine produces the wrong order: it places the count in the second item. It is also buggy, so you simply have to write your own.

Code:

on mouseUp
   put field "OriginalList" into tOriginalList
   repeat for each line tLine in tOriginalList
      add 1 to tMatchChecker[tLine] -- contains the count
   end repeat
   set the itemdelimiter to "|"
   ## start: 'own combine' with the order of items reversed
   repeat for each key tLine in tMatchChecker
      put cr & tMatchChecker[tLine] & "|" & tLine after tRepeatList
   end repeat
   delete char 1 of tRepeatList -- the first cr
   ## end: 'own combine'
   sort tRepeatList by item 2 of each -- secondary sort, don't forget this
   sort tRepeatList numeric descending by item 1 of each -- primary sort
   put tRepeatList into field "RepeatList"
end mouseUp
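
Since the thread is about speed, here is a sketch for timing the approach above on your own data. It wraps the same steps in a function and measures them with the milliseconds. The function name countedList and the variable names are hypothetical, chosen for this example. The two-pass sort assumes that sort keeps lines with equal keys in their existing order, which is what the handler above appears to rely on.

```livecode
-- hypothetical helper: returns "count|line" lines, highest count first
function countedList pList
   local tCountsA, tOut
   repeat for each line tLine in pList
      add 1 to tCountsA[tLine]
   end repeat
   repeat for each key tKey in tCountsA
      put tCountsA[tKey] & "|" & tKey & cr after tOut
   end repeat
   delete the last char of tOut -- drop the trailing cr
   set the itemDelimiter to "|"
   sort lines of tOut by item 2 of each -- secondary sort: by text
   sort lines of tOut numeric descending by item 1 of each -- primary sort: by count
   return tOut
end countedList

on mouseUp
   put field "OriginalList" into tOriginalList
   put the milliseconds into tStart
   put countedList(tOriginalList) into tRepeatList
   put the milliseconds - tStart into tElapsed
   put tRepeatList into field "RepeatList"
   -- report the timing in the message box
   put the number of lines of tOriginalList && "lines counted in" && tElapsed && "ms"
end mouseUp
```

Only the counting and sorting are timed; the field is read before the timer starts and written after it stops, so the figure is not dominated by field access.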