Need for Speed

zink137 · Post by **zink137** » Sun Sep 11, 2022 9:20 pm

Repeat with zz=the number of lines of Big down to 1 --delete duplicates
if line zz of Big=line zz-1 of Big then delete line zz-1 of Big
end Repeat

Code: Select all

put empty into zz
   repeat forever    --delete duplicates
      add 1 to zz
      if zz=the number of lines of Big then exit repeat
      repeat forever
         if line zz of Big=line zz+1 of Big then delete line zz of Big
          if line zz of Big≠line zz+1 of Big then exit repeat
      end repeat
      if zz=the number of lines of Big then exit repeat
   end repeat

The second code is about a magnitude faster than the first.

SparkOut · Post by **SparkOut** » Sun Sep 11, 2022 9:58 pm

Did you try the array method? Or Richmond's split/combine?
Your second code still has to count lines, which is a much greater overhead than the other suggested methods. "A magnitude faster" means reducing the time taken from hours to ... minutes?
I think you might be surprised by the other methods, maybe measuring in milliseconds. Or maybe I'll be surprised - either way, your feedback from a comparison would be very interesting.

zink137 · Post by **zink137** » Sun Sep 11, 2022 10:59 pm

SparkOut,

The array thing is over my head and this project isn't important enough to learn it (arrays) at the present.

The second code is 10 times faster than the first--yes, hours to minutes.

I'm assuming the counting thing is always from the beginning and most of the duplicates are at the beginning, so the second code starts at the beginning instead of the end. Generating the list for 50,000 and then sorting it, puts 50,000 1's at the beginning and then 50,000 2's and so on, the number of duplicates decreasing until the end, the largest number for which there is no duplicate and by-the-way it's 121,012,864. The list for 50k is just over 5 million digits long.

What this is, is taking any positive integer and if it's odd multiply by 3 and add 1. If it's even divide by 2. And keep doing that to the result. So you get a sequence and the sequence for each integer is different. Collatz's conjecture ( I don't know how to make the link thing work, but it's at wikipedia) is that all sequences eventually converge to 1. But it's never been proven.

I'm creating a super sequence, a sequence of sequences, putting all the sequences up to 50K together, removing all the duplicates and seeing what remains. Each sequence is vaguely like a random walk, except it's not random.

Thanks,

Nelson

richmond62 · Post by **richmond62** » Sun Sep 11, 2022 11:04 pm

I posted the Wikipedia link a while ago.

zink137 · Post by **zink137** » Sun Sep 11, 2022 11:20 pm

richmond62,

How did you do it (making a link)?

Nelson

rkriesel · Post by **rkriesel** » Mon Sep 12, 2022 4:29 am

FourthWorld wrote: Sun Sep 11, 2022 1:06 am Good call on using both split and combine...

agreed, in general, but here split can be adequate without combine:

Code: Select all

function dedupeViaSplit pLines -- not sustaining the sequence
   split pLines by cr as set
   return the keys of pLines
end dedupeViaSplit

-- Dick

scott_morrow · Post by **scott_morrow** » Mon Sep 12, 2022 6:49 am

I've made some changes to Richmond's stack so that it can compare the three methods for removing duplicates suggested here:
1) Array Keys -- fastest
2) Split / Combine -- next fastest
3) repeat for each -- just a bit slower than split/combine

scott_morrow · Post by **scott_morrow** » Mon Sep 12, 2022 7:26 am

I didn't see Dick's function, which left out the "combine" step, until after I posted Richmond's updated test stack. However, I went back and compared these two functions and the one with "combine" ran significantly faster. (sorting had no effect) I assume it is the "as set" form but am not familiar enough with split to understand why that might be.

Code: Select all

function dedupeViaSplit pLines
   split pLines by cr as set
   -- return the keys of pLines
   put (the keys of pLines) into tDeDupedLines
   sort tDeDupedLines numeric
   return tDeDupedLines
end dedupeViaSplit


function DOUBLETROUBLE WXYZ
   split WXYZ by return and space
   combine WXYZ by return and null
   sort WXYZ numeric
   return WXYZ
end DOUBLETROUBLE

LiveCode Forums.

Need for Speed

Re: Need for Speed

Re: Need for Speed

Re: Need for Speed

Re: Need for Speed

Re: Need for Speed

Re: Need for Speed

Re: Need for Speed

Re: Need for Speed