Got a LiveCode personal license? Are you a beginner, hobbyist or educator that's new to LiveCode? This forum is the place to go for help getting started. Welcome!
put empty into zz
repeat forever --delete duplicates
add 1 to zz
if zz=the number of lines of Big then exit repeat
repeat forever
if line zz of Big=line zz+1 of Big then delete line zz of Big
if line zz of Big≠line zz+1 of Big then exit repeat
end repeat
if zz=the number of lines of Big then exit repeat
end repeat
The second code is about a magnitude faster than the first.
Did you try the array method? Or Richmond's split/combine?
Your second code still has to count lines, which is a much greater overhead than the other suggested methods. "A magnitude faster" means reducing the time taken from hours to ... minutes?
I think you might be surprised by the other methods, maybe measuring in milliseconds. Or maybe I'll be surprised - either way, your feedback from a comparison would be very interesting.
The array thing is over my head and this project isn't important enough to learn it (arrays) at the present.
The second code is 10 times faster than the first--yes, hours to minutes.
I'm assuming the counting thing is always from the beginning and most of the duplicates are at the beginning, so the second code starts at the beginning instead of the end. Generating the list for 50,000 and then sorting it, puts 50,000 1's at the beginning and then 50,000 2's and so on, the number of duplicates decreasing until the end, the largest number for which there is no duplicate and by-the-way it's 121,012,864. The list for 50k is just over 5 million digits long.
What this is, is taking any positive integer and if it's odd multiply by 3 and add 1. If it's even divide by 2. And keep doing that to the result. So you get a sequence and the sequence for each integer is different. Collatz's conjecture ( I don't know how to make the link thing work, but it's at wikipedia) is that all sequences eventually converge to 1. But it's never been proven.
I'm creating a super sequence, a sequence of sequences, putting all the sequences up to 50K together, removing all the duplicates and seeing what remains. Each sequence is vaguely like a random walk, except it's not random.
I've made some changes to Richmond's stack so that it can compare the three methods for removing duplicates suggested here:
1) Array Keys -- fastest
2) Split / Combine -- next fastest
3) repeat for each -- just a bit slower than split/combine
I didn't see Dick's function, which left out the "combine" step, until after I posted Richmond's updated test stack. However, I went back and compared these two functions and the one with "combine" ran significantly faster. (sorting had no effect) I assume it is the "as set" form but am not familiar enough with split to understand why that might be.
function dedupeViaSplit pLines
split pLines by cr as set
-- return the keys of pLines
put (the keys of pLines) into tDeDupedLines
sort tDeDupedLines numeric
return tDeDupedLines
end dedupeViaSplit
function DOUBLETROUBLE WXYZ
split WXYZ by return and space
combine WXYZ by return and null
sort WXYZ numeric
return WXYZ
end DOUBLETROUBLE
Elementary Software
...now with 20% less chalk dust!