Need for Speed

Got a LiveCode personal license? Are you a beginner, hobbyist or educator that's new to LiveCode? This forum is the place to go for help getting started. Welcome!

Moderators: FourthWorld, heatherlaine, Klaus, kevinmiller

zink137
Posts: 6
Joined: Sat Sep 10, 2022 10:19 pm

Need for Speed

Post by zink137 » Sat Sep 10, 2022 10:41 pm

I’m generating large sets of integers (millions) which takes just a few seconds and then sorting them numerically (another couple seconds) and then deleting all the multiples which takes hours.

Repeat with zz=the number of lines of BigTemp down to 1 --delete duplicates
if line zz of BigTemp=line zz-1 of BigTemp then delete line zz-1 of BigTemp
end Repeat

How can the deletion process be speeded up? “Lines” deletes about twice as fast as “words”

richmond62
Livecode Opensource Backer
Livecode Opensource Backer
Posts: 10078
Joined: Fri Feb 19, 2010 10:17 am

Re: Need for Speed

Post by richmond62 » Sat Sep 10, 2022 10:46 pm

Can I ask a few questions first?

1. How are you generating this large list numbers?

2. What is the advantage of generating a list that can contain duplicates so you then have to 'weed the garden'?

zink137
Posts: 6
Joined: Sat Sep 10, 2022 10:19 pm

Re: Need for Speed

Post by zink137 » Sat Sep 10, 2022 11:00 pm

on mouseUp
put the milliseconds into timeOn
put empty into fld 1

repeat with xx=2 to 50000
put xx into zz
repeat until zz=1

if zz mod 2 =1 then
put ((zz*3)+1) into zz
put zz &return after BigTemp
else
put zz/2 into zz
put zz &return after BigTemp
end if

end repeat
end repeat

put BigTemp into fld 1
put the number of lines of BigTemp&return&return before fld 1

put (the milliseconds-timeOn)/1000&" seconds"

end mouseUp

This is part of an investigation into the Collatz numbers.

richmond62
Livecode Opensource Backer
Livecode Opensource Backer
Posts: 10078
Joined: Fri Feb 19, 2010 10:17 am

Re: Need for Speed

Post by richmond62 » Sat Sep 10, 2022 11:13 pm

Tangentially . . .

Why is SORT not very good?
-
SShot 2022-09-11 at 1.10.59.png
-
with this:

Code: Select all

on mouseUp
   sort lines of fld "TANK" descending
end mouseUp
results in this:
-
SShot 2022-09-11 at 1.11.17.png
-
AND even I, with my limited knowledge of Mathematics, know that 989 is bigger than 99.

richmond62
Livecode Opensource Backer
Livecode Opensource Backer
Posts: 10078
Joined: Fri Feb 19, 2010 10:17 am

Re: Need for Speed

Post by richmond62 » Sat Sep 10, 2022 11:15 pm

https://en.wikipedia.org/wiki/Collatz_conjecture

Well, well, well: one learns something every day. 8)

richmond62
Livecode Opensource Backer
Livecode Opensource Backer
Posts: 10078
Joined: Fri Feb 19, 2010 10:17 am

Re: Need for Speed

Post by richmond62 » Sat Sep 10, 2022 11:18 pm

OK:

Code: Select all

on mouseUp
   sort lines of fld "TANK" descending numeric
end mouseUp
numeric

richmond62
Livecode Opensource Backer
Livecode Opensource Backer
Posts: 10078
Joined: Fri Feb 19, 2010 10:17 am

Re: Need for Speed

Post by richmond62 » Sat Sep 10, 2022 11:22 pm

So, over here [MacOS 13, 8 GB RAM] your script took just over 35 seconds to generate your list of numbers.

zink137
Posts: 6
Joined: Sat Sep 10, 2022 10:19 pm

Re: Need for Speed

Post by zink137 » Sat Sep 10, 2022 11:39 pm

My iMac usually generates the list in under 10 seconds. A little over 5 million integers, it's kind of astonishing.

But that's all beside the point. How to eliminate duplicates in a similar timely fashion?

SparkOut
Posts: 2943
Joined: Sun Sep 23, 2007 4:58 pm

Re: Need for Speed

Post by SparkOut » Sun Sep 11, 2022 12:02 am

Firstly, the reason for the slow iteration through your loop is that in the "repeat with <index>" version the engine has to keep track of every one of those millions of lines or words and count through the list every time to reach the index.
Generally it is far better speedwise to use the "repeat for each <word/item/line>" version. You can't then directly manipulate the original list, so in the processing part you would normally make your checks and append correct or desired results to a new variable. In this case you might

Code: Select all

put empty into tPrevTemp
put empty into tNewBigTemp  --not necessary if these are really transient variables 
repeat for each line tTemp in BigTemp 
  if tTemp is not tPrevTemp then
  .  put tTemp & cr after tNewBigTemp
  .  put tTemp into tPrevTemp
  end if
end repeat
--use tNewBigTemp in place of BigTemp instead
Personally I find the best way of deduplicating a list is to add 1 to the contents of an array key where the keys are the values being deduplicated (works with numbers or text, whatever).
So when you have your zz variable, instead of appending it to BigTemp you would

Code: Select all

add 1 to BigTempArray[(zz)]
What that means is that the unique elements will have a value of 1, and the non unique ones will have a higher value but still only the one key. The values ate irrelevant, so you can just

Code: Select all

put the keys of BigTempArray into tBigTemp
sort tBigTemp numeric ascending 
Last edited by SparkOut on Sun Sep 11, 2022 12:08 am, edited 1 time in total.

FourthWorld
VIP Livecode Opensource Backer
VIP Livecode Opensource Backer
Posts: 10043
Joined: Sat Apr 08, 2006 7:05 am
Contact:

Re: Need for Speed

Post by FourthWorld » Sun Sep 11, 2022 12:07 am

It would be interesting to see a comparison with the array option that used to be included as an example of finding unique values, e.g.:

Code: Select all

function UniqueVals pList
  repeat for each line tLine in pList
     put tLine into tA[tLine]
  end repeat
  get the keys of tA
  sort lines of it numeric
  return it
end UniqueVals
Edit: Sparkout beat me to it. :) Would be good to know results just the same.
Richard Gaskin
LiveCode development, training, and consulting services: Fourth World Systems
LiveCode Group on Facebook
LiveCode Group on LinkedIn

richmond62
Livecode Opensource Backer
Livecode Opensource Backer
Posts: 10078
Joined: Fri Feb 19, 2010 10:17 am

Re: Need for Speed

Post by richmond62 » Sun Sep 11, 2022 12:17 am

Hmm: got a bit bogged down:

Code: Select all

on mouseup
   put empty into fld "fLINES2"
   put empty into fld "TANK2"
   put fld "TANK" into XYZ
   get DOUBLETROUBLE(XYZ)
   put it after fld "TANK2"
   put ((the number of lines in fld "TANK2") && "lines >>>") into fld "fLINES2"
end mouseup

function DOUBLETROUBLE WXYZ
   split WXYZ by return and comma
   combine WXYZ by return and comma
   sort WXYZ numeric
   return WXYZ
end DOUBLETROUBLE
-
SShot 2022-09-11 at 2.18.40.png

richmond62
Livecode Opensource Backer
Livecode Opensource Backer
Posts: 10078
Joined: Fri Feb 19, 2010 10:17 am

Re: Need for Speed

Post by richmond62 » Sun Sep 11, 2022 12:25 am

Removed link as improved version below.

BE WARNED: On a Mac, once it has generated the Collatz numbers the stack BLOATS to about 89 MB. :?
Last edited by richmond62 on Sun Sep 11, 2022 12:51 am, edited 1 time in total.

richmond62
Livecode Opensource Backer
Livecode Opensource Backer
Posts: 10078
Joined: Fri Feb 19, 2010 10:17 am

Re: Need for Speed

Post by richmond62 » Sun Sep 11, 2022 12:34 am

The commas are a curse.

This IS better (aesthetically at least, as it bungs a space at the end of each number rather than a comma):
-

Code: Select all

on mouseup
   set the vis of img "COOKING" to true
   put empty into fld "fLINES2"
   put empty into fld "TANK2"
   put fld "TANK" into XYZ
   get DOUBLETROUBLE(XYZ)
   put it after fld "TANK2"
   put ((the number of lines in fld "TANK2") && "lines >>>") into fld "fLINES2"
   set the vis of img "COOKING" to false
end mouseup

function DOUBLETROUBLE WXYZ
   split WXYZ by return and space
   combine WXYZ by return and space
   sort WXYZ numeric
   return WXYZ
end DOUBLETROUBLE
UNFORTUNATELY one cannot do this:

Code: Select all

function DOUBLETROUBLE WXYZ
   split WXYZ by return and space
   combine WXYZ by return
   sort WXYZ numeric
   return WXYZ
end DOUBLETROUBLE

richmond62
Livecode Opensource Backer
Livecode Opensource Backer
Posts: 10078
Joined: Fri Feb 19, 2010 10:17 am

Re: Need for Speed

Post by richmond62 » Sun Sep 11, 2022 12:42 am

Ah, got it at last . . . 8)

Code: Select all

function DOUBLETROUBLE WXYZ
   split WXYZ by return and space
   combine WXYZ by return and null
   sort WXYZ numeric
   return WXYZ
end DOUBLETROUBLE
-
SShot 2022-09-11 at 2.50.33.png
Attachments
Number Gen.livecode.zip
Stack.
(185.94 KiB) Downloaded 174 times

FourthWorld
VIP Livecode Opensource Backer
VIP Livecode Opensource Backer
Posts: 10043
Joined: Sat Apr 08, 2006 7:05 am
Contact:

Re: Need for Speed

Post by FourthWorld » Sun Sep 11, 2022 1:06 am

Good call on using both split and combine. I was thinking of the old example for getting frequency counts, but that's not needed here.

The old scripting adage abides:
- Know the engine
- Trust the engine
- Use the engine
Richard Gaskin
LiveCode development, training, and consulting services: Fourth World Systems
LiveCode Group on Facebook
LiveCode Group on LinkedIn

Post Reply