How to Speed Up This Code?

Simon · Post by **Simon** » Fri Jul 31, 2015 9:51 pm

What CPU are you running?

Intel Core i5-3317U @ 1.7GHz

Just laugh so I can justify an upgrade

Simon

sritcp · Post by **sritcp** » Fri Jul 31, 2015 10:20 pm

FourthWorld wrote:.... we need to double-check it for accuracy, as the results are very different from the output of the first two. In my tests the output from this one is a subset of the others, and most of the extras in the first two are duplicate fax numbers, but not all.

All three methods return 256 fax numbers, 240 after removing duplicates.
I haven't checked if the numbers are different (I can't quickly think of a short code), but given the similarity of algorithms, I doubt they are.

Regards,
Sri

sritcp · Post by **sritcp** » Fri Jul 31, 2015 10:35 pm

The results are identical for all three methods.

Code: Select all

on mouseUp
   local i, t1, t2, t3
   -- put field "Output" into t1 /* output of "repeat for each ..." with list method */
   put field "Output2" into t2  /* output of lineOffset method */
   put field "Output3" into t3  /* output of "repeat for each ..." with array method */
   put 1 into i
   repeat for each line thisLine in t3
      if line i of t2 is not thisLine then put return & i after msg
      add 1 to i
   end repeat
   put return & "Done" after msg
end mouseUp

Regards,
Sri

FourthWorld · Post by **FourthWorld** » Fri Jul 31, 2015 11:33 pm

I also get identical results when searching for the title "Special Education Director", but different results when searching for "Superintendent".

sritcp · Post by **sritcp** » Sat Aug 01, 2015 12:18 am

FourthWorld wrote:I also get identical results when searching for the title "Special Education Director", but different results when searching for "Superintendent".

Yes. That's because
a) There is also an Assistant Superintendent listed in the data
b) which are caught by the code construction of the first two methods:
if "Superintendent" is in tLine ... ("repeat for each ...." -- list version), and
lineOffset( searchFor, tData, L2skip) .... (lineOffset version)
c) but not in my "repeat for each ....." -- array version, which requires an exact match as in
if tElement is "Superintendent" .......

I think we are good.

Regards,
Sri

FourthWorld · Post by **FourthWorld** » Sat Aug 01, 2015 12:34 am

sritcp wrote:
FourthWorld wrote:I also get identical results when searching for the title "Special Education Director", but different results when searching for "Superintendent".
Yes. That's because
a) There is also an Assistant Superintendent listed in the data
b) which are caught by the code construction of the first two methods:
if "Superintendent" is in tLine ... ("repeat for each ...." -- list version), and
lineOffset( searchFor, tData, L2skip) .... (lineOffset version)
c) but not in my "repeat for each ....." -- array version, which requires an exact match as in
if tElement is "Superintendent" .......

I think we are good.

We are indeed. When I change "is" to "is in" in my copy of the array test all comes out equally.

Well done.

golive · Post by **golive** » Sat Aug 01, 2015 1:10 am

I would think that Livecode could optimize their code more to reduce the time penalty that repeat with has (when compared to repeat for).

FourthWorld · Post by **FourthWorld** » Sat Aug 01, 2015 4:12 am

Sri, how does this one work for you?:

Code: Select all

local sData

on mouseUp
   put "Special Education Director" into tTitle
   put fld "RawData" into sData
   -- Test 1 -- lineoffset:
   put the millisecs into t
   put rexSolution(tTitle) into r1
   put the millisecs - t into t1
   --  Test 2 -- repeat for each:
   put the millisecs into t
   put altSolution(tTitle) into r2
   put the millisecs - t into t2
   -- Test 3 -- split:
   put the millisecs into t
   put test3Solution(tTitle) into r3
   put the millisecs - t into t3
   -- Test 4 -- offset:
   put the millisecs into t
   put test4Solution(tTitle) into r4
   put the millisecs - t into t4
   --
   -- Show times, and whether results match:
   put "Test 1 - lineoffset:   "& t1 &" ms" & \
         cr& "Test 2 - repeat for each: "& t2 &" ms" & \
         cr& "Test 3 - split as array: "& t3 &" ms" & \
         cr& "Test 4 - offset chars: "& t4 &" ms "& \
         cr&"Results match: "& (r1 = r4)
   --
   put the number of lines of r4 &cr& r4 into fld "out1"
   put the number of lines of r2 &cr& r2 into fld "out2"
   put the number of lines of r3 &cr& r3 into fld "out3"
end mouseUp

-- Test 1 - lineoffset:
function rexSolution searchFor
   local L2skip = 0
   repeat while true
      put lineOffset( searchFor, sData, L2skip) into N
      if N = 0 then exit repeat
      put line (N + 3 + L2skip) of sData &cr  after tFound
      add (N+3) to L2skip
   end repeat
   return CleanResult(tFound)
end rexSolution

-- Test 2 - repeat for each
function altSolution searchFor
   put 1 into i
   repeat for each line tLine in sData
      if searchFor is in tLine then
         put line (i+3) of sData & cr after tFound
      end if
      add 1 to i
   end repeat
   return CleanResult(tFound)
end altSolution

-- Test 3 - split
function test3Solution searchFor
   put sData into tArray
   split tArray by return
   put 1 into i
   repeat for each element tElement in tArray
      if tElement is searchFor then put tArray[i + 3] & return after tOutput
      add 1 to i
   end repeat
   return CleanResult(tOutput)
end test3Solution

-- Test 4 - offset
function test4Solution searchFor
   put 0 into tStart
   put len(searchFor) + 2 into tSLen
   repeat forever 
      put offset(cr& searchFor &cr, sData, tStart) into tOS
      if tOS = 0 then exit repeat
      add tOS+tSLen to tStart
      put offset(cr, sData, tStart) into tOS
      add tOS+1 to tStart
      put offset(cr, sData, tStart) into tOS
      add tOS+2 to tStart
      put offset(cr, sData, tStart) into tEnd
      add tStart-1 to tEnd
      put char tStart-1 to tEnd of sData into s
      put s &cr after tOutput
   end repeat
   return CleanResult(tOutput)
end test4Solution


function CleanResult s
   filter s without "Fax: "
   filter s without "Fax: -- "
   sort lines of s
   return s
end CleanResult

rkriesel · Post by **rkriesel** » Sat Aug 01, 2015 6:29 am

Richard, thanks for the script.

Test 4 is a pleasant surprise, but slightly less so when test 5 quickens the "repeat for each" example by over an order of magnitude. Here are the results on my machine:

Code: Select all

Test 1 - lineoffset:   1124 ms
Test 2 - repeat for each: 583 ms
Test 3 - split as array: 117 ms
Test 4 - offset chars: 7 ms 
Test 5 - repeat for each: 53 ms 
Results match: true true true true

Here's Test 5:

Code: Select all

-- Test 5 - repeat for each
function test5Solution searchFor
    put 1 into i
    repeat for each line tLine in sData
        if i is j then
            put tLine & cr after tFound
        else if searchFor is in tLine then
            put i+3 into j
        end if
        add 1 to i
    end repeat
    return CleanResult(tFound)
end test5Solution

Here's the revised mouseUp:

Code: Select all

local sData

on mouseUp
    put empty into fld "out1"
    put empty into fld "out2"
    put empty into fld "out3"
    put empty into fld "out4"
    
    put "Special Education Director" into tTitle
    put fld "RawData" into sData
    -- Test 1 -- lineoffset:
    put the millisecs into t
    put rexSolution(tTitle) into r1
    put the millisecs - t into t1
    --  Test 2 -- repeat for each:
    put the millisecs into t
    put altSolution(tTitle) into r2
    put the millisecs - t into t2
    -- Test 3 -- split:
    put the millisecs into t
    put test3Solution(tTitle) into r3
    put the millisecs - t into t3
    -- Test 4 -- offset:
    put the millisecs into t
    put test4Solution(tTitle) into r4
    put the millisecs - t into t4
    --  Test 5 -- repeat for each:
    put the millisecs into t
    put test5Solution(tTitle) into r5
    put the millisecs - t into t5
    --
    -- Show times, and whether results match:
    put "Test 1 - lineoffset:   "& t1 &" ms" & \
          cr& "Test 2 - repeat for each: "& t2 &" ms" & \
          cr& "Test 3 - split as array: "& t3 &" ms" & \
          cr& "Test 4 - offset chars: "& t4 &" ms "& \
    cr& "Test 5 - repeat for each: "& t5 &" ms "& \
    cr&"Results match: "& (r1 = r2) && (r1 = r3) && (r1 = r4) && (r1 = r5)
    --
    put the number of lines of r4 &cr& r4 into fld "out1"
    put the number of lines of r2 &cr& r2 into fld "out2"
    put the number of lines of r3 &cr& r3 into fld "out3"
    put the number of lines of r5 &cr& r5 into fld "out4"
end mouseUp

Note that it now tests all the results, and that it needs field "out4" that wasn't there before.

Thanks again, Richard.

-- Dick

FourthWorld · Post by **FourthWorld** » Sat Aug 01, 2015 6:53 am

I like yours better than mine, Dick. But I often feel that way about your code.

Mine is faster but at a high cost in complexity. It's difficult to read, and difficult to modify for any other case. I undertook it only because I was thinking about my earlier comment about bytes being the only thing the machine understands, and looking at this from a purely byte-centric view.

Yours is much more readable, and much more adaptable for other cases. Algorithmically, more memorable.

sritcp · Post by **sritcp** » Sat Aug 01, 2015 4:22 pm

Richard and Dick, Thank you both for the script.
The results on my machine are:

Code: Select all

Test 1 - lineoffset:   266 ms
Test 2 - repeat for each: 119 ms
Test 3 - split as array: 50 ms
Test 4 - offset chars: 3 ms 
Test 5 - repeat for each: 15 ms

Richard, your "char offset" time is freaky! Closer to machine logic, faster the time but less flexible the code. I get it.
Dick, your version is deceptively simple but fast. Really enjoyed it.

I am gaining some solid learning here.

Thanks,
Sri

mwieder · Post by **mwieder** » Mon Aug 03, 2015 11:43 pm

I am gaining some solid learning here.

As am I.
I am really pleased with the way this community-driven approach to different ways of solving a particular problem is going.

Dick- that's a sneaky^H^H^Hclever way of avoiding an extra line copy. I like that a lot.

Richard - I had thought of trying to do this with an array, but couldn't figure out how the key-value pairs would work. I forgot that splitting text by a cr would assign numerical keys and thus allow the n+3 indexing to work. That's excellent, and I'm quite upset that offset is so fast, especially given that you're executing four offsets and four adds each time through the loop. But I suppose the fact that you're leaping over blocks of text with the first offset more than makes up for that, since there are only n iterations of the loop instead of one per line.

FourthWorld · Post by **FourthWorld** » Tue Aug 04, 2015 1:22 am

mwieder wrote:Richard - I had thought of trying to do this with an array, but couldn't figure out how the key-value pairs would work. I forgot that splitting text by a cr would assign numerical keys and thus allow the n+3 indexing to work. That's excellent, and I'm quite upset that offset is so fast, especially given that you're executing four offsets and four adds each time through the loop. But I suppose the fact that you're leaping over blocks of text with the first offset more than makes up for that, since there are only n iterations of the loop instead of one per line.

The array solution was Sri's - a very good one, easily generizable to other circumstances.

The offset solution occurred to me only because I've done so much benchmarking with arrays that I've become aware of the overhead of setting them up - whether split or arrayDecode, turning a bytestream into an array is relatively expensive, since it requires examining the bytestream in detail to create the array. Fortunately for Sri's algo the time saved in getting i+2 more than makes up the difference.

The offset solution would only be beneficial when the target elements are a subset of the whole; the smaller the faster, since as you noted the only advantage it offers is bypassing all the CR compares needed to create the array with a substring search instead. I would imagine that it the number of found elements were even close to have of the full data set the overhead of explicitly managing the lines between the found substring and the actual target phone number would be greater than even the setup cost for the array.

The curious thing about creating an array with split using a single char is that apparently you can use "repeat for each element" and reliably work on elements in ascending sort order of the key. This would not have occurred to me, since I'm accustomed to the keys being hashed in such a way that accessing them isn't sequential; the array normally has no sense of order at all.

I'm guessing that the engine has a special case for arrays in which keys are not only integers but also in unbroken sequence, and in those cases either implements them differently, perhaps as some sort of linked list, or does an internal sort. Do you know why keys comprised of sequential integers would behave so differently from other keys?

And how long has this been like this? It would seem to solve a major complaint about working with associative arrays for those who expect to be able to work with them as indexed arrays.

mwieder · Post by **mwieder** » Tue Aug 04, 2015 5:58 am

The array solution was Sri's - a very good one, easily generizable to other circumstances.

Kudos to Sri then - I haven't been paying close enough attention to the conversation.

The curious thing about creating an array with split using a single char is that apparently you can use "repeat for each element" and reliably work on elements in ascending sort order of the key. This would not have occurred to me, since I'm accustomed to the keys being hashed in such a way that accessing them isn't sequential; the array normally has no sense of order at all.

I'm guessing that the engine has a special case for arrays in which keys are not only integers but also in unbroken sequence, and in those cases either implements them differently, perhaps as some sort of linked list, or does an internal sort. Do you know why keys comprised of sequential integers would behave so differently from other keys?

And how long has this been like this? It would seem to solve a major complaint about working with associative arrays for those who expect to be able to work with them as indexed arrays.

I'm guessing that you and I are victims of our old habits. We expect that hashes (associative arrays) are unordered because they *are* unordered.
Splitting a variable into an array still gives us an unordered array, it's just that the keys are numeric, in order of the strings in the variable.
So it's a nice side effect that we can treat the array *as if it were ordered*, and thus we can take a numeric key, manipulate its value, and use it as a new key. I wouldn't have thought of this either because I'm used to things not working this way, but it's quite clever.

FourthWorld · Post by **FourthWorld** » Tue Aug 04, 2015 6:31 am

I just did some more testing, and it seems my hypothesis about sequential integers play out:

Code: Select all

on mouseUp
   -- Build a list like this:
   -- a <tab> a
   -- b <tab> b
   --...etc
   -- And another like this:
   -- 1 <tab> a
   -- 2 <tab> b
   -- ...etc
   -- And a third like this:
   -- 2 <tab> a
   -- 4 <tab> b
   -- ...etc
   put "a,b,c,d" into s
   put 0 into i
   repeat for each item tLine in s
      add 1 to i 
      put tLine &tab& tLine &cr after tA1 -- string keys
      put i & tab& tLine &cr after tA2   -- integer keys
      put i*2 &tab& tLine &cr after tA3 --non-sequential int keys
   end repeat
   --
   -- Turn each list into an array:
   split tA1 by cr and tab
   split tA2 by cr and tab
   split tA3 by cr and tab
   --
   -- Make list by traversing elements with string keys:
   repeat for each element tElem in tA1
      put tElem &cr after tList1
   end repeat
   --
   -- Make list by traversing elements with integer keys:
   repeat for each element tElem in tA2
      put tElem &cr after tList2
   end repeat
   --
   -- Make list by traversing elements with non-sequential int keys:
   repeat for each element tElem in tA3
      put tElem &cr after tList3
   end repeat
   --
   -- Display lists:
   put tList1 &cr& tList2 &cr& tList3
end mouseUp

Although all three output lists are made using the same "repeat for each element", only the one with the sequential-integer keys is in an anticipatable order, while the others exhibit the side-effects of hashing we'd expect.

So does this mean the engine's doing a sort for that case, or storing that those keys differently?

Interestingly, in all three cases querying "the keys" returns an unordered list.

LiveCode Forums.

How to Speed Up This Code?

Re: How to Speed Up This Code?

Re: How to Speed Up This Code?

Re: How to Speed Up This Code?

Re: How to Speed Up This Code?

Re: How to Speed Up This Code?

Re: How to Speed Up This Code?

Re: How to Speed Up This Code?

Re: How to Speed Up This Code?

Re: How to Speed Up This Code?

Re: How to Speed Up This Code?

Re: How to Speed Up This Code?

Re: How to Speed Up This Code?

Re: How to Speed Up This Code?

Re: How to Speed Up This Code?

Re: How to Speed Up This Code?