Code: Select all
Test 1 - lineoffset: 399 ms
Test 2 - repeat for each: 178 ms
Test 3 - split as array: 51 ms
Test 4 - offset chars: 3 ms
Test 5 - repeat for each: 15 ms
Code: Select all
local sData
on mouseUp
put empty into fld "out1"
put empty into fld "out2"
put empty into fld "out3"
put empty into fld "out4"
put empty into fld "out5"
put "Special Education Director" into tTitle
put fld "RawData" into sData
-- Test 1 -- lineoffset:
put the millisecs into t
put rexSolution(tTitle) into r1
put the millisecs - t into t1
-- Test 2 -- repeat for each:
put the millisecs into t
put altSolution(tTitle) into r2
put the millisecs - t into t2
-- Test 3 -- split:
put the millisecs into t
put test3Solution(tTitle) into r3
put the millisecs - t into t3
-- Test 4 -- offset:
put the millisecs into t
put test4Solution(tTitle) into r4
put the millisecs - t into t4
-- Test 5 -- repeat for each:
put the millisecs into t
put test5Solution(tTitle) into r5
put the millisecs - t into t5
-- Test 6 -- lineDelimiter and itemDelimiter
put the millisecs into t
put test6Solution(tTitle) into r6
put the millisecs - t into t6
--
-- Show times, and whether results match:
put "Test 1 - lineoffset: "& t1 &" ms" & \
cr& "Test 2 - repeat for each: "& t2 &" ms" & \
cr& "Test 3 - split as array: "& t3 &" ms" & \
cr& "Test 4 - offset chars: "& t4 &" ms "& \
cr& "Test 5 - repeat for each: "& t5 &" ms "& \
cr & "Test 6 - change delimiters: " & t6 & " ms" & \
cr&"Results match: "& (r1 = r2) && (r1 = r3) && (r1 = r4) && (r1 = r5) && (r1 = r6) --into field "fRes"
--
put the number of lines of r4 &cr& r4 into fld "out1"
put the number of lines of r2 &cr& r2 into fld "out2"
put the number of lines of r3 &cr& r3 into fld "out3"
put the number of lines of r5 &cr& r5 into fld "out4"
put the number of lines of r6 & cr & r6 into field "out5"
end mouseUp
-- Test 1 - lineoffset:
function rexSolution searchFor
local L2skip = 0
repeat while true
put lineOffset( searchFor, sData, L2skip) into N
if N = 0 then exit repeat
put line (N + 3 + L2skip) of sData &cr after tFound
add (N+3) to L2skip
end repeat
return CleanResult(tFound)
end rexSolution
-- Test 2 - repeat for each
function altSolution searchFor
put 1 into i
repeat for each line tLine in sData
if searchFor is in tLine then
put line (i+3) of sData & cr after tFound
end if
add 1 to i
end repeat
return CleanResult(tFound)
end altSolution
-- Test 3 - split
function test3Solution searchFor
put sData into tArray
split tArray by return
put 1 into i
repeat for each element tElement in tArray
if tElement is searchFor then put tArray[i + 3] & return after tOutput
add 1 to i
end repeat
return CleanResult(tOutput)
end test3Solution
-- Test 4 - offset
function test4Solution searchFor
put 0 into tStart
put len(searchFor) + 2 into tSLen
repeat forever
put offset(cr& searchFor &cr, sData, tStart) into tOS
if tOS = 0 then exit repeat
add tOS+tSLen to tStart
put offset(cr, sData, tStart) into tOS
add tOS+1 to tStart
put offset(cr, sData, tStart) into tOS
add tOS+2 to tStart
put offset(cr, sData, tStart) into tEnd
add tStart-1 to tEnd
put char tStart-1 to tEnd of sData into s
put s &cr after tOutput
end repeat
return CleanResult(tOutput)
end test4Solution
-- Test 5 - repeat for each
function test5Solution searchFor
put 1 into i
repeat for each line tLine in sData
if i is j then
put tLine & cr after tFound
else if searchFor is in tLine then
put i+3 into j
end if
add 1 to i
end repeat
return CleanResult(tFound)
end test5Solution
-- Test 6 - change delimiters
function test6Solution searchFor
put sData into tData -- needed to not mess up sData because of delete line 1
put "Fax: " into tFax
set the lineDelimiter to searchFor
delete line 1 of tData
set the itemDelimiter to tFax
repeat for each line aLine in tData
put item 2 of aLine into tTest
set the itemDelimiter to cr
put tFax & item 1 of tTest & cr after tFound
set the itemDelimiter to tFax
end repeat
set the lineDelimiter to LF
set the itemDelimiter to comma
return CleanResult(tFound)
end test6Solution
function CleanResult s
filter s without "Fax: "
filter s without "Fax: -- "
sort lines of s
return s
end CleanResult
Code: Select all
Test 1 - lineoffset: 1509 ms
Test 2 - repeat for each: 774 ms
Test 3 - split as array: 184 ms
Test 4 - offset chars: 12 ms
Test 5 - repeat for each: 69 ms
Test 6 - change delimiters: 11 ms
Results match: true true true true true
FourthWorld wrote: .... I would imagine that if the number of found elements were even close to half of the full data set the overhead of explicitly managing the lines between the found substring and the actual target phone number would be greater than even the setup cost for the array.....
This is the part that interests me the most -- relating tool effectiveness to data characteristics. Evidently, "char offset" works best for "needles in haystack" data such as the current one: 250 "founds" among 35,000 lines.
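One way to probe how each method responds to the proportion of "founds" would be to generate synthetic data in which that proportion can be varied. The following is only a sketch: the handler name makeTestData, its parameters, and the filler text are all hypothetical, and the five-line record layout (the Fax line three lines below the title line) is an assumption inferred from the test handlers, not the actual RawData.

```livecode
-- Hypothetical helper: builds pRecords five-line records.
-- Every pHitEvery-th record carries pTitle; the others carry a dummy title.
-- Assumed layout per record: name, title, address, phone, fax.
function makeTestData pRecords, pHitEvery, pTitle
   local tData, tTitleLine
   repeat with i = 1 to pRecords
      if i mod pHitEvery = 0 then
         put pTitle into tTitleLine
      else
         put "Other Title" into tTitleLine
      end if
      put "Name " & i & cr & tTitleLine & cr & "Address " & i & cr & \
            "Phone: 555-" & i & cr & "Fax: 555-" & i & cr after tData
   end repeat
   return tData
end makeTestData
```

For example, `put makeTestData(7000, 28, "Special Education Director") into sData` would give 35,000 lines with 250 matching records, while passing 2 instead of 28 would make roughly half of the records match.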
FourthWorld wrote: ..... I'm guessing that the engine has a special case for arrays in which keys are not only integers but also in unbroken sequence, and in those cases either implements them differently, perhaps as some sort of linked list, or does an internal sort..........
Another clue is the TRANSPOSE function. It works only when both sets of keys (of a two-dimensional array) are sequential integers. (The elements have to be numbers, too.)
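For anyone who has not used transpose, here is a small sketch of the kind of array it accepts. It assumes the "row,column" key form used by LiveCode's matrix functions; the variable names are only for illustration.

```livecode
on mouseUp
   local tA, tB
   -- 2 rows x 3 columns, keys of the form "row,column" with sequential integers
   repeat with r = 1 to 2
      repeat with c = 1 to 3
         put r * 10 + c into tA[r & comma & c]
      end repeat
   end repeat
   put transpose(tA) into tB
   -- after transposing, key "3,1" of tB should hold what key "1,3" of tA held
   put tA["1,3"] && tB["3,1"]
end mouseUp
```

Leaving a gap in either key sequence (say, skipping row 2 of a three-row array) is the case described above in which transpose is expected to fail.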
bn wrote: ...... the times using LC 7.1 DP1 or LC 7.0.6 using a Mac are .....
Just wondering: other things remaining the same, is LC 7.x slower than LC 6.7? (I know that the file sizes are larger.)
Code: Select all
Test 1 - lineoffset: 16680 ms
Test 2 - repeat for each: 7621 ms
Test 3 - split as array: 78 ms
Test 4 - offset chars: 29 ms
Test 5 - repeat for each: 26 ms
Code: Select all
Test 1 - lineoffset: 92645 ms
Test 2 - repeat for each: 45076 ms
Test 3 - split as array: 409 ms
Test 4 - offset chars: 186 ms
Test 5 - repeat for each: 204 ms
Test 6 - change delimiters: 165 ms
Code: Select all
LC 6.7.7rc1
Test 1 - lineoffset: 199 ms
Test 2 - repeat for each: 86 ms
Test 3 - split as array: 38 ms
Test 4 - offset chars: 2 ms
Test 5 - repeat for each: 9 ms
LC 7.1dp1
Test 1 - lineoffset: 1253 ms
Test 2 - repeat for each: 659 ms
Test 3 - split as array: 153 ms
Test 4 - offset chars: 8 ms
Test 5 - repeat for each: 54 ms
Code: Select all
function test6Solution searchFor
put sData into tData -- needed to not mess up sData because of delete line 1
set the lineDelimiter to searchFor
delete line 1 of tData
set the itemDelimiter to cr
repeat for each line aLine in tData
put item 4 of aLine & cr after tFound
end repeat
set the lineDelimiter to LF
set the itemDelimiter to comma
return CleanResult(tFound)
end test6Solution
FourthWorld wrote: .... I would imagine that if the number of found elements were even close to half of the full data set the overhead of explicitly managing the lines between the found substring and the actual target phone number would be greater than even the setup cost for the array.....
sritcp wrote: This is the part that interests me the most -- relating tool effectiveness to data characteristics.
That's both the fascination and bane of benchmarking: we often find that the fastest solution for a given problem will depend on the specifics of that problem. As with so many other things in life, there's rarely a single "best" solution. Often I'll just settle for a compromise in performance if the solution will be used in multiple contexts, but if it's a one-off that needs to run fast it can sometimes be worth the extra effort to tailor an algo for the data being acted on.
Code: Select all
filter tList with ("*" &tab& "Sri"& tab)
Code: Select all
filter tList with ("*" &tab& "*" &tab& "*" &tab& "*" &tab& "*" &tab& "*" &tab& "*" &tab& "*" &tab& "*" &tab& "Sri" &tab)
sritcp wrote: I wonder how Bernd's "search string as delimiter" method will hold up when we increase the proportion of the "founds". (By the way, I still use LC 6.7; so it came as a -- pleasant -- surprise to learn that LC 7 permits the use of arbitrary strings as delimiters. I can imagine very complex and specific searches using this feature.)
As Bernd's results show, the code base refactoring needed for Unicode and other foundational elements benefiting v8 has yielded performance degradation nearly across the board. There are some things which are a bit faster, but many which are slower.
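For readers still on 6.x, the arbitrary-string delimiters mentioned above can be tried with a few lines. This is a minimal sketch that assumes LC 7 or later; the sample text and variable names are made up for illustration.

```livecode
on mouseUp
   local tText
   put "alpha<br>beta<br>gamma" into tText
   -- a multi-character delimiter; versions before 7 reject this
   set the itemDelimiter to "<br>"
   put the number of items of tText & cr & item 2 of tText
end mouseUp
```

With the delimiter set this way the text should split into three items, with "beta" as item 2. The delimiter properties are local to the handler that sets them, so other handlers keep seeing the default comma and return.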
FourthWorld wrote: ..... I'm guessing that the engine has a special case for arrays in which keys are not only integers but also in unbroken sequence, and in those cases either implements them differently, perhaps as some sort of linked list, or does an internal sort..........
sritcp wrote: Another clue is the TRANSPOSE function. It works only when both sets of keys (of a two-dimensional array) are sequential integers. (The elements have to be numbers, too.)
Transpose is another special case, but I would guess it's very different from sequential integers in arrays under the hood.
FourthWorld wrote: As Bernd's results show, the code base refactoring needed for Unicode and other foundational elements benefiting v8 has yielded performance degradation nearly across the board.
The penalty seems to be huge.
FourthWorld wrote: As Bernd's results show, the code base refactoring needed for Unicode and other foundational elements benefiting v8 has yielded performance degradation nearly across the board.
sritcp wrote: The penalty seems to be huge.
A couple of the engineers have expressed interest in being able to turn off Unicode as well, so I have the impression that Unicode alone isn't the whole difference here.
I wish the unicode feature in LC 7+ could be turned "on" and "off" as necessary. (Wasn't there a feature request?)
I guess it is too deeply integrated into the code to do that.
FourthWorld wrote: .... I would expect it may take the LiveCode team a few more versions as well, and with the extra bonus points of having done an exceptional job of maintaining backward compatibility.
So, how is LC8 expected to fare as compared with LC7?
sritcp wrote: So, how is LC8 expected to fare as compared with LC7?
LC 8 is LC 7, with the addition of a new language for component authors, LiveCode Builder (often affectionately called "LCB").
Faster, or even more trade-offs?
Sri
Code: Select all
function test7Solution searchFor
set the lineDelimiter to cr & searchFor & cr -- note requires LC v7+
set the itemDelimiter to cr
repeat for each line aLine in sData
put item 3 of aLine & cr after tFound
end repeat
return CleanResult( item 2 to -1 of tFound ) & cr -- note: ignore the first item to obviate deleting a line
end test7Solution
Code: Select all
Test 1 - lineoffset: 1130 ms
Test 2 - repeat for each: 592 ms
Test 3 - split as array: 127 ms
Test 4 - offset chars: 8.6 ms
Test 5 - repeat for each: 66 ms
Test 6 - change delimiters: 7.2 ms
Test 7 - delimiters redux: 7 ms
Test 8 - offset redux: 8.3 ms
Results match: true true true true true true true
Code: Select all
local sData
on mouseUp
put empty into fld "out1"
put empty into fld "out2"
put empty into fld "out3"
put empty into fld "out4"
put empty into fld "out5"
put empty into fld "out6"
put empty into fld "out7"
put "Special Education Director" into tTitle
put fld "RawData" into sData
-- Test 1 -- lineoffset:
put the millisecs into t
put rexSolution(tTitle) into r1
put the millisecs - t into t1
-- Test 2 -- repeat for each:
put the millisecs into t
put altSolution(tTitle) into r2
put the millisecs - t into t2
-- Test 3 -- split:
put the millisecs into t
put test3Solution(tTitle) into r3
put the millisecs - t into t3
-- Test 4 -- offset:
put the long millisecs into t
put test4Solution(tTitle) into r4
put round(the long millisecs - t,1) into t4
-- Test 5 -- repeat for each:
put the millisecs into t
put test5Solution(tTitle) into r5
put the millisecs - t into t5
-- Tests 6 and 7 need multi-character delimiters, so require LC 7 or later
if version() >= 7 then
-- Test 6 -- change delimiters:
put the long millisecs into t
put test6Solution(tTitle) into r6
put round(the long millisecs - t,1) into t6
-- Test 7 -- change delimiters redux:
put the long millisecs into t
put test7Solution(tTitle) into r7
put round(the long millisecs - t,1) into t7
end if
-- Test 8 -- offset redux:
put the long millisecs into t
put test8Solution(tTitle) into r8
put round(the long millisecs - t,1) into t8
--
-- Show times, and whether results match:
if version() >= 7 then
put "Test 1 - lineoffset: "& t1 &" ms" & \
cr& "Test 2 - repeat for each: "& t2 &" ms" & \
cr& "Test 3 - split as array: "& t3 &" ms" & \
cr& "Test 4 - offset chars: "& t4 &" ms "& \
cr& "Test 5 - repeat for each: "& t5 &" ms "& \
cr& "Test 6 - change delimiters: "& t6 &" ms "& \
cr& "Test 7 - delimiters redux: "& t7 &" ms "& \
cr& "Test 8 - offset redux: "& t8 &" ms "& \
cr&"Results match: "& (r1 = r2) && (r1 = r3) && (r1 = r4) && (r1 = r5) && (r1 = r6) && (r1 = r7) && (r1 = r8) & \
cr after msg
else
put "Test 1 - lineoffset: "& t1 &" ms" & \
cr& "Test 2 - repeat for each: "& t2 &" ms" & \
cr& "Test 3 - split as array: "& t3 &" ms" & \
cr& "Test 4 - offset chars: "& t4 &" ms " & \
cr& "Test 5 - repeat for each: "& t5 &" ms " & \
cr& "Test 6 - change delimiters: skipped in versions before 7" & \
cr& "Test 7 - delimiters redux: skipped in versions before 7" & \
cr& "Test 8 - offset redux: "& t8 &" ms " & \
cr&"Results match: "& (r1 = r2) && (r1 = r3) && (r1 = r4) && (r1 = r5) && (r1 = r6) && (r1 = r7) && (r1 = r8) & \
cr after msg
end if
--
put the number of lines of r2 &cr& r2 into fld "out1"
put the number of lines of r3 &cr& r3 into fld "out2"
put the number of lines of r4 &cr& r4 into fld "out3"
put the number of lines of r5 &cr& r5 into fld "out4"
put the number of lines of r6 &cr& r6 into fld "out5"
put the number of lines of r7 &cr& r7 into fld "out6"
put the number of lines of r8 &cr& r8 into fld "out7"
end mouseUp
-- Test 1 - lineoffset:
function rexSolution searchFor
local L2skip = 0
repeat while true
put lineOffset( searchFor, sData, L2skip) into N
if N = 0 then exit repeat
put line (N + 3 + L2skip) of sData &cr after tFound
add (N+3) to L2skip
end repeat
return CleanResult(tFound)
end rexSolution
-- Test 2 - repeat for each
function altSolution searchFor
put 1 into i
repeat for each line tLine in sData
if searchFor is in tLine then
put line (i+3) of sData & cr after tFound
end if
add 1 to i
end repeat
return CleanResult(tFound)
end altSolution
-- Test 3 - split
function test3Solution searchFor
put sData into tArray
split tArray by return
put 1 into i
repeat for each element tElement in tArray
if tElement is searchFor then put tArray[i + 3] & return after tOutput
add 1 to i
end repeat
return CleanResult(tOutput)
end test3Solution
-- Test 4 - offset
function test4Solution searchFor
put 0 into tStart
put len(searchFor) + 2 into tSLen
repeat forever
put offset(cr& searchFor &cr, sData, tStart) into tOS
if tOS = 0 then exit repeat
add tOS+tSLen to tStart
put offset(cr, sData, tStart) into tOS
add tOS+1 to tStart
put offset(cr, sData, tStart) into tOS
add tOS+2 to tStart
put offset(cr, sData, tStart) into tEnd
add tStart-1 to tEnd
put char tStart-1 to tEnd of sData into s
put s &cr after tOutput
end repeat
return CleanResult(tOutput)
end test4Solution
-- Test 5 - repeat for each
function test5Solution searchFor
put 1 into i
repeat for each line tLine in sData
if i is j then
put tLine & cr after tFound
else if searchFor is tLine then
put i+3 into j
end if
add 1 to i
end repeat
return CleanResult(tFound)
end test5Solution
-- Test 6 - change delimiters
function test6Solution searchFor
-- put sData into tData -- needed to not mess up sData because of delete line 1
-- put "Fax: " into tFax
-- set the lineDelimiter to searchFor
-- delete line 1 of tData
-- set the itemDelimiter to tFax
-- repeat for each line aLine in tData
-- put item 2 of aLine into tTest
-- set the itemDelimiter to cr
-- put tFax & item 1 of tTest & cr after tFound
-- set the itemDelimiter to tFax
-- end repeat
-- set the lineDelimiter to LF
-- set the itemDelimiter to comma
-- return CleanResult(tFound)
put sData into tData -- needed to not mess up sData because of delete line 1
set the lineDelimiter to searchFor -- note requires at least LC v7
delete line 1 of tData
set the itemDelimiter to cr
repeat for each line aLine in tData
put item 4 of aLine & cr after tFound
end repeat
set the lineDelimiter to LF
set the itemDelimiter to comma
return CleanResult(tFound)
end test6Solution
-- Test 7 - delimiters redux
function test7Solution searchFor
set the lineDelimiter to cr & searchFor & cr -- note requires at least LC v7
set the itemDelimiter to cr
repeat for each line aLine in sData
put item 3 of aLine & cr after tFound
end repeat
return CleanResult( item 2 to -1 of tFound ) & cr -- note: ignore the first item to obviate deleting it
end test7Solution
-- Test 8 - offset redux
function test8Solution searchFor
put length(searchFor) into tLength
put 0 into tStart
repeat forever
get offset(cr & searchFor & cr, sData, tStart)
if it is 0 then exit repeat
add tLength + 2 + it to tStart
add offset(cr, sData, tStart)+1 to tStart
add offset(cr, sData, tStart)+1 to tStart
get offset(cr, sData, tStart)
put char tStart to tStart + it of sData after tOutput
add it to tStart
end repeat
return CleanResult(tOutput)
end test8Solution
function CleanResult s
filter s without "Fax: "
filter s without "Fax: -- "
sort lines of s
return s
end CleanResult