Here are ways to improve performance for processing large strings
Posted: Fri Oct 04, 2019 9:48 am
I nearly abandoned the development of a commercial app using LiveCode because it was too slow for processing large strings (e.g., 8000 lines). I then found some ways to get the performance I needed, so I wanted to share some tips. I hope these tips will be added to the Dictionary and other documentation.
Of course, if you're doing data analysis where only you need to see the result, then waiting four seconds for the answer is no big deal. But if you're deploying a commercial app, you can't expect your users to wait an annoying four seconds every time the app needs to process data. Fortunately, there are solutions.
Benchmarking performance
When trying different strategies to see how long each one takes, time it with the milliseconds:
Code:
on mouseUp
   put the milliseconds into startTime
   put field "theHugeString" into stringToProcess
   repeat with counter = 1 to the number of lines of stringToProcess
      -- Do some processing here
   end repeat
   put the milliseconds - startTime
end mouseUp
Use repeat for each... instead of repeat with...
For my 8000-line string, it took 3.9 seconds to process with repeat with...:
Code:
repeat with counter = 1 to the number of lines in hugeString
   delete item 3 to 7 of line counter of hugeString
end repeat
The same processing takes only 0.04 seconds with repeat for each:
Code:
repeat for each line theLine in hugeString
   delete item 3 to 7 of theLine
   put theLine & cr after output
end repeat
Note that repeat for each... doesn't affect the original variable, so it's necessary to store each processed line in a new variable.
Use variables instead of fields
If your data is in a field, put it into a variable and then process the variable. Processing variables is way faster than processing fields.
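As a quick sketch of the pattern (the field names here are just examples): read the field once, process the variable, and write back once at the end.
Code:
put field "myData" into tData  -- read the field once
repeat for each line theLine in tData
   -- do the real processing on theLine here
   put theLine & cr after tResult
end repeat
put tResult into field "myData"  -- write back once, at the end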
Save the processed data so it doesn't have to be processed again, or pre-process it
In most cases, repeat for each (above) is so stupidly fast it will solve all your problems. When it doesn't, here are some other ideas to consider.
Once you've processed data, you can save it somewhere so you don't have to process it again in the future, such as a field, custom property, or SQLite database. If the result of the calculations will change when the user adds new data, save the old result anyway, and then just add the calculation for the newly-added data to the old result. That way you process only a small amount of data rather than the whole dataset.
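For instance, a running result could be cached in a custom property so that only the new data gets processed. The property name uTotal and the handler below are made up for illustration:
Code:
command addNewData pNewLines
   -- start from the previously saved result instead of recalculating
   put the uTotal of this stack into tTotal
   repeat for each line theLine in pNewLines
      add item 1 of theLine to tTotal  -- process only the new lines
   end repeat
   set the uTotal of this stack to tTotal  -- cache the updated result
end addNewData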
Another trick is to PRE-process the data. For example, let's say you have 800,000 records in a database, and the user wants to change the date format. The slow way is to pull all 800,000 records out of the database, reformat the date for each record and save it back to the database. Now say the user changes his mind and chooses *another* date format. Same as before, you pull all the records, process them, and then save them.
Instead of this, when the user enters a new record, you could save it using three different date fields, one for the three different date options the user can choose, each in a different field. (e.g., 1/15/19; Jan 15, 2019; or 2019-01-15). Then when the user chooses a different date format, you can just pull all the records out of the database with the appropriate field, and not have to process anything. This will use a little more disk space, so that's a tradeoff.
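Sketched with LiveCode's revDB commands (the table and column names are made up):
Code:
-- at insert time, store all three formats
put "INSERT INTO records (date_us, date_long, date_iso)" & \
      " VALUES ('1/15/19', 'Jan 15, 2019', '2019-01-15')" into tSQL
revExecuteSQL tConnectionID, tSQL
-- when the user picks a format, just SELECT the matching column
put revDataFromQuery(tab, cr, tConnectionID, \
      "SELECT date_iso FROM records") into tDates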
Process in the background
If the user doesn't need the processed data right away, you can process it in the background with a handler for LiveCode's idle message.
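A minimal sketch of the idea, processing a small batch per idle period so the UI stays responsive (the handler structure and variable names are made up):
Code:
local sPending, sResult

on idle
   if sPending is not empty then
      repeat with i = 1 to 50  -- small batch per idle period
         if sPending is empty then exit repeat
         put line 1 of sPending & cr after sResult
         delete line 1 of sPending
      end repeat
   end if
   pass idle
end idle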
Use an older version of LiveCode?
AxWald reports below that LiveCode 6.7.10 runs his tests twice as fast as LiveCode 9.5 does.
Process large strings in slices
This section is probably obsolete. I wrote it before I learned that repeat for each is way faster than repeat with.... I'm retaining it to illustrate the concept.
As a string gets longer, the time to process it grows far faster than its length, roughly with the square of the number of lines. That is, a string that's 8 times longer than a small string doesn't take 8 times longer to process; it can take about 65 times longer! For example, when I run a loop to process a certain number of lines, this is how long it takes:
Code:
# of lines   # of seconds
----------   ------------
      1000           0.06
      2000           0.27
      3000           0.58
      4000           0.99
      5000           1.50
      6000           2.20
      7000           3.00
      8000           3.90
I think I read long ago that the reason for the slowdown is that when LiveCode resolves a line number, it starts counting from the very top of the string on each iteration. Assuming this is true, when you target, say, line 4256 of a string, LiveCode doesn't jump straight to line 4256; it starts at line 1 and counts lines until it reaches line 4256. So when traversing a large string, it's fast at first but progressively bogs down.
My solution here was to split the string into 1000-line slices, process each slice, and then string them back together. Doing that, I was able to process 8000 records in only 0.9 seconds, over 4 times faster than processing a single 8000-line string.
Code:
on mouseUp
   put field "theHugeString" into stringToProcess
   put empty into field "output"
   repeat with sets = 1 to 8
      put line 1 to 1000 of stringToProcess into subset
      delete line 1 to 1000 of stringToProcess
      repeat with counter = 1 to the number of lines of subset
         -- Do some processing here
      end repeat
      put subset after field "output"
   end repeat
end mouseUp
You might think you could process the first line of the string, save it somewhere else, delete it, and then process the next line, which is now the first line. The idea is that you'd always be processing line 1, which should be fast. Unfortunately, this doesn't help if you have to save each line: as you tack each processed line onto the end of some new string or field (even with lockScreen and lockMessages set to true), that container gets bigger, and LiveCode presumably has to count from the top of the container every time you append to it. (By the way, it takes about twice as long to append to a field as to a string variable.)
What if the order of the data didn't matter? Could you write each processed line to the *beginning* of a new string so that LiveCode wouldn't have to count from the top? I tried that, and for whatever reason, it doesn't save any time.