Performance badly degrading with growing string variable



AxWald
Posts: 578
Joined: Thu Mar 06, 2014 2:57 pm

Re: Performance badly degrading with growing string variable

Post by AxWald » Tue May 24, 2016 10:43 pm

Hi,
Thierry wrote:First, comment your "put myCounter" and *THIS* will speed up your loop.
OMG - I had put that in initially for the long runs, when a 10 min wait was to be expected for the big file, and left it in for comparability - missing that I wouldn't need it anymore once the time had already been cut down from 813392 ms to 250805 ms. I tested again (the last, fast version, without the "put myCounter"):

6.7.10:
5000 replacements / 79 ms; 0.0158 ms/record
67648 replacements / 1088 ms; 0.016083 ms/record


8.0:
5000 replacements / 156 ms; 0.0312 ms/record
67648 replacements / 2098 ms; 0.031013 ms/record


Now isn't that fast? The (pre-cleaned) big file here is 12,848,647 bytes, has 67,648 lines/records, and in each record 3 of its 38 fields are modified. Who needs Assembler when you have LC? ;-)))
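For completeness, the timing scaffold is nothing fancy - roughly like this (an untested sketch with placeholder names, not my actual handler):

Code: Select all

-- untested sketch of the timing scaffold; tData and the per-line work
-- are placeholders, not the real handler
put the milliseconds into tStart
put 0 into myCounter
repeat for each line tLine in tData
   -- ... do the actual replacements on tLine here ...
   add 1 to myCounter
   -- note: no "put myCounter" in here - updating the message box
   -- (or a field) on every pass is what was eating the time
end repeat
put the milliseconds - tStart into tElapsed
put tElapsed / myCounter into tPerRecord
put myCounter && "replacements /" && tElapsed && "ms;" && tPerRecord && "ms/record"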
Thierry wrote:Second, I think there is a slight error in your date conversion (item 12).
You need to unquote and quote again the date.
The result looks as intended. For reference, here is the first record in its 3 incarnations:

Code: Select all

Original exported from MySQL:
"172";"30";"55";"1";"A product name    xyz";"";"";"0000055";"6";"3,8500";"";"2014-10-02";"1";"2";"angelegt";"";"";"2014-10-05 19:14:29";"0,00";"0,00";"0,00";"0";"0";"0,00";\N;"0,00";"0,00";"0,00";"0,00";"0,00";"";"0,00";"0";"0";"0";"";"0";"0"

Pre-Cleaned (numtochar(17) changed to "#" for readability):
172#30#55#1#A product name    xyz###0000055#6#3,8500##2014-10-02#1#2#angelegt###2014-10-05 19:14:29#0,00#0,00#0,00#0#0#0,00##0,00#0,00#0,00#0,00#0,00##0,00#0#0#0##0#0

Fully done ([#]changed fields[#] marked like this):
172;30;55;1;[#]A product name xyz[#];;;0000055;6;[#]3.8500[#];;[#]10/02/14[#];1;2;angelegt;;;2014-10-05 19:14:29;0,00;0,00;0,00;0;0;0,00;;0,00;0,00;0,00;0,00;0,00;;0,00;0;0;0;;0;0
No quotes at all anymore, ready for further processing. These 3 changes are just a demo of what is often done; in real work I'd have omitted most of the unused fields and hand-tailored the output to my needs. But for a speed test this should be sufficient. Me too lazy :)
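For the curious, the 3 changes boil down to something like this (an untested sketch; tRecord & Co. are placeholder names, and the date handling assumes the ISO date is always well-formed):

Code: Select all

-- untested sketch of the 3 changes, on one pre-cleaned record (tRecord);
-- field positions follow the example above
set the itemDelimiter to numToChar(17)

-- item 5: collapse runs of spaces in the product name
put item 5 of tRecord into tName
repeat while "  " is in tName
   replace "  " with " " in tName
end repeat
put tName into item 5 of tRecord

-- item 10: German decimal comma becomes a point ("3,8500" -> "3.8500")
put item 10 of tRecord into tNum
replace "," with "." in tNum
put tNum into item 10 of tRecord

-- item 12: "2014-10-02" -> "10/02/14"
put item 12 of tRecord into tDate
set the itemDelimiter to "-"
put item 2 of tDate & "/" & item 3 of tDate & "/" & char 3 to 4 of item 1 of tDate into tDate
set the itemDelimiter to numToChar(17)
put tDate into item 12 of tRecord

-- finally swap the numToChar(17) markers for plain semicolons
replace numToChar(17) with ";" in tRecord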

Thanks, and have fun!

Addendum: Times for 6.7.10 32-bit in LUbuntu-64 LTS:
5000 replacements / 55 ms; 0.011 ms/record
67648 replacements / 739 ms; 0.010924 ms/record
Wow!
All code published by me here was created with Community Editions of LC (thus is GPLv3).
If you use it in closed source projects, or for the Apple AppStore, or with XCode
you'll violate some license terms - read your relevant EULAs & Licenses!

Havanna
Posts: 53
Joined: Wed May 14, 2014 3:00 pm

Re: Performance badly degrading with growing string variable

Post by Havanna » Thu Jun 09, 2016 1:40 pm

AxWald wrote:
I knew it. Thinking helps sometimes. New code:
New times, with direct output to a file:
5000 replacements / 18351 ms; 3.6702 ms/record
67648 replacements / 250805 ms; 3.707501 ms/record


Voilà! Degradation removed! \o/ \o/ \o/
Thx AxWald for your extensive testing!
I used to think of files as very slow in comparison to memory.
I suspect the amazing effect has something to do with LC and the OS caching file operations very efficiently …
That helps a lot in performance optimization!
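For anyone reading along, the two patterns being compared are roughly these (untested sketch; processedLine() just stands for whatever per-line work is done, tOutPath is a made-up file path):

Code: Select all

-- slow pattern: appending every result to an ever-growing variable
repeat for each line tLine in tData
   put processedLine(tLine) & return after tResult
end repeat

-- fast pattern discussed here: write each result straight to a file
open file tOutPath for write
repeat for each line tLine in tData
   write processedLine(tLine) & return to file tOutPath
end repeat
close file tOutPath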

Havanna
Posts: 53
Joined: Wed May 14, 2014 3:00 pm

Re: Performance badly degrading with growing string variable

Post by Havanna » Thu Jun 09, 2016 1:58 pm

FourthWorld wrote: @Havanna: it may be helpful if you're in a position to provide a link to the data and an example of the desired output. I can't help but wonder if there's a way to do replacements across the entire data set without using any loops at all.
Richard, you can find one of these files at http://ellvis.de/LC/csv.zip
The necessary changes in this case:
  • field 10 (AufgabepunktFluegelstellung) must keep the comma, but if it contains a '+' the data splits there: the first part remains, the second part goes to an additional field that contains 0 in all records without a '+'
Other tables contain string fields that must retain commas, as well as float values where the German decimal comma has to be replaced by a point.

That does not look easy to do without loops …
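What I have in mind for field 10 is roughly this (untested sketch; the ";" delimiter, the extra column appended at the line end and the variable names are only assumptions):

Code: Select all

-- untested sketch of the field 10 rule
set the itemDelimiter to ";"
repeat for each line tLine in tData
   put item 10 of tLine into tWing
   if "+" is in tWing then
      put char 1 to (offset("+", tWing) - 1) of tWing into item 10 of tLine
      put char (offset("+", tWing) + 1) to -1 of tWing into tExtra
   else
      put 0 into tExtra
   end if
   put tLine & ";" & tExtra & return after tOut
end repeat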

FourthWorld
VIP Livecode Opensource Backer
Posts: 10043
Joined: Sat Apr 08, 2006 7:05 am
Contact:

Re: Performance badly degrading with growing string variable

Post by FourthWorld » Thu Jun 09, 2016 3:28 pm

Perhaps I don't understand the requirements well, but it would seem the filter command could at least narrow things down considerably in one move:

Code: Select all

filter tRawDataFromFile with "*+*"
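(In case it helps: filter with a wildcard pattern keeps only the matching lines and drops the rest, e.g. with some made-up data:)

Code: Select all

put "a;1,5" & return & "b;2+3" & return & "c;4" into tRawDataFromFile
filter tRawDataFromFile with "*+*"
put tRawDataFromFile   -- only "b;2+3" is left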
Richard Gaskin
LiveCode development, training, and consulting services: Fourth World Systems
LiveCode Group on Facebook
LiveCode Group on LinkedIn

AxWald
Posts: 578
Joined: Thu Mar 06, 2014 2:57 pm

Re: Performance badly degrading with growing string variable

Post by AxWald » Tue Jun 14, 2016 11:10 am

Hi,
Havanna wrote:you can find one of these files at http://ellvis.de/LC/csv.zip
I gave this to my trusty old MS Access 2K3, it ate it with a bit of chewing, and the resulting data definition looks like this:
[Attachment: 56-ADD_tabledef.png - the structure as recognized by Access]
Is this correct?
Havanna wrote:The necessary changes in this case:
  • field 10 (AufgabepunktFluegelstellung) must keep the comma, but if it contains a '+' the data splits there, first part remains, second part goes to an additional field that contains 0 in all records witout '+'
Sorted by [AufgabepunktFluegelstellung] it looks like this:
[Attachment: 56-ADD_table.png - the table data (part of it)]
As I understand you, the "+" should vanish, and the "11,5" should go to another field. Which one?
Havanna wrote:That does not look easy to do without loops …
I may be plain wrong, but to my understanding there's no easy way to work with table data & conditionally format/change it without loops at all - this isn't plain char replacing where you just run through a file & do some pointer arithmetic. You need to know what field you're in and what line you're in, and to know, depending on where you are, what to do ...
Maybe you could do it by writing half a book of code, and even then the compiled output would be roughly equal to what you could achieve with optimized loops ;-)

Anyways, I'd rather work with what we have, and these are the many great ways LC offers us to handle chunks. As shown above, with a bit of optimization the results can be lightning fast, so doing it in a multi-step way wouldn't hurt: first sanitize the data for quick access, then change it as desired (even if more than 1 step is required), then format it for nice output.

Such stuff I'm doing all day long, so I know it can be done.
(Actually it's about 25% of what I do in LC. Another 25% is concatenating strange chunks of text into SQL statements, and the remaining 50% is the boring and gruesome grind of making the GUI ...)

Give us more input. And have fun!

("An scheena Dag no!" nach Ellwanga!)
All code published by me here was created with Community Editions of LC (thus is GPLv3).
If you use it in closed source projects, or for the Apple AppStore, or with XCode
you'll violate some license terms - read your relevant EULAs & Licenses!

Havanna
Posts: 53
Joined: Wed May 14, 2014 3:00 pm

Re: Performance badly degrading with growing string variable

Post by Havanna » Wed Jun 22, 2016 4:26 pm

AxWald wrote:I gave this to my trusty old MS Access 2K3
Your trusty A has done well - Access is where the data came from :)

Just FYI, the part after the '+' goes to a new field that is introduced when processing the first line (the CSV field names).
But that is not really a problem; I prefer doing things in steps and have done so - processing lines and field data runs fine.

It was just this exponential time consumption when appending snips to large strings that got me confused. Quite a surprise to see your results when writing to file instead of memory :D
I could cut processing to similar times by splitting the string appending (yes, another loop): add my snips to a string 'medium' for 5000 lines, then append 'medium' to 'result', empty 'medium', and go around again.
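Roughly like this (untested sketch; processedLine() again just stands for the per-line work):

Code: Select all

-- untested sketch: buffer the snips in 'medium', flush to 'result'
-- every 5000 lines instead of growing 'result' on every pass
put empty into tResult
put empty into tMedium
put 0 into tCount
repeat for each line tLine in tData
   put processedLine(tLine) & return after tMedium
   add 1 to tCount
   if tCount mod 5000 = 0 then
      put tMedium after tResult
      put empty into tMedium
   end if
end repeat
put tMedium after tResult   -- flush what's left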

The no-loop question was raised by Richard, but I also think it will not easily work with this kind of data.

Thanx for your considerations
(and "An scheena Dag no!" aus Ellwanga!)

Post Reply