I'm not a complete beginner but hope this is a simple question.
I have a script that checks whether each line in a variable (nuData) containing tab-delimited data is present or not in another variable (oldData) that also contains lines of tab-delimited records. If the record appears in both variables the script deletes it from nuData. The records look like this.
60 Meters
15[tab][tab][tab]6.59[tab][tab] [tab]John Smith[tab]USA[tab]6 Mar 95[tab]175/73[tab]1[tab][tab]Iowa City IA[tab]20 Jan
15[tab][tab][tab]6.59[tab][tab] [tab]Tom Doe[tab]GBR[tab]6 Mar 95[tab]175/73[tab]1[tab][tab]Iowa City IA[tab]20 Jan
The data are numbered lists on a website that change over time so I need to make the first item (itemDelimiter is tab) empty before checking against oldData.
My script works as follows. I've used this approach many times but this time it's REAALLLY slow; much slower than I'd expect even considering that the variables contain many lines. Platform is Mac OSX.
put field "A" into nuData
put field "B" into oldData
set itemDelimiter to tab
repeat with lineCt = (the number of lines of nuData) down to 2
   put line lineCt of nuData into tRecord
   if the number of items of tRecord > 1 then -- only check records with more than one item
      put empty into item 1 of tRecord
      if offset(tRecord, oldData) > 0 then
         delete line lineCt of nuData
      end if -- record occurs in both variables
   end if -- record contains more than one item
end repeat
put nuData into field "A"
Is there a faster way to do this? Perhaps something using filter? My reservation is that I've had other scripts where filter with/without simply doesn't work. I am finding that a check of a 50,000-line file takes on the order of half an hour, which doesn't square with the speed I usually get from routines like this.
Thanks in advance,
Sieg
faster check than offset(x,variable)?
Re: faster check than offset(x,variable)?
Hi Sieg,
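It seems "repeat for each" is faster than "repeat with" for walking the lines of a variable, because "line lineCt of nuData" has to count lines from the start on every pass.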
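A minimal sketch of how the loop from the first post might look with "repeat for each", assuming the same fields "A" and "B". The variable names tLineNo, tProbe and tKept are illustrative and not from the original script. Rather than deleting lines in place, it collects the lines to keep in a new variable. The offset() scan of oldData is unchanged, so this only removes the cost of locating and deleting line lineCt on every pass.

```livecode
on mouseUp
   put field "A" into nuData
   put field "B" into oldData
   set the itemDelimiter to tab
   put empty into tKept
   put 0 into tLineNo
   repeat for each line tRecord in nuData
      add 1 to tLineNo
      -- line 1 is never checked, matching "down to 2" in the original loop
      if tLineNo > 1 and the number of items of tRecord > 1 then
         put tRecord into tProbe
         put empty into item 1 of tProbe -- blank the changing list number
         if offset(tProbe, oldData) > 0 then next repeat -- already in oldData, drop it
      end if
      put tRecord & return after tKept
   end repeat
   delete the last char of tKept -- remove the trailing return
   put tKept into field "A"
end mouseUp
```

Building tKept with "put ... after" keeps the original line order without any sorting step.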
Best regards
Jean-Marc
https://alternatic.ch
Re: faster check than offset(x,variable)?
Thank you, Jean-Marc.
Great info to have but now I'm thinking there is simply something wrong with this stack. I ran a test using "repeat for each" on 194 lines of data. It took a couple minutes. Normally even with "repeat with" it should run through a short list like that in the blink of an eye. I'll try setting it up in a fresh stack.
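One way to check whether the loop itself or something else in the stack is slow might be a bare timing test like the sketch below. It assumes the data sits in field "A"; tStart and tCount are illustrative names. The field is read once before the clock starts, so only the loop is timed.

```livecode
on mouseUp
   put field "A" into nuData -- read the field once, outside the timed section
   put the milliseconds into tStart
   put 0 into tCount
   repeat for each line tRecord in nuData
      add 1 to tCount
   end repeat
   -- with no destination, "put" writes to the message box
   put tCount && "lines in" && (the milliseconds - tStart) && "ms"
end mouseUp
```

If this reports only a few milliseconds for the same data, the delay probably comes from the work inside the loop or from elsewhere in the stack rather than from the repeat structure.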
Sieg
Re: faster check than offset(x,variable)?
You could also use arrays; try the following.
Please compare the result with yours first. I hope it is exactly the same.
It would also be interesting to know the difference in elapsed time compared with the method above.
P.S. You can't delete lines of a variable inside a "repeat for each" loop (which enumerates upwards) over those same lines.
on mouseUp
   put the millisecs into m1
   lock screen
   lock messages
   set cursor to watch
   put line 2 to -1 of field "A" into nuData -- line 1 is kept as is
   put field "B" into oldData
   set the itemDelimiter to tab
   -- index every old record by everything after its first item
   repeat for each line L in oldData
      if L is empty then next repeat
      put 1 into tCount[item 2 to -1 of L]
   end repeat
   -- keep the new records whose remainder is not an array key
   put 0 into x
   repeat for each line L in nuData
      add 1 to x
      -- note: lines with one item or fewer are skipped and do not reach the output
      if the num of items of L <= 1 then next repeat
      if tCount[item 2 to -1 of L] <> 1 then
         put L into tRest[x] -- incl. first item
      end if
   end repeat
   set the itemDelimiter to comma
   ## sorting retains the original order of nuData
   put the keys of tRest into ks
   sort ks numeric
   repeat for each line k in ks
      put cr & tRest[k] after tRest2
   end repeat
   put (line 1 of fld "A") & tRest2 into fld "A"
   put the millisecs - m1 into fld "timing"
   unlock screen
   unlock messages
end mouseUp
shiftLock happens