faster check than offset(x,variable)?
Posted: Thu Feb 23, 2017 10:21 pm
I'm not a complete beginner but hope this is a simple question.
I have a script that checks whether each line in a variable (nuData) containing tab-delimited data is also present in another variable (oldData), which likewise contains lines of tab-delimited records. If a record appears in both variables, the script deletes it from nuData. The records look like this:
60 Meters
15[tab][tab][tab]6.59[tab][tab] [tab]John Smith[tab]USA[tab]6 Mar 95[tab]175/73[tab]1[tab][tab]Iowa City IA[tab]20 Jan
15[tab][tab][tab]6.59[tab][tab] [tab]Tom Doe[tab]GBR[tab]6 Mar 95[tab]175/73[tab]1[tab][tab]Iowa City IA[tab]20 Jan
The data are numbered lists on a website that change over time, so I need to empty the first item of each record (the itemDelimiter is tab) before checking it against oldData.
My script works as follows. I've used this approach many times, but this time it's REALLY slow; much slower than I'd expect, even considering that the variables contain many lines. Platform is Mac OS X.
put field "A" into nuData
put field "B" into oldData
set the itemDelimiter to tab
repeat with lineCt = (the number of lines of nuData) down to 2
   put line lineCt of nuData into tRecord
   if the number of items of tRecord > 1 then -- only check records with more than one item
      put empty into item 1 of tRecord
      if offset(tRecord, oldData) > 0 then
         delete line lineCt of nuData
      end if -- record occurs in both variables
   end if -- record contains more than one item
end repeat
put nuData into field "A"
Is there a faster way to do this? Perhaps something using filter? My reservation is that I've had other scripts where filter with/without simply doesn't work. A check of a 50,000-line file takes on the order of half an hour, which doesn't square with the speed I usually get from routines like this.
Thanks in advance,
Sieg
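One direction that might be worth trying, offered as an untested sketch rather than a confirmed fix: the loop above re-scans nuData to find line lineCt on every pass and runs offset() over all of oldData for each record, so the work grows roughly with the product of the two line counts. Indexing oldData once in an array and walking nuData with "repeat for each" avoids both. The function name removeKnownRecords and its variable names are hypothetical. The sketch also assumes that a match means the whole record (apart from item 1) is identical, whereas offset() would also match a record that is only the start of a longer line in oldData.

```livecode
-- Hypothetical helper: returns pNuData without the records that also occur
-- in pOldData, ignoring item 1 (the list number) of each record.
function removeKnownRecords pNuData, pOldData
   local tSeen, tKey, tKeep, tResult
   set the itemDelimiter to tab

   -- Index oldData once, keyed by each record with item 1 emptied.
   repeat for each line tLine in pOldData
      put tLine into tKey
      if the number of items of tKey > 1 then
         put empty into item 1 of tKey
         put true into tSeen[tKey]
      end if
   end repeat

   -- Walk nuData once, keeping only the lines whose key was not indexed.
   -- Lines with a single item (such as "60 Meters") are always kept.
   repeat for each line tLine in pNuData
      put true into tKeep
      put tLine into tKey
      if the number of items of tKey > 1 then
         put empty into item 1 of tKey
         if tSeen[tKey] is true then
            put false into tKeep
         end if
      end if
      if tKeep then
         put tLine & return after tResult
      end if
   end repeat

   delete the last char of tResult -- drop the trailing return
   return tResult
end function
```

It would be called as: put removeKnownRecords(field "A", field "B") into field "A". Unlike the original loop, which never examines line 1 of nuData, this version checks every line, so a first line with more than one item could be removed if it also occurs in oldData.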