LiveCode Forums.

Posted: **Thu Feb 23, 2017 10:21 pm**

I'm not a complete beginner but hope this is a simple question.

I have a script that checks whether each line of a variable (nuData) containing tab-delimited data also appears in another variable (oldData), which likewise contains lines of tab-delimited records. If a record appears in both variables, the script deletes it from nuData. The records look like this:

60 Meters
15[tab][tab][tab]6.59[tab][tab] [tab]John Smith[tab]USA[tab]6 Mar 95[tab]175/73[tab]1[tab][tab]Iowa City IA[tab]20 Jan
15[tab][tab][tab]6.59[tab][tab] [tab]Tom Doe[tab]GBR[tab]6 Mar 95[tab]175/73[tab]1[tab][tab]Iowa City IA[tab]20 Jan

The data are numbered lists on a website that change over time so I need to make the first item (itemDelimiter is tab) empty before checking against oldData.

My script works as follows. I've used this approach many times, but this time it's REAALLLY slow; much slower than I'd expect, even considering that the variables contain many lines. The platform is Mac OS X.

put field "A" into nuData
put field "B" into oldData
set itemDelimiter to tab
repeat with lineCt = (the number of lines of nuData) down to 2
   put line lineCt of nuData into tRecord
   if the number of items of tRecord > 1 then -- only check records with more than one item
      put empty into item 1 of tRecord
      if offset(tRecord, oldData) > 0 then
         delete line lineCt of nuData
      end if -- record occurs in both variables
   end if -- record contains more than one item
end repeat
put nuData into field "A"

Is there a faster way to do this? Perhaps something using filter? My reservation is that I've had other scripts where filter with/without simply doesn't work. I am finding that a check of a 50,000-line file takes on the order of half an hour, which doesn't square with the speed I usually get from routines like this.

Thanks in advance,

Sieg

Posted: **Thu Feb 23, 2017 10:57 pm**

Hi Sieg,
It seems "repeat for each" is faster than "repeat with".
Best regards
Jean-Marc

Posted: **Fri Feb 24, 2017 1:10 am**

Thank you, Jean-Marc.

Great info to have but now I'm thinking there is simply something wrong with this stack. I ran a test using "repeat for each" on 194 lines of data. It took a couple minutes. Normally even with "repeat with" it should run through a short list like that in the blink of an eye. I'll try setting it up in a fresh stack.

Sieg

Posted: **Fri Feb 24, 2017 2:21 am**

You could also use arrays. Try the following.

Code: Select all

on mouseUp
   put the millisecs into m1
   lock screen; lock messages
   set cursor to watch
   put line 2 to -1 of field "A" into nuData
   put field "B" into oldData
   set itemdel to tab
   repeat for each line L in oldData
      if L is empty then next repeat
      put 1 into tCount[item 2 to -1 of L]
   end repeat
   put 0 into x
   repeat for each line L in nuData
      add 1 to x
      if the num of items of L <= 1 then
         put L into tRest[x] -- keep lines that are not records, as the original script does
      else if tCount[item 2 to -1 of L] <> 1 then
         put L into tRest[x] -- incl. first item
      end if
   end repeat
   set itemdel to comma
   ## sorting retains the original order of nudata
   put the keys of tRest into ks; sort ks numeric
   repeat for each line k in ks
      put cr & tRest[k] after tRest2
   end repeat
   put (line 1 of fld "A") & tRest2 into fld "A"
   put the millisecs - m1 into fld "timing"
   unlock screen; unlock messages
end mouseUp

Please first compare the result to yours. I hope it matches exactly.
It would also be interesting to know how the elapsed time compares with your method.

P.S. You can't delete lines of a variable inside a "repeat for each" loop over those same lines (the loop enumerates upwards).
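
For anyone landing on this thread later, here is a minimal sketch that combines Jean-Marc's "repeat for each" suggestion with the point in the P.S.: rather than deleting lines inside the loop, it builds a new list of the lines to keep. As far as I understand it, "line lineCt of nuData" has to count lines from the start of the variable on every pass, which "repeat for each" avoids. The field names and the offset check are taken from Sieg's script; tKept, tLine and tLineNo are names made up for illustration, and the handler has not been run against the real data.

```livecode
on mouseUp
   put field "A" into nuData
   put field "B" into oldData
   set the itemDelimiter to tab
   -- the original loop stops at line 2, so line 1 is always kept
   put line 1 of nuData into tKept
   put 0 into tLineNo
   repeat for each line tLine in nuData
      add 1 to tLineNo
      if tLineNo = 1 then next repeat
      -- work on a copy so the loop variable itself is never modified
      put tLine into tRecord
      if the number of items of tRecord > 1 then
         -- blank out the list number, which changes over time
         put empty into item 1 of tRecord
         -- record already occurs in oldData: leave it out of the result
         if offset(tRecord, oldData) > 0 then next repeat
      end if
      -- keep the untouched line, including its first item
      put return & tLine after tKept
   end repeat
   put tKept into field "A"
end mouseUp
```

This still runs offset across all of oldData for every record, so with 50,000 lines the array lookup posted above should scale much better. The sketch only removes the cost of addressing lines by number and of deleting from the middle of nuData.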

faster check than offset(x,variable)?
