Compare comma-delimited strings

Post by RossG » Wed Sep 28, 2016 11:37 pm

Other than comparing item-by-item is there a way to find
duplicates?

My prog produces sets of eight numbers from a larger
string and often produces duplicates so I might have

"1,2,3,4,5,6,7,8"
"1,2,3,4,5,6,7,8"

I could delete the commas and use the "=" operator.

Any other ways?
Is age an excuse? Eighty-four and counting.
Programming powered by coffee.

Post by FourthWorld » Wed Sep 28, 2016 11:50 pm

You can still use "=" on the whole string, commas and all, as it'll treat it as a string.

If you want to reduce the list to only unique strings this would work:

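-- Use each line as an array key; duplicate lines collapse into a single key
repeat for each line tLine in tList
  put tLine into tArray[tLine]
end repeat
-- The keys are the unique lines (key order is not guaranteed)
put the keys of tArray into tUniqueList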
Richard Gaskin
LiveCode development, training, and consulting services: Fourth World Systems
LiveCode Group on Facebook
LiveCode Group on LinkedIn
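
Because the keys of an array come back in no particular order, the array approach above does not keep the sets in their original sequence. If that matters, one possible variation is sketched below. The handler name uniqueLinesInOrder is made up for illustration, and the sketch assumes the list contains no blank lines.

```livecode
-- Hypothetical helper (not from the thread): returns pList with duplicate
-- lines removed, keeping the first occurrence of each line in place.
-- Assumes pList contains no blank lines.
function uniqueLinesInOrder pList
   local tSeen, tResult, tLine
   repeat for each line tLine in pList
      -- an unseen line has no entry in tSeen yet
      if tSeen[tLine] is not true then
         put true into tSeen[tLine]
         put tLine & return after tResult
      end if
   end repeat
   delete the last char of tResult -- drop the trailing return
   return tResult
end uniqueLinesInOrder
```

It could be called as: put uniqueLinesInOrder(tList) into tUniqueList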

Post by RossG » Thu Sep 29, 2016 12:26 am

Richard
Thanks for those "magic" words.
I made a test stack and it didn't seem
to like the commas.
After reading your reply I tried it again
and it worked.
Darnedest thing.

Post by FourthWorld » Thu Sep 29, 2016 12:35 am

It's like any tech support issue, Ross: the moment you get someone to help you the problem no longer shows itself. :)

Post by AxWald » Thu Sep 29, 2016 7:51 pm

Hi,

Fun fact: With more than about 24 integers per line, it's faster to compute an SHA1 hash of each line and compare the hashes!

Have fun!
All code published by me here was created with Community Editions of LC (thus is GPLv3).
If you use it in closed source projects, or for the Apple AppStore, or with XCode
you'll violate some license terms - read your relevant EULAs & Licenses!

Post by FourthWorld » Thu Sep 29, 2016 8:31 pm

Ax, I'm not following: when comparing each line in a collection to look for duplicates, how would adding a call to the computationally-intensive SHA1digest function outperform the simpler "=" operator?

Post by AxWald » Fri Sep 30, 2016 9:56 am

Hi Richard,

I made a test:
When hashing the lines, the computation time per line stays roughly the same.
Without hashing, the computation time rises the longer the line is.
At ~24 items per line the times are equal.
Above 24 items, hashing is faster :)

See the attached quick-and-dirty demo stack. Change the script of the "load" btn to get different values for comparison. You may also want to add duplicate lines manually, since the pseudo-random part often works too well ...

Have fun!

PS: I assume the reason is that the hash always has the same length (40 chars) ...

Edit: For LC 8 the "magic number" is about 10 higher (~34); it seems text comparison is faster there. My stack was initially tested with 6.7.10.
Attachments
DoubleFinder.zip
A simple stack to test it
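
The hashing idea could look roughly like the sketch below. This is not the code from the attached stack, only an illustration: the handler name duplicateLinesByHash is made up, and it assumes plain ASCII lines such as the number sets in this thread. sha1Digest returns 20 bytes of binary data, which binaryDecode turns into the 40 hex chars mentioned above.

```livecode
-- Hypothetical helper (not from the attached stack): returns every line of
-- pList that repeats an earlier line, detected via SHA-1 digests.
function duplicateLinesByHash pList
   local tSeen, tDupes, tHex, tLine
   repeat for each line tLine in pList
      -- convert the 20-byte binary digest into 40 hex chars,
      -- which are safe to use as an array key
      get binaryDecode("H*", sha1Digest(tLine), tHex)
      if tSeen[tHex] is true then
         put tLine & return after tDupes
      else
         put true into tSeen[tHex]
      end if
   end repeat
   delete the last char of tDupes -- drop the trailing return
   return tDupes
end duplicateLinesByHash
```

Whatever the length of the line, every key here has the same fixed length, which is the property the timings above seem to benefit from.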

Post by RossG » Fri Sep 30, 2016 11:59 pm

Attached is my test stack with a file of number sets.

Any better solutions?
Attachments
Sets Deleter.zip

Post by richmond62 » Sat Oct 01, 2016 7:37 am

If Livecode is "feeling funny" about a comma-delimited list you could always replace the commas with something else:
StriptheW.png
Attachments
Strip the Willow.livecode.zip
Here's the stack.

Post by FourthWorld » Sat Oct 01, 2016 5:29 pm

AxWald wrote: When hashing the lines the computation time/ line stays roughly the same.
The computation time w/o hash rises the longer the line is.
At ~24 items in the line the times are equal.
Above 24 items the hashing is faster :)
Thanks for that test stack. Makes sense when using strings longer than the ones Ross showed, since SHA1 will reduce the string to a 20-byte value, so once we get past a certain length the overhead of SHA1digest is more than offset by the savings in the shorter comparisons.

There may be some variance due to CPU speed and/or instruction set features, as here I get a slightly slower score for the hash option at 25 items, but bumping that up to 50 shows hashing the clear winner.

While I had your handy test stack in hand I got curious about performance differences across recent LC versions, and found that LC v8 is roughly on par with v6 and much faster than v7. The latter isn't surprising: v8 includes an optimization for lineOffset and some other operations that allows more specialized handling of different delimiter lengths with Unicode than was first implemented when Unicode premiered in LC v7.

FWIW here are my results, running under Ubuntu 14.04 on a Haswell G3220 @3 GHz:

LC v6.7
-----------------------
25 items - String - 0 hits: 220 ms
25 items - Hash - 0 hits: 236 ms
25 items - String - 26 hits: 231 ms
25 items - Hash - 26 hits: 251 ms
--
50 items - String - 0 hits: 428 ms
50 items - Hash - 0 hits: 235 ms
50 items - String - 7 hits: 426 ms
50 items - Hash - 7 hits: 230 ms

LC v7.0.4
-----------------------
25 items - String - 0 hits: 357 ms
25 items - Hash - 0 hits: 375 ms
25 items - String - 26 hits: 419 ms
25 items - Hash - 26 hits: 427 ms
--
50 items - String - 0 hits: 713 ms
50 items - Hash - 0 hits: 374 ms
50 items - String - 7 hits: 728 ms
50 items - Hash - 7 hits: 383 ms

LC v8.1.1 RC1
-----------------------
25 items - String - 0 hits: 192 ms
25 items - Hash - 0 hits: 208 ms
25 items - String - 26 hits: 243 ms
25 items - Hash - 26 hits: 254 ms
--
50 items - String - 0 hits: 432 ms
50 items - Hash - 0 hits: 203 ms
50 items - String - 7 hits: 451 ms
50 items - Hash - 7 hits: 224 ms
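
For anyone who wants to try a comparison like this without the test stack, a rough sketch follows. It is not the code that produced the numbers above. The handler name timeDuplicateSearch is made up, and counting a "hit" as one equal pair of lines is an assumption about what the test stack measures.

```livecode
-- Hypothetical benchmark (not the test stack's code): counts equal pairs of
-- lines in pList twice, once on the lines and once on their SHA-1 digests,
-- and returns a two-line report with the elapsed milliseconds.
function timeDuplicateSearch pList
   local tLines, tHashes, tCount, tHits, tStart, tReport, tLine, i, j
   -- compare exactly, so digests differing only in letter case stay distinct
   set the caseSensitive to true

   -- copy the lines into a numerically keyed array for indexed access
   put 0 into tCount
   repeat for each line tLine in pList
      add 1 to tCount
      put tLine into tLines[tCount]
   end repeat

   -- pass 1: compare the strings themselves
   put 0 into tHits
   put the milliseconds into tStart
   repeat with i = 1 to tCount - 1
      repeat with j = i + 1 to tCount
         if tLines[i] = tLines[j] then add 1 to tHits
      end repeat
   end repeat
   put "String -" && tHits && "hits:" && (the milliseconds - tStart) && "ms" into tReport

   -- pass 2: hash every line once, then compare the fixed-length digests
   put 0 into tHits
   put the milliseconds into tStart
   repeat with i = 1 to tCount
      put sha1Digest(tLines[i]) into tHashes[i]
   end repeat
   repeat with i = 1 to tCount - 1
      repeat with j = i + 1 to tCount
         if tHashes[i] = tHashes[j] then add 1 to tHits
      end repeat
   end repeat
   put return & "Hash -" && tHits && "hits:" && (the milliseconds - tStart) && "ms" after tReport
   return tReport
end timeDuplicateSearch
```

Calling put timeDuplicateSearch(tList) from a script would show both timings. The hashing time is included in the second measurement, so short lines should favour the plain string pass and long lines the hash pass, in line with the results above.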

Post by AxWald » Sun Oct 02, 2016 9:49 pm

Hi,
FourthWorld wrote: [...] discovering that LC v8 is roughly on par with v6 [...]
Actually it's a bit faster, even faster than 6.5.1 ...

Btw., playing on another machine I found that the "magic number" actually is machine-dependent. And something strange:

LC 8.02 (stable) & 8.1.1 (rc1) give identical results; LC 8.1 (stable) is identical in string comparison too, but significantly slower (+20%) when hashing.

Have fun!