OK. I give up (Word delimiting)

dunbarx
VIP Livecode Opensource Backer
Posts: 10323
Joined: Wed May 06, 2009 2:28 pm

Re: OK. I give up

Post by dunbarx » Wed Aug 22, 2018 5:45 pm

Richard.

Yes, changing my parsing thinking from "the number of words" to "the number of trueWords" makes this go away.

Thanks.

Craig

Post by dunbarx » Wed Aug 22, 2018 6:05 pm

Spoke too soon.

The trueWord keyword carries, I suppose, a certain amount of Unicode, er, baggage. Asking for either the number of words, or the number of trueWords, in the following string:

    (2 Lenses @ ") [L: 250

gives 4 in both cases. Asking for trueWord 4 gives 250. Asking for word 4 gives ") [L: 250. Something is trumping the very real spaces in that string. That snippet:

    ") [L: 

although it contains spaces, is seen both as a single word and a single trueWord. I am trying, in this case, to parse out the string "). It seems that simple spaces just do not cut it. I would have thought that the first task of trueWords is to always yield to spaces.

It is the quote that is the center of this issue.
Craig
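
For illustration, here is a rough Python model of the behavior described above. It is only a sketch, not LiveCode's implementation, and it assumes that a word beginning with a double quote runs to the next double quote, or to the end of the text when there is no closing quote. The function name `lc_words` is made up for this example.

```python
def lc_words(text, delimiters=" \t\n"):
    """Rough model of the classic `word` chunk (an assumption, not engine code).

    Words are separated by whitespace, except that a word starting with a
    double quote extends to the closing quote, or to the end of the text
    when the quote is never closed.
    """
    words = []
    i, n = 0, len(text)
    while i < n:
        if text[i] in delimiters:
            i += 1                      # skip whitespace between words
            continue
        start = i
        if text[i] == '"':
            close = text.find('"', i + 1)
            i = n if close == -1 else close + 1   # unclosed quote swallows the rest
        else:
            while i < n and text[i] not in delimiters:
                i += 1
        words.append(text[start:i])
    return words


sample = '(2 Lenses @ ") [L: 250'
words = lc_words(sample)
print(len(words))   # 4
print(words[3])     # ") [L: 250
```

Under that assumption the unclosed quote swallows everything after it, which matches the count of 4 and the value of word 4 reported in the post.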

FourthWorld
VIP Livecode Opensource Backer
Posts: 10052
Joined: Sat Apr 08, 2006 7:05 am

Post by FourthWorld » Wed Aug 22, 2018 6:14 pm

trueWord uses the natural language rules now available to us in the IBM Unicode library to parse out what are usually true words.

This generally works well when parsing strings containing natural language.

But it seems of little or no value when parsing strings of arbitrary characters not at all like natural language.

If you want to parse by spaces only, maybe set the itemDel to space and parse by items.
Richard Gaskin
LiveCode development, training, and consulting services: Fourth World Systems
LiveCode Group on Facebook
LiveCode Group on LinkedIn
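
The space-delimited item idea can be sketched in Python as well. This is only an analogy: `str.split(" ")` stands in for setting the itemDel to space, and it makes no claim about how LiveCode handles edge cases such as trailing delimiters.

```python
sample = '(2 Lenses @ ") [L: 250'

# Split on every single space; the double quote gets no special treatment.
items = sample.split(" ")

print(len(items))   # 6
print(items[3])     # ")
```

Parsed this way, the fourth item is the `")` string that the original question was trying to isolate.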

Post by dunbarx » Wed Aug 22, 2018 7:06 pm

Richard.

I thought I was the only one odd enough to set the itemDel to space. 8)

But in this case, I need to catch certain character strings, and drill into them.

I will find a workaround, but there still remains a glitch in either the way we think about words, or the way the dictionary describes them.

Craig

Post by FourthWorld » Wed Aug 22, 2018 7:56 pm

It seems the only limitation of the Dictionary is that it doesn't discuss the edge case of no closing quote. If you can turn up how the HC team described that and include it in the report, that'll help them avoid one more creative writing exercise.

But for the implementation itself, "word" seems very conformant with the Mother Tongue. And when we need something beyond what HC could do, we now also have trueWord.
Richard Gaskin

MaxV
Posts: 1580
Joined: Tue May 28, 2013 2:20 pm

Post by MaxV » Wed Aug 22, 2018 8:23 pm

You can create your function for what you intend for word.
Just use repeat for each char... and with switch / case you can create any combination. :D
Livecode Wiki: http://livecode.wikia.com
My blog: https://livecode-blogger.blogspot.com
To post code use this: http://tinyurl.com/ogp6d5w
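
A character-by-character loop of the kind suggested above might look like this in Python. It is a minimal sketch: the function name `custom_words` and the choice of delimiters are assumptions for the example, and an if/else stands in for switch/case.

```python
def custom_words(text, delimiters=(" ", "\t", "\n")):
    """Split text into words using only the given delimiter characters.

    Quotes are treated as ordinary characters, so they never glue
    several space-separated pieces together.
    """
    words = []
    current = ""
    for ch in text:
        if ch in delimiters:
            if current:              # a delimiter closes the word in progress
                words.append(current)
                current = ""
        else:
            current += ch            # any other character extends the word
    if current:
        words.append(current)        # flush the final word
    return words


print(custom_words('(2 Lenses @ ") [L: 250'))
```

Extra branches in the loop are where any custom rule would go, for example treating a bracket as a word of its own.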

Post by dunbarx » Wed Aug 22, 2018 9:28 pm

Richard.

The most recent (!) HC Script Language guide gives spaces and returns as delimiters. Note that tabs are not mentioned, and for good reason. In HC, tabs do not delimit words.

I will file a report to QCC complaining about the single quote issue, and see what they say.

FWIW, I simply changed the quote in the string to ASCII 210, and all is well. That character looks just fine when giving a measurement in inches.

Craig
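
The substitution workaround can be illustrated in Python, under the assumption that "ASCII 210" means byte 210 in the Mac OS Roman character set, where it is the left curly double quote. On another platform or encoding that byte may be a different character.

```python
# Assumption: "ASCII 210" refers to byte 210 in the Mac OS Roman character set.
curly = bytes([210]).decode("mac_roman")   # left double quotation mark, U+201C

sample = '(2 Lenses @ ") [L: 250'
fixed = sample.replace('"', curly)

# With no straight quote left, plain whitespace splitting is a fair stand-in
# for the word chunk on this particular string.
words = fixed.split()
print(fixed)
print(len(words), words[3])
```

The curly quote is not a quoting character for chunk purposes, so the pieces after it separate on spaces again.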

Post by FourthWorld » Wed Aug 22, 2018 9:47 pm

I think I missed something. I thought the issue was about a string with an opening quote but no closing quote. How did tabs enter into this?
Richard Gaskin

Post by dunbarx » Wed Aug 22, 2018 10:02 pm

Richard,

No, you had mentioned:

    But for the implementation itself, "word" seems very conformant with the Mother Tongue.

I tried all this in HC, discovered it also has the single quote malaise, and, unlike LC, HC does not support tabs as word delimiters. So HC is different, is all I meant. And the fact that LC includes tabs as word delimiters ought to have broken some HC stacks that were ported over.

Craig
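
To make the tab difference concrete, here is a small Python sketch that counts words under the two delimiter sets described in this thread: spaces and returns for HC, with tabs added for LC. The function name and the sample line are invented for the example, and quote handling is left out.

```python
def count_words(text, delimiters):
    """Count runs of non-delimiter characters in text."""
    count = 0
    in_word = False
    for ch in text:
        if ch in delimiters:
            in_word = False
        elif not in_word:
            in_word = True       # first character of a new word
            count += 1
    return count


line = "width\t250 mm"               # hypothetical tab-separated field
print(count_words(line, " \n"))      # 2 with HC-style delimiters
print(count_words(line, " \t\n"))    # 3 with LC-style delimiters
```

The same text yields a different word count once tabs delimit words, which is how a ported stack could see its results change.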

bogs
Posts: 5480
Joined: Sat Feb 25, 2017 10:45 pm

Post by bogs » Wed Aug 22, 2018 10:19 pm

Figures, I finally get back to this, and Craig answered his own question :roll:

Well, I'm posting a pic of my homework ANYWAY, just BECAUSE :P
[Image: SheepShaver_004.png, captioned "Hypercard homework :P"]

Post by dunbarx » Wed Aug 22, 2018 10:43 pm

Filed report # 21513

Craig

Post by dunbarx » Thu Aug 23, 2018 2:34 pm

Bug confirmed.

Craig

Post by bogs » Thu Aug 23, 2018 5:08 pm

So, bug confirmed, eh? Good job!

I added a 3rd box and eliminated any quotes. The results turned out quite different, as you probably already knew, but closer to what I would suspect they should be.
[Image: Selection_003.png (9.94 KiB), captioned "No quote changes..."]

Post by FourthWorld » Thu Aug 23, 2018 8:25 pm

dunbarx wrote (Wed Aug 22, 2018 10:43 pm):

    Filed report # 21513

    Craig

Thanks. I don't mind seeing this changed; it's such an edge case I don't mind either way. But IIRC there was an earlier post confirming that the behavior we see in LC matches HC's behavior - is that correct? If so, I wonder if some old code may break as a result of improving this beyond what HC did.
Richard Gaskin

Post by dunbarx » Thu Aug 23, 2018 8:55 pm

Richard.

It may be a marginal issue, but if LC considers tabs to be word delimiters and HC does not, the number of words in a particular processed string may give unexpected results. I am constantly taking strings with tabs and counting words. HC would not have condoned that.

Craig
