
Having the last word

Posted: Mon Aug 26, 2019 3:34 pm
by richmond62
Here's something I fell foul of for the first time today . . .

I had a field called "fDATA2" containing some text:

"A man who had been soaked in water, and smothered in mud, and"

as one does 8) , and wanted to chop its end off like this:

Code: Select all

on mouseUp
   repeat until the last word of fld "fDATA2" is "water"
      if the last word of fld "fDATA2" is "water" then
         --do nix
      else
         delete the last word of fld "fDATA2"
      end if
      wait 20 ticks
   end repeat
end mouseUp
and "blow me down", but it emptied the whole field . . .

Why, forbye?

Because LiveCode did not 'see' the word "water"; it did, however, 'see' the word "water,", which
was a right pain in the bum because . . .

Any text analysis program I write to do this sort of thing will have to trawl its way through
a textField bunging spaces before any punctuation marks.

Re: Having the last word

Posted: Mon Aug 26, 2019 3:49 pm
by bogs
Why, forbye?
because with a comma attached, "water" =/= "water,"

Maybe instead of...

Code: Select all

repeat until the last word of fld "fDATA2" is "water"
it should be

Code: Select all

repeat until the last word of fld "fDATA2" contains "water"
(not tested) :D

Re: Having the last word

Posted: Mon Aug 26, 2019 3:50 pm
by richmond62
Probably . . . but while Thou wast being clever, I was mucking around like this:

Code: Select all

on mouseUp
   put empty into fld "fDATA2"
   put 1 into KOUNT
   repeat until char KOUNT of fld "fDATA" is empty
      switch char KOUNT of fld "fDATA"
         case "," 
            put " ," after fld "fDATA2"
            break
         case "." 
            put " ." after fld "fDATA2"
            break
         case ";" 
            put " ;" after fld "fDATA2"
            break
         case ":" 
            put " :" after fld "fDATA2"
            break
         case "!" 
            put " !" after fld "fDATA2"
            break
         case "?" 
            put " ?" after fld "fDATA2"
            break
         case ")" 
            put " )" after fld "fDATA2"
            break
         case "(" 
            put "( " after fld "fDATA2"
            break
         default
            put char KOUNT of fld "fDATA" after fld "fDATA2"
      end switch
      add 1 to KOUNT
   end repeat
end mouseUp

Re: Having the last word

Posted: Mon Aug 26, 2019 3:53 pm
by richmond62
contains

does work, BUT . . . it leaves a trailing punctuation mark.

Re: Having the last word

Posted: Mon Aug 26, 2019 4:01 pm
by bogs
Yes (tested it myself), BUT you can always at the end either ...
z.) ditch punctuation or
y.) simply grab characters 1 to 5 of the last word (and go on from there).

Either would be better than case'ing it to death.
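
A rough sketch of option z.), assuming the same field name as above and working on a copy of the text (the variable tText is just for illustration, and this is untested). Note that "contains" would also stop on something like "waterfall", and that if "water" never turns up the field ends up empty:

```livecode
on mouseUp
   local tText
   put fld "fDATA2" into tText
   -- chop words off the end until the last one contains "water" (e.g. "water,")
   -- the word count check stops the loop if nothing is left to delete
   repeat until the number of words in tText is 0 or the last word of tText contains "water"
      delete the last word of tText
   end repeat
   -- then ditch any punctuation left hanging on the end
   repeat while tText is not empty and the last char of tText is in ",.;:!?"
      delete the last char of tText
   end repeat
   put tText into fld "fDATA2"
end mouseUp
```

The list of punctuation characters is only a guess at what might follow the word; extend it as needed.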

Re: Having the last word

Posted: Mon Aug 26, 2019 4:25 pm
by richmond62
case'ing it to death
You have no concept of what that involves . . . my
Devawriter Pro contains something in the order
of 1000 switch statements, each containing about 3000 cases. 8)

At present Devawriter Pro is (very slowly)
going through a 'rationalisation' process so it is NOT such a "deadly CASE." :D

Re: Having the last word

Posted: Mon Aug 26, 2019 4:29 pm
by bogs
Ah, so like the title for this thread *should* have been ~
Raymond Burr as Perry Mason in "The case of the Devawriter" :P

Re: Having the last word

Posted: Mon Aug 26, 2019 4:31 pm
by richmond62
Not really: Indic writing systems don't feature commas . . .

. . . they are far, far more bizarre. 8)

Re: Having the last word

Posted: Mon Aug 26, 2019 4:37 pm
by jacque
For the original question, it would probably work if you use "trueword" instead of "word".
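
A minimal sketch of that idea, assuming the text from the first post and assuming that a range such as "trueWord 1 to N" runs from the start of the first trueWord to the end of the last one (untested; tText and tCount are illustrative names). It counts back from the end instead of deleting in a loop, so it cannot run forever when "water" is missing:

```livecode
on mouseUp
   local tText, tCount
   put fld "fDATA2" into tText
   put the number of trueWords in tText into tCount
   -- step back from the end until the trueWord is "water" or we run out of words
   repeat while tCount > 0 and trueWord tCount of tText is not "water"
      subtract 1 from tCount
   end repeat
   -- only touch the field if "water" was actually found
   if tCount > 0 then
      put trueWord 1 to tCount of tText into fld "fDATA2"
   end if
end mouseUp
```

If "water" is not in the field, the field is left alone rather than emptied.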

Re: Having the last word

Posted: Mon Aug 26, 2019 4:42 pm
by richmond62
Thanks: I'll give "trueword" a go. :D

Re: Having the last word

Posted: Wed Aug 28, 2019 8:12 pm
by richmond62
I haven't managed to get round to playing with "trueword" yet, but I have had some
fairly dirty thoughts . . .

1. How Anglo-Centric is "trueword"?

Well, let's try it with "De'ath" (this is a Huguenot name).

And, "just for fun" let's try it with "Жалба,",

And, because I am a sadistic old so-and-so, "স্কুল" in the text:

"আমার নাম জন রিচমন্ড ম্যাথিউসন এবং আমি একজন স্কুলশিক্ষক যিনি বুলগেরিয়ায় থাকেন এবং কর্মরত।"

So . . . "trueword" works for "De'ath" and "Жалба,", but NOT for "স্কুল" (Bengali) because the word
is written with sandhi elision as "স্কুলশিক্ষক" where 'school' is elided with 'teacher'.

https://en.wikipedia.org/wiki/Sandhi

So, frankly, "trueword" is only sufficient for languages that employ European writing systems.

2. How good is trueword in texts that employ "different" punctuation systems?

Well, for starters the whole thing would be useless for 'scriptio continua.' Leonardo da Vinci would have laughed.

I wonder how far "trueword" would get with the Greek άνω τελεία (as I don't remember
any Greek from when I was at school I'm not going to make a complete fool of myself here)?

¿I wonder about Spanish?

Well, well, well, it did OK with "¿Cuánto cuesta esa alfombra?", removing the "¿" from "Cuánto".

That's impressive.


Re: Having the last word

Posted: Wed Aug 28, 2019 10:04 pm
by jacque
From the dictionary:
A trueWord is a word chunk, delimited by Unicode word breaks, as determined by the ICU Library. When there are no alphabetic or numeric characters between two word breaks, that string is not considered by LiveCode to be a trueWord.
The examples include Chinese and Russian, but I don't see any RTL languages there. ICU word breaks are defined here: http://userguide.icu-project.org/boundaryanalysis

Edit: A more detailed explanation: http://www.unicode.org/reports/tr29/#Word_Boundaries It talks about the difficulties for various languages. Hebrew, a RTL language, is apparently mostly compatible with the default rules but may require some special adjustments. Languages that do not normally include spaces or punctuation for word breaks present substantial problems.
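
To make the difference concrete, here is a small sketch using the sentence from the first post (tText, tPlain and tTrue are illustrative names). "word" splits on whitespace only, so the comma stays attached; "trueWord" uses the ICU word breaks described above:

```livecode
on mouseUp
   local tText, tPlain, tTrue
   put "A man who had been soaked in water, and smothered in mud, and" into tText
   -- whitespace-delimited chunk: the comma comes along
   put word 8 of tText into tPlain -- "water,"
   -- ICU word-break chunk: punctuation should be left out
   put trueWord 8 of tText into tTrue -- expected "water"
   answer tPlain & return & tTrue
end mouseUp
```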

Re: Having the last word

Posted: Mon Sep 02, 2019 7:46 pm
by richmond62
As 'trueword' cuts the mustard for the vast majority of writing systems
I wonder what the utility of 'word' is at all, and wonder whether it might not
be a good idea to transfer the functionality of 'trueword' to 'word' and then
remove 'trueword' altogether.

Re: Having the last word

Posted: Tue Sep 03, 2019 12:53 am
by FourthWorld
richmond62 wrote:
Mon Sep 02, 2019 7:46 pm
As 'trueword' cuts the mustard for the vast majority of writing systems
I wonder what the utility of 'word' is at all, and wonder whether it might not
be a good idea to transfer the functionality of 'trueword' to 'word' and then
remove 'trueword' altogether.
There was a long discussion about that on the Use LiveCode list back when Mark Waddingham was putting Unicode in place.

Many options were discussed, but in the end it was determined that if what we now call trueWord were used with the "word" chunk type the impact to existing code would be vastly damaging.

So to preserve legacy code while allowing the new Unicode parsing, trueWord became its own chunk type.

And FWIW, after we all struggled with what to call this new chunk type, the winning suggestion came from yours truly. :)

Re: Having the last word

Posted: Tue Sep 03, 2019 7:15 am
by richmond62
Perhaps 'd'être' isn't a word!

"truewordOffset("d'être","Ce n'est pas tant d'être riche qui fait le bonheur, c'est de le devenir.") -- returns 5"