Having the last word
Posted: Mon Aug 26, 2019 3:34 pm
by richmond62
Here's something I fell foul of for the first time today . . .
I had a field called "fDATA2" containing some text:
"A man who had been soaked in water, and smothered in mud, and"
(as one does), and wanted to chop its end off like this:
Code:
on mouseUp
   repeat until the last word of fld "fDATA2" is "water"
      if the last word of fld "fDATA2" is "water" then
         --do nix
      else
         delete the last word of fld "fDATA2"
      end if
      wait 20 ticks
   end repeat
end mouseUp
and "blow me down", but it emptied the whole field . . .
Why, forbye?
Because LiveCode did not 'see' the word "water"; it did, however, 'see' the word "water,", which was a right pain in the bum because . . .
Any text analysis program I write to do this sort of thing will have to trawl its way through a text field, bunging spaces before any punctuation marks.
Re: Having the last word
Posted: Mon Aug 26, 2019 3:49 pm
by bogs
Why, forbye?
because with a comma attached, "water" =/= "water,"
Maybe instead of...
Code:
repeat until the last word of fld "fDATA2" is "water"
it should be
Code:
repeat until the last word of fld "fDATA2" contains "water"
(not tested)

Re: Having the last word
Posted: Mon Aug 26, 2019 3:50 pm
by richmond62
Probably . . . but while Thou wast being clever, I was mucking around like this:
Code:
on mouseUp
   put empty into fld "fDATA2"
   put 1 into KOUNT
   repeat until char KOUNT of fld "fDATA" is empty
      switch char KOUNT of fld "fDATA"
         case ","
            put " ," after fld "fDATA2"
            break
         case "."
            put " ." after fld "fDATA2"
            break
         case ";"
            put " ;" after fld "fDATA2"
            break
         case ":"
            put " :" after fld "fDATA2"
            break
         case "!"
            put " !" after fld "fDATA2"
            break
         case "?"
            put " ?" after fld "fDATA2"
            break
         case ")"
            put " )" after fld "fDATA2"
            break
         case "("
            put "( " after fld "fDATA2"
            break
         default
            put char KOUNT of fld "fDATA" after fld "fDATA2"
      end switch
      add 1 to KOUNT
   end repeat
end mouseUp
Re: Having the last word
Posted: Mon Aug 26, 2019 3:53 pm
by richmond62
"contains" does work, BUT . . . it leaves a trailing punctuation mark.
Re: Having the last word
Posted: Mon Aug 26, 2019 4:01 pm
by bogs
Yes (tested it myself), BUT you can always at the end either ...
z.) ditch punctuation or
y.) simply grab characters 1 to 5 of the last word (and go on from there).
Either would be better than case'ing it to death.
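A minimal, untested sketch of option z.), comparing the last word with its trailing punctuation ditched. The helper name stripTrailingPunctuation and its punctuation list are made up for illustration, and the guard on the number of words is an addition so the loop cannot spin forever on an emptied field:

```livecode
-- Hypothetical helper: returns pWord with any trailing punctuation removed.
function stripTrailingPunctuation pWord
   repeat while pWord is not empty
      -- the punctuation list is an assumption; extend it as needed
      if char -1 of pWord is not in ",.;:!?)" then exit repeat
      delete char -1 of pWord
   end repeat
   return pWord
end stripTrailingPunctuation

on mouseUp
   -- stop when no words are left, in case "water" never turns up
   repeat while the number of words in fld "fDATA2" > 0
      if stripTrailingPunctuation(the last word of fld "fDATA2") is "water" then exit repeat
      delete the last word of fld "fDATA2"
   end repeat
end mouseUp
```

The field itself is left ending in "water," with its comma; only the comparison ignores the punctuation.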
Re: Having the last word
Posted: Mon Aug 26, 2019 4:25 pm
by richmond62
case'ing it to death
You have no concept of what that involves . . . my Devawriter Pro contains something in the order of 1000 switch statements, each containing about 3000 cases.
At present Devawriter Pro is (very slowly) going through a 'rationalisation' process so it is NOT such a "deadly CASE."

Re: Having the last word
Posted: Mon Aug 26, 2019 4:29 pm
by bogs
Ah, so like the title for this thread *should* have been ~
Raymond Burr as Perry Mason in "The case of the Devawriter"
Re: Having the last word
Posted: Mon Aug 26, 2019 4:31 pm
by richmond62
Not really: Indic writing systems don't feature commas . . .
. . . they are far, far more bizarre.

Re: Having the last word
Posted: Mon Aug 26, 2019 4:37 pm
by jacque
For the original question, it would probably work if you use "trueword" instead of "word".
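Applied to the handler from the first post, that suggestion might look like the sketch below. It is untested against the original stack and assumes LiveCode 7 or later, where the trueWord chunk is available. The guard on the number of trueWords is an addition, so that a field which never contains "water" does not loop forever:

```livecode
on mouseUp
   -- stop when no trueWords are left, in case "water" never turns up
   repeat while the number of trueWords in fld "fDATA2" > 0
      -- trueWord ignores the comma, so "water," is seen as "water"
      if the last trueWord of fld "fDATA2" is "water" then exit repeat
      delete the last trueWord of fld "fDATA2"
   end repeat
end mouseUp
```

Deleting a trueWord may leave its neighbouring punctuation and spaces behind in the field, so some tidying up afterwards could still be needed.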
Re: Having the last word
Posted: Mon Aug 26, 2019 4:42 pm
by richmond62
Thanks: I'll give "trueword" a go.

Re: Having the last word
Posted: Wed Aug 28, 2019 8:12 pm
by richmond62
I haven't managed to get round to playing with "trueword" yet, but I have had some
fairly dirty thoughts . . .
1. How Anglo-Centric is "trueword"?
Well, let's try it with "De'ath" (this is a Huguenot name).
And, "just for fun", let's try it with "Жалба,".
And, because I am a sadistic old so-and-so, "স্কুল" in the text:
"আমার নাম জন রিচমন্ড ম্যাথিউসন এবং আমি একজন স্কুলশিক্ষক যিনি বুলগেরিয়ায় থাকেন এবং কর্মরত।"
So . . . "trueword" works for "De'ath" and "Жалба,", but NOT for "স্কুল" (Bengali) because the word
is written with sandhi elision as "স্কুলশিক্ষক" where 'school' is elided with 'teacher'.
https://en.wikipedia.org/wiki/Sandhi
So, frankly, "trueword" is only sufficient for languages that employ European writing systems.
2. How good is "trueword" in texts that employ "different" punctuation systems?
Well, for starters the whole thing would be useless for 'scriptura continua.' Leonardo da Vinci would have laughed.
I wonder how far "trueword" would get with the Greek άνω τελεία (as I don't remember
any Greek from when I was at school I'm not going to make a complete fool of myself here)?
¿I wonder about Spanish?
Well, well, well, it did OK with "¿Cuánto cuesta esa alfombra?", removing the "¿" from "Cuánto".
That's impressive.
⸮
Re: Having the last word
Posted: Wed Aug 28, 2019 10:04 pm
by jacque
From the dictionary:
A trueWord is a word chunk, delimited by Unicode word breaks, as determined by the ICU Library. When there are no alphabetic or numeric characters between two word breaks, that string is not considered by LiveCode to be a trueWord.
The examples include Chinese and Russian, but I don't see any RTL languages there. ICU word breaks are defined here:
http://userguide.icu-project.org/boundaryanalysis
Edit: A more detailed explanation:
http://www.unicode.org/reports/tr29/#Word_Boundaries
It talks about the difficulties for various languages. Hebrew, a RTL language, is apparently mostly compatible with the default rules but may require some special adjustments. Languages that do not normally include spaces or punctuation for word breaks present substantial problems.
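A small sketch of the difference that dictionary entry describes, using a fragment of the field text from the first post. The variable names are only for illustration, and it assumes LiveCode 7 or later:

```livecode
on mouseUp
   local tText, tPlain, tTrue
   put "soaked in water, and" into tText
   -- word chunks are delimited by whitespace only, so the comma stays attached
   put word 3 of tText into tPlain
   -- trueWord chunks follow the ICU word-break rules, which exclude the punctuation
   put trueWord 3 of tText into tTrue
   -- tPlain is "water," while tTrue should be "water"
   answer tPlain & return & tTrue
end mouseUp
```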
Re: Having the last word
Posted: Mon Sep 02, 2019 7:46 pm
by richmond62
As 'trueword' cuts the mustard for the vast majority of writing systems
I wonder what the utility of 'word' is at all, and wonder whether it might not
be a good idea to transfer the functionality of 'trueword' to 'word' and then
remove 'trueword' altogether.
Re: Having the last word
Posted: Tue Sep 03, 2019 12:53 am
by FourthWorld
richmond62 wrote (Mon Sep 02, 2019 7:46 pm):
As 'trueword' cuts the mustard for the vast majority of writing systems
I wonder what the utility of 'word' is at all, and wonder whether it might not
be a good idea to transfer the functionality of 'trueword' to 'word' and then
remove 'trueword' altogether.
There was a long discussion about that on the Use LiveCode list back when Mark Waddingham was putting Unicode in place.
Many options were discussed, but in the end it was determined that if what we now call trueWord were used with the "word" chunk type, the impact on existing code would be vastly damaging.
So to preserve legacy code while allowing the new Unicode parsing, trueWord became its own chunk type.
And FWIW, after we all struggled with what to call this new chunk type, the winning suggestion came from yours truly.

Re: Having the last word
Posted: Tue Sep 03, 2019 7:15 am
by richmond62
Perhaps 'd'être' isn't a word!
truewordOffset("d'être","Ce n'est pas tant d'être riche qui fait le bonheur, c'est de le devenir.") -- returns 5