corrections to article on Unicode

Posted: Sun May 15, 2011 5:00 am
by sp27
Devin Asay's article on Unicode at /spaces/lessons/buckets/1412/lessons/20441-Unicode needs these two corrections:

1) stated by Mark in response to my posting "word breaks in Russian Unicode text" of 14 May, 2011:

Code:

set the unicodeText of fld "other" to word 1 to 2 of the unicodeText of fld "this"
should be:

Code:

set the unicodeText of fld "other" to the unicodeText of word 1 to 2 of fld "this"
That's in Devin's section 4. Other uses of "word N of" should be corrected the same way. In other words, the word chunk (and apparently line chunk) has its own unicodeText property.

2) to be confirmed by someone with more experience than my first three days with LC:

At the end of Devin's section 2 (and elsewhere):

Code:

put charToNum(char 1 to 2 of fld "russText")
should be:

Code:

put charToNum(char 1 to 2 of the unicodeText of fld "russText")
HTH someone. I'm learning LC so I can port my Russian dictionary from Adobe Director. Feel free to contact me, sp27@cornell.edu, if you think I may have stumbled on something that might save you time.

Slava
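
Correction 1 comes down to chunking the raw bytes versus chunking the characters. Here is a minimal sketch of the difference in Python, which stands in for the byte-level view and is not LiveCode itself. It assumes the unicodeText is little-endian UTF-16 and that a byte-oriented word chunk breaks on the space byte 0x20; the sample phrase and variable names are hypothetical.

```python
# Illustration only: Python stands in for the byte-level view; this is not LiveCode.
# Assumption: the raw Unicode text is UTF-16, little-endian.
text = "Роза мира"  # hypothetical sample: two Russian words

raw = text.encode("utf-16-le")  # two bytes per character for these code points

# Byte-oriented split on the space byte 0x20, ignoring character boundaries.
# The capital letter U+0420 is stored as the bytes 0x20 0x04, so its low byte
# is mistaken for a space.
byte_pieces = raw.split(b"\x20")

# Character-aware split on the decoded text.
char_words = text.split(" ")

print(len(byte_pieces))  # number of pieces from the byte-oriented split
print(len(char_words))   # number of words from the character-aware split
```

The byte-oriented split yields one piece more than there are words, and the pieces no longer line up with two-byte character boundaries. Asking for the unicodeText of the word chunk avoids cutting the text at byte level.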

Re: corrections to article on Unicode

Posted: Sun May 15, 2011 9:49 am
by Mark
Hi Slava,

Does changing the charToNum function call actually make a noticeable difference?

Mark

Re: corrections to article on Unicode

Posted: Sun May 15, 2011 6:54 pm
by sp27
From my tests it seems to make a difference when the character is in the ANSI 128 to 255 range, although I'm not sure why. For example, I got different values returned for the "smart" quotes, depending on whether I asked for char 1 to 2 of field "X" or for char 1 to 2 of the unicodeText of field "X". For characters that cannot be represented in a single byte, like decimal 1072, the result seems to be the same. But as I'm still very new to this, I may be overlooking something... Thanks, Mark!

Re: corrections to article on Unicode

Posted: Mon May 16, 2011 10:03 am
by Mark
Hi Slava,

Actually, I can understand why char 1 to 2 of a field is not the same as char 1 to 2 of the unicodeText of that field. All characters of the unicodeText consist of two bytes. For characters with codes below 256 (ASCII plus the extended range), one of those two bytes is a NULL, and it is included in char 1 to 2 of the unicodeText. The NULL may seem invisible, but it is still there.

Best,

Mark
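
The two-byte layout described above can be checked outside LiveCode. This is a minimal sketch in Python, assuming the unicodeText is little-endian UTF-16; the helper name first_two_bytes is hypothetical and the sample characters are chosen only for illustration.

```python
# Illustration only: shows the UTF-16 byte layout, not LiveCode's own behavior.
# Assumption: little-endian byte order.

def first_two_bytes(ch):
    """Return the first two UTF-16LE bytes of a one-character string."""
    return ch.encode("utf-16-le")[:2]

samples = [
    "A",       # code 65, fits in one byte
    "\u201c",  # left "smart" double quote, U+201C
    "\u0430",  # Cyrillic small a, decimal 1072
]

for ch in samples:
    pair = first_two_bytes(ch)
    # Print the two byte values and the number they form when read together.
    print(ord(ch), list(pair), int.from_bytes(pair, "little"))
```

For "A" the second byte is the NULL mentioned above. For decimal 1072 both bytes carry information, and reading them together gives back 1072.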