corrections to article on Unicode

sp27 · Post by **sp27** » Sun May 15, 2011 5:00 am

Devin Asay's article on Unicode at /spaces/lessons/buckets/1412/lessons/20441-Unicode needs these two corrections:

1) stated by Mark in response to my posting "word breaks in Russian Unicode text" of 14 May, 2011:

set the unicodeText of fld "other" to word 1 to 2 of the unicodeText of fld "this"

should be:

set the unicodeText of fld "other" to the unicodeText of word 1 to 2 of fld "this"

That's in Devin's section 4. Other uses of "word N of" should be corrected the same way. In other words, the word chunk (and apparently line chunk) has its own unicodeText property.

2) to be confirmed by someone with more experience than my first three days with LC:

At the end of Devin's section 2 (and elsewhere):

put charToNum(char 1 to 2 of fld "russText")

should be:

put charToNum(char 1 to 2 of the unicodeText of fld "russText")

HTH someone. I'm learning LC so I can port my Russian dictionary from Adobe Director. Feel free to contact me, sp27@cornell.edu, if you think I may have stumbled on something that might save you time.

Slava
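
One possible way to see why correction 1 matters, sketched in Python rather than LiveCode. The sketch assumes that unicodeText is raw UTF-16 in little-endian byte order and that a "word" chunk applied to it splits on the single byte 0x20; both are simplifying assumptions, not confirmed LiveCode internals. The helper names are hypothetical.

```python
# Simplified model (an assumption): unicodeText is raw UTF-16LE bytes,
# and a "word" chunk taken from those bytes splits on the single byte 0x20.
text = "Раз два три"
raw = text.encode("utf-16-le")


def first_two_words_of_bytes(data):
    """Model of: word 1 to 2 of the unicodeText of fld (chunk the bytes)."""
    return b" ".join(data.split(b" ")[:2])


def bytes_of_first_two_words(s):
    """Model of: the unicodeText of word 1 to 2 of fld (chunk the text, then encode)."""
    return " ".join(s.split(" ")[:2]).encode("utf-16-le")


wrong = first_two_words_of_bytes(raw)
right = bytes_of_first_two_words(text)

print(wrong.decode("utf-16-le", errors="replace"))
print(right.decode("utf-16-le"))
```

Under this model the byte-wise version returns only "Раз", because the first letter Р (U+0420) is stored as the bytes 0x20 0x04 and its first byte is mistaken for a space. Chunking the text first returns "Раз два" as intended.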

Mark · Post by **Mark** » Sun May 15, 2011 9:49 am

Hi Slava,

Does changing the charToNum function call actually make a noticeable difference?

Mark

sp27 · Post by **sp27** » Sun May 15, 2011 6:54 pm

From my tests it seems to make a difference when the character is in the ANSI 128 to 255 range, although I'm not sure why. For example, I got different values returned for the "smart" quotes, depending on whether I asked for char 1 to 2 of field "X" or for char 1 to 2 of the unicodeText of field "X". For characters that cannot be represented in a single byte, like decimal 1072, the result seems to be the same. But as I'm still very new to this, I may be overlooking something... Thanks, Mark!

Mark · Post by **Mark** » Mon May 16, 2011 10:03 am

Hi Slava,

Actually, I can see why char 1 to 2 of a field is not the same as char 1 to 2 of the unicodeText of that field. For the first 256 characters (code points 0 to 255), a NULL is included in char 1 to 2 of the unicodeText. All characters of the unicodeText consist of two bytes. The NULL may seem invisible, but it is still there.

Best,

Mark
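
The two-byte layout described above can be checked outside LiveCode with a short Python sketch. It assumes unicodeText is UTF-16 in little-endian byte order, and it assumes that a plain (non-Unicode) read of a field on Windows yields the cp1252 byte. Both are assumptions for illustration, and `utf16_bytes` is a hypothetical helper.

```python
# Assumption: unicodeText holds UTF-16 in little-endian byte order.
def utf16_bytes(ch):
    """Return the UTF-16LE bytes of one character as a list of integers."""
    return list(ch.encode("utf-16-le"))


samples = {
    "A": "A",
    "e-acute": "\u00e9",
    "Cyrillic a": "\u0430",
    "left smart quote": "\u201c",
}

# Print each character's code point and its two UTF-16LE bytes.
for name, ch in samples.items():
    print(name, ord(ch), utf16_bytes(ch))

# Assumption: a plain (non-Unicode) read on Windows yields the cp1252 byte.
ansi_quote = "\u201c".encode("cp1252")[0]
print("smart quote as cp1252 byte:", ansi_quote)
```

Characters with code points below 256 carry a zero as their second byte, which is the NULL mentioned above. The Cyrillic letter at decimal 1072 has no zero byte, and the smart quote has a different value as a single ANSI byte than as a Unicode code point, which would be consistent with the differing results reported for the 128 to 255 range.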

LiveCode Forums.
