
encoding a var versus UTF8

Posted: Mon Mar 24, 2014 3:02 pm
by atout66
Hi to all, me again :)

I don't understand how LC encodes data...
I load an HTML page, store it in a variable called <webData>, and read the source code. No problem there.
The problem shows up as follows:
I isolate a particular part of <webData> in a variable called <leMot>.
Displayed in a field as UTF8, <leMot> gives this result: <sa.ʁa.bɑ̃d> (sarabande). OK.
But if I ask LC to show me <leMot> in the message box while debugging, I get: <sa.ʁa.bɑ̃d>.

Maybe you can already guess what this means...
If I read char (length(leMot) - 1), I get <ƒ> when I expect <ɑ̃>, and the number of chars is also different!

Any idea how to solve this?
Thanks in advance.

Re: encoding a var versus UTF8

Posted: Fri Mar 28, 2014 9:14 am
by atout66
Hi to all,

I'm answering my own question to let those who may be interested in this kind of trouble know how I solved it.
I must say I'm not proud of myself, but I'm a beginner... so if any of you have a better idea, you're welcome to share it :wink:

Instead of looking inside the variable <leMot> directly in the script, I put it into a field called "toCompare".
I've built a list of fields. Each of them has a specific unicode text inside. In this example, a field called "laRef01" contains the unicode char <ɑ̃>, and so on.

Then I loop through each field ("laRef01", "laRef02"), compare it with the field "toCompare", and that way I can be sure which unicode char I'm dealing with.
Just to test:

   put the unicodePlainText of last char of field "toCompare" into leTest
   -- <last char> is needed even if the field contains only one char,
   -- because LC adds a cr char at the end and the test would otherwise return FALSE.
   get the unicodePlainText of last char of fld "laRef"
   if leTest = IT then
      answer "They are similar"
   else
      answer "They are different"
   end if
HTH.
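
A possibly simpler route, offered only as a sketch: the wrong character and the different char count look like what happens when UTF-8 bytes are read as one character per byte, so decoding <leMot> once before chunking it might make the comparison fields unnecessary. This assumes that <leMot> still holds the raw UTF-8 bytes of the page and that LiveCode 7.0 or later is in use, where textDecode is available. The handler name showSecondToLastChar and the variable leMotDecoded are made up for illustration.

```livecode
-- Hypothetical handler, not from the original post.
-- Assumes leMot holds raw UTF-8 bytes and LiveCode 7.0 or later.
command showSecondToLastChar leMot
   local leMotDecoded
   -- convert the UTF-8 bytes into LiveCode's internal Unicode text
   put textDecode(leMot, "UTF-8") into leMotDecoded
   -- chunk expressions on the decoded text work on characters, not bytes
   answer "Number of chars:" && the number of chars in leMotDecoded
   -- char -2 is the same position as char (length - 1)
   answer char -2 of leMotDecoded
end showSecondToLastChar
```

On versions before 7.0, textDecode does not exist. There, as far as I know, the usual idiom is to set the unicodeText of a field to uniEncode(leMot, "UTF8") and work from the field, which is close to what the script above already does.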

I'm opening another topic to go further, about using numToChar and charToNum (the ones for ASCII values) with unicode characters.
Regards.