unidecode(uniencode()) removes characters??

Posted: Mon Oct 12, 2009 8:23 am
by hliljegren
If I do a:

Code:

put unidecode(uniencode("åäö are som swedish characters", "utf8"))
the result is:

Code:

 are som swedish characters
The Swedish characters are stripped from the text! Am I missing something?

If I drop the "utf8" argument (so that uniencode converts to UTF-16, if I understand things correctly), everything works fine, but I really need to convert to UTF-8 because my database stores all fields in that format.

Should I file a bug against RunRev or against my own knowledge? ;)

Posted: Mon Oct 12, 2009 10:56 am
by Mark
hliljegren,

Your script should be like this:

// convert to RunRev unicode (UTF16)
put uniencode("åäö") into myUnicode
// show unicode in field
set the unicodeText of fld 1 to myUnicode
// convert to UTF8
put unidecode(myUnicode,"UTF8") into myUTF8

What you get in myUTF8 is binary data. You can't put that into a field, but you can write it to a file, or your database, and tell a text editor to open it as UTF8.

Best regards,

Mark

Posted: Mon Oct 12, 2009 11:07 am
by Mark Smith
I think you need to move the "utf8" declaration into the unidecode call:

Code:

unidecode(uniencode("åäö are som swedish characters"),"UTF8")
The way you had it, the string is treated as UTF-8 and translated to whatever your system's encoding is.

Best,

Mark Smith
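
The explanation above can be modelled outside Revolution. The following is a minimal Python sketch, not engine code, and it rests on two assumptions that the thread does not confirm: that the native text is stored as Latin-1 bytes, and that the engine silently discards bytes that are not valid UTF-8. Under those assumptions, reading the native string as UTF-8 loses exactly the accented characters:

```python
# Assumption: the engine holds the literal as single-byte Latin-1 text.
native_bytes = "åäö are som swedish characters".encode("latin-1")

# Analogue of uniencode(text, "utf8"): the input is read as if it were UTF-8.
# The bytes E5 E4 F6 are not valid UTF-8 here, so a lenient decoder drops them
# (assumed behaviour; modelled with errors="ignore").
misread = native_bytes.decode("utf-8", errors="ignore")

print(repr(misread))
```

The ASCII part of the string survives because ASCII bytes mean the same thing in Latin-1 and in UTF-8.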

Posted: Mon Oct 12, 2009 11:24 am
by hliljegren
Aha! Thanks!

If I understand your last post correctly, I can't translate a string directly to UTF-8 via uniencode. Instead, uniencode ALWAYS translates to UTF-16.

So uniencode(myText, "utf8") translates a UTF-8 encoded string into UTF-16. To get a string from a field into UTF-8 format, I therefore need to encode it to UTF-16 and then "decode" it to UTF-8, which I can then send to my MySQL database.

Correct?
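
The two-step route described here (native text to UTF-16, then UTF-16 to UTF-8) can be sketched in Python to show which bytes would end up in the database. The function name `to_utf8_via_utf16` is hypothetical, and little-endian UTF-16 is an assumption chosen for illustration:

```python
def to_utf8_via_utf16(text: str) -> bytes:
    """Mirror the two calls: uniencode(text), then unidecode(..., "UTF8")."""
    # Step 1: native text to UTF-16 (byte order is an assumption).
    utf16 = text.encode("utf-16-le")
    # Step 2: UTF-16 to UTF-8 bytes, ready to be written to a file or database.
    return utf16.decode("utf-16-le").encode("utf-8")

utf8_bytes = to_utf8_via_utf16("åäö")
print(utf8_bytes.hex())
```

Each of the three characters takes two bytes in UTF-8, so the result is binary data six bytes long rather than text that can be shown as-is in a single-byte field.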

Posted: Mon Oct 12, 2009 2:09 pm
by Mark
Hi Mark,

From the docs:
"If you don't specify a language, the uniDecode function returns the stringToDecode, with every second byte removed."
As far as I can see, this is independent of any encoding.

Best,

Mark
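
The quoted behaviour can be illustrated with a short Python sketch. It assumes little-endian UTF-16, so that the byte removed from each pair is the high byte; the actual byte order depends on the platform and is not stated in the thread:

```python
# UTF-16 (little-endian assumed) stores each of these characters as two bytes.
utf16 = "åäö abc".encode("utf-16-le")

# "Every second byte removed": keep bytes 0, 2, 4, ...
stripped = utf16[::2]
print(stripped.decode("latin-1"))

# A character outside Latin-1 does not survive: U+20AC is stored as AC 20,
# and dropping the second byte leaves only AC.
euro_stripped = "€".encode("utf-16-le")[::2]
print(euro_stripped)
```

For characters whose high byte is zero, removing every second byte happens to give back the Latin-1 text, which would explain why the plain uniencode/unidecode round trip appeared to work.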