unidecode(uniencode()) removes characters??
Posted: Mon Oct 12, 2009 8:23 am
by hliljegren
If I do this:
Code:
put unidecode(uniencode("åäö are som swedish characters", "utf8"))
the result is:
The Swedish characters are stripped from the text! Am I missing something?
If I leave out the "utf8" parameter (so that, if I understand correctly, it converts to UTF-16), everything works fine, but I really need UTF-8 because my database stores all fields in that format.
Should I file a bug report against RunRev, or against my own knowledge?

Posted: Mon Oct 12, 2009 10:56 am
by Mark
hliljegren,
Your script should look like this:
// convert to RunRev unicode (UTF16)
put uniencode("åäö") into myUnicode
// show unicode in field
set the unicodeText of fld 1 to myUnicode
// convert to UTF8
put unidecode(myUnicode,"UTF8") into myUTF8
What you get in myUTF8 is binary data. It won't display correctly in a field, but you can write it to a file or to your database, and tell a text editor to open it as UTF-8.
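For readers who want to see the bytes involved, here is a rough Python model of the two steps above. It is not RunRev code: `to_internal_unicode` and `internal_to_utf8` are hypothetical helpers, Python's `str` stands in for the field text (so the native-encoding step is skipped), and it assumes RunRev's internal "unicode" is UTF-16 in little-endian byte order, as on Intel machines.

```python
# Hypothetical model of the recipe above; not RunRev code.
# Assumption: RunRev "unicode" is UTF-16, little-endian, without a BOM.
INTERNAL = "utf-16-le"

def to_internal_unicode(text: str) -> bytes:
    """Stands in for uniencode(text): produce UTF-16 bytes."""
    return text.encode(INTERNAL)

def internal_to_utf8(utf16: bytes) -> bytes:
    """Stands in for unidecode(utf16, "UTF8"): UTF-16 bytes -> UTF-8 bytes."""
    return utf16.decode(INTERNAL).encode("utf-8")

my_unicode = to_internal_unicode("åäö")
my_utf8 = internal_to_utf8(my_unicode)
print(my_unicode.hex())  # e500e400f600
print(my_utf8.hex())     # c3a5c3a4c3b6
```

Each letter takes two bytes in both encodings, but the byte values differ, which is why the UTF-8 result looks wrong when shown as native text.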
Best regards,
Mark
Posted: Mon Oct 12, 2009 11:07 am
by Mark Smith
I think you need to move the "utf8" parameter into the unidecode call:
Code:
put unidecode(uniencode("åäö are som swedish characters"),"UTF8") into myUTF8
The way you had it, the string is translated from UTF-8 to whatever your system's encoding is.
Best,
Mark Smith
Posted: Mon Oct 12, 2009 11:24 am
by hliljegren
Aha! Thanks!
If I understand your last post correctly, I can't translate a string directly to UTF-8 with uniencode; instead, uniencode ALWAYS translates to UTF-16.
So uniencode(myText, "utf8") translates a UTF-8-encoded string into UTF-16. To get a string from a field into UTF-8, I need to encode it to UTF-16 first and then "decode" it to UTF-8, which I can then send to my MySQL database.
Correct?
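The summary above can be checked with a small Python model. It is only a sketch under stated assumptions: `uni_encode` and `uni_decode` are hypothetical stand-ins for uniencode/unidecode, the language arguments are Python codec names (not RunRev's language names), the internal format is assumed to be UTF-16 little-endian, and field text is assumed to be Latin-1 (a Mac would use MacRoman instead).

```python
# Hypothetical model of the summary above; not RunRev code.
# Assumptions: internal "unicode" is UTF-16 little-endian; the language
# arguments are Python codec names; native field text is Latin-1.
INTERNAL = "utf-16-le"

def uni_encode(data: bytes, language: str) -> bytes:
    """Model of uniencode(data, language): bytes in `language` -> UTF-16."""
    return data.decode(language).encode(INTERNAL)

def uni_decode(utf16: bytes, language: str) -> bytes:
    """Model of unidecode(utf16, language): UTF-16 -> bytes in `language`."""
    return utf16.decode(INTERNAL).encode(language)

# uniencode(x, "utf8") reads UTF-8 input and yields UTF-16 ...
utf8_from_db = "åäö".encode("utf-8")
utf16 = uni_encode(utf8_from_db, "utf-8")
assert utf16 == "åäö".encode(INTERNAL)
# ... and unidecode(..., "utf8") turns it back into the same UTF-8 bytes.
assert uni_decode(utf16, "utf-8") == utf8_from_db

# Field text -> UTF-16 -> UTF-8, ready to send to a UTF-8 database.
native = "åäö".encode("latin-1")
print(uni_decode(uni_encode(native, "latin-1"), "utf-8").hex())  # c3a5c3a4c3b6
```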
Posted: Mon Oct 12, 2009 2:09 pm
by Mark
Hi Mark,
From the docs:
If you don't specify a language, the uniDecode function returns the stringToDecode, with every second byte removed.
As far as I can see, this is independent of any encoding.
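To see how this could strip the Swedish letters from the original one-liner, here is a Python model of it. Several assumptions are labelled here and in the code, none of them confirmed by the docs quoted above: the internal format is UTF-16 little-endian (so the kept byte is the low byte), the native text is Latin-1, and bytes that are not valid UTF-8 are simply dropped by uniencode(..., "UTF8"). The helper names are hypothetical.

```python
# Hypothetical model of the original one-liner; not RunRev code.
# Assumptions: UTF-16 little-endian internally, Latin-1 native text,
# and invalid UTF-8 bytes are silently dropped on input.
INTERNAL = "utf-16-le"

def uni_encode_from_utf8(data: bytes) -> bytes:
    """Model of uniencode(data, "UTF8"), dropping bytes that are not valid UTF-8."""
    return data.decode("utf-8", errors="ignore").encode(INTERNAL)

def uni_decode_no_language(utf16: bytes) -> bytes:
    """Model of unidecode(utf16): keep bytes 0, 2, 4, ... (every second byte removed)."""
    return utf16[0::2]

native = "åäö are som swedish characters".encode("latin-1")
result = uni_decode_no_language(uni_encode_from_utf8(native))
print(result)  # b' are som swedish characters'

# The stripping is purely byte-level: for code points up to U+00FF it
# happens to leave the Latin-1 bytes behind.
print(uni_decode_no_language("åäö".encode(INTERNAL)))  # b'\xe5\xe4\xf6'
```

In this model the letters are lost in the first step, because the Latin-1 bytes for "åäö" are not valid UTF-8. The second step only drops the zero high bytes.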
Best,
Mark