unidecode(uniencode()) removes characters??

hliljegren
Posts: 111
Joined: Sun Aug 23, 2009 7:48 am

unidecode(uniencode()) removes characters??

Post by hliljegren » Mon Oct 12, 2009 8:23 am

If I do a:

Code:

put unidecode(uniencode("åäö are som swedish characters", "utf8"))

the result is:

Code:

 are som swedish characters

The Swedish characters are stripped from the text! Am I missing something or what?

If I strip the "utf8" part (which converts to UTF-16 if I understand things correctly) everything works fine, but I really need to convert to UTF-8 as my database stores all fields in that format.

Should I file a bug against RunRev or against my own knowledge? ;)

Mark
Livecode Opensource Backer
Posts: 5150
Joined: Thu Feb 23, 2006 9:24 pm

Post by Mark » Mon Oct 12, 2009 10:56 am

hliljegren,

Your script should be like this:

// convert to RunRev unicode (UTF16)
put uniencode("åäö") into myUnicode
// show unicode in field
set the unicodeText of fld 1 to myUnicode
// convert to UTF8
put unidecode(myUnicode,"UTF8") into myUTF8

What you get in myUTF8 is binary data. You can't put that into a field, but you can write it to a file, or your database, and tell a text editor to open it as UTF8.
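
A minimal sketch of the file route mentioned above, assuming the legacy uniEncode/uniDecode behaviour discussed in this thread. The handler name, the file name and the choice of the desktop folder are hypothetical and only there for illustration.

```livecode
-- Hypothetical handler: writes a UTF-8 copy of a native string to a file.
on writeUtf8Sample
   local myUnicode, myUTF8, tPath
   -- native text -> UTF-16
   put uniencode("åäö") into myUnicode
   -- UTF-16 -> UTF-8 (binary data from here on)
   put unidecode(myUnicode, "UTF8") into myUTF8
   -- assumption: the desktop folder is writable; the file name is made up
   put specialFolderPath("desktop") & "/utf8-sample.txt" into tPath
   -- "binfile:" writes the bytes unchanged, with no line-ending translation
   put myUTF8 into URL ("binfile:" & tPath)
end writeUtf8Sample
```

Opening the resulting file in a text editor set to UTF-8 is one way to check that the three characters survived.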

Best regards,

Mark
The biggest LiveCode group on Facebook: https://www.facebook.com/groups/livecode.developers
The book "Programming LiveCode for the Real Beginner"! Get it here! http://tinyurl.com/book-livecode

Mark Smith
Posts: 179
Joined: Sat Apr 08, 2006 11:08 pm

Post by Mark Smith » Mon Oct 12, 2009 11:07 am

I think you need to move the "utf8" declaration into the unidecode call:

Code:

unidecode(uniencode("åäö are som swedish characters"),"UTF8")

The way you had it will translate a string from UTF-8 to whatever your system's encoding is.
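
To make the difference concrete, here is a small side-by-side sketch. The handler and variable names are hypothetical. The comment on why characters vanish is an assumption that fits the reported output, not something confirmed in this thread: the system-encoding bytes for å, ä and ö are probably not valid UTF-8, so they are dropped when the string is read as UTF-8.

```livecode
-- Hypothetical handler comparing the two placements of "UTF8".
on compareUtf8Placement
   local tText, tFromUTF8, tToUTF8
   put "åäö are som swedish characters" into tText

   -- "UTF8" on uniencode: tText is treated as if it were already UTF-8,
   -- then converted back to the system encoding
   put unidecode(uniencode(tText, "UTF8")) into tFromUTF8

   -- "UTF8" on unidecode: system encoding -> UTF-16 -> UTF-8 bytes
   put unidecode(uniencode(tText), "UTF8") into tToUTF8

   -- show the first result and the byte count of the second in the message box
   put tFromUTF8 & return & length(tToUTF8)
end compareUtf8Placement
```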

Best,

Mark Smith

hliljegren
Posts: 111
Joined: Sun Aug 23, 2009 7:48 am

Post by hliljegren » Mon Oct 12, 2009 11:24 am

Aha! Thanks!

If I understand your last post correctly, I can't translate a string directly to UTF-8 via uniencode. Instead, uniencode ALWAYS translates to UTF-16.

So uniencode(myText, "utf8") translates a UTF-8 encoded string into UTF-16. To get a string from a field into UTF-8 format, I therefore need to encode it to UTF-16 and then "decode" it to UTF-8, which I can then send to my MySQL database.

Correct?
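
Read that way, the two directions could be wrapped in a pair of helper functions. This is only a sketch under the assumptions stated in this thread; the function names are hypothetical and not part of LiveCode.

```livecode
-- Hypothetical helper: system-encoded text -> UTF-8 bytes (e.g. before a database write)
function nativeToUTF8 pText
   return unidecode(uniencode(pText), "UTF8")
end nativeToUTF8

-- Hypothetical helper: UTF-8 bytes -> system-encoded text (e.g. after a database read)
function utf8ToNative pUTF8
   return unidecode(uniencode(pUTF8, "UTF8"))
end utf8ToNative
```

Note that the second helper uses the same form as the call in the original question, which suggests that call suits data read back from the database rather than data about to be stored. Characters that the system encoding cannot represent would not survive the trip through utf8ToNative.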

Mark
Livecode Opensource Backer
Posts: 5150
Joined: Thu Feb 23, 2006 9:24 pm

Post by Mark » Mon Oct 12, 2009 2:09 pm

Hi Mark,

From the docs:
"If you don't specify a language, the uniDecode function returns the stringToDecode, with every second byte removed."
As far as I can see, this is independent of any encoding.

Best,

Mark