UTF8 problem after writing a text file

Post by Zax » Mon Jan 03, 2011 1:49 pm

Hello,

I have a UTF-8 encoded text file, and BBEdit recognizes the encoding correctly.
Now I open this file in Rev Studio 4.0 with:

Code: Select all

open file myFile for binary read
read from file myFile until EOF
put uniDecode(uniEncode(it,"UTF8")) into data
-- here some treatments on data...
put data into URL ("binfile:" & myFile)

After that, when I open the file again in BBEdit, it no longer recognizes the UTF-8 encoding.
Can anyone tell me what I am doing wrong?

Thanks.

Post by Janschenkel » Mon Jan 03, 2011 2:33 pm

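I'm afraid you're getting a little mixed up in the use of the uniDecode and uniEncode functions. uniEncode(it,"UTF8") converts the UTF-8 data to UTF-16, but uniDecode without a second parameter converts it to the native single-byte encoding, so what you write back to the file is no longer UTF-8.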

Code: Select all

open file myFile for binary read
read from file myFile until EOF
-- first convert from UTF8 to UTF16
put uniEncode(it,"UTF8") into data
-- here some treatments on data...
-- finally convert from UTF16 to UTF8
put uniDecode(data,"UTF8") into data
-- and write that to file
put data into URL ("binfile:" & myFile)

Also remember to set the useUnicode local property to true where needed while you process the data.

HTH,

Jan Schenkel.

Post by Zax » Mon Jan 03, 2011 3:14 pm

Thank you Jan for your reply.

I'm not familiar with these encoding problems, but I found the "uniDecode(uniEncode(it,"UTF8"))" trick in the built-in Rev help, and I have to say that it works well (on Mac OS at least).
My problem is when writing the file.

I tried your script but was unable to make it work; maybe I missed something.

Post by Zax » Tue Jan 04, 2011 1:42 pm

OK, I made a mistake. The following test script works for UTF-8 text files, with or without a BOM:

Code: Select all

  open file myFile for binary read
  read from file myFile until EOF
  close file myFile

  put uniEncode("__et voilà__") into addedString -- for testing purposes

  set the useUnicode to true
  put uniEncode(it,"UTF8") into data -- UTF-8 to UTF-16
  put addedString after char 50 of data -- text modification
  put uniDecode(data,"UTF8") into data -- UTF-16 back to UTF-8

  put data into URL ("binfile:" & myFile)

But now the problem is with files that are not UTF-8 encoded, Mac OS Roman for example: accented characters are lost and the output text file is now UTF-8 encoded :(
So, is there a way to know how a text file is encoded before modifying it?
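
One partial check that might help here is looking for the UTF-8 byte order mark (the bytes 239, 187, 191) at the start of the raw data. The sketch below is only an illustration: hasUTF8BOM is a hypothetical helper name, not something from Rev, and it assumes the data was read "for binary read" so that each char is one byte. Since UTF-8 files can also come without a BOM, a false result does not prove the file is not UTF-8.

```livecode
-- hypothetical helper (not part of Rev): true if pData starts with the UTF-8 BOM
-- assumes pData is the raw content read with "for binary read"
function hasUTF8BOM pData
  if the length of pData < 3 then return false
  -- the UTF-8 BOM is the byte sequence EF BB BF (239, 187, 191)
  return charToNum(char 1 of pData) is 239 and \
      charToNum(char 2 of pData) is 187 and \
      charToNum(char 3 of pData) is 191
end hasUTF8BOM
```

It could be called right after reading the file, for example "if hasUTF8BOM(it) then ..." to take the UTF-8 branch, and otherwise fall back to asking the user.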

Post by Mark » Thu Jan 20, 2011 5:29 pm

Hi Zax,

The safest way to do this is to provide the user with an open file dialog, which includes a menu, e.g.

answer file "Choose a text file..." with type "MacRoman|txt|TEXT" or type "Windows Latin|txt|TEXT" or type "UTF8|txt|TEXT" or type "Rich Text Format|rtf|RTF "

After this command is executed, the result contains the name of the type the user selected ("MacRoman", "Windows Latin", etc.). This way, you can ask the user what kind of file you're dealing with.
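
A minimal sketch of how the selected type could drive the conversion, reusing the read/convert/write steps from earlier in this thread. It covers only two of the types and makes a few assumptions: the script runs on Mac OS (so uniEncode and uniDecode without a second parameter use MacRoman as the native encoding), the variable names chosenType, myFile and data are just for illustration, and the handler is attached to a button.

```livecode
on mouseUp
  answer file "Choose a text file..." with type "MacRoman|txt|TEXT" or type "UTF8|txt|TEXT"
  put the result into chosenType -- name of the selected type
  put it into myFile -- path of the selected file
  if myFile is empty then exit mouseUp -- user cancelled

  open file myFile for binary read
  read from file myFile until EOF
  close file myFile

  -- convert the raw bytes to UTF-16 according to the user's choice
  if chosenType is "UTF8" then
    put uniEncode(it, "UTF8") into data
  else
    put uniEncode(it) into data -- native encoding, assumed to be MacRoman
  end if

  -- here some treatments on data...

  -- convert back to the encoding the file had originally
  if chosenType is "UTF8" then
    put uniDecode(data, "UTF8") into data
  else
    put uniDecode(data) into data
  end if
  put data into URL ("binfile:" & myFile)
end mouseUp
```

Writing the file back in the encoding the user selected avoids silently turning a MacRoman file into a UTF-8 one.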

Best regards,

Mark