Getting question marks instead of Arabic
Posted: Sun Oct 19, 2014 7:11 pm
by Aradim
Hello,
I am trying to save Arabic XML content from the internet to a file, but I get question marks in place of the Arabic Unicode characters. I am using LiveCode 7.0.
My commands:
put URL "domain name/content" into ss
put textDecode(ss,"UTF-8") into url "file:/Users/maradi/Desktop/arabictest.txt"
I appreciate the help.
Re: Getting question marks instead of Arabic
Posted: Sun Oct 19, 2014 8:04 pm
by FourthWorld
It may be that because Unicode is a binary (non-ASCII) format you'll need to use "binfile:" where you have "file:".
If that works then we have a philosophical point to discuss with the engine team: now that v7 aims to support Unicode transparently, should we consider making the "file" specifier as Unicode-savvy as everything else?
Re: Getting question marks instead of Arabic
Posted: Sun Oct 19, 2014 8:38 pm
by Aradim
Hi Richard,
I tried "binfile:" and got the same result: all the Arabic is replaced by ?????. And to your question, the answer is yes.
Thanks for the help.
Re: Getting question marks instead of Arabic
Posted: Sun Oct 19, 2014 9:33 pm
by Aradim
Hello,
This workaround did it, using a field as an intermediate:
put URL "http://www.xxx.com/xxx/rss" into field 1
put textDecode(field 1,"UTF-8") into field 1
put textEncode(field 1,"UTF-8") into url "file:/Users/maradi/Desktop/arabictest.txt"
Re: Getting question marks instead of Arabic
Posted: Mon Oct 20, 2014 8:55 pm
by jacque
FourthWorld wrote: It may be that because Unicode is a binary (non-ASCII) format you'll need to use "binfile:" where you have "file:".
If that works then we have a philosophical point to discuss with the engine team: now that v7 aims to support Unicode transparently, should we consider making the "file" specifier as Unicode-savvy as everything else?
I'm pretty sure they've considered that. Wouldn't it break all existing scripts?
Re: Getting question marks instead of Arabic
Posted: Mon Oct 20, 2014 9:03 pm
by jacque
Read Fraser's explanation on the blog:
http://livecode.com/blog/2014/04/02/exa ... ting-text/
Near the end he gives this example:
put url("binfile:input.txt") into tInputEncoded
put textDecode(tInputEncoded, "UTF-8") into tInput
…
put textEncode(tOutput, "UTF-8") into tOutputEncoded
put tOutputEncoded into url("binfile:output.txt")
If you are using the open file/socket/process syntax, you can have the conversion done for you:
open tFile for utf-8 text read
Unfortunately, the URL syntax does not offer the same convenience. It can, however, auto-detect the correct encoding to use in some circumstances: when reading from a file URL, the beginning of the file is examined for a “byte order mark” that specifies the encoding of the text. It also uses the encoding returned by the web server when HTTP URLs are used. If the encoding is not recognised, it assumes the platform’s native text encoding is used. As the native encodings do not support Unicode, it is usually better to be explicit when writing to files, etc.
EDIT: Oh, never mind, I see you're already doing that. I think I'd write to support and see if they think there's a problem somewhere.
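For anyone landing on this thread later, here is a minimal sketch of the full round trip from Fraser's example, applied to the original question. The first attempt decoded the data but wrote the decoded text straight to a "file:" URL, with no textEncode step, which is the likely source of the question marks. The handler name, the URL, and the output path are placeholders and not from the thread, and the sketch assumes the HTTP fetch hands back the raw UTF-8 bytes. If the engine has already decoded the response using the server's charset, the textDecode line may be unnecessary.

```livecode
-- Sketch only: saveArabicFeed, the URL and the output file name are hypothetical.
on saveArabicFeed
   local tRaw, tText, tBytes
   -- Fetch the feed (assumed here to arrive as undecoded UTF-8 bytes)
   put URL "http://www.example.com/rss" into tRaw
   -- Decode the UTF-8 bytes into LiveCode's internal text representation
   put textDecode(tRaw, "UTF-8") into tText
   -- ... work with tText here ...
   -- Encode back to UTF-8 bytes before the data leaves LiveCode
   put textEncode(tText, "UTF-8") into tBytes
   -- binfile: writes the bytes as they are, with no text conversion
   put tBytes into URL ("binfile:" & specialFolderPath("desktop") & "/arabictest.txt")
end saveArabicFeed
```

This does the same thing as the field workaround above (decode, then encode), just with variables instead of a field and "binfile:" instead of "file:" for the write.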