Getting question marks instead of Arabic

Aradim
VIP Livecode Opensource Backer
Posts: 6
Joined: Tue Jun 30, 2009 3:28 pm

Getting question marks instead of Arabic

Post by Aradim » Sun Oct 19, 2014 7:11 pm

Hello,

I am trying to save Arabic XML content from the internet to a file, but I get question marks in place of the Arabic Unicode characters. I am using LiveCode 7.0.

My commands:

put URL "domain name/content" into ss

put textDecode(ss,"UTF-8") into url "file:/Users/maradi/Desktop/arabictest.txt"

I appreciate the help.

FourthWorld
VIP Livecode Opensource Backer
Posts: 10048
Joined: Sat Apr 08, 2006 7:05 am

Re: Getting question marks instead of Arabic

Post by FourthWorld » Sun Oct 19, 2014 8:04 pm

It may be that because Unicode is a binary (non-ASCII) format you'll need to use "binfile:" where you have "file:".

If that works then we have a philosophical point to discuss with the engine team: now that v7 aims to support Unicode transparently, should we consider making the "file" specifier as Unicode-savvy as everything else?
Richard Gaskin
LiveCode development, training, and consulting services: Fourth World Systems

Aradim
VIP Livecode Opensource Backer
Posts: 6
Joined: Tue Jun 30, 2009 3:28 pm

Re: Getting question marks instead of Arabic

Post by Aradim » Sun Oct 19, 2014 8:38 pm

Hi Richard,

I tried binfile: and got the same result: all the Arabic was replaced by ?????. And to your question, my answer is yes.

Thanks for the help.

Aradim
VIP Livecode Opensource Backer
Posts: 6
Joined: Tue Jun 30, 2009 3:28 pm

Re: Getting question marks instead of Arabic

Post by Aradim » Sun Oct 19, 2014 9:33 pm

Hello,

This workaround did it, using a field as an intermediate:

put URL "http://www.xxx.com/xxx/rss" into field 1
put textDecode(field 1,"UTF-8") into field 1
put textEncode(field 1,"UTF-8") into url "file:/Users/maradi/Desktop/arabictest.txt"
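
A note on why this probably works, plus a sketch that skips the field. In the first attempt the decoded text was written straight to a "file:" URL, which as far as I understand converts it to the platform's native encoding, and that encoding has no Arabic, hence the question marks. The workaround adds textEncode, so what gets written is UTF-8 bytes again. The handler name saveArabicFeed and the variable names are made up for illustration, the URL and path are the placeholders from the posts above, and the sketch assumes the server hands back raw UTF-8 bytes, as it seemed to here. It also writes with "binfile:" so the bytes land on disk unchanged.

```livecode
-- Sketch only: saveArabicFeed, tBytes and tText are illustrative names.
on saveArabicFeed
   local tBytes, tText
   -- Fetch the feed; assumed to arrive as raw UTF-8 bytes.
   put URL "http://www.xxx.com/xxx/rss" into tBytes
   -- Decode the UTF-8 bytes into LiveCode's internal Unicode text.
   put textDecode(tBytes, "UTF-8") into tText
   -- Encode back to UTF-8 and write the bytes as-is with binfile:.
   put textEncode(tText, "UTF-8") into URL "binfile:/Users/maradi/Desktop/arabictest.txt"
end saveArabicFeed
```

If nothing is done to the text in between, the decode and encode steps cancel out, so putting tBytes directly into the "binfile:" URL should give the same file. The decode step only matters when the text is going to be displayed or processed first.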

jacque
VIP Livecode Opensource Backer
Posts: 7391
Joined: Sat Apr 08, 2006 8:31 pm

Re: Getting question marks instead of Arabic

Post by jacque » Mon Oct 20, 2014 8:55 pm

FourthWorld wrote: It may be that because Unicode is a binary (non-ASCII) format you'll need to use "binfile:" where you have "file:".

If that works then we have a philosophical point to discuss with the engine team: now that v7 aims to support Unicode transparently, should we consider making the "file" specifier as Unicode-savvy as everything else?
I'm pretty sure they've considered that. Wouldn't it break all existing scripts?
Jacqueline Landman Gay | jacque at hyperactivesw dot com
HyperActive Software | http://www.hyperactivesw.com

jacque
VIP Livecode Opensource Backer
Posts: 7391
Joined: Sat Apr 08, 2006 8:31 pm

Re: Getting question marks instead of Arabic

Post by jacque » Mon Oct 20, 2014 9:03 pm

Read Fraser's explanation on the blog: http://livecode.com/blog/2014/04/02/exa ... ting-text/

Near the end he gives this example:

put url("binfile:input.txt") into tInputEncoded
put textDecode(tInputEncoded, "UTF-8") into tInput

put textEncode(tOutput, "UTF-8") into tOutputEncoded
put tOutputEncoded into url("binfile:output.txt")

If you are using the open file/socket/process syntax, you can have the conversion done for you:

open tFile for utf-8 text read

Unfortunately, the URL syntax does not offer the same convenience. It can, however, auto-detect the correct encoding to use in some circumstances: when reading from a file URL, the beginning of the file is examined for a “byte order mark” that specifies the encoding of the text. It also uses the encoding returned by the web server when HTTP URLs are used. If the encoding is not recognised, it assumes the platform’s native text encoding is used. As the native encodings do not support Unicode, it is usually better to be explicit when writing to files, etc.

EDIT: Oh, never mind, I see you're already doing that. I think I'd write to support and see if they think there's a problem somewhere.
Jacqueline Landman Gay | jacque at hyperactivesw dot com
HyperActive Software | http://www.hyperactivesw.com
