Encodings - why platform specific?
I have a project where CSV files are delivered by a customer from the Windows platform and are then supposed to be converted into UTF-8 encoded XML files on an OSX machine.
The textDecode, textEncode and open file (with an encoding) commands filter the list of available encodings by OS, though. So I cannot convert the customer file, which I know is CP1252-encoded, when my converter is running on OSX.
Why is that? It makes absolutely no sense to me to have those restrictions. It is very likely that one will encounter any encoding on any platform in the wild (dealing with it, for example with BOMs etc, is another topic that can be left to the programmer).
Is there any workaround?

Re: Encodings - why platform specific?
It's often much easier to diagnose issues when we can see the code in question. Without the code, I can only guess that perhaps you're reading the files in text mode rather than binary.
Text mode is the default, and is a good choice for ASCII files since it automatically converts line endings from whatever convention is specific to the platform the script is running on to LiveCode's internal line endings, which use the Unix convention of a line feed (0x0A, ASCII 10).
But for non-ASCII encodings you'll want to use binary mode, which reads the data unaltered, e.g.:
Code:
open file tSomeFilePath for binary read
read from file tSomeFilePath until EOF
put it into gSomeData
close file tSomeFilePath
Or:
Code:
put url ("binfile:"& tSomeFilePath) into gSomeData
Richard Gaskin
LiveCode development, training, and consulting services: Fourth World Systems
LiveCode Group on Facebook
LiveCode Group on LinkedIn
Re: Encodings - why platform specific?
OK, so I have the raw text in gSomeData, and now I have to declare that it is Windows-Encoded (which LiveCode cannot know), and then I have to change the Encoding to utf-8 for my target xml-file.
But:
Code:
put textDecode(gSomeData,"CP1252") into gSomeDataWithDefinedEncoding
wouldn't work, as "CP1252" is not allowed to be used with textDecode on OSX. (Error message: "textDecode: could not decode data")
The Dictionary of 7.0.3 does not say so, but the Release Notes 2/20/15 state that CP1252 is Windows only.
In fact, this works for me:
Code:
open file tSomeFilePath for "CP1252" text read
read from file tSomeFilePath until EOF
put it into gSomeData
close file tSomeFilePath
-- ... (do some processing) ...
open file tTargetFilePath for "UTF-8" text write
write gSomeData to file tTargetFilePath
close file tTargetFilePath
but this code only works when the stack is run on the Windows operating system, not when run on OSX.
It seems to me that I am never able to change the encoding of a Windows file to something else on the Mac, and I cannot quite understand the reason for that limitation.

Re: Encodings - why platform specific?
Can you just process the raw data without converting it? Then feed the resulting data directly to the UTF8 conversion? That is, read it in as binary data and work with that.
It would depend on the data, but you might also be able to read it in as Latin-1 which is similar except for a few code points. That’s not really a perfectly safe option though.
You might want to submit a feature request for this in the QCC too.
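A minimal sketch of one possible workaround along these lines, not an official LiveCode API: CP1252 and Latin-1 (ISO-8859-1) differ only for bytes 0x80 to 0x9F, so the file can be read as binary and decoded by hand with a small lookup table for that range. The handler names cp1252ToText and convertCustomerFile and their parameters are hypothetical; the sketch assumes LiveCode 7 or later, where byte chunks, byteToNum, numToCodepoint and textEncode are available, and that "UTF-8" is accepted by textEncode on every platform.

```livecode
-- Hypothetical helper (not a built-in): decode CP1252 bytes into LiveCode text
-- without relying on the platform's list of encodings. Assumes LiveCode 7+.
function cp1252ToText pData
   local tMap, tNum, tText
   -- Unicode code points for CP1252 bytes 0x80..0x9F. The five bytes CP1252
   -- leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D) keep their own value.
   put "8364,129,8218,402,8222,8230,8224,8225,710,8240,352,8249,338,141,381,143," & \
         "144,8216,8217,8220,8221,8226,8211,8212,732,8482,353,8250,339,157,382,376" into tMap
   repeat for each byte tByte in pData
      put byteToNum(tByte) into tNum
      if tNum >= 128 and tNum <= 159 then
         -- byte 128 is item 1 of the table, byte 159 is item 32
         put item (tNum - 127) of tMap into tNum
      end if
      -- all other bytes map to the code point with the same number
      put numToCodepoint(tNum) after tText
   end repeat
   return tText
end cp1252ToText

-- Hypothetical usage: binary read, manual decode, UTF-8 encode, binary write.
on convertCustomerFile pSourcePath, pTargetPath
   local tRaw
   put url ("binfile:" & pSourcePath) into tRaw
   put textEncode(cp1252ToText(tRaw), "UTF-8") into url ("binfile:" & pTargetPath)
end convertCustomerFile
```

Because both the read and the write go through binfile:, line endings are not converted, so a Windows file keeps its CRLF pairs unless they are replaced during processing.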
Jacqueline Landman Gay | jacque at hyperactivesw dot com
HyperActive Software | http://www.hyperactivesw.com
Re: Encodings - why platform specific?
"Can you just process the raw data without converting it? Then feed the resulting data directly to the UTF8 conversion?"
This won't work. Any encoding conversion needs to know the source encoding (with the one exception that the source text contains only 7-bit ASCII characters).
This is pretty much like working with colorspace definitions in images. You cannot convert into a colorspace if you do not know the originating colorspace (a super-common error in image processing: a colorspace is assigned without conversion because the source has none embedded, so the color assignments are arbitrary).
"but you might also be able to read it in as Latin-1"
Latin-1 would be good, but there is no option with that name. ISO-8859-1 (the formal name for Latin-1) would do, but it is only available on Linux, not OSX.
"You might want to submit a feature request for this in the QCC too."
Yes, I will do so! It should not be hard to implement - all the code is already there. Some encodings have just been limited to certain OSs for whatever reason.
Thanks anyway!
Andreas
