					
				CSV UTF-8
				Posted: Wed Mar 20, 2024 2:08 pm
				by matgarage
				Hello,
I'm trying to generate a UTF-8 CSV file with LiveCode.
I use the CSV with an Adobe Illustrator plugin for variable data printing.
When I save with Excel's "CSV UTF-8" option, everything works correctly for French accented characters.
However, when I generate the CSV with LiveCode, my plugin does not recognize the accents.
The file generated with LiveCode displays correctly, accents included, in TextEdit on the Mac.
When I import it into the Illustrator plugin, it is always detected as "Western Europe (Windows)" encoding, and the accents are not handled.
The same file imported into Excel and saved as a CSV UTF-8 file is detected as "UNICODE" encoding, and the accents are handled.
How can I generate a CSV in LiveCode with "UNICODE" encoding?
Here is my code:
Code:
on mouseUp pMouseButton
   put "NOM;ENTREPRISE;TEXTE" & CR into tVariable
   put "Matthieu;Garage;Hespéridée"  after tVariable
   ask file "Choisissez la destination de l'export" with "BD_test"
   if the result is not "cancel" then
      put it into tFileName 
      put textencode(tVariable,"UTF-8") into tVariable
      put tVariable into URL ("file:" & tFileName & ".csv")
   end if
end mouseUp
Attached are the two CSV files, one from LiveCode and one from Excel.
 
			
					
				Re: CSV UTF-8
				Posted: Wed Mar 20, 2024 3:21 pm
				by Klaus
				Hi Mat,
obviously Excel adds a BOM (Byte Order Mark) at the beginning of your file, but LC does not.
I think the Illustrator plugin is the culprit, since a BOM does NOT guarantee that the file is UTF-8, but you could try this:
https://forums.livecode.com/viewtopic.php?f=7&t=22365
Best
Klaus
 
			
					
				Re: CSV UTF-8
				Posted: Wed Mar 20, 2024 4:32 pm
				by matgarage
				Hi Klaus
The BOM is the solution to my issue.
My IT colleague had to set the "BOM" option when he exports CSV files for me from a PHP engine.
I have tried this in my code:
Code:
      put textencode(tVariable,"UTF-8") into tVariable
      put numtoByte(238) & numtoByte(187) & numtoByte(191) before tVariable
      put tVariable into URL ("binfile:" & tFileName & ".csv")
But it's not working: I can see "unknown" characters when I preview the file in TextEdit.
Illustrator sees them as normal characters and not as a BOM.
I've tried both "binfile:" and "file:". The results differ, but in neither case is the BOM "silent".
I don't understand what's wrong...
Thanks for your help
Mat
 
			
					
				Re: CSV UTF-8
				Posted: Wed Mar 20, 2024 10:37 pm
				by SparkOut
				I don't know anything about how using a Mac would affect this, but in the past I have used text files with a BOM, generated from LC on Windows, to be read by a TV station's TriCaster in RTL languages.
I wonder whether the order of the steps of your file creation might be part of the issue? Just maybe
Code:
put numtoByte(238) & numtoByte(187) & numtoByte(191) before tVariable 
put textencode(tVariable,"UTF-8") into URL ("binfile:" & tFileName & ".csv")
 I don't know if that will be any use though.
 
			
					
				Re: CSV UTF-8
				Posted: Thu Mar 21, 2024 1:04 am
				by stam
				SparkOut wrote: ↑Wed Mar 20, 2024 10:37 pm
I don't know anything about how using a Mac would affect this, but in the past I have used text files with a BOM, generated from LC on Windows, to be read by a TV station's TriCaster in RTL languages.
I wonder whether the order of the steps of your file creation might be part of the issue? Just maybe
Code:
put numtoByte(238) & numtoByte(187) & numtoByte(191) before tVariable 
put textencode(tVariable,"UTF-8") into URL ("binfile:" & tFileName & ".csv")
 I don't know if that will be any use though.
 
I'm not well versed with BOM so I looked it up: 
https://en.wikipedia.org/wiki/Byte_order_mark
UTF-8 has no 'endian-ness', so there the BOM does not indicate a byte order; it only acts as a signature. The BOM is the code point U+FEFF: it's 0xFEFF in UTF-16 (big-endian), and in UTF-8 it's the three bytes 0xEF 0xBB 0xBF.
However - forgive me if I'm wrong - the code above looks like you're adding binary data to a string and then converting this again to binary data.
Would the correct process not be to add the BOM as a string and then convert the whole variable to binary? Or to add the binary BOM to data that is already binary?
e.g.
Code:
put numToCodepoint(0xFEFF) before tVariable 
put textEncode (tVariable, "UTF-8") into URL ("binfile:" & tFilePath)
Or perhaps that first line should be
Code:
put numToCodepoint(0xEF) & numToCodepoint(0xBB) & numToCodepoint(0xBF) before tVariable
I may be waaaay off, I'm really just commenting for my own learning...
Also, I'm not sure whether you can concatenate binary data, but I'm guessing you can't concatenate binary with text (?)
More than likely the above is wrong - I have no way of testing, as on Mac it seems to always respect the Unicode text and I can't see any 'strange' characters.
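
A small sketch that may help when comparing the variants in this thread: it dumps the first bytes of the encoded data, so you can check whether the output really starts with the UTF-8 BOM (EF BB BF). `firstBytesAsHex` is a hypothetical helper name, not something from the posts above, and it assumes `binaryDecode` with the "H*" format returns hex digits (high nibble first) as documented.

```livecode
-- Hypothetical helper (not from the thread):
-- returns the first pCount bytes of pData as an uppercase hex string.
function firstBytesAsHex pData, pCount
   local tHex
   get binaryDecode("H*", byte 1 to pCount of pData, tHex)
   return toUpper(tHex)
end function

on mouseUp
   local tText, tEncoded
   put "NOM;ENTREPRISE;TEXTE" into tText
   -- BOM added as the character U+FEFF, then the whole string encoded once
   put textEncode(numToCodepoint(0xFEFF) & tText, "UTF-8") into tEncoded
   -- Shows the result in the message box; should be EFBBBF if the BOM is in place
   put firstBytesAsHex(tEncoded, 3)
end mouseUp
```

If the three BOM bytes are instead prepended with numToByte before textEncode is called, my understanding is that they are treated as three ordinary characters and encoded again, so the same check would be expected to show something other than EFBBBF. I have not verified the exact bytes on every platform.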
 
			
					
				Re: CSV UTF-8
				Posted: Thu Mar 21, 2024 8:31 am
				by matgarage
				SparkOut wrote: ↑Wed Mar 20, 2024 10:37 pm
I don't know anything about how using a Mac would affect this, but in the past I have used text files with a BOM, generated from LC on Windows, to be read by a TV station's TriCaster in RTL languages.
I wonder whether the order of the steps of your file creation might be part of the issue? Just maybe
Code:
put numtoByte(238) & numtoByte(187) & numtoByte(191) before tVariable 
put textencode(tVariable,"UTF-8") into URL ("binfile:" & tFileName & ".csv")
 I don't know if that will be any use though.
 
The "strange" characters are still there with this option.
 
			
					
				Re: CSV UTF-8
				Posted: Thu Mar 21, 2024 8:36 am
				by matgarage
				stam wrote: ↑Thu Mar 21, 2024 1:04 am
Code:
put numToCodepoint(0xFEFF) before tVariable 
put textEncode (tVariable, "UTF-8") into URL ("binfile:" & tFilePath)
 
It works!!! Nice.
Thanks to all for your help
 
			
					
				Re: CSV UTF-8
				Posted: Thu Mar 21, 2024 10:03 am
				by stam
				matgarage wrote: ↑Thu Mar 21, 2024 8:36 am
stam wrote: ↑Thu Mar 21, 2024 1:04 am
Code:
put numToCodepoint(0xFEFF) before tVariable 
put textEncode (tVariable, "UTF-8") into URL ("binfile:" & tFilePath)
 
It works!!! Nice.
Thanks to all for your help
 
Glad that helped! I learned something new too 

Maybe edit the title of the original post and add "[SOLVED]" after the title or some such for others that may have a similar issue?
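
For anyone finding this thread later, here is the handler from the first post with the working change folded in. Treat it as a sketch rather than a definitive version: it assumes a LiveCode version that provides textEncode and numToCodepoint, and the local variable declarations are my addition.

```livecode
on mouseUp pMouseButton
   local tVariable, tFileName
   put "NOM;ENTREPRISE;TEXTE" & CR into tVariable
   put "Matthieu;Garage;Hespéridée" after tVariable
   ask file "Choisissez la destination de l'export" with "BD_test"
   if the result is not "cancel" then
      put it into tFileName
      -- Prepend the BOM as a character (U+FEFF), not as raw bytes
      put numToCodepoint(0xFEFF) before tVariable
      -- Encode the whole string once, then write the bytes unchanged with "binfile:"
      put textEncode(tVariable, "UTF-8") into URL ("binfile:" & tFileName & ".csv")
   end if
end mouseUp
```

The key points are the order (BOM character first, a single textEncode afterwards) and the use of "binfile:" so that the encoded bytes are written without any further text conversion.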