Problems with text conversions

exheusden
Posts: 170
Joined: Fri Oct 09, 2009 5:03 pm
Contact:

Problems with text conversions

Post by exheusden » Sat May 28, 2011 11:29 am

I have a stack, written in RevMedia 4, that creates ePub formats from plain text files.

I have now discovered that if I use a Unicode UTF-8 text file as input, the resulting text in the ePub document is partly garbled: curly quotes, em-dashes and some other characters are substituted incorrectly (or at least displayed incorrectly in ePub readers such as Calibre, iBooks, etc.). For example, a closing curly quote is shown as �äô

The command used to encode the text is

put textToEntities(it) & return into chapterText

where it contains the text, which is read from its file with read from file textfile until "#!@" (The "#!@" being a marker I set myself in the original text file).

How can I achieve correct character display, even from a UTF-8 format text?

BvG
VIP Livecode Opensource Backer
Posts: 1239
Joined: Sat Apr 08, 2006 1:10 pm
Contact:

Re: Problems with text conversions

Post by BvG » Sat May 28, 2011 2:42 pm

I'm not sure I understand your question, but most likely, somewhere in your code you need one or both of these functions:

Code:

-- Native (Rev) text -> UTF-16 -> UTF-8 bytes, e.g. before writing a file
function createUtf8TextfromRevText theText
   return unidecode(uniencode(theText),"utf8")
end createUtf8TextfromRevText

Code:

-- UTF-8 bytes -> UTF-16 -> native (Rev) text, e.g. after reading a file
function getRevTextFromUtf8 theText
   return unidecode(uniencode(theText,"utf8"))
end getRevTextFromUtf8

Various teststacks and stuff:
http://bjoernke.com

Chat with other RunRev developers:
chat.freenode.net:6666 #livecode

exheusden
Posts: 170
Joined: Fri Oct 09, 2009 5:03 pm
Contact:

Re: Problems with text conversions

Post by exheusden » Sat May 28, 2011 5:24 pm

The variable "it" already contains UTF-8 text; that is the text that is read from a Unicode UTF-8 formatted file.

Is it then still necessary to go through this uniencode-unidecode process?

And what would happen with a plain text file (MacOS Roman, for example)?

BvG
VIP Livecode Opensource Backer
Posts: 1239
Joined: Sat Apr 08, 2006 1:10 pm
Contact:

Re: Problems with text conversions

Post by BvG » Sat May 28, 2011 6:59 pm

For UTF-8 text, that is the proper approach, weird as it is, I know. Basically it converts the UTF-8 text to UTF-16 text, and then converts that to text a Rev field can handle (the platform's native encoding).

No, for "normal" text files you do not decode as if they were UTF-8.
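
A minimal sketch of that read path, assuming a pre-7.0 engine such as RevMedia 4, where uniEncode and uniDecode are the available conversion functions. The handler name readUtf8File and its parameter are hypothetical and do not come from the posts above. It reads the file through binfile: so the bytes arrive untouched, then performs the UTF-8 to UTF-16 to native conversion just described:

```livecode
-- Hypothetical helper: read a UTF-8 file as raw bytes and return native (Rev) text.
-- Assumes an engine where uniEncode/uniDecode handle the conversion.
function readUtf8File pPath
   local tBytes
   -- binfile: returns the bytes as stored, with no line-ending translation
   put URL ("binfile:" & pPath) into tBytes
   -- the result is non-empty if the file could not be read
   if the result is not empty then return empty
   -- UTF-8 -> UTF-16 -> native single-byte text
   return uniDecode(uniEncode(tBytes, "UTF8"))
end readUtf8File
```

Characters with no equivalent in the platform's native character set cannot survive the last step. Curly quotes and em-dashes exist in MacRoman, so on a Mac they should come through.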

exheusden
Posts: 170
Joined: Fri Oct 09, 2009 5:03 pm
Contact:

Re: Problems with text conversions

Post by exheusden » Sun May 29, 2011 11:56 am

I have tried using both functions. I tested one function at a time.

After having read the text, I then passed it to the function being tested and worked further with the text returned by that function.

The result was just the same as without the use of the functions: "special" characters, such as curly quotes, em-dashes, etc. remain garbled.

I expect I am not using the functions correctly. Perhaps I have to use them both: one after having read the text from the input file and one prior to writing the text to the output file. If so, which is which?

BvG
VIP Livecode Opensource Backer
Posts: 1239
Joined: Sat Apr 08, 2006 1:10 pm
Contact:

Re: Problems with text conversions

Post by BvG » Sun May 29, 2011 2:00 pm

No, most likely the data is not UTF-8 at all, or your approach to reading the file is garbling it, or your code is wrong. This works, and I've done it before. Do you use url or open file/close file? Do you use binfile or file? Do you put stuff into fields at some point (you shouldn't do that)? Etc.

exheusden
Posts: 170
Joined: Fri Oct 09, 2009 5:03 pm
Contact:

Re: Problems with text conversions

Post by exheusden » Sun May 29, 2011 7:30 pm

The input file is certainly in UTF-8 encoding, as that is the way it is saved with TextEdit.

The file is opened with Open File and closed with Close File. No binfile.

Nothing is put into fields.

I didn't say your functions don't work; I said they didn't work when I tried them as I understood how they had to be used, which I expect is not the correct way.

What is the sequence? Am I to pass the read text to createUtf8TextfromRevText, process it and then pass it to getRevTextFromUtf8 prior to saving the result? Or perhaps something else…?
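
One way to read the sequence asked about here, as a sketch rather than a confirmed answer: decode once right after reading, do all processing on native text, and encode once just before writing. The handler name convertChapter and both path parameters are hypothetical; the two conversion functions are the ones BvG posted earlier in the thread.

```livecode
-- Sketch only: pInPath and pOutPath are hypothetical file paths.
on convertChapter pInPath, pOutPath
   local tText
   -- 1. Read the raw UTF-8 bytes and convert them to native text
   put getRevTextFromUtf8(URL ("binfile:" & pInPath)) into tText
   -- 2. Process the native text here (split at markers, textToEntities, etc.)
   -- 3. Convert back to UTF-8 and write the bytes unchanged
   put createUtf8TextfromRevText(tText) into URL ("binfile:" & pOutPath)
end convertChapter
```

Whether step 3 is needed depends on what textToEntities emits: if every non-ASCII character has already been turned into an entity, the output is plain ASCII, which is also valid UTF-8, and can be written as is.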

bn
VIP Livecode Opensource Backer
Posts: 4172
Joined: Sun Jan 07, 2007 9:12 pm

Re: Problems with text conversions

Post by bn » Sun May 29, 2011 8:08 pm

Hi Exheusden,

Could you zip the text file and upload it to the forum? There is a tab labelled "Upload attachment"; choose the zipped file from your hard disk and then click "Add the file".

Unicode is a mess and it is best to see the text file in question.

Have a look here for background on Unicode and LiveCode:

http://livecode.byu.edu/unicode/unicodeInRev.php

(not that I understand everything he explains :) )

Kind regards

Bernd

BvG
VIP Livecode Opensource Backer
Posts: 1239
Joined: Sat Apr 08, 2006 1:10 pm
Contact:

Re: Problems with text conversions

Post by BvG » Sun May 29, 2011 10:29 pm

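Alright, try this then:

Code:

answer file ""
-- parenthesize the whole URL so the path is part of it
put url ("binfile:" & it) into theData
-- UTF-8 -> UTF-16 -> native text, displayed in the first field of the card
put unidecode(uniencode(theData,"utf8")) into field 1

Garbled or not?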