Convert to HTML entities
Posted: Mon Mar 21, 2011 11:11 am
by exheusden
I have written a little function that has served well to convert accented characters to HTML entities. At least, it worked well until I added some upper-case characters. Now the upper-case characters are converted to the entities for their lower-case equivalents.
Here's a little script I wrote to test the function:
Code:
on mouseUp
   put textToEntities(fld "testIn") into fld "testOut"
end mouseUp

function textToEntities textToCheck
   put "&,ä,á,à,â,ã,æ,å,ë,è,é,ê,í,ï,î,ñ,ö,ó,ò,ô,õ,ü,ú,ù,ç,Å,Â,Æ,Ö,Ü,—,£,¡" into textTable
   put "&amp;,&auml;,&aacute;,&agrave;,&acirc;,&atilde;,&aelig;,&aring;,&euml;,&egrave;,&eacute;,&ecirc;,&iacute;,&iuml;,&icirc;,&ntilde;,&ouml;,&oacute;,&ograve;,&ocirc;,&otilde;,&uuml;,&uacute;,&ugrave;,&ccedil;,&Aring;,&Acirc;,&AElig;,&Ouml;,&Uuml;,&mdash;,&pound;,&iexcl;" into entityTable
   put 0 into itemCount
   repeat for each item textTableItem in textTable
      get textTableItem
      add 1 to itemCount
      replace it with item itemCount of entityTable in textToCheck
   end repeat
   return textToCheck
end textToEntities
So, the card contains just two fields, testIn and testOut, together with a button containing the above code. Whatever is typed into testIn is, upon pressing the button, placed into testOut, with HTML entity codes replacing special characters.
This works fine for lower-case special characters: å â æ are correctly converted to &aring; &acirc; &aelig;, for example.
But upper-case characters are incorrectly converted to their lower-case equivalents: Å Â Æ go to &aring; &acirc; &aelig; and Ö Ü to &ouml; &uuml; (instead of &Aring; &Acirc; &AElig; and &Ouml; &Uuml; respectively).
What am I doing wrongly?
Re: Convert to HTML entities
Posted: Mon Mar 21, 2011 11:48 am
by BvG
By default, LC is not case sensitive, so Ö matches ö in your comparison. Set the caseSensitive to true, and it should work fine.
Alternatively, you could use the htmlText of a field, instead of doing this by hand:
Set the text of field "convert" to theData
put the htmlText of field "convert"
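For the first suggestion, here is a minimal sketch of the function from the first post with the caseSensitive property switched on. The tables are shortened to a few characters for readability; this has not been run against the original stack, so treat it as an illustration rather than a drop-in replacement.

```livecode
function textToEntities textToCheck
   -- caseSensitive is local to this handler and resets when it ends;
   -- with it set to true, "replace" treats Ö and ö as different characters
   set the caseSensitive to true
   -- shortened tables, for illustration only
   put "&,ö,ü,Ö,Ü" into textTable
   put "&amp;,&ouml;,&uuml;,&Ouml;,&Uuml;" into entityTable
   put 0 into itemCount
   repeat for each item textTableItem in textTable
      add 1 to itemCount
      replace textTableItem with item itemCount of entityTable in textToCheck
   end repeat
   return textToCheck
end textToEntities
```

The "&" entry has to stay first in the table, otherwise the ampersands introduced by the earlier replacements would themselves be escaped.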
Re: Convert to HTML entities
Posted: Mon Mar 21, 2011 1:24 pm
by exheusden
Thanks, BvG.
I understand the caseSensitive explanation, but not so much the alternative method using htmlText. I'll do some testing to see if I can better understand it and perhaps even get it to work!
Re: Convert to HTML entities
Posted: Mon Mar 21, 2011 1:58 pm
by exheusden
I tried the following:
Code:
put the htmlText of fld "testIn" into fld "testOut"
and it worked fine, even when I copied a (Welsh) text and pasted it into testIn: testOut showed all the correct HTML entities, even those for specifically Welsh characters, such as circumflex-w (upper and lower) and circumflex-y (upper and lower).
So I then tried it with data read from the original Welsh text file in a RunRev script:
Code:
put the htmlText of it into chapterNameText
Unfortunately, this fails with the error, "button "Test eBook": execution error at line 168 (Chunk: error in object expression) near "Malcym ", char 24"
("Malcym" is the start of the text.)
The text being read is the same; the text file is in UTF-8 format.
Re: Convert to HTML entities
Posted: Mon Mar 21, 2011 3:11 pm
by exheusden
Perhaps I should indicate that I am using RevMedia and not LiveCode. Also, I'm running under MacOS X (10.6.6).
I have found that the Welsh text I wish to convert to include HTML entities is handled differently when I copy and paste part of it for testing purposes than when it is read directly from its file in a Revolution script.
An example:
When copied and pasted, the following text is shown correctly and is converted correctly:
"'Holi-des! Holi-des! Isio mynd! Isio mynd!' swniodd y plant gan neidio i fyny ac i lawr o flaen Ifor lle bynnag yr âi. "
(Note the a-circumflex in the final word.) This can be converted perfectly well to include the correct HTML entity.
However, when read and placed into the same field, the text is shown as follows:
"'Holi-des! Holi-des! Isio mynd! Isio mynd!' swniodd y plant gan neidio i fyny ac i lawr o flaen Ifor lle bynnag yr √¢i."
(Note the strange characters which have replaced the a-circumflex.) This converts incorrectly.
All special characters seem to be replaced.
Why is there a difference? Encoding method? Something else?
Re: Convert to HTML entities
Posted: Mon Mar 21, 2011 4:52 pm
by bn
Hi exheusden,
this is complicated stuff. I only get there with trial and error.
You might want to have a look at this:
http://www.runrev.com/developers/lesson ... evolution/
http://livecode.byu.edu/unicode/encoding.php
http://livecode.byu.edu/unicode/unicodeInRev.php
or put the whole of the following line
into your message box and it will load the sample stack upon hitting return key. Very useful examples.
Having said all that I gave it a try and can load a UTF-8 file on my Mac that I created from your example sentence to display correctly in a field, with the accented a.
I don't know what accented characters there are in Gaelic, but you may want to give this code a try.
Code:
on mouseUp
   -- get the path to your UTF-8 text file; I put a test text file with your
   -- sample text on the desktop
   put (specialFolderPath("desktop") & "/gälic.txt") into tPath
   -- important here is the "binfile", otherwise Revolution converts the high-ASCII
   -- characters to match the platform (Mac/Windows)
   put url ("binfile:" & tPath) into tData
   set the text of field 1 to uniDecode(uniEncode(tData,"utf8"),"Ansi")
end mouseUp
or a variation of the above for the last line taken from Devin Asay's example stack:
Code:
set the unicodetext of fld 1 to uniencode(tData,"UTF8")
Unicode confuses me utterly. There are some people here on the forum that know far more about this than I do.
Kind regards
Bernd
Re: Convert to HTML entities
Posted: Mon Mar 21, 2011 5:37 pm
by exheusden
Bernd, many thanks indeed for all this.
I have just been through Devin Asay's stack and I can well understand that Unicode confuses you utterly! It would perhaps not be so bad if there were just one of the encodings, but there are both the 8 and 16 variations. (I tried to open a UTF-8 file from the "Read Unicode (UTF-16) Text File" button in the stack and this caused Revolution to hang!) Makes things very complicated.
Anyway, I'm sure that, thanks to your help, I will be able to find a way out of my current Welsh-reading problems.
Re: Convert to HTML entities
Posted: Mon Mar 21, 2011 7:07 pm
by bn
Hi Exheusden,
if you like, you may want to post the Welsh file and I'll give it a try. I really won't tell anybody what it says...

Is it a Windows, Linux or Mac file? Just asking because of the line-ending character(s).
You would have to zip the file for upload.
Kind regards
Bernd
Re: Convert to HTML entities
Posted: Tue Mar 22, 2011 12:39 am
by BvG
htmlText is a field property, you can't get or set the htmlText of variables or custom properties, or files, or any container but fields (and also button labels, I think). That is why you can't set the htmlText of it (which is a variable).
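A minimal sketch of that workaround, assuming a spare (possibly hidden) field named "convert" as in the earlier suggestion. The function name textToHtml is made up for illustration, and the text passed in is assumed to already be in the platform's native encoding rather than raw UTF-8.

```livecode
function textToHtml pText
   -- htmlText only exists on a field, so route the variable's text through one
   set the text of field "convert" to pText
   return the htmlText of field "convert"
end textToHtml
```

It would be called as `put textToHtml(it) into chapterNameText`. Note that the htmlText of a field wraps each line in `<p>` and `</p>` tags, which may need stripping afterwards.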
Re: Convert to HTML entities
Posted: Tue Mar 22, 2011 9:04 am
by exheusden
Is there a way of reading Unicode files using the equivalent of a Read… until command?
The structure of the text file is such that this makes its processing much easier than having to deal with the whole text at once, as I understand is done with the put url command. (The text file contains "markers" to indicate chapter headings, for example.)
(I suppose I could try using chunks (though the markers are three characters in length), but this is less elegant and chunks seem to be problematic with Unicode, if I understand the User's Guide and Asay's stack correctly, so if a read… until is possible, it would be excellent.)
Re: Convert to HTML entities
Posted: Tue Mar 22, 2011 9:46 am
by exheusden
[quote="bn"]if you like you may want to post the Welsh file and I give it a try. I really won't tell anybody what it says...

Is it a Windows, Linux or Mac file?[/quote]
No problem, here is a zipped version of (part of) the (Mac-generated) text file.
Note that the "#!@" series of characters is the chapter marker.
- Welsh stories.txt.zip
- Copyright. Not for publication. For test purposes only.
- (14.21 KiB) Downloaded 295 times
Re: Convert to HTML entities
Posted: Tue Mar 22, 2011 12:30 pm
by bn
Hi Exheusden,
This code reads the whole text into a field:
Code:
on mouseUp
   -- path to UTF-8 file
   put "/Users/userName/Documents/Revolution test stacks/utilities/RunRevForum/Exheusden/Welsh stories.txt" into tPath
   -- important here is the "binfile", otherwise Revolution converts the high-ASCII
   -- characters to match the platform (Mac/Windows)
   put url ("binfile:" & tPath) into tData
   set the unicodeText of fld 1 to uniEncode(tData,"UTF8")
end mouseUp
This code reads the first 'chapter' into field 3. It works, but I guess only by luck...
Code:
on mouseUp
   put the unicodeText of field 1 into tData
   put offset("#",tData) into tWhere
   -- note: with a skip value, offset() counts from the end of the skipped
   -- characters, not from the start of tData
   put offset("#",tData,tWhere) into tWhere
   set the unicodeText of field 3 to char 1 to tWhere of tData
end mouseUp
For reading until a character, look at
open file (myFilePath & "/" & myFileName) for binary read
read from file (myFilePath & "/" & myFileName) until "@"
etc. -> dictionary. It should work for your UTF-8 file, but I have not tested it.
http://lessons.runrev.com/spaces/lesson ... nary-File-
Kind regards
Bernd
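A rough sketch of how the read ... until approach might be combined with the "#!@" chapter marker. It has not been tested against the attached file. The path, the variable names and the choice to show only the first chapter are assumptions made for illustration. Searching the raw bytes for the marker should be safe because the marker is plain ASCII, and ASCII bytes never occur inside a multi-byte UTF-8 sequence.

```livecode
on mouseUp
   -- hypothetical location; replace with the real path to the UTF-8 file
   put specialFolderPath("desktop") & "/Welsh stories.txt" into tPath
   open file tPath for binary read
   put 0 into tCount
   repeat forever
      read from file tPath until "#!@"
      -- capture both results right away; "the result" is "eof" at end of file
      put the result into tStatus
      put it into tChunk
      -- the data read includes the marker itself, so trim it off
      if tChunk ends with "#!@" then delete char -3 to -1 of tChunk
      if tChunk is not empty then
         add 1 to tCount
         put tChunk into tChapters[tCount]
      end if
      -- stop on end of file or on any read error
      if tStatus is not empty then exit repeat
   end repeat
   close file tPath
   -- each stored chapter is still raw UTF-8; convert it only for display
   if tCount > 0 then
      set the unicodeText of field 1 to uniEncode(tChapters[1], "UTF8")
   end if
end mouseUp
```

Keeping the chapters as raw UTF-8 until display avoids doing chunk operations on UTF-16 data, which is where the offset-based version above gets fragile.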