text conversion?



richmond62
Livecode Opensource Backer
Posts: 10192
Joined: Fri Feb 19, 2010 10:17 am

Re: text conversion?

Post by richmond62 » Tue Feb 27, 2024 1:47 pm

Here is a seriously frightening list of non-Unicode font encodings:

https://philip.html5.org/data/charsets-2.html

Klaus
Posts: 14249
Joined: Sat Apr 08, 2006 8:41 am

Re: text conversion?

Post by Klaus » Tue Feb 27, 2024 1:53 pm

joeMich wrote:
Tue Feb 27, 2024 1:19 pm
Yes, Klaus
simply put into a text-file...
OK, as I guessed. Try this script: it will NOT do the replace thing, but encodes your exported text as UTF-8, and chances are good
that PHP will recognize the format as UTF-8 and act accordingly.
However, I have no idea about PHP; maybe something else has to be set in PHP for this?

Anyway, see line 127 of my script:

Code:

on mouseUp
   
   #   set the itemdel to "/"
   #   get the effective filename of this stack
   #   delete item -1 of it   
   #   put it & "/" into stakkensFilsti
   
   ## We can now use:
   put specialfolderpath("resources") & "/" into stakkensFilsti
   
   put stakkensFilsti & "xport af versemaal/" into dataMappensFilsti
   put dataMappensFilsti & "versemaal.txt" into filensSti
   
   put "file:" & filensSti into destFil
   ---this is the file that is either created or written to
   
   put "versemålsnr,linjer,metrik,stavelser2" into meterListen
   put "CSnr,KSnr,Hjsk19nr,Hjsk17nr,GTLnr,DDTnr,glDDSnr,nyDDSnr,vmliste,vers" into salmeListen
   put "nyDDKnr,glDDKnr,andenMelBognr,prefNr,prefGlNr" into koralListen
   put "" into tempResultat
   --put tab into adskilningstegn
   put "|" into adskilningstegn
   
   set cursor to busy
   set lockScreen to true
   put the seconds into startTid
   
   put "0" into antalKortBearbejdet
   put "1" into linNumresultat
   put the number of cards of this stack into tempSidsteSide
   repeat with x = 2 to tempSidsteSide -------357 -----all the cards!!!
      go card x
      put "" into tempTempResultat ---the temporary output (for this card!)
      -------put "" into tempResultat ---the final output (for all defined cards!)
      
      repeat with i = 1 to the number of lines in fld "vmliste"
         
         
         set the itemDelimiter to ","
         repeat with u = 1 to the number of items in meterListen
            put fld (item u of meterListen) & adskilningstegn after line i of tempTempResultat
         end repeat
         
         
         
         set the itemDelimiter to ","
         repeat with u = 1 to the number of items in salmeListen
            if item u of salmeListen = "vmliste" then
               --------- this needs some extra processing
               put line i of fld (item u of salmeListen) into afkortetLinje
               delete word 1 to 2 of afkortetLinje
               if char 1 of afkortetLinje = " " then
                  delete char 1 of afkortetLinje
               end if
               
               ---get afkortetLinje
               put afkortetLinje into tempAfkortet
               if ";" is in tempAfkortet then
                  put char 1 to offset (";", tempAfkortet) - 1 of tempAfkortet into afkortetLinje
               else if "(" is in tempAfkortet then
                  put char 1 to offset ("(", tempAfkortet) - 1 of tempAfkortet into afkortetLinje
               else
                  put tempAfkortet into afkortetLinje
               end if
               
               replace tab with "" in afkortetLinje
               put afkortetLinje & adskilningstegn after line i of tempTempResultat
               ---put erstatVanskeligeBogstaver(afkortetLinje) & adskilningstegn after line i of tempTempResultat
               -----------
            else
               if line i of fld (item u of salmeListen) = "-" then
                  put "" & adskilningstegn after line i of tempTempResultat
               else if line i of fld (item u of salmeListen) = "÷" then
                  put "" & adskilningstegn after line i of tempTempResultat
               else if line i of fld (item u of salmeListen) = "" then
                  put "" & adskilningstegn after line i of tempTempResultat
               else
                  put line i of fld (item u of salmeListen) & adskilningstegn after line i of tempTempResultat
               end if
            end if
         end repeat
         
         
         
         set the itemDelimiter to ","
         repeat with u = 1 to the number of items in koralListen
            put "" into koralPræfiks
            put item u of koralListen into aktuelKoralbog
            if aktuelKoralbog = "nyDDKnr" then
               put "K " into koralPræfiks
            else if aktuelKoralbog = "glDDKnr" then
               put "gK " into koralPræfiks
            end if
            if line i of fld (item u of koralListen) = "-" then
               put "" & adskilningstegn after line i of tempTempResultat
            else if line i of fld (item u of koralListen) = "÷" then
               put "" & adskilningstegn after line i of tempTempResultat
            else if line i of fld (item u of koralListen) = "" then
               put "" & adskilningstegn after line i of tempTempResultat
            else
               put line i of fld (item u of koralListen) into tempKoralLinje
               put "," & koralPræfiks into kommaPræfiks
               replace "," with kommaPræfiks in tempKoralLinje
               put koralPræfiks & tempKoralLinje & adskilningstegn after line i of tempTempResultat
            end if
         end repeat
         
         put adskilningstegn after line i of tempTempResultat
         put "jabadaba" after line i of tempTempResultat
         replace tab with "" in tempTempResultat
      end repeat      
      ------put return & tempTempResultat after url destFil
      ---put the number of lines of tempResultal into linNumresultat
      ---add 1 to linNumresultat
      put  tempTempResultat & return after tempResultat
      ---put tempTempResultat into line linNumresultat of tempResultat
      ---put the number of lines of tempResultal into linNumresultat
      ---add 1 to linNumresultat
      add 1 to antalKortBearbejdet
   end repeat ---- repeat loop for all the cards!!
   
   --put uniEncode(tempResultat) into url destFil
   ##########################################################
   
   ## Do NOT do the REPLACE thing here; put the textEncoded text directly into the target file.
   ## textEncode returns binary data, so write it through a "binfile:" URL to the file itself:
   put textEncode(tempResultat,"UTF-8") into url ("binfile:" & filensSti)
   
   ###########################################################
   set lockScreen to false
end mouseUp
Hope I got your file references right -> destFil
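A quick way to confirm what such an export actually puts on disk is to write a known character and read the raw bytes back. This is an illustrative sketch only: the file name "encoding-test.txt" is hypothetical, and it uses the temporary folder rather than the stack's folder so that the real export file is not touched.

```livecode
on mouseUp
   local tPath, tBytes, tByte, tReport
   -- Hypothetical test file in the temporary folder
   put specialFolderPath("temporary") & "/encoding-test.txt" into tPath
   -- Write "å" (U+00E5) as UTF-8; a binfile: URL stores the bytes unchanged
   put textEncode(numToCodepoint(229), "UTF-8") into url ("binfile:" & tPath)
   -- Read the raw bytes back and list their decimal values
   put url ("binfile:" & tPath) into tBytes
   repeat for each byte tByte in tBytes
      put byteToNum(tByte) & space after tReport
   end repeat
   answer "Bytes on disk:" && tReport   -- expected: 195 165 (hex C3 A5)
end mouseUp
```

If the same check on the real export shows single bytes above 127 instead of such pairs, the file is not UTF-8.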

richmond62
Livecode Opensource Backer
Posts: 10192
Joined: Fri Feb 19, 2010 10:17 am

Re: text conversion?

Post by richmond62 » Tue Feb 27, 2024 2:08 pm

Also . . .

It might be useful to remember that char hex 00FC (ü as one precomposed character) is NOT the same as hex 0075 (u) + hex 0308 (combining umlaut).

Even though they look the same. 8)
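The difference is easy to see in LiveCode itself. This sketch (the function name compositionReport is made up for illustration) builds both forms of ü from their codepoints, counts codepoints and UTF-8 bytes, and uses normalizeText with "NFC" to fold the two-codepoint form back into U+00FC. Text written in one form will not match the other byte for byte (for example in a PHP comparison) unless both sides are normalized the same way.

```livecode
function compositionReport
   local tPrecomposed, tDecomposed, tReport
   put numToCodepoint(252) into tPrecomposed                      -- U+00FC
   put numToCodepoint(117) & numToCodepoint(776) into tDecomposed -- U+0075 U+0308
   put "codepoints:" && (the number of codepoints in tPrecomposed) && "vs" \
         && (the number of codepoints in tDecomposed) & return after tReport
   put "UTF-8 bytes:" && (the number of bytes in textEncode(tPrecomposed, "UTF-8")) \
         && "vs" && (the number of bytes in textEncode(tDecomposed, "UTF-8")) & return after tReport
   -- NFC composes u + combining umlaut back into the single codepoint U+00FC
   put "after NFC:" && codepointToNum(normalizeText(tDecomposed, "NFC")) after tReport
   return tReport
end compositionReport
```

Calling answer compositionReport() should report 1 vs 2 codepoints, 2 vs 3 UTF-8 bytes, and 252 after NFC.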
-
Screenshot 2024-02-27 at 15.07.49.png
-
So, don't get distracted by diversions . . .
-
umleitung.png

Klaus
Posts: 14249
Joined: Sat Apr 08, 2006 8:41 am

Re: text conversion?

Post by Klaus » Tue Feb 27, 2024 2:13 pm

I'd NEVER take an Ü for an Ü! :-D

richmond62
Livecode Opensource Backer
Posts: 10192
Joined: Fri Feb 19, 2010 10:17 am

Re: text conversion?

Post by richmond62 » Tue Feb 27, 2024 2:20 pm

Just so long as the spots are not on "U". 8)
-
spotty.jpg

joeMich
Posts: 20
Joined: Tue Jun 06, 2006 8:24 am

Re: text conversion?

Post by joeMich » Tue Feb 27, 2024 3:12 pm

Thank you for the input.
I shall dig into it when I get home later

And, ha ha. I like your sense of humour 😜

richmond62
Livecode Opensource Backer
Posts: 10192
Joined: Fri Feb 19, 2010 10:17 am

Re: text conversion?

Post by richmond62 » Tue Feb 27, 2024 3:25 pm

HOWEVER: there is plenty of room for further complications.

Here's a Windows 1251 Cyrillic font I have opened:
-
Screenshot 2024-02-27 at 16.18.53.jpg
-
The Cyrillic letters do NOT have Unicode addresses, so it is unclear how LiveCode could effect a conversion in the way I indicated in my example stack.
-
Especially as LiveCode, apparently, does NOT offer Windows 1251:
-
textEncode.jpg
-
I dug out the literature on my Cyrillic converter I wrote about 20 years ago, and as the Cyrillic letters in the original texts did NOT adhere to any "known" encoding (i.e. someone had just bunged them into the second ASCII table) that would not have presented any complications.
Last edited by richmond62 on Tue Feb 27, 2024 3:52 pm, edited 1 time in total.
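Since textDecode offers no Windows-1251 entry, a decoder can be written by hand in the same crude lookup style. This is a partial sketch under stated assumptions: cp1251ToText is a hypothetical name, the input is assumed to be raw bytes (for example read through a "binfile:" URL), and only ASCII, the 64 letters А..я and Ё/ё are mapped. It relies on Windows-1251 bytes 192-255 holding А..я in the same order as Unicode U+0410..U+044F, so a fixed offset of 848 does the job; every other high byte becomes "?".

```livecode
function cp1251ToText pBytes
   -- Hypothetical helper: hand-made, PARTIAL Windows-1251 decoder
   local tResult, tByte, tCode
   repeat for each byte tByte in pBytes
      put byteToNum(tByte) into tCode
      if tCode < 128 then
         put numToCodepoint(tCode) after tResult          -- ASCII is unchanged
      else if tCode >= 192 then
         put numToCodepoint(tCode + 848) after tResult    -- 192 -> U+0410 (1040)
      else if tCode = 168 then
         put numToCodepoint(1025) after tResult           -- U+0401 Ё
      else if tCode = 184 then
         put numToCodepoint(1105) after tResult           -- U+0451 ё
      else
         put "?" after tResult                            -- not mapped in this sketch
      end if
   end repeat
   return tResult
end cp1251ToText
```

Usage would be something like put cp1251ToText(url ("binfile:" & tPath)) into field "Output", where tPath and the field name are placeholders.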

richmond62
Livecode Opensource Backer
Posts: 10192
Joined: Fri Feb 19, 2010 10:17 am

Re: text conversion?

Post by richmond62 » Tue Feb 27, 2024 3:44 pm

Aha: with a little bit of poking around one can also find that even with Windows 1251 one should be able to convert a text with this encoding in the incredibly crude way I use in my example:
-
Screenshot 2024-02-27 at 16.34.32.png
-
Now . . . the Unicode address for Ã is hex 00C3.

So, for the sake of argument, if I install this font (ER Bukinist 1251) into my macOS 12 system and do this:
-
Screenshot 2024-02-27 at 16.42.58.png
-
One can see that LiveCode CAN retrieve those addresses . . . it just involves a lot more faffing around for the programmer. :?
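The addresses can also be listed from a script. A minimal sketch, assuming LiveCode 7 or later (codepoint chunks): the hypothetical function codepointList returns each codepoint of a string with its address in U+ hex notation, using codepointToNum and baseConvert. Given the precomposed ü it returns one line; given u plus the combining umlaut it returns two.

```livecode
function codepointList pText
   -- One line per codepoint: the character, a tab, and its U+ address
   local tList, tCP, tHex
   repeat for each codepoint tCP in pText
      put baseConvert(codepointToNum(tCP), 10, 16) into tHex
      repeat while length(tHex) < 4
         put "0" before tHex                  -- pad to four hex digits
      end repeat
      put tCP & tab & "U+" & tHex & return after tList
   end repeat
   if tList is not empty then
      delete the last char of tList           -- drop the trailing return
   end if
   return tList
end codepointList
```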

richmond62
Livecode Opensource Backer
Posts: 10192
Joined: Fri Feb 19, 2010 10:17 am

Re: text conversion?

Post by richmond62 » Tue Feb 27, 2024 3:50 pm

Oh; fantastic!

If one opens a Windows 1251 font using Fontforge (instead of a popular and extremely expensive commercial font editor) the Hex addresses are 'right there' for all to see:
-
Screenshot 2024-02-27 at 16.46.16.png
-
Screenshot 2024-02-27 at 16.48.30.jpg
-
Super: another reason why I should stick with Open Source software.

When in doubt I always prefer a spade to a rotavator.

joeMich
Posts: 20
Joined: Tue Jun 06, 2006 8:24 am

Re: text conversion?

Post by joeMich » Tue Feb 27, 2024 5:05 pm

But is the problem necessarily tied to a font?
Or are you just using the font (scheme) to see what to translate to?

When the webpage (PHP script) reads the text-file, no font is specified

richmond62
Livecode Opensource Backer
Posts: 10192
Joined: Fri Feb 19, 2010 10:17 am

Re: text conversion?

Post by richmond62 » Tue Feb 27, 2024 5:37 pm

When the webpage (PHP script) reads the text-file, no font is specified
The font should not be a problem, but the font layout may be.

The Unicode font layout scheme is meant to take over from ALL previous encodings, but that has not happened.

If you import text that uses a different font layout from the one you use in your LiveCode stack there will be a mismatch and you will be unable to read your text.
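In a text file the same mismatch shows up at the byte level: the file only stores numbers, and the reader has to interpret them with the same encoding the writer used. The handler below is a small illustration, not taken from the thread: it encodes å as UTF-8 and then decodes those two bytes once as UTF-8 and once as ISO-8859-1.

```livecode
on mouseUp
   local tUTF8, tReport
   -- "å" (U+00E5) written as UTF-8 becomes the two bytes C3 A5
   put textEncode(numToCodepoint(229), "UTF-8") into tUTF8
   -- Decoding with the writer's encoding gives the original character back
   put "read as UTF-8:" && textDecode(tUTF8, "UTF-8") & return after tReport
   -- Decoding with another encoding turns the same bytes into "Ã¥"
   put "read as ISO-8859-1:" && textDecode(tUTF8, "ISO-8859-1") after tReport
   answer tReport
end mouseUp
```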

joeMich
Posts: 20
Joined: Tue Jun 06, 2006 8:24 am

Re: text conversion?

Post by joeMich » Tue Feb 27, 2024 11:12 pm

Now I think I got it working

I made some tests with "textEncode" in a stack (attachment)

When I tried the "UTF-16" version, it looked much like the one I had made earlier. So I picked out the chars that I knew would be used and put the conversion into my "translator":

Code:

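   --little translator - otherwise some chars are not shown right in the web-page!!!!
   -- LiveCode compares text case-insensitively by default, so without the next
   -- line "Å" would match the "å" test below (and "ü" the "Ü" test)
   set the caseSensitive to true
   put empty into tempResultat2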
   repeat with i = 1 to the number of chars in tempResultat
      put char i of tempResultat into tChar
      
      if tChar = "§" then 
         put "ß" into tChar
      else if tChar = "å" then 
         put "Â" into tChar
      else if tChar = "Å" then 
         put "≈" into tChar
      else if tChar = "æ" then 
         put "Ê" into tChar
      else if tChar = "ø" then
         put "¯" into tChar
      else if tChar = "Æ" then 
         put "∆" into tChar
      else if tChar = "Ø" then 
         put "ÿ" into tChar
      else if tChar = "é" then 
         put "È" into tChar
      else if tChar = "á" then 
         put "·" into tChar
      else if tChar = "È" then 
         put "»" into tChar
      else if tChar = "À" then 
         put "¿" into tChar
      else if tChar = "ú" then 
         put "˙" into tChar
      else if tChar = "Ù" then 
         put "Ÿ" into tChar
      else if tChar = "Ü" then 
         put "‹" into tChar
      else if tChar = "ö" then 
         put "ˆ" into tChar
      else if tChar = "ü" then 
         put "¸" into tChar
      else if tChar = "ä" then 
         put "‰" into tChar
      end if
      
      put tChar after tempResultat2
   end repeat
   

   put tempResultat2 into url destFil
That's the only "conversion" I make upon exporting to text-file
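Every pair in this translator follows one rule: the character is replaced by the Mac Roman character that has the byte value the original has in ISO-8859-1 (å is byte E5 in ISO-8859-1, and E5 in Mac Roman is Â). Assuming the "file:" URL writes text in the Mac's native Mac Roman encoding, the file then ends up holding ISO-8859-1 bytes. A sketch of the same rule for every ISO-8859-1 character, with a hypothetical function name:

```livecode
function isoBytesForMacFile pText
   -- Take each character's ISO-8859-1 byte and reinterpret that byte as
   -- Mac Roman, e.g. "å" (E5) -> "Â" and "§" (A7) -> "ß"
   return textDecode(textEncode(pText, "ISO-8859-1"), "MacRoman")
end isoBytesForMacFile
```

This appears to be the same job macToISO does, as Klaus suggests further down. It is Mac-specific, and characters that do not exist in ISO-8859-1 (curly quotes, for instance) cannot be carried over this way.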

From the webpage I import/read the file as an array (through PHP).
Most of the material is numbers. But I had to do something with the title-part of the array.
Here I did

Code:

$titel[] = utf8_encode($linjeArray[12]);
I must admit that I don't understand it - but it works for now
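What utf8_encode does here: because of the translator, the file holds ISO-8859-1 bytes, and utf8_encode converts ISO-8859-1 bytes into UTF-8, presumably the encoding the web page is served in. The same conversion written in LiveCode, as a hypothetical helper for checking things on the LiveCode side:

```livecode
function latin1ToUTF8 pLatin1Bytes
   -- ISO-8859-1 bytes in, UTF-8 bytes out: the same conversion as PHP's
   -- utf8_encode() or mb_convert_encoding($s, 'UTF-8', 'ISO-8859-1')
   return textEncode(textDecode(pLatin1Bytes, "ISO-8859-1"), "UTF-8")
end latin1ToUTF8
```

For example, the single byte E5 (å in ISO-8859-1) comes out as the two bytes C3 A5.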

Thanks for your comments and for looking into the problem!

best regards
johan
text-encode-test.livecode.zip

Klaus
Posts: 14249
Joined: Sat Apr 08, 2006 8:41 am

Re: text conversion?

Post by Klaus » Wed Feb 28, 2024 1:56 pm

joeMich wrote:
Tue Feb 27, 2024 11:12 pm
From the webpage I import/read the file as an array (through PHP).
Most of the material is numbers. But I had to do something with the title-part of the array.
Here I did

Code:

$titel[] = utf8_encode($linjeArray[12]);
I must admit that I don't understand it
Maybe that is a little part of the problem?
joeMich wrote:
Tue Feb 27, 2024 11:12 pm
... - but it works for now
No it doesn't!
macOS 12.6.7
Browser: Safari latest version on the left.
Firefox latest version on the right.
dansk.png
I'm still convinced that using a correctly encoded text file will work with PHP.
PHP's "utf8_encode(...)" treats its input as iso-8859-1, and that function has been deprecated since PHP 8.2.

Did you try my last script, without all the manual character replacements?
And maybe just outputting your data from your Mac to an ISO-8859-1 encoded file will work out of the box?

Code:

...
## Collect your data...
## And convert to PHP friendly encoding:
## (macToISO expects text in the Mac character set; destFil is the "file:" URL from my script)
put macToISO(tempResultat) into url destFil
...
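Because macToISO expects text in the Mac character set, that trick is tied to exporting from a Mac. A platform-independent variant, sketched here assuming ISO-8859-1 is the target, skips the native encoding entirely: textEncode produces the ISO-8859-1 bytes and a "binfile:" URL writes them unchanged. exportAsLatin1 is a hypothetical handler name.

```livecode
on exportAsLatin1 pText, pPath
   -- textEncode gives ISO-8859-1 bytes; "binfile:" writes them unchanged,
   -- so no platform-specific native encoding is involved.
   -- Line endings stay as LiveCode's internal LF.
   put textEncode(pText, "ISO-8859-1") into url ("binfile:" & pPath)
end exportAsLatin1
```

With the names from the script above, the call would be exportAsLatin1 tempResultat, filensSti.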

joeMich
Posts: 20
Joined: Tue Jun 06, 2006 8:24 am

Re: text conversion?

Post by joeMich » Wed Feb 28, 2024 3:33 pm

You are right
There are still unexpected chars
Maybe that is a little part of the problem?
Yep :lol:

I'll try to export from LiveCode to a text-file with the encoding that you suggest.

Thanks!

joeMich
Posts: 20
Joined: Tue Jun 06, 2006 8:24 am

Re: text conversion?

Post by joeMich » Wed Feb 28, 2024 3:53 pm

Seems that this works:

from LC: macToIso

into php:

Code:

$titel[] = mb_convert_encoding($linjeArray[12],'UTF-8','ISO-8859-1');
I get strange characters if I don't use mb_convert_encoding.


Right now I think the problems are solved
