Converting Unicode codes to readable text
Posted: Fri Mar 27, 2020 8:43 am
by japino
I'm getting data from a website in JSON format. As far as I understand, the text is Russian and I've put it into a field.
The text is this:
\u041f\u0440\u0438\u0432\u0435\u0442 \u043a\u0430\u043a \u0434\u0435\u043b\u0430?
Is there no simple way to convert this to readable text? I've searched the forum and I came across this:
https://forums.livecode.com/viewtopic.p ... on#p183823
Is there no easier way to convert those Unicode codes?
Re: Converting Unicode codes to readable text
Posted: Fri Mar 27, 2020 3:51 pm
by FourthWorld
Does the JSON file or the API documentation include a description of the encoding used?
Re: Converting Unicode codes to readable text
Posted: Fri Mar 27, 2020 8:45 pm
by japino
No, I checked both the returned JSON output and the API docs and encoding isn’t mentioned anywhere.
Re: Converting Unicode codes to readable text
Posted: Fri Mar 27, 2020 9:41 pm
by richmond62
Code: Select all
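on mouseUp
   -- fld "fRAW" holds the escaped text; the loop expects a final item "XXX"
   -- (i.e. the text ends with "\XXX") as a stop marker
   set the itemDelimiter to "\"
   put 2 into KOUNT -- item 1 is the empty text before the first "\"
   repeat until item KOUNT of fld "fRAW" is "XXX"
      put item KOUNT of fld "fRAW" into BUKVA
      delete char 1 of BUKVA -- drop the leading "u"
      put ("0x" & BUKVA) into MAGIC
      put numToCodepoint(MAGIC) after fld "fOUT"
      add 1 to KOUNT
   end repeat
end mouseUp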
Oddly enough the 3 words don't have gaps between them:
Привет как дела
Hey, what's up?
Re: Converting Unicode codes to readable text
Posted: Sat Mar 28, 2020 6:38 pm
by japino
Thanks for this, richmond62! So it does look like I need to convert each character one by one. I was hoping that LiveCode had some function for this that I had overlooked, but I guess not. I’ll figure out a way to make sure the space gets preserved. Thanks again.
Re: Converting Unicode codes to readable text
Posted: Sat Mar 28, 2020 7:41 pm
by richmond62
In principle, my script is a function!

Re: Converting Unicode codes to readable text
Posted: Sat Mar 28, 2020 11:42 pm
by jacque
A bit quicker, but the same idea:
Code: Select all
function doTranslate pString
   set the itemDelimiter to "\u"
   if char 1 to 2 of pString = "\u" then delete char 1 to 2 of pString -- avoid empty first item
   repeat for each item i in pString
      put numToCodepoint("0x" & i) after tTranslation
   end repeat
   return tTranslation
end doTranslate
Re: Converting Unicode codes to readable text
Posted: Sun Mar 29, 2020 8:57 am
by Thierry
Hi,
Applying your text sample with the last solution, I found 2 errors:
-- spaces are suppressed
-- last chunk \u0430? breaks the code (error with numToCodepoint)
So, here is my take on this:
Code: Select all
local T = "\u041f\u0440\u0438\u0432\u0435\u0442 \u043a\u0430\u043a \u0434\u0435\u043b\u0430?"

on mouseUp
   put tdzTranslate(T)
end mouseUp
Code: Select all
on getCodePoint V
   return numToCodepoint( "0x" & V)
end getCodePoint

function tdzTranslate T
   local R
   get sunnyReplace(T,"\\u([0-9a-f]{4})","?{ getCodePoint \1}", R)
   return R
end tdzTranslate
-->
Привет как дела?
and thank you, I've learned my 1st Russian sentence today
Take care,
Thierry
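For readers who want to stay with the plain item-based loop, the two errors listed above (lost spaces, and the error on the final \u0430? chunk) can probably be avoided by treating only the first four characters of each item as the hex code and copying the rest of the item through unchanged. This is a minimal sketch, not a tested drop-in: the handler name unescapeUnicode is made up for illustration, and it assumes every "\u" in the input is followed by exactly four hex digits. It does not combine surrogate pairs, so characters outside the Basic Multilingual Plane would need extra handling.

```livecode
function unescapeUnicode pString
   local tResult, tItem
   set the itemDelimiter to "\u"
   -- anything before the first "\u" is ordinary text: copy it as it is
   put item 1 of pString into tResult
   delete item 1 of pString
   repeat for each item tItem in pString
      -- assumption: the first 4 chars of every item are the hex code
      put numToCodepoint("0x" & char 1 to 4 of tItem) after tResult
      -- whatever follows the code (a space, "?", plain words) is kept
      put char 5 to -1 of tItem after tResult
   end repeat
   return tResult
end unescapeUnicode
```

With the sample text from this thread, the spaces and the trailing question mark sit in the char 5 to -1 part of their items, so they are copied to the result instead of being passed to numToCodepoint. Because the string is walked only once, this shape should also cope better with longer input than repeated offset searches.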
Re: Converting Unicode codes to readable text
Posted: Sun Mar 29, 2020 10:25 am
by japino
Thanks Jacque and Thierry.
Thierry, for my own small project I can't really afford a paid external, but it's good to know that it's there. I've made a note of it; maybe I will use it some time in the future.
For now I've used a repeat loop which finds each \uXXXX string and replaces it with the actual character.
A bit hesitant to paste it here because I know I'm a bad hobby coder, but anyway, here you have it:
Code: Select all
on mouseup
   put "\u041f\u0440\u0438\u0432\u0435\u0442 \u043a\u0430\u043a \u0434\u0435\u043b\u0430?" into myTranslation
   repeat
      put "\u" into myCharsToFind
      put offset(myCharsToFind, myTranslation) into myStartChar
      if myStartChar is 0 then exit repeat
      put myStartChar + 5 into myEndChar
      put char myStartChar to myEndChar of myTranslation into codeToConvert
      replace "\u" with "0x" in codeToConvert
      put numToCodepoint(codeToConvert) into myChar
      delete char myStartChar to myEndChar in myTranslation
      put myChar after char myStartChar - 1 in myTranslation
   end repeat
   answer myTranslation
end mouseup
Re: Converting Unicode codes to readable text
Posted: Sun Mar 29, 2020 11:35 am
by Thierry
japino wrote:
Thierry, for my own small project I can't really afford a paid external...
It's fine for me, I do understand.
Actually, I have a small number of regex followers who like
regex use cases; that's the main reason for my regex posts...
Oh, BTW, it's a library, not an external.
japino wrote:
For now I've used a repeat loop which finds each \uXXXX string and replaces it with the actual character.
A bit hesitant to paste it here because I know I'm a bad hobby coder, but anyway, here you have it:
I've quickly made a new version of your excellent code,
just in case you're curious...
But neither your code nor mine is efficient for long input text!
Code: Select all
function tdzTranslate txt
   repeat
      put offset("\u", txt) into idxStart
      if idxStart is 0 then exit repeat
      put idxStart + 5 into idxEnd
      get numToCodepoint("0x" & char idxStart+2 to idxEnd of txt)
      put IT into char idxStart to idxEnd of txt
   end repeat
   return txt
end tdzTranslate
Take care,
Thierry
Re: Converting Unicode codes to readable text
Posted: Sun Mar 29, 2020 4:09 pm
by japino
Aw, many thanks for this Thierry, this is excellent! And I don't worry about long texts, because I should be dealing with sentences only.
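One more route that may be worth trying, since the data arrives as JSON: \uXXXX is JSON's own escape syntax, so a JSON parser normally turns these escapes into real characters while it parses. The following is a minimal sketch only. It assumes the JSON library that ships with LiveCode 8 and later is available and that its JsonImport function decodes the escapes as described; the key name "text" and the shortened sample string are made up for illustration and are not taken from the actual API response.

```livecode
on mouseUp
   local tJSON, tData
   -- build a tiny JSON object by hand; real data would come from the website
   put "{" & quote & "text" & quote & ":" & quote & \
         "\u041f\u0440\u0438\u0432\u0435\u0442" & quote & "}" into tJSON
   -- assumption: JsonImport returns an array keyed by the JSON member names
   put JsonImport(tJSON) into tData
   answer tData["text"]
end mouseUp
```

If that assumption holds, the whole response can be parsed once and the readable text read straight from the resulting array, with no per-character conversion.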
