UTF-8 help [SOLVED]

Nicke
Posts: 36
Joined: Thu Nov 28, 2013 11:19 am

UTF-8 help [SOLVED]

Post by Nicke » Tue Dec 17, 2013 11:20 am

Hi,

I have tried for a few weeks to get LC to encode to UTF-8, but I must be stupid. I get Chinese characters or it just drops special characters. I'm trying to save an XML file with UTF-8 encoding. I want everything in the XML to be UTF-8. Below is a simple example that I have been playing around with:

Code:

put "<?xml version='1.0' encoding='UTF-8'?>" & cr into theXML
put "<SETTINGS VERSION='1.0'>" & cr after theXML
put tab & "<SERVER>" & field "srvField" & "</SERVER>" & cr after theXML
put tab & "<SERVERPORT>" & uniEncode(field "portField", "utf8") & "</SERVERPORT>" & cr after theXML
put "</SETTINGS>" after theXML
put the effective filename of this stack into tPath
set the itemDelimiter to slash
delete last item of tPath
put tPath & "/settings.xml" into tFileName
put theXML into URL ("binfile:" & tFileName)

I have read the tutorials on Unicode several times, but can somebody show me example code that reads a field (which may include special characters, e.g. "åäö"), converts it to UTF-8 and saves it to an XML file?

Compared to PHP and Python, this seems so difficult in LC.

Thanks in advance!
Last edited by Nicke on Tue Dec 17, 2013 1:42 pm, edited 1 time in total.

vedus
Livecode Opensource Backer
Posts: 153
Joined: Tue Feb 26, 2013 9:23 am

Re: UTF-8 help

Post by vedus » Tue Dec 17, 2013 1:38 pm

I will try to help with this solution.
Copy and paste the code below into a substack or your main stack.
Provided by Trevor DeVore of Blue Mango Learning Systems.

Code:

/**
* Handlers for converting XML to LiveCode arrays and vice versa. 
*
* Provided by Trevor DeVore of Blue Mango Learning Systems.
*/

/**
* \brief Escapes the predefined XML entities in a string.
*
* \param pStr The string to escape the characters in.
*
* \return String
*/
function EscapePredefinedXMLEntities pStr
    replace "&" with "&amp;" in pStr
    replace "<" with "&lt;" in pStr
    replace ">" with "&gt;" in pStr
    replace "'" with "&apos;" in pStr
    replace quote with "&quot;" in pStr
    
    return pStr
end EscapePredefinedXMLEntities


/**
* \brief Unescapes predefined xml entities in a string.
*
* \param pStr The string to unescape the characters in.
*
* \return String
*/
function UnescapePredefinedXMLEntities pStr
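    replace "&lt;" with "<" in pStr
    replace "&gt;" with ">" in pStr
    replace "&apos;" with "'" in pStr
    replace "&quot;" with quote in pStr
    -- Unescape "&amp;" last so that text such as "&amp;lt;" becomes "&lt;" rather than "<"
    replace "&amp;" with "&" in pStr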
    
    return pStr
end UnescapePredefinedXMLEntities


/**
* \brief Helper function for sorting keys of an array based on order in the XML document the array was created from.
*
* \param pArray Array whose keys you want to sort.
* \param pStripMetaKeys By default any meta keys (keys starting with "@") will be stripped. Pass in false to bypass this behavior.
*
* Revolution array keys are never guaranteed to be in order you created them in 
* so we must come up with some other way of maintaining proper sequence. 
* For arrays representing XML, the XML syntax is used (i.e. node[1], node[2], etc.). 
* This handler will sort keys that use this syntax for representing sequence.
*
* \return String
*/
function SortArrayKeysWithXMLOrdering pArray, pStripMetaKeys
   local theKeys
   
   put pStripMetaKeys is not false into pStripMetaKeys
   
   put the keys of pArray into theKeys
   set the itemDelimiter to "["
   sort theKeys numeric by the last item of each -- 1], 2], 3], etc.
   
   if pStripMetaKeys then
      filter theKeys without "@*"
   end if
   
   return theKeys
end SortArrayKeysWithXMLOrdering


/**
* \brief Converts an XML tree into a LiveCode multi-dimensional array.
*
* \param pXML The xml to convert.
* \param pStoreEncodedAs Encoding to use. Must be a value that can be passed to uniDecode. Default is "utf8".
* \param pUseValueKey  By default node values are stored in a key named after the node. This means you can't have a node with attributes and a node value. Pass in true if you want to store node values in a '@value' key. This will allow a key to have both attributes (in @attributes key) and a value (in @value key).
* \param pForceNumerIndexForNodes A comma delimited list of node names that should always have numbered indexes (NODE[index]) added to them. This makes it easier to loop over results that may have 1 or more results.
*
* A node's attributes will be stored as an array in its "@attributes" key.
* Node names will retain the sequence information (i.e. node[1], node[2], etc.).
* This information is necessary to determine order that keys should be processed in. Example:
* set the itemDelimiter to "["
* put the keys of theArray into theKeys
* sort theKeys numeric by the last item of each
*
* \return Array
*/
function ConvertXMLToArray pXML, pStoreEncodedAs, pUseValueKey, pForceNumerIndexForNodes
    local theArray,theResult,theRootNode,theTreeID
    local theXMLEncoding
    
    ## Create an XML tree from XML text
    put revCreateXMLTree(pXML, true, true, false) into theTreeID
    
    if theTreeID is an integer then
        ## Determine the encoding of the XML, default to UTF-8
        put matchText(pXML, "<\?xml (.*)encoding=" & quote & "(.*)" & quote & "\?>", versionMatch, theXMLEncoding) into theResult
        if theXMLEncoding is empty then put "utf-8" into theXMLEncoding
        
        ## Now convert to array. 
        ## The 1st dimension has one key which is the name of the root node.
        put revXMLRootNode(theTreeID) into theRootNode
        if theRootNode is not empty and not(theRootNode begins with "xmlerr,") then
            put ConvertXMLNodeToArray(theTreeID, theRootNode, theXMLEncoding, pStoreEncodedAs, pUseValueKey, pForceNumerIndexForNodes) into theArray[theRootNode]
        end if
        
        revDeleteXMLTree theTreeID
    end if
    
    return theArray
end ConvertXMLToArray


/**
* \brief Converts a revXML-created XML tree to an array.
*
* \param pXMLTree The xml tree id.
* \param pStoreEncodedAs See docs for ConvertXMLToArray.
* \param pUseValueKey See docs for ConvertXMLToArray.
* \param pForceNumerIndexForNodes See docs for ConvertXMLToArray.
*
* See docs for ConvertXMLToArray.
*
* \return Array
*/
function ConvertXMLTreeToArray pXMLTree, pStoreEncodedAs, pUseValueKey, pForceNumerIndexForNodes
    return ConvertXMLToArray(revXMLText(pXMLTree), pStoreEncodedAs, pUseValueKey, pForceNumerIndexForNodes)
end ConvertXMLTreeToArray


/**
* \brief Converts a multi-dimensional array to an XML tree.
*
* \param pArray The array to convert.
* \param pArrayEncoding Encoding used in the array. Must be a value that can be passed to uniEncode. Default is the current platform encoding.
* \param pStoreEncodedAs Encoding to use. Must be a value that can be passed to uniDecode. Default is "utf8".
*
* The array should consist of one key in the 1st dimension. This key becomes the root node in the XML tree.
* Attributes of a node should be stored as an array in an @attributes key. 
* Sequence information for multiple nodes with the same name should be included in the node name using brackets (i.e. node[1], node[2], node[3]).
*
* \return XML Tree id (integer) or error message.
*/
function ConvertArrayToXML pArray, pArrayEncoding, pStoreEncodedAs
    local theError,theRootNode,theXML,theXMLTree
    
    ## if pArrayEncoding is empty then current platform encoding is assumed
    if pStoreEncodedAs is empty then put "UTF-8" into pStoreEncodedAs
     
    ## Create XML for root node. Note that we take extra steps in order to support
    ## converting an array that only represents part of a tree rather than the entire tree.
    ## In this case there may be multiple nodes at the root level.
    put line 1 of the keys of pArray into theRootNode 
    set the itemDelimiter to "["
    put "<" & item 1 of theRootNode & "/>" into theXML
     
    ## Create XML needed to create tree
    put format("<?xml version=\"1.0\" encoding=\"%s\"?>%s", \
            pStoreEncodedAs, theXML) into theXML
    put revCreateXMLTree(theXML, true, true, false) into theXMLTree
     
    if theXMLTree is an integer then
        ## Loop over all nodes at root level
        put false into stripMetaKeys
        put SortArrayKeysWithXMLOrdering(pArray, stripMetaKeys) into theNodes
         
        ## Create tree using helper function
        repeat for each line theNode in theNodes
            ConvertArrayDimensionToXML pArray[theNode], theXMLTree, slash & theNode, \
                    pArrayEncoding, pStoreEncodedAs
            put the result into theError
             
            if theError is not empty then exit repeat
        end repeat
        
        if theError is not empty then
            ## something went wrong, clean bad tree
            revDeleteXMLTree theXMLTree
        end if
    else
        put theXMLTree into theError
    end if
     
    if theError is not empty then
        return theError
    else
        return theXMLTree
    end if
end ConvertArrayToXML
 
 
 /**
* \brief Helper function for ConvertArrayToXML.
*
* Converts the multi-dimensional array pArray to nodes in pTreeID. Calls itself recursively.
*
* \return Error message.
*/
private command ConvertArrayDimensionToXML pArray, pTreeID, pNode, pArrayEncoding, pStoreEncodedAs
    local theError,theKey,theKeys,theNode
    
    ## A workaround for fact that Revolution does not return
    ## keys in the order we created them
    put false into stripMetaKeys
    put SortArrayKeysWithXMLOrdering(pArray, stripMetaKeys) into theNodes
    
    ## Arrays might have sequencing info in name 
    ## (i.e. step[1], step[2], ... )
    set the itemDelimiter to "["
    
    repeat for each line theFullNode in theNodes
        put item 1 of theFullNode into theNode
         
        ## Look for attributes. These will be added as attributes to pNode.
        if theNode is "@attributes" or theNode is "@attr" then
            repeat for each line theKey in the keys of pArray[theFullNode]
                revSetXMLAttribute pTreeID, pNode, theKey, \
                        EncodeString(pArray[theFullNode][theKey], \
                        pArrayEncoding, pStoreEncodedAs)
                if the result begins with "xmlerr," then 
                    put the result && "(setting attribute" && theKey && "for node" && pNode & ")" into theError
                end if
                
                if theError is not empty then exit repeat
            end repeat
            
        else if theNode is "@value" then
            ## This XML tree is using complex structure. Node is the value of the parent node
            revPutIntoXMLNode pTreeID, pNode, EncodeString(pArray[theFullNode], pArrayEncoding, pStoreEncodedAs)
            if the result begins with "xmlerr," then
                put the result && "(adding child node" && theNode && "to node" && pNode & ")" into theError
            end if
            
        else
            if the keys of pArray[theFullNode] is not empty then
                ## Node has children. Add node to XML tree then call self recursively to create children nodes. 
                revAddXMLNode pTreeID, pNode, theNode, empty
                if the result begins with "xmlerr," then
                    put the result && "(adding node" && theNode & ")" into theError
                end if
                
                if theError is empty then
                    ConvertArrayDimensionToXML pArray[theFullNode], pTreeID, pNode & slash & theFullNode, \
                            pArrayEncoding, pStoreEncodedAs
                    put the result into theError
                end if
            else
                ## Node has no children but possibly a value. Create node and add value (which may be empty).
                revAddXMLNode pTreeID, pNode, theNode, \
                        EncodeString(pArray[theFullNode], pArrayEncoding, pStoreEncodedAs)
                if the result begins with "xmlerr," then
                    put the result && "(adding child node" && theNode && "to node" && pNode & ")" into theError
                end if
            end if
        end if 
         
        if theError is not empty then exit repeat
    end repeat
    
    return theError
end ConvertArrayDimensionToXML
 

/**
* \brief Helper function for ConvertXMLToArray.
*
* Converts an XML node to a multi-dimensional array. Calls itself recursively.
*
* \return Array
*/
private function ConvertXMLNodeToArray pTreeID, pNode, pXMLTreeEncoding, pStoreEncodedAs, pUseValueKey, pForceNumerIndexForNodes
    local theArrayA,theAttributes,theChildNode,theKey
     
    ## Look for attributes of the node. Store as array in "@attributes" key
    put revXMLAttributes(pTreeID, pNode, tab, cr) into theAttributes
    if theAttributes is not empty then
        put EncodeString(theAttributes, pXMLTreeEncoding, pStoreEncodedAs) into theAttributes
        split theAttributes by cr and tab -- create array
        put theAttributes into theArrayA["@attributes"]
    end if
     
    ## Look for children nodes. 
    set the itemDelimiter to slash
    put revXMLFirstChild(pTreeID, pNode) into theChildNode
    if theChildNode is empty or theChildNode begins with "xmlerr," then
        put EncodeString(revXMLNodeContents(pTreeID, pNode), pXMLTreeEncoding, pStoreEncodedAs) into theValue
        if word 1 to -1 of theValue is empty and the keys of theArrayA is not empty then
            ## Empty node that has attributes
            return theArrayA
        else if pUseValueKey then
            ## Force value into @value
            put theValue into theArrayA["@value"]
            return theArrayA
        else
            ## Single Node with value: Return value. Attributes are ignored.
            return theValue
        end if
    else
        ## Child nodes were found. Recursively call self and store result in array.
        set the wholeMatches to true
        replace comma with cr in pForceNumerIndexForNodes
        repeat while theChildNode is not empty and not (theChildNode begins with "xmlerr,")
            put the last item of theChildNode into theKey
            if theKey is among the lines of pForceNumerIndexForNodes then
                ## Oops, key that needs index doesn't have one. Only 1 entry in XML.
                put "[1]" after theKey
            end if      
            put ConvertXMLNodeToArray(pTreeID, theChildNode, pXMLTreeEncoding, pStoreEncodedAs, pUseValueKey, \
                    pForceNumerIndexForNodes) into theArrayA[theKey]
            put revXMLNextSibling(pTreeID, theChildNode) into theChildNode
        end repeat
         
        return theArrayA
    end if
end ConvertXMLNodeToArray
 
 
/**
* \brief Helper function for converting the encoding of strings when converting to and from XML.
*
* \return String
*/
private function EncodeString pString, pInEncoding, pOutEncoding
   ## convert utf-8 to utf8 for uniencode/decode
   replace "-" with empty in pInEncoding
   replace "-" with empty in pOutEncoding
   
   if pInEncoding is not empty then
      -- if pOutEncoding is empty then pString will be converted to the current platform encoding
      return uniDecode(uniEncode(pString, pInEncoding), pOutEncoding)
   else
      if pOutEncoding is not empty then
         -- if pInEncoding is empty then pString is assumed to be in the current platform encoding
         return uniDecode(uniEncode(pString, pInEncoding), pOutEncoding)
      else
         return pString
      end if
   end if
end EncodeString

The code I am working with, together with the handlers above, to display and write UTF-8 is below:

Code:

on mouseUp
    # When the button is clicked, load up the preferences
    loadPreferences
end mouseUp

command loadPreferences
    # There are two parts to loading the preferences file. The first part is reading the file into memory and
    # creating an XML "tree". The second part is to process the tree and extract the data from it.
    
    # This function reads the XML file, and returns the tree. The tree is represented as a number, the actual
    # tree structure and data is managed by Revolution and so we don't need to worry about it.
    local tTree
    put readPreferencesToXMLTree() into tTree
    if tTree is empty then
        exit loadPreferences
    end if
    
    # This command reads the preferences we require from the tree and displays them.
    processPreferencesTree tTree
    
    # Close the XML tree. This will free up the memory that the tree was using and prevent our 
    # application using more memory than it needs or "leaking" memory by creating multiple trees
    # without closing any of them.
    revDeleteXMLTree tTree
end loadPreferences

# This function reads the XML file from disk, and turns it into an XML Tree. The tree is then returned
# for the second part of the process.
private function readPreferencesToXMLTree
    # Find the XML file on disk. This is for now assumed to be in the same location as the stack / application.
    # Note that we restore the itemDelimiter to comma (its default value) afterwards. This is not essential
    # but it's good practice to avoid tricky bugs that can arise due to unexpected delimiter values.
    set the itemDelimiter to slash
    global tPreferencesFile
    put item 1 to -2 of the effective filename of this stack & "/yourfile.xml" into tPreferencesFile
    set the itemDelimiter to comma
    
    # Read the preferences data from the file into a variable. Always check for the result when reading files
    # as it's possible that the file may have been deleted or moved.
    global tPreferencesData, tResult
    put url ("file:" & tPreferencesFile) into tPreferencesData
    put the result into tResult
    if tResult is not empty then
        answer error "Failed to read preferences file at location: " & tPreferencesFile
        return empty
    end if
    
    # Create the XML "tree" from the data, checking to make sure that the file has loaded properly.
    # The revCreateXMLTree function will return a number (the tree's "handle" or "id") if it succeeds,
    # otherwise it will return a message saying why it failed.
    local tTree
    put revCreateXMLTree(tPreferencesData, false, true, false) into tTree
    if tTree is not an integer then
        answer error "Failed to process preferences file with error: " & tTree
        return empty
    end if
    
    return tTree
end readPreferencesToXMLTree

private command processPreferencesTree pTree
   # Extract the name, day, month and paragwga values. These are simple nodes in the XML file,
   # we can get what is inside them using the revXMLNodeContents function
   # This function will return a string beginning with "xmlerr," if it fails, but we don't check this
   # here as we created the file and we know it won't fail.
   
   local tName
   put revXMLNodeContents(pTree, "eortologio/onomastikes/name") into tName
   
   local tDay
   put revXMLNodeContents(pTree, "eortologio/onomastikes/day") into tDay
   
   local tMonth
   put revXMLNodeContents(pTree, "eortologio/onomastikes/month") into tMonth
   
   local tParagwga
   put revXMLNodeContents(pTree, "eortologio/onomastikes/paragwga") into tParagwga
   
   local tOutput
   --    put "Name = " & tName & return after tOutput
   --    put "Month = " & tMonth & return after tOutput
   --    put return after tOutput

   -- Convert the UTF-8 node contents to UTF-16 and display them in the fields
   set the unicodeText of fld "name" to uniEncode(tName, "utf8")
   set the unicodeText of field "month" to uniEncode(tMonth, "utf8")
   set the unicodeText of fld "day" to uniEncode(tDay, "utf8")
   set the unicodeText of field "paragwga" to uniEncode(tParagwga, "utf8")
end processPreferencesTree

The above code is from the lesson http://lessons.runrev.com/s/lessons/m/4 ... n-xml-file
Try it in a new project and see how it works :)
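
As a rough sketch of how the library handlers above might be applied to the settings.xml file from the first post: read the file as binary, convert it to an array with ConvertXMLToArray, and push the values into the fields. The handler name loadSettingsXML is made up for illustration, the field and node names are taken from the original question, and this assumes the pre-7.0 engine, where Unicode field text is set through the unicodeText property.

Code:

-- Hypothetical handler, not part of the library above
command loadSettingsXML
   local tPath, tXML, tSettingsA
   
   put the effective filename of this stack into tPath
   set the itemDelimiter to slash
   put "settings.xml" into the last item of tPath
   
   -- binfile: reads the bytes unchanged
   put URL ("binfile:" & tPath) into tXML
   if the result is not empty then
      answer error "Could not read settings file:" && the result
      exit loadSettingsXML
   end if
   
   -- Ask for the node values to be kept as UTF-8 in the array
   put ConvertXMLToArray(tXML, "utf8") into tSettingsA
   if tSettingsA is not an array then
      answer error "Could not parse settings file."
      exit loadSettingsXML
   end if
   
   -- uniEncode turns UTF-8 into the UTF-16 that unicodeText expects
   set the unicodeText of field "srvField" to uniEncode(tSettingsA["SETTINGS"]["SERVER"], "UTF8")
   set the unicodeText of field "portField" to uniEncode(tSettingsA["SETTINGS"]["SERVERPORT"], "UTF8")
end loadSettingsXML

The array has one top-level key named after the root node, so the values sit one level down, under tSettingsA["SETTINGS"].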

Nicke
Posts: 36
Joined: Thu Nov 28, 2013 11:19 am

Re: UTF-8 help

Post by Nicke » Tue Dec 17, 2013 1:42 pm

Ok, maybe it helped to post here because I now think I found the solution :)

When populating the text fields you need to do it like this; a plain put does not seem to work:

Code:

set the useUnicode to true
set the unicodeText of fld "portField" to uniEncode(revXMLNodeContents(setXMLID, "SETTINGS/DEMO"), "UTF8")

When building the XML, get the unicodeText of the field and convert it to UTF-8 with uniDecode:

Code:

put "<?xml version='1.0' encoding='UTF-8'?>" & cr into theXML
put "<SETTINGS VERSION='1.0'>" & cr after theXML
put tab & "<SERVER>" & field "srvField" & "</SERVER>" & cr after theXML
get the unicodeText of fld "portField"
put tab & "<SERVERPORT>" & uniDecode(it, "utf8") & "</SERVERPORT>" & cr after theXML
put "</SETTINGS>" after theXML
put the effective filename of this stack into tPath
set the itemDelimiter to slash
delete last item of tPath
put tPath & "/settings.xml" into tFileName
put theXML into URL ("binfile:" & tFileName)

And now the SERVERPORT field is UTF-8 encoded in the XML file. I hope this helps someone who has the same kind of problem.
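
The pattern above could be wrapped in two small handlers so that every field goes through the same conversion. This is only a sketch for the pre-7.0 engine used in this thread. The handler names fieldAsUTF8 and saveSettingsXML are made up for illustration, and the field names come from the first post. It also passes each value through EscapePredefinedXMLEntities from vedus' script (assumed to be in the message path), so that a character such as & or < typed into a field does not break the XML.

Code:

-- Hypothetical helper: return the contents of a field as UTF-8 bytes.
-- The unicodeText of a field is UTF-16; uniDecode converts it to the named encoding.
function fieldAsUTF8 pFieldName
   return uniDecode(the unicodeText of field pFieldName, "UTF8")
end fieldAsUTF8

-- Hypothetical command: build the settings XML and write it next to the stack.
command saveSettingsXML
   local theXML, tPath
   
   put "<?xml version='1.0' encoding='UTF-8'?>" & cr into theXML
   put "<SETTINGS VERSION='1.0'>" & cr after theXML
   put tab & "<SERVER>" & EscapePredefinedXMLEntities(fieldAsUTF8("srvField")) & "</SERVER>" & cr after theXML
   put tab & "<SERVERPORT>" & EscapePredefinedXMLEntities(fieldAsUTF8("portField")) & "</SERVERPORT>" & cr after theXML
   put "</SETTINGS>" after theXML
   
   put the effective filename of this stack into tPath
   set the itemDelimiter to slash
   put "settings.xml" into the last item of tPath
   
   -- binfile: writes the bytes untouched, so the UTF-8 is stored as-is
   put theXML into URL ("binfile:" & tPath)
   if the result is not empty then
      answer error "Could not write settings file:" && the result
   end if
end saveSettingsXML

Escaping after the UTF-8 conversion is safe because the five predefined entities are plain ASCII, and ASCII bytes never appear inside a multi-byte UTF-8 sequence. On LiveCode 7.0 and later, where text is Unicode throughout, the conversion is done with textEncode(field "portField", "UTF-8") instead of unicodeText and uniDecode.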

Nicke
Posts: 36
Joined: Thu Nov 28, 2013 11:19 am

Re: UTF-8 help [SOLVED]

Post by Nicke » Tue Dec 17, 2013 1:47 pm

Thanks Vedus,

will take a look at your solution.

Best Nicke
