LiveCode Forums.

Posted: **Fri Jun 21, 2013 11:55 pm**

This would be a helpful one for lcVCS: all custom properties need to be tested for whether they are ASCII, so they can be base64Encoded if they need to be. Currently I'm using this function:

```
function needsEncoding pData
   if pData contains null then return true
   repeat for each byte tByte in pData
      put charToNum(tByte) into tNum
      if tNum > 127 then
         return true
      end if
   end repeat
   return false
end needsEncoding
```
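For comparison, here is the same test sketched in Python. `bytes.isascii()` (Python 3.7+) plays the role of the proposed engine-level operator. NUL is itself an ASCII code, so it still needs the separate check that the LiveCode function makes first. The name `needs_encoding` is a hypothetical mirror of `needsEncoding`.

```python
def needs_encoding(data: bytes) -> bool:
    """Python mirror of needsEncoding: True if data holds NUL or any byte above 127."""
    # NUL (0) is itself a valid ASCII code, so bytes.isascii() alone would accept it
    if b"\x00" in data:
        return True
    # bytes.isascii() is True when every byte is in 0..127 (or data is empty)
    return not data.isascii()


print(needs_encoding(b"plain text"))  # False
print(needs_encoding(b"caf\xe9"))     # True: 0xE9 is above 127
```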

I think such a test would be considerably faster in the engine. What do we think?

Posted: **Sat Jun 22, 2013 3:06 am**

Depends. Is this for XML? If so, there's more than just ASCII to watch out for.
Also, if you only want printing characters, the acceptable range (32 to 126) is narrower than full ASCII, so you'll need something like:

```
function needsEncoding pData
   if pData contains null then return true
   repeat for each byte tByte in pData
      -- if XML, the markup delimiters need encoding too:
      -- if tByte is ">" or tByte is "<" then
      --    return true
      -- end if
      put charToNum(tByte) into tNum
      -- control characters (0-31) are not printable
      if tNum < 32 then
         return true
      end if
      -- DEL (127) and every byte above the 7-bit range
      if tNum > 126 then
         return true
      end if
      -- also, don't put punctuation into XML tags:
      -- if tNum < 48 then
      --    return true
      -- end if
      -- switch tNum
      --    case 58
      --    case 59
      --    case 60
      --    case 61
      --    case 62
      --    case 63
      --    case 64
      --    case 91
      --    case 92
      --    case 93
      --    case 94
      --    case 96
      --    case 123
      --    case 124
      --    case 125
      --    case 126
      --       return true
      -- end switch
   end repeat
   return false
end needsEncoding
```
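A Python sketch of the printable-range test, for comparison. It assumes the optional XML branch only needs to reject `<` and `>`. The stricter tag-name punctuation rules from the commented-out switch are left out. The function name is hypothetical.

```python
# Extra bytes the optional XML mode rejects: "<" (60) and ">" (62)
XML_DELIMITERS = set(b"<>")


def needs_encoding_printable(data: bytes, xml: bool = False) -> bool:
    for b in data:
        # outside printable ASCII: control chars (0..31, including NUL),
        # DEL (127), and every byte above the 7-bit range
        if b < 32 or b > 126:
            return True
        # in XML mode, markup delimiters also need encoding
        if xml and b in XML_DELIMITERS:
            return True
    return False
```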

Posted: **Sat Jun 22, 2013 3:43 am**

No, it's for mergJSON, which handles everything below 128 except NULL. But I expect the operator would be useful outside that too.

Posted: **Sat Jun 22, 2013 6:29 am**

Since you want speed, I suggest replacing the charToNum() call on each byte with an array look-up on each byte.

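```
function needsEncoding pData
   local tArray
   -- NUL is tested separately, so it is never used as an array key
   if pData contains null then return true
   -- key each entry by the character itself, so the loop below needs no
   -- charToNum() call; letters of either case get the same value, so
   -- case-insensitive array keys do no harm here
   repeat with i = 1 to 255
      put ( i < 48 ) or ( i >= 58 and i <= 64 ) or ( i >= 91 and i <= 96 ) or ( i >= 123 ) into tArray[ numToChar( i ) ]
   end repeat
   repeat for each byte tByte in pData
      if tArray[ tByte ] then return "true"
   end repeat
   return "false"
end needsEncoding
```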

I haven't tested the code above. If the idea works for you, please let us know how it affected the speed.
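Here is the same lookup-table idea sketched in Python. The 256-entry table is built once at import time rather than on every call. In the LiveCode version, a script-local variable could play the same role. The names are hypothetical.

```python
# Built once, at import time: entry i is True if byte value i needs encoding.
# Same ranges as the LiveCode table: below "0", ":" to "@", "[" to "`", above "z".
NEEDS_ENCODING = [
    i < 48 or 58 <= i <= 64 or 91 <= i <= 96 or i >= 123 for i in range(256)
]


def needs_encoding_table(data: bytes) -> bool:
    # Indexing the table by byte value replaces a per-byte conversion call;
    # any() stops at the first byte that needs encoding.
    return any(NEEDS_ENCODING[b] for b in data)
```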

An "is ASCII" operator might be expected to return "false" for an empty input, so it would not always be the inverse of the "needsEncoding" function.
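The two readings differ only on empty input. Python's `bytes.isascii()` takes the vacuous-truth reading and returns True for an empty sequence. For input without NUL, that makes it the exact inverse of `needsEncoding`. The second function below shows the alternative reading described above. Both names are hypothetical.

```python
def is_ascii(data: bytes) -> bool:
    # bytes.isascii() treats the empty sequence as ASCII (vacuous truth)
    return data.isascii()


def is_ascii_nonempty(data: bytes) -> bool:
    # alternative reading: empty input does not count as ASCII
    return len(data) > 0 and data.isascii()
```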

-- Dick

Posted: **Sat Jun 22, 2013 11:06 am**

After looking at the operator code, I'm hoping that's high on the refactoring agenda...

Posted: **Sat Jun 22, 2013 12:32 pm**

How about 'x is an ascii string' and 'x is a native string' (and later, when we have Unicode, 'x is a unicode string')? Each would test whether the contents of a string need only ASCII, the native charset, or (later) Unicode to be represented.
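The tiered idea can be sketched in Python by trying the narrowest charset first and widening until the text fits. Latin-1 stands in for "the native charset" here. That is an assumption: the real native encoding depends on the platform. The function name is hypothetical.

```python
def required_charset(text: str) -> str:
    """Return the narrowest of "ascii", "native", "unicode" that can hold text."""
    # Latin-1 is an assumed stand-in for the platform's native charset
    for name, codec in (("ascii", "ascii"), ("native", "latin-1")):
        try:
            text.encode(codec)
            return name
        except UnicodeEncodeError:
            continue  # does not fit; try the next, wider charset
    return "unicode"
```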

Essentially this means extending MCIs (the engine class that implements the 'is' operator).

Regarding refactoring: yes, all syntax is being refactored to split syntax from implementation. Adding operators will be as easy as adding any other syntax.

Posted: **Sat Jun 22, 2013 11:28 pm**

Well... obviously detecting 'is an ascii string' is easy, but beyond that it gets a bit curly, doesn't it? If you knew the underlying encoding, you could work out whether the text could be represented as ASCII (like what we did for the unicode props in the properties), but without knowing that, how do we go about it? I think I asked virtually the same question a few weeks back about the future Unicode plans.
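When the encoding is known, the question is straightforward: decode first, then test the text. Here is a Python sketch with a hypothetical function name. It also shows why a byte-level test and a text-level test can disagree. UTF-16 text that is pure ASCII still contains NUL bytes, so a byte-level `needsEncoding` would flag it even though the decoded text is plain ASCII. `str.isascii()` requires Python 3.7+.

```python
def representable_as_ascii(raw: bytes, encoding: str) -> bool:
    """True if raw, decoded with the given (known) encoding, is pure ASCII text."""
    # Without the encoding the raw bytes are ambiguous; with it, decode then test
    return raw.decode(encoding).isascii()


utf16 = "abc".encode("utf-16-le")                   # b'a\x00b\x00c\x00'
print(b"\x00" in utf16)                             # True: byte-level check flags it
print(representable_as_ascii(utf16, "utf-16-le"))   # True: the decoded text is ASCII
```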

Posted: **Sun Jun 23, 2013 1:42 am**

I've just sent a pull request for "is [not] an ascii string" but would need some guidance on the native variant if you want me to look at that.

Posted: **Sun Jun 23, 2013 4:01 pm**

Some years ago I mentioned on the improve list using 'every' and 'some' for multiple array and chunk operations and comparisons. I don't remember what syntax I suggested. Maybe like this:

```
if charToNum( some char of x ) > 127 then
```

Posted: **Sun Jun 23, 2013 4:05 pm**

How does this get into the dictionary?

Posted: **Sun Jun 23, 2013 4:58 pm**

Indefinite articles are a problem in natural-language processing. "Any" is an analog for the random function in xtalk, but I could also see "any" used for what you want here:

```
  if any char of x > 127 then
```

I could also see where "some" might be used in place of "any" in different contexts.
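Python's built-in `any()` and `all()` are a close analogue of the proposed 'some'/'any' and 'every' quantifiers. The sketch below uses Latin-1 bytes as sample data. Over empty input, `any()` gives False and `all()` gives True. That is the same empty-input ambiguity raised about an "is ASCII" operator.

```python
data = "naïve".encode("latin-1")  # "ï" is byte 0xEF (239)

# "if some char of x > 127": true if at least one byte passes the test
some_high = any(b > 127 for b in data)

# "if every char of x <= 127": true only if all bytes pass the test
every_low = all(b <= 127 for b in data)

print(some_high, every_low)  # True False
```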

Posted: **Sun Jun 23, 2013 10:18 pm**

> DarScott wrote: How does this get into the dictionary?

I think someone... possibly me... needs to submit a pull request with a doc... wherever the docs are, haven't looked yet.

Posted: **Mon Jun 24, 2013 10:05 am**

> I think someone... possibly me... needs to submit a pull request with a doc... wherever the docs are, haven't looked yet.

Indeed: we've now merged the docs and a release-note system into 'develop', so we are starting to ask that contributions come with release notes and dictionary entries. These live under the 'docs' folder in the livecode repo, and both are described in the contribution docs (http://livecode.com/community/contribute-to-livecode/).

Posted: **Mon Jun 24, 2013 10:13 am**

Hmm... well, in this case I branched off master, so do I branch off develop to create the docs, or what?

Posted: **Mon Jun 24, 2013 11:00 am**

> Well... obviously detecting 'is an ascii string' is easy, but beyond that it gets a bit curly, doesn't it? If you knew the underlying encoding, you could work out whether the text could be represented as ASCII (like what we did for the unicode props in the properties), but without knowing that, how do we go about it? I think I asked virtually the same question a few weeks back about the future Unicode plans.

My suggestion above was a little terse (I didn't have much time to post this weekend).

At the moment, all strings in the engine are 'native strings' (sequences of bytes interpreted as text in the native encoding). Because the native encodings map each char to exactly one byte, all strings in the engine are also 'binary strings'. This duality works very well until you want to manipulate text in an encoding larger than the native ones (i.e. one that takes more than 1 byte per char).
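The 1-1 mapping is easy to see by counting. In a single-byte encoding the byte count equals the character count, while a multi-byte encoding breaks that equality. This Python illustration uses Latin-1 as a stand-in for a native encoding (an assumption) and UTF-8 as the wider one.

```python
text = "café"

# Single-byte (native-style) encoding: one char is exactly one byte
native = text.encode("latin-1")

# Wider encoding: "é" takes two bytes in UTF-8
utf8 = text.encode("utf-8")

print(len(text), len(native), len(utf8))  # 4 4 5
```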

So, right now, 'is a native string' will always return true for values that convert to strings (which, at the moment, is every value).

Moving forward, all strings in the engine will be replaced by an MCStringRef abstraction. This opaque type will be able to hold either a native/binary string or a Unicode string. More abstractly, an MCStringRef represents a sequence of characters: from the outside, there's no need to be concerned about the internal representation (or encoding).
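To make the "opaque type" idea concrete, here is a hypothetical Python sketch of a string type in the spirit of MCStringRef. It is not the engine's actual design. It stores compact native bytes when the text fits the native charset (Latin-1 here, an assumption) and falls back to full Unicode otherwise. Callers only ever see characters.

```python
class StringRef:
    """Hypothetical opaque string: the internal representation is hidden."""

    def __init__(self, text: str):
        try:
            self._data = text.encode("latin-1")  # compact 1-byte native form
            self._native = True
        except UnicodeEncodeError:
            self._data = text                    # full Unicode form
            self._native = False

    def chars(self) -> str:
        # Same interface whichever representation is used internally
        return self._data.decode("latin-1") if self._native else self._data

    def is_native(self) -> bool:
        return self._native

    def __len__(self) -> int:
        # Length in characters, never in bytes
        return len(self.chars())
```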

At that point, 'is a native string' might return false if the text contained in the string cannot be converted losslessly to the native encoding.
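"Losslessly" can be checked with a round trip: encode with substitution instead of failing, decode again, and compare. This is a Python sketch with a hypothetical function name, again assuming Latin-1 as the native encoding. A character with no native equivalent comes back as "?", so the comparison fails.

```python
def converts_losslessly(text: str, encoding: str = "latin-1") -> bool:
    """True if text survives a round trip through the encoding unchanged."""
    # errors="replace" substitutes "?" for unencodable characters instead of
    # raising, so a lossy conversion shows up as a mismatch after decoding
    round_trip = text.encode(encoding, errors="replace").decode(encoding)
    return round_trip == text


print(converts_losslessly("café"))   # True
print(converts_losslessly("Ωmega"))  # False: "Ω" has no Latin-1 byte
```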

In fact, in the future a whole family of 'is a string'-type operators would be useful:

- is a binary string: returns true if the string can convert to binary (i.e. is natively encoded) and the value converts to a string
- is a native string: returns true if the string can be encoded as native and the value converts to a string
- is a simple unicode string: returns true if the string can be encoded in Unicode with no surrogate pairs and the value converts to a string
- is a unicode string: returns true if the value converts to a string
- is a string: returns true if the value converts to a string
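Of this family, 'is a simple unicode string' is the least obvious. Here is a Python sketch with hypothetical function names. For well-formed text, a string avoids surrogate pairs exactly when every code point is at or below U+FFFF. Counting UTF-16 code units cross-checks this: simple strings use one unit per character.

```python
def is_simple_unicode_string(text: str) -> bool:
    """True if text encodes to UTF-16 without any surrogate pairs."""
    # Code points above U+FFFF need a surrogate pair (two units) in UTF-16
    return all(ord(ch) <= 0xFFFF for ch in text)


def utf16_units(text: str) -> int:
    # Each UTF-16 code unit is 2 bytes; a surrogate pair counts as two units
    return len(text.encode("utf-16-le")) // 2
```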

So the above will probably raise more questions than it answers, but at least it's a start.

is ASCII operator
