Unicode manipulation sans objects in the loop

theotherbassist
Posts: 115
Joined: Thu Mar 06, 2014 9:29 am

Unicode manipulation sans objects in the loop

Post by theotherbassist » Sun Feb 12, 2017 5:06 pm

I'm currently pulling text from web sources (XML) and then analysing it according to word frequencies. When I do this I don't want S.N.L.' and S.N.L.’, for example, to show up as different words. Using "trueword" won't remove the single quote on the end of the latter, because it's a "’".

So at the moment I'm putting everything into a field via uniEncode to mesh all the text into ASCII prior to analysis. This weeds out the differences between sources that use unicode and sources that don't.

But it seems so silly, and I'm sure it slows everything down--I have all my data in arrays, then I put it into a field, and then back to arrays again. Is there a way to convert the unicode text using no objects and only variables? If there is, I can't seem to get the syntax right.

Is there some way to do it without employing charToNum() and numToChar()? The CPU cost of doing that with hundreds of keys averaging ~15 words each seems unnecessary.

jacque
VIP Livecode Opensource Backer
Posts: 7393
Joined: Sat Apr 08, 2006 8:31 pm

Re: Unicode manipulation sans objects in the loop

Post by jacque » Mon Feb 13, 2017 7:10 pm

Whenever you pull data from an external source, run it through textDecode, naming the source's encoding (usually "UTF-8"), to convert the raw bytes into LiveCode's native Unicode text (UTF-16 internally). That should fix things. As of LC 7 you shouldn't need the old uniEncode/uniDecode functions any more.
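
A minimal sketch of what that could look like using only variables, with no field in the loop. The handler name decodeFeed and its parameter are hypothetical, and it assumes the feed is served as UTF-8. textDecode on its own won't make a curly apostrophe equal to a straight one, so the replace line is an optional extra step for the word-frequency case:

```livecode
-- Hypothetical helper: fetch a feed and return it as LiveCode text.
-- Assumes the source is UTF-8; change the encoding name if it isn't.
function decodeFeed pUrl
   local tRaw, tText
   put url pUrl into tRaw                      -- raw bytes straight into a variable
   put textDecode(tRaw, "UTF-8") into tText    -- bytes -> native Unicode text
   -- Optional: fold the curly apostrophe (U+2019) into a plain one
   -- so both spellings count as the same word.
   replace numToCodepoint(0x2019) with "'" in tText
   return tText
end decodeFeed
```

After that, the result can go straight into your arrays and be split with trueWord chunks as before.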
Jacqueline Landman Gay | jacque at hyperactivesw dot com
HyperActive Software | http://www.hyperactivesw.com

theotherbassist
Posts: 115
Joined: Thu Mar 06, 2014 9:29 am

Re: Unicode manipulation sans objects in the loop

Post by theotherbassist » Mon Feb 13, 2017 11:55 pm

Thanks. Didn't know about textDecode.
