I'm currently pulling text from web sources (XML) and then analysing it according to word frequencies. When I do this I don't want S.N.L.' and S.N.L.’, for example, to show up as different words. Using "trueword" won't remove the single quote on the end of the latter, because it's a "’".
So at the moment I'm putting everything into a field via uniEncode to mesh all the text into ASCII prior to analysis. This weeds out the differences between sources that use unicode and sources that don't.
But it seems so silly, and I'm sure it slows everything down--I have all my data in arrays, then I put it into a field, and then back to arrays again. Is there a way to convert the unicode text using no objects and only variables? If there is, I can't seem to get the syntax right.
Is there some way to do it without employing charToNum() and numToChar()? The CPU cost of doing that with hundreds of keys averaging ~15 words each seems unnecessary.
Unicode manipulation sans objects in the loop
Re: Unicode manipulation sans objects in the loop
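Whenever you pull data from an external source, run it through textDecode, which converts the raw bytes from their source encoding (UTF-8, for example) into LiveCode's internal Unicode text (UTF16). That should fix things. It works directly on variables, so no field is needed. As of LC 7 you shouldn't need the old uniEncode/uniDecode functions any more.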
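Here is a minimal sketch of how that might look using only variables and no field. It assumes LC 7 or later and that the source is UTF-8 encoded (check the XML declaration or HTTP headers to be sure). The handler name wordCounts and the variable names are made up for illustration.

```livecode
-- Hypothetical helper: returns an array of word frequencies.
-- pRawData is the undecoded data as fetched from the source,
-- ASSUMED here to be UTF-8; change the encoding name if it is not.
function wordCounts pRawData
   local tText, tWord, tCounts
   -- Decode the bytes into LiveCode text; no object is involved
   put textDecode(pRawData, "UTF-8") into tText
   -- trueWord chunks follow Unicode word-breaking rules
   repeat for each trueWord tWord in tText
      add 1 to tCounts[tWord]
   end repeat
   return tCounts
end wordCounts
```

You would call it with the data exactly as it arrived, before anything else touches it, e.g. put wordCounts(tRawData) into tFreq. Whether a trailing curly quote ends up inside a trueWord depends on the word-breaking rules, so test it against your own data.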
Jacqueline Landman Gay | jacque at hyperactivesw dot com
HyperActive Software | http://www.hyperactivesw.com
Re: Unicode manipulation sans objects in the loop
Thanks. Didn't know about textDecode.