Different alphabetical orders

danielrr

Post by danielrr » Tue Apr 23, 2013 2:31 pm

(different from the Latin order, I mean)

In the Latin alphabet, the order is... well, you know.
Other alphabets usually have a different alphabetical order. In the Greek alphabet (assuming we are using a Latin transliteration) the order is ABGDEZHQIKLMNX...
Is there a ready-made function to sort strings of characters by a different alphabetical order or, more generally, by any arbitrary sort order?

Simon
VIP Livecode Opensource Backer

Re: Different alphabetical orders

Post by Simon » Wed Apr 24, 2013 12:21 am

Hi Daniel,
This sounded like fun, but it gets complicated fast:

on mouseUp
   put "gertrude" & cr & "alfred" & cr & "1penny" & cr & "Gerry" into myList
   put sortOrder(myList)
end mouseUp

function sortOrder tList
   set caseSensitive to true 
   put "a,A,B,1,g,G,D,E,Z,2,H,Q,#,I,K,L,M,N,X" into tSort  --Insert your own order
   split tSort by comma
   repeat for each line tLine in tList
      put char 1 of tLine into tLook
      repeat for each line tKey in the keys of  tSort
         if tLook = tSort[tKey]  then
            put tLine into line tKey of tNewList -- This needs a lot of work!!!
            exit repeat
         end if
      end repeat
   end repeat
   set the caseSensitive to false
   filter tNewList without empty
   return tNewList
end sortOrder

Well, it's a start: it only sorts on the first character, it overwrites lines, and it's generally a complete mess. :oops:
Should I continue building on it? I'm sure someone here knows about the complexities of custom sorts.

Simon
I used to be a newbie but then I learned how to spell teh correctly and now I'm a noob!

dunbarx
VIP Livecode Opensource Backer

Re: Different alphabetical orders

Post by dunbarx » Wed Apr 24, 2013 5:16 am

I would create pseudoWords by associating chars in the normal alphabet with the chars in the custom alphabet.

Looking at just the first three chars of the custom alphabet, "A,B,G", we map them to "A,B,C" and prepend the mapped char to each word accordingly. A word like "Gxxx" would get a "C" prepended, creating "CGxxx". Then sort normally, and drop the first char of each word.

Back to fun.

Craig Newman
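This is a minimal LiveCode sketch of the prepend idea above. It assumes the Beta Code letter order "ABGDEZHQIKLMNCOPRSTUFXYW" (xi written as C) and a hypothetical three-word list. It maps each word's first letter to the normal letter at the same position (1st to "A", 2nd to "B", and so on), prepends that letter, sorts normally and then strips the added char. Like the idea it illustrates, it only looks at the first char and assumes an alphabet of 26 letters or fewer.

```livecode
on mouseUp
   -- Assumed custom order (Beta Code; xi written as C)
   put "ABGDEZHQIKLMNCOPRSTUFXYW" into tOrder
   put "ZEUS" & cr & "HRA" & cr & "QEOS" into tWords
   put empty into tPrefixed
   repeat for each line tWord in tWords
      -- position 1 -> "A", 2 -> "B", ... (numToChar(65) is "A");
      -- a first char missing from tOrder gets "@" and sorts first
      put numToChar(64 + offset(char 1 of tWord, tOrder)) & tWord & cr after tPrefixed
   end repeat
   delete last char of tPrefixed
   sort lines of tPrefixed ascending text
   put empty into tResult
   repeat for each line tLine in tPrefixed
      put char 2 to -1 of tLine & cr after tResult -- drop the added char
   end repeat
   delete last char of tResult
   put tResult
end mouseUp
```

With this order, "ZEUS" (6th letter, prefix "F") should come out ahead of "HRA" and "QEOS". A plain Latin sort would put it last.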

dunbarx
VIP Livecode Opensource Backer

Re: Different alphabetical orders

Post by dunbarx » Wed Apr 24, 2013 6:37 am

Hmmm.

We need to sort by more than just the first char.

So let's substitute them all by brute force. The example custom alphabet was "ABGDEZH..."

We substitute a normal "A" for the first char in it, ("A" changes to "A")
We substitute a normal "B" for the second char in it, ("B" changes to "B")
We substitute a normal "C" for the third char in it, ("G" changes to "C")

So the custom word "AHBZ" becomes "AGBF" --first char, seventh char, second char, sixth char

We do this in, say, 26 lines of code:

replace "A" with "A" in customAlphabet --1st char
replace "B" with "B" in customAlphabet --2nd Char
replace "G" with "C" in customAlphabet --3rd char
etc.

Now sort normally, and then un-replace

Craig Newman

Simon
VIP Livecode Opensource Backer

Re: Different alphabetical orders

Post by Simon » Wed Apr 24, 2013 6:51 am

I was also thinking about diacritical marks pushing it past 26 possible characters.
So, substitute with a number instead and do a numeric sort.
Just to make sure start with 001. :D

Simon

Edit: Wow, I'm nuts! Forget this post :oops: It's too late at night.

jmburnod
VIP Livecode Opensource Backer

Re: Different alphabetical orders

Post by jmburnod » Wed Apr 24, 2013 8:58 am

Hi All,
Another thread on a similar subject:
http://forums.runrev.com/phpBB2/viewtop ... =9&t=12445
All the best
Jean-Marc
https://alternatic.ch

danielrr

Re: Different alphabetical orders

Post by danielrr » Wed Apr 24, 2013 9:16 am

dunbarx wrote: Hmmm.

We need to sort by more than just the first char.

So let's substitute them all by brute force. The example custom alphabet was "ABGDEZH..."

We substitute a normal "A" for the first char in it, ("A" changes to "A")
We substitute a normal "B" for the second char in it, ("B" changes to "B")
We substitute a normal "C" for the third char in it, ("G" changes to "C")

So the custom word "AHBZ" becomes "AGBF" --first char, seventh char, second char, sixth char

We do this in, say 26 lines of code, as:

replace "A" with "A" in customAlphabet --1st char
replace "B" with "B" in customAlphabet --2nd Char
replace "G" with "C" in customAlphabet --3rd char
etc.

Now sort normally, and then un-replace

Craig Newman

Using brute force was my first option too. Should I worry? OK, the problems with this solution (substituting the characters of alphabet A with the characters of alphabet B, sorting, then reversing the substitution) are:

a) it only works as long as the other alphabet has the same number of characters, or fewer;
b) in any case, it is not a matter of a simple iteration of "replace A with B in wordsToBeTransliterated", since at some point the chars already replaced interfere with the chars still to be replaced, so you need some workaround, which makes it even clumsier (even if, in the end, this may be the shortest way).
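This sketch shows the interference described in b) and one way around it. It assumes the Beta Code order, where gamma is G (3rd letter) and xi is C (14th letter). Two replace commands in a row convert the same char twice. Building a new string one char at a time avoids that, because every input char is looked up exactly once. The function name transliterate is hypothetical. pTo must have at least as many chars as pFrom, which is point a) above.

```livecode
on mouseUp
   -- Chained replace: gamma (G, 3rd letter) -> C, then xi (C, 14th letter) -> N
   put "GAG" into tText
   replace "G" with "C" in tText -- tText is now "CAC"
   replace "C" with "N" in tText -- now "NAN": the new C's were converted a second time
   put tText & cr & transliterate("GAG", "ABGDEZHQIKLMNCOPRSTUFXYW", "ABCDEFGHIJKLMNOPQRSTUVWX")
end mouseUp

-- Hypothetical helper: replace each char of pText by the char at the same
-- position in pTo as the char's position in pFrom, reading the input only once
function transliterate pText, pFrom, pTo
   local tOut, tPos
   set the caseSensitive to true -- caseSensitive is local to this handler
   repeat for each char tChar in pText
      put offset(tChar, pFrom) into tPos
      if tPos > 0 then
         put char tPos of pTo after tOut
      else
         put tChar after tOut -- chars not in pFrom pass through unchanged
      end if
   end repeat
   return tOut
end transliterate
```

Run from a button, the first line of the Message Box should show the doubly converted result of the two replace commands. The second line should show the single-pass result. Because the input is read only once, the order in which the mapping is listed no longer matters.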

So the question can be forked two ways:

a: what's the smartest function to transliterate all the characters of alphabet order "ABDCEF" to the alphabet order "FECDBAÑÇ"?

b: is there a way to order the words of a container using a non-canonical alphabetical order, without substituting all the characters of the container with the characters that occupy the right order in the Latin alphabet?
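For question b, here is a sketch that leaves the words untouched and computes a sort key on the fly. It assumes, as I understand LiveCode's "sort" command, that the "by" clause may call a custom function with "each". The name customKey and the sample words are hypothetical. Each char becomes its 3-digit position in the custom order, so a plain text sort of the keys follows the custom alphabet. Chars not in the order (accents, breathings) are ignored.

```livecode
on mouseUp
   put "ABGDEZHQIKLMNCOPRSTUFXYW" into tOrder -- assumed Beta Code order
   put "CENOS" & cr & "QEOS" & cr & "HRA" & cr & "ZEUS" into tWords
   -- The key is computed per line; tWords itself is never rewritten
   sort lines of tWords ascending text by customKey(each, tOrder)
   put tWords
end mouseUp

-- Hypothetical key function: "ZEUS" -> "006005020018"
function customKey pWord, pOrder
   local tKey, tPos
   set the caseSensitive to true
   repeat for each char tChar in pWord
      put offset(tChar, pOrder) into tPos
      if tPos > 0 then put format("%03d", tPos) after tKey -- fixed width: 6 -> "006"
   end repeat
   return tKey
end customKey
```

Every code has the same width, so a shorter word that is a prefix of a longer one still sorts first. With this order the list should come out as ZEUS, HRA, QEOS, CENOS.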

Just to make it more fun!

dunbarx
VIP Livecode Opensource Backer

Re: Different alphabetical orders

Post by dunbarx » Wed Apr 24, 2013 5:30 pm

Hi.

The "replace" command will do the entire body of text in one shot, so that is not an issue.

If the custom alphabet has more than 26 chars (or maybe just do this anyway), substitute two numeric digits for each char instead, sort of what Simon was alluding to.

So the mapping would be:

"ABGDEZH..."

is mapped to "01,02,03,04", etc. (where the third char, "G", becomes "03")

So now a word like "ADHB" becomes "01040702"

And another brute force sort can be done as follows, using the undocumented "&" form to create multiple sorts (see my dictionary note under "sort")

Sort yourData numeric by char 1 to 2 of each & char 3 to 4 of each & char 5 to 6 of each & char 7 to 8 of each & char 9 to 10 of each

I have not tested this, but it should work.

Craig Newman
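As far as I can tell, the "&" in that sort key simply concatenates the chunks into one string. So "char 1 to 2 of each & char 3 to 4 of each & ..." is the same key as "char 1 to 10 of each". That behaves like a multi-level sort only when the keys are compared as text. A numeric comparison goes wrong as soon as the words differ in length. Here is a small sketch with hypothetical 2-digit codes (A=11, B=12, G=13):

```livecode
on mouseUp
   -- "BA" encodes as 1211, "ABG" as 111213 (A=11, B=12, G=13)
   put "1211" & cr & "111213" into tKeys
   put tKeys into tNumeric
   sort lines of tNumeric ascending numeric -- 1211 < 111213: "BA" first (wrong)
   put tKeys into tText
   sort lines of tText ascending text -- "111213" < "1211": "ABG" first (right)
   put "numeric:" & cr & tNumeric & cr & "text:" & cr & tText
end mouseUp
```

With keys of equal length both sorts agree. Once the lengths differ, only the text sort keeps the char-by-char order. That is why fixed-width codes compared as text can handle words of any length without listing every chunk.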

dunbarx
VIP Livecode Opensource Backer

Re: Different alphabetical orders

Post by dunbarx » Wed Apr 24, 2013 7:35 pm

Hi.

What do you mean by
not a matter of using a simple iteration with "replace A with B in wordsToBeTransliterated" since at some point the replaced chars interferes with the chars still to be replaces
The "replace" command works through the text as a whole, and all the chars are replaced at once.

Are you asking for help in writing the code?

Craig Newman

dunbarx
VIP Livecode Opensource Backer

Re: Different alphabetical orders

Post by dunbarx » Thu Apr 25, 2013 12:04 am

Daniel.

Played around a bit. There are things that still need to be done with the following script, but it should be a good head start.

I assume your original "latin" alphabet: "ABGDEZHQIKLMNX"

You need two fields to play with. The first holds a return-delimited list of "latin" words of eight letters or fewer. The second is for the results. Make a button with:

on mouseUp
   put "ABGDEZHQIKLMNX" into latinString
   put fld 1 into test
   
   repeat with y = 11 to 24 --just not to have to deal with "03" for example
      put y into latin[char (y - 10) of latinString]
   end repeat
   
   put test into newString
   repeat for each char tChar in test
      if tChar = return then next repeat
      replace tChar with latin[tChar] in newString
   end repeat
   
   sort newString numeric by char 1 to 2 of each & char 3 to 4 of each & char 5 to 6 of each & char 7 to 8 of each
   put newString into fld 2
end mouseUp

Let me know where you take this.

Craig Newman

danielrr

Re: Different alphabetical orders

Post by danielrr » Thu Apr 25, 2013 10:38 am

Thanks Craig,

I took the concept and developed it a little (not necessarily in the right direction). Here's what I did:

sortByAnyOrder is a function that should return the text received as its second parameter, sorted by the alphabetical order received as its first parameter. I implemented it for Greek Beta Code (the most standard way to transliterate polytonic Greek, since you asked). There are a lot of ugly things in the function, but hey, I'm a newbie at LC.
The main problem (probably not really the main one, but still a problem) is that with this approach all the lowercase words appear after all the uppercase words with the same initial (that is, "Laura, lamb, luck", not "lamb, Laura, luck" as they should). This is only a problem with alphabets that distinguish upper and lower case (alphabets derived from the Greek alphabet, including the Latin alphabet), but it is a problem anyway.

All comments on this approach are most welcome.

on mouseUp
   put "ABGDEZHQIKLMNCOPRSTUFXYWabgdezhqiklmncoprstufxyw" into greekOrder  --betacode, that is (xi is C, chi is X)
   --NO!, better
   put "AaBbGgDdEeZzHhQqIiKkLlMmNnCcOoPpRrSsTtUuFfXxYyWw" into greekOrder  --slightly better, but all the capitalized words still appear on top of
   --the words with the same initial
   put ")/andra moi e)/nnepe, Mou=sa, polu/tropon, o(\s ma/la polla\ pla/gxqh, e)pei\ Troi/hs i(ero\n ptoli/eqron e)/perse:" into test

   put sortByAnyOrder(greekOrder,test) into newOrder
   put newOrder -- show the result in the Message Box
end mouseUp


function sortByAnyOrder whichOrder,theWords
   replace " " with return in theWords
   replace "," with "" in theWords
   set the caseSensitive to true
   repeat with y = 10 to number of chars of  whichOrder + 9 --
      put "@" & y into latin[char (y - 9) of whichOrder]
   end repeat
   
   put theWords into newString
   repeat for each char tChar in whichOrder
      replace tChar with latin[tChar] in newString
   end repeat
   
   repeat with x = 1 to number of lines of newString  --there must be a less ugly way to do this
      put "," & line x of theWords after line x of newString
   end repeat
   sort lines of newString ascending text by item 1 of each
   repeat with x = 1 to number of lines of newString  --again, there must be a less ugly way to strip the first item
      delete item 1 of line x of newString
   end repeat
   set the caseSensitive to false
   return newString
end sortByAnyOrder
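This sketch addresses the two "there must be a less ugly way" comments and the uppercase/lowercase problem. The name sortByCustomOrder is hypothetical. It builds one tab-delimited line per word (key, then original), sorts by the key and reads the originals back with "repeat for each". The key assumes the order string is given in lowercase only. Letters are compared case-insensitively first, and a string of case flags (0 = uppercase, 1 = lowercase) only breaks ties. As a result, "Laura" lands just before "laura" instead of above every lowercase L word. Chars not in the order (Beta Code accents and breathings) are ignored for sorting but kept in the output.

```livecode
function sortByCustomOrder pOrder, pWords
   -- pOrder: the alphabet in lowercase, e.g. "abgdezhqiklmncoprstufxyw"
   -- pWords: one word per line
   local tTable, tKey, tFlags, tPos, tOut
   set the caseSensitive to true -- local to this handler
   repeat for each line tWord in pWords
      put empty into tKey
      put empty into tFlags
      repeat for each char tChar in tWord
         put offset(toLower(tChar), pOrder) into tPos
         if tPos = 0 then next repeat -- skip accents, breathings, punctuation
         put format("%03d", tPos) after tKey
         if tChar is toLower(tChar) then
            put "1" after tFlags -- lowercase
         else
            put "0" after tFlags -- uppercase sorts first on a tie
         end if
      end repeat
      -- space sorts below every digit, so a prefix word still comes first
      put tKey & space & tFlags & tab & tWord & cr after tTable
   end repeat
   delete last char of tTable
   set the itemDelimiter to tab
   sort lines of tTable ascending text by item 1 of each
   repeat for each line tLine in tTable
      put item 2 of tLine & cr after tOut -- keep only the original word
   end repeat
   delete last char of tOut
   return tOut
end sortByCustomOrder
```

Here is a usage example with Latin sample words for readability: put sortByCustomOrder("abgdezhqiklmncoprstufxyw", "luck" & cr & "laura" & cr & "Laura" & cr & "lamb") should give lamb, Laura, laura, luck.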


dunbarx
VIP Livecode Opensource Backer

Re: Different alphabetical orders

Post by dunbarx » Thu Apr 25, 2013 2:15 pm

Daniel.

Nothing like playing around endlessly.

Doesn't the case-sensitive ordering of your custom alphabet solve the issue of lower/upper case?

But my numeric pseudo alphabet should sidestep whatever problems arise in that regard, and would encompass diacritical or any other variants as well. By simply encoding strings of characters into strings of numbers, based only on the order of those characters in the alphabet string, I would think that all problems are resolved.

So "a,A,b,B..." would encode as "11,12,13,14..." The encoding hard wires the sorting. Or am I missing something? What is going wrong?


Craig Newman

danielrr

Re: Different alphabetical orders

Post by danielrr » Thu Apr 25, 2013 2:32 pm

dunbarx wrote: Doesn't the case-sensitive ordering of your custom alphabet solve the issue of lower/upper case?
Not quite. My function puts all the words with an uppercase initial above all the words with the same lowercase initial, that is "Laura, Lebanon, Lost, last, lost, lancet", etc. Usually this is not what you want. To sort properly you need to take into account the first and second characters.

dunbarx wrote: But my numeric pseudo alphabet should sidestep whatever problems arise in that regard, and would encompass diacritical or any other variants as well. By simply encoding strings of characters into strings of numbers, based only on the order of those characters in the alphabet string, I would think that all problems are resolved.
But then how do you recover the original text? Unless I'm missing something, your script ends up with a numeric string, but what you need is the reordered list of words.

dunbarx
VIP Livecode Opensource Backer

Re: Different alphabetical orders

Post by dunbarx » Thu Apr 25, 2013 5:41 pm

Daniel.

You need to sort all the characters, not just the first and second. The compound sort command, using the concatenated sets of local "each" variables, addresses this. I only went to the fifth char in my script (up to char 10 in the paired digit schema), but this can be extended as required.

Right?

I had alluded several times to the fact that the coded strings had, of course, to be restored to plain text. This was intended to be left to you. Do you see how the characters were encoded? I recently used an array variable for compactness, but earlier, I had written a brute force example. Please step through that, and step through the script that encodes using the array. If you see how they were encoded, you should be able to work out how to restore.

So I still leave this to you. If you are not yet comfortable with arrays, try to write it all out explicitly. If you still have problems, write back. But please do try.

Craig Newman

danielrr

Re: Different alphabetical orders

Post by danielrr » Sat Apr 27, 2013 5:12 pm

Hi Dunbar. Apologies for the delay in answering.

First I'll answer myself by posting a new function that improves my previous one. This one correctly sorts text written in other alphabets, provided that you input the right alphabetical order, and places a word with an uppercase initial just before the lowercase form of the same string. Still, it's ugly. Here it is:

on mouseUp
   put "abgdezhqiklmncoprstufxyw" into greekOrder  --Beta Code order (xi is C, chi is X)
   put "ABGDEZHQIKLMNCOPRSTUFXYW" into upperCase
   put ")/andra moi e)/nnepe, Mou=sa, polu/tropon, o(\s ma/la polla\ pla/gxqh, e)pei\ Troi/hs i(ero\n ptoli/eqron e)/perse:" into test

   put sortByAnyOrder(greekOrder,test,upperCase) into newOrder
   put newOrder -- show the result in the Message Box
end mouseUp


function sortByAnyOrder whichOrder,theWords,upperCase
   replace " " with return in theWords
   replace "," with "" in theWords
   set the caseSensitive to false
   repeat with y = 10 to number of chars of  whichOrder + 9 --
      put "@" & y into latin[char (y - 9) of whichOrder]
   end repeat
   
   put theWords into newString
   repeat for each char tChar in whichOrder
      --if tChar = return then next repeat
      replace tChar with latin[tChar] in newString
   end repeat
   set the caseSensitive to true
   repeat with x = 1 to number of lines of newString  --there must be a less ugly way to do this
      if char 1 of line x of theWords is in upperCase
      then
         put "@0," & line x of theWords after line x of newString
      else
         put "@1," & line x of theWords after line x of newString
      end if
      
   end repeat
   set the caseSensitive to false
   Sort lines of newString ascending text by item 1 of each--by char 1 to 2 of each & char 3 to 4 of each & char 5 to 6 of each & char 7 to 8 of each
   repeat with x = 1 to number of lines of newString  --again, there must be a less ugly way to strip the first item, I'm a newbie in LC
      delete item 1 of line x of newString
   end repeat
   
   return newString
end sortByAnyOrder

And now I'd like to reply to your questions. First, all my gratitude for all the time you have given to this newcomer.
dunbarx wrote: Daniel.

You need to sort all the characters, not just the first and second. The compound sort command, using the concatenated sets of local "each" variables, addresses this. I only went to the fifth char in my script (up to char 10 in the paired digit schema), but this can be extended as required.

Right?

I had alluded several times to the fact that the coded strings had, of course, to be restored to plain text. This was intended to be left to you. Do you see how the characters were encoded? I recently used an array variable for compactness, but earlier, I had written a brute force example. Please step through that, and step through the script that encodes using the array. If you see how they were encoded, you should be able to work out how to restore.

So I still leave this to you. If you are not yet comfortable with arrays, try to write it all out explicitly. If you still have problems, write back. But please do try.

Craig Newman

I tried your function and it doesn't work: it doesn't order the strings by the chosen alphabetical order. Try it yourself, and tell me if I did anything wrong.

On the other hand, and this is what intrigues me the most, I don't know of a straightforward way to restore the numeric strings to their alphabetical (original) equivalents. I mean, if I have the string "2112", formed by the 21st character followed by the 12th character, and I start using "replace", my first replacement could (wrongly) turn the middle two digits into the 11th character. Yes, I can imagine several convoluted ways to do it right, but which is the simple way? Could you provide an example with code?

Thanks, and best wishes,

Daniel
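This sketch addresses the "2112" question. It assumes every code has the same width: here 2 digits, zero-padded, so position 1 is "01". With a fixed width no "replace" is needed to decode. The string is simply read in steps of two digits, so "2112" can only mean 21 followed by 12. The function names are hypothetical. With the 11-based codes used earlier in the thread (A=11), subtract 10 before the lookup. Chars missing from the order encode as "00" and are dropped when decoding.

```livecode
function encodeWord pWord, pOrder
   local tCodes, tPos
   set the caseSensitive to true
   repeat for each char tChar in pWord
      put offset(tChar, pOrder) into tPos -- 0 if tChar is not in pOrder
      put format("%02d", tPos) after tCodes -- always two digits: 7 -> "07"
   end repeat
   return tCodes
end encodeWord

function decodeWord pCodes, pOrder
   local tWord, tPos
   -- read the codes in fixed steps of two digits; no replace involved
   repeat with i = 1 to the number of chars of pCodes step 2
      put (char i to i + 1 of pCodes) + 0 into tPos -- "21" -> 21, "07" -> 7
      if tPos > 0 then put char tPos of pOrder after tWord
   end repeat
   return tWord
end decodeWord
```

Consider put decodeWord("2112", "ABGDEZHQIKLMNCOPRSTUFXYW"). It should give "FM" (the 21st letter, then the 12th), and encodeWord("FM", ...) should give "2112" back.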
