Different alphabetical orders
Posted: Tue Apr 23, 2013 2:31 pm
by danielrr
(different from the Latin order, I mean)
In Latin Alphabets, the order is... well, you know.
In other alphabets the alphabetical order is usually different. In the Greek alphabet the order is (assuming we are using a Latin transliteration) ABGDEZHQIKLMNX...
Is there a ready-made function to sort strings of characters based on different alphabetical orders or, more generally, on any arbitrary sorting order?
Re: Different alphabetical orders
Posted: Wed Apr 24, 2013 12:21 am
by Simon
Hi Daniel,
This sounded like fun, but it gets complicated fast:
on mouseUp
put "gertrude" & cr & "alfred" & cr & "1penny" & cr & "Gerry" into myList
put sortOrder(myList)
end mouseUp
function sortOrder tList
set the caseSensitive to true
put "a,A,B,1,g,G,D,E,Z,2,H,Q,#,I,K,L,M,N,X" into tSort --Insert your own order
split tSort by comma
repeat for each line tLine in tList
put char 1 of tLine into tLook
repeat for each line tKey in the keys of tSort
if tLook = tSort[tKey] then
put tLine into line tKey of tNewList -- This needs a lot of work!!!
exit repeat
end if
end repeat
end repeat
set the caseSensitive to false
filter tNewList without empty
return tNewList
end sortOrder
Well, it's a start: it only sorts on the first character, it overwrites lines that share a first character, and it is generally a complete mess.
Should I continue building on it? I'm sure someone here knows about the complexities of custom sorts.
Simon
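Simon's draft loses words because each one is written into a fixed line slot, so two words with the same first character collide. Below is a minimal sketch of the same first-character idea that keeps every word. It is written in Python only as compact pseudocode for the LiveCode logic, and the function name `sort_by_first_char` is hypothetical. Instead of writing into a fixed slot, it appends each word to a bucket for its rank:

```python
def sort_by_first_char(words, order):
    """Sort words by the rank of their FIRST character in `order` only.

    Words that share a first character go into the same bucket (in input
    order) instead of overwriting each other. Words whose first character
    is not in `order` (or empty words) are placed last.
    """
    buckets = {}
    for word in words:
        first = word[:1]
        rank = order.index(first) if first in order else len(order)
        buckets.setdefault(rank, []).append(word)
    result = []
    for rank in sorted(buckets):
        result.extend(buckets[rank])
    return result


# Simon's example order, split on commas as his script does
order = "a,A,B,1,g,G,D,E,Z,2,H,Q,#,I,K,L,M,N,X".split(",")
print(sort_by_first_char(["gertrude", "alfred", "1penny", "Gerry"], order))
# ['alfred', '1penny', 'gertrude', 'Gerry']
```

It still ignores everything after the first character, which is the limitation the following replies tackle.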
Re: Different alphabetical orders
Posted: Wed Apr 24, 2013 5:16 am
by dunbarx
I would create pseudoWords by associating chars in the normal alphabet with the chars in the custom alphabet.
So, looking at just the first three chars in the custom alphabet, "A,B,G", we can prepend "A", "B" and "C" respectively. A word like "Gxxx" would be prepended with a "C", creating "CGxxx". Then sort normally, and remove the first char of each word.
Back to fun.
Craig Newman
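Craig's prefix idea can be sketched in a few lines. The sketch is in Python rather than LiveCode, and `sort_with_prefix` is a hypothetical name. It assumes the custom alphabet has at most 26 characters and that every word starts with one of them. Only the first character follows the custom order. Ties fall back to ordinary comparison of the rest of the word.

```python
import string


def sort_with_prefix(words, custom):
    """Prefix each word with the Latin capital at the same position as the
    word's first char in `custom`, sort normally, then drop the prefix.

    Only the first character follows the custom order; ties are broken by
    ordinary (code point) comparison of the rest of the word.
    """
    latin = string.ascii_uppercase
    tagged = [latin[custom.index(w[0])] + w for w in words]  # "Gxxx" -> "CGxxx"
    tagged.sort()
    return [t[1:] for t in tagged]  # strip the prefix again


print(sort_with_prefix(["Dxx", "Gxxx", "Axx"], "ABGDEZHQIKLMNX"))
# ['Axx', 'Gxxx', 'Dxx']
```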
Re: Different alphabetical orders
Posted: Wed Apr 24, 2013 6:37 am
by dunbarx
Hmmm.
We need to sort by more than just the first char.
So let's substitute them all by brute force. The example custom alphabet was "ABGDEZH..."
We substitute a normal "A" for the first char in it, ("A" changes to "A")
We substitute a normal "B" for the second char in it, ("B" changes to "B")
We substitute a normal "C" for the third char in it, ("G" changes to "C")
So the custom word "AHBZ" becomes "AGBF" --first char, seventh char, second char, sixth char
We do this in, say, 26 lines of code, as:
replace "A" with "A" in yourData --1st char
replace "B" with "B" in yourData --2nd char
replace "G" with "C" in yourData --3rd char
etc.
Now sort normally, and then un-replace.
Craig Newman
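Craig's full substitution can be sketched the same way. The Python version below uses a hypothetical name, `sort_by_transliteration`. It relies on `str.maketrans`/`str.translate`, which apply the whole mapping in one pass, so a letter produced by one mapping is never mapped again. It assumes the custom alphabet has at most 26 distinct characters and that the words contain only those characters. Otherwise, the backward mapping would alter stray Latin letters.

```python
import string


def sort_by_transliteration(words, custom):
    """Map the n-th char of `custom` to the n-th Latin capital, sort the
    mapped words with an ordinary sort, then map the results back."""
    latin = string.ascii_uppercase[:len(custom)]
    forward = str.maketrans(custom, latin)   # e.g. G -> C, H -> G
    backward = str.maketrans(latin, custom)  # inverse mapping
    mapped = sorted(w.translate(forward) for w in words)
    return [m.translate(backward) for m in mapped]


custom = "ABGDEZH"
print("AHBZ".translate(str.maketrans(custom, "ABCDEFG")))  # AGBF
print(sort_by_transliteration(["AHBZ", "AGB", "ABD"], custom))
# ['ABD', 'AGB', 'AHBZ']
```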
Re: Different alphabetical orders
Posted: Wed Apr 24, 2013 6:51 am
by Simon
I was also thinking about diacritical marks, which would take the number of characters past 26.
So, substitute with a number instead and do a numeric sort.
Just to be safe, start with 001.
Simon
Edit: Wow, I'm nuts! Forget this post; it's too late at night.
Re: Different alphabetical orders
Posted: Wed Apr 24, 2013 8:58 am
by jmburnod
Hi All,
Another thread on a similar subject:
http://forums.runrev.com/phpBB2/viewtop ... =9&t=12445
All the best
Jean-Marc
Re: Different alphabetical orders
Posted: Wed Apr 24, 2013 9:16 am
by danielrr
Using brute force was my first option too. Should I worry? OK, the problems with this solution (substituting the characters of alphabet A with the characters of alphabet B, sorting, then reversing the substitution) are:
a) it only works as long as the other alphabet has the same number of characters as the Latin one, or fewer;
b) in any case, it is not simply a matter of iterating "replace A with B in wordsToBeTransliterated", since at some point the replaced chars interfere with the chars still to be replaced, so you need some workaround, which makes it even more clumsy (even if, in the end, this may be the shortest way).
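Daniel's point (b) is easy to reproduce. The Python illustration below uses his "ABDCEF" source order and the first six characters of his "FECDBAÑÇ" target. The helper name is hypothetical. Each single replacement is applied to the whole text at once, yet the chain as a whole is not a simultaneous substitution, because a letter written by an early step can be rewritten by a later one. A one-pass mapping such as `str.translate` avoids this:

```python
def replace_one_by_one(text, src, dst):
    """Imitate a chain of replace commands, one per (src, dst) char pair.
    Each step replaces ALL occurrences, but later steps also see the output
    of earlier steps."""
    for s, d in zip(src, dst):
        text = text.replace(s, d)
    return text


src, dst = "ABDCEF", "FECDBA"
print(replace_one_by_one("ABC", src, dst))       # ABD (A->F, then F->A again)
print("ABC".translate(str.maketrans(src, dst)))  # FED (simultaneous mapping)
```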
So the question can be forked two ways:
a: what's the smartest function to transliterate all the characters of alphabet order "ABDCEF" to the alphabet order "FECDBAÑÇ"?
b: is there a way to order the words of a container using a non-canonical alphabetical order, without substituting all the characters of the container with the characters that occupy the corresponding position in the Latin alphabet?
Just to make it more fun!
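On question (b): when a language's sort accepts a key function, nothing in the text needs rewriting, because each word is compared through a list of ranks computed on the fly. The Python sketch below assumes the Beta Code letter order, with xi written as C and chi as X. `custom_key` is a hypothetical name. In LiveCode, the analogous route would presumably be a `sort ... by` expression that calls a function of `each`. Treat that part as an untested assumption.

```python
def custom_key(order):
    """Build a sort key that compares words char by char by their position
    in `order`. Characters not in `order` rank after every listed char."""
    rank = {ch: i for i, ch in enumerate(order)}
    unknown = len(order)

    def key(word):
        return [rank.get(ch, unknown) for ch in word]

    return key


beta = "ABGDEZHQIKLMNCOPRSTUFXYW"  # assumed Beta Code letter order
words = ["LOGOS", "QEOS", "BIOS", "ANQRWPOS"]
print(sorted(words))                        # ['ANQRWPOS', 'BIOS', 'LOGOS', 'QEOS']
print(sorted(words, key=custom_key(beta)))  # ['ANQRWPOS', 'BIOS', 'QEOS', 'LOGOS']
```

A word that is a prefix of another sorts first, because a shorter list of ranks compares as smaller.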
Re: Different alphabetical orders
Posted: Wed Apr 24, 2013 5:30 pm
by dunbarx
Hi.
The "replace" command will do the entire body of text in one shot, so that is not an issue.
If the custom alphabet has more than 26 chars (or maybe just do this anyway), substitute two numeric digits for each char instead, sort of what Simon was alluding to.
So the mapping would be:
"ABGDEZH..."
is mapped to "01,02,03,04,..." (where the third char, "G", becomes "03")
So now a word like "ADHB" becomes "01040702"
And another brute force sort can be done as follows, using the undocumented "&" form to create multiple sorts (see my dictionary note under "sort")
Sort yourData numeric by char 1 to 2 of each & char 3 to 4 of each & char 5 to 6 of each & char 7 to 8 of each & char 9 to 10 of each.
I have not tested this, but it should work.
Craig Newman
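Here is a quick check of the two-digit scheme, with Python as a stand-in and `encode` as a hypothetical helper. It reproduces the "ADHB" to "01040702" example. It also shows a pitfall worth testing for. If the `&` in the sort expression simply concatenates the pairs into one key, a numeric sort compares whole numbers, so a short word can jump ahead of a longer one. Because every code has exactly two digits, a plain text sort of the encoded strings already gives the custom order:

```python
def encode(word, order, offset=1):
    """Replace each char by its position in `order` (first char = `offset`),
    written as a two-digit code. Assumes len(order) + offset <= 100."""
    return "".join(f"{order.index(ch) + offset:02d}" for ch in word)


order = "ABGDEZHQIKLMNX"
print(encode("ADHB", order))  # 01040702

codes = [encode(w, order) for w in ["G", "AB"]]  # ['03', '0102']
print(sorted(codes, key=int))  # ['03', '0102']  numeric: G before AB (wrong)
print(sorted(codes))           # ['0102', '03']  text: AB before G (right)
```

The same happens with the 11-based numbering used later in the thread: "G" becomes 13 and "AB" becomes 1112, and 13 < 1112 numerically.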
Re: Different alphabetical orders
Posted: Wed Apr 24, 2013 7:35 pm
by dunbarx
Hi.
What do you mean by:
"not a matter of using a simple iteration with "replace A with B in wordsToBeTransliterated", since at some point the replaced chars interfere with the chars still to be replaced"?
The "replace" command works through the text as a whole, and all the chars are replaced at once.
Are you asking for help in writing the code?
Craig Newman
Re: Different alphabetical orders
Posted: Thu Apr 25, 2013 12:04 am
by dunbarx
Daniel.
Played around a bit. There are things that still need to be done with the following script, but it should be a good head start.
I assume your original "latin" alphabet: "ABGDEZHQIKLMNX"
You need two fields to play with. The first holds a return delimited list of eight (or fewer) letter "latin" words. The second is for the results. Make a button with:
on mouseUp
put "ABGDEZHQIKLMNX" into latinString
put fld 1 into test
repeat with y = 11 to 24 --just not to have to deal with "03" for example
put y into latin[char (y - 10) of latinString]
end repeat
put test into newString
repeat for each char tChar in test
if tChar = return then next repeat
replace tChar with latin[tChar] in newString
end repeat
Sort newString numeric by char 1 to 2 of each & char 3 to 4 of each & char 5 to 6 of each & char 7 to 8 of each
put newString into fld 2
end mouseUp
Let me know where you take this.
Craig Newman
Re: Different alphabetical orders
Posted: Thu Apr 25, 2013 10:38 am
by danielrr
Thanks Craig,
I took the concept and developed a little bit (not necessarily in the right direction). Here's what I did:
sortByAnyOrder is a function that should return the text received as its second parameter, sorted by the alphabetical order received as its first parameter. I implemented it for Greek Beta Code (the most standard way to transliterate polytonic Greek, since you asked). There are a lot of ugly things in the function, but hey, I'm a newbie at LC.
The main problem (probably not really the main one, but still a problem) is that with this approach all the lowercase words appear after all the uppercase words with the same initial (that is, "Laura, lamb, luck", not "lamb, Laura, luck" as they should). This is only a problem with alphabets that distinguish upper and lower case (those derived from the Greek alphabet, including the Latin alphabet), but it is a problem anyway.
All comments on this approach are most welcome:
on mouseUp
put "ABGDEZHQIKLMNCOPRSTUFXYWabgdezhqiklmncoprstufxyw" into greekOrder --Beta Code order (xi is C, chi is X)
--NO! Better:
put "AaBbGgDdEeZzHhQqIiKkLlMmNnCcOoPpRrSsTtUuFfXxYyWw" into greekOrder --slightly better, but all capitalized words still appear before
--the lowercase words with the same initial
put ")/andra moi e)/nnepe, Mou=sa, polu/tropon, o(\s ma/la polla\ pla/gxqh, e)pei\ Troi/hs i(ero\n ptoli/eqron e)/perse:" into test
put sortByAnyOrder(greekOrder,test) into newOrder
put newOrder into TheWorld
end mouseUp
function sortByAnyOrder whichOrder,theWords
replace " " with return in theWords
replace "," with "" in theWords
set the caseSensitive to true
repeat with y = 10 to number of chars of whichOrder + 9 --two-digit codes starting at 10
put "@" & y into latin[char (y - 9) of whichOrder]
end repeat
put theWords into newString
repeat for each char tChar in whichOrder
replace tChar with latin[tChar] in newString
end repeat
repeat with x = 1 to number of lines of newString --there must be a less ugly way to do this
put "," & line x of theWords after line x of newString
end repeat
sort lines of newString ascending text by item 1 of each
repeat with x = 1 to number of lines of newString --again, there must be a less ugly way to strip the first item
delete item 1 of line x of newString
end repeat
set the caseSensitive to false
return newString
end sortByAnyOrder
Re: Different alphabetical orders
Posted: Thu Apr 25, 2013 2:15 pm
by dunbarx
Daniel.
Nothing like playing around endlessly.
Doesn't the case-sensitive ordering of your custom alphabet solve the issue of lower/upper case?
But my numeric pseudo alphabet should sidestep whatever problems arise in that regard, and would encompass diacritical or any other variants as well. By simply encoding strings of characters into strings of numbers, based only on the order of those characters in the alphabet string, I would think that all problems are resolved.
So "a,A,b,B..." would encode as "11,12,13,14..." The encoding hard wires the sorting. Or am I missing something? What is going wrong?
Craig Newman
Re: Different alphabetical orders
Posted: Thu Apr 25, 2013 2:32 pm
by danielrr
dunbarx wrote: Doesn't the case-sensitive ordering of your custom alphabet solve the issue of lower/upper case?
Not quite. My function puts the words with an uppercase initial before all the words with the same lowercase initial, that is, "Laura, Lebanon, Lost, lancet, last, lost", etc. Usually this is not what you are looking for. To sort properly you need to take into account the first and the second character.
dunbarx wrote: But my numeric pseudo alphabet should sidestep whatever problems arise in that regard, and would encompass diacritical or any other variants as well. By simply encoding strings of characters into strings of numbers, based only on the order of those characters in the alphabet string, I would think that all problems are resolved.
But then, how do you recover the original text? Unless I am missing something, your script ends up with a numeric string, but what you need is the reordered list of words.
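The difference between the two orderings discussed here can be reproduced in a few lines of Python. Both key functions are hypothetical names. A single interleaved order such as "aAbB..." still separates words by the case of their initial. A two-level key gives the dictionary-style order Daniel describes: it compares letters case-insensitively first and consults case only to break ties.

```python
import string


def rank_key(order):
    """Single-level key: each char ranked by its position in `order`."""
    rank = {ch: i for i, ch in enumerate(order)}
    return lambda word: [rank.get(ch, len(order)) for ch in word]


def two_level_key(lower_order):
    """Primary key: letters compared case-insensitively by their position in
    `lower_order`. Secondary key (only used on ties): uppercase before
    lowercase at the first position where the case differs."""
    rank = {ch: i for i, ch in enumerate(lower_order)}
    unknown = len(lower_order)

    def key(word):
        primary = [rank.get(ch.lower(), unknown) for ch in word]
        secondary = [0 if ch.isupper() else 1 for ch in word]
        return (primary, secondary)

    return key


words = ["last", "Laura", "lancet", "Lost", "lost", "Lebanon"]
interleaved = "".join(c + c.upper() for c in string.ascii_lowercase)  # aAbB...
print(sorted(words, key=rank_key(interleaved)))
# ['lancet', 'last', 'lost', 'Laura', 'Lebanon', 'Lost']
print(sorted(words, key=two_level_key(string.ascii_lowercase)))
# ['lancet', 'last', 'Laura', 'Lebanon', 'Lost', 'lost']
```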
Re: Different alphabetical orders
Posted: Thu Apr 25, 2013 5:41 pm
by dunbarx
Daniel.
You need to sort all the characters, not just the first and second. The compound sort command, using the concatenated sets of local "each" variables, addresses this. I only went to the fifth char in my script (up to char 10 in the paired digit schema), but this can be extended as required.
Right?
I had alluded several times to the fact that the coded strings had, of course, to be restored to plain text. This was intended to be left to you. Do you see how the characters were encoded? I recently used an array variable for compactness, but earlier, I had written a brute force example. Please step through that, and step through the script that encodes using the array. If you see how they were encoded, you should be able to work out how to restore.
So I still leave this to you. If you are not yet comfortable with arrays, try to write it all out explicitly. If you still have problems, write back. But please do try.
Craig Newman
Re: Different alphabetical orders
Posted: Sat Apr 27, 2013 5:12 pm
by danielrr
Hi Dunbar. Apologies for the delay in answering.
First I'll answer myself by posting a new function that improves my previous one. This one correctly sorts text written in other alphabets, provided that you input the right alphabetical order, and places a word starting with an uppercase initial just before the lowercase form of the same string. Still, it's ugly. Here it is:
on mouseUp
put "abgdezhqiklmncoprstufxyw" into greekOrder --Beta Code order (xi is c, chi is x)
put "ABGDEZHQIKLMNCOPRSTUFXYW" into upperCase
put ")/andra moi e)/nnepe, Mou=sa, polu/tropon, o(\s ma/la polla\ pla/gxqh, e)pei\ Troi/hs i(ero\n ptoli/eqron e)/perse:" into test
put sortByAnyOrder(greekOrder,test,upperCase) into newOrder
put newOrder into TheWorld
end mouseUp
function sortByAnyOrder whichOrder,theWords,upperCase
replace " " with return in theWords
replace "," with "" in theWords
set the caseSensitive to false
repeat with y = 10 to number of chars of whichOrder + 9 --
put "@" & y into latin[char (y - 9) of whichOrder]
end repeat
put theWords into newString
repeat for each char tChar in whichOrder
--if tChar = return then next repeat
replace tChar with latin[tChar] in newString
end repeat
set the caseSensitive to true
repeat with x = 1 to number of lines of newString --there must be a less ugly way to do this
if char 1 of line x of theWords is in upperCase
then
put "@0," & line x of theWords after line x of newString
else
put "@1," & line x of theWords after line x of newString
end if
end repeat
set the caseSensitive to false
Sort lines of newString ascending text by item 1 of each--by char 1 to 2 of each & char 3 to 4 of each & char 5 to 6 of each & char 7 to 8 of each
repeat with x = 1 to number of lines of newString --again, there must be a less ugly way to strip the first item; I'm a newbie in LC
delete item 1 of line x of newString
end repeat
return newString
end sortByAnyOrder
And now I'd like to reply to your questions. First, all my gratitude for all the time you have spent on this newcomer.
I tried your function and it doesn't work; I mean, it doesn't order the strings by the alphabetical order of choice. Try it yourself, and tell me if I did anything wrong.
On the other hand, and this is what intrigues me most, I don't know of a straightforward way to convert the numeric strings back into their alphabetical (original) equivalents. I mean, if I have a string "2112" formed by the 21st character followed by the 12th character, and I start using "replace", my first replacement may wrongly transform the middle two digits into the 11th character. Yes, I can imagine several convoluted ways to do it correctly, but which is the simple way? Could you provide an example with code?
thanks, and best wishes,
Daniel
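Daniel's question about restoring the text has a simple answer when every code has the same width. Read the encoded string in fixed-size chunks instead of running replace over it, so "2112" can only be split as "21" and "12". The Python sketch below assumes Craig's numbering, where the first letter is 11, and assumes that every character of each word was encoded. The helper names are hypothetical.

```python
def encode(word, order, offset=11):
    """Two-digit code per char: first char of `order` -> `offset`."""
    return "".join(f"{order.index(ch) + offset:02d}" for ch in word)


def decode(codes, order, offset=11):
    """Read the code string two digits at a time (never with replace), so
    "2112" can only split as "21" + "12"."""
    return "".join(order[int(codes[i:i + 2]) - offset]
                   for i in range(0, len(codes), 2))


order = "ABGDEZHQIKLMNX"
print(decode("2112", order))                # LB
print(decode(encode("ZHQ", order), order))  # ZHQ
```

Alternatively, as Daniel's own function already does with its items, carrying the original word alongside its code avoids decoding altogether.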