Normalizing text
Hello,
Is there a way to convert text with accented characters into "normalized" text?
For example: transform "Béla Bartók" into "Bela Bartok".
I cannot get the normalizeText function (https://livecode.fandom.com/wiki/NormalizeText) to work, but I may not be using it correctly.
Re: Normalizing text
Hi,
I tried the normalizeText function for the first time, but I don't understand it yet.
I use this for a similar goal:
Code:
function fromAccentToNot pText
   -- each char of tAccent is replaced by the char at the same position in tNoAccent
   put "àâäçéèêëîïòôùû" into tAccent
   put "aaaceeeeiioouu" into tNoAccent
   put 0 into tCount
   repeat for each char tChar in tAccent
      add 1 to tCount
      replace tChar with char tCount of tNoAccent in pText
   end repeat
   return pText
end fromAccentToNot
Best regards
Jean-Marc
Last edited by jmburnod on Sun Aug 18, 2019 9:52 am, edited 1 time in total.
https://alternatic.ch
Re: Normalizing text
Normalize isn't the right term. It only works with a few characters that are visually identical but have different Unicode values. It isn't meant to remove diacritics; those are actually part of the character itself.
I think Jean-Marc's method is all you can do.
Jacqueline Landman Gay         |     jacque at hyperactivesw dot com
HyperActive Software | http://www.hyperactivesw.com
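A minimal sketch of a technique not used in this thread, assuming LiveCode 7 or later (where normalizeText, codepointToNum and the codepoint chunk exist): normalization keeps the accents, as said above, but the "NFD" form splits many precomposed letters such as "é" into a base letter plus a separate combining mark. Those marks, the Combining Diacritical Marks block U+0300 to U+036F, can then be skipped. The function name stripDiacritics is hypothetical, not a built-in.

```livecode
-- Hypothetical helper: drops combining accents after canonical decomposition.
-- Assumes LiveCode 7+ (normalizeText, codepointToNum, codepoint chunks).
function stripDiacritics pText
   local tDecomposed, tResult, tCodepoint, tNum
   -- "NFD" turns "é" into "e" followed by U+0301 COMBINING ACUTE ACCENT
   put normalizeText(pText, "NFD") into tDecomposed
   repeat for each codepoint tCodepoint in tDecomposed
      put codepointToNum(tCodepoint) into tNum
      -- skip the Combining Diacritical Marks block, U+0300 (768) to U+036F (879)
      if tNum >= 768 and tNum <= 879 then next repeat
      put tCodepoint after tResult
   end repeat
   -- recompose whatever remains into precomposed form
   return normalizeText(tResult, "NFC")
end stripDiacritics
```

For example, stripDiacritics("Béla Bartók") should return "Bela Bartok". Letters with no canonical decomposition, such as "ø", "đ", "ß", "æ" or "œ", pass through unchanged, so an explicit mapping is still needed for those.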
Re: Normalizing text
@Jean-Marc
Two considerations.
1. You may want to set the caseSensitive property to true, so that uppercase and lowercase letters are not treated as the same character.
2. Sometimes 'accented' chars are replaced by two chars, for example "Ä" with "Ae" or "ß" with "sz".
Here is a mapping that I once used (stored in fld "Mapping" in the code below), followed by the "Replace" button script, which is similar to yours:
Code:
Á to A
Á to A
Ä to Ae
Â to A
À to A
Ã to A
Å to A
Č to C
Ç to C
Ć to C
Ď to D
É to E
Ě to E
Ë to E
È to E
Ê to E
Ẽ to E
Ĕ to E
Ȇ to E
Í to I
Ì to I
Î to I
Ï to I
Ň to N
Ñ to N
Ó to O
Ö to Oe
Ò to O
Ô to O
Õ to O
Ø to O
Ř to R
Ŕ to R
Š to S
Ť to T
Ú to U
Ů to U
Ü to Ue
Ù to U
Û to U
Ý to Y
Ÿ to Y
Ž to Z
á to a
ä to ae
â to a
à to a
ã to a
å to a
č to c
ç to c
ć to c
ď to d
é to e
ě to e
ë to e
è to e
ê to e
ẽ to e
ĕ to e
ȇ to e
í to i
ì to i
î to i
ï to i
ň to n
ñ to n
ó to o
ö to oe
ò to o
ô to o
õ to o
ø to o
ð to o
ř to r
ŕ to r
š to s
ť to t
ú to u
ů to u
ü to ue
ù to u
û to u
ý to y
ÿ to y
ž to z
þ to b
Þ to B
Đ to D
đ to d
ß to sz
Æ to AE
Œ to OE
æ to ae
œ to oe
Code:
on mouseUp
  lock screen; lock messages
  put fld "TextIn" into s
  put fld "Mapping" into m
  -- turn each "X to Y" line into "X,Y" so item 1 and item 2 can be read
  replace " to " with comma in m
  -- keep "Ä" and "ä" distinct during the replacements
  set the casesensitive to true
  repeat for each line L in m
    replace (item 1 of L) with (item 2 of L) in s
  end repeat
  put s into fld "TextOut"
end mouseUp
shiftLock happens
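Both scripts above make one pass over the whole text per mapping entry. A minimal sketch of a single-pass variant, assuming LiveCode 7+ and the same "X to Y" mapping format: the mapping is read once into an array, then each character of the input is looked up. The function name applyCharMapping is hypothetical, the sketch assumes that array key lookups honour the caseSensitive property the way comparisons do, and it has not been benchmarked against the scripts in this thread.

```livecode
-- Hypothetical helper: applies "X to Y" lines (e.g. "Ä to Ae") in one pass.
function applyCharMapping pText, pMapping
   local tMap, tLine, tPos, tChar, tResult
   -- assumption: with caseSensitive true, tMap["Ä"] and tMap["ä"] are separate keys
   set the caseSensitive to true
   repeat for each line tLine in pMapping
      put offset(" to ", tLine) into tPos
      if tPos > 1 then
         -- key = text before " to ", value = text after it (may be several chars)
         put char (tPos + 4) to -1 of tLine into tMap[char 1 to (tPos - 1) of tLine]
      end if
   end repeat
   repeat for each char tChar in pText
      -- assumes every mapped value is non-empty, so empty means "no mapping"
      if tMap[tChar] is empty then
         put tChar after tResult
      else
         put tMap[tChar] after tResult
      end if
   end repeat
   return tResult
end applyCharMapping
```

With the fields used above, a button could call it as: put applyCharMapping(fld "TextIn", fld "Mapping") into fld "TextOut".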
Re: Normalizing text
Thank you for your answers. I wish there were a simpler way of doing this, but the above code will certainly do the job.
Re: Normalizing text
Possibly not "simpler", but for long input strings and replacement mappings it is 20-30 times faster using regular expressions (btn "ReplaceText"). The script comes first, then the replacement mapping it reads (fld "Mapping2"):
Code:
on mouseUp
  put the millisecs into m1 -- start timing
  lock screen; lock messages
  put fld "TextIn" into s
  put length(s) into n
  put fld "Mapping2" into m2
  replace " to " with comma in m2
  -- item 1 is a regex alternation such as "Č|Ç|Ć", item 2 its replacement
  repeat for each line L in m2
    put replaceText(s,item 1 of L,item 2 of L) into s
  end repeat
  put s into fld "TextOut2"
  -- report input length, elapsed ms, and whether the result matches the "Replace" button's output
  put n & ": " & (the millisecs - m1) && (s is fld "TextOut") into fld "timing"
end mouseUp
Code:
Á|Á|Â|À|Ã|Å to A
Þ to B
Č|Ç|Ć to C
Ď|Đ to D
É|Ě|Ë|È|Ê|Ẽ|Ĕ|Ȇ to E
Í|Ì|Î|Ï to I
Ň|Ñ to N
Ó|Ò|Ô|Õ|Ø to O
Ř|Ŕ to R
Š to S
Ť to T
Ú|Ů|Ù|Û to U
Ý|Ÿ to Y
Ž to Z
á|â|à|ã|å to a
þ to b
č|ç|ć to c
ď|đ to d
é|ě|ë|è|ê|ẽ|ĕ|ȇ to e
í|ì|î|ï to i
ň|ñ to n
ó|ò|ô|õ|ø|ð to o
ř|ŕ to r
š to s
ť to t
ú|ů|ù|û to u
ý|ÿ to y
ž to z
Æ to AE
Ä to Ae
Œ to OE
Ö to Oe
Ü to Ue
ä|æ to ae
ö|œ to oe
ß to sz
ü to ue
shiftLock happens
Re: Normalizing text
Hi,
@Hermann
Yes, you're right about the two considerations.
Thank you for doing the job.
Jean-Marc
richmond62
Re: Normalizing text
That's odd: "ß" with "sz"?
Straße = Strasse

