
Processing name, address, phone & URLs - text (pattern) manipulation libraries for LiveCode?

Posted: Sat Jul 24, 2021 3:18 am
by rodneyt
Hi everyone,

I'm wondering if knowledgeable people here are aware of any good text processing libraries for LiveCode.

The first problem I'm interested in is processing address information: a user may have copied name, address and URL information, and I want to break it up (as reliably as possible) into its constituent elements.
Examples include recognising cities, countries and phone numbers, and rules for breaking up name strings. I'd also be interested in the ability to identify and extract URIs from text (which might be HTML or plain text): website URLs, email addresses, Twitter handles, that sort of thing.

A lot of this is string pattern matching, and I can think of lots of ways of doing it, but it occurs to me that it's a pretty standard problem, so there is likely an existing solution.

Perhaps there is a more general text processing library that allows one to specify a set of rules and actions (e.g. applying a set of rules and building up the results in a property array).
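In plain LiveCode script, a rule set like that could be as simple as an array of regular expressions applied in a loop. The following is a minimal sketch, assuming PCRE patterns (as used by matchText) whose first capture group holds the value to keep; the handler name `extractFields`, the rule names and the patterns are all hypothetical and deliberately simple, and only the first match of each rule is kept.

```livecode
-- Hypothetical sketch: apply a set of regex rules to a block of text and
-- collect the first match of each rule into an array keyed by rule name.
function extractFields pText
   local tRules, tResult, tFound
   -- Each rule: key = field name, value = PCRE pattern whose first
   -- capture group is the value to keep. Patterns are illustrative only.
   put "([A-Za-z0-9_.+-]+@[A-Za-z0-9-]+\.[A-Za-z0-9-.]+)" into tRules["email"]
   put "(https?://[^\s<>]+)" into tRules["url"]
   put "(?:^|\s)(@\w{1,15})\b" into tRules["twitter"]
   put "(\+?\d[\d ()-]{6,}\d)" into tRules["phone"]

   repeat for each key tField in tRules
      -- matchText fills tFound with the text of capture group 1
      if matchText(pText, tRules[tField], tFound) then
         put tFound into tResult[tField]
      end if
   end repeat
   return tResult
end extractFields
```

Adding a new kind of field is then just another `put ... into tRules[...]` line; rules that find nothing simply leave no key in the result.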

I can think of ways of doing all of this, but before I start rolling my own solution I thought it worth checking.

~ Rodney

Posted: Thu Aug 12, 2021 9:54 am
by MaxV
Hello,
this code extracts all email addresses from a field:

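on mouseUp
   -- testo = text, trovato = found, inizio/fine = start/end of the match
   local testo, trovato, inizio, fine, listaEmail
   put field 1 into testo
   repeat forever
      -- the outer group captures the whole address
      if matchText(testo, "((\w|\.)+@(\w|\.)+)", trovato) then
         put trovato & return after listaEmail
         -- find where that same match ends...
         get matchChunk(testo, "((\w|\.)+@(\w|\.)+)", inizio, fine)
         -- ...and keep searching in the text after it
         put char (fine + 1) to -1 of testo into testo
      else
         exit repeat
      end if
   end repeat
   put listaEmail
end mouseUp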

Posted: Thu Aug 12, 2021 10:37 am
by richmond62
I don't think you need any libraries, after all:

1. email addresses always have an ampersand (@) in them.

2. URLs always have "www." in them.

3. Telephone numbers usually contain multiple digit numbers.

4. Addresses almost always contain "street"/"avenue"/"boulevard"/"square"/"plaza"/"place" or their abbreviations.

So... if you have, say, comma-delimited text strings containing these things in random order, running each line through a SWITCH statement and then reordering those items in a list field should not be unduly difficult.
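The SWITCH approach described above might look like the following minimal sketch. `classifyItem` and `classifyItems` are hypothetical names, the keyword list is an assumption, and the sketch inherits the limits of the heuristics themselves.

```livecode
-- Hypothetical sketch: label one item using simple keyword heuristics.
function classifyItem pItem
   switch
      case pItem contains "@"
         return "email"
      case pItem contains "www."
         return "url"
      -- checked before phone so house numbers are not taken for phones
      case matchText(pItem, "(?i)\b(street|st|avenue|ave|boulevard|blvd|square|sq|plaza|place|pl)\b")
         return "address"
      -- three or more consecutive digits
      case matchText(pItem, "\d{3,}")
         return "phone"
      default
         return "other"
   end switch
end classifyItem

-- Hypothetical sketch: classify each comma-delimited item of a line,
-- returning one "label<tab>item" line per item.
function classifyItems pLine
   local tResult, tClean
   set the itemDelimiter to comma
   repeat for each item tItem in pLine
      -- "word 1 to -1" trims leading and trailing white space
      put word 1 to -1 of tItem into tClean
      put classifyItem(tClean) & tab & tClean & return after tResult
   end repeat
   delete the last char of tResult
   return tResult
end classifyItems
```

The order of the cases matters: the address test runs before the digit test so that an item such as "221 Baker Street" is labelled as an address rather than a phone number. Sorting the resulting lines by their first word would then group the items by type.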

Posted: Thu Aug 12, 2021 2:06 pm
by stam
richmond62 wrote:
Thu Aug 12, 2021 10:37 am
I don't think you need any libraries, after all:

1. email addresses always have an ampersand (@) in them.

2. URLs always have "www." in them.

3. Telephone numbers usually contain multiple digit numbers.

4. Addresses almost always contain "street"/"avenue"/"boulevard"/"square"/"plaza"/"place"
or their abbreviations.

So . . . if you have, say, comma-delimited text strings containing these things in random order
running each line through a SWITCH statement and then reordering those items in a list field should not be
unduly difficult.

erm... ampersand = "&", not "@" ;)

Re: emails - all emails contain the '@' but not all '@' signify an email.
You probably want not only to detect the "@" but also to check the validity of the email format (for example, stam@gmail is not a valid email address; and people sometimes address someone with an @ handle, for example @Richmond, which is not a valid email either ;)). I've 'borrowed' the algorithm generously provided with the liveCloud starter solutions, which works well:

function isValidEmailFormat pEmail
    # PURPOSE : returns a boolean describing the validity of the email provided
    return matchText(pEmail,"^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$")
end isValidEmailFormat
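Building on that, one way to act on the point that not every "@" marks an email is to pre-filter words on "@" and then check each candidate with `isValidEmailFormat`. A minimal sketch, assuming that function is in the same script; `validEmailsIn` is a hypothetical name.

```livecode
-- Hypothetical sketch: return the words of pText that look like valid
-- email addresses, one per line. Assumes isValidEmailFormat (above) is
-- in the same script.
function validEmailsIn pText
   local tEmails
   repeat for each word tWord in pText
      -- cheap "@" test first, then the full format check
      if tWord contains "@" and isValidEmailFormat(tWord) then
         put tWord & return after tEmails
      end if
   end repeat
   delete the last char of tEmails
   return tEmails
end validEmailsIn
```

Because words are split on white space only, an address followed by punctuation (e.g. "jo@example.com,") will be rejected unless that punctuation is stripped first.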
URLs do not always have 'www' in them. The venerable mothership's URL is livecode.com.
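Since URLs may lack both "www" and a scheme, one option is to collect every match of a pattern that accepts either. A minimal sketch: `allMatches` and `extractURLs` are hypothetical helpers that walk the text with matchChunk, much like the email loop posted earlier in the thread, and the short list of top-level domains (com, org, net) is an assumption for illustration, not a real URL grammar.

```livecode
-- Hypothetical helper: return every non-overlapping match of pRegex in
-- pText, one per line. The whole pattern must be wrapped in capture
-- group 1, because matchChunk reports that group's start/end chars.
function allMatches pText, pRegex
   local tStart, tEnd, tFound
   repeat while matchChunk(pText, pRegex, tStart, tEnd)
      put char tStart to tEnd of pText & return after tFound
      -- drop everything up to the end of this match and search again
      delete char 1 to tEnd of pText
   end repeat
   delete the last char of tFound
   return tFound
end allMatches

-- Illustrative pattern: optional http(s) scheme, one or more dotted
-- labels, one of a few assumed top-level domains, then an optional path.
function extractURLs pText
   return allMatches(pText, "((?:https?://)?(?:[A-Za-z0-9-]+\.)+(?:com|org|net)\b(?:/[^\s<>]*)?)")
end extractURLs
```

This pattern will also pick the domain out of an email address such as jo@example.com, so emails are best extracted (or removed) first.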

And addresses can vary significantly and may not include any of the keywords you mention, so that's not reliable either (for example, my street address consists only of the name of a hill in London with no other qualifiers).
Not so straightforward once you dig into the detail...

And besides, I think the OP was asking whether there was a ready-made solution or whether he should roll his own...

Posted: Thu Aug 12, 2021 3:10 pm
by richmond62
stam wrote:
erm... ampersand = "&", not "@"

Erm, yes: the 'at' sign; at least in Bulgarian it has a name: кломба.

Posted: Thu Aug 12, 2021 4:25 pm
by stam
That's all Hungarian to me...

There's a word for it in Greek as well: Παπάκι, which means duckling. Don't ask me why it's the name of the 'at' sign...