Regex

Traxgeek
VIP Livecode Opensource Backer
Posts: 281
Joined: Wed Jan 09, 2013 10:11 am

Regex

Post by Traxgeek » Sun Nov 30, 2014 4:45 pm

Hi,

I'm trying to use regex... and failing miserably. I have read a load of posts here (the biggest contributor was Thierry) and on Google (mostly StackOverflow), and read the Wikipedia 'how to' article, but understanding regex well enough to create a simple 'search-for-and-remove-some-text' script still eludes me !

I thought that if someone would be good enough to provide some help with my precise requirement I might be able to 'reverse engineer' it and apply the various instructions I've bookmarked to see precisely how the components of the specific regex statement work in my particular case.

An example of my (HTML) text :
<p></p>
<ul type="square">
<li>
<p firstindent="-36" leftindent="36" rightindent="15"><b><font face="Arial" size="12"
color="#262626" bgcolor="#FFFFFF">Free App of the Day (FAD) eligibility</font></b><font face="Arial" size="12" color="#262626" bgcolor="#FFFFFF">&nbsp;If the checkbox labeled&nbsp;<b>Yes, please consider this app for the program</b>&nbsp;is checked (this is the default), Amazon may select your app for this promotional program. If Amazon selects your app for FAD, Amazon will contact you with more details about what to expect as your app goes through the testing and approval process.</font></p>
</li>
</ul><p></p>


An example of what I'm trying to do :
(If I can work out how the regex statement works for this example then I should be able to expand it to do other things - well, that's my idea... :)
Isolate and remove all instances of, say, the color attribute, whatever colour value sits between the quotes, meaning that :
<p firstindent="-36" leftindent="36" rightindent="15"><b><font face="Arial" size="12" color="#262626" bgcolor="#FFFFFF">
becomes :
<p firstindent="-36" leftindent="36" rightindent="15"><b><font face="Arial" size="12" bgcolor="#FFFFFF">

I figure I can then start to work out the rest...

Can anyone enlighten me please ?
Thanks a million.

Trax.
I'm 'getting there'... just far too slowly !
Mac (Sierra) and PC (Win7)
LiveCode 8.1.2 / 7.1.1

Thierry
VIP Livecode Opensource Backer
Posts: 875
Joined: Wed Nov 22, 2006 3:42 pm

Re: Regex

Post by Thierry » Sun Nov 30, 2014 7:23 pm

Traxgeek wrote: I'm trying to use regex...
An example of what I'm trying to do :
Isolate and remove all instances of, say, the color attribute, whatever colour value sits between the quotes, meaning that :
<p firstindent="-36" leftindent="36" rightindent="15"><b><font face="Arial" size="12" color="#262626" bgcolor="#FFFFFF">
becomes :
<p firstindent="-36" leftindent="36" rightindent="15"><b><font face="Arial" size="12" bgcolor="#FFFFFF">

Hi Traxgeek,

Here is one for a start:


   put replaceText( yourHtmlText, "\scolor=.#[A-F0-9]{6}.", empty) into whatever
PS: I'm using the dot to match the quote; I am a lazy man :roll:
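
For readers taking the pattern apart, here is a minimal sketch that annotates each piece and applies it to the font tag from the question. The function name stripColorAttr and the mouseUp wrapper are hypothetical, added only for illustration.

```livecode
-- Hypothetical helper; the name stripColorAttr is not from the thread.
function stripColorAttr pHtml
   -- \s           one whitespace character before the attribute name
   -- color=       the literal attribute name and equals sign
   -- .            any single character (here, the opening double quote)
   -- #            a literal hash sign
   -- [A-F0-9]{6}  exactly six characters, each 0-9 or uppercase A-F
   -- .            any single character (here, the closing double quote)
   return replaceText(pHtml, "\scolor=.#[A-F0-9]{6}.", empty)
end stripColorAttr

on mouseUp
   local tLine
   -- Rebuild the <font ...> tag from the question, using the quote constant
   put "<font face=" & quote & "Arial" & quote && "size=" & quote & "12" & quote && \
         "color=" & quote & "#262626" & quote && "bgcolor=" & quote & "#FFFFFF" & quote & ">" into tLine
   put stripColorAttr(tLine)
end mouseUp
```

Run from a button, this should put <font face="Arial" size="12" bgcolor="#FFFFFF"> into the message box. The bgcolor attribute survives because the character before its "color=" is a "g", not whitespace, so \s cannot match there.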

HTH,
Thierry
SUNNY-TDZ.COM has not belonged to me since 2021.
To contact me, use private messages. Merci.


Re: Regex

Post by Traxgeek » Sun Nov 30, 2014 11:20 pm

Hi Thierry,

Really, really appreciated.
AND I understand pretty much how it works (I think :D ).
I don't understand exactly what "I'm using the dot to match the quote; I am a lazy man" means but it's my homework !! :D
Not in my office right now (so can't try my theory) but I think by modifying your script to :
put replaceText( yourHtmlText, "\scolor|\sbgcolor=.#[A-F0-9]{6}.", empty) into whatever
should then remove BOTH the color AND the bgcolor tags ?
Anyways, I'm looking forward to trying it all out tomorrow.

Again, fantastic. Really appreciated. Thanks.

Trax.

rkriesel
VIP Livecode Opensource Backer
Posts: 119
Joined: Thu Apr 13, 2006 6:25 pm

Re: Regex

Post by rkriesel » Mon Dec 01, 2014 6:00 am

Hi, Trax.

Here's another way. This one avoids alternation and ranges, so I'd guess it'd be faster.


put replaceText(t1, "\s(?:bg)?color=\" & quote & "#[[:xdigit:]]{6}\" & quote, empty) into t2
The above (\" & quote & ") is more rigorous than the lazy (.) but probably equivalent in your data.
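
To see what the concatenated pieces add up to, a short sketch can build the pattern in a variable first and print it before using it. The variable names and the mouseUp wrapper are assumptions for illustration. One difference worth knowing: [[:xdigit:]] matches 0-9, A-F and a-f, so unlike [A-F0-9] it also accepts lowercase hex digits.

```livecode
on mouseUp
   local tSample, tPattern
   -- The <font ...> tag from the question, rebuilt with the quote constant
   put "<font face=" & quote & "Arial" & quote && "size=" & quote & "12" & quote && \
         "color=" & quote & "#262626" & quote && "bgcolor=" & quote & "#FFFFFF" & quote & ">" into tSample
   -- A backslash followed by the quote constant gives the regex an escaped literal quote
   put "\s(?:bg)?color=\" & quote & "#[[:xdigit:]]{6}\" & quote into tPattern
   -- Line 1: the assembled pattern; line 2: the tag with both attributes removed
   put tPattern & return & replaceText(tSample, tPattern, empty)
end mouseUp
```

The first line in the message box should read \s(?:bg)?color=\"#[[:xdigit:]]{6}\" and the second <font face="Arial" size="12">.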

-- Dick


Re: Regex

Post by Traxgeek » Mon Dec 01, 2014 8:42 am

Hi Dick,

Thanks to you too !
I'm one happy bunny :D and off to 'play'... (and do my 'homework' - trying to figure out why these two scripts work...) Happy days...
Much appreciated, chaps.

Trax.


Re: Regex

Post by Thierry » Mon Dec 01, 2014 8:58 am

Traxgeek wrote: Hi Thierry,
Really, really appreciated.
AND I understand pretty much how it works (I think :D ).

Glad that you understand it :)

Traxgeek wrote: I don't understand exactly what "I'm using the dot to match the quote; I am a lazy man" means but it's my homework !! :D

I guess this needs a bit of clarification.
The first thought would be to put a quote instead of the dot, as that is exactly what you expect in your data.
But, as this is LiveCode, you can't just write a quote inside a string; you have to type this instead of the dot: " & quote & "
And because it was Sunday evening, I was too lazy to type all that.
So the dot itself is not lazy, and in fact it does the job pretty well,
and even a little faster than a quote (no test inside the regex engine).
But it is ridiculous to even think about this; most of the time you won't see any difference in terms of speed.

Traxgeek wrote: put replaceText( yourHtmlText, "\scolor|\sbgcolor=.#[A-F0-9]{6}.", empty) into whatever
should then remove BOTH the color AND the bgcolor tags ?

Almost. The | splits the whole pattern in two, so the alternatives need a group:


put replaceText( yourHtmlText, "\s(?:color|bgcolor)=.#[A-F0-9]{6}.", empty) into whatever
or


put replaceText( yourHtmlText, "\s(?:bg)?color=.#[A-F0-9]{6}.", empty) into whatever
There is also a POSIX way of writing [A-Fa-f0-9]: [[:xdigit:]]
This is only syntactic sugar, as [[:xdigit:]] will be translated to [A-Fa-f0-9].
And you won't gain a nanosecond, even with gigabytes of data; so it's a matter of taste, nothing else.

About performance: writing (?:bg)?color or (?:color|bgcolor) won't make much difference in most situations!
The exception is when you know your data; then you can optimize your regex for it.
But in your case, I don't think you need to worry about that. If you really want to understand why,
just ask and I'll explain in more detail...
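
As a side note on why the ungrouped version falls short: alternation has the lowest precedence in a regex, so "\scolor|\sbgcolor=.#[A-F0-9]{6}." means either "\scolor" alone or "\sbgcolor=" followed by the value. A minimal sketch comparing the two, with variable names and the mouseUp wrapper assumed for illustration:

```livecode
on mouseUp
   local tSample, tUngrouped, tGrouped
   -- A shortened tag carrying both attributes (sample made up for illustration)
   put "<font color=" & quote & "#262626" & quote & " bgcolor=" & quote & "#FFFFFF" & quote & ">" into tSample
   -- Ungrouped: the two alternatives are "\scolor" and "\sbgcolor=.#[A-F0-9]{6}."
   put replaceText(tSample, "\scolor|\sbgcolor=.#[A-F0-9]{6}.", empty) into tUngrouped
   -- Grouped: only the attribute name alternates; the value part applies to both names
   put replaceText(tSample, "\s(?:color|bgcolor)=.#[A-F0-9]{6}.", empty) into tGrouped
   put tUngrouped & return & tGrouped
end mouseUp
```

The ungrouped pattern should leave <font="#262626">, because for the color attribute it removes only the leading space and the name. The grouped pattern should leave a clean <font>.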

Happy regexing :)

Thierry


Re: Regex

Post by Traxgeek » Wed Dec 03, 2014 8:09 am

Hi Thierry,

Amazing. I've now spent quite a few hours on regex since your help and it's a powerful (if a little confusing to read) topic. I've been practising removing multiple specific tags using the ' | ' (or) symbol and playing with the POSIX A-F method. A lot to learn (and retain).

Much obliged - especially for the explanations after your initial response. Really useful.

Trax


Re: Regex

Post by Thierry » Wed Dec 03, 2014 9:23 am

Traxgeek wrote: Hi Thierry,

Amazing. I've now spent quite a few hours on regex since your help and it's a powerful (if a little confusing to read) topic. I've been practising removing multiple specific tags using the ' | ' (or) symbol and playing with the POSIX A-F method. A lot to learn (and retain).

Much obliged - especially for the explanations after your initial response. Really useful.

Hi Trax,

Thanks for the positive feedback.

Yes, the syntax is a bit terse when starting,
but there is not that much to learn; and obviously you are doing well :)

Good luck,

Thierry
