matchText issues

battleman100
Posts: 3
Joined: Wed Oct 29, 2014 4:01 pm

matchText issues

Post by battleman100 » Wed Oct 29, 2014 4:16 pm

Hi all

Bit of a LiveCode newbie; wondering if anyone could tell me what I've done wrong with my code.

I have two text fields, one called 'Input' and one called 'Output'. I'm using a regular expression to find HTML tags inside 'Input' and then, when a button is clicked, list the tags used in the document in 'Output'. Here's my code so far:

Code:

on mouseDown
   put field "Input" into MyVar
   if matchText(MyVar,"</?\w+((\s+\w+(\s*=\s*(?:.*?|'.*?'|[^'>\s]+))?)+\s*|\s*)/?>", varTags) then
      put varTags into field "Output"
   end if
end mouseDown

However, this produces no output! I thought the regular expression might be at fault, but the match definitely evaluates to true when I test it; I've also double-checked the pattern with the replaceText function (again without any issues)!

MaxV
Posts: 1580
Joined: Tue May 28, 2013 2:20 pm
Contact:

Re: matchText issues

Post by MaxV » Wed Oct 29, 2014 5:43 pm

First of all, the regular expression seems wrong to me: there are "/" characters that are unescaped. What do you want to check?
Try this site: http://www.regexr.com/

Thierry
VIP Livecode Opensource Backer
Posts: 875
Joined: Wed Nov 22, 2006 3:42 pm

Re: matchText issues

Post by Thierry » Wed Oct 29, 2014 5:57 pm

Hello Sir battleman,

First, matchText() can return true even if none of your capture groups has captured anything!
Without looking too closely, I guess your regex will match "<div>", leaving all capture groups empty.

Therefore, I humbly suggest that you enclose your whole regex in parentheses, at least for testing.

Second, as you wrote varTags with an s: are you aware that you need one variable per capture group?
In your case, you need 3 variables. Please check the dictionary.

Here is something untested which should give you more verbose information:
- just add outer () to your regex.

Code:

   put "(</?\w+((\s+\w+(\s*=\s*(?:.*?|'.*?'|[^'>\s]+))?)+\s*|\s*)/?>)" into Rx
   if matchText(MyVar, Rx, v1, v2, v3, v4) then
      put format("Found: v1: %s, v2: %s, v3: %s, v4: %s", v1, v2, v3, v4) into field "Output"
   else
      put "No match!" into field "Output"
   end if

HTH,

Thierry
!
SUNNY-TDZ.COM doesn't belong to me since 2021.
To contact me, use the Private messages. Merci.
!

battleman100
Posts: 3
Joined: Wed Oct 29, 2014 4:01 pm

Re: matchText issues

Post by battleman100 » Wed Oct 29, 2014 7:12 pm

Many thanks to you all!

Yes, my regex can only be explained by a copy/paste disaster :lol: silly me - no wonder it didn't work!

I have reformulated it as below, with three variables set to capture all three groups (the 2nd group is the one I export to my Output field):

Code:

on mouseDown

   put field "Input" into MyVar
   
   if matchText(MyVar,"(<)([ -0-9a-zA-Z:]*[ 0-9a-zA-Z;]*)(>)", OpenTag, varWords, CloseTag) then 
      put varWords into field "Output"
   end if
end mouseDown

The question now is, how do I iterate the matchText command for all items held in the MyVar variable, e.g. taking the following input:

Code:

<html><body>hello</body></html>

and then exporting:

Code:

html, body, /body, /html

Again, I'm sure there's an obvious answer I'm oblivious to! Thanks again, all!

Thierry
VIP Livecode Opensource Backer
Posts: 875
Joined: Wed Nov 22, 2006 3:42 pm

Re: matchText issues

Post by Thierry » Wed Oct 29, 2014 7:27 pm

battleman100 wrote: The question now is, how do I iterate the matchText command for all items held in the MyVar variable

For this, it's easier to use the matchChunk() function.

Write a loop of the form: repeat while matchChunk() ...

and at the end of each pass through the loop, delete char 1 to positionOfYourVarMatching of MyVar

Check some of my earlier answers in this forum; I'm pretty sure I've already posted some code
that does just that...

Regards,

Thierry

battleman100
Posts: 3
Joined: Wed Oct 29, 2014 4:01 pm

Re: matchText issues

Post by battleman100 » Wed Oct 29, 2014 8:18 pm

That's absolutely fantastic, Thierry! A big, big thank you - I bow down to your supreme regex/livecode knowledge! :D

For reference, this is the approach I took, modifying Thierry's code from here:

Code:

on mouseDown
   put field "Input" into MyVar
   repeat while matchChunk(MyVar, "(<[ -0-9a-zA-Z:]*[ 0-9a-zA-Z;]*>)", p1start,p1End)
      put char p1Start+1 to p1End-1 of MyVar & cr after field "Output"
      delete char 1 to p1End of MyVar
   end repeat
end mouseDown

Essentially, I made my regular expression match the whole HTML tag (e.g. <div>, <html>, etc.) and used matchChunk to get each tag's start and end positions. Each tag it finds is then put into the Output field, starting one character later and ending one character earlier to strip the angle brackets, followed by a carriage return.
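
The same loop can also be wrapped in a function that builds the comma-separated list asked for earlier in the thread. This is only an untested sketch under a few assumptions: the function name listTags and the variable names are hypothetical, and the simplified pattern "<([^<>]+)>" (anything between a pair of angle brackets) is swapped in for the character classes used above.

```livecode
-- Hypothetical helper: returns the tags found in pText as "a, b, c".
function listTags pText
   local tStart, tEnd, tTags
   -- The parentheses exclude the angle brackets, so tStart and tEnd
   -- delimit only the text between "<" and ">".
   repeat while matchChunk(pText, "<([^<>]+)>", tStart, tEnd)
      put char tStart to tEnd of pText & ", " after tTags
      -- Drop everything up to the end of this capture before searching again.
      delete char 1 to tEnd of pText
   end repeat
   -- Remove the trailing ", " separator, if anything was found.
   if tTags is not empty then delete char -2 to -1 of tTags
   return tTags
end listTags
```

It could be called from the button with: put listTags(field "Input") into field "Output". Because the field is overwritten rather than appended to, repeated clicks would not pile up duplicate output.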

Thierry
VIP Livecode Opensource Backer
Posts: 875
Joined: Wed Nov 22, 2006 3:42 pm

Re: matchText issues

Post by Thierry » Wed Oct 29, 2014 8:42 pm

battleman100 wrote: That's absolutely fantastic, Thierry! A big, big thank you

You're welcome :)

Code:

on mouseDown
   put field "Input" into MyVar
   repeat while matchChunk(MyVar, "(<[ -0-9a-zA-Z:]*[ 0-9a-zA-Z;]*>)", p1start,p1End)
      put char p1Start+1 to p1End-1 of MyVar & cr after field "Output"
      delete char 1 to p1End of MyVar
   end repeat
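end mouseDown

This will work too (the parentheses now capture only the text between the angle brackets, so the +1/-1 offsets are no longer needed):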

Code:

   repeat while matchChunk(MyVar, "<([ -0-9a-zA-Z:]*[ 0-9a-zA-Z;]*)>", p1start,p1End)
      put char p1Start to p1End of MyVar & cr after field "Output"
      delete char 1 to p1End of MyVar
   end repeat

Regards,

Thierry