Find text between two different delimiters/tags using Regex
Posted: Sat Apr 03, 2021 9:39 am
by golife
I frequently extract text between two different tags or delimiters of varying length.
I would like to use a regex function for this, but although I tried various published patterns, I could not get any of them to work. Maybe someone has an idea how to write the function below as a regex function?
My LiveCode function looks like this:
Usage:
... Extracting a number enclosed in parentheses "(" and ")"
... Extracting text between a start tag, for example "---", and an ending tag "---/"
... Defining one's own tags to extract text between such tags
... Extracting tags from HTML or any other tagged source code
Note:
This function only returns the FIRST instance in a string. To go through a whole document, the function needs to be extended.
Code:
// Example 1: Using "(" as the beginning tag and ")" as the ending tag
put "James hit the ball in the (23)rd soccer game in Milano." into tText
put filterTags(tText, "(,)")
Code:
// Example 2: Using "---" as the beginning tag and "---/" as the ending tag
put "James played the ball at the ---23rd---/ soccer game." into tText
put filterTags(tText, "---,---/")
Code:
function filterTags pString, pDels
   ## Extract text that is between two different delimiters/tags (first match only)
   if pString = "?" then
      return "filterTags ( pString, pDels ). " & \
            "Param 'pDels': Two different delimiters ('tags','separators') " & \
            "as comma separated items."
   end if
   local a, b, pDel1, pDel2
   if the number of items of pDels is not 2 then return empty
   put item 1 of pDels into pDel1
   put item 2 of pDels into pDel2
   if pDel1 is pDel2 then return empty -- requires two tags that are different
   put offset( pDel1, pString ) into a
   if a = 0 then return empty -- start tag not found
   put a + length( pDel1 ) into a -- first char after the start tag
   -- look for the end tag only after the start tag;
   -- offset() counts from the end of the skipped chars
   put offset( pDel2, pString, a - 1 ) into b
   if b = 0 then return empty -- end tag not found
   put a + b - 2 into b -- last char before the end tag
   return char a to b of pString
end filterTags
Regards, Roland (golife)
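The note in the post above says the function returns only the first instance and would need to be extended for a whole document. A minimal sketch of one way to do that, assuming a hypothetical handler name filterAllTags (not part of the original post) that takes the two tags as separate parameters and returns one match per line. It relies on the optional third parameter of offset(), which skips that many chars before searching:

```livecode
-- filterAllTags is a hypothetical name used for illustration only
function filterAllTags pString, pDel1, pDel2
   local tSkip, tStart, tEndRel, tFound
   put 0 into tSkip
   repeat forever
      -- position of the next start tag, counted from the skipped chars
      put offset(pDel1, pString, tSkip) into tStart
      if tStart = 0 then exit repeat
      -- absolute position of the first char after the start tag
      put tSkip + tStart + length(pDel1) into tStart
      -- position of the end tag, counted from the char before tStart
      put offset(pDel2, pString, tStart - 1) into tEndRel
      if tEndRel = 0 then exit repeat
      put char tStart to (tStart + tEndRel - 2) of pString & cr after tFound
      -- continue searching after the end tag
      put tStart + tEndRel - 2 + length(pDel2) into tSkip
   end repeat
   delete the last char of tFound -- trailing cr
   return tFound
end filterAllTags
```

With this sketch, put filterAllTags("a(1)b(22)", "(", ")") should give two lines, 1 and 22. Passing the tags as separate parameters also avoids the restriction that a tag cannot contain a comma.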
Re: Find text between two different delimiters/tags using Regex
Posted: Sat Apr 03, 2021 10:52 am
by grzkmo
Without regex:
Card with 2 Fields (f1, f2) and 1 Button
Text in fld "f1":
<div class="entry">
<h1>{{title}}</h1>
<div class="body">
{{body}}
</div>
</div>
A handlebars expression is a {{, some contents, followed by a }}
Script of the button:
Code:
on mouseUp pMouseButton
   constant kLd = "}}"
   constant kId = "{{"
   put fld "f1" into tText
   set the linedelimiter to kLd
   set the itemdelimiter to kId
   repeat for each line tLine in tText
      put the last item of tLine & cr after tFound
   end repeat
   delete the last char of tFound
   put tFound into fld "f2"
end mouseUp
After the mouse click, the text of fld "f2" will be:
title
body
, some contents, followed by a
Best
Günter
Re: Find text between two different delimiters/tags using Regex
Posted: Sat Apr 03, 2021 10:57 am
by grzkmo
Re: Find text between two different delimiters/tags using Regex
Posted: Sat Apr 03, 2021 11:21 am
by AxWald
From my Library-Stack:
Code:
function inBeet_BL theString, theStart, theEnd, offNum
   /* inBeet_BL() is a very fast function - it extracts data between 2 search hits.
   # theString (String): The data you're searching in;
   # theStart, theEnd (String): what is before and after your desired text,
   . assumed that theEnd comes AFTER theStart, and that at least theEnd is not empty;
   . If you look for something at the beginning of theString, leave theStart empty -
   . in case of mode 2 (2-line return) you need to set offNum to 0.
   # OffNum (Int, optional): skip [offNum] chars at the beginning of theString;
   . 2 modes of action, depending on the value of offNum:
   - Mode 1: When offNum is empty, it just returns the found string, or empty;
   - Mode 2: When offNum <> empty, it returns a 2-line result:
   . line 1 is the pos of the last char touched in theString (= last char found of theEnd)
   . and line 2 is the found string. Use line 1 as offNum for your next repeat!
   What negative values do for offNum is left as an exercise - it has a (strange) use too!
   axwald @ forums.livecode.com, GPL v3, 10/2020 */
   if (offNum is not empty) then
      if theStart is empty then
         put 1 into myStart -- start at the first char of theString
      else
         put offset(theStart,theString,offNum) + len(theStart) + offNum into myStart
      end if
      if myStart is (len(theStart) + offNum) then return empty
      put offset(theEnd,theString,myStart) + (myStart)-1 into myEnd
      if myEnd is (myStart)-1 then return empty
      return myEnd & CR & char myStart to myEnd of theString
   else
      return char (offset(theStart,theString) + len(theStart)) to \
            (offset(theEnd,theString,(offset(theStart,theString) + len(theStart))) \
            + (offset(theStart,theString) + len(theStart))-1) \
            of theString
   end if
end inBeet_BL
Have fun ;-)
PS: I don't use RegExp - I just detest the cryptic syntax. But they are mighty tools for those that can bear 'em - I'm sure Thierry will chime in with some breathtaking solution :)
Re: Find text between two different delimiters/tags using Regex
Posted: Sat Apr 03, 2021 1:34 pm
by stam
i think the regex to match text between 2 different delimiters (delim1 and delim2 in this example) would be
hope that helps..
Re: Find text between two different delimiters/tags using Regex
Posted: Sat Apr 03, 2021 3:43 pm
by dunbarx
What am I missing here? Why does this not do what is required? With a field 1 with text in it, and a button with this in its script:
Code:
on mouseUp
   answer twoTags(fld 1,comma,"epoch")
end mouseUp

function twoTags tText,firsttag,secondTag
   set the itemDel to firstTag
   put item 2 to 1000 of tText into temp
   set the itemDel to secondTag
   return item 1 of temp
end twoTags
I had this in my own field 1:
It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, it was the epoch of belief, it was the epoch of incredulity, it was the season of light, it was the season of darkness, it was the spring of hope, it was the winter of despair.
I got back this:
it was the worst of times, it was the age of wisdom, it was the age of foolishness, it was the
Craig
Re: Find text between two different delimiters/tags using Regex
Posted: Sat Apr 03, 2021 4:09 pm
by Thierry
stam wrote: ↑Sat Apr 03, 2021 1:34 pm
hope that helps..
Hello Stam,
here is a little saturday night quiz...
I've replaced delim1 and delim2 by _1 and _2
Code:
on mouseUP
   constant T = "_1qwerty_2 _1asdf_2"
   put "Text in: " &tab& T &cr& \
         "regex:" &tab& "result:" &cr into fld 1
   constant rexStam1 = "(?<=_1)(.*)(?=_2)"
   if matchText( T, rexStam1, gotIt) then
      put rexStam1 &tab& gotIt &cr after fld 1
   end if
   constant rexStam2 = "_1(.*)_2"
   if matchText( T, rexStam2, gotIt) then
      put rexStam2 &tab& gotIt &cr after fld 1
   end if
   -- constant rexTdz1 = "?????"
   if matchText( T, rexTdz1, gotIt) then
      put "rexTdz1" &tab& gotIt &cr after fld 1
   end if
end mouseUP
and the corresponding results:
[screenshot of the results: screenshot 2021-04-03 à 16.50.16.jpg]
Hint: I added 1 meta char in your regex. so what's in rexTdz1 ?
Of course, this doesn't resolve the OP question, as he wants all the pattern occurrences,
and not only the 1st.
Enjoy or not
Thierry
Re: Find text between two different delimiters/tags using Regex
Posted: Sat Apr 03, 2021 4:47 pm
by dunbarx
Thierry wrote:
Of course, this doesn't resolve the OP question, as he wants all the pattern occurrences,
and not only the 1st.
I did not know that. Anyway, like AxWald, I do not love regex (though I admire it), and this "old fashioned" gadget works. I had:
abc, def epoch, hij epoch klm, nop epoch qrs , tuv epoch wxyz epoch
in field 1 and this in the button script. Comma and "epoch" are the two "tags". A little recursion never hurts:
Code:
on mouseUp
   answer twoTags(fld 1,comma,"epoch")
end mouseUp

function twoTags tText,firstTag,secondTag,accum
   set the itemDel to firstTag
   put item 2 to 10000 of tText into temp
   set the itemDel to secondTag
   put item 1 of temp after accum
   delete item 1 of tText
   if firstTag is in tText and secondTag is in tText then
      -- pass the accumulated text down and hand the final result back up
      return twoTags(tText,firstTag,secondTag,accum)
   else
      return accum
   end if
end twoTags
I get:
def hij nop tuv
Craig
Re: Find text between two different delimiters/tags using Regex
Posted: Sat Apr 03, 2021 4:56 pm
by stam
Thierry wrote: ↑Sat Apr 03, 2021 4:09 pm
here is a little saturday night quiz...
...
Hint: I added 1 meta char in your regex. so what's in rexTdz1 ?

Thierry
Hi Thierry - and thanks for jumping in and correcting me

I forgot to add the non-greedy operator '?'
code should be
here's an example using the non-greedy operator, using the text Craig posted above
Using (.*) instead of (.*?) finds one long group, from the first delim1 to the last delim2:
Adding the non-greedy operator finds the individual groups - thank you for correcting me, i always find regex so useful but so difficult lol....
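The difference between (.*) and (.*?) can be tried directly in LiveCode. A small sketch, assuming a button script and reusing the test string and the _1 / _2 delimiters from Thierry's quiz above; the variable names are made up for illustration:

```livecode
on mouseUp
   local tText, tGreedy, tLazy, tOut
   put "_1qwerty_2 _1asdf_2" into tText
   -- greedy: .* runs on to the last "_2" in the text
   if matchText(tText, "_1(.*)_2", tGreedy) then
      put "greedy:" && tGreedy & cr after tOut
   end if
   -- non-greedy: .*? stops at the first "_2" that follows
   if matchText(tText, "_1(.*?)_2", tLazy) then
      put "non-greedy:" && tLazy after tOut
   end if
   answer tOut
end mouseUp
```

The greedy pattern should capture "qwerty_2 _1asdf" and the non-greedy one "qwerty". Either way, matchText only reports the first match.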
Re: Find text between two different delimiters/tags using Regex
Posted: Sat Apr 03, 2021 6:21 pm
by SparkOut
I always find regex difficult until Thierry helps out. Then I wonder what was so hard, but with a long time between needing to use regex it gets difficult again the next time, until Thierry swoops* in again.
*I don't need to link the xkcd strip again do I?
Edit: Oh alright then ...
Substitute LiveCode for PERL
Re: Find text between two different delimiters/tags using Regex
Posted: Sat Apr 03, 2021 7:15 pm
by stam
SparkOut wrote: ↑Sat Apr 03, 2021 6:21 pm
*I don't need to link the xkcd strip again do I?
Edit: Oh alright then ...
that's so brilliant lol!
Re: Find text between two different delimiters/tags using Regex
Posted: Sun Apr 04, 2021 8:53 am
by golife
When I first posted the question, I had in mind that Thierry might jump in. He did ...
I feel like a bloody beginner regarding regex. Nevertheless, it does things fast, and sometimes it is just the best solution. But it is also a black box for most of us, since we do not spend the time to study it deeply enough. Yes, it looks extremely cryptic. But how would it look if it did not use such a condensed way of parameterization?
I swear to myself to take some time to study it more deeply with examples and trial-and-error.
Very helpful posts here..
Thanks to all ..., Roland (golife)
Re: Find text between two different delimiters/tags using Regex
Posted: Sun Apr 04, 2021 5:13 pm
by dunbarx
Hi.
Regex is very powerful and compact. It can do in one line what would take "ordinary" LC several. I spent a little time working through some basics. However, I never use it. Too old and not smart enough, you know.
But did you check out the second "twoTags" handler above? I think it does what you want, the old fashioned way.
Craig
Re: Find text between two different delimiters/tags using Regex
Posted: Mon Apr 05, 2021 11:57 am
by Thierry
stam wrote: ↑Sat Apr 03, 2021 4:56 pm
Hi Thierry - and thanks for jumping in

I forgot to add the non-greedy operator '?'
code should be
please don't give too much heavy food to my 'sick-hatred-anti-regex' friends

this will work too with less typing: [1]
here's an example using the non-greedy operator
Mmm, still some more work to do to get the same result in LiveCode
Happy Easter
Thierry
[1] all my comments with regex are in a LiveCode context only!
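As noted in the thread, matchText stops at the first match, so collecting every occurrence takes a little more work in LiveCode. A minimal sketch of one possible approach, assuming a hypothetical handler name allBetween (not from any post here), tags that contain no regex metacharacters, and no line breaks between the tags. It uses matchChunk, which reports the char positions of the capture group instead of its text:

```livecode
-- allBetween is a hypothetical name used for illustration only
function allBetween pText, pStartTag, pEndTag
   local tRegex, tStart, tEnd, tFound
   -- assumption: neither tag contains regex metacharacters such as ( or *
   put pStartTag & "(.*?)" & pEndTag into tRegex
   -- matchChunk puts the positions of the first and last char
   -- of the capture group into tStart and tEnd
   repeat while matchChunk(pText, tRegex, tStart, tEnd)
      put char tStart to tEnd of pText & cr after tFound
      -- drop everything up to and including the end tag, then search again
      delete char 1 to (tEnd + length(pEndTag)) of pText
   end repeat
   delete the last char of tFound -- trailing cr
   return tFound
end allBetween
```

With Thierry's test string, put allBetween("_1qwerty_2 _1asdf_2", "_1", "_2") should return two lines, qwerty and asdf. Deleting the consumed part of the text keeps the loop simple at the cost of copying; for large documents an offset-based loop may be the better choice.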
Re: Find text between two different delimiters/tags using Regex
Posted: Mon Apr 05, 2021 12:13 pm
by Thierry
SparkOut wrote: ↑Sat Apr 03, 2021 6:21 pm
*I don't need to link the xkcd strip again do I?
Edit: Oh alright then ...
Hi SparkOut, funny but that's an old story and only one side of the play
So, look at what was happening after that:
tarzan-liane.gif
(click the gif to see it in action...)
Happy Easter,
Thierry