Adventures into matchChunk and regex

Post by bn » Tue Dec 09, 2025 12:01 pm

Unfortunately "matchChunk" is not well explained in the dictionary.

The task was to break up short texts into 3, 4 or 5 paragraphs. I managed that by counting full stops (".") and determining the number of sentences that should make up a paragraph. However, the texts could contain decimal numbers, where the decimal separator is also a ".", and abbreviations such as "e.g." that use "." as well, so I had to navigate around both. Doing this in native LiveCode would have taken quite some coding, and I figured that matchChunk with a regular expression could help. Regex is something I had avoided so far, since my needs up to now could be met with plain LiveCode syntax.

Stam recommends the "Regex101" website. I followed his advice and found it a very sensible testing ground for regular expressions.

I want to post my solution here for cleaning up a text: it temporarily replaces the dots that are not full stops with a placeholder ("±"), so that the dots can be reintroduced after formatting. The formatting step itself is not shown here.

Since I did not find any working examples of "matchChunk", I am posting my function here simply as one example of how to use it.

Code:

function escapeAbbrAndDezimalpoints pText
   local tStart, tBegin, tEnd, tVar, tRegex, tList
   local tFrom, tTo
   
   put pText into tVar
   
   ## this catches "e.g." and "2.7" 
   put "([a-zA-Z]\.[a-zA-Z]\.|[0-9]\.[0-9])" into tRegex
   
   ## define the search range; after each hit, tFrom is moved past the hit so it is not matched again
   put 1 into tFrom
   put the number of chars of tVar into tTo
   
   repeat
      if matchChunk(char tFrom to tTo of tVar, tRegex, tStart, tEnd) is true then
         
         ## build list of hits as start,end offsets relative to the whole text
         put tStart + tFrom - 1, tEnd + tFrom - 1 & return after tList
         
         ## update search range
         add tEnd to tFrom
         
         if tFrom > tTo then
            exit repeat
         end if
      else
         exit repeat
      end if
   end repeat
   
   if tList is empty then return tVar
   
   ## sorting is not necessary here, but working from the last hit backwards keeps earlier offsets valid in cases where replace changes the character count
   sort tList descending numeric by item 1 of each
    
   repeat for each line aLine in tList
      put item 1 of aLine into tBegin
      put item 2 of aLine into tEnd
      replace "." with "±" in char tBegin to tEnd of tVar
   end repeat
   return tVar
end escapeAbbrAndDezimalpoints
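
As a usage illustration, here is a minimal sketch of the round trip: escape the dots, do the paragraph formatting, then put the dots back. The function restoreDots and the sample sentence are assumptions made for illustration and are not part of the original post. The sketch also assumes the text contains no "±" of its own, since those would be turned into dots as well.

```livecode
-- hypothetical helper: turn the placeholder back into dots after formatting
function restoreDots pText
   replace "±" with "." in pText
   return pText
end restoreDots

on mouseUp
   local tText, tEscaped
   put "The sample weighed 2.7 g, e.g. after drying. It was then stored." into tText
   put escapeAbbrAndDezimalpoints(tText) into tEscaped
   -- tEscaped: "The sample weighed 2±7 g, e±g± after drying. It was then stored."
   -- only the two sentence-ending dots remain, so counting "." now counts sentences
   
   -- ... split tEscaped into paragraphs here (not shown) ...
   
   put restoreDots(tEscaped) into tText
end mouseUp
```

Keeping the restore step in its own function makes it easy to swap the placeholder for another character if "±" can occur in the real texts.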

Kind regards
Bernd

Post by stam » Tue Dec 09, 2025 12:29 pm

Hi Bernd,

Yes, I swear by Regex101 for all regex testing. The only problem with it is that it defaults to JS regex, which has a 'global' match flag (i.e. all matches are returned), whereas LiveCode uses PCRE regex, which doesn't have this (only the first match is returned).
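
To see the first-match behaviour in isolation, a minimal sketch may help. The sample string and the handler are illustrative assumptions; the regex is the decimal-number half of Bernd's pattern, wrapped in one capture group so that matchChunk can fill the two position variables.

```livecode
on mouseUp
   local tStart, tEnd
   -- the string contains two decimal numbers, but only the first one is reported
   if matchChunk("a 1.5 and 2.7", "([0-9]\.[0-9])", tStart, tEnd) then
      put tStart, tEnd  -- 3,5 (the position of "1.5"); "2.7" is never reached
   end if
end mouseUp
```

Finding the second number therefore needs a loop that restarts the search after each hit.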

As a workaround I use this function, which returns an array with all matches and the start and end offsets of each match. It only works if you have one pattern to match, but that's the usual case anyway.

Code:

function globalMatch pSource, pRegex
    local tMatchesA, tVarCount, tVar1, tVar2, tSource, tTemp, tDeleted
    -- count occurrences: replace every match with a marker line, then count the marker lines
    put replaceText(pSource, pRegex, "regexMatch" & return) into tTemp
    filter lines of tTemp with "*regexMatch"
    put the number of lines of tTemp into tVarCount
    
    -- get occurrences by locating offsets with matchChunk(), loop to store values and delete each occurrence in a copy of the source.
    -- pRegex is expected to contain one capture group "( )" so that tVar1 and tVar2 are filled.
    put pSource into tSource
    put 0 into tDeleted
    repeat with x = 1 to tVarCount
        if matchChunk(tSource, pRegex , tVar1, tVar2) then
            put char tVar1 to tVar2 of tSource into tMatchesA[x]["value"]
            -- add the chars deleted so far, so the offsets refer to pSource and not to the shortened copy
            put tVar1 + tDeleted into tMatchesA[x]["start"]
            put tVar2 + tDeleted into tMatchesA[x]["end"]
            add tVar2 - tVar1 + 1 to tDeleted
            delete char tVar1 to tVar2 of tSource
        end if
    end repeat
    return tMatchesA
end globalMatch
Not sure if that adds anything to your solution, but may be helpful for others...
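
For readers who want to try globalMatch, a minimal calling sketch follows. The sample string, the handler and the variable names are assumptions made for illustration. The regex wraps the whole pattern in one capture group, as Bernd's pattern does.

```livecode
on mouseUp
   local tMatchesA, tCount, tReport, i
   put globalMatch("a 1.5 and 2.7", "([0-9]\.[0-9])") into tMatchesA
   -- the array is keyed 1..n, so the number of keys is the number of matches
   put the number of lines of (the keys of tMatchesA) into tCount
   repeat with i = 1 to tCount
      put tMatchesA[i]["value"] & return after tReport
   end repeat
   put tReport  -- lists the matched values 1.5 and 2.7, one per line
end mouseUp
```

Looping by index rather than with "repeat for each key" keeps the matches in the order they were found.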
