Re: matchtext and replacetext regex parts not working as expecte
Posted: Thu Mar 22, 2018 7:15 pm
by FourthWorld
bwmilby wrote: Thu Mar 22, 2018 6:50 pm
Your code is very similar to the ReplaceText code. It does a similar thing working through the chunk.
With the minor advantage of avoiding the overhead of the regex subsystem. Regex is great for many problems, esp. complex ones, but because it is such a generalized solution, many simpler problems benefit from simpler parsing methods.
This also works across line breaks, though presumably an engine-level enhancement could handle that as well.
That’s why I think implementing in the engine wouldn’t be that monumental of an effort. I think we just need to define the LCS syntax first. I’m also wondering if I could do it in LCB (access to composite types being the question).
It would be interesting to have both so we can gauge performance improvements in LCB over multiple iterations. I realize it's too early to expect optimization just yet, but the language was designed for, among other things, performance opportunities not possible with LCS. At the moment we have few (none?) examples allowing performance comparison.
Re: matchtext and replacetext regex parts not working as expecte
Posted: Thu Mar 22, 2018 7:21 pm
by bogs
Interesting bit of code there Richard.
Re: matchtext and replacetext regex parts not working as expecte
Posted: Sat Mar 24, 2018 7:36 pm
by bwmilby
Would something like this be a good syntax?
Code:
matchArray(string, regularExpression [, resultArray])
Where the result array would be constructed as follows:
resultArray[matchNumber][subMatchNumber][item]
- item 1: matching string
- item 2: match starting position
- item 3: match ending position
Example:
Code:
matchArray("TestATestB","(Te)(st.)",tResult)
tResult[1][1][1] = "Te"
tResult[1][1][2] = 1
tResult[1][1][3] = 2
tResult[1][2][1] = "stA"
tResult[1][2][2] = 3
tResult[1][2][3] = 5
tResult[2][1][1] = "Te"
tResult[2][1][2] = 6
tResult[2][1][3] = 7
tResult[2][2][1] = "stB"
tResult[2][2][2] = 8
tResult[2][2][3] = 10
This would cover the first part of the request. (I also thought about using "matchDetail")
It could be implemented as a command instead of a function by setting resultArray to empty when nothing is found.
We would still need an enhanced flavor of the replaceText function that can handle sub-matches and manipulation of them.
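To make the proposed resultArray layout concrete, here is a rough sketch of the same structure built in Python. This is only an illustration, not LCS or engine code: the name match_array is hypothetical, Python's re module stands in for the engine's PCRE, and nested dictionaries stand in for a LiveCode array. Positions are converted to 1-based, inclusive values to match the example above.

```python
import re


def match_array(string, pattern):
    """Build {matchNumber: {subMatchNumber: {1: text, 2: start, 3: end}}}.

    Hypothetical helper mirroring the proposed resultArray layout.
    Keys and positions are 1-based; end positions are inclusive.
    """
    result = {}
    for match_number, m in enumerate(re.finditer(pattern, string), start=1):
        sub_matches = {}
        for sub in range(1, m.re.groups + 1):
            if m.start(sub) == -1:
                # This capture group did not take part in the match.
                continue
            sub_matches[sub] = {
                1: m.group(sub),      # item 1: matching string
                2: m.start(sub) + 1,  # item 2: start (0-based -> 1-based)
                3: m.end(sub),        # item 3: end (exclusive -> inclusive)
            }
        result[match_number] = sub_matches
    return result


t_result = match_array("TestATestB", "(Te)(st.)")
print(t_result[1][2])  # second sub-match of the first match
```

An empty result here plays the role of setting resultArray to empty when nothing is found.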
Re: matchtext and replacetext regex parts not working as expecte
Posted: Sat Mar 24, 2018 7:46 pm
by bwmilby
FourthWorld wrote: Thu Mar 22, 2018 7:15 pm
It would be interesting to have both so we can gauge performance improvements in LCB over multiple iterations.
I did look into this, but the engine PCRE code is not available to LCB. If it were moved to libFoundation and made extern, then it could be used by an LCB library. Some of the data structures would make it a little cumbersome to use, but I think that could be cleaned up a bit. My concern about performance is the potential need to convert string formats repeatedly. That is what caused replaceText to be so slow before the latest bug fix.