Adventures into matchChunk and regex


FourthWorld
VIP Livecode Opensource Backer
Posts: 10101
Joined: Sat Apr 08, 2006 7:05 am

Re: Adventures into matchChunk and regex

Post by FourthWorld » Wed Mar 25, 2026 11:59 pm

dunbarx wrote:
Wed Mar 25, 2026 9:09 pm
Otherwise you are oftentimes out of luck.
The number and variety of rules for defining English sentence boundaries are probably knowable, and applying them is certainly within the range of modern computing horsepower.

The question is: who has the expertise in both linguistics and computer science to pull it off in any usefully complete form?

This pursuit reminds me of the story of the Porter stemmer, a task many had failed at until Martin Porter came along, and even his isn't perfect, just good enough for most uses.

Stemming is the process of reducing a word to its root form, useful in information retrieval for indexing related terms that are spelled differently.

For example, it's easy to see "run" in "running", but what do you do with "ran"? Different tense, sure, but it's really the same word. How would you index "run" with "ran"?
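To make the "run"/"running"/"ran" problem concrete, here is a minimal sketch in Python (not LiveCode, and not the Porter algorithm). The suffix list and the `naive_stem` name are made up for illustration. It strips a few common endings mechanically, which handles "running" but leaves "ran" untouched:

```python
# Toy suffix stripper for illustration only. This is NOT the Porter stemmer;
# the suffix list below is a hypothetical, deliberately tiny rule set.
SUFFIXES = ("ing", "ed", "s")


def naive_stem(word):
    """Strip one known suffix, then undo a doubled final consonant."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three letters so short words like "sing" survive.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            stem = word[: -len(suffix)]
            # "running" -> "runn" -> "run"; leave doubled l, s, z alone
            # so that "falling" keeps its stem "fall".
            if (
                len(stem) >= 2
                and stem[-1] == stem[-2]
                and stem[-1] not in "aeioulsz"
            ):
                stem = stem[:-1]
            return stem
    return word


for w in ("run", "runs", "running", "ran"):
    print(w, "->", naive_stem(w))
```

The first three words all come out as "run", while "ran" stays "ran". No amount of suffix stripping connects an irregular form to its root, which is why a mechanical stemmer is only "good enough" rather than complete.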

Lemmatization is the set of cognitive processes native speakers pick up to understand these word relationships, but it's too much to try to put into a machine too stupid to count past 1.

Porter looked at the problem from a purely mechanical perspective that focused on the outcomes of permutations, and then set about developing an algo to encode the most generalizable permutation rules useful for common indexing tasks.

https://en.wikipedia.org/wiki/Stemming

We may find that a similar answer for finding sentence boundaries lies elsewhere in computer science history. And as with the Porter stemmer, finding and adopting it will save years of trial and error over reinventing it.
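For a sense of why the rule set grows so quickly, here is a rough sketch of the usual first attempt, again in Python rather than LiveCode. The `ABBREVIATIONS` set and the `split_sentences` name are assumptions for illustration, not an established algorithm: split after ".", "!" or "?" when a capital letter follows, then glue pieces back together when the split landed on a known abbreviation.

```python
import re

# Hypothetical, far-from-complete exception list; real text needs many more.
ABBREVIATIONS = {"mr.", "mrs.", "dr.", "e.g.", "i.e.", "etc."}

# Candidate boundary: whitespace preceded by . ! or ? and followed by a capital.
BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def split_sentences(text):
    """Split on candidate boundaries, then undo splits after abbreviations."""
    sentences = []
    for piece in BOUNDARY.split(text.strip()):
        # If the previous piece ended in an abbreviation, it was a false split.
        if sentences and sentences[-1].split()[-1].lower() in ABBREVIATIONS:
            sentences[-1] += " " + piece
        else:
            sentences.append(piece)
    return sentences


print(split_sentences("Dr. Smith ran home. He was late!"))
```

This keeps "Dr. Smith ran home." in one piece, but it still trips on initials, quoted speech, and abbreviations that really do end a sentence. Every such case means another rule or another exception entry.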
Richard Gaskin
LiveCode development, training, and consulting services: Fourth World Systems

dunbarx
VIP Livecode Opensource Backer
Posts: 10476
Joined: Wed May 06, 2009 2:28 pm

Re: Adventures into matchChunk and regex

Post by dunbarx » Thu Mar 26, 2026 4:28 am

Richard.
...to try to put into a machine too stupid to count past 1.
:D

Craig
