Sentence chunk and lowercase letter after period

Sjatplat
Livecode Opensource Backer
Posts: 75
Joined: Wed Jun 22, 2011 1:53 pm

Post by Sjatplat » Fri Aug 07, 2015 1:01 pm

Hi there

I want to make sure that the first letter of each sentence is capitalized in the output text in a field.
I thought the easy way was to use the sentence chunk. But it seems that the sentence chunk does not work with a sentence that starts with a lowercase character after a period.


Example:

Code:

Put the number of sentences of "sentence one. sentence two. sentence three."
... returns 1 in the message box
and

Code:

Put the number of sentences of "sentence one. Sentence two. sentence three."
... returns 2 in the message box

So I guess this has something to do with the ICU library?
Anyone know an easy solution to this?

dunbarx
VIP Livecode Opensource Backer
Posts: 10305
Joined: Wed May 06, 2009 2:28 pm

Re: Sentence chunk and lowercase letter after period

Post by dunbarx » Fri Aug 07, 2015 4:27 pm

I really must get into v7. There is no "sentence" chunk in v.6.

It would seem that the caseSensitive property ought to handle this, but if it does not, then I suppose you are left to kluge it yourself. The only problem is those pesky periods that live inside of normally parsed sentences, like:

"I gave $3.50 to my favorite charity, dunbarxPleadingForCash.com just last week."

Otherwise a snap to make such a thing.

By the way, how does the sentence chunk deal with the above silliness? Does it only fire at the end of strings, followed by a space or CR and an uppercase letter?

Craig Newman

SparkOut
Posts: 2943
Joined: Sun Sep 23, 2007 4:58 pm

Re: Sentence chunk and lowercase letter after period

Post by SparkOut » Fri Aug 07, 2015 7:37 pm

I don't know.But! although it is good practice, there are plenty of tracts where a sentence does not preserve a space after the preceding full stop.(period for US types. Which is not the only end of sentence marker. Or is it?no, seriously, right? ) Or other anomalies, like a dot.com bubble bursting into a raincloud over the sentence parade.
Er... What rules govern what constitutes a "Sentence Chunk"?
"The only problem"...? A snap? Well yes, but only having chosen a ruleset. What is the ruleset in use here? (and yes, please excuse this additional silliness).

FourthWorld
VIP Livecode Opensource Backer
Posts: 10043
Joined: Sat Apr 08, 2006 7:05 am

Re: Sentence chunk and lowercase letter after period

Post by FourthWorld » Fri Aug 07, 2015 10:02 pm

SparkOut wrote: What rules govern what constitutes a "Sentence Chunk"?
The Dictionary entry for "sentence" notes the ICU library as handling the details of defining what a sentence is, and includes this link for more info:
http://www.unicode.org/reports/tr29/#Se ... Boundaries

Natural language parsing is complex stuff. No doubt it's possible to find edge cases the ICU library wasn't designed to handle.
Richard Gaskin

SparkOut
Posts: 2943
Joined: Sun Sep 23, 2007 4:58 pm

Re: Sentence chunk and lowercase letter after period

Post by SparkOut » Sat Aug 08, 2015 9:11 am

Yes, my additional silliness was meant to highlight that a ruleset that has to work with natural language is very complex, and an even worse job to consider than the CSV fiasco. And "natural language" these days - what about Twitter and Facebook? Nightmarish.

Sjatplat
Livecode Opensource Backer
Posts: 75
Joined: Wed Jun 22, 2011 1:53 pm

Re: Sentence chunk and lowercase letter after period

Post by Sjatplat » Sat Aug 08, 2015 11:37 am

Always looking for shortcuts, but I suspected I would have to kluge this.

Thanks for the answers.
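
For anyone landing here later, a minimal sketch of one possible kluge. It does not use the sentence chunk at all: it walks the text character by character and uppercases the first non-whitespace character that follows a ".", "!" or "?" plus at least one space, tab or return. The handler name capitalizeSentences is made up for this example, and the rule itself is an assumption, not what the ICU library does. Because it requires whitespace after the terminator, "$3.50" and "dunbarxPleadingForCash.com" are left alone, but an abbreviation such as "e.g. " will still trigger a capital, and a sentence that opens with a quotation mark will not get one.

```livecode
-- Hypothetical helper: capitalizes the first character of each "sentence",
-- where a sentence start is assumed to be the start of the text or the first
-- non-whitespace character after ".", "!" or "?" followed by whitespace.
function capitalizeSentences pText
   local tOut, tChar, tCapNext, tAfterStop
   put true into tCapNext      -- the very first character starts a sentence
   put false into tAfterStop   -- true while we are just past a terminator
   repeat for each char tChar in pText
      if tChar is in ".!?" then
         -- terminator seen; wait for whitespace before arming the capital
         put true into tAfterStop
         put false into tCapNext
      else if tChar is in (space & tab & return) then
         if tAfterStop then
            put true into tCapNext
         end if
      else
         -- any other character: capitalize it if armed, then reset both flags
         if tCapNext then
            put toUpper(tChar) into tChar
         end if
         put false into tCapNext
         put false into tAfterStop
      end if
      put tChar after tOut
   end repeat
   return tOut
end capitalizeSentences
```

Called from the message box with the first example string:

```livecode
put capitalizeSentences("sentence one. sentence two. sentence three.")
-- Sentence one. Sentence two. Sentence three.
```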
