Sentence chunk and lowercase letter after period

Posted: Fri Aug 07, 2015 1:01 pm
by Sjatplat
Hi there

I want to make sure that the first letter of each sentence is capitalized in the output text in a field.
I thought the easy way was to use the sentence chunk. But it seems that the sentence chunk does not work with a sentence that starts with a lowercase character after a period.


Example:

Code:

Put the number of sentences of "sentence one. sentence two. sentence three."
... returns 1 in the message box
and

Code:

Put the number of sentences of "sentence one. Sentence two. sentence three."
... returns 2 in the message box

So I guess this has something to do with the ICU library?
Anyone know an easy solution to this?

Re: Sentence chunk and lowercase letter after period

Posted: Fri Aug 07, 2015 4:27 pm
by dunbarx
I really must get into v7. There is no "sentence" chunk in v6.

It would seem that the caseSensitive property ought to handle this, but if it does not, then I suppose you are left to kluge it yourself. The only problem is those pesky periods that live inside of normally parsed sentences, like:

"I gave $3.50 to my favorite charity, dunbarxPleadingForCash.com just last week."

Otherwise a snap to make such a thing.
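
For what it's worth, here is a minimal sketch of such a kluge. It does not use the sentence chunk at all. It assumes a sentence ends at ".", "!" or "?" followed by at least one space, tab or return, so the periods in "$3.50" and ".com" are left alone. The handler name capitalizeSentences is made up for illustration, and the rule is only an assumption: it will wrongly capitalize after abbreviations such as "e.g. this", and it will not capitalize a sentence that opens with a quote mark or other symbol.

```livecode
-- Hypothetical helper: capitalize the first letter of pText and of every
-- sentence that follows ".", "!" or "?" plus whitespace.
function capitalizeSentences pText
   local tResult, tChar, tWhite, tCapNext, tAfterStop
   put space & tab & return into tWhite
   put true into tCapNext       -- the very first letter gets capitalized
   put false into tAfterStop    -- true right after a ".", "!" or "?"
   repeat for each char tChar in pText
      if tCapNext and tChar is not in tWhite then
         -- first non-blank char of a sentence; toUpper leaves symbols as-is
         put toUpper(tChar) after tResult
         put false into tCapNext
      else
         put tChar after tResult
      end if
      if tChar is in ".!?" then
         put true into tAfterStop
      else if tAfterStop and tChar is in tWhite then
         -- terminator followed by whitespace: a new sentence starts next
         put true into tCapNext
         put false into tAfterStop
      else
         put false into tAfterStop
      end if
   end repeat
   return tResult
end capitalizeSentences
```

With this, put capitalizeSentences("sentence one. sentence two. sentence three.") should give "Sentence one. Sentence two. Sentence three.", and the charity example below comes back unchanged.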

By the way, how does the sentence chunk deal with the above silliness? Does it only fire at the end of strings, followed by a space or CR and an uppercase letter?

Craig Newman

Re: Sentence chunk and lowercase letter after period

Posted: Fri Aug 07, 2015 7:37 pm
by SparkOut
I don't know.But! although it is good practice, there are plenty of tracts where a sentence does not preserve a space after the preceding full stop.(period for US types. Which is not the only end of sentence marker. Or is it?no, seriously, right? ) Or other anomalies, like a dot.com bubble bursting into a raincloud over the sentence parade.
Er... What rules govern what constitutes a "Sentence Chunk"?
"The only problem"...? A snap? Well yes, but only having chosen a ruleset. What is the ruleset in use here? (and yes, please excuse this additional silliness).

Re: Sentence chunk and lowercase letter after period

Posted: Fri Aug 07, 2015 10:02 pm
by FourthWorld
SparkOut wrote: What rules govern what constitutes a "Sentence Chunk"?
The Dictionary entry for "sentence" notes the ICU library as handling the details of defining what a sentence is, and includes this link for more info:
http://www.unicode.org/reports/tr29/#Se ... Boundaries

Natural language parsing is complex stuff. No doubt it's possible to find edge cases the ICU library wasn't designed to handle.

Re: Sentence chunk and lowercase letter after period

Posted: Sat Aug 08, 2015 9:11 am
by SparkOut
Yes, my additional silliness was meant to highlight that a ruleset that has to try to work with natural language is very complex, and a worse job to consider even than the CSV fiasco. And "natural language" these days: what about Twitter and Facebook? Nightmarish.

Re: Sentence chunk and lowercase letter after period

Posted: Sat Aug 08, 2015 11:37 am
by Sjatplat
Always looking for shortcuts but I suspected I had to kluge this.

Thanks for the answers.