Trying to group phrases in text...
-
- Posts: 13
- Joined: Mon Feb 16, 2009 7:39 pm
I'm trying to take a block of text and grab all the two-word and three-word phrases from it.
So, say for example I have this chunk of text:
Mr. Dennis went to the flower shop to find a bucket of roses.
I was hoping to find a way to grab all the two word phrases...and then grab all the three word phrases.
Example:
2 Word Phrases:
Mr. Dennis
Dennis went
went to
to the
the flower
flower shop
shop to
to find
find a
a bucket
bucket of
of roses
3 Word Phrases:
Mr. Dennis went
Dennis went to
went to the
to the flower
the flower shop
flower shop to
shop to find
to find a
find a bucket
a bucket of
bucket of roses
I've got some of the first steps and am trying to put this all in an array...but the code for looping this in Revolution has got me stumped.
Thanks for any tips/advice.
-
- VIP Livecode Opensource Backer
- Posts: 977
- Joined: Sat Apr 08, 2006 7:47 am
Ah, the perfect way to kick-start your brain in the morning whilst eating breakfast
Off the top of my head:
Code: Select all
on mouseUp
   put field 1 into theText
   --
   put the number of words in theText into theWordCount
   put 0 into theTwoWordCount
   put 0 into theThreeWordCount
   repeat with theWordIndex = 1 to theWordCount
      if theWordIndex > 1 then
         add 1 to theTwoWordCount
         put word theWordIndex - 1 to theWordIndex of theText into theTwoWordArray[theTwoWordCount]
      end if
      if theWordIndex > 2 then
         add 1 to theThreeWordCount
         put word theWordIndex - 2 to theWordIndex of theText into theThreeWordArray[theThreeWordCount]
      end if
   end repeat
   --
   combine theTwoWordArray using return
   put theTwoWordArray into field 2
   combine theThreeWordArray using return
   put theThreeWordArray into field 3
end mouseUp
If you don't need to preserve the number of spaces between words, you can probably optimize this with something like:
Code: Select all
on mouseUp
   put field 1 into theText
   --
   put 0 into theWordIndex
   put 0 into theTwoWordCount
   put 0 into theThreeWordCount
   put empty into theTwoWordBuffer
   put empty into theThreeWordBuffer
   repeat for each word theWord in theText
      add 1 to theWordIndex
      if theWordIndex = 1 then
         put theWord into theTwoWordBuffer
         put theWord into theThreeWordBuffer
      else
         put space & theWord after theTwoWordBuffer
         put space & theWord after theThreeWordBuffer
         if theWordIndex > 1 then
            add 1 to theTwoWordCount
            if theWordIndex > 2 then
               delete word 1 of theTwoWordBuffer
            end if
            put theTwoWordBuffer into theTwoWordArray[theTwoWordCount]
         end if
         if theWordIndex > 2 then
            add 1 to theThreeWordCount
            if theWordIndex > 3 then
               delete word 1 of theThreeWordBuffer
            end if
            put theThreeWordBuffer into theThreeWordArray[theThreeWordCount]
         end if
      end if
   end repeat
   --
   combine theTwoWordArray using return
   put theTwoWordArray into field 2
   combine theThreeWordArray using return
   put theThreeWordArray into field 3
end mouseUp
While you may not see much difference in execution time between the two approaches with short texts, the speed advantage of the second approach will become noticeable as the texts grow.
HTH,
Jan Schenkel.
Quartam Reports & PDF Library for LiveCode
www.quartam.com
-
- Posts: 13
- Joined: Mon Feb 16, 2009 7:39 pm
Jan does indeed rock. Now that the coffee's hit I rise to the occasion and post an alternative way to do things:
Code: Select all
repeat for each word tWord in field "fldText"
   -- append the new word to both sliding buffers
   put tWord & space after tTwoWords
   put tWord & space after tThreeWords
   if the number of words in tTwoWords is 2 then
      put tTwoWords after tTwoWordList
      -- swap the trailing space for a line break
      put cr into char -1 of tTwoWordList
      -- slide the window forward by one word
      delete word 1 of tTwoWords
   end if
   if the number of words in tThreeWords is 3 then
      put tThreeWords after tThreeWordList
      put cr into char -1 of tThreeWordList
      delete word 1 of tThreeWords
   end if
end repeat
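The sliding-buffer idea in the snippet above is not tied to two or three words. As a minimal sketch, assuming a phrase length passed in as a parameter, it could be wrapped in a function. The names `wordPhrases`, `pText` and `pSize` are made up for this illustration and are not part of anyone's posted code:

```livecode
-- Hypothetical helper: returns every run of pSize consecutive words
-- in pText, one phrase per line.
function wordPhrases pText, pSize
   local tWord, tBuffer, tList
   repeat for each word tWord in pText
      put tWord & space after tBuffer
      if the number of words in tBuffer is pSize then
         -- "word 1 to pSize" leaves off the trailing space
         put (word 1 to pSize of tBuffer) & cr after tList
         -- slide the window forward by one word
         delete word 1 of tBuffer
      end if
   end repeat
   -- drop the final line break, if any phrases were found
   if tList is not empty then delete char -1 of tList
   return tList
end wordPhrases

on mouseUp
   -- field numbers follow the earlier handlers in this thread
   put wordPhrases(field 1, 2) into field 2
   put wordPhrases(field 1, 3) into field 3
end mouseUp
```

If the text has fewer than `pSize` words, the function simply returns empty.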
-
- Posts: 13
- Joined: Mon Feb 16, 2009 7:39 pm
@mwieder
Ok...so I've hit just a small snag with the code.
It works perfectly...except I was hoping to place the values in an array so that I could reference them individually.
I tried this:
Code: Select all
on mouseUp
   local loopvalue
   put empty into loopvalue
   repeat for each word tWord in field "fldText"
      add 1 to loopvalue
      put tWord & space after tTwoWords
      put tWord & space after tThreeWords
      if the number of words in tTwoWords is 2 then
         put tTwoWords into tTwoWordList[loopvalue]
         put cr into char -1 of tTwoWordList
         delete word 1 of tTwoWords
      end if
      if the number of words in tThreeWords is 3 then
         put tThreeWords into tThreeWordList[loopvalue]
         put cr into char -1 of tThreeWordList
         delete word 1 of tThreeWords
      else
      end if
   end repeat
   answer tTwoWordList[1]
end mouseUp
But that's not working...what is wrong with my array here?
Thanks!
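One possible explanation, not confirmed anywhere in this thread: `put cr into char -1 of tTwoWordList` treats the array variable as plain text, which would replace the array contents, and the shared counter `loopvalue` is already 2 when the first two-word phrase is stored, so key 1 is never filled. A minimal sketch that keeps the array approach, assuming one counter per list (`tTwoCount` and `tThreeCount` are names introduced here for illustration):

```livecode
on mouseUp
   local tWord, tTwoWords, tThreeWords
   local tTwoWordList, tThreeWordList
   local tTwoCount, tThreeCount
   put 0 into tTwoCount
   put 0 into tThreeCount
   repeat for each word tWord in field "fldText"
      put tWord & space after tTwoWords
      put tWord & space after tThreeWords
      if the number of words in tTwoWords is 2 then
         -- each list gets its own counter so keys start at 1
         add 1 to tTwoCount
         -- "word 1 to 2" stores the phrase without the trailing space
         put word 1 to 2 of tTwoWords into tTwoWordList[tTwoCount]
         delete word 1 of tTwoWords
      end if
      if the number of words in tThreeWords is 3 then
         add 1 to tThreeCount
         put word 1 to 3 of tThreeWords into tThreeWordList[tThreeCount]
         delete word 1 of tThreeWords
      end if
   end repeat
   answer tTwoWordList[1]
end mouseUp
```

With the example sentence in field "fldText", the answer dialog should show the first two-word phrase, "Mr. Dennis".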
-
- Posts: 13
- Joined: Mon Feb 16, 2009 7:39 pm
I have played with this a little just for academic reasons and Mark's is the super rocker from my trials. I was doing the same sort of process in a "repeat for each word" loop. Given a source text of the "Dennis went" sentence repeated a bunch of times (about 50 or 60) the results were:
Mark's "repeat for each": 8 to 14 milliseconds
My "repeat for each" (similar to Mark's but not as efficient, obviously): 14 to 19 milliseconds
Jan's routine 1: 16 to 21 milliseconds
Jan's routine 2: 28 to 36 milliseconds
Jan's routines seem to give some odd results too. I'm getting peculiar gaps in the list of two and three words - is that because of the combining of the arrays?
Just out of testingness, to see how much a "repeat with i = 1 to..." loop is slowed compared with the "repeat for each" approach, I did that too, and the results were 28 to 31 milliseconds.
And to get the array - just leave Mark's script as it is, don't add the loop counters or anything. Just let it build up the list and then at the very end
Code: Select all
split tTwoWordList by cr
split tThreeWordList by cr
That will give you two arrays each numerically indexed from 1.
Code: Select all
on mouseUp
   put the milliseconds into tNow
   repeat for each word tWord in field "Field1"
      put tWord & space after tTwoWords
      put tWord & space after tThreeWords
      if the number of words in tTwoWords is 2 then
         put tTwoWords after tTwoWordList
         put cr into char -1 of tTwoWordList
         delete word 1 of tTwoWords
      end if
      if the number of words in tThreeWords is 3 then
         put tThreeWords after tThreeWordList
         put cr into char -1 of tThreeWordList
         delete word 1 of tThreeWords
      end if
   end repeat
   split tTwoWordList by cr
   split tThreeWordList by cr
   put tTwoWordList[1] into field "Field2"
   put tThreeWordList[1] into field "Field3"
   put the milliseconds - tNow into field "fldTime"
end mouseUp
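Once a list has been split, the elements can be referenced individually or walked in order. The sketch below uses a small hard-coded list rather than the field contents, and sorts the keys numerically before looping, on the assumption that `the keys` does not promise any particular order. The variable names are illustrative only:

```livecode
on mouseUp
   local tList, tKeys, tKey, tOut
   -- stand-in for the return-delimited list built by the loop
   put "Mr. Dennis" & cr & "Dennis went" & cr & "went to" into tList
   split tList by cr
   -- collect the numeric keys and put them in ascending order
   put the keys of tList into tKeys
   sort lines of tKeys numeric
   repeat for each line tKey in tKeys
      put tKey & ": " & tList[tKey] & cr after tOut
   end repeat
   answer tOut
end mouseUp
```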
-
- Posts: 13
- Joined: Mon Feb 16, 2009 7:39 pm
Well, I wasn't really aiming for speed here, just that the overhead of putting everything into an array along the way seemed unnecessary. Out of curiosity, why do you want the results to end up in an array rather than just keeping them in a variable? Seems like you could use the line number just as easily as the array key. I don't see the advantage, so I must be missing out on something.
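To illustrate the line-number alternative, here is a minimal sketch that leaves the list as plain return-delimited text and uses line chunks where an array key would otherwise go. The list is hard-coded for the example and the variable names are assumptions:

```livecode
on mouseUp
   local tTwoWordList, tPhrase, tCount, tOut
   -- stand-in for the list built by the "repeat for each word" loop
   put "Mr. Dennis" & cr & "Dennis went" & cr & "went to" into tTwoWordList
   -- direct access by position, no array needed
   answer line 2 of tTwoWordList
   -- counting the phrases
   answer the number of lines in tTwoWordList
   -- walking the phrases in order
   put 0 into tCount
   repeat for each line tPhrase in tTwoWordList
      add 1 to tCount
      put tCount & ": " & tPhrase & cr after tOut
   end repeat
   answer tOut
end mouseUp
```

Here `line 2 of tTwoWordList` plays the role that `tTwoWordList[2]` would play after a split.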