tokenize string and parsing
Hi All,
I'm trying to figure out the best way to parse a string which may contain something like this:
this: ....... word
I'd like to retrieve "word" (minus the quotes) and store it.
I could use put line and put char, but I would rather tokenize the string somehow,
because "word" may change to some other text that is longer or shorter.
For instance, I may have something like this:
this: ................ livecode
Perhaps retrieving the text after the space after the last dot would work but I'm still in a quandary as
to how to implement this.
Thanks!
Re: tokenize string and parsing
Hi.
Have you looked up (and experimented with) the "offset" and "wordOffset" functions in the dictionary? If you are trying to extract string fragments within text, it is a little more involved, especially with multiple possible matches, but still an old and well understood process.
Please write back with an actual example. The snippet you sent is ambiguous. In any case, this is very straightforward.
Craig Newman
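As a rough illustration of the offset() suggestion above, here is a minimal sketch that finds the last dot in a line and returns whatever follows it, trimmed of surrounding whitespace. The handler name textAfterLastDot and its variable names are made up for this example and do not come from the thread; it also assumes the text of interest always follows the final "." on the line.

```livecode
-- Hypothetical helper: returns the text after the last "." in pLine.
-- If pLine contains no dot, the whole line is returned (trimmed).
function textAfterLastDot pLine
   local tSkip, tPos
   put 0 into tSkip
   repeat forever
      -- offset() returns a position relative to the skipped chars, or 0 if not found
      put offset(".", pLine, tSkip) into tPos
      if tPos is 0 then exit repeat
      add tPos to tSkip
   end repeat
   -- tSkip now holds the position of the last dot (0 if there was none);
   -- "word 1 to -1" strips leading and trailing whitespace
   return word 1 to -1 of (char (tSkip + 1) to -1 of pLine)
end textAfterLastDot
```

Called as put textAfterLastDot("this: ................ livecode") into theLastWord, this leaves "livecode" in theLastWord regardless of how many dots or spaces precede it.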
Re: tokenize string and parsing
Thanks, Craig.
I'll look into those functions in the dictionary. I appreciate the response.
Re: tokenize string and parsing
Hi,
Regular expressions can be helpful here... if you like them.

Code:
-- some data:
put "this: ................ livecode" &cr into T
put " ............ aword" &cr after T
put "this won't .... work zz" &cr after T

RX is the regular expression, which you can read as:
capture the last word of a line if there are some dots
then some spaces before it.
One way, using matchText():
Code:
on mouseUp
   local x,RX
   -- T must hold the sample data above (build it here or in a script-local variable)
   put "\.+\s+(\w+)$" into RX
   repeat for each line aLine in T
      if matchText( aLine, RX, theLastWord) then
         put "get1: " & theLastWord &cr after x
      end if
   end repeat
   put x
end mouseUp
Another one, using matchChunk():
Code:
on mouseUp
   local x,RX
   -- T must hold the sample data above; note that this loop consumes T as it goes
   -- same regex as above, except that we work on a multi-line string
   put "(?m)\.+\s+(\w+)$" into RX
   repeat while matchChunk( T, RX, p1Start, p1End)
      put "get2: " & char p1Start to p1End of T &cr after x
      delete char 1 to p1End of T
   end repeat
   put x
end mouseUp
HTH.
Last edited by Thierry on Fri Sep 12, 2014 8:07 am, edited 1 time in total.
SUNNY-TDZ.COM doesn't belong to me since 2021.
To contact me, use the Private messages. Merci.
Re: tokenize string and parsing
Quote:
this: ....... word
I'd like to retrieve "word"(minus the quotes) and store it.
... "word" may change to some other text in which the character length is
more or less. For instance I may have something like this:
this: ................ livecode

Your examples have a space before the last word, so
put word -1 of "this: ................ live code" into theLastWord
theLastWord now equals "livecode"
If there is no space between the last period and the word,
this: ................livecode
try
function getLastWord myString
   replace "." with space in myString
   return word -1 of myString
end getLastWord
put getLastWord("this: ................livecode") into theLastWord
theLastWord now equals "livecode"
Re: tokenize string and parsing
Autocorrect incorrectly changed that post.
Quote:
Your examples have a space before the last word, so
put word -1 of "this: ................ live code" into theLastWord
theLastWord now equals "livecode"
It should read:
Your examples have a space before the last word, so
put word -1 of "this: ................ livecode" into theLastWord
theLastWord now equals "livecode"