tokenize string and parsing
Posted: Thu Mar 13, 2014 8:30 pm
by tasdvl9
Hi All,
I'm trying to figure out the best way to parse a string which may contain something like this:
this: ....... word
I'd like to retrieve "word" (minus the quotes) and store it.
I could use put line and put char but would rather somehow tokenize the string.
The reason is that "word" may change to some other text that is longer or shorter.
For instance, I may have something like this:
this: ................ livecode
Perhaps retrieving the text after the space after the last dot would work but I'm still in a quandary as
to how to implement this.
Thanks!
Re: tokenize string and parsing
Posted: Thu Mar 13, 2014 9:00 pm
by dunbarx
Hi.
Have you looked up (and experimented with) the "offset" and "wordOffset" functions in the dictionary? If you are trying to extract string fragments within text, it is a little more involved, especially with multiple possible matches, but still an old and well understood process.
Please write back with an actual example. The snippet you sent is ambiguous. In any case, this is very straightforward.
Craig Newman
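A minimal sketch of the offset() approach suggested above, assuming the wanted text is whatever follows the last dot on the line. The handler name textAfterLastDot and its variable names are hypothetical and not from the thread:

```livecode
function textAfterLastDot pLine
   local tSkip, tPos
   put 0 into tSkip
   repeat forever
      -- offset() returns the position of the next "." counted from
      -- the skip point, or 0 when no further dot is found
      put offset(".", pLine, tSkip) into tPos
      if tPos is 0 then exit repeat
      add tPos to tSkip
   end repeat
   -- tSkip is now the position of the last dot (0 if the line has none);
   -- "word 1 to -1" trims the surrounding whitespace
   return word 1 to -1 of (char (tSkip + 1) to -1 of pLine)
end textAfterLastDot
```

It could be called as: put textAfterLastDot("this: ................ livecode") into theLastWord. If the line contains no dot at all, this sketch returns the whole line trimmed, which may or may not be what you want.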
Re: tokenize string and parsing
Posted: Thu Mar 13, 2014 10:12 pm
by tasdvl9
Thanks, Craig.
I'll look into those functions in the dictionary. I appreciate the response.
Re: tokenize string and parsing
Posted: Fri Mar 14, 2014 7:03 am
by Thierry
Hi,
Regular expressions can be helpful here... if you like them
Code:
-- some data:
put "this: ................ livecode" &cr into T
put " ............ aword" &cr after T
put "this won't .... work zz" &cr after T
RX is the regular expression, which you can read as:
capture the last word of a line if it is preceded by some dots
and then some whitespace.
One way, using matchText():
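Code:
on mouseUp
   local x, RX, theLastWord
   -- T is assumed to hold the sample data built above
   put "\.+\s+(\w+)$" into RX
   repeat for each line aLine in T
      -- matchText() puts the captured group into theLastWord
      if matchText(aLine, RX, theLastWord) then
         put "get1: " & theLastWord & cr after x
      end if
   end repeat
   put x
end mouseUp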
Another one, using matchChunk():
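Code:
on mouseUp
   local x, RX, p1Start, p1End
   -- same regex as above, except that (?m) lets $ match at the end of
   -- each line of a multi-line string; T is assumed to hold the sample data
   put "(?m)\.+\s+(\w+)$" into RX
   repeat while matchChunk(T, RX, p1Start, p1End)
      put "get2: " & char p1Start to p1End of T & cr after x
      -- drop everything up to the match so the next pass finds the next one
      delete char 1 to p1End of T
   end repeat
   put x
end mouseUp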
HTH.
Re: tokenize string and parsing
Posted: Sun Mar 16, 2014 6:40 pm
by jiml
this: ....... word
I'd like to retrieve "word"(minus the quotes) and store it.
... "word" may change to some other text in which the character length is
more or less. For instance I may have something like this:
this: ................ livecode
Your examples have a space before the last word, so
put word -1 of "this: ................ live code" into theLastWord
theLastWord now equals "livecode"
If there is no space between the last period and the word,
this: ................livecode
try
function getLastWord myString
   -- turn every dot into a space so the last word stands alone
   replace "." with space in myString
   return word -1 of myString
end getLastWord
put getLastWord("this: ................livecode") into theLastWord
theLastWord now equals "livecode"
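A related sketch, assuming the wanted text always follows the last dot: treat the dot as the item delimiter and take the last item. The handler name getTextAfterDots is hypothetical and not from the thread:

```livecode
function getTextAfterDots pString
   -- the itemDelimiter setting is local to this handler
   set the itemDelimiter to "."
   -- item -1 is whatever follows the last dot;
   -- "word 1 to -1" trims the surrounding whitespace
   return word 1 to -1 of item -1 of pString
end getTextAfterDots
```

Unlike word -1, this keeps everything after the last dot, so a trailing phrase of two or more words would come back whole rather than as its final word only.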
Re: tokenize string and parsing
Posted: Sun Mar 16, 2014 6:43 pm
by jiml
Your examples have a space before the last word, so
put word -1 of "this: ................ live code" into theLastWord
theLastWord now equals "livecode"
Autocorrect incorrectly changed that post.
It should read:
Your examples have a space before the last word, so
put word -1 of "this: ................ livecode" into theLastWord
theLastWord now equals "livecode"