tokenize string and parsing

Posted: Thu Mar 13, 2014 8:30 pm
by tasdvl9
Hi All,

I'm trying to figure out the best way to parse a string which may contain something like this:

this: ....... word

I'd like to retrieve "word" (minus the quotes) and store it.
I could use the line and char chunks, but would rather tokenize the string somehow.
The reason is that "word" may change to some other text that is longer or shorter.
For instance, I may have something like this:

this: ................ livecode

Perhaps retrieving the text after the space after the last dot would work but I'm still in a quandary as
to how to implement this.

Thanks!

Re: tokenize string and parsing

Posted: Thu Mar 13, 2014 9:00 pm
by dunbarx
Hi.

Have you looked up (and experimented with) the "offset" and "wordOffset" functions in the dictionary? If you are trying to extract string fragments within text, it is a little more involved, especially with multiple possible matches, but still an old and well understood process.

Please write back with an actual example. The snippet you sent is ambiguous. In any case, this is very straightforward.

Craig Newman
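
As a rough illustration of the offset() route suggested above, here is a minimal sketch. It assumes the wanted text is the first word after the last dot on the line; the handler name wordAfterLastDot and its variable names are made up for this example and do not come from the thread.

```livecode
-- Hypothetical helper: returns the first word after the last "." in pLine,
-- or empty if pLine contains no dot at all.
function wordAfterLastDot pLine
   local tLastDot, tFound, tTail
   put 0 into tLastDot
   repeat forever
      -- offset() skips the first tLastDot chars and returns a position
      -- counted from that point, or 0 when no further dot exists
      put offset(".", pLine, tLastDot) into tFound
      if tFound is 0 then exit repeat
      add tFound to tLastDot
   end repeat
   if tLastDot is 0 then return empty
   -- everything after the last dot; "word 1" ignores the leading spaces
   put char (tLastDot + 1) to -1 of pLine into tTail
   return word 1 of tTail
end wordAfterLastDot
```

Calling wordAfterLastDot("this: ................ livecode") should then give back livecode, whether or not a space follows the dots.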

Re: tokenize string and parsing

Posted: Thu Mar 13, 2014 10:12 pm
by tasdvl9
Thanks, Craig.

I'll look into those functions in the dictionary. I appreciate the response.

Re: tokenize string and parsing

Posted: Fri Mar 14, 2014 7:03 am
by Thierry
Hi,

Regular expressions can be helpful here... if you like them :)

Code:

   -- some data:
   put "this: ................ livecode" &cr into T
   put " ............            aword" &cr after T
   put "this won't .... work zz" &cr after T

RX is the regular expression used below, which you can read as:
capture the last word of a line if it is preceded by one or more dots
and then one or more spaces.


One way, using matchText():

Code:

on mouseUp
   local x,RX
   -- T is assumed to already hold the sample data built above

   put "\.+\s+(\w+)$" into RX
   repeat for each line aLine in T
      -- theLastWord receives the text captured by (\w+)
      if matchText( aLine, RX, theLastWord) then
         put "get1: " & theLastWord &cr after x
      end if
   end repeat
   put x
end mouseUp

Another one, using matchChunk():

Code:

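on mouseUp
   local x,RX
   -- same regex as above, except that (?m) lets $ match at the end of
   -- every line of a multi-line string
   put "(?m)\.+\s+(\w+)$" into RX
   -- p1Start and p1End receive the char positions of the captured word
   repeat while matchChunk( T, RX, p1Start,p1End)
      put "get2: " & char p1Start to p1End of T &cr after x
      -- drop everything up to the match so the next pass finds the following one
      delete char 1 to p1End of T
   end repeat
   put x
end mouseUp

HTH.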
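
The two handlers above read T, so the sample data has to be in the same scope for them to find anything. The sketch below packs the matchText() variant into one self-contained piece. The function name lastWordsAfterDots and the t/p-prefixed variable names are assumptions made for this illustration only.

```livecode
-- Hypothetical wrapper around the matchText() approach: returns one
-- captured word per matching line of pText, separated by returns.
function lastWordsAfterDots pText
   local tRX, tWord, tResult
   -- one or more dots, one or more whitespace chars, then the final word
   put "\.+\s+(\w+)$" into tRX
   repeat for each line tLine in pText
      if matchText(tLine, tRX, tWord) then
         put tWord & cr after tResult
      end if
   end repeat
   delete char -1 of tResult  -- remove the trailing return, if any
   return tResult
end lastWordsAfterDots

on mouseUp
   local tData
   -- same sample data as above, built inside the handler that uses it
   put "this: ................ livecode" & cr into tData
   put " ............            aword" & cr after tData
   put "this won't .... work zz" & cr after tData
   put lastWordsAfterDots(tData)
end mouseUp
```

With this data the first two lines should match and the third should not, because "work" is followed by " zz" rather than the end of the line.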

Re: tokenize string and parsing

Posted: Sun Mar 16, 2014 6:40 pm
by jiml
this: ....... word

I'd like to retrieve "word"(minus the quotes) and store it.
... "word" may change to some other text in which the character length is
more or less. For instance I may have something like this:

this: ................ livecode

Your examples have a space before the last word, so
put word -1 of "this: ................ live code" into theLastWord
theLastWord now equals "livecode"


If there is no space between the last period and the word,
this: ................livecode
try
function getLastWord myString
   replace "." with space in myString
   return word -1 of myString
end getLastWord

put getLastWord("this: ................livecode") into theLastWord
theLastWord now equals "livecode"
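
If the text holds several such lines, the same idea can be applied line by line. This is only a sketch: the function name lastWordsOf and its variables are invented here, and it assumes the wanted word never contains a dot itself (something like "file.txt" would be split).

```livecode
-- Hypothetical helper: returns the last word of every non-empty line
-- of pText, treating dots as if they were spaces.
function lastWordsOf pText
   local tCopy, tResult
   repeat for each line tLine in pText
      if tLine is empty then next repeat
      -- work on a copy rather than changing the loop variable
      put tLine into tCopy
      replace "." with space in tCopy
      put word -1 of tCopy & cr after tResult
   end repeat
   delete char -1 of tResult  -- remove the trailing return, if any
   return tResult
end lastWordsOf
```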

Re: tokenize string and parsing

Posted: Sun Mar 16, 2014 6:43 pm
by jiml
Your examples have a space before the last word, so
put word -1 of "this: ................ live code" into theLastWord
theLastWord now equals "livecode"

Autocorrect incorrectly changed that post.
It should read:

Your examples have a space before the last word, so
put word -1 of "this: ................ livecode" into theLastWord
theLastWord now equals "livecode"