tokenize string and parsing

tasdvl9 · Post by **tasdvl9** » Thu Mar 13, 2014 8:30 pm

Hi All,

I'm trying to figure out the best way to parse a string which may contain something like this:

this: ....... word

I'd like to retrieve "word" (minus the quotes) and store it.
I could use put line and put char, but I would rather tokenize the string somehow,
because "word" may change to some other text that is longer or shorter.
For instance, I may have something like this:

this: ................ livecode

Perhaps retrieving the text after the space after the last dot would work but I'm still in a quandary as
to how to implement this.

Thanks!

dunbarx · Post by **dunbarx** » Thu Mar 13, 2014 9:00 pm

Hi.

Have you looked up (and experimented with) the "offset" and "wordOffset" functions in the dictionary? If you are trying to extract string fragments within text, it is a little more involved, especially with multiple possible matches, but still an old and well understood process.
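
For illustration, here is a minimal sketch of the offset approach, assuming the text you want is the first word after the last dot on the line. The function name lastWordAfterDots and its variable names are hypothetical, not something from the dictionary.

```livecode
function lastWordAfterDots pLine
   local tSkip, tPos
   put 0 into tSkip
   repeat forever
      -- offset() returns a position relative to the skipped chars, or 0 if nothing is found
      put offset(".", pLine, tSkip) into tPos
      if tPos is 0 then exit repeat
      add tPos to tSkip
   end repeat
   -- tSkip now holds the position of the last dot, or 0 if the line has no dots
   if tSkip is 0 then return empty
   return word 1 of (char (tSkip + 1) to -1 of pLine)
end lastWordAfterDots
```

Calling it as put lastWordAfterDots("this: ....... word") into theLastWord should work whether or not there is a space between the last dot and the word, since word chunks ignore leading whitespace.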

Please write back with an actual example. The snippet you sent is ambiguous. In any case, this is very straightforward.

Craig Newman

tasdvl9 · Post by **tasdvl9** » Thu Mar 13, 2014 10:12 pm

Thanks, Craig.

I'll look into those functions in the dictionary. I appreciate the response.

Thierry · Post by **Thierry** » Fri Mar 14, 2014 7:03 am

Hi,

Regular expressions can be helpful here... if you like them

   -- some data:
   put "this: ................ livecode" &cr into T
   put " ............            aword" &cr after T
   put "this won't .... work zz" &cr after T

RX is the regular expression, which you can read as:
capture the last word of a line if there are some dots,
then some spaces, before it.

One way, using matchText():

on mouseUp
   -- T is assumed to hold the sample data built above
   local x, RX, theLastWord

   put "\.+\s+(\w+)$" into RX
   repeat for each line aLine in T
      if matchText(aLine, RX, theLastWord) then
         put "get1: " & theLastWord & cr after x
      end if
   end repeat
   put x
end mouseUp

Another one, using matchChunk():

on mouseUp
   -- T is assumed to hold the sample data built above; this loop consumes it
   local x, RX, p1Start, p1End
   -- same regex as above, except (?m) lets $ match at the end of each line of a multi-line string
   put "(?m)\.+\s+(\w+)$" into RX
   repeat while matchChunk(T, RX, p1Start, p1End)
      put "get2: " & char p1Start to p1End of T & cr after x
      delete char 1 to p1End of T
   end repeat
   put x
end mouseUp

HTH.

jiml · Post by **jiml** » Sun Mar 16, 2014 6:40 pm

this: ....... word

I'd like to retrieve "word"(minus the quotes) and store it.
... "word" may change to some other text in which the character length is
more or less. For instance I may have something like this:

this: ................ livecode

Your examples have a space before the last word, so
put word -1 of "this: ................ live code" into theLastWord
theLastWord now equals "livecode"

If there is no space between the last period and the word,
this: ................livecode
try
function getLastWord myString
   replace "." with space in myString
   return word -1 of myString
end getLastWord

put getLastWord("this: ................livecode") into theLastWord
theLastWord now equals "livecode"

jiml · Post by **jiml** » Sun Mar 16, 2014 6:43 pm

Your examples have a space before the last word, so
put word -1 of "this: ................ live code" into theLastWord
theLastWord now equals "livecode"

Autocorrect incorrectly changed that post.
It should read:

Your examples have a space before the last word, so
put word -1 of "this: ................ livecode" into theLastWord
theLastWord now equals "livecode"
