tokenize string and parsing

tasdvl9
Posts: 94
Joined: Fri Dec 06, 2013 3:55 am

tokenize string and parsing

Post by tasdvl9 » Thu Mar 13, 2014 8:30 pm

Hi All,

I'm trying to figure out the best way to parse a string which may contain something like this:

this: ....... word

I'd like to retrieve "word" (minus the quotes) and store it.
I could use put line and put char, but would rather tokenize the string somehow.
The reason is that "word" may change to some other text that is longer or shorter.
For instance, I may have something like this:

this: ................ livecode

Perhaps retrieving the text after the space that follows the last dot would work, but I'm still
in a quandary as to how to implement this.

Thanks!

dunbarx
VIP Livecode Opensource Backer
Posts: 10333
Joined: Wed May 06, 2009 2:28 pm

Re: tokenize string and parsing

Post by dunbarx » Thu Mar 13, 2014 9:00 pm

Hi.

Have you looked up (and experimented with) the "offset" and "wordOffset" functions in the dictionary? If you are trying to extract string fragments within text, it is a little more involved, especially with multiple possible matches, but still an old and well understood process.
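As a rough illustration of the offset() approach, here is a minimal sketch that walks forward to the last dot in a line and returns whatever follows it. The handler name textAfterLastDot and its variable names are made up for this example and are not from the dictionary or the thread; it assumes the string to search is a single line.

```livecode
-- Hypothetical helper (illustrative only): returns the text that follows
-- the last "." in pLine, with surrounding whitespace trimmed.
function textAfterLastDot pLine
   local tPos, tSkip
   put 0 into tSkip
   repeat
      -- offset() reports the position relative to the skipped characters,
      -- or 0 when no further dot is found
      put offset(".", pLine, tSkip) into tPos
      if tPos is 0 then exit repeat
      add tPos to tSkip
   end repeat
   -- tSkip now holds the absolute position of the last dot (0 if none)
   if tSkip is 0 then return empty
   -- "word 1 to -1" strips leading and trailing whitespace
   return word 1 to -1 of (char (tSkip + 1) to -1 of pLine)
end textAfterLastDot
```

Calling textAfterLastDot on each line of the data would then be one way to collect the trailing text, whatever its length.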

Please write back with an actual example. The snippet you sent is ambiguous. In any case, this is very straightforward.

Craig Newman

tasdvl9
Posts: 94
Joined: Fri Dec 06, 2013 3:55 am

Re: tokenize string and parsing

Post by tasdvl9 » Thu Mar 13, 2014 10:12 pm

Thanks, Craig.

I'll look into those functions in the dictionary. I appreciate the response.

Thierry
VIP Livecode Opensource Backer
Posts: 875
Joined: Wed Nov 22, 2006 3:42 pm

Re: tokenize string and parsing

Post by Thierry » Fri Mar 14, 2014 7:03 am

Hi,

Regular expressions can be helpful here... if you like them :)

Code:

   -- some data:
   put "this: ................ livecode" & cr into T
   put " ............            aword" & cr after T
   put "this won't .... work zz" & cr after T

RX is the regular expression, which you can read as:
capture the last word of a line if there are some dots
and then some spaces before it.


One way, using matchText():

Code:

on mouseUp
   -- T is assumed to be a script-local variable holding the sample data above
   local x, RX, theLastWord

   put "\.+\s+(\w+)$" into RX
   repeat for each line aLine in T
      if matchText(aLine, RX, theLastWord) then
         put "get1: " & theLastWord & cr after x
      end if
   end repeat
   put x
end mouseUp

Another one, using matchChunk():

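Code:

on mouseUp
   -- T is assumed to be a script-local variable holding the sample data above
   local x, RX, p1Start, p1End
   -- same regex as above, except (?m) lets $ match at the end of each line
   -- of a multi-line string
   put "(?m)\.+\s+(\w+)$" into RX
   repeat while matchChunk(T, RX, p1Start, p1End)
      put "get2: " & char p1Start to p1End of T & cr after x
      -- drop everything up to the match so the next pass finds the following one
      delete char 1 to p1End of T
   end repeat
   put x
end mouseUp

HTH.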
Last edited by Thierry on Fri Sep 12, 2014 8:07 am, edited 1 time in total.

jiml
Posts: 339
Joined: Sat Dec 09, 2006 1:27 am

Re: tokenize string and parsing

Post by jiml » Sun Mar 16, 2014 6:40 pm

this: ....... word

I'd like to retrieve "word"(minus the quotes) and store it.
... "word" may change to some other text in which the character length is
more or less. For instance I may have something like this:

this: ................ livecode

Your examples have a space before the last word, so
put word -1 of "this: ................ live code" into theLastWord
theLastWord now equals "livecode"


If there is no space between the last period and the word,
this: ................livecode
try

function getLastWord myString
   replace "." with space in myString
   return word -1 of myString
end getLastWord

put getLastWord("this: ................livecode") into theLastWord
theLastWord now equals "livecode"
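
If the data spans several lines, the same function could be applied line by line. The following is only a sketch: lastWordsOf and its variable names are invented for illustration, and getLastWord is repeated from above so the snippet stands on its own.

```livecode
-- Repeated from the post above so this snippet is self-contained
function getLastWord myString
   replace "." with space in myString
   return word -1 of myString
end getLastWord

-- Hypothetical wrapper (illustrative only): returns the last word of
-- every non-empty line of pText, one per line.
function lastWordsOf pText
   local tResult
   repeat for each line tLine in pText
      if tLine is empty then next repeat
      put getLastWord(tLine) & cr after tResult
   end repeat
   -- remove the trailing return added by the loop
   delete char -1 of tResult
   return tResult
end lastWordsOf
```

Note that this takes the last word of every line, including lines that contain no dots at all, so it is less selective than a pattern-based approach.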

jiml
Posts: 339
Joined: Sat Dec 09, 2006 1:27 am

Re: tokenize string and parsing

Post by jiml » Sun Mar 16, 2014 6:43 pm

Your examples have a space before the last word, so
put word -1 of "this: ................ live code" into theLastWord
theLastWord now equals "livecode"

Autocorrect incorrectly changed that post.
It should read:

Your examples have a space before the last word, so
put word -1 of "this: ................ livecode" into theLastWord
theLastWord now equals "livecode"
