Reverse Offset

Posted: Thu Sep 04, 2008 9:45 pm
by Mikey
So, is this better than using Bugzilla?

Anyway, I'd like to use offset() to search backwards.

In this case, I have a dynamic HTML document, generated by a web server that I'm using RR to interface with. I know what the link text is going to be, but not the URL behind it (as is the norm for dynamic, database-driven sites). So I'd like to use offset() going forward to find the text of the link, then offset() backward to find the beginning of the tag so I can follow it.

So, for example, the page might be:

<html>
<head>
blah
</head>
<body>
yadda
blah
<a href="2938jfsliusleifu2398she82h3" class="gum">Trick or Treat</a>
<a href="23908hjadp9uasdlkfhj230js" class="tree">Smelly Feet</a>
<a href="209jsalasop93jfli32wel9fj39" class="rooster">Gimme Something Good to eat</a>
</body>
</html>

If I want to follow the "Smelly Feet" link, I either have to ASSUME that the 2nd link on the page is the one I want (and hope that doesn't change), or (more reliably, IMHO) get the offset of "Smelly Feet" and search backward from there to the start of the link tag.
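Since offset() only searches forward, one way to get a "reverse offset" is to scan backward by hand from the point where the link text was found. Below is a minimal sketch, assuming a hypothetical helper named lastOffset (not a built-in function) and a page like the sample above, where each link tag starts with "<a " (the letter a followed by a space):

```livecode
-- Hypothetical helper (not a built-in): returns the start position of the
-- last occurrence of pNeedle that begins before char pBefore of pHaystack,
-- or 0 if there is none. Leave pBefore empty to search the whole string.
-- Uses "is", so the match is case-insensitive unless caseSensitive is true.
function lastOffset pNeedle, pHaystack, pBefore
  local tLen, tStart
  if pNeedle is empty then return 0
  if pBefore is empty then put (the number of chars in pHaystack) + 1 into pBefore
  put the number of chars in pNeedle into tLen
  -- walk backward one char at a time, comparing a needle-sized chunk
  repeat with tStart = pBefore - 1 down to 1
    if char tStart to (tStart + tLen - 1) of pHaystack is pNeedle then return tStart
  end repeat
  return 0
end lastOffset

on mouseUp
  local tHTML, tTextPos, tTagStart, tTagEnd, tTag
  put url "http://correct.url.here.please" into tHTML
  -- forward search for the link text we do know
  put offset("Smelly Feet", tHTML) into tTextPos
  if tTextPos = 0 then exit mouseUp
  -- backward search for the start of the enclosing <a> tag
  put lastOffset("<a ", tHTML, tTextPos) into tTagStart
  if tTagStart = 0 then exit mouseUp
  -- forward again to the ">" that closes the tag; with charsToSkip,
  -- offset() returns a position relative to tTagStart
  put offset(">", tHTML, tTagStart) into tTagEnd
  if tTagEnd = 0 then exit mouseUp
  put char tTagStart to (tTagStart + tTagEnd) of tHTML into tTag
  -- for the sample page above: <a href="23908hjadp9uasdlkfhj230js" class="tree">
  put tTag
end mouseUp
```

Once tTag holds the whole opening tag, the href can be pulled out of it with another forward offset(). The backward loop compares one position at a time, which would be slow on a huge page but should be fine for typical HTML.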

Granted, you can also solve this problem by just walking backward to the two quotes, but there are other tags I want to work with that aren't so simple, e.g. FORM tags, which have multiple components.
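For tags with several attributes, such as FORM, it may help to first isolate the whole tag text and then pull attributes out of it by name. A minimal sketch, assuming a hypothetical helper named tagAttribute, attribute values wrapped in double quotes, and a single space (not a tab or newline) before each attribute name:

```livecode
-- Hypothetical helper: returns the value of attribute pName (e.g. "action",
-- "method", "href") from a single tag string such as
-- <form action="go.php" method="post">, or empty if it isn't found.
function tagAttribute pTag, pName
  local tStart, tLen
  -- look for: space, name, =, opening quote
  put offset(" " & pName & "=" & quote, pTag) into tStart
  if tStart = 0 then return empty
  -- move past the space, the name, the "=" and the quote to the value's first char
  put tStart + (the number of chars in pName) + 3 into tStart
  -- position of the closing quote, relative to char tStart - 1
  put offset(quote, pTag, tStart - 1) into tLen
  if tLen = 0 then return empty
  return char tStart to (tStart + tLen - 2) of pTag
end tagAttribute
```

For example, with a FORM tag isolated into a variable, tagAttribute(tTag, "action") and tagAttribute(tTag, "method") would each return one component without depending on attribute order.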

Posted: Thu Sep 04, 2008 10:54 pm
by BvG
This is quite evil: you don't actually know the string you want (the URL), and the part you do know (the link text) comes after it. I'd probably do something like this (untested):

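on mouseUp
  put url "http://correct.url.here.please" into theData
  -- split the page at every "<", so each "line" is one tag plus the text after it
  set the lineDelimiter to "<"
  put "Smelly Feet" into theString
  put empty into theURL
  repeat for each line theLine in theData
    -- the link text sits right after the ">" that closes the opening <a> tag
    if char -(the number of chars in theString) to -1 of theLine = theString then
      -- first char of the URL: skip past href=" (6 chars)
      put offset("href", theLine) + 6 into firstChar
      -- with charsToSkip, offset() returns a position relative to firstChar
      put firstChar + offset(quote, theLine, firstChar) - 1 into lastChar
      put char firstChar to lastChar of theLine into theURL
      exit repeat
    end if
  end repeat
  put theURL
end mouseUp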

Posted: Fri Sep 05, 2008 6:58 am
by mwieder
Check out the matchText and matchChunk functions. Those are what I use for data-mining HTML pages. It's still a rather knotty problem: you need a fairly educated guess about the format of the text you're examining and a good idea of what sort of thing you're looking for.
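As an illustration of the matchText() approach for the "Smelly Feet" case, here is a minimal sketch. linkURL is a hypothetical helper name; the pattern assumes double-quoted href values and link text containing no regex metacharacters (characters like "(" or "." would need escaping), and the leading (?i) makes the match case-insensitive:

```livecode
-- Hypothetical helper: returns the href of the first <a> tag whose visible
-- text is pLinkText (surrounding whitespace allowed), or empty if none matches.
function linkURL pHTML, pLinkText
  local tRegex, tURL
  -- [^>]* keeps each part of the match inside a single tag, so the pattern
  -- can't start at one link and finish at a later one.
  -- LiveCode strings have no escape sequences, so \s reaches the regex as-is.
  put "(?i)<a\s[^>]*?href=" & quote & "([^" & quote & "]*)" & quote & \
        "[^>]*>\s*" & pLinkText & "\s*</a>" into tRegex
  -- matchText puts the text of the first (...) group into tURL
  if matchText(pHTML, tRegex, tURL) then return tURL
  return empty
end linkURL

on mouseUp
  put linkURL(url "http://correct.url.here.please", "Smelly Feet")
end mouseUp
```

matchChunk accepts the same pattern but returns the start and end char positions of each group instead of the text, which can be handier if you want to keep working from that spot with offset().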