Not sure if you managed to solve this, but I'm facing the same problem and I think I've found a solution. I'm hoping you (or anybody) can help validate it.
It seems that revBrowser and its ability to get the source HTML is spotty at best.
I found that sometimes it worked and sometimes it didn't, and subsequent calls to the same instance return different answers!
So instead, since I don't need to actually render the HTML, I'm using libURLDownloadToFile to retrieve the HTML directly from the server into a local file, which I then open, read and process.
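For anyone following along, the download step might look roughly like the sketch below. The handler names fetchPage and pageDownloaded, the temporary file name and the field "htmlSource" are my own placeholders, and I'm assuming the callback receives the URL and its status as the LiveCode dictionary describes for libURLDownloadToFile.

```livecode
local sDownloadPath

on fetchPage pURL
   -- Hypothetical location for the downloaded copy of the page
   put specialFolderPath("temporary") & "/scrape.html" into sDownloadPath
   -- Asynchronous: "pageDownloaded" is sent to this object when the transfer ends
   libURLDownloadToFile pURL, sDownloadPath, "pageDownloaded"
end fetchPage

on pageDownloaded pURL, pStatus
   -- Assumed status values; anything else is treated as success here
   if pStatus is "error" or pStatus is "timeout" then
      answer "Could not download" && pURL & ":" && pStatus
      exit pageDownloaded
   end if
   -- Read the saved file back in as text
   put URL ("file:" & sDownloadPath) into field "htmlSource"
end pageDownloaded
```

Because the download is asynchronous, the parsing has to start from the callback (or later), not straight after the libURLDownloadToFile line.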
In my process, I split the HTML by ">". Then, for each item that is an anchor tag ("<A" or "<a"), I split that item by spaces to separate the attributes, and split the href attribute by quotes to separate its name from its value:
Code:
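   put field "htmlSource" into theElements
   
   split theElements by ">"
   -- Note: array elements may not be visited in page order
   repeat for each element thisElement in theElements
      -- Comparisons ignore case by default, so "<a " also matches "<A ".
      -- Text before the tag lands in the same element, so search for the
      -- tag instead of testing only the first two characters.
      put offset("<a ", thisElement) into tagStart
      if tagStart > 0 then
         put char tagStart to -1 of thisElement into theTag
         split theTag by space
         repeat for each element thisAttribute in theTag
            if thisAttribute begins with "href=" then
               -- Assumes the value is wrapped in double quotes
               split thisAttribute by quote
               put thisAttribute[2] & return after theURLS
            end if
         end repeat
      end if
   end repeat
   
   put theURLS into field "listURLS"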
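Splitting on spaces breaks down when a URL contains a space or when the value is in single quotes, so here is an alternative I've been sketching that scans for href= with offset() instead. The function name extractHrefs is made up, and it is only a rough sketch: it picks up every href in the page (including ones on tags like <link>), and it assumes there is no space around the "=" and that the value is quoted.

```livecode
function extractHrefs pHTML
   local tURLs, tSkip, tStart, tQuoteChar, tEnd
   put 0 into tSkip
   repeat forever
      -- offset() counts from just after the skipped chars; 0 means not found
      put offset("href=", pHTML, tSkip) into tStart
      if tStart is 0 then exit repeat
      -- Move tSkip forward so it points at the "=" character
      add tStart + 4 to tSkip
      put char (tSkip + 1) of pHTML into tQuoteChar
      -- Skip unquoted values rather than guess where they end
      if tQuoteChar is not quote and tQuoteChar is not "'" then next repeat
      -- Find the matching closing quote after the opening one
      put offset(tQuoteChar, pHTML, tSkip + 1) into tEnd
      if tEnd is 0 then exit repeat
      -- tEnd > 1 means there is at least one character between the quotes
      if tEnd > 1 then
         put char (tSkip + 2) to (tSkip + tEnd) of pHTML & return after tURLs
      end if
      -- Continue the search from the closing quote
      add tEnd + 1 to tSkip
   end repeat
   return tURLs
end extractHrefs
```

Usage would be something like: put extractHrefs(field "htmlSource") into field "listURLS"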
Of course, many of the HREFs are relative, like "/News" and "/Comments", etc.
So I'm adding the root URL in front to build out the entire URL:
Code:
on getMyURLs
   
   put field "tbURL" into rootURL
   
   --Remove any trailing slash
   if rootURL ends with "/"  then
      delete the last char of rootURL
   end if
   
   set the  itemdelimiter to return
   repeat for each item thisURL in field "listURLS"
      
      --If it's already formatted, and we've not been there, grab it.
      if thisURL begins with rootURL then
         if thisURL is not among the items of field "listMyCollectedURLS" then
            put thisURL & return after field "listMyCollectedURLS"
         end if
      end if
      
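      --If it's relative to the site root, fix it; if we've not been there, grab it.
      --Links starting with "//" point at another host, so leave those alone.
      if thisURL begins with "/" and not (thisURL begins with "//") then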
         put rootURL & thisURL into newURL
         if newURL is not among the items of field "listMyCollectedURLS" then
            put newURL & return after field "listMyCollectedURLS"
         end if         
      end if    
      
      if thisURL begins with "./"  then
         put rootURL & "/" & thisURL into newURL
         if newURL is not among the items of field "listMyCollectedURLS" then
            put newURL & return after field "listMyCollectedURLS"
         end if         
      end if    
      
   end repeat
end getMyURLs
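The three branches above all do the same "check, then append" work, so they could probably be folded into one helper. Here's a rough sketch of that idea. The names absoluteURL and collectURLs are placeholders of mine, the field names are the ones from my stack, and it deliberately ignores anything it doesn't recognise (other hosts, "../" paths, bare file names).

```livecode
function absoluteURL pRootURL, pLink
   -- Normalise the root so it never ends in a slash
   if pRootURL ends with "/" then delete the last char of pRootURL
   -- Already a full URL on this site
   if pLink begins with pRootURL then return pLink
   -- "//host/path" points at another host; skipped in this sketch
   if pLink begins with "//" then return empty
   -- Relative to the site root
   if pLink begins with "/" then return pRootURL & pLink
   -- "./page" becomes root & "/page"
   if pLink begins with "./" then return pRootURL & (char 2 to -1 of pLink)
   -- Anything else is ignored
   return empty
end absoluteURL

on collectURLs
   local tCollected, tURL
   -- Work on a variable and write the field once at the end
   put field "listMyCollectedURLS" into tCollected
   repeat for each line tLink in field "listURLS"
      put absoluteURL(field "tbURL", tLink) into tURL
      if tURL is not empty and tURL is not among the lines of tCollected then
         put tURL & return after tCollected
      end if
   end repeat
   put tCollected into field "listMyCollectedURLS"
end collectURLs
```

Working on a variable instead of the field should also be noticeably faster on pages with a lot of links, since the field is only read and written once.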
So far this seems to be working. I'm now working on automating the process and then validating it against more web sites.
Of course, this won't help if links are generated by anything other than plain HTML (by JavaScript, for example), so some sites may be scraped incompletely, but I doubt it's possible to handle ALL of those cases with any technology...
I'm VERY new to LiveCode, so I hope this helps, and if anybody has suggestions for improvements, I'm more than interested in hearing them.
Thanks!
...Jeff