getting page source of a dynamically created webpage

magice
Posts: 457
Joined: Wed Mar 18, 2009 12:57 am

Post by magice » Thu Feb 10, 2011 8:34 pm

OK, this is a hard one for me to explain.

I need to open a web page and search it for certain keywords. As an experimental start to the script, I wrote the following code:

Code: Select all

on mouseUp
   get  the date
   convert it to dateitems
   put it into tDate
   
   put the first item of tDate into tYear
   put the second item of tDate into tMonth
   put the third item of tDate into tDay
   
   do "get url "&quote&"http://guildportal.com/Guild.aspx?GuildID=130397&TabID=1110325&SelectDay="&tDay&"SelectMonth="&tMonth&"2&SelectYear="&tYear&quote
   put it into tSource
   answer tSource
   set the text of field "calSource" to tSource
   find string tKeyWord in field "calSource"
   answer the foundline
end mouseUp
I assumed that the data dumped into tSource would be the same as the source code you see if you right-click the web page and choose "View Page Source". However, because the web page is dynamically created, much of that source code is missing. Does anyone know any tricks for getting runrev to return the same source code that right-click > "View Page Source" gives you?

Klaus
Posts: 14194
Joined: Sat Apr 08, 2006 8:41 am

Post by Klaus » Thu Feb 10, 2011 9:44 pm

Hi magice,

dynamic or not, you should get the same HTML stuff as in a browser with "view source..."!
What is missing?

Hint: No need for "do" here:

Code: Select all

...
put "http://guildportal.com/Guild.aspx?GuildID=130397&TabID=1110325&SelectDay=" & tDay & "SelectMonth=" & tMonth & "2&SelectYear=" & tYear into tUrl
## There is a typo(?) in your string -> tMonth & "2&SelectYear..."
## What is the 2 before &SelectYear... for?
put url tUrl into tSource

## Check for possible problems
if the result <> empty then
   answer the result
   exit mouseup
end if
...
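Put together, a complete handler along these lines might look like the following sketch. It has not been run against the live site; it assumes the calendar expects SelectDay, SelectMonth and SelectYear as separate "&"-joined parameters, and because the original post never sets tKeyWord, the keyword is asked for at run time (tKeyword here is an illustrative local).

Code: Select all

on mouseUp
   local tDate, tUrl, tSource, tKeyword, tLine
   -- dateItems: year,month,day,hour,minute,second,dayOfWeek
   put the date into tDate
   convert tDate to dateItems
   
   -- build the URL directly; no "do" is needed
   put "http://guildportal.com/Guild.aspx?GuildID=130397&TabID=1110325" into tUrl
   put "&SelectDay=" & item 3 of tDate after tUrl
   put "&SelectMonth=" & item 2 of tDate after tUrl
   put "&SelectYear=" & item 1 of tDate after tUrl
   
   put url tUrl into tSource
   if the result is not empty then
      answer "Download failed: " & the result
      exit mouseUp
   end if
   
   -- assumption: the original never sets tKeyWord, so ask for it here
   ask "Keyword to search for:"
   if it is empty then exit mouseUp
   put it into tKeyword
   
   put tSource into field "calSource"
   put lineOffset(tKeyword, tSource) into tLine
   if tLine is 0 then
      answer "Not found:" && tKeyword
   else
      answer "Found on line" && tLine & ":" & return & line tLine of tSource
   end if
end mouseUp

lineOffset returns 0 when the keyword does not occur anywhere in the downloaded text.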
Best

Klaus

magice
Posts: 457
Joined: Wed Mar 18, 2009 12:57 am

Post by magice » Thu Feb 10, 2011 10:53 pm

That is what I thought, but quite a bit is actually missing. That web page is an event calendar with a list of people who have signed up for the day's event. When I view the source in the browser, that table is there; when Rev retrieves the code, it is not. Since you have confirmed that it should be the same, I suspect the problem is with logging in to the website. My browser has a cookie that makes the website remember me, so maybe Rev is getting a "guest" version instead.
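One way to test the "guest version" theory is to save exactly what Rev downloaded to an .html file and open that file in the default browser: if it shows a login form, the request was answered as a guest. A minimal sketch; showDownloadedPage and the file name calsource.html are hypothetical, and the saved page will probably look unstyled because its relative links to images and stylesheets no longer resolve.

Code: Select all

-- hypothetical helper: download pUrl and open the result in the default browser
on showDownloadedPage pUrl
   local tSource, tPath
   put url pUrl into tSource
   if the result is not empty then
      answer "Download failed: " & the result
      exit showDownloadedPage
   end if
   -- write exactly what libURL received to a local file
   put specialFolderPath("desktop") & "/calsource.html" into tPath
   put tSource into url ("file:" & tPath)
   -- open it in the system's default browser; a login form here
   -- means the request was not logged in
   launch url ("file:" & tPath)
end showDownloadedPage

Call it with the calendar URL, for example: showDownloadedPage tUrl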

mwieder
VIP Livecode Opensource Backer
Posts: 3581
Joined: Mon Jan 22, 2007 7:36 am

Post by mwieder » Fri Feb 11, 2011 12:49 am

Actually you still have typos in your url. There are missing ampersands before the url parameters. Try

Code: Select all

get url "http://guildportal.com/Guild.aspx?GuildID=130397&TabID=1110325&SelectDay=" & tDay & \
"&SelectMonth=" & tMonth & \
"&SelectYear=" & tYear

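Missing ampersands are easy to introduce when a query string is assembled by hand. One way to avoid them is a small helper that joins the parameters from an array. buildQuery is a hypothetical name; urlEncode is LiveCode's built-in encoder (harmless for plain numbers, but needed if a value ever contains spaces or "&"). Array key order is not guaranteed, which should not matter as long as the server does not depend on parameter order.

Code: Select all

function buildQuery pParams
   local tQuery, tKey
   repeat for each key tKey in pParams
      put "&" & urlEncode(tKey) & "=" & urlEncode(pParams[tKey]) after tQuery
   end repeat
   -- remove the leading "&" added by the first pass through the loop
   delete char 1 of tQuery
   return tQuery
end buildQuery

on mouseUp
   local tDate, tParams, tUrl
   put the date into tDate
   convert tDate to dateItems
   put 130397 into tParams["GuildID"]
   put 1110325 into tParams["TabID"]
   put item 3 of tDate into tParams["SelectDay"]
   put item 2 of tDate into tParams["SelectMonth"]
   put item 1 of tDate into tParams["SelectYear"]
   put "http://guildportal.com/Guild.aspx?" & buildQuery(tParams) into tUrl
   put url tUrl into field "calSource"
end mouseUp
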
magice
Posts: 457
Joined: Wed Mar 18, 2009 12:57 am

Post by magice » Fri Feb 11, 2011 1:06 am

The typos are only in the post; I introduced them when I started changing the URL for privacy reasons. The URL is written correctly in my stack. I have also removed the "do" as suggested. This may be one of those ideas I have to give up on... or at least put aside for a while.

mwieder
VIP Livecode Opensource Backer
Posts: 3581
Joined: Mon Jan 22, 2007 7:36 am

Post by mwieder » Fri Feb 11, 2011 1:31 am

Well then, I don't know what to say... I get the whole thing back when I put the right stuff in the url. Calendar info and all. I must be doing something wrong.

magice
Posts: 457
Joined: Wed Mar 18, 2009 12:57 am

Post by magice » Fri Feb 11, 2011 1:58 am

mwieder wrote:Well then, I don't know what to say... I get the whole thing back when I put the right stuff in the url. Calendar info and all. I must be doing something wrong.
Did you get a list of names? I copied your code and pasted it in, but I get the same results. I'm sure I'm doing something stupid, and I will see it when I look at it with fresh eyes tomorrow.

mwieder
VIP Livecode Opensource Backer
Posts: 3581
Joined: Mon Jan 22, 2007 7:36 am

Post by mwieder » Fri Feb 11, 2011 2:49 am

I seem to be picking dates when nobody's signed up. What's a good date to try?

magice
Posts: 457
Joined: Wed Mar 18, 2009 12:57 am

Post by magice » Fri Feb 11, 2011 3:25 am

mwieder wrote:I seem to be picking dates when nobody's signed up. What's a good date to try?
Today.

magice
Posts: 457
Joined: Wed Mar 18, 2009 12:57 am

Post by magice » Fri Feb 11, 2011 3:42 am

OK, I see what is going on. I took the source code that runrev returns, pasted it into a .txt document, renamed it to .html and opened it in my browser. It shows me the login screen. If I put the URL directly into the browser, it logs me in automatically and takes me to the calendar page. How to get runrev to log me in automatically is beyond me. I do have one idea, but it is a long shot: I use Firefox as my browser, and I read somewhere that runrev uses IE. If I log into the site using IE and save the auto-login cookie, maybe runrev would log in automatically? I probably could have tested that in the time it took to type this, but after 26 straight hours in front of a computer my brain isn't working that well.
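libURL (what "put url" uses) makes its own HTTP requests and, as far as I know, does not read the cookies stored by Firefox or IE, so the browser's "remember me" cookie is never sent. One thing worth trying is to copy that cookie's name and value from the browser and send it yourself through the httpHeaders property. This is a sketch under several assumptions: the site identifies you by that cookie alone, the cookie is still valid, and "CookieName=CookieValue" is a placeholder for the real pair.

Code: Select all

on mouseUp
   local tUrl, tSource, tError
   put "http://guildportal.com/Guild.aspx?GuildID=130397&TabID=1110325" into tUrl
   -- PLACEHOLDER: replace with the real name=value pair copied from the browser
   set the httpHeaders to "Cookie: CookieName=CookieValue"
   put url tUrl into tSource
   -- save the result before any other command can change it
   put the result into tError
   -- clear the custom header so later requests do not send it
   set the httpHeaders to empty
   if tError is not empty then
      answer "Download failed: " & tError
   else if tSource contains ("type=" & quote & "password" & quote) then
      -- crude heuristic (assumption): only the login page has a password field
      answer "Still getting the login page; the cookie was not accepted."
   else
      put tSource into field "calSource"
   end if
end mouseUp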

mwieder
VIP Livecode Opensource Backer
Posts: 3581
Joined: Mon Jan 22, 2007 7:36 am

Post by mwieder » Fri Feb 11, 2011 4:03 am

Must be a login thing, because I see no names for today in the browser either. Try adding your login info to the beginning of the url:

Code: Select all

http://user:password@guildportal.com...

magice wrote:I read somewhere that runrev uses IE.
That's the revBrowser component that uses IE under the hood on Windows, but it doesn't affect libURL calls.
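Credentials in the URL are only used for HTTP basic authentication. A site that logs you in through an HTML form (the login screen magice found suggests this one does) will most likely ignore them; you would have to post the form yourself and send back the session cookie the server sets. The sketch below shows only the general shape: the login URL and the field names "username" and "password" are hypothetical and must be read from the form in the login page's HTML, and ASP.NET forms usually also expect hidden fields such as __VIEWSTATE to be posted back. libURLLastRHHeaders() returns the headers of the most recent HTTP response; if the server answers the login with a redirect, check what those headers actually contain.

Code: Select all

on loginAndFetch
   local tLoginUrl, tPostData, tError, tHeaders, tLine, tCookies, tSource
   -- HYPOTHETICAL login URL and form field names
   put "http://guildportal.com/Login.aspx" into tLoginUrl
   put "username=" & urlEncode("myName") & "&password=" & urlEncode("myPassword") into tPostData
   post tPostData to url tLoginUrl
   put the result into tError
   if tError is not empty then
      answer "Login request failed: " & tError
      exit loginAndFetch
   end if
   
   -- keep only the Set-Cookie lines of the server's response headers
   put libURLLastRHHeaders() into tHeaders
   replace numToChar(13) with empty in tHeaders
   filter tHeaders with "Set-Cookie:*"
   set the itemDelimiter to ";"
   repeat for each line tLine in tHeaders
      -- "Set-Cookie: name=value; path=/" -> "name=value"
      put word 2 of item 1 of tLine & "; " after tCookies
   end repeat
   if tCookies is empty then
      answer "The server did not set any cookies."
      exit loginAndFetch
   end if
   delete char -2 to -1 of tCookies -- drop the trailing "; "
   
   -- send the session cookies with the calendar request
   set the httpHeaders to "Cookie: " & tCookies
   put url "http://guildportal.com/Guild.aspx?GuildID=130397&TabID=1110325" into tSource
   put the result into tError
   set the httpHeaders to empty
   if tError is not empty then
      answer "Download failed: " & tError
   else
      put tSource into field "calSource"
   end if
end loginAndFetch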

doc
Posts: 148
Joined: Fri Jun 09, 2006 4:30 pm

Post by doc » Fri Feb 11, 2011 2:56 pm

Hey folks,
I think what you are running into and seeing as a problem is just the effect of AJAX programming, where the data is never actually present in the static HTML. I doubt that you are coding anything wrong, or that there is a problem with LiveCode or the browser control. For all intents and purposes, the data doesn't exist in that page's static HTML at all; it is flowed into the page as data after it loads, without the HTML source ever changing.

Best regards,
-Doc-

mwieder
VIP Livecode Opensource Backer
Posts: 3581
Joined: Mon Jan 22, 2007 7:36 am

Post by mwieder » Fri Feb 11, 2011 6:25 pm

Well, there's a lot of ajax and other javascript in the html, but I *am* getting the calendar info.

magice
Posts: 457
Joined: Wed Mar 18, 2009 12:57 am

Post by magice » Sun Feb 13, 2011 7:50 am

OK, here is where I am at. Since I couldn't get past the login screen with my previous attempts, I decided to play with the revBrowser library instead. I stole a lot of the code from the sample browser stack, and here is what I have come up with:

Code: Select all

on altBrowserOn
   local tWindowId
   local sBrowserId
   local tBrowserId
   global tSource
   get  the date
   convert it to dateitems
   put it into tDate
   
   put the first item of tDate into tYear
   put the second item of tDate into tMonth
   put the third item of tDate into tDay
   put "http://guildportal.com/Guild.aspx?&GuildID=130397&TabID=1110325&SelectDay=" & tDay & "&SelectMonth=" & tMonth & "&SelectYear=" & tYear into tURL
   put the windowid of this stack into tWindowId
   put revBrowserOpen(tWindowId, tURL) into tBrowserId

   if tBrowserId is not an integer then
      answer "Error opening browser: " & tBrowserId
      exit altBrowserOn
   end if

   put tBrowserId into sBrowserId
   
   revBrowserSet sBrowserId, "showborder", true
   revBrowserSet sBrowserId, "rect", the rect of image "browserimage"
   --Here is where the problem comes in
   put revBrowserGet(tBrowserId, "htmltext") into tSource
end altBrowserOn


on getSignData
   global tSource
   
   answer tSource
end getSignData
Everything works fine right up until I try to get the htmltext. When I call the getSignData with a button, it answers an empty box when I expect the source code of the displayed page. I thought at first I did something wrong in the way I wrote the revBrowserGet function, but if I have it get the url instead it returns the url properly. Is there a problem with the htmltext property?
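Two things in the handler above are worth checking. First, "local sBrowserId" is declared inside altBrowserOn, so it is a handler-local variable that disappears when the handler ends; to share the browser id between handlers it has to be declared outside any handler, at the top of the script. Second, revBrowserGet is called immediately after revBrowserOpen, possibly before the page has finished loading, which would fit "url" returning the right value while "htmltext" comes back empty. A sketch that waits for the page instead, assuming both handlers live in the card (or stack) script and that revBrowser sends browserDocumentComplete with the URL and the browser instance id once a page has loaded:

Code: Select all

-- script-local: declared outside any handler so every handler here can see it
local sBrowserId

on altBrowserOn
   local tDate, tUrl
   put the date into tDate
   convert tDate to dateItems
   put "http://guildportal.com/Guild.aspx?GuildID=130397&TabID=1110325" & \
         "&SelectDay=" & item 3 of tDate & \
         "&SelectMonth=" & item 2 of tDate & \
         "&SelectYear=" & item 1 of tDate into tUrl
   put revBrowserOpen(the windowId of this stack, tUrl) into sBrowserId
   if sBrowserId is not an integer then
      answer "Error opening browser: " & sBrowserId
      exit altBrowserOn
   end if
   revBrowserSet sBrowserId, "showborder", true
   revBrowserSet sBrowserId, "rect", the rect of image "browserimage"
   -- don't read "htmltext" here: the page may still be loading
end altBrowserOn

-- assumed to be sent by revBrowser once a page has finished loading
on browserDocumentComplete pUrl, pInstanceId
   global tSource
   if pInstanceId is not sBrowserId then pass browserDocumentComplete
   put revBrowserGet(sBrowserId, "htmltext") into tSource
end browserDocumentComplete

getSignData can stay as it is, since it reads the same global tSource.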

magice
Posts: 457
Joined: Wed Mar 18, 2009 12:57 am

Post by magice » Mon Feb 14, 2011 1:59 am

Another problem I am having with the revBrowser library is that once I open a browser on my stack, it is on all cards. This would not be so bad if I could navigate properly to a pertinent web page for each card. I have tried many variations of navigating to a new page or just reopening the browser with a new URL, and they have all worked to change pages. However, when you click within the browser on another card, it reverts to the page I started with. I believe this is because it technically exists on the original card. Can anyone shed some light on a way around this problem?
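As far as I understand revBrowser, the browser is a native window layered over the stack window rather than a control placed on a card, which is why it shows on every card. One workaround is to open a browser when a card opens and close it when the card closes. A sketch, assuming each card that needs a browser has its own image "browserimage" and stores its start page in a custom property; cBrowserUrl is a hypothetical name.

Code: Select all

-- card script; each card that uses it gets its own script-local sBrowserId
local sBrowserId

on openCard
   -- cBrowserUrl is a HYPOTHETICAL custom property holding this card's start page
   put revBrowserOpen(the windowId of this stack, the cBrowserUrl of this card) into sBrowserId
   if sBrowserId is not an integer then
      answer "Error opening browser: " & sBrowserId
      put empty into sBrowserId
      exit openCard
   end if
   revBrowserSet sBrowserId, "showborder", true
   revBrowserSet sBrowserId, "rect", the rect of image "browserimage" of this card
end openCard

on closeCard
   -- close this card's browser before leaving, so it cannot show
   -- (or reload its page) on the next card
   if sBrowserId is an integer then
      revBrowserClose sBrowserId
      put empty into sBrowserId
   end if
end closeCard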
