getting page source of a dynamically created webpage
Posted: Thu Feb 10, 2011 8:34 pm
by magice
Ok this is a hard one for me to explain.
I need to open a web page and search it for certain keywords. As an experimental beginning of the script, I wrote the following code:
Code:
on mouseUp
   get the date
   convert it to dateitems
   put it into tDate
   put the first item of tDate into tYear
   put the second item of tDate into tMonth
   put the third item of tDate into tDay
   do "get url "&quote&"http://guildportal.com/Guild.aspx?GuildID=130397&TabID=1110325&SelectDay="&tDay&"SelectMonth="&tMonth&"2&SelectYear="&tYear&quote
   put it into tSource
   answer tSource
   set the text of field calSource to tSource
   find string tKeyWord in field "calSource"
   answer the foundline
end mouseUp
I assumed that the data dumped into tSource would be the same as the source code you see if you right click the web page and choose "view page source". However, because the web page is dynamically created, much of the source code is left out. Does anyone know any tricks for getting runrev to fetch the same source code of a page that the right click/"view page source" gives you?
Re: getting page source of a dynamically created webpage
Posted: Thu Feb 10, 2011 9:44 pm
by Klaus
Hi magice,
dynamic or not, you should get the same HTML stuff as in a browser with "view source..."!
What is missing?
Hint: No need for "do" here:
Code:
...
put "http://guildportal.com/Guild.aspx?GuildID=130397&TabID=1110325&SelectDay=" & tDay & "SelectMonth=" & tMonth & "2&SelectYear=" & tYear into tUrl
## There is a typo(?) in your string -> tMonth & "2&SelectYear..."
## The 2 before &SelectYear... ?
put url tUrl into tSource
## Check for possible problems
if the result <> empty then
   answer the result
   exit mouseUp
end if
...
Best
Klaus
Re: getting page source of a dynamically created webpage
Posted: Thu Feb 10, 2011 10:53 pm
by magice
That is what I thought, but there is actually quite a bit missing. That web page is an event calendar with a list of people who have signed up for the day's event. When I view the source code, that table is there. When rev retrieves the code, it is not. Given that you have confirmed that it should be the same, I suspect that maybe the problem is with logging in to the website. My browser has a cookie that makes the website remember me. Maybe rev is instead getting a "guest" version.
Re: getting page source of a dynamically created webpage
Posted: Fri Feb 11, 2011 12:49 am
by mwieder
Actually you still have typos in your url. There are missing ampersands before the url parameters. Try
Code:
get url "http://guildportal.com/Guild.aspx?GuildID=130397&TabID=1110325&SelectDay=" & tDay & \
"&SelectMonth=" & tMonth & \
"&SelectYear=" & tYear
Re: getting page source of a dynamically created webpage
Posted: Fri Feb 11, 2011 1:06 am
by magice
The typos are just in the post, from when I started changing the URL for privacy reasons. The URL is written correctly in my stack. I have also removed the "do" as was suggested. This may be one of those ideas that I have to give up on...or at least put aside for a while.
Re: getting page source of a dynamically created webpage
Posted: Fri Feb 11, 2011 1:31 am
by mwieder
Well then, I don't know what to say... I get the whole thing back when I put the right stuff in the url. Calendar info and all. I must be doing something wrong.
Re: getting page source of a dynamically created webpage
Posted: Fri Feb 11, 2011 1:58 am
by magice
mwieder wrote:Well then, I don't know what to say... I get the whole thing back when I put the right stuff in the url. Calendar info and all. I must be doing something wrong.
Did you get a list of names? I copied your code and pasted it in but get the same results. I'm sure I'm doing something stupid, and I will see it when I look at it with fresh eyes tomorrow.
Re: getting page source of a dynamically created webpage
Posted: Fri Feb 11, 2011 2:49 am
by mwieder
I seem to be picking dates when nobody's signed up. What's a good date to try?
Re: getting page source of a dynamically created webpage
Posted: Fri Feb 11, 2011 3:25 am
by magice
mwieder wrote:I seem to be picking dates when nobody's signed up. What's a good date to try?
today
Re: getting page source of a dynamically created webpage
Posted: Fri Feb 11, 2011 3:42 am
by magice
OK, I see what is going on. I took the source code that runrev returns, pasted it into a txt document, then renamed it to html and opened it in my browser. It gives me the login screen. If I put the url directly in the browser, it automatically logs me in and takes me to the calendar page. How to get runrev to auto log me in is beyond me. I do have one idea, but it is a long shot. I use Firefox as my browser. I read somewhere that runrev uses IE. If I were to log into the site using IE and save the auto login cookie, maybe runrev would auto log in? Anyway, I just thought of that and I probably could have tested it in the time it took to type this, but after 26 straight hours in front of a computer my brain isn't working that well.
Re: getting page source of a dynamically created webpage
Posted: Fri Feb 11, 2011 4:03 am
by mwieder
Must be a login thing, because I see no names for today in the browser either. Try adding your login info to the url, right after the http:// part:
Code:
http://user:password@guildportal.com...
magice wrote:I read somewhere that runrev uses IE.
That's the revBrowser component that uses IE under the hood on Windows, but it doesn't affect libURL calls.
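Since libURL does not share the browser's cookies, one thing that might be worth trying is to send the login cookie yourself through the httpHeaders property before fetching the page. This is only a minimal, untested sketch: the cookie name and value below are hypothetical placeholders (the real ones would have to be copied from the logged-in browser), and it assumes the site accepts a plain cookie as proof of login.

```livecode
on mouseUp
   local tCookie, tUrl, tSource, tError
   -- Hypothetical placeholder: copy the real cookie name and value
   -- from the browser that is already logged in to the site.
   put "CookieName=PASTE-VALUE-HERE" into tCookie
   put "http://guildportal.com/Guild.aspx?GuildID=130397&TabID=1110325" into tUrl
   -- libURL adds the httpHeaders to the requests that follow
   set the httpHeaders to "Cookie:" && tCookie
   put URL tUrl into tSource
   -- Keep the result before doing anything else, then clear the extra header
   put the result into tError
   set the httpHeaders to empty
   if tError is not empty then
      answer tError
      exit mouseUp
   end if
   put tSource into field "calSource"
end mouseUp
```

If the text that comes back is still the login screen, the site probably needs more than this one cookie.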
Re: getting page source of a dynamically created webpage
Posted: Fri Feb 11, 2011 2:56 pm
by doc
Hey folks,
I think what you are running into and seeing as a problem is just the effect of AJAX programming, where the data is never actually displayed in a static html format. I doubt that you are coding anything wrong, nor is there a problem with LiveCode or the browser control... For all intents and purposes, the data really doesn't exist on that page (statically) at all, but rather it is flowed into the page as user data, without the html ever changing.
Best regards,
-Doc-
Re: getting page source of a dynamically created webpage
Posted: Fri Feb 11, 2011 6:25 pm
by mwieder
Well, there's a lot of AJAX and other JavaScript in the html, but I *am* getting the calendar info.
Re: getting page source of a dynamically created webpage
Posted: Sun Feb 13, 2011 7:50 am
by magice
OK, here is where I am at. Since I couldn't get past the login screen with my previous attempts, I decided to play with the revBrowser library instead. I stole a lot of the code from the sample browser stack and here is what I have come up with:
Code:
on altBrowserOn
   local tWindowId
   local sBrowserId
   local tBrowserId
   global tSource
   get the date
   convert it to dateitems
   put it into tDate
   put the first item of tDate into tYear
   put the second item of tDate into tMonth
   put the third item of tDate into tDay
   put "http://guildportal.com/Guild.aspx?&GuildID=130397&TabID=1110325&SelectDay=" & tDay & "&SelectMonth=" & tMonth & "&SelectYear=" & tYear into tURL
   put the windowid of this stack into tWindowId
   put revBrowserOpen(tWindowId, tURL) into tBrowserId
   if tBrowserId is not an integer then
      answer "Error opening browser: " & tBrowserId
      exit altBrowserOn
   end if
   put tBrowserId into sBrowserId
   revBrowserSet sBrowserId, "showborder", true
   revBrowserSet sBrowserId, "rect", the rect of image "browserimage"
   --Here is where the problem comes in
   put revBrowserGet(tBrowserId, "htmltext") into tSource
end altBrowserOn

on getSignData
   global tSource
   answer tSource
end getSignData
Everything works fine right up until I try to get the htmltext. When I call the getSignData with a button, it answers an empty box when I expect the source code of the displayed page. I thought at first I did something wrong in the way I wrote the revBrowserGet function, but if I have it get the url instead it returns the url properly. Is there a problem with the htmltext property?
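One possible explanation is timing: revBrowserOpen returns before the page has finished loading, so asking for the htmltext on the very next line may find nothing there yet, while the url is already known. A minimal sketch of waiting for the browserDocumentComplete message instead, assuming both handlers sit in the card or stack script and that the script local sBrowserId is declared outside the handlers (the shortened URL is only for illustration):

```livecode
local sBrowserId

on altBrowserOn
   local tUrl
   put "http://guildportal.com/Guild.aspx?GuildID=130397&TabID=1110325" into tUrl
   put revBrowserOpen(the windowId of this stack, tUrl) into sBrowserId
   if sBrowserId is not an integer then
      answer "Error opening browser: " & sBrowserId
      exit altBrowserOn
   end if
   revBrowserSet sBrowserId, "showborder", true
   revBrowserSet sBrowserId, "rect", the rect of image "browserimage"
end altBrowserOn

-- revBrowser sends this message to the current card
-- once the browser has finished loading a page
on browserDocumentComplete pInstanceId, pUrl
   global tSource
   if pInstanceId is sBrowserId then
      put revBrowserGet(pInstanceId, "htmltext") into tSource
   end if
end browserDocumentComplete
```

With this arrangement getSignData can stay as it is, since tSource is only filled in after the page has loaded. Content that the page's own scripts add later may still be missing at that point.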
Re: getting page source of a dynamically created webpage
Posted: Mon Feb 14, 2011 1:59 am
by magice
Another problem I am having with the revBrowser library is that once I open a browser on my stack, it is on all cards. This would not be so bad if I could navigate properly to a pertinent web page for that card. I have tried many variations of navigating to a new page or just reopening the browser with a new URL, and they have all worked to change pages. However, when you click within the browser on another card, it reverts to the page I started with. I believe this is because it technically exists on the original card. Can anyone shed some light on a way around this problem?