Get Data from a web page

lohill
Posts: 770
Joined: Tue Dec 08, 2009 6:37 pm

Get Data from a web page

Post by lohill » Tue Apr 20, 2010 12:19 am

I have a post about getting data from a webpage that required a login. No one seemed willing or able to help me there so I have modified the problem to see if anyone will respond. Now the problem is just to get the data from a table in a specific web page.
Here is the URL: http://www.investors.com/StockResearch/ ... ymbol=AAPL

When I go to that page in Safari and ask to look at the 'Source', I can see the data in the SmartSelect Ratings. The following is an excerpt of the source code for the table I want:
<table class="smartSelectTable">
<thead>
<tr>
<th scope="col" class="type">
</th>
<th scope="col" class="rating">
Rating
</th>
<th scope="col" class="ibdTest">
Checklist
</th>
</tr>
</thead>
<tbody>

<tr>
<td class="type">
<a class="glossDef" href="javascript:void(0);" rel="Term.axd?term=Composite Rating SmartSelect">
Composite Rating</a>
</td>
<td class="rating">
<span>
99</span>
</td>
<td class="ibdTest pass">
<img class="FSIcons" src="http://www1.ibdcd.com/images/icons/Pass.gif" width="11" height="12" alt="Pass" />
</td>
</tr>

<tr>
<td class="type">
<a class="glossDef" href="javascript:void(0);" rel="Term.axd?term=EPS Rating">
EPS Rating</a>
</td>
<td class="rating">
<span>
98</span>
</td>
<td class="ibdTest pass">
<img class="FSIcons" src="http://www1.ibdcd.com/images/icons/Pass.gif" width="11" height="12" alt="Pass" />
</td>
</tr>

<tr>
<td class="type">
<a class="glossDef" href="javascript:void(0);" rel="Term.axd?term=Relative Price Strength (RS) Rating or Relative Strength">
RS Rating</a>
</td>
<td class="rating">
<span>
82</span>
</td>
<td class="ibdTest pass">
<img class="FSIcons" src="http://www1.ibdcd.com/images/icons/Pass.gif" width="11" height="12" alt="Pass" />
</td>
</tr>

<tr>
<td class="type">
<a class="glossDef" href="javascript:void(0);" rel="Term.axd?term=Industry Group Relative Strength Letter Rating (Group RS)">
Group RS Rating</a>
</td>
<td class="rating">
<span>
A </span>
</td>
<td class="ibdTest pass">
<img class="FSIcons" src="http://www1.ibdcd.com/images/icons/Pass.gif" width="11" height="12" alt="Pass" />
</td>
</tr>

<tr>
<td class="type">
<a class="glossDef" href="javascript:void(0);" rel="Term.axd?term=SMR Rating">
SMR Rating</a>
</td>
<td class="rating">
<span>
A </span>
</td>
<td class="ibdTest pass">
<img class="FSIcons" src="http://www1.ibdcd.com/images/icons/Pass.gif" width="11" height="12" alt="Pass" />
</td>
</tr>

<tr>
<td class="type">
<a class="glossDef" href="javascript:void(0);" rel="Term.axd?term=Accumulation/Distribution (Acc/Dis) Rating">
Acc/Dis Rating</a>
</td>
<td class="rating">
<span>
B </span>
</td>
<td class="ibdTest pass">
<img class="FSIcons" src="http://www1.ibdcd.com/images/icons/Pass.gif" width="11" height="12" alt="Pass" />
</td>
</tr>

</tbody> </table>
When I examine that data I can clearly see the headings and data for Composite Rating, EPS Rating, RS Rating, Group RS Rating, SMR Rating and Acc/Dis Rating.
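
Once the full page source is in hand, the rating rows can be pulled out with a small parser. The following is a minimal sketch in Python rather than REV, assuming the table keeps the class names shown in the excerpt above (smartSelectTable, type, rating). RatingTableParser and parse_ratings are illustrative names, not part of any library.

```python
from html.parser import HTMLParser


class RatingTableParser(HTMLParser):
    """Collects label -> rating pairs from a table with class smartSelectTable."""

    def __init__(self):
        super().__init__()
        self.in_table = False
        self.cell = None   # "type" or "rating" while inside such a <td>
        self.label = ""
        self.value = ""
        self.ratings = {}

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "table" and "smartSelectTable" in classes:
            self.in_table = True
        elif self.in_table and tag == "tr":
            # a new row starts: forget the previous row's text
            self.label = self.value = ""
        elif self.in_table and tag == "td":
            if "type" in classes:
                self.cell = "type"
            elif "rating" in classes:
                self.cell = "rating"

    def handle_data(self, data):
        if self.cell == "type":
            self.label += data
        elif self.cell == "rating":
            self.value += data

    def handle_endtag(self, tag):
        if tag == "td":
            self.cell = None
        elif tag == "tr" and self.in_table and self.label.strip():
            # header rows use <th>, so they have no label and are skipped
            self.ratings[self.label.strip()] = self.value.strip()
        elif tag == "table":
            self.in_table = False


def parse_ratings(html):
    """Return a dict such as {"EPS Rating": "98"} for the rows found in html."""
    parser = RatingTableParser()
    parser.feed(html)
    return parser.ratings
```

Calling parse_ratings on the retrieved page text gives one dictionary entry per table row, so a page with only the EPS Rating row yields a single entry.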

When I use REV to fetch the page (and believe me, I have tried), the table in the retrieved HTML has only one row: the EPS Rating.

Here is some code that I have used:

Code: Select all

on mouseUp
   libUrlFollowHttpRedirects true
   put empty into field "LogField"
   libURLsetLogField "LogField"
   put "http://www.investors.com/StockResearch/Quote.aspx?symbol=AAPL" into tUrl
   get url tUrl
   -- the result is empty when the download succeeded
   if the result is empty then
      put it into myRetrieve
      answer myRetrieve
   else
      answer the result
   end if
end mouseUp
The code runs but the display of myRetrieve shows only the EPS Rating and its value. This is what I have excerpted from myRetrieve that shows the structure of the table I get:
<table class="smartSelectTable">
<thead>
<tr>
<th scope="col" class="type">
</th>
<th scope="col" class="rating">
Rating
</th>
<th scope="col" class="ibdTest">
Checklist
</th>
</tr>
</thead>
<tbody>

<tr>
<td class="type">
<a class="glossDef" href="javascript:void(0);" rel="Term.axd?term=EPS Rating">
EPS Rating</a>
</td>
<td class="rating">
<span>
98</span>
</td>
<td class="ibdTest pass">
<img class="FSIcons" src="http://www1.ibdcd.com/images/icons/Pass.gif" width="11" height="12" alt="Pass" />
</td>
</tr>

</tbody> </table>
Another attempt that I made used a call to the terminal and the bash shell. The REV code looked sort of like this:

Code: Select all

on mouseUp
   put "http://www.investors.com/StockResearch/Quote.aspx?symbol=AAPL" into tURL
   put BashGetData(tURL) into myRetrieve
   answer myRetrieve
end mouseUp

function BashGetData tURL
   -- single-quote the URL so the shell does not treat "?" as a wildcard
   put "stringX=$(curl -X GET '" & tURL & "');echo $stringX" into tBash
   put shell(tBash) into tBashText
   return tBashText
end BashGetData
Again myRetrieve displayed only the EPS Rating, and the structure of the table was just like the previous one.
<table class="smartSelectTable">
<thead>
<tr>
<th scope="col" class="type">
</th>
<th scope="col" class="rating">
Rating
</th>
<th scope="col" class="ibdTest">
Checklist
</th>
</tr>
</thead>
<tbody>

<tr>
<td class="type">
<a class="glossDef" href="javascript:void(0);" rel="Term.axd?term=EPS Rating">
EPS Rating</a>
</td>
<td class="rating">
<span>
98</span>
</td>
<td class="ibdTest pass">
<img class="FSIcons" src="http://www1.ibdcd.com/images/icons/Pass.gif" width="11" height="12" alt="Pass" />
</td>
</tr>

</tbody> </table>
I really would like some help in getting this solved. It is driving me nutty.

Thanks,
Larry

mwieder
VIP Livecode Opensource Backer
Posts: 3581
Joined: Mon Jan 22, 2007 7:36 am

Re: Get Data from a web page

Post by mwieder » Tue Apr 20, 2010 12:50 am

Larry-

You've mentioned things like SMR Rating and Group RS Rating - I don't know where you're getting those, but they're not on the web page you list. There's very little in the SmartSelect table other than EPS rating, and that's probably why that's all you're pulling out.

lohill
Posts: 770
Joined: Tue Dec 08, 2009 6:37 pm

Re: Get Data from a web page

Post by lohill » Tue Apr 20, 2010 5:17 pm

Thanks for that observation, mwieder. It was exactly what I needed. I went to my wife's computer and got exactly what you described. That means that my computer has permissions to see the data from the browser that hers (and yours) cannot see. Now if I can just make REV figure out how to act like my browser.

Larry

mwieder
VIP Livecode Opensource Backer
Posts: 3581
Joined: Mon Jan 22, 2007 7:36 am

Re: Get Data from a web page

Post by mwieder » Tue Apr 20, 2010 6:01 pm

It does sound like a permissions problem. If you're using the Enterprise version of rev you can use https instead of http (with the appropriate user/password info, of course). Otherwise my guess is that you're probably out of luck unless you want to create your own SSL authentication library.
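
For reference, when a user name and password are supplied this way, the client normally sends them as an HTTP Basic Authorization header: the text user:password, base64-encoded. The sketch below is in Python rather than REV, and basic_auth_header is an illustrative helper name. Whether this particular site accepts Basic authentication at all is not established in this thread.

```python
import base64


def basic_auth_header(user, password):
    """Return the value of an Authorization header for HTTP Basic auth."""
    # Basic auth is just "user:password" in base64; it is an encoding, not encryption
    raw = f"{user}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")
```

Because base64 is trivially reversible, such a header protects nothing over plain http, which is one reason to prefer https when credentials are involved.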

sturgis
Livecode Opensource Backer
Posts: 1685
Joined: Sat Feb 28, 2009 11:49 pm

Re: Get Data from a web page

Post by sturgis » Tue Apr 20, 2010 7:47 pm

As a short term workaround, you could try using revbrowser to get the page. Don't even have to show it, just start a revbrowser instance, wait for the page to complete loading (browserDocumentComplete) and then use revbrowserget(browserInstanceID,"htmltext") to get the source, and close the rev browser Instance.

The only caveat is that if whatever permissions you have for that URL expire or time out and require a new login while the revbrowser instance isn't visible, you wouldn't know it.

In that situation the htmltext would differ in ways that could be detected thereby allowing you to redirect to your account login page, show the browser so login can be completed, then hide it again and proceed as before.
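
One way to detect that difference is to check whether all six rating labels from the first post appear as link text in the returned source. This is a rough sketch in Python rather than REV. The label list comes from the table excerpt earlier in the thread, and missing_ratings and looks_logged_in are illustrative names.

```python
import re

# Labels a logged-in browser shows, per the table excerpt in the first post
EXPECTED_LABELS = (
    "Composite Rating", "EPS Rating", "RS Rating",
    "Group RS Rating", "SMR Rating", "Acc/Dis Rating",
)


def missing_ratings(html):
    """Return the expected labels that do not appear as link text in html."""
    # Link text in the table looks like:  ...">\nEPS Rating</a>
    found = {text.strip() for text in re.findall(r">\s*([^<>]+?)\s*</a>", html)}
    # Exact matching keeps "RS Rating" from being satisfied by "Group RS Rating"
    return [label for label in EXPECTED_LABELS if label not in found]


def looks_logged_in(html):
    """True when every expected rating label is present."""
    return not missing_ratings(html)
```

If looks_logged_in returns False, the script could show the browser so the login can be completed, as described above.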

Just a thought. Also, as far as using liburl to set the headers properly you might consider using firefox with the live http headers addon while figuring things out. It might help you get over the hump as far as determining how to set your headers up so you can work the site with liburl. A live working example is always a good thing.

lohill
Posts: 770
Joined: Tue Dec 08, 2009 6:37 pm

Re: Get Data from a web page

Post by lohill » Wed Apr 21, 2010 6:34 pm

Thanks for the input. I do have REV Enterprise so I tried mwieder's suggestion for https. When it ran, I got a socket error. The following code (with just http) runs fine but the variable myRetrieve only contains the EPS Rating like I would get on a browser with no privs. (I have disguised my userName and password.)

Code: Select all

on mouseUp
    libUrlFollowHttpRedirects true
   put empty into field "LogField"
   libURLsetLogField "LogField"
   put urlEncode("userName") into tUser
   put urlEncode("password") into tPW
   put "http://" & tUser & ":" & tPW & "@www.investors.com/StockResearch/Quote.aspx?symbol=AAPL" into myURL
   put url myURL into myRetrieve
end mouseUp
If I just change the http to https I get the socket error which the log shows as:
socket selected: http://www.investors.com:443|6928
GET /StockResearch/Quote.aspx?symbol=AAPL HTTP/1.1
Host: http://www.investors.com
User-Agent: Revolution (MacOS)
Authorization: Basic <base64 credentials removed>

socket error http://www.investors.com:443|6928
-Error with certificate at depth: 0 issuer = /C=US/O=Equifax Secure Inc./CN=Equifax Secure Global eBusiness CA-1 subject = /C=US/O=www.investors.com/OU=GT53336951/OU=See http://www.geotrust.com/resources/cps (c)09/OU=Domain Control Validated - QuickSSL(R)/CN=www.investors.com err 20:unable to get local issuer certificate
As to sturgis' suggestion: I have installed Live HTTP Headers to see what I can learn about what is going on. Wow, what a flood of information. I have not found much documentation (just some screen shots), but I think I can see when the server is talking (the HTTP lines) and when the client is talking (the GET lines). I have saved the data from a run where my browser had privs and a run where it did not. There is a difference in what is logged: the privileged run has something called CSUserCookie in the GET (client) statement. I can show it if that would be helpful. I don't know where it comes from or whether it is something that constantly changes.
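
Comparing the two captures by eye is tedious, so a small script can list which cookies appear only in the privileged request. This is a minimal sketch in Python rather than REV. parse_cookie_header and cookie_diff are illustrative names, and the cookie values in the usage line are made up.

```python
def parse_cookie_header(value):
    """Split a request Cookie header value into a {name: value} dict."""
    cookies = {}
    for part in value.split(";"):
        name, sep, val = part.strip().partition("=")
        if sep and name:
            cookies[name] = val
    return cookies


def cookie_diff(privileged, unprivileged):
    """Return (names only in the privileged request, names whose values differ)."""
    a = parse_cookie_header(privileged)
    b = parse_cookie_header(unprivileged)
    only_privileged = sorted(set(a) - set(b))
    changed = sorted(name for name in set(a) & set(b) if a[name] != b[name])
    return only_privileged, changed


# Hypothetical values, for illustration only
print(cookie_diff("CSUserCookie=1234; ASP.NET_SessionId=abc", "ASP.NET_SessionId=xyz"))
```

Cookies that show up only in the privileged capture are the likeliest candidates for what the server uses to recognise a signed-in user, though that is an inference rather than something the log proves.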

As to the suggestion of revBrowser: I have not used that before. Is that the way I should be attacking things? Is there any documentation besides the dictionary? Does anyone have a sample program that does what I am trying to do? I keep learning more all the time, but with the shotgun approach I never know whether I'm going in the right direction or not.

Thanks for any ideas.
Larry

sturgis
Livecode Opensource Backer
Posts: 1685
Joined: Sat Feb 28, 2009 11:49 pm

Re: Get Data from a web page

Post by sturgis » Wed Apr 21, 2010 8:11 pm

As for whether revbrowser is the best way to go, the answer is: it depends. Having live feedback is a great thing sometimes. For example, I have a stack that scrapes episode information from a TV service that requires login. In my case, since I use revbrowser to display the actual episode, it was easiest for me to just integrate the two things. If I try to go to a page and my session has timed out, I can log in right from the browser window and go on to my TV show, episode list, or whatever. No need to mess with headers to get things functional with liburl.
However, if all I wanted to do was gather the data, I'd probably use liburl and do everything behind the scenes. Once you get things working it should be pretty solid and fast, and you won't have to mess with the multitude of quirks associated with the revbrowser external.


By the way, not to hijack this thread, but I finally discovered a way to do a fullscreen switch with revbrowser that works the way I want, and without requiring the libkiosk external. I have one stack with basically nothing in it, set invisible. I fullscreen it, then lay my browser stack over the top of it, set to the screen rect. This gets rid of the dock and menu bar but still allows user override with app-switching hotkeys, at which point behavior can be defined with suspendstack and resumestack. This also avoids the problem of the changing windowid of the browser stack, since it's not the stack that is being fullscreened.

lohill
Posts: 770
Joined: Tue Dec 08, 2009 6:37 pm

Re: Get Data from a web page

Post by lohill » Thu Apr 22, 2010 6:10 pm

There is something I have forgotten to mention on this subject. In the VBA environment of MS Excel on Windows I have been able to get the data I need from the IBD webpage. In fact I have done it in two different ways. The first method uses oHTTP and has VBA code that looks somewhat like this:

Code: Select all

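    ' Requires a reference to Microsoft XML (MSXML) in the VBA project
    Dim sURL As String
    sURL = "http://www.investors.com/StockResearch/Quote.aspx?symbol=AAPL"
    Dim oHTTP As New XMLHTTP
    oHTTP.Open "GET", sURL, False   ' Open needs the method; False = synchronous
    oHTTP.Send
    myRetrieve = oHTTP.responseText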
myRetrieve can then be parsed to pick out the data I want. The second method is not quite as pretty but is still effective. It uses code somewhat like the following:

Code: Select all

    With ActiveSheet.QueryTables.Add(Connection:="URL;http://www.investors.com/StockResearch/Quote.aspx?symbol=AAPL", Destination:=Range("AA1"))
        .BackgroundQuery = True
        .TablesOnlyFromHTML = True
        .Refresh BackgroundQuery:=False
        .SaveData = False
        .FieldNames = False
    End With
In this latter case I would be able to find, for example, the Group RS Rating in cell AB21 of the worksheet. No parsing necessary. Both Excel methods only work if I have previously logged in to the IBD website and told the browser to remember me on this computer (the site's wording is 'Keep me signed in'). The browser does not need to stay open afterwards for these methods to work.

I did just take a quick look at revBrowser by looking at the sample program called Internet.rev. I tried my URL in the example 'Display source of a web page' and it returned the limited data that I would get if I were not a member of investors.com. My heart skipped a beat when I tried the sample 'Render a web page' with my URL. I could see all of the data I was looking for, as though I were a logged-in user. Then I realized that I was just looking at an image and there was no way I would be able to parse out the data I wanted.

Although I can get the data in Excel for Windows, I still would like to be able to do it in REV. That would give me a cross platform solution which is the reason to go to REV in the first place. I won't be giving up soon on solving this so, until I hear from an expert that it is impossible, I'll keep trying. I only wish an expert would give me some guidance.

Thanks,
Larry

lohill
Posts: 770
Joined: Tue Dec 08, 2009 6:37 pm

Re: Get Data from a web page

Post by lohill » Thu Apr 22, 2010 9:15 pm

I have made some changes in my script for getting to the data. Here is what the log field looks like:
socket selected: 63.71.211.170:80|6932
GET /StockResearch/Quote.aspx?symbol=AAPL HTTP/1.1
Host: http://www.investors.com
User-Agent: Revolution (MacOS)
Authorization: Basic <base64 credentials removed>

HTTP/1.1 200 OK
Cache-Control: public,no-cache,no-store,max-age=0,must-revalidate,proxy-revalidate
Date: Thu, 22 Apr 2010 20:03:13 GMT
Content-Length: 154512
Content-Type: text/html; charset=utf-8
Server: Microsoft-IIS/6.0
SID: 17
X-Powered-By: ASP.NET
X-AspNet-Version: 2.0.50727
CommunityServer: 4.0.30417.1769
Set-Cookie: CSUserCookie=2101; domain=.investors.com; path=/
Set-Cookie: CommunityServer-UserCookie2101=lv=Fri, 01 Jan 1999 00:00:00 GMT&mra=Thu, 22 Apr 2010 13:03:12 GMT; expires=Fri, 22-Apr-2011 20:03:12 GMT; path=/
Set-Cookie: CommunityServer-LastVisitUpdated-2101=; path=/
Set-Cookie: CommunityServer-UserCookie2101=lv=Fri, 01 Jan 1999 00:00:00 GMT&mra=Thu, 22 Apr 2010 13:03:12 GMT; expires=Fri, 22-Apr-2011 20:03:12 GMT; path=/
Set-Cookie: ASP.NET_SessionId=u45lwgqo05jbub55xphsk53v; path=/; HttpOnly
Pragma: no-cache
Expires: Thu, 22 Apr 2010 20:03:13 GMT

socket selected: 63.71.211.170:80|6932
GET /StockResearch/Quote.aspx?symbol=AAPL HTTP/1.1
Host: http://www.investors.com
User-Agent: Revolution (MacOS)
Cookie: CSUserCookie=2101;CommunityServer-UserCookie2101=lv=Fri, 01 Jan 1999 00:00:00 GMT&mra=Thu, 22 Apr 2010 13:03:12 GMT;CommunityServer-LastVisitUpdated-2101=;CommunityServer-UserCookie2101=lv=Fri, 01 Jan 1999 00:00:00 GMT&mra=Thu, 22 Apr 2010 13:03:12 GMT;ASP.NET_SessionId=u45lwgqo05jbub55xphsk53v

HTTP/1.1 200 OK
Cache-Control: public,no-cache,no-store,max-age=0,must-revalidate,proxy-revalidate
Date: Thu, 22 Apr 2010 20:03:14 GMT
Content-Length: 154512
Content-Type: text/html; charset=utf-8
Server: Microsoft-IIS/6.0
SID: 17
X-Powered-By: ASP.NET
X-AspNet-Version: 2.0.50727
CommunityServer: 4.0.30417.1769
Set-Cookie: CSUserCookie=2101; domain=.investors.com; path=/
Set-Cookie: CommunityServer-UserCookie2101=lv=Fri, 01 Jan 1999 00:00:00 GMT&mra=Thu, 22 Apr 2010 13:03:13 GMT; expires=Fri, 22-Apr-2011 20:03:13 GMT; path=/
Pragma: no-cache
Expires: Thu, 22 Apr 2010 20:03:14 GMT
Even though I have put together an answering cookie, it is still not treating me as a privileged user so I don't have my data yet. At least it runs through without any errors.
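
For comparison, here is one way to assemble an answering Cookie header from a set of Set-Cookie lines: keep only each name=value pair, drop attributes such as path and expires, collapse repeated names, and join the pairs with "; ". It is a sketch in Python rather than REV, and cookie_header_from_set_cookie is an illustrative name. Whether a tidier header would change the server's answer is unknown. The server may depend on a cookie issued at login, which a fresh request never receives.

```python
def cookie_header_from_set_cookie(set_cookie_values):
    """Build a request Cookie header value from Set-Cookie header values."""
    jar = {}  # insertion-ordered; a repeated name keeps its slot and takes the latest value
    for value in set_cookie_values:
        # everything after the first ";" is an attribute (domain, path, expires, HttpOnly)
        pair = value.split(";", 1)[0].strip()
        name, sep, val = pair.partition("=")
        if sep and name:
            jar[name] = val
    return "; ".join(f"{name}={val}" for name, val in jar.items())


# Session id below is a made-up placeholder
print(cookie_header_from_set_cookie([
    "CSUserCookie=2101; domain=.investors.com; path=/",
    "ASP.NET_SessionId=abc123; path=/; HttpOnly",
]))
```

The result is sent back on the next request as a single header of the form Cookie: name1=value1; name2=value2.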

Perhaps someone will notice something in the log that is a clue as to why I'm not getting the results I want. I can include the script too if that will be helpful.

Thanks in advance,
Larry
