[BEGINNER] I'm trying to retrieve the first 39 characters

shawnblc
VIP Livecode Opensource Backer
Posts: 342
Joined: Fri Jun 01, 2012 11:11 pm

Post by shawnblc » Sun May 19, 2013 4:11 am

I'm trying to retrieve the first 39 characters from a webpage. If someone could point me in the right direction, I'd appreciate it.

Code:

put url ("http://domain.com/examples.php") into field "fld1" 
put characters 1 to 39 into field "fld1"

Simon
VIP Livecode Opensource Backer
Posts: 3901
Joined: Sat Mar 24, 2007 2:54 am

Post by Simon » Sun May 19, 2013 4:25 am

You're almost there:

Code:

put char 1 to 39 of field "fld1" into field "fld1"
you need a from-to

Simon

shawnblc
VIP Livecode Opensource Backer
Posts: 342
Joined: Fri Jun 01, 2012 11:11 pm

Post by shawnblc » Sun May 19, 2013 4:46 am

Simon wrote: You're almost there:

Code:

put char 1 to 39 of field "fld1" into field "fld1"
you need a from-to

Simon
Ah. Thank you Simon. That helps a lot. So when I do something like that I need a From ----> To, then I can limit the text. Got it. Thank you again.

dunbarx
VIP Livecode Opensource Backer
Posts: 10331
Joined: Wed May 06, 2009 2:28 pm

Post by dunbarx » Sun May 19, 2013 8:01 pm

Hi.

Simon showed you exactly what you were missing with your first attempt, and you both seem to call that "from...to". I see what you mean by that, and also think that you really understand it.

I just want you to go back to that first post, where you say:

put characters 1 to 39 into field "fld1"

and read this carefully. Even though just one line previously you loaded that very field with data, LiveCode will not know how to deal with this line. Characters 1 to 39 of what, exactly? In other words, you have to think like the engine does, and make sure that you can mentally parse a line of code into a sensible statement:

put characters 1 to 39 "OF SOME SOURCE OF DATA" into fld "fld1".

You had a valid target container, but you were missing an important part of a valid chunk expression.
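A minimal sketch of a complete version, assuming the placeholder URL and field name from the first post. The variable name tPage is illustrative and not part of the original code:

```livecode
-- Fetch the page source into a variable (tPage is an illustrative name)
put url "http://domain.com/examples.php" into tPage
-- The chunk expression now names its source: "of tPage"
put char 1 to 39 of tPage into field "fld1"
```

Using a variable as the source avoids loading the whole page into the field before trimming it, but "of field "fld1"" works just as well as the source part of the chunk expression.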

Craig Newman

Maxiogee
Posts: 38
Joined: Thu May 05, 2011 5:45 pm

Post by Maxiogee » Sat Sep 14, 2013 2:32 pm

I tried this when I wanted to obtain the contents of a webpage, but I got all the HTML coding.

Is there a way to just obtain the text displayed to the viewer of a web-page?

FourthWorld
VIP Livecode Opensource Backer
Posts: 10052
Joined: Sat Apr 08, 2006 7:05 am

Post by FourthWorld » Sat Sep 14, 2013 4:40 pm

Maxiogee wrote: Is there a way to just obtain the text displayed to the viewer of a web-page?
Sometimes. The challenge with web scraping is that HTML offers so much flexibility that really the only way to traverse its elements reliably is through the DOM, which would require using JavaScript in a LiveCode browser object.

But in many cases you can use this quick function to obtain the text without the head portion or styling attributes, though the result can sometimes still include body scripts, hidden div contents, etc.:

Code:

function HtmlToText pHtml
   -- Save the state of the templateField:
   put the properties of the templateField into tSaveProps
   -- Set the htmlText, obtain the text:
   set the htmlText of the templateField to pHtml
   put the text of the templateField into tText
   -- Restore the state of the templateField:
   set the properties of the templateField to tSaveProps
   -- Return the text:
   return tText
end HtmlToText
Note that htmlText is not designed to be true web-ready HTML; it's designed only as a way to represent all of a LiveCode field's contents and styles in a plain-text format for easy parsing and reproduction. So expect to find many differences between HTML and htmlText: htmlText does not cover the full range of true HTML tags, and conversely some htmlText tags are unique to LiveCode fields and not found in HTML, such as the threeDBox style and others.

If you do a lot of web scraping on varying pages you may prefer to use a more complete regex-based solution. But I find this function takes care of most of the tags very quickly, leaving any remaining unwanted elements easy enough to pull out through other means if needed.
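
As a usage sketch, the HtmlToText function above might be called on downloaded page source like this. The URL and field name are the placeholders from the first post, tSource is an illustrative variable name, and the check of the result is an assumption about how one might handle a failed download:

```livecode
on mouseUp
   -- Download the raw HTML source (placeholder URL from the first post)
   put url "http://domain.com/examples.php" into tSource
   -- the result is not empty if the download failed
   if the result is not empty then
      answer "Could not load page:" && the result
      exit mouseUp
   end if
   -- Convert the HTML to plain text with HtmlToText
   put HtmlToText(tSource) into field "fld1"
end mouseUp
```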
Richard Gaskin

Maxiogee
Posts: 38
Joined: Thu May 05, 2011 5:45 pm

Post by Maxiogee » Mon Sep 16, 2013 10:15 am

Thanks Richard,

I am doing the 'scraping' (great word) on the pages of a site which lists the music Top 40 for every week from 1960 to date.
I'm trying to set up a substantial cross-referenced database from the info.

I have to manually open each page, select all, copy, and then return to the LiveCode stack and a button-click strips out the unwanted lines before and after the 'meat' I am after. The number of lines is always the same and I have coded the 'meat-handling' part to strip out irrelevant text.

I would have loved to be able to automate the open, select, and copy steps, but no matter what I tried, all I got was the HTML.
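
For reference, a rough sketch of how those steps might be automated with the HtmlToText function posted earlier in this thread. The function name ChartText, the pUrl parameter, and the line counts (10 leading, 5 trailing) are hypothetical placeholders, not values taken from the site being scraped:

```livecode
-- Hypothetical helper: fetch one chart page and keep only the middle lines
function ChartText pUrl
   -- Download the page source
   put url pUrl into tSource
   -- Give up with an empty value if the download failed
   if the result is not empty then return empty
   -- Strip the markup using HtmlToText from earlier in the thread
   put HtmlToText(tSource) into tText
   -- Drop a fixed number of trailing lines, then leading lines
   -- (both counts are placeholders)
   delete line -5 to -1 of tText
   delete line 1 to 10 of tText
   return tText
end ChartText
```

The text produced this way may not break into lines exactly as a manual select-all and copy from a browser does, so the counts would need checking against a real page.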

Aaah well, I'm already over half-way through.

Regards.
Tony
