
I have a RSS feed (xml) file that I need to extract links

Posted: Fri Jan 16, 2015 6:39 am
by shawnblc
I have an RSS feed (XML) file that I need to extract links from. This is the part where I need to extract the link (there are several links in this file):

Code: Select all

<weblink>
<![CDATA[
https://www.domainName.com/p.php?l=0&p=0056&id=171
]]>
</weblink>
I'm having a difficult time figuring it out. If anyone can give me a hand or point me in the right direction, I'd appreciate it. Thanks.

Re: I have a RSS feed (xml) file that I need to extract link

Posted: Fri Jan 16, 2015 8:34 am
by Simon
Hi Shawn,
Isn't this

Code: Select all

put lineOffset("<![CDATA[",myXML) into myVar
add 1 to myVar
Then you can pass that value as the "lines to skip" parameter of lineOffset to find the next match.
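
A minimal sketch of that idea, assuming each URL sits on the line directly after its "<![CDATA[" marker, as in the sample in the first post. The handler name linksAfterCDATA and the parameter pXML are hypothetical; lineOffset's third parameter is the number of lines to skip, and the value it returns is counted from the end of the skipped lines.

```livecode
-- Hypothetical helper: collects the line that follows each "<![CDATA[" marker.
function linksAfterCDATA pXML
   local tSkip, tOffset, tLink, tLinks
   put 0 into tSkip
   repeat forever
      -- the result is relative to the lines already skipped
      put lineOffset("<![CDATA[", pXML, tSkip) into tOffset
      if tOffset is 0 then exit repeat
      add tOffset to tSkip -- tSkip is now the absolute line number of the marker
      put word 1 to -1 of line (tSkip + 1) of pXML into tLink -- trims surrounding whitespace
      if tLink is not empty then put tLink & return after tLinks
   end repeat
   delete last char of tLinks -- drop the trailing return
   return tLinks
end linksAfterCDATA
```

Calling it as `put linksAfterCDATA(myXML) into tLinks` would leave one link per line in tLinks. If the marker and the URL share a line in the real feed, this sketch would pick up the wrong line.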


Simon

Re: I have a RSS feed (xml) file that I need to extract link

Posted: Fri Jan 16, 2015 4:39 pm
by shawnblc
Hmmm. Not having any luck. I'll continue trying and post some code.

Re: I have a RSS feed (xml) file that I need to extract link

Posted: Fri Jan 16, 2015 5:12 pm
by shawnblc
Ok. I went with the php file instead of the XML file and almost have what I need. Getting close. Here's some code, any help is greatly appreciated.
-- Things I need to do
A) loop through fld "fld1" and find a random link
B) see the second block of code: I need to find the string plus the 4 characters that follow it
* The second code block is what I'm trying to achieve, but it obviously doesn't work the way I've written it.

Code: Select all

on mouseUp
   put URL "http://mydomain.com/rss.php" into tURL
   put tURL into fld "fld1"
   find string "https://www.myotherdomain.com/show.php?l=0&u=17156&id=" in fld "fld1"
   put the foundText into tFound
   put tFound into fld "fld2"
end mouseUp
This is what I'd like

Code: Select all

on mouseUp
   put URL "http://mydomain.com/rss.php" into tURL
   put tURL into fld "fld1"
   find random string "https://www.myotherdomain.com/show.php?l=0&u=17156&id=" & + 4 char in fld "fld1"
   put the foundText into tFound
   put tFound into fld "fld2"
end mouseUp
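
One possible way to get the effect of the second block, sketched under assumptions: the find command has no "random" form, so this collects every occurrence of the prefix with the offset function, keeps the prefix plus the characters that follow it, and then picks one with "any line". The handler name randomLink and its parameters are hypothetical, and the number of trailing characters is passed in because it depends on the feed.

```livecode
-- Hypothetical helper: returns one randomly chosen link that starts with
-- pPrefix and continues for pExtra more characters.
function randomLink pText, pPrefix, pExtra
   local tSkip, tOffset, tStart, tLinks
   put 0 into tSkip
   repeat forever
      -- offset returns a position relative to the characters already skipped
      put offset(pPrefix, pText, tSkip) into tOffset
      if tOffset is 0 then exit repeat
      put tSkip + tOffset into tStart -- absolute position of this occurrence
      put char tStart to (tStart + length(pPrefix) + pExtra - 1) of pText & return after tLinks
      put tStart + length(pPrefix) - 1 into tSkip -- continue after this prefix
   end repeat
   delete last char of tLinks -- drop the trailing return
   return any line of tLinks -- empty if nothing was found
end randomLink

on mouseUp
   local tFeed
   put URL "http://mydomain.com/rss.php" into tFeed
   put randomLink(tFeed, "https://www.myotherdomain.com/show.php?l=0&u=17156&id=", 4) into fld "fld2"
end mouseUp
```

The search runs on the variable tFeed rather than on the field, so the field is only touched once, when the result is displayed.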

Re: I have a RSS feed (xml) file that I need to extract link

Posted: Fri Jan 16, 2015 6:28 pm
by mattmaier
So you know how you want the target link to start? Maybe the "begins with" operator will help: http://livecode.com/developers/api/6.0. ... ns%20with/

Re: I have a RSS feed (xml) file that I need to extract link

Posted: Fri Jan 16, 2015 7:02 pm
by shawnblc
I can find instances of the URL using this (although not all of them in one go), but I also need the next few characters, which always change but are always 5 digits.

Code: Select all

on mouseUp
   find characters "https://www.mydomain.com/rss.php?z=0&p=156&id="
end mouseUp

Re: I have a RSS feed (xml) file that I need to extract link

Posted: Sat Jan 17, 2015 3:27 am
by Simon
Hi shawn,
Can you post some of the XML/PHP output, whatever is returned?

And never loop through a field (it's way slow); always put the field's contents into a variable and loop through that.
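
A minimal sketch of that pattern, assuming the fields "fld1" and "fld2" from the earlier posts. The test for "id=" is only a placeholder condition; substitute whatever identifies the lines you want.

```livecode
on mouseUp
   local tText, tLine, tMatches
   put fld "fld1" into tText -- read the field once
   repeat for each line tLine in tText -- loop over the variable, not the field
      if tLine contains "id=" then put tLine & return after tMatches
   end repeat
   delete last char of tMatches -- drop the trailing return
   put tMatches into fld "fld2" -- write the result back once
end mouseUp
```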

Simon

Re: I have a RSS feed (xml) file that I need to extract link

Posted: Thu Jan 22, 2015 4:29 pm
by MaxV
This page could help you. It explains how to create an RSS feed reader using LiveCode XML functions: http://livecodeitalia.blogspot.it/2014/ ... e-rss.html
Use the Google Translate button on the right to translate it into your language. :D

Re: I have a RSS feed (xml) file that I need to extract link

Posted: Wed Jan 28, 2015 11:16 pm
by Martin Koob
I have been working on learning regex and I thought this question would be a good one to try to see if you could extract the URLs with regex.

If the data you are looking for is always in the form <![CDATA[ the url]]> then the regex <!\[CDATA\[(.*)\]\] will capture the URL.

I put a couple of lines with urls in this format in the following code to test.

Code: Select all

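on mouseUp
   local tURLtoExtract, tURL, tStart, tEnd, tSuccess
   -- two sample lines, each wrapping one URL in a CDATA section
   put "<![CDATA[https://www.domainName.com/p.php?l=0&p=0056&id=181]]>" & CR & "<![CDATA[https://www.domainName.com/p.php?l=0&p=0064&id=151]]>" into tURLtoExtract
   -- matchText puts the text captured by the parentheses into tURL
   put matchtext(tURLtoExtract, "<!\[CDATA\[(.*)\]\]",tURL) into tSuccess
   -- matchChunk puts the character positions of that capture into tStart and tEnd
   put matchchunk(tURLtoExtract, "<!\[CDATA\[(.*)\]\]",tStart,tEnd) into tSuccess
   put tSuccess into line 1 of msg
   put tStart into line 2 of msg
   put tEnd into line 3 of msg
   put tURL into line 4 of msg
end mouseUp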
MatchText will find and extract the first match and put it in tURL. This won't return subsequent matches so you would have to iterate through your text to find subsequent matches. If there is only one URL per line in your feed you could iterate for each line.
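
A rough sketch of the per-line iteration, assuming each line of the feed holds at most one complete "<![CDATA[...]]>" section. The handler name urlsPerLine is hypothetical.

```livecode
-- Hypothetical helper: returns the CDATA payload of every line that has one.
function urlsPerLine pText
   local tLine, tURL, tURLs
   repeat for each line tLine in pText
      -- matchText returns true and fills tURL when the line matches
      if matchText(tLine, "<!\[CDATA\[(.*)\]\]>", tURL) then
         put tURL & return after tURLs
      end if
   end repeat
   delete last char of tURLs -- drop the trailing return
   return tURLs
end urlsPerLine
```

This would not match the layout in the first post, where the URL is on its own line between the opening and closing markers.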

If that does not work, you could also use matchChunk, which puts the start and end positions of the match into the variables you pass in (tStart and tEnd above). You could have a repeat loop that uses the end position to delete the characters up to that point in the text and then calls matchText or matchChunk again to get the next URL.
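
A sketch of that loop, under two assumptions that go beyond the regex above: "(?s)" is added so that "." also matches line breaks (for feeds where the URL sits on its own line inside the CDATA section), and ".*?" is used so the capture stops at the first "]]>". The handler name allCDATA is hypothetical.

```livecode
-- Hypothetical helper: returns every CDATA payload in pText, one per line.
function allCDATA pText
   local tStart, tEnd, tURL, tURLs
   repeat while matchChunk(pText, "(?s)<!\[CDATA\[(.*?)\]\]>", tStart, tEnd)
      put char tStart to tEnd of pText into tURL
      put word 1 to -1 of tURL & return after tURLs -- trims surrounding whitespace
      delete char 1 to (tEnd + 3) of pText -- remove everything through the closing "]]>"
   end repeat
   delete last char of tURLs -- drop the trailing return
   return tURLs
end allCDATA
```

Because pText is passed by value, deleting from it inside the function leaves the caller's copy of the feed untouched.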

Not sure if this will do what you want but would be interested to see if it did.

Martin