I have a RSS feed (xml) file that I need to extract links
Posted: Fri Jan 16, 2015 6:39 am
by shawnblc
I have an RSS feed (XML) file that I need to extract links from. This is the part containing the link I need to extract (there are several links in this file):
Code:
<weblink>
<![CDATA[
https://www.domainName.com/p.php?l=0&p=0056&id=171
]]>
</weblink>
I'm having a difficult time figuring it out. Could anyone give me a hand or point me in the right direction? Thanks.
Re: I have a RSS feed (xml) file that I need to extract link
Posted: Fri Jan 16, 2015 8:34 am
by Simon
Hi Shawn,
Isn't this
Code:
put lineOffset("<
Re: I have a RSS feed (xml) file that I need to extract link
Posted: Wed Jan 28, 2015 11:16 pm
by Martin Koob
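I have been learning regex, and I thought this question would be a good test of whether the URLs can be extracted with a regular expression.
If the data you are looking for is always in the form <![CDATA[ the url]]> then the regex <!\[CDATA\[(.*)\]\] will capture the URL. Two caveats: by default "." does not match a line break, so this pattern only matches when the URL is on the same line as the CDATA markers, and (.*) is greedy, so two CDATA sections on one line would be captured as a single match.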
I put a couple of lines with urls in this format in the following code to test.
Code:
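on mouseUp
   local tURLtoExtract, tURL, tStart, tEnd, tSuccess
   -- two sample lines, each holding one CDATA-wrapped URL
   put "<![CDATA[https://www.domainName.com/p.php?l=0&p=0056&id=181]]>" & CR & "<![CDATA[https://www.domainName.com/p.php?l=0&p=0064&id=151]]>" into tURLtoExtract
   -- matchText puts the text captured by the parentheses into tURL
   put matchText(tURLtoExtract, "<!\[CDATA\[(.*)\]\]", tURL) into tSuccess
   -- matchChunk puts the start and end character positions of that capture into tStart and tEnd
   put matchChunk(tURLtoExtract, "<!\[CDATA\[(.*)\]\]", tStart, tEnd) into tSuccess
   -- show the results in the message box
   put tSuccess into line 1 of msg
   put tStart into line 2 of msg
   put tEnd into line 3 of msg
   put tURL into line 4 of msg
end mouseUp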
matchText finds the first match and puts the captured URL into tURL. It does not return subsequent matches, so you would have to iterate through your text to find them. If there is only one URL per line in your feed, you could iterate over each line.
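As a rough sketch of the line-by-line idea, something like the following might work. It assumes each CDATA section, including its URL, sits on a single line of the feed; extractURLsByLine is a made-up helper name, not a built-in.

```livecode
-- Sketch only: collect one URL per line of the feed text in pFeed.
-- extractURLsByLine is a hypothetical helper name, not a built-in.
function extractURLsByLine pFeed
   local tLine, tURL, tURLList
   repeat for each line tLine in pFeed
      -- matchText returns true and fills tURL when the line holds a CDATA section
      if matchText(tLine, "<!\[CDATA\[(.*)\]\]>", tURL) then
         put tURL & return after tURLList
      end if
   end repeat
   -- drop the trailing return left by the last append
   if tURLList is not empty then delete the last char of tURLList
   return tURLList
end function
```

Calling put extractURLsByLine(tURLtoExtract) with the two-line test string from the handler above should list both URLs, one per line. Lines without a CDATA section are simply skipped.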
If that does not work, you could also use matchChunk, which puts the start and end character positions of the captured text into tStart and tEnd. A repeat loop could use the end position to delete the characters up to that point in the text, then call matchText or matchChunk again to get the next URL.
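The repeat loop described here could be sketched roughly as follows. The pattern is a variation on the one above, not the original: it assumes LiveCode's regex engine accepts the PCRE-style (?s) flag (so "." also matches line breaks) and the lazy quantifier .*?, and it trims whitespace around the URL so the three-line layout from the first post is handled too. extractAllURLs is a made-up helper name.

```livecode
-- Sketch only: collect every CDATA-wrapped URL in pFeed, wherever the line breaks fall.
-- extractAllURLs is a hypothetical helper name, not a built-in.
function extractAllURLs pFeed
   local tStart, tEnd, tURLList
   -- (?s) lets . match line breaks; .*? stops at the first "]]>" rather than the last
   repeat while matchChunk(pFeed, "(?s)<!\[CDATA\[\s*(.*?)\s*\]\]>", tStart, tEnd)
      -- tStart and tEnd are the character positions of the captured URL
      put (char tStart to tEnd of pFeed) & return after tURLList
      -- discard everything up to the end of this capture so the next pass finds the next URL
      delete char 1 to tEnd of pFeed
   end repeat
   -- drop the trailing return left by the last append
   if tURLList is not empty then delete the last char of tURLList
   return tURLList
end function
```

Because pFeed is passed by value, deleting characters inside the function leaves the caller's copy of the feed untouched. Each pass removes at least the opening CDATA marker, so the loop ends once no further sections remain.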
I'm not sure whether this will do what you want, but I would be interested to hear whether it does.
Martin