Noob question on regexp
Posted: Wed May 13, 2009 11:05 pm
by WaltBrown
Hi!
I am trying to find URLs embedded in large text files. I want to search for occurrences of "://" and of "<stuff>.<stuff>.<stuff>" (three unknown chunks without spaces, separated by dots) to identify URLs and IP addresses that don't have "http://" prepended.
Any suggestions?
Find chars "://" in field does not seem to work when searching for "://"; I get no results on a small test document that is full of URLs. And I cannot see how to write a regexp for the dotted address, given that the chunk contents and lengths are unknown.
Thanks!
Walt
Posted: Wed May 13, 2009 11:10 pm
by WaltBrown
I got the first one to work, but any suggestions for the second form would be helpful.
Thanks,
Walt
Posted: Thu May 14, 2009 12:25 pm
by bn
Walt,
Here is my shot at it:
Code:
-- from the dictionary
-- offset(charsToFind,stringToSearch[,charsToSkip])
on mouseUp
   put field 1 into myVar
   put "" into tCollector
   put 0 into tCounter -- number of chars already searched
   repeat -- repeats until first offset returns 0 i.e. not found
      --put offset(quote &"://",myVar,tCounter) into myHitStart -- if you start with a quote
      put offset("://",myVar,tCounter) into myHitStart -- without quote
      if myHitStart > 0 then
         add myHitStart to tCounter -- tCounter is now the position of the ":"
         put offset(quote,myVar, tCounter) into myEnd -- looking for next quote
         if myEnd = 0 then exit repeat -- second searchpattern not found
         -- select char tCounter to (tCounter + myEnd - 1) of field 1 -- if you want to look at the selection
         -- wait 1 second -- if you want to look at the selection in field 1
         -- make list of hits; the "- 1" leaves out the closing quote
         put char tCounter to (tCounter + myEnd - 1) of myVar & return after tCollector
         add myEnd to tCounter
      else
         exit repeat
      end if
   end repeat
   if tCollector <> "" then delete last char of tCollector -- return
   put tCollector into field 2
end mouseUp
It gives you a list of hits. Within these hits you will have to extract the things between the dots.
This will give you the stripped-down version of the hits:
Code:
on mouseUp
   put field 2 into myTemp
   replace "://" with empty in myTemp
   -- replace quote & "://" with empty in myTemp -- if you want to have a leading quote
   set the itemdelimiter to "/"
   repeat with i = 1 to the number of lines of myTemp
      put item 1 of line i of myTemp into line i of myTemp
   end repeat
   put myTemp into field 2
end mouseUp
Just make two fields and two buttons. Field 1 holds the source code of an HTML page, field 2 receives the results, and each button gets one of the two scripts. Give it a try.
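For the dotted form ("<stuff>.<stuff>.<stuff>"), a regular expression might also do the job. What follows is only a rough sketch, assuming matchChunk is available and accepts a PCRE-style pattern. The function name dottedAddresses is made up for illustration and is not a built-in. The pattern accepts three or more whitespace-free chunks separated by dots, so it will also pick up things like version numbers or HTML markup glued to a URL, and the hits would still need filtering.

```livecode
-- Sketch only: dottedAddresses is a hypothetical helper, not a built-in.
-- Returns one hit per line for every run of three or more
-- dot-separated chunks that contain no whitespace.
function dottedAddresses pText
   local tStart, tEnd, tList
   -- [^\s.]+ is one chunk; (?:\.[^\s.]+){2,} demands at least two more
   repeat while matchChunk(pText, "([^\s.]+(?:\.[^\s.]+){2,})", tStart, tEnd)
      put char tStart to tEnd of pText & return after tList
      delete char 1 to tEnd of pText -- keep searching after this hit
   end repeat
   if tList is not empty then delete last char of tList -- trailing return
   return tList
end dottedAddresses
```

It could be called from a button with something like: put dottedAddresses(field 1) into field 2. Deleting the already-searched text on each pass keeps the loop simple, at the cost of copying the text, which may matter on very large files.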
regards
Bernd
Posted: Fri May 22, 2009 3:07 pm
by WaltBrown
Thanks, Bernd. I'm getting up to speed on the various flavors of regexp, especially on when delimiting characters such as parentheses are needed and when they are not. You've been a great help.