Sorting data
Hi, I have a large amount of data in a text box brought down from the Pubmed database that looks something like this....
Junk text
Results: 1 to 20 of 199496
Select item 244094151.
Cloning of Soluble Human Stem Cell Factor in pET-26b(+) Vector.
Asghari S, Shekari Khaniani M, Darabi M, Mansoori Derakhshan S.
Adv Pharm Bull. 2014;4(1):91-5. doi: 10.5681/apb.2014.014. Epub 2013 Dec 23.
PMID:
<< First< Prev
Junk text
How would I go about separating the first five results from the junk text (it's all going to be variable, as it is brought down from the online database) and putting each entry into one of five separate text boxes?
I look forward to hearing from you! Thanks for your help, I greatly appreciate it!
Re: Sorting data
Hi.
I see Klaus has addressed this in your other post, so I will just tell you a couple of things here.
LC has the ability to "find" data in a block of text, either by word, line or item (or even by character). The hard part is trusting how that block of text is organized. In your example, my first question would be:
"are the last five lines always present in the same way, below the junk text"? If so, you can do something like:
put line -5 to -1 of blockOfText into field "yourResults"
Or maybe the string "results:" itself (with colon) only occurs once in the block of text in that specific location. In that case, as Klaus mentioned, the "wordOffset" is your friend.
My point here is that LC has the tools, but the uncertainty is in the formatting of the raw text. Are there aspects of that format that you can rely on? For example, if "results:" appears elsewhere in the text, the method itself will break. This is an example of how to parse data. You need reliable markers to be able to take advantage of the tools.
Craig Newman
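A side note for later readers: before relying on a marker such as "Results:", it may be worth checking that it really occurs only once in the text. The following is only a sketch; the function name countMarker is made up for illustration (it is not a built-in), and it assumes the default wordOffset behaviour, where a word that merely contains the marker also counts as a match.

```livecode
-- Hypothetical helper, not part of LiveCode:
-- counts how many words of pText contain pMarker.
function countMarker pMarker, pText
   local tCount, tSkip, tFound
   put 0 into tCount
   put 0 into tSkip
   repeat forever
      -- the result is relative to the skipped words; 0 means "not found"
      put wordOffset(pMarker, pText, tSkip) into tFound
      if tFound = 0 then exit repeat
      add 1 to tCount
      add tFound to tSkip -- skip everything up to and including this match
   end repeat
   return tCount
end countMarker
```

It could be used like this: if countMarker("Results:", field "Results2") is 1 then the marker is probably safe to parse on.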
Re: Sorting data
Hi Craig, thank you. I think I am beginning to see the wood for the trees: "Results:" only appears in that block of text the one time. I'm still not quite sure how to write an offset function for this though. I had a try, but it doesn't seem to work with the code given in my other post.
if matchText( yourText, "(?ms)'Select item 244094102.\n(.*?)\nPMID'", gotit) then
put gotit
else
put "Not found!"
end if
I'm not sure I was doing it right though, as I put the code behind a button after the find "244" in field "Results2" end mouseUp code?
Results2 is where the big block of text is now.
Hope you can help and aren't as confused as me now! Whytey
Re: Sorting data
I'm super new to this as I'm sure you can tell....trust me to start with a REALLY DIFFICULT APP!!! 

Re: Sorting data
Whytey wrote: Still not quite sure how to write an offset function for this though, I had a try but it doesn't seem to work with the code given in my other post.
if matchText( yourText, "(?ms)'Select item 244094102.\n(.*?)\nPMID'", gotit) then
put gotit
else
put "Not found!"
end if
Umm, you could try replacing the 2 "\n" in the regular expression with "." (a dot).
(I don't know how the newlines are set in your text.)
Code: Select all
matchText( field "Results2", "(?ms)'Select item 244094102..(.*?).PMID'", gotit)
Whytey wrote: I'm not sure I was doing it right thought as I put the code behind a button after the find "244""in field "Results2" end mouseUp code?
Why are you not posting your script?
It's a bit hard to follow you here - at least hard for me

Thierry
SUNNY-TDZ.COM doesn't belong to me since 2021.
To contact me, use the Private messages. Merci.
Re: Sorting data
Hi Thierry, I don't really have any code to show you as I'm still trying to figure out how to do this.
I can select certain words from the text using the handler
# select a certain range of words in text.
Code: Select all
on mouseUp
select word 140 to 159 of field "Results2"
end mouseUp
but this isn't really useful, as it is a huge set of text and the word Results: is not going to appear in the same place every time.
I have also managed to find and select the word Results: using
# Find a certain word and select
Code: Select all
on mouseUp
find "Results:" in field "Results2"
select the foundChunk
end mouseUp
but this is just the word Results:
What I am really looking for is a way to select the text from the word Results: to the word PMID:, if that is possible (I guess this is similar to the select code I mentioned above), and copy this text into the box "Box1". I have looked at the offset function, but it seems to tell me how many times the word Results is in the text rather than allowing me to select and copy (I may just not be quite understanding this function yet though, perhaps you could explain it a little more to me)?
Still confused....
Thanks for your help though, this forum is great!


Re: Sorting data
If I understand the problem right:
Code: Select all
put fld "source" into tText
put wordoffset("Results:",tText) into tStart
put wordoffset("PMID:",tText) into tEnd
put word tStart to tEnd of tText into fld "box1"
And if you're brave, this could also be condensed into:
Code: Select all
put word wordoffset("Results:",fld "source") to wordoffset("PMID:",fld "source") of fld "source" into fld "box1"
Jacqueline Landman Gay | jacque at hyperactivesw dot com
HyperActive Software | http://www.hyperactivesw.com
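For anyone reusing the snippet above: wordOffset returns 0 when the word is not found, so a slightly more defensive variant might check for that before building the chunk. This is only a sketch; textBetweenMarkers is a made-up name, and it assumes you want both markers included in the result, as in the original script.

```livecode
-- Hypothetical helper: returns the words from pStartMarker through
-- pEndMarker (both included), or empty if either marker is missing.
function textBetweenMarkers pText, pStartMarker, pEndMarker
   local tStart, tEnd
   put wordOffset(pStartMarker, pText) into tStart
   if tStart = 0 then return empty
   -- look for the end marker only after the start marker;
   -- the result is relative to the tStart words that were skipped
   put wordOffset(pEndMarker, pText, tStart) into tEnd
   if tEnd = 0 then return empty
   return word tStart to (tStart + tEnd) of pText
end textBetweenMarkers
```

Usage would then be something like: put textBetweenMarkers(fld "source", "Results:", "PMID:") into fld "box1"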
Re: Sorting data
IT WORKED!!
THANK YOU SO MUCH JACQUE!!!!



Re: Sorting data
Oh good. Now you're proficient in the offset function.

Jacqueline Landman Gay | jacque at hyperactivesw dot com
HyperActive Software | http://www.hyperactivesw.com
Re: Sorting data
Let's not get ahead of ourselves

Re: Sorting data
Hi Whytey,
jacque wrote:
Code: Select all
put fld "source" into tText
put wordoffset("Results:",tText) into tStart
put wordoffset("PMID:",tText) into tEnd
put word tStart to tEnd of tText into fld "box1"
Glad that jacque passed by during my sleeping time

For the record, the pattern of my previous regex can't work in your case because
of the extra quotes in the pattern, which are not in your text.
So, using jacque's piece of code as a definition, it would be rewritten as:
Code: Select all
get matchText(tText, "(?ms)Results:(.*?)PMID:", tStart2tEndOfText)
put tStart2tEndOfText into fld "box1"
In other words, the single matchText call takes the place of these lines from jacque's script:
put wordoffset("Results:",tText) into tStart
put wordoffset("PMID:",tText) into tEnd
put word tStart to tEnd of tText into fld "box1"
There are a few different ways to code this; just go with what makes you feel confident.
Happy coding with LiveCode

Thierry
SUNNY-TDZ.COM doesn't belong to me since 2021.
To contact me, use the Private messages. Merci.
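One detail that may help when choosing between the two approaches: matchText returns true or false, and its capture holds only the text between the two markers, whereas the word chunk version includes the "Results:" and "PMID:" words themselves. A minimal sketch of checking the result before using it, assuming the field names "Results2" and "box1" used earlier in this thread:

```livecode
on mouseUp
   local tText, tBetween
   put fld "Results2" into tText
   -- (?s) lets the dot match line breaks, so the capture can span lines
   if matchText(tText, "(?s)Results:(.*?)PMID:", tBetween) then
      -- tBetween holds the text between the markers, markers excluded
      put tBetween into fld "box1"
   else
      answer "Markers not found"
   end if
end mouseUp
```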
Re: Sorting data
Hi Jacque, thanks for your message. I've ground to a halt again! I am now trying to put the next entry down the list into box 2; the first entry (Cloning of Soluble Human Stem Cell Factor) goes into box one perfectly.
The text is as follows:
Select item 244094151.
Cloning of Soluble Human Stem Cell Factor in pET-26b(+) Vector.
Asghari S, Shekari Khaniani M, Darabi M, Mansoori Derakhshan S.
Adv Pharm Bull. 2014;4(1):91-5. doi: 10.5681/apb.2014.014. Epub 2013 Dec 23.
PMID:
24409415
[PubMed]
Related citations
Select item 244094102.
Nucleostemin depletion induces post-g1 arrest apoptosis in chronic myelogenous leukemia k562 cells.
Seyed-Gogani N, Rahmati M, Zarghami N, Asvadi-Kermani I, Hoseinpour-Feyzi MA, Moosavi MA.
Adv Pharm Bull. 2014;4(1):55-60. doi: 10.5681/apb.2014.009. Epub 2013 Dec 23.
PMID:
I've twiddled with the code slightly, it is now:
Code: Select all
on mouseUp
put fld "Results2" into tText
put wordoffset("244",tText) into tStart
put wordoffset("PMID:",tText) into tEnd
put word tStart to tEnd of tText into fld "box1"
delete word tStart to tEnd of tText
put wordoffset("244",tText) into tStart
put wordoffset("item",tText) into tEnd
delete word tStart to tEnd of tText
put wordoffset("244",tText) into tStart
put wordoffset("PMID:",tText) into tEnd
put word tStart to tEnd of tText into fld "box2"
end mouseUp
but this only returns the number 24409415 in box 2. Where am I going wrong?
Thanks all for helping, you guys rock!


Re: Sorting data
Sorry Thierry that was to you (and Jacque too) LOL!!! coding getting to my head! 

Re: Sorting data
I was wondering if that would happen. The example I gave will only find a single instance, as you discovered. To find all instances you'll want a repeat loop that uses the third, optional "skip" parameter. That tells the offset function how much text to skip before it looks up the next instance. Note that the reported result is not counted from the start of the text; it is counted from the end of the skipped portion. So you must add the number of skipped words to the result in order to get the absolute position of the match.
Here's one way to do it:
Code: Select all
on lookup
   put fld "source" into tText
   put 0 into tSkip
   put 1 into x
   repeat
      put wordoffset("Select",tText,tSkip) into tFound
      if tFound = 0 then exit repeat -- no more entries
      put tFound + tSkip into tStart -- convert to an absolute word number
      put wordoffset("PMID:",tText,tStart) into tFound
      if tFound = 0 then exit repeat -- entry has no closing marker
      put tFound + tStart into tEnd
      put word tStart to tEnd of tText into aFound[x]
      add 1 to x
      put tEnd into tSkip -- tEnd is already absolute, so don't add it
   end repeat
end lookup
I used "select" as the lookup term rather than 244 for a couple of reasons. The number will eventually change as the number of indexed articles grows. That may be far enough into the future that it won't affect your script, but it's more robust not to rely on it. The other, better reason is that "244" occurs twice after the first article, and I assume you only want the actual text of the second entry without the header above it. If you don't want the words "Select item" included in your results, you can delete the first two words of the found text before putting it into the array. Or just add 2 to the tStart position after it's initially located.
This script collects the entries into an array. You'll need to place each one into a field. If your fields are named consistently, like "box 1" and "box 2", then it shouldn't be hard to use the array keys to construct and/or identify the correct field name that should hold the text. But if you need help with that, let us know.
Jacqueline Landman Gay | jacque at hyperactivesw dot com
HyperActive Software | http://www.hyperactivesw.com
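As a rough illustration of the last step described above (building the field name from the array key), something along these lines might work. It is only a sketch: fillBoxes is a made-up handler name, and it assumes the fields are called "box1", "box2" and so on, as in the earlier scripts in this thread; adjust the name if yours contain a space.

```livecode
-- Hypothetical follow-up: pFound is the array built by the lookup loop,
-- with keys 1, 2, 3 ...; field names "box1", "box2" ... are an assumption.
command fillBoxes pFound
   local tKey
   repeat for each key tKey in pFound
      -- skip entries that have no matching field on the card
      if there is a field ("box" & tKey) then
         put pFound[tKey] into field ("box" & tKey)
      end if
   end repeat
end fillBoxes
```

Since the array is local to the lookup handler, the simplest way to use this would probably be to call fillBoxes aFound just before "end lookup".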
Re: Sorting data
Forgot to mention, you are getting the odd "244" line in field 2 because that's what follows the "PMID" line, so it matches a "244" lookup. But I'm wondering if the format of the text actually has that ID number on a separate line, or does it follow immediately after the "PMID:" ? That is, does it look like this:
Select item 244094151.
Cloning of Soluble Human Stem Cell Factor in pET-26b(+) Vector.
Asghari S, Shekari Khaniani M, Darabi M, Mansoori Derakhshan S.
Adv Pharm Bull. 2014;4(1):91-5. doi: 10.5681/apb.2014.014. Epub 2013 Dec 23.
PMID: 24409415
[PubMed]
Related citations
Select item 244094102.
Nucleostemin depletion induces post-g1 arrest apoptosis in chronic myelogenous leukemia k562 cells.
Seyed-Gogani N, Rahmati M, Zarghami N, Asvadi-Kermani I, Hoseinpour-Feyzi MA, Moosavi MA.
Adv Pharm Bull. 2014;4(1):55-60. doi: 10.5681/apb.2014.009. Epub 2013 Dec 23.
PMID:
Because if so, there's an easier way to parse out the entries than using offset.
Jacqueline Landman Gay | jacque at hyperactivesw dot com
HyperActive Software | http://www.hyperactivesw.com
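The thread does not show which easier method was meant, so the following is only one possibility and not necessarily what jacque had in mind. It walks the text line by line and assumes that every entry starts with a line beginning "Select item" and ends with a line beginning "PMID:"; the function name splitEntries is made up for illustration.

```livecode
-- Hypothetical line-based splitter: returns an array of entries keyed 1..n.
function splitEntries pText
   local tLine, tEntry, tEntries, tCount, tInEntry
   put 0 into tCount
   put false into tInEntry
   repeat for each line tLine in pText
      if tLine begins with "Select item" then
         -- a new entry starts here; drop anything collected before it
         put true into tInEntry
         put empty into tEntry
      end if
      if tInEntry then put tLine & return after tEntry
      if tInEntry and tLine begins with "PMID:" then
         -- the PMID line closes the entry
         add 1 to tCount
         put tEntry into tEntries[tCount]
         put false into tInEntry
      end if
   end repeat
   return tEntries
end splitEntries
```

If the ID number turns out to sit on its own line after "PMID:", this sketch would leave it out of the entry, so the closing test would need adjusting.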