Just One Array Intersect Question
Posted: Fri Jan 18, 2008 5:51 pm
by deeverd
Hello again,
In my quest to find the fastest way to get a numbers-only return when matching the contents of a text file against a database list, I am now convinced that the intersect command is the quickest way. However...
Here's the problem:
I first turn my text file into an array, and then I turn my database list into an array, and then I do the following:
Code: Select all
intersect textArray with databaseArray
All this works fine, but so far all I can get back is the number of keys in textArray that match databaseArray. What I really need to get is the amount of items/contents that remain inside each of those keys that are inside the keys of the new intersected array.
According to the Rev documentation, the contents of the matching keys should remain unchanged from the original, but no matter how many hundreds of times I tried, I can't find a way to get a count of the items stored inside those keys, or to put that count into some counter field. I know there has to be an easy way, but so far it has done a great job of eluding me.
Help! Thanks, deeverd
Posted: Fri Jan 18, 2008 7:41 pm
by Mark
Hi Deeverd,
I don't know what you are trying to do, but maybe this helps:
Code: Select all
put the number of lines of the keys of mySomeArray
Best,
Mark
Posted: Fri Jan 18, 2008 8:24 pm
by deeverd
Hi Mark,
Thanks, but that's exactly the script I tried, plus more than a hundred variations of it, including "items," "words" and anything else I could think of.
Code: Select all
put the number of lines of the keys of myTextArray
All this script does is return the number of keys; it does not return the number of items found inside each of those keys.
Currently, my workaround solution to this problem is to make a list of the words that intersected, and then put them into a repeat loop that looks at a copy of the text, and then replaces each of those intersect words in the text with empty. I then just compare the beginning word count of the original text to the word count after replacing those matched words with empty to come up with a number that tells me how many total matches were actually made. So I have a way that works pretty fast, but I was certainly hoping for a way to count all the contents of a key after intersecting it.
Thanks for trying. All the best, deeverd
Posted: Fri Jan 18, 2008 11:45 pm
by Mark
Deeverd,
Maybe this?
Code: Select all
put the number of words of the keys of myArray into myNrOfWords
combine myArray by return and tab -- or other delimiters
put (the number of words of myArray) - myNrOfWords into myNrOfWords
split myArray by return and tab
At the end of this little code snippet, the variable myNrOfWords contains the total number of words of all elements of the array.
If the data contain tabs or returns, you need to use different delimiters (I often use numToChar(4) and numToChar(5), for instance).
Best,
Mark
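A minimal sketch of the subtraction trick above, run on made-up sample data (the fruit names and colours are assumptions chosen only for illustration), with an answer dialog to show the result:

```livecode
on mouseUp
   -- hypothetical sample data: one "key <tab> element" pair per line
   put "apple" & tab & "red green" & return & "pear" & tab & "yellow" into myArray
   split myArray by return and tab
   -- words in the keys alone: apple, pear
   put the number of words of the keys of myArray into myNrOfWords
   -- after combining, the variable is plain text holding keys and elements together
   combine myArray by return and tab
   put (the number of words of myArray) - myNrOfWords into myNrOfWords
   -- turn the variable back into an array for later use
   split myArray by return and tab
   answer myNrOfWords -- 5 words in total minus 2 key words leaves 3
end mouseUp
```

Note that if every element of the array is empty, the combined text holds only the keys, so the same subtraction gives 0.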
Posted: Tue Jan 22, 2008 5:21 pm
by deeverd
Hi Mark,
Thanks heaps. I will most certainly give your script a try. I've been away from the internet for the past few days, so my apologies for not responding sooner.
The workaround "solution" I mentioned last week worked quite fast, but it was only today that I realized the obvious glitch in it. While the intersect does indeed create a superfast list of matching text (without telling me how many items that each of those keys contain... yet), only this morning it dawned on me that using the list of matching text in a repeat loop to "replace myMatchText with empty in myArrayVar" only succeeded in replacing parts of some of the words.
For instance, if a word like "rainbow" was in the matchingText, but the plural of that same word, "rainbows," was in the manuscriptText, it would replace rainbow with empty but leave the "s." This meant that when I did my before and after word count of the manuscript to get a numbers-only return of matches, "s" was still counted as a word, and so my count was definitely not accurate. So this is what it's like to be a programmer?! I guess I could probably try setting the match whole text to true, and see if that works, but if your script works, it would definitely be a lot faster and more efficient.
Thanks again. Looking forward to telling you how it went. Cheers, deeverd
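The partial-word problem described above comes from replace working on characters rather than on whole words. A rough sketch of one alternative is to test each manuscript word with "is among the words of". The function and parameter names here are hypothetical, and this loop is likely to be slow for large word lists:

```livecode
function countWholeWordMatches pManuscript, pMatchList
   put 0 into tMatches
   repeat for each word tWord in pManuscript
      -- "rainbows" is not among the words of "rainbow", so it is not counted
      if tWord is among the words of pMatchList then add 1 to tMatches
   end repeat
   return tMatches
end countWholeWordMatches
```

For example, countWholeWordMatches("rainbow rainbows rainbow", "rainbow") counts the two exact matches and skips the plural. Words with punctuation attached, such as "rainbow," with a trailing comma, would also be skipped unless the punctuation is removed first.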
Posted: Tue Jan 22, 2008 7:00 pm
by deeverd
Hi Mark,
I'm still only getting a return of the number of keys, but I'm not getting a return of the contents inside them.
I've put together a scaled down script of my code, so you can see what I'm basically doing, but it's still not working:
Code: Select all
on mouseUp
# Just showing the user it's busy:
set the cursor to watch
# This opens up the text file that will later be compared to the database list
answer file "Select a text file for input:"
if it is empty then exit mouseUp
put it into textFile
open file textFile for read
read from file textFile until eof
put it into myTextArray
close file textFile
# This is my total word count of the manuscript text:
put the number of words of myTextArray into field "beforeCounter"
# Now I'm formatting the manuscript into an array
replace space with comma in myTextArray
replace comma with tab & return in myTextArray
sort myTextArray
put myTextArray into field "wordList"
split myTextArray by return and tab
# This opens up the database list --
answer file "Select a database list for comparison:"
if it is empty then exit mouseUp
put it into dbFile
open file dbFile for read
read from file dbFile until eof
put it into myDatabaseArray
close file dbFile
# Formatting the list to become an array:
replace space with comma in myDatabaseArray
replace comma with tab & return in myDatabaseArray
split myDatabaseArray by return and tab
#Intersect takes place here:
intersect myTextArray with myDatabaseArray
# This is where I'm trying to count the contents inside the keys
# that are leftover after the intersect takes place.
put the number of words of the keys of myTextArray into myNrOfWords
put myNrOfWords into field "totalMatches"
end mouseUp
The return I receive in field "totalMatches" is still just the number of keys.
According to the documentation, the intersect command shouldn't be causing a problem or deleting anything out of the keys, but it must.
Any suggestions?
All the best, deeverd
Posted: Tue Jan 22, 2008 7:06 pm
by Mark
Hi Deeverd,
Have you actually tried the example I gave earlier? It should work.
Best,
Mark
Posted: Tue Jan 22, 2008 7:32 pm
by deeverd
Hi Mark,
I tried it right away with my original code with no success, but haven't tried it with the reduced code I posted as an example. There's always the chance that something I had in the bigger script got in the way. So I'll give it a try in a couple hours when I can get back to my computer with my Revolution program on it. I'll let you know right away what I find.
Thanks, deeverd
Posted: Tue Jan 22, 2008 8:09 pm
by Mark
Deeverd,
You should not try your original code! You should try my code. Take my example and apply your own data to it. That's all.
Best,
Mark
Posted: Tue Jan 22, 2008 8:10 pm
by deeverd
Hi Mark,
I was able to hurry up and get to my computer at the start of lunch. I just tried the script you sent, exactly as you described, which I had honestly already tried verbatim before. Here's the basic code I'm using, and I've pasted it here so that it can easily be cut and pasted elsewhere and run right away:
Code: Select all
on mouseUp
# Just showing the user it's busy:
set the cursor to watch
# This opens up the text file that will later be compared to the database list
answer file "Select a text file for input:"
if it is empty then exit mouseUp
put it into textFile
open file textFile for read
read from file textFile until eof
put it into myTextArray
close file textFile
# This is my total word count of the manuscript text:
put the number of words of myTextArray into field "beforeCounter"
# Now I'm formatting the manuscript into an array
replace space with comma in myTextArray
replace comma with tab & return in myTextArray
sort myTextArray
put myTextArray into field "wordList"
split myTextArray by return and tab
# This opens up the database list --
answer file "Select a database list for comparison:"
if it is empty then exit mouseUp
put it into dbFile
open file dbFile for read
read from file dbFile until eof
put it into myDatabaseArray
close file dbFile
# Formatting the list to become an array:
replace space with comma in myDatabaseArray
replace comma with tab & return in myDatabaseArray
split myDatabaseArray by return and tab
#Intersect takes place here:
intersect myTextArray with myDatabaseArray
# This is where I'm trying to count the contents inside the keys
# that are leftover after the intersect takes place.
put the number of words of the keys of myTextArray into myNrOfWords
combine myTextArray by return and tab
put (the number of words of myTextArray) - myNrOfWords into myNrOfWords
split myTextArray by return and tab
# I added these other test fields, just to get a visual of the numbers being returned:
put myNrOfWords into field "totalMatches"
put myTextArray into field "field"
put the number of words of field "field" into field "finalCount"
end mouseUp
When I use the script
Code: Select all
put (the number of words of myTextArray) - myNrOfWords into myNrOfWords
what it returns is 0 because it is still returning only the number of keys (but not the number of contents inside each of those keys) and subtracting that same number from itself.
I don't understand why it's not working because it looks as if it should work perfectly. I still think the intersect must somehow knock out the contents. I just don't know.
all the best, deeverd
Posted: Wed Jan 23, 2008 12:32 am
by Mark
Hi Deeverd,
This part of your script...
Code: Select all
replace space with comma in myTextArray
replace comma with tab & return in myTextArray
sort myTextArray
put myTextArray into field "wordList"
split myTextArray by return and tab
is not completely wrong, but I would use the following alternative:
Code: Select all
replace comma with space in myTextArray
replace space with cr in myTextArray
filter myTextArray without empty
sort myTextArray
put myTextArray into field "wordList"
split myTextArray by return and tab
At the end of your script, you have:
Code: Select all
put myTextArray into field "field"
Using my example, you have turned the variable myTextArray back into an array before you try to put it into a field. Don't split the variable if you have no need for an array and want to display its contents in a field. If you do need the array, split it, but combine it again before displaying the contents in a field.
Unfortunately, I STILL don't know what you want! From your first two messages, I understand that you expect to get an array with keys and elements, and that you want to know the total number of words in the elements of the array:
"What I really need to get is the amount of items/contents that remain inside each of those keys that are inside the keys of the new intersected array."
Unfortunately, your approach results in an array with keys and empty elements! What exactly do you expect to see in the elements of the array? What data are you using exactly, and how and why would it result in an array with non-empty elements?
Best,
Mark
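A small sketch of why the elements come out empty, using a made-up sentence as sample text (an assumption for illustration only). Each word is followed by a tab and nothing else, so split turns the word into a key with an empty element, and repeated words collapse into a single key:

```livecode
on mouseUp
   put "the cat and the hat" into tText -- hypothetical sample text
   replace space with tab & return in tText
   split tText by return and tab
   -- each word became a key; nothing follows the tab, so every element is empty
   put the keys of tText into tKeys
   put 0 into tElementWords
   repeat for each line tKey in tKeys
      add the number of words of tText[tKey] to tElementWords
   end repeat
   -- shows 4 keys (the, cat, and, hat) and 0 words in the elements
   answer "keys:" && (the number of lines of tKeys) & return & "words in elements:" && tElementWords
end mouseUp
```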
Posted: Wed Jan 23, 2008 1:19 am
by Mark Smith
Deeverd, can I just make sure I understand what you're trying to do?
This is what I think:
You have two lots of text, and you want to know how many words are common to both lots of text.
Is that right?
Best,
Mark Smith
Posted: Wed Jan 23, 2008 1:26 am
by Mark
Hi Mark,
I think that's right, but I wonder why Deeverd chose the approach he is using.
Mark
Posted: Wed Jan 23, 2008 1:37 am
by Mark Smith
On second thoughts, are you after the number of common words, and also the number of words unique to each text? If so, then I've modified your script:
deeverd wrote:
Code: Select all
on mouseUp
# Just showing the user it's busy:
set the cursor to watch
# This opens up the text file that will later be compared to the database list
answer file "Select a text file for input:"
if it is empty then exit mouseUp
put it into textFile
open file textFile for read
read from file textFile until eof
put it into myTextArray
close file textFile
# This is my total word count of the manuscript text:
put the number of words of myTextArray into field "beforeCounter"
# Now I'm formatting the manuscript into an array
replace space with comma in myTextArray
replace comma with tab & return in myTextArray
sort myTextArray
put myTextArray into field "wordList"
split myTextArray by return and tab
-- here, duplicate myTextArray so you have a copy that won't be
--changed by the intersect
put myTextArray into origTextArray
# This opens up the database list --
answer file "Select a database list for comparison:"
if it is empty then exit mouseUp
put it into dbFile
open file dbFile for read
read from file dbFile until eof
put it into myDatabaseArray
close file dbFile
# Formatting the list to become an array:
replace space with comma in myDatabaseArray
replace comma with tab & return in myDatabaseArray
split myDatabaseArray by return and tab
#Intersect takes place here:
intersect myTextArray with myDatabaseArray
-- at this point, the keys of myTextArray are the common words, so:
put the number of lines in the keys of myTextArray into numCommon
put (the number of lines in the keys of origTextArray) - numCommon into numUniqueText
put (the number of lines in the keys of myDatabaseArray) - numCommon into numUniqueDataBase
....
Hope I've understood
Best,
Mark Smith
Posted: Wed Jan 23, 2008 4:50 pm
by deeverd
Hello Both Marks,
Thanks big time. I know I'm on the forum quite a bit (only, however, after hours of fruitless struggling beforehand), but it's still quite humbling each time to receive the help and consideration of strangers who are experts in programming in other parts of the world. On the other hand, there's nothing else like it...
Now to answer some of your questions. I have cut and pasted each of your scripts and tried them more than once to make sure I know exactly what they're doing, and so now I find that I am getting some interesting returns on figures that I wasn't really after but they may prove to be helpful nonetheless, because I hadn't thought about those returns before. For instance, your script now creates a return of the number of keys that are unique to both the text array and the database array.
Here's what I'm actually trying to get a return on:
Let's say you took any book that was in text format. Let's say that book was "Robinson Crusoe." Now let's say you have a database that contains a big list of island words as an example. If the database contained the word "shipwreck" and a match was found in the manuscript, I need to know not only how many unique words from the manuscript matched words in the database, but also how many times each of those match words occurred in the manuscript altogether.
I'm thinking that the problem so far has been that I mistakenly thought it was possible in one line of script to receive a return on all the words found in all the keys of an array. I now think I have to ask for each of those match words by name with brackets to get an actual return of the amount of contents that are found in each key. I'm not exactly sure how to call out those contents or the number of their contents, but I'm thinking it would be something like
Code: Select all
put myTextArray[myWord] after field "matchResults"
and then I could quickly get rid of the delimiters and receive a return on the number of words that were placed in the field "matchResults."
I'm thinking that if I use a repeat loop with a counter, I could put each of the match words from the intersect into the variable myWord, one at a time, to get the contents out of each key of myTextArray. There's a big possibility that by using your idea to create a copy of the manuscript array before the intersect, I can get the keys out of the copy.
I'll let everyone know as soon as I can give it a try, which will be later in the day.
Oh yeah, there was the question of why I am using this approach. I've only been programming with Revolution for 11 months now (with no prior programming experience), so that probably answers a lot of that question right there. However, I spend more hours programming a week than anyone I know and have built scores of programs, so I've gotten a lot of experience in that limited time. As for the approach in this program, I've been able to accomplish what I'm after in various other ways, but they took way too long to get the results (sometimes up to 5 minutes with small manuscripts of fewer than 3,000 words). Until I discovered the "intersect" command, it was way too slow to be practical in a big program that does lots of other things.
Anyway, I hope that all sheds some light on the madness of my method. Cheers, deeverd
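One possible way to get both numbers described in this post is sketched below. It is only a sketch under stated assumptions: the function and variable names are hypothetical, words are assumed to be separated by whitespace with no punctuation attached, and array keys are compared with the default case-insensitive setting. The idea is to store a running count in each element, so that after the intersect the elements still hold how often each matched word occurred:

```livecode
function countMatches pManuscript, pWordList
   -- tally every word of the manuscript: key = word, element = occurrences
   repeat for each word tWord in pManuscript
      add 1 to tCount[tWord]
   end repeat
   -- turn the database word list into array keys
   repeat for each word tWord in pWordList
      put true into tWanted[tWord]
   end repeat
   -- keep only the manuscript words that also appear in the word list
   intersect tCount with tWanted
   -- add up the occurrences stored in the surviving elements
   put 0 into tTotal
   repeat for each key tWord in tCount
      add tCount[tWord] to tTotal
   end repeat
   -- item 1: number of distinct matched words, item 2: total occurrences
   return (the number of lines of the keys of tCount) & comma & tTotal
end countMatches
```

For example, countMatches("the ship hit a reef and the ship sank", "ship reef island") returns 2,3: two distinct match words ("ship" and "reef") and three occurrences in total.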