Randomize Lines of a Large Text File
Posted: Fri May 28, 2021 1:00 pm
by xoxiwe
hello everybody,
hope you're doing well.
I was trying to randomize the lines of a large text file (around 100 MB). At first, I used this method:
Code: Select all
put empty into temp
answer file "Select a Text File:" with type "Text File|txt"
put it into ifile
put url ("file:" & ifile) into temp
set the itemdel to cr
sort lines of temp by random(number of lines of temp)
put temp into field "f1"
Since it was taking a long time and I never got the randomized output (the application was crashing after a while), I had to rewrite the code in the following way:
Code: Select all
answer file "Select a Text File:" with type "Text File|txt"
put it into ifile
put url ("file:" & ifile) into temp
repeat the number of lines of temp
   get random(the number of lines of temp)
   put line it of temp & cr after temp2
   delete line it of temp
end repeat
put temp2 into fld "f1"
put empty into temp
put empty into temp2
Either way, I couldn't get the desired output, since the app was crashing after a while too.
If any of you can show me a way to do this faster without the app crashing, it would be much appreciated. In case someone wants to know what's inside my text file, here are a few samples:
5122487-simba2006
3080803-fantasia10
12516196-19diver70
7238503-kreidler1
1671514-yamaha
Re: Randomize Lines of a Large Text File
Posted: Fri May 28, 2021 2:00 pm
by dunbarx
Hi.
Both handlers use time-tested methods to do what you want. You have several unnecessary lines, but they should not impact the process.
Here is what I suspected, and tested. On a new card I made two fields and a button. I put a dozen short lines of text into fld 1, and this in the button script:
Code: Select all
on mouseUp
   get fld 1
   repeat 1000
      put it & return after temp
   end repeat
   sort temp by random(the number of lines of temp)
   put temp into fld 2
end mouseUp
Running this, I got (on Mac) the beachball of death, but it shortly resolved itself and the randomized text appeared in fld 2. The large dataset you are working with probably just takes quite a while. It is the "sort" line where the program spends its time. The OS thinks the program has fallen into an infinite loop, and responds with the beachball.
LC has not "crashed", it is just working hard.
Try your first handler again, but simply place a number in the random "seed", something like "sort whatever by random(9999)". This should fix it.
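In other words, the first handler would become something like this (an untested sketch, reusing the variable names from your post):
Code: Select all
answer file "Select a Text File:" with type "Text File|txt"
put it into ifile
put url ("file:" & ifile) into temp
sort lines of temp by random(9999) -- fixed number rather than the line count
put temp into field "f1"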
Craig
Re: Randomize Lines of a Large Text File
Posted: Fri May 28, 2021 2:19 pm
by dunbarx
Playing around with this, LC does indeed seem to slow to a crawl with a couple of hundred thousand lines of original text.
Anyone know anything about the working limits of this sort of thing?
Craig
Re: Randomize Lines of a Large Text File
Posted: Fri May 28, 2021 5:54 pm
by FourthWorld
There are likely ways to serve the goal of this app quite efficiently, once we know what the goal is.
I'm guessing the user is not expected to read 100 MB of randomized text - is that correct?
If so, is displaying the text in a field necessary?
What is done with the list after it's randomized?
Re: Randomize Lines of a Large Text File
Posted: Fri May 28, 2021 10:11 pm
by SparkOut
1) the apparent "crash" - which is just lack of responsive interface - is because a tight repeat loop will hog all the processor cycles and not give the engine an opportunity to update the screen or poll the keyboard, etc.
Inside the repeat loop add the line "wait 0 milliseconds with messages", and that will give the engine a pause as close to 0 as possible, which gives it a chance to do those housekeeping tasks.
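Applied to your second handler, the loop would look something like this (just a sketch):
Code: Select all
repeat the number of lines of temp
   get random(the number of lines of temp)
   put line it of temp & cr after temp2
   delete line it of temp
   wait 0 milliseconds with messages -- lets the engine redraw and handle events
end repeat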
2) the speed will not be improved by that wait statement, and handling a file of 100MB might not be the best approach to take, as Richard mentioned. What is the desired outcome? For what purpose are you doing the randomisation in a file that large?
Re: Randomize Lines of a Large Text File
Posted: Fri May 28, 2021 10:20 pm
by bogs
By the way, you don't have to be so stingy, give it a whole 10ms, which is a WHOPPING 0.01 of a second. Interestingly, I was running some polling tests on the mouse in a return loop, and found increasing the 'wait with messages' time sometimes actually *did* improve performance, and dropped the cpu usage quite a bit, which made the whole routine not only run better, but seem quite a bit snappier.
Re: Randomize Lines of a Large Text File
Posted: Sat May 29, 2021 3:17 pm
by dunbarx
All good stuff.
The beachball can be eliminated by giving the OS just a little relief.
But Richard makes the real point, that is: why do we need to randomize a 100 MB dataset? And read it as well? An old saw: "if you are dealing with something outlandish, rethink from the beginning".
So then, what is desired and required?
Craig
Re: Randomize Lines of a Large Text File
Posted: Sat May 29, 2021 4:10 pm
by xoxiwe
Thanks to all of you who took your valuable time to help me.
For those interested in my purpose: the data in that large text file are categorized. Say, for example, the first 1/3 is cat 1, the next 1/3 is cat 2, and the last 1/3 is cat 3. But I don't want it like that; I want it all mixed up. Therefore a repeat of 1000 or so won't give me my desired result, since I want even the last line to have the chance to become the first, second, or third line, and so on.
If anyone can give me an idea so my OS (Windows 10) can handle data that large without a crash, it would be much appreciated.

Re: Randomize Lines of a Large Text File
Posted: Sat May 29, 2021 7:59 pm
by FourthWorld
The Bible is roughly 4MB of text. Your program is putting the equivalent of 25 bibles into a field for display. Given both the length and the nature of your text, I'm guessing you don't expect the user to read it.
Does it need to be displayed in a field at all?
Right off the bat you'll save significant time by not doing that. Field rendering requires a great many steps under the hood, all the way down to antialiased font rendering for every character.
If you would kindly tell us what your program does with the randomized text, we may be able to speed it up even more, far beyond the gains from bypassing unneeded field rendering.
But the methods for doing so are varied, so it won't be possible to provide useful guidance until we know how the text is used.
Re: Randomize Lines of a Large Text File
Posted: Sat May 29, 2021 8:54 pm
by SparkOut
What Richard said.
Just for info, I made a dummy file with a repeat loop that created a series of lines "This is line" && line number & cr
It took 5 million iterations to make a file over 100MB. While there are certainly sources that create that much data in a single file, operating over millions of data lines is not going to be optimal. The question is why do you need to process it, to what end?
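If anyone wants to reproduce the test, the dummy file can be built with something along these lines (the output path is just a placeholder):
Code: Select all
on mouseUp
   repeat with i = 1 to 5000000 -- roughly 100 MB of text
      put "This is line" && i & cr after tData
   end repeat
   put tData into URL ("file:" & specialFolderPath("desktop") & "/dummy.txt")
end mouseUp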
Re: Randomize Lines of a Large Text File
Posted: Sun May 30, 2021 7:17 pm
by xoxiwe
well, my program checks every line in a text file based on some given condition, and gets me the lines according to the condition given
What I need is to randomize the lines to improve the chance of getting the expected lines faster, since the lines are categorized and I don't know at which line a new category starts.
Also, I forgot to mention that showing the outcome in a field isn't necessary.
I have already made a tool using LiveCode which can reverse all the lines within a short period of time, and it works perfectly when I want a reversal. What I need now is to shuffle the lines so the middle ones have a chance to move up.
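For reference, a simple way to reverse lines in LiveCode looks like this (just an illustration, not necessarily how my tool does it):
Code: Select all
repeat for each line L in tData
   put L & cr before tReversed
end repeat
delete char -1 of tReversed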
I understand this may be somewhat complex and may not meet my expectations, since handling such a large file isn't easy and might cause a crash.
I'm also sorry to all of you who have spent your valuable time on me. Thanks to you all.

Re: Randomize Lines of a Large Text File
Posted: Sun May 30, 2021 8:33 pm
by jacque
I'd use an array, which is very fast and supports huge numbers of entries. The method would be to convert the data to an array, sort the keys randomly, and then check each key in its sorted order to see if the info you want is the key's element. This is one way:
Code: Select all
local sData -- array of all data
local sRandomList -- list of randomized keys

on mouseUp
   put makeData(fld 1) into sData -- here you would import your data instead of creating test data
   randomizeTxt -- this creates a randomly sorted list of keys
end mouseUp

on randomizeTxt
   split sData by cr
   put keys(sData) into tKeys
   sort tKeys by random(the number of lines in tKeys)
   put tKeys into sRandomList
end randomizeTxt

function makeData pText -- create fake test data; you won't need this
   repeat 100
      put pText & return after temp
   end repeat
   return temp
end makeData
If your data uses unique leading numbers, you could split the data by cr and "-". You probably wouldn't need to sort the keys in that case since the numbers in your example look pretty random to me already.
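That would look something like this (using a couple of the sample lines posted earlier):
Code: Select all
put "5122487-simba2006" & cr & "3080803-fantasia10" into tData
split tData by cr and "-"
-- now tData["5122487"] contains "simba2006" and tData["3080803"] contains "fantasia10"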
To search the array for specific info:
Code: Select all
on searchArray pText
   repeat for each line k in sRandomList -- walk the keys in their randomized order
      if sData[k] = pText then return sData[k] -- replace this test with whatever condition you need
   end repeat
end searchArray
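A call might look like this (searchArray is a command handler, so the match comes back in the result; the value is just one of the sample lines posted earlier):
Code: Select all
searchArray "5122487-simba2006"
put the result into tMatch -- empty if nothing matched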
Re: Randomize Lines of a Large Text File
Posted: Mon May 31, 2021 11:25 pm
by jiml
You might want to consider using a database.
Re: Randomize Lines of a Large Text File
Posted: Tue Jun 01, 2021 1:25 pm
by AxWald
Hi,
guess it's sorting a big bunch of data that's the problem here. So just avoid sorting a big bunch of data ;-)
This looked interesting, so I played with it a bit. See my results:
To avoid having to sort huge amounts of data, I first create an array with the line numbers as keys, and add 2 random values to each key. Then I sort the list of keys twice, once by each of the values. This gives a list containing every line number, in randomized order:
Code: Select all
function getLiNus theNum
   /* Creates a list containing all numbers from 1 to "theNum",
      in randomized order, each on its line */
   put 1 into myCnt
   repeat theNum
      put random(99999) into myArr[myCnt][1] -- create sort array
      put random(99999) into myArr[myCnt][2]
      add 1 to myCnt
   end repeat
   get the keys of myArr
   sort lines of it numeric by myArr[each][1] -- sort it
   sort lines of it numeric by myArr[each][2]
   return it
end getLiNus
This is sufficiently fast, and sufficiently randomized (you might play with the number of keys, and the random(x) value). And this is how you utilize it:
Code: Select all
on mouseUp
   answer file "Which file?"
   if it is empty then exit mouseUp
   put URL ("file:" & it) into myData
   put 1 into myCnt
   repeat for each line L in myData -- create data array
      put L into myDatArr[myCnt]
      add 1 to myCnt
   end repeat
   put getLiNus(the number of lines of myData) into mySort -- fetch the sort list
   repeat for each line L in mySort -- and apply it
      put myDatArr[L] & CR after myRes
   end repeat
   delete char -1 of myRes
   put myRes
end mouseUp
Hint: For easy reading, this code is stripped of everything not strictly necessary. For instance, each repeat loop should start with:
Code: Select all
wait 0 millisec with messages
if the controlKey is down then exit repeat
A problem I ran into is that it becomes cumbersome to fetch single lines of our text data once it's a really phat chunk with thousands of lines. So I start by throwing all lines of our data into an array first, with [lineNumber] as the key. This way I can access them much faster later, once I have my sort order :)
For a file of ~55 MB and ~1,000,000 lines I get (milliseconds):
Code: Select all
Make data array: 4757 | Rearrange lines of data: 5570
Make sort array: 6295 | Sort sort array: 3924
Over all: 20867
Lines: 1000436
Remark: Even with "wait 0 with messages" this still runs into "unresponsiveness" during the last loop (rearranging the lines).
Anyway, perhaps something here is useful for someone. Have fun!
Re: Randomize Lines of a Large Text File
Posted: Tue Jun 01, 2021 5:56 pm
by FourthWorld
xoxiwe wrote: ↑Sun May 30, 2021 7:17 pm
well, my program checks every line in a text file based on some given condition, and gets me the lines according to the condition given
In my reading that sounds like what you need is nearly the opposite of random, something very specific.
What is done with the line after it is retrieved?
And how often is the list file updated?