reading 1 line of text at a time

townsend · Post by **townsend** » Mon Jul 16, 2012 10:01 pm

Rather than read a text file into memory in one fell swoop with the

put url("file:monster.txt") into field "view"

I want to read one line at a time and selectively distribute lines to different controls.

This is just a test script to get the syntax right. Neither the Dictionary nor the Lessons are clear about using both the read from "file:filename" for 1 line at a time and the until EOF together. Maybe someone can tell me what's wrong with this code.

Code:

on mouseUp
     local temp
     answer files "Select the files you wish to process:"
     put it into temp
     open file temp
     repeat until eof
          read from file temp for 1 line 
          put it into temp
          put temp after fld "view"
     end repeat
     close file temp
end mouseUp

dunbarx · Post by **dunbarx** » Mon Jul 16, 2012 10:26 pm

You may just want to read until "return". This should give you the file line by line.

HC did, and I think LC does also, maintain a pointer to the character last read from the same file. So you can read successive lines with separate invocations of your handler.

But it would be much faster and more robust to read the entire file, put it into a variable, and then process line by line in a single handler. Is this not feasible?

As I read the dictionary, the form "for 1 line" seems to apply only to something peculiar to Unix, unless I have this wrong.

Craig Newman

townsend · Post by **townsend** » Mon Jul 16, 2012 10:46 pm

dunbarx wrote: But it would be much faster and more robust to read the entire file, put it into a variable, and then process line by line in a single handler. Is this not feasible?

Yes-- I've been doing that all along. I just thought, since it is a large file,
it would be more efficient to read the data in one line at a time,
thereby never having two copies in memory at the same time.

Thanks for the reply Craig-- I'll take your advice and do it the easy way.
Preserving memory is probably a misplaced concern in this day and age.

mwieder · Post by **mwieder** » Mon Jul 16, 2012 10:53 pm

Nonetheless, this looks like a bug. I just tried the "until cr" form and it stays on the first line of the text file. I'm sure this used to work at some point in the past, and it's documented that the "read from file" command should resume from the last invocation if you don't specify a start offset, but it doesn't do that.

That said, I don't use this form, opting for the method Craig mentioned: reading the whole file into a variable and then working with the variable. You'll find it's much faster that way as well, since you don't have the overhead of multiple accesses to a disk file. You can also take advantage of the "repeat for each" loop, which will gain you another order of magnitude in speed.
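The whole-file approach described above might look roughly like the sketch below. It is only an illustration under a few assumptions: the field "view" comes from the original post, the "skip empty lines" test is a placeholder for whatever rule decides where each line goes, and the output is collected in a variable (tOutput, a name invented here) so the field is written once at the end.

```livecode
-- Sketch only: field "view" is from the original post; the filter is a placeholder.
on mouseUp
   local tPath, tData, tLine, tOutput
   answer file "Select the file you wish to process:"
   if it is empty then exit mouseUp -- user cancelled the dialog
   put it into tPath
   put URL ("file:" & tPath) into tData -- the whole file in one read
   if the result is not empty then
      answer "Could not read the file:" && the result
      exit mouseUp
   end if
   repeat for each line tLine in tData
      -- decide here which control each line belongs to
      if tLine is not empty then put tLine & return after tOutput
   end repeat
   delete the last char of tOutput -- drop the trailing return
   put tOutput into field "view" -- a single write to the field
end mouseUp
```

Keeping the file path in its own variable (tPath) also avoids reusing one variable for both the path and the data read from it.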

sturgis · Post by **sturgis** » Mon Jul 16, 2012 11:08 pm

One problem I see is that you open a file and then put its handle into temp.
Then you read a line and.. put it into temp. Which means it would read the first line, shove it into temp, then fail to read any more lines because temp no longer contains the file handle.

Also, I don't think you can "repeat until EOF" because it doesn't specify WHAT is EOF. What you need to do is read a line and check the result for "eof"; if it is "eof", set a flag to stop the repeat.

This is just a really quick ugly adjustment to your script but it seems to work ok.

Code:

local sStop
on mouseUp
   if sStop is empty then put true into sStop -- flag var I used.  No reason to declare temp it persists through the single handler run
   put not sStop into sStop -- toggles so I can stop the loop
   if not sStop then -- if we're supposed to be running..
      put empty into field "View" -- empty the field
      answer file "Select the file you wish to process:" -- ask for the file
      put it into temp -- save file handle
      open file temp -- open the file handle
      repeat until sStop -- repeat until sStop is true
         read from file temp for 1 line -- read a line
         put it && the result & return before msg --can kill this line it is just so I could watch what was happening
         put it into tempLine -- put the read line into tempLine (NOT temp, as mentioned above)
         if the result is "eof" then put true into sStop -- check for EOF in the result and set sStop if need be
         put tempLine after fld "view" --put the line after the field
         wait 10 milliseconds with messages -- wait with messages so we aren't processor locked and can click again to toggle out. 
      end repeat
      close file temp -- will fire on eof or toggle out. 
   end if

end mouseUp

sturgis · Post by **sturgis** » Mon Jul 16, 2012 11:13 pm

Hey Mark, got a curiosity question..

It should be possible to do something like

repeat for each line tLine in URL "file:my/file/path.txt"
--process tLine
end repeat

But it would probably be deadly slow? Or does it slurp the whole file up on the start of the repeat and then grab the lines from memory?

mwieder · Post by **mwieder** » Mon Jul 16, 2012 11:20 pm

<too lazy to test> but my guess is that the URL "file:my/file/path.txt" part just takes the place of the temp file specification and you still end up with a read of a disk file with each iteration through the loop.

...and <slaps head> good sleuthing on both the overloading of the temp variable and the EOF semaphore. I completely missed the fact that temp was being mangled there. The following also works:

Code:

on mouseUp
    local tFile, temp
     
    answer files "Select the file you wish to process:"
    put it into tFile
    open file tFile
    repeat until eof
        read from file tFile for 1 line 
        put it into temp
        if temp is empty then
            exit repeat
        end if
        put temp after fld "view"
    end repeat
    close file tFile
end mouseUp

sturgis · Post by **sturgis** » Mon Jul 16, 2012 11:25 pm

I might just have to do a comparison of the slurp, vs open file vs the repeat for each method. If repeat for each is fast enough when grabbing directly from a file it will become my new favorite method!

sturgis · Post by **sturgis** » Mon Jul 16, 2012 11:43 pm

Just did a quick test. I'm sure some improvement could be had with my file handling loop but it still shouldn't be THIS different. (reading line by line)

The file open, repeat through each line method is first, the slurp the file up method is second, and the repeat for each hitting the url directly is third. Here are the results (in milliseconds):

First method, file open: 9484
Second method, slurp: 1858
Third method, repeat for each url: 1804
Number of lines in the file: 661

Still don't know if the whole URL "file:blahblah" is slurped up at the beginning of the repeat for each, but wow it's fast. (this is on a slow hunk of a laptop or it would be zippier by far)
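For anyone wanting to repeat a comparison like this, a rough timing harness could look like the sketch below. This is not the script that produced the numbers above; timeReadMethods and all of its variable names are made up for illustration, and it only counts lines rather than writing to a field, so absolute timings will differ.

```livecode
-- Hypothetical helper: times three ways of walking the lines of a file.
-- Returns a short report with elapsed milliseconds for each approach.
function timeReadMethods pPath
   local tStart, tStatus, tLine, tData, tCount, tReport
   
   -- Method 1: open the file and read one line per call
   put 0 into tCount
   put the milliseconds into tStart
   open file pPath for read
   repeat forever
      read from file pPath for 1 line
      put the result into tStatus -- capture right after the read
      if it is not empty then add 1 to tCount
      if tStatus is not empty then exit repeat -- "eof" or an error
   end repeat
   close file pPath
   put "file open:" && (the milliseconds - tStart) & return after tReport
   
   -- Method 2: slurp the file into a variable, then loop over the variable
   put 0 into tCount
   put the milliseconds into tStart
   put URL ("file:" & pPath) into tData
   repeat for each line tLine in tData
      add 1 to tCount
   end repeat
   put "slurp:" && (the milliseconds - tStart) & return after tReport
   
   -- Method 3: repeat for each directly over the URL
   put 0 into tCount
   put the milliseconds into tStart
   repeat for each line tLine in URL ("file:" & pPath)
      add 1 to tCount
   end repeat
   put "repeat for each url:" && (the milliseconds - tStart) & return after tReport
   
   put "lines counted (last method):" && tCount after tReport
   return tReport
end timeReadMethods
```

It could be called from the message box with something like `put timeReadMethods("/path/to/test.txt")`, where the path is a placeholder; running it several times helps smooth out disk caching effects.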

mwieder · Post by **mwieder** » Mon Jul 16, 2012 11:55 pm

Looks like the "repeat for each" url method is making a copy of the whole file in memory and applying the repeat construct to the in-memory copy. So basically those two attempts are the same code after the engine optimizes things and compiles down to bytecode. The readings are within a small margin of error, and I bet if you ran the benchmark ten times you'd see the results converge.

sturgis · Post by **sturgis** » Tue Jul 17, 2012 12:00 am

Yep, did just that and they do indeed converge. Or at least close to converging. Haven't run enough times to get a solid feel. So far the 3rd method does "seem" (statistics you know) to be faster more often by a slightly greater margin but I would have to flip the coins a BUNCH more times to get a real feel. Either way I'm sure you're right, it eats the whole file so I guess care would have to be taken when doing this with huge files.
