Page 1 of 1

Import CSV file with multi-line records

Posted: Sun Mar 06, 2011 6:40 pm
by pfmartin
Hi there,

I am trying to load a CSV file that has a bit of a strange format. The format looks like this:

Code: Select all

name, description
"The Name", "The Description which has 
multiple 
lines
like this."
"The 2nd Name", "The 2nd Description which
also has
multiple lines
enclosed in quotes"
etc.
The problem I have is pretty obvious. If I try to reference the 2nd line of the CSV file, it only grabs:

Code: Select all

"The Name", "The Description which has 
Is there some way to make sure that the parsing of the text takes into account that there is text enclosed in double quotes that span across several lines?

Thanks for any help you can provide!

Re: Import CSV file with multi-line records

Posted: Sun Mar 06, 2011 7:49 pm
by Dixie
Hi...

Does this help ? ... your CSV data in fld 1 ...

Code: Select all

on mouseUp
   put 1 into myVarLines
   repeat with count = 1 to the number of lines of fld 1
      
      if the last char of line count of fld 1 <> quote then 
         put line count of fld 1 & space after line myVarLines of myVar
         next repeat
      end if
      
      if the last char of line count of fld 1 = quote then
         put line count of fld 1 after line myVarLines of myVar
         add 1 to myVarLines
      end if
      
   end repeat
   
   put myVar into fld 2
end mouseUp
be well

Dixie

Re: Import CSV file with multi-line records

Posted: Sun Mar 06, 2011 8:01 pm
by wsamples
There are certainly many ways to do this, some more elegant than others, and the best method may depend on your ultimate goals. For example, when you say you want the second line, do you mean you want the item beginning "The 2nd name"? Are you wanting to retrieve the line for display? Will you want to preserve the line break formatting within the retrieved item?

If the first assumption is correct and you want to retrieve or process the line but do not need to preserve the linebreaks within each item, you could delete the linebreaks, which will butt the distinct items together with consecutive quotation marks (...end of item 1""start of item 2...) which you can replace with 'quote & return & quote' which will put each item on its own line. (This will not affect the "," pattern distinguishing minor elements of the item.)

If you want to be able to present the retrieved item with the linebreaks, you can do that by replacing them with some distinctive marker instead of deleting them in step one. When you replace the ' quote & <your marker> & quote ' with return in step two, the marker will still be in place within each item and can be replaced by a return when you display the final result.

This is the most obvious and "blunt instrument" approach. It should help give you some ideas, although I'm sure someone will come along and reveal some piece of logic which will be a marvel of elegance and make me feel ashamed :D

Re: Import CSV file with multi-line records

Posted: Mon Mar 07, 2011 12:56 am
by pfmartin
Thank you both for such a quick reply!

I tried Dixie's solution and it works beautifully. It's exactly what I needed, even though I am not quite sure I understand how it works just yet ;-) But it definitely solved the problem!!

Thank you very much!

Dixie wrote:

Code: Select all

on mouseUp
   put 1 into myVarLines
   repeat with count = 1 to the number of lines of fld 1
      
      if the last char of line count of fld 1 <> quote then 
         put line count of fld 1 & space after line myVarLines of myVar
         next repeat
      end if
      
      if the last char of line count of fld 1 = quote then
         put line count of fld 1 after line myVarLines of myVar
         add 1 to myVarLines
      end if
      
   end repeat
   
   put myVar into fld 2
end mouseUp

Re: Import CSV file with multi-line records

Posted: Mon Mar 07, 2011 3:37 pm
by pfmartin
Hey guys,

As a follow up, I found a more inclusive function for processes CSV files on the mailing list. I post it here just in case it helps someone:

Code: Select all

function CSV2Tab3 pData 
   local tNuData  -- contains tabbed copy of data 
   local tReturnPlaceholder -- replaces cr in field data to avoid line breaks which 
   -- would be misread as records; replaced later during display 
   local tEscapedQuotePlaceholder -- used for keeping track of quotes in data 
   local tInQuotedText -- flag set while reading data between quotes 
   put numtochar(11) into tReturnPlaceholder -- vertical tab as placeholder 
   put numtochar(2)  into tEscapedQuotePlaceholder -- used to simplify distinction between 
   -- quotes in data and those in delimiters 
   -- 
   -- Normalize line endings: 
   replace crlf with cr in pData          -- Win to UNIX 
   replace numtochar(13) with cr in pData -- Mac to UNIX 
   -- 
   -- Put placeholder in escaped quote (non-delimiter) chars: 
   replace ("\"&quote) with tEscapedQuotePlaceholder in pData 
   replace quote&quote with tEscapedQuotePlaceholder in pData --<NEW 
   -- 
   put space before pData   -- to avoid ambiguity of starting context 
   split pData by quote 
   put False into tInsideQuoted 
   put 0 into tCounter 
   repeat for each line tKey in (the keys of pData) 
      add 1 to tCounter 
      put pData[tCounter] into k 
      if (tInsideQuoted) then 
         replace cr with tReturnPlaceholder in k 
         put k after tNuData 
         put False into tInsideQuoted 
      else 
         replace comma with tab in k 
         put k after tNuData 
         put true into tInsideQuoted 
      end if 
   end repeat 
   -- 
   delete char 1 of tNuData -- remove the leading space 
   replace tEscapedQuotePlaceholder with quote in tNuData 
   return tNuData 
end CSV2Tab3 
I cannot post the link to the original mailing list thread because the forums are not letting me!

Thanks to the author(s) for their contribution!

Best wishes.

Re: Import CSV file with multi-line records

Posted: Tue Mar 08, 2011 12:56 am
by shaosean
This handles multi-lines, quotes inside of quotes and pretty much anything else messed up in CSV..

Re: Import CSV file with multi-line records

Posted: Tue Mar 08, 2011 1:15 am
by mwieder
...and when you're dealing with csv files, anything that can be messed up will be. And some things that can't, as well.

Re: Import CSV file with multi-line records

Posted: Tue Mar 08, 2011 3:26 pm
by FourthWorld
CSV is the devil. It needs to die, but there are apparently enough programmers out there who hate humanity that it's still alive, so we have to write CSV parsers for it. Serious drag.

After several long discussions on this on the use-livecode list Alex Tweedly came up with a solution which is perhaps most complete and fastest of the ones I've evaluated - it's described here, with a commonly-used sample data set that covers most of the anomalies inherent in the format, a link to the code itself, and another link that may be worth reading if you really want to appreciate just how stupid the CSV "format" is:
http://lists.runrev.com/pipermail/use-l ... 53227.html