Page 1 of 1
Import CSV file with multi-line records
Posted: Sun Mar 06, 2011 6:40 pm
by pfmartin
Hi there,
I am trying to load a CSV file that has a bit of a strange format. The format looks like this:
Code: Select all
name, description
"The Name", "The Description which has
multiple
lines
like this."
"The 2nd Name", "The 2nd Description which
also has
multiple lines
enclosed in quotes"
etc.
The problem I have is pretty obvious. If I try to reference the 2nd line of the CSV file, it only grabs:
Code: Select all
"The Name", "The Description which has
Is there some way to make sure that the parsing of the text takes into account that there is text enclosed in double quotes that span across several lines?
Thanks for any help you can provide!
Re: Import CSV file with multi-line records
Posted: Sun Mar 06, 2011 7:49 pm
by Dixie
Hi...
Does this help ? ... your CSV data in fld 1 ...
Code: Select all
on mouseUp
put 1 into myVarLines
repeat with count = 1 to the number of lines of fld 1
if the last char of line count of fld 1 <> quote then
put line count of fld 1 & space after line myVarLines of myVar
next repeat
end if
if the last char of line count of fld 1 = quote then
put line count of fld 1 after line myVarLines of myVar
add 1 to myVarLines
end if
end repeat
put myVar into fld 2
end mouseUp
be well
Dixie
Re: Import CSV file with multi-line records
Posted: Sun Mar 06, 2011 8:01 pm
by wsamples
There are certainly many ways to do this, some more elegant than others, and the best method may depend on your ultimate goals. For example, when you say you want the second line, do you mean you want the item beginning "The 2nd name"? Are you wanting to retrieve the line for display? Will you want to preserve the line break formatting within the retrieved item?
If the first assumption is correct and you want to retrieve or process the line but do not need to preserve the linebreaks within each item, you could delete the linebreaks, which will butt the distinct items together with consecutive quotation marks (...end of item 1""start of item 2...) which you can replace with 'quote & return & quote' which will put each item on its own line. (This will not affect the "," pattern distinguishing minor elements of the item.)
If you want to be able to present the retrieved item with the linebreaks, you can do that by replacing them with some distinctive marker instead of deleting them in step one. When you replace the ' quote & <your marker> & quote ' with return in step two, the marker will still be in place within each item and can be replaced by a return when you display the final result.
This is the most obvious and "blunt instrument" approach. It should help give you some ideas, although I'm sure someone will come along and reveal some piece of logic which will be a marvel of elegance and make me feel ashamed

Re: Import CSV file with multi-line records
Posted: Mon Mar 07, 2011 12:56 am
by pfmartin
Thank you both for such a quick reply!
I tried Dixie's solution and it works beautifully. It's exactly what I needed, even though I am not quite sure I understand how it works just yet

But it definitely solved the problem!!
Thank you very much!
Dixie wrote:
Code: Select all
on mouseUp
put 1 into myVarLines
repeat with count = 1 to the number of lines of fld 1
if the last char of line count of fld 1 <> quote then
put line count of fld 1 & space after line myVarLines of myVar
next repeat
end if
if the last char of line count of fld 1 = quote then
put line count of fld 1 after line myVarLines of myVar
add 1 to myVarLines
end if
end repeat
put myVar into fld 2
end mouseUp
Re: Import CSV file with multi-line records
Posted: Mon Mar 07, 2011 3:37 pm
by pfmartin
Hey guys,
As a follow up, I found a more inclusive function for processes CSV files on the mailing list. I post it here just in case it helps someone:
Code: Select all
function CSV2Tab3 pData
local tNuData -- contains tabbed copy of data
local tReturnPlaceholder -- replaces cr in field data to avoid line breaks which
-- would be misread as records; replaced later during display
local tEscapedQuotePlaceholder -- used for keeping track of quotes in data
local tInQuotedText -- flag set while reading data between quotes
put numtochar(11) into tReturnPlaceholder -- vertical tab as placeholder
put numtochar(2) into tEscapedQuotePlaceholder -- used to simplify distinction between
-- quotes in data and those in delimiters
--
-- Normalize line endings:
replace crlf with cr in pData -- Win to UNIX
replace numtochar(13) with cr in pData -- Mac to UNIX
--
-- Put placeholder in escaped quote (non-delimiter) chars:
replace ("\""e) with tEscapedQuotePlaceholder in pData
replace quote"e with tEscapedQuotePlaceholder in pData --<NEW
--
put space before pData -- to avoid ambiguity of starting context
split pData by quote
put False into tInsideQuoted
put 0 into tCounter
repeat for each line tKey in (the keys of pData)
add 1 to tCounter
put pData[tCounter] into k
if (tInsideQuoted) then
replace cr with tReturnPlaceholder in k
put k after tNuData
put False into tInsideQuoted
else
replace comma with tab in k
put k after tNuData
put true into tInsideQuoted
end if
end repeat
--
delete char 1 of tNuData -- remove the leading space
replace tEscapedQuotePlaceholder with quote in tNuData
return tNuData
end CSV2Tab3
I cannot post the link to the original mailing list thread because the forums are not letting me!
Thanks to the author(s) for their contribution!
Best wishes.
Re: Import CSV file with multi-line records
Posted: Tue Mar 08, 2011 12:56 am
by shaosean
This handles multi-lines, quotes inside of quotes and pretty much anything else messed up in CSV..
Re: Import CSV file with multi-line records
Posted: Tue Mar 08, 2011 1:15 am
by mwieder
...and when you're dealing with csv files, anything that can be messed up will be. And some things that can't, as well.
Re: Import CSV file with multi-line records
Posted: Tue Mar 08, 2011 3:26 pm
by FourthWorld
CSV is the devil. It needs to die, but there are apparently enough programmers out there who hate humanity that it's still alive, so we have to write CSV parsers for it. Serious drag.
After several long discussions on this on the use-livecode list Alex Tweedly came up with a solution which is perhaps most complete and fastest of the ones I've evaluated - it's described here, with a commonly-used sample data set that covers most of the anomalies inherent in the format, a link to the code itself, and another link that may be worth reading if you really want to appreciate just how stupid the CSV "format" is:
http://lists.runrev.com/pipermail/use-l ... 53227.html