Page 1 of 2

Reading a CSV file (with commas in 1 of the fields)

Posted: Sat Jul 23, 2011 9:26 pm
by marksmithhfx
Does anyone have an example of how to read a CSV file (comma separated) that looks like this:


Introduction to stacks,Wk1b,Pg1
"Stacks (palettes, dialog boxes, windows)",Wk1b,Pg1
"Controls (push buttons, check boxes, text fields, menus)",Wk1b,Pg1
"Grouping controls into a set, background groups",Wk1b,Pg1
Background groups,Wk1b,Pg1


Thats 3 fields, with the first field sometimes containing commas, and if so the field is quoted (this is how Excel exports a CSV)
The above was copied from a custom property after:


if there is a file tSavePath then
put url ("file:" & tSavePath) into ttemp
end if
set the cCSVdata of this stack to ttemp

At this point the custom property contains the data shown above. Unfortunately saying "set the itemdelimiter to comma" splits up the
first field because LC is not seeing it as "quoted". I am going to have to do some manual processing (I think) but I am not sure where to begin.

Maybe someone has dealt with this before? I'd like to put the three fields into a table field (so I can later add a search function on the first field).

Any suggestions welcome.

Thanks

-- Mark

Re: Reading a CSV file (with commas in 1 of the fields)

Posted: Sat Jul 23, 2011 10:40 pm
by FourthWorld

Re: Reading a CSV file (with commas in 1 of the fields)

Posted: Sun Jul 24, 2011 3:02 am
by marksmithhfx

Re: Reading a CSV file (with commas in 1 of the fields)

Posted: Sun Jul 24, 2011 3:10 am
by marksmithhfx
marksmithhfx wrote:Does anyone have an example of how to read a CSV file (comma separated) that looks like this:

Introduction to stacks,Wk1b,Pg1
"Stacks (palettes, dialog boxes, windows)",Wk1b,Pg1
"Controls (push buttons, check boxes, text fields, menus)",Wk1b,Pg1
"Grouping controls into a set, background groups",Wk1b,Pg1
Background groups,Wk1b,Pg1

-- Mark
While browsing around the runrev site I happened across the following, which worked perfectly

Step 1: create tab delimited data (One way to create tab delimited data is to choose "save as..." in Excel, and then select "text file (tab delimited)". It is simple to load data into Livecode from such a file:

answer file "" -- get the tab delimited filename
put url ("file:" & it) into theData -- load the data into a variable
put true into firstLineContainsColumnNames -- or false, depending on the situation
set the dgText[firstLineContainsColumnNames] of group "my DataGrid" to theData

And that was it, a datagrid (called DataGrid) displayed the lines of my tab delimited datafile.

Very slick.

Re: Reading a CSV file (with commas in 1 of the fields)

Posted: Sun Jul 24, 2011 3:13 pm
by FourthWorld
Tab-delimited is the way to go. CSV is sloppy, ill-defined, and must die.

Re: Reading a CSV file (with commas in 1 of the fields)

Posted: Sun Jul 24, 2011 3:27 pm
by Klaus
FourthWorld wrote:CSV is sloppy, ill-defined, and must die.
AMEN, Brother! 8)

Re: Reading a CSV file (with commas in 1 of the fields)

Posted: Sun Jul 24, 2011 6:14 pm
by FourthWorld
Klaus wrote:
FourthWorld wrote:CSV is sloppy, ill-defined, and must die.
AMEN, Brother! 8)
I finally got around to putting all my notes on this into one page, to provide a little Google fodder to help raise awareness:
http://www.fourthworld.com/embassy/arti ... t-die.html

I don't imagine we'll see the complete eradication of CSV in our lifetime, but if they can do it with polio maybe there's a chance this disease can be cured as well. ;)

Re: Reading a CSV file (with commas in 1 of the fields)

Posted: Sun Jul 24, 2011 7:02 pm
by wsamples
Could there be a Nobel Prize for this, as well?

Re: Reading a CSV file (with commas in 1 of the fields)

Posted: Sun Jul 24, 2011 9:28 pm
by Klaus
wsamples wrote:Could there be a Nobel Prize for this, as well?
Although I like the idea, I doubt! :D

Re: Reading a CSV file (with commas in 1 of the fields)

Posted: Mon Jul 25, 2011 2:03 pm
by marksmithhfx
FourthWorld wrote:Tab-delimited is the way to go. CSV is sloppy, ill-defined, and must die.

Excel (and word) also added some extraneous characters (and quotes) to the original data that needed cleaning up. I'm fairly new to LC so there may be better ways but here is how I did that:


on cleanupsaindex
-- first, strip leading/trailing quotes from the topic field
set itemdelimiter to tab
put the number of lines of thedata into thecount -- we loop for this many lines
put empty into thecopy -- prepare to copy the data
repeat with x = 1 to thecount
put item 1 of line x of thedata into ttopic -- check each topic entry
put item 2 of line x of thedata into tsource
put item 3 of line x of thedata into tlocation
if ttopic begins with quote then -- if a leading quote is found
put char 2 to -2 of ttopic into ttopic -- trim first and last char
end if
put ttopic & tab & tsource & tab & tlocation & return after thecopy -- rebuild the cleaned up data in thecopy
end repeat
put thecopy into thedata -- copy the cleaned up version back
-- here are a couple more cleanup tasks
replace quote & quote with quote in thedata -- replace double quotes with single quotes
replace "Ó" with quote in theData -- replacing wonky trailing quote chars from word
replace "Ò" with quote in theData -- replace wonky leading quote chars from word
end cleanupsaindex

Re: Reading a CSV file (with commas in 1 of the fields)

Posted: Mon Jul 25, 2011 6:58 pm
by mwieder
Mark- what you've got looks fine. You probably won't run into problems with this if your data set isn't too big, but as a rule you'll find that your repeat loops will run an order of magnitude faster if you use the "repeat for each" form of the repeat command:

Code: Select all

repeat for each line tLine in thedata
put item 1 of tLine into ttopic -- check each topic entry
put item 2 of tLine into tsource
put item 3 of tLine into tlocation
if ttopic begins with quote then -- if a leading quote is found
put char 2 to -2 of ttopic into ttopic -- trim first and last char
end if
put ttopic & tab & tsource & tab & tlocation & return after thecopy -- rebuild the cleaned up data in thecopy
end repeat

Re: Reading a CSV file (with commas in 1 of the fields)

Posted: Mon Jul 25, 2011 11:34 pm
by marksmithhfx
mwieder wrote:Mark- what you've got looks fine. You probably won't run into problems with this if your data set isn't too big, but as a rule you'll find that your repeat loops will run an order of magnitude faster if you use the "repeat for each" form of the repeat command:

Code: Select all

repeat for each line tLine in thedata
put item 1 of tLine into ttopic -- check each topic entry
put item 2 of tLine into tsource
put item 3 of tLine into tlocation
if ttopic begins with quote then -- if a leading quote is found
put char 2 to -2 of ttopic into ttopic -- trim first and last char
end if
put ttopic & tab & tsource & tab & tlocation & return after thecopy -- rebuild the cleaned up data in thecopy
end repeat
Order of magnitude works for me... thanks for the tip.

-- Mark

Re: Reading a CSV file (with commas in 1 of the fields)

Posted: Tue Jul 26, 2011 12:08 am
by marksmithhfx
While I was working on the datagrid I wanted to find a way to search the grid, sort of how the LC dictionary works. I created a field and building on what I had just learned from loading and cleaning the data and a nifty 1 line filter statement I picked up from somewhere (sorry I don't recall the source) I plugged this code into the field and it worked exactly like the dictionary. In fact, between the 4 lines of code I used to load the datagrid and these 5 lines to search it I pretty much have the dictionary upper right nailed. (I do wonder how they get the 3 different panels of the dictionary working together in the same window... but that'll be a challenge for another day)

on rawkeyup
put fld "search" into tsearch
put the cindex of this stack into thecopy -- I stuffed theData from previous post into a custom property for safe keeping between launches
filter thecopy with "*" & tsearch & "*"
put false into firstLineContainsColumnNames
set the dgText [ firstLineContainsColumnNames ] of group "DataGrid" to thecopy
end rawkeyup

on returninfield
-- just ignore it
end returninfield

Re: Reading a CSV file (with commas in 1 of the fields)

Posted: Wed Jul 27, 2011 2:49 pm
by marksmithhfx
Richard, mea culpa. In my rush to get a solution working last weekend I didn't take the time to traverse all of your embedded links. The extended "rant" on the evils of CSV was hilarious. I am suitably chastised for thinking that CSV might have been the way to go.... As penance I promise to never write a CSV exporter :-)

-- Mark

Re: Reading a CSV file (with commas in 1 of the fields)

Posted: Wed Jul 27, 2011 3:02 pm
by marksmithhfx
BTW, after reading Richard's examples of processing CSV I'm even more impressed with how easy it is to import a tab delimited file into a datagrid and display it in LC. For the record, here are 4 lines of code that will do that (the only prep is that you must drag a datagrid onto a card and specify the column names in the property inspector):

answer file ""
put url ("file:" & it) into theData
put true into firstLineContainsColumnNames -- or false, depending
set the dgText[firstLineContainsColumnNames] of group "DataGrid 1" to theData

Tip: if you have column names in your data (i.e. specify TRUE above) you can rearrange how LC displays the columns by specifying a different order in the property inspector. If your named columns are a, b, c and you specify c, b, a in the inspector, LC will display them in the order you specify. Pretty slick.