Processing CSV Files
Posted: Mon May 15, 2017 4:48 am
I am a noob to LC and am trying to process a CSV file. I have read several topics in the tutorials on the subject as well as the excellent 'rant' by Richard Gaskin. I have used a version of his CSV processing function to put my CSV file thru his 'Laundry' which seems to work well and populates my Data Grid without problem.
The issue I have is that the CSV file has some unprintable characters embedded into the data which I would like to strip out. When I pass the "Laundered" CSV file to my function to strip out the unwanted characters and try to then put that data into the Data Grid, it down no display anything.
The code below shows what I am doing.
Everything works well up until the point where I call by 'fixfile' function (with is now coded to effectively just copy the file putting a CR at the end of each line). When I display the Laundered file with
the data appears in the Grid no problem. The issue occurs when the fixfile function is called. The data grid is blanked out.
I have included a test3.csv file that I am using.
[/code]
The issue I have is that the CSV file has some unprintable characters embedded into the data which I would like to strip out. When I pass the "Laundered" CSV file to my function to strip out the unwanted characters and try to then put that data into the Data Grid, it down no display anything.
The code below shows what I am doing.
Everything works well up until the point where I call by 'fixfile' function (with is now coded to effectively just copy the file putting a CR at the end of each line). When I display the Laundered file with
Code: Select all
set the dgText[true] of group "text" to tCSVLaundry
I have included a test3.csv file that I am using.
Code: Select all
local tFileName, tFileContents, tColumnTitles, tStackName, tCSVLaundry, tRecCount, tFileTemp
on OpenStack
put the title of this stack into tStackName
end OpenStack
on mouseUp
set the title of this stack to tStackName
set the dgData of group "text" to empty
set the dgProp[ "columns" ] of group "text" to empty
put empty into tFileTemp
put empty into tCSVLaundry
answer file "Please choose a file to import" with type "Comma Separated Values|csv|CSV"
if the result is not "cancel" then
put it into tFileName
put the title of this stack into tStackName
set the title of this stack to it
## Get the contents of the file
put URL ("file:" & tFileName) into tFileContents
put line 1 of tFileContents into tColumnTitles ## Creates Columns from the contents of the CSV's first line
replace comma with return in tColumnTitles
set the dgProp["columns"] of group "text" to tColumnTitles
#Clean the csv file and replace commas with tabs
put CSVToTab4(tFileContents) into tCSVLaundry
set the dgText[true] of group "text" to tCSVLaundry
put fixfile(tCSVLaundry) into tFileTemp
set the dgText[true] of group "text" to tFileTemp
end if
end mouseUp
function fixfile pText
local tNewList, tLine
put empty into tNewlist
repeat for each line tLine in pText
#do some stuff to strip out unprintable characters and put line back
put tLine and return after tNewlist
end repeat
return tNewList
end fixfile
function CSVToTab4 pData, pOldLineDelim, pOldItemDelim, pNewCR, pNewTAB
-- fill in defaults g
if pOldLineDelim is empty then put CR into pOldLineDelim
if pOldItemDelim is empty then put COMMA into pOldItemDelim
if pNewCR is empty then put numtochar(11) into pNewCR
-- Use <VT> for quoted CRs
if pNewTAB is empty then put numtochar(29) into pNewTAB
-- Use <GS> (group separator) for quoted TABs
local tNuData -- contains tabbed copy of data
local tStatus, theInsideStringSoFar
-- Normalize line endings: REMOVED
-- Will normaly be correct already, only binfile: or similar chould make this necessary
-- and that exceptional case should be the caller's responsibility
put "outside" into tStatus
set the itemdel to quote
repeat for each item k in pData
-- put tStatus && k & CR after msg
switch tStatus
case "inside"
put k after theInsideStringSoFar
put "passedquote" into tStatus
next repeat
case "passedquote"
-- decide if it was a duplicated escapedQuote or a closing quote
if k is empty then -- it's a duplicated quote
put quote after theInsideStringSoFar
put "inside" into tStatus
next repeat
end if
-- not empty - so we remain inside the cell, though we have left the quoted section
-- NB this allows for quoted sub-strings within the cell content !!
replace pOldLineDelim with pNewCR in theInsideStringSoFar
replace TAB with pNewTAB in theInsideStringSoFar
put theInsideStringSoFar after tNuData
case "outside"
replace pOldItemDelim with TAB in k
-- and deal with the "empty trailing item" issue in Livecode
replace (pNewTAB & pOldLineDelim) with pNewTAB & pNewTAB & CR in k
put k after tNuData
put "inside" into tStatus
put empty into theInsideStringSoFar
next repeat
default
put "defaulted"
break
end switch
end repeat
-- and finally deal with the trailing item isse in input data
-- i.e. the very last char is a quote, so there is no trigger to flush the
-- last item
if the last char of pData = quote then
put theInsideStringSoFar after tNuData
end if
return tNuData
end CSVToTab4
[code]