Processing CSV Files

Got a LiveCode personal license? Are you a beginner, hobbyist or educator that's new to LiveCode? This forum is the place to go for help getting started. Welcome!

Moderators: FourthWorld, heatherlaine, Klaus, kevinmiller

Post Reply
wlaughto
Posts: 30
Joined: Fri Apr 07, 2017 5:06 am

Processing CSV Files

Post by wlaughto » Mon May 15, 2017 4:48 am

I am a noob to LC and am trying to process a CSV file. I have read several topics in the tutorials on the subject as well as the excellent 'rant' by Richard Gaskin. I have used a version of his CSV processing function to put my CSV file thru his 'Laundry' which seems to work well and populates my Data Grid without problem.

The issue I have is that the CSV file has some unprintable characters embedded into the data which I would like to strip out. When I pass the "Laundered" CSV file to my function to strip out the unwanted characters and try to then put that data into the Data Grid, it down no display anything.

The code below shows what I am doing.

Everything works well up until the point where I call by 'fixfile' function (with is now coded to effectively just copy the file putting a CR at the end of each line). When I display the Laundered file with

Code: Select all

set the dgText[true] of group "text" to tCSVLaundry
the data appears in the Grid no problem. The issue occurs when the fixfile function is called. The data grid is blanked out.

I have included a test3.csv file that I am using.

Code: Select all

local tFileName, tFileContents, tColumnTitles, tStackName, tCSVLaundry, tRecCount, tFileTemp

on OpenStack
   put the title of this stack into tStackName
end OpenStack

on mouseUp
   
   set the title of this stack to tStackName
   set the dgData of group "text" to empty
   set the dgProp[ "columns" ]  of group "text" to empty
   put empty into tFileTemp
   put empty into tCSVLaundry
   answer file "Please choose a file to import" with type "Comma Separated Values|csv|CSV"
   if the result is not "cancel" then
      put it into tFileName
      put the title of this stack into tStackName
      set the title of this stack to it
      ## Get the contents of the file
      put URL ("file:" & tFileName) into tFileContents
      put line 1 of tFileContents into tColumnTitles    ## Creates Columns from the contents of the CSV's first line 
      replace comma with return in tColumnTitles
      set the dgProp["columns"] of group "text" to tColumnTitles
      #Clean the csv file and replace commas with tabs
      put CSVToTab4(tFileContents) into tCSVLaundry
      set the dgText[true] of group "text" to tCSVLaundry
      put fixfile(tCSVLaundry) into tFileTemp
      set the dgText[true] of group "text" to tFileTemp
   end if
end mouseUp

function fixfile pText
   local tNewList, tLine
   put empty into tNewlist	
   repeat for each line tLine in pText
      #do some stuff to strip out unprintable characters and put line back
       put tLine and return after tNewlist
  end repeat
   return  tNewList
end fixfile

function CSVToTab4 pData, pOldLineDelim, pOldItemDelim, pNewCR, pNewTAB 
    -- fill in defaults g
    if pOldLineDelim is empty then put CR into pOldLineDelim 
    if pOldItemDelim is empty then put COMMA into pOldItemDelim 
    if pNewCR is empty then put numtochar(11) into pNewCR   
    -- Use <VT> for quoted CRs 
    if pNewTAB is empty then put numtochar(29) into pNewTAB
    -- Use <GS> (group separator) for quoted TABs 

    local tNuData                         -- contains tabbed copy of data 

    local tStatus, theInsideStringSoFar 

    -- Normalize line endings: REMOVED 
    -- Will normaly be correct already, only binfile: or similar chould make this necessary 
    -- and that exceptional case should be the caller's responsibility 

    put "outside" into tStatus 
    set the itemdel to quote 
    repeat for each item k in pData 
       -- put tStatus && k & CR after msg 
       switch tStatus 

          case "inside" 
             put k after theInsideStringSoFar 
             put "passedquote" into tStatus 
             next repeat 

          case "passedquote" 
             -- decide if it was a duplicated escapedQuote or a closing quote 
             if k is empty then   -- it's a duplicated quote 
                put quote after theInsideStringSoFar 
                put "inside" into tStatus 
                next repeat 
             end if 
             -- not empty - so we remain inside the cell, though we have left the quoted section 
             -- NB this allows for quoted sub-strings within the cell content !! 
             replace pOldLineDelim with pNewCR in theInsideStringSoFar 
             replace TAB with pNewTAB in theInsideStringSoFar 
             put theInsideStringSoFar after tNuData 

          case "outside" 
             replace pOldItemDelim with TAB in k 
             -- and deal with the "empty trailing item" issue in Livecode 
             replace (pNewTAB & pOldLineDelim) with pNewTAB & pNewTAB & CR in k 
             put k after tNuData 
             put "inside" into tStatus 
             put empty into theInsideStringSoFar 
             next repeat 
          default 
             put "defaulted" 
             break 
       end switch 
    end repeat 

    -- and finally deal with the trailing item isse in input data 
    -- i.e. the very last char is a quote, so there is no trigger to flush the 
    --        last item 
    if the last char of pData = quote then 
       put theInsideStringSoFar after tNuData 
    end if 

    return tNuData 
end CSVToTab4 
[code]
[/code]

MaxV
Posts: 1580
Joined: Tue May 28, 2013 2:20 pm
Contact:

Re: Processing CSV Files

Post by MaxV » Mon May 15, 2017 12:07 pm

Try this fix:
########CODE to copy and paste#######
function fixfile pText
repeat for each char tChar in pText
if charToNum(tChar) < 32 then #It's a not printable char!!!
switch tChar
case TAB
case CR
case LF
put tChar after tNewList
break
default
put empty after tNewList
break
end switch
end if
end repeat
return tNewList
end fixfile
#####END OF CODE generated by http://tinyurl.com/j8xf3xq with livecode 9.0.0-dp-6#####
Livecode Wiki: http://livecode.wikia.com
My blog: https://livecode-blogger.blogspot.com
To post code use this: http://tinyurl.com/ogp6d5w

FourthWorld
VIP Livecode Opensource Backer
VIP Livecode Opensource Backer
Posts: 10052
Joined: Sat Apr 08, 2006 7:05 am
Contact:

Re: Processing CSV Files

Post by FourthWorld » Mon May 15, 2017 3:40 pm

wlaughto wrote:The issue I have is that the CSV file has some unprintable characters embedded into the data which I would like to strip out. When I pass the "Laundered" CSV file to my function to strip out the unwanted characters and try to then put that data into the Data Grid, it down no display anything.
What is the encoding of the source file?
Richard Gaskin
LiveCode development, training, and consulting services: Fourth World Systems
LiveCode Group on Facebook
LiveCode Group on LinkedIn

wlaughto
Posts: 30
Joined: Fri Apr 07, 2017 5:06 am

Re: Processing CSV Files

Post by wlaughto » Tue May 16, 2017 12:37 am

MaxV wrote:Try this fix:
No sadly this does not fix the issue. Same result as previously.

wlaughto
Posts: 30
Joined: Fri Apr 07, 2017 5:06 am

Re: Processing CSV Files

Post by wlaughto » Tue May 16, 2017 12:39 am

FourthWorld wrote:
wlaughto wrote:The issue I have is that the CSV file has some unprintable characters embedded into the data which I would like to strip out. When I pass the "Laundered" CSV file to my function to strip out the unwanted characters and try to then put that data into the Data Grid, it down no display anything.
What is the encoding of the source file?
I beleive it is ANSI - if I open the file with Notepad (Windows) and click Save As, the "ANSI" option is hilighted.

FourthWorld
VIP Livecode Opensource Backer
VIP Livecode Opensource Backer
Posts: 10052
Joined: Sat Apr 08, 2006 7:05 am
Contact:

Re: Processing CSV Files

Post by FourthWorld » Tue May 16, 2017 12:42 am

It may be ANSI, or that may simply be that program's default.

If you save a copy as ANSI, is it the same size?

And if it's ANSI, what are those non-printable characters?
Richard Gaskin
LiveCode development, training, and consulting services: Fourth World Systems
LiveCode Group on Facebook
LiveCode Group on LinkedIn

mwieder
VIP Livecode Opensource Backer
VIP Livecode Opensource Backer
Posts: 3581
Joined: Mon Jan 22, 2007 7:36 am
Contact:

Re: Processing CSV Files

Post by mwieder » Tue May 16, 2017 2:41 am

Possibly this?

Code: Select all

function fixfile pText
   local tNewList
   
   repeat for each char tChar in pText
      if charToNum(tChar) < 32 or tChar > "z" then #It's a not printable char!!!
         switch tChar
            case TAB
            case CR
            case LF
               put tChar after tNewList
               break 
            default
               put empty after tNewList
               break
         end switch
      else
         put tChar after tNewList
      end if
   end repeat
   return tNewList
end fixfile

wlaughto
Posts: 30
Joined: Fri Apr 07, 2017 5:06 am

Re: Processing CSV Files

Post by wlaughto » Tue May 16, 2017 11:48 pm

mwieder wrote:Possibly this?

Code: Select all

function fixfile pText
   local tNewList
   
   repeat for each char tChar in pText
      if charToNum(tChar) < 32 or tChar > "z" then #It's a not printable char!!!
         switch tChar
            case TAB
            case CR
            case LF
               put tChar after tNewList
               break 
            default
               put empty after tNewList
               break
         end switch
      else
         put tChar after tNewList
      end if
   end repeat
   return tNewList
end fixfile

Ah ha! Yes this seems to have fixed the issue thank you.

mwieder
VIP Livecode Opensource Backer
VIP Livecode Opensource Backer
Posts: 3581
Joined: Mon Jan 22, 2007 7:36 am
Contact:

Re: Processing CSV Files

Post by mwieder » Tue May 16, 2017 11:50 pm

Wonderful. Just a simple missing "else" clause there to handle the printable characters.

wlaughto
Posts: 30
Joined: Fri Apr 07, 2017 5:06 am

Re: Processing CSV Files

Post by wlaughto » Wed May 17, 2017 4:24 am

mwieder wrote:Wonderful. Just a simple missing "else" clause there to handle the printable characters.
Yes, but my problem was trying to deal with the unprintable characters line by line, instead of character by character - duh! A stupid mistake on my part, but we learn!

BTW Fourth World, the files were the same size, however I did save the file as an 'UTF-8', close the file, re-open and then 'save as' and the file type was UTF-8. So I would suggest that 'Save As' saves a file in it's current encoding.

Post Reply