
CSV'ed off

Posted: Wed Aug 28, 2019 8:59 am
by richmond62
As one does, I downloaded the output from an online linguistic data analysis program in .csv format . . .

I then imported the file into a Scrolling List Field like this:

on mouseUp
   set the ink of me to srcCopy
   answer file "Choose a TEXT file to import"
   if the result = "cancel" 
   then exit mouseUp
   else
      set the text of fld "List1" to URL ("file:" & it)
   end if
end mouseUp
working on the assumption (dangerous) that the default itemDelimiter was the comma (","), and
got thoroughly cheesed-off because, while the .csv data was imported, it got chopped up in
an odd sort of way (see picture).
[Screenshot 2019-08-28 at 10.59.22.png]
For instance, line 1 of the data file contains the column headers
like this:

Key Item Type Publication Year Author

Now, as far as I can see, that should all be imported into Line 1 of my field, with commas, something like this:

Key, Item Type, Publication Year, Author,


BUT what I got was something like this:

"Key","Item Type","Publication
Year","Author"


So, inevitable questions crop up:

1. What are all those double quotes doing there?

2. Why did LiveCode 'decide' to split line one in the middle of "Publication Year"?

#2 is particularly problematic.
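A quick diagnostic sketch for both questions, assuming the field "List1" from the import handler above (the button handler itself is illustrative, not something from the thread). LiveCode's `line` chunk counts lines of text, not wrapped rows on screen, and its `item` chunk splits on the delimiter literally with no notion of CSV quoting, so this shows whether "Publication Year" was really split in the data and whether the double quotes are part of each item.

```livecode
on mouseUp
   local tFirstLine
   put line 1 of field "List1" into tFirstLine
   -- comma is already the default itemDelimiter; set it explicitly for clarity
   set the itemDelimiter to comma
   -- soft wrapping on screen does not create extra lines, so this count reflects the file
   -- item 3 keeps its surrounding quote characters, because items ignore CSV quoting
   answer "Lines stored:" && the number of lines of field "List1" & return & \
         "Items in line 1:" && the number of items of tFirstLine & return & \
         "Item 3 of line 1:" && item 3 of tFirstLine
end mouseUp
```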

Re: CSV'ed off

Posted: Wed Aug 28, 2019 9:17 am
by FourthWorld

Re: CSV'ed off

Posted: Wed Aug 28, 2019 9:38 am
by richmond62
I can assure you I have NO stake in writing a CSV exporter.

The problem is that MOST linguists know as much about file formats and so on as the
woman who cleans the toilet in my school: while she is a good cleaner, she spends her
spare time going to classical music concerts (which is just fine), and not worrying about
suffixes on the end of documents random academics might happen to bung her way. 8)

I can, of course, spend an awful lot of time writing a "document chopper" that does
a sort of exotic dance round possible item delimiters to import data from CSV files
and make it behave the way linguists might find useful . . .

. . . But, as my late father (who is probably somewhere in the cosmos
splitting his sides at how goofy his son is getting all sweaty about CSV files)
used to say, "I love banging my head on the wall, because it feels so great when I stop."

Re: CSV'ed off

Posted: Wed Aug 28, 2019 11:44 am
by bogs
richmond62 wrote (Wed Aug 28, 2019 9:38 am):
I can assure you I have NO stake in writing a CSV exporter.
Actually, if you go to the link he posted, you will find you have no reason to create one; he actually did that work for you :wink:
The link... wrote: To illustrate the complexity inherent in such an algorithm, here's a LiveCode function to translate CSV into a simple tab-delimited format, courtesy of Alex Tweedly via Mike Kerner's Github repository. LiveCode makes a good example here, because it's readable enough that programmers familiar with nearly anything else can probably follow it well enough:

    function CSVToTab pData, pOldLineDelim, pOldItemDelim, pNewCR, pNewTAB
       -- v 3 -> 4   put back in replace TAB with pNewTAB in 'passedquote'
       -- v 4 -> 5   put in the two replace statements in the trailing empty case
       -- fill in defaults
       if pOldLineDelim is empty then put CR into pOldLineDelim
       if pOldItemDelim is empty then put COMMA into pOldItemDelim
       if pNewCR is empty then put numtochar(11) into pNewCR   -- use VT (ASCII 11) for CRs inside quoted cells
       if pNewTAB is empty then put numtochar(29) into pNewTAB  -- use GS (ASCII 29, group separator) for TABs inside quoted cells
       --
       local tNuData                         -- contains tabbed copy of data
       local tStatus, theInsideStringSoFar
       --
       put "outside" into tStatus
       set the itemdel to quote
       repeat for each item k in pData
          -- put tStatus && k & CR after msg
          switch tStatus
             case "inside"
                put k after theInsideStringSoFar
                put "passedquote" into tStatus
                next repeat
             case "passedquote"
                -- decide if it was a duplicated escapedQuote or a closing quote
                if k is empty then   -- it's a duplicated quote
                   put quote after theInsideStringSoFar
                   put "inside" into tStatus
                   next repeat
                end if
                -- not empty - so we remain inside the cell, though we have left the quoted section
                -- NB this allows for quoted sub-strings within the cell content !!
                replace pOldLineDelim with pNewCR in theInsideStringSoFar
                replace TAB with pNewTAB in theInsideStringSoFar
                put theInsideStringSoFar after tNuData
                -- no break here: fall through so k is processed as unquoted text
             case "outside"
                replace pOldItemDelim with TAB in k
                -- and deal with the "empty trailing item" issue in LiveCode:
                -- a line ending in an empty item now ends in TAB, so double that TAB
                replace (TAB & pOldLineDelim) with TAB & TAB & CR in k
                put k after tNuData
                put "inside" into tStatus
                put empty into theInsideStringSoFar
                next repeat
             default
                put "defaulted"
                break
          end switch
       end repeat
       --
       -- and finally deal with the trailing item issue in input data
       -- i.e. the very last char is a quote, so there is no trigger to flush the
       --      last item
       if the last char of pData = quote then
          replace pOldLineDelim with pNewCR in theInsideStringSoFar
          replace TAB with pNewTAB in theInsideStringSoFar
          put theInsideStringSoFar after tNuData
       end if
       --
       return tNuData
    end CSVToTab
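A minimal usage sketch, assuming CSVToTab sits in the same script (or in a library that is in use) and that the target is the field "List1" from the first post. Leaving the optional parameters empty makes the function fall back to its own defaults; afterwards the itemDelimiter is switched to tab so chunk expressions see one item per column, with the CSV quotes already stripped.

```livecode
on mouseUp
   local tCSV, tTabbed
   answer file "Choose a CSV file to import"
   if the result is "Cancel" then exit mouseUp
   put URL ("file:" & it) into tCSV
   -- empty optional arguments: CR lines, comma items, VT for CRs and GS for TABs inside quoted cells
   put CSVToTab(tCSV) into tTabbed
   -- columns are now tab-separated items without the surrounding double quotes
   set the itemDelimiter to tab
   put tTabbed into field "List1"
   -- illustrative check: the third column header, e.g. Publication Year
   answer item 3 of line 1 of tTabbed
end mouseUp
```

Converting to tabs first means the rest of a LiveCode stack can use ordinary `item` and `line` chunks; any VT characters left in the text mark line breaks that were inside quoted cells.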

Re: CSV'ed off

Posted: Wed Aug 28, 2019 2:03 pm
by Mikey
And if you like, we have a repo on GitHub with more goodies on this very topic.
https://github.com/macMikey/csvToText

Re: CSV'ed off

Posted: Wed Aug 28, 2019 4:18 pm
by bogs
Very neat stuff there Mikey!

Re: CSV'ed off

Posted: Wed Aug 28, 2019 6:01 pm
by jacque
richmond62 wrote:
2. Why did LiveCode 'decide' to split line one in the middle of "Publication Year"?
That looks like text wrap to me. LC wraps at spaces when it can, and there are no spaces after the one in "Publication Year", so that is where the line breaks. Set the field's dontWrap to true and see how it looks.
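Following that suggestion, a sketch of the original import handler with wrapping turned off, so each line of the file occupies exactly one visible row. It assumes the field "List1" from the first post; turning on the horizontal scrollbar is an optional extra (not something suggested in the thread) so the ends of long lines stay reachable.

```livecode
on mouseUp
   answer file "Choose a TEXT file to import"
   if the result is "Cancel" then exit mouseUp
   -- stop the field from soft-wrapping long lines at spaces
   set the dontWrap of field "List1" to true
   -- optional: let the user scroll sideways to see the rest of each line
   set the hScrollbar of field "List1" to true
   set the text of field "List1" to URL ("file:" & it)
end mouseUp
```

This only changes how the field displays the text; the stored lines are the same either way.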