(lispkit csv)
Library (lispkit csv) provides a simple API for reading and writing structured data in CSV format from and to text files. The API provides two levels of abstraction: reading and writing at the line level (lower-level API) and at the record level (higher-level API).
A text file in CSV format typically has the following structure:
The first line is called the header. It defines the names and the order of the columns. Columns are separated by a separator character (a comma , by default). Column names can optionally be wrapped in a quotation character, which is needed if a name contains, for instance, the separator character.
Each following line defines one data record, which provides values for the columns defined in the header. The values are again separated by the separator character and may optionally be wrapped in the quotation character. Within a quoted value, the quotation character itself can be used if it is escaped by doubling it: if " is the quotation character, "" encodes a single " character within the string value.
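The following minimal sketch illustrates these quoting rules by reading a small CSV text from a string port with make-csv-port and csv-read (both described below). The sample data and the variable names csv-text, csvp, header and rows are hypothetical; the sketch assumes the default separator #\, and the default quotation character #\".

```scheme
(import (lispkit base) (lispkit csv))

;; Hypothetical sample: the first data value is quoted because it contains
;; the separator; in the second value, "" stands for a literal quotation mark.
(define csv-text
  "name,comment\n\"Doe, Jane\",\"She said \"\"hi\"\"\"")

;; Default separator #\, and default quotation character #\"
(define csvp (make-csv-port (open-input-string csv-text)))

;; csv-read returns the header and a vector of data lines
(define-values (header rows) (csv-read csvp))

(write header) (newline)
(write (vector-ref rows 0)) (newline)
```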
The client of the API decides how to handle inconsistencies between the lines, e.g. if lines have too few or too many values.
Both levels use a CSV port, which bundles the underlying textual input or output port with the separator and quotation characters.
(csv-port? obj)
Returns #t if obj is a CSV port; returns #f otherwise.
(csv-input-port? obj)
Returns #t if obj is a CSV port for reading data; returns #f otherwise.
(csv-output-port? obj)
Returns #t if obj is a CSV port for writing data; returns #f otherwise.
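A minimal sketch of how the three predicates relate, assuming CSV ports created over string ports; the variable names in-csv and out-csv are illustrative only.

```scheme
(import (lispkit base) (lispkit csv))

;; A CSV port based on a textual input port is used for reading ...
(define in-csv (make-csv-port (open-input-string "a,b")))
;; ... and one based on a textual output port is used for writing.
(define out-csv (make-csv-port (open-output-string)))

(write (list (csv-port? in-csv) (csv-input-port? in-csv) (csv-output-port? in-csv)))
(newline)
(write (list (csv-port? out-csv) (csv-input-port? out-csv) (csv-output-port? out-csv)))
(newline)
;; A plain string is not a CSV port
(write (csv-port? "a,b"))
(newline)
```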
(make-csv-port) (make-csv-port tport) (make-csv-port tport sep) (make-csv-port tport sep quote)
Returns a new CSV port for reading or writing data via an underlying textual port tport. If tport is an output port, the CSV port can be used for writing; if tport is an input port, the CSV port can be used for reading. The default for tport is the current input port, as returned by current-input-port from library (lispkit port).
The separator character used by the CSV port is sep, and the quotation character is quote. The default for sep is #\, and the default for quote is #\".
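For data that does not use the defaults, the separator and quotation characters can be passed explicitly. This minimal sketch assumes hypothetical semicolon-separated input that quotes values with '; a quoted value may then contain a semicolon.

```scheme
(import (lispkit base) (lispkit csv))

;; Hypothetical input: fields separated by ; and quoted with '
(define csvp
  (make-csv-port (open-input-string "city;note\nParis;'a;b'") #\; #\'))

(define-values (header rows) (csv-read csvp))
(write header) (newline)
;; The quoted value may contain the separator character
(write (vector-ref rows 0)) (newline)
```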
(csv-base-port csvp)
Returns the textual port on which the CSV port csvp is based.
(csv-separator csvp)
Returns the separator character used by the CSV port csvp.
(csv-quotechar csvp)
Returns the quotation character used by the CSV port csvp.
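A short sketch showing that the accessors return the configuration a CSV port was created with. It assumes a hypothetical CSV port over a string port with separator #\| and the default quotation character.

```scheme
(import (lispkit base) (lispkit csv))

(define tport (open-input-string "x|y"))
;; Custom separator #\|; the quotation character keeps its default
(define csvp (make-csv-port tport #\|))

(write (eq? (csv-base-port csvp) tport)) (newline)
(write (csv-separator csvp)) (newline)
(write (csv-quotechar csvp)) (newline)
```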
The line-level API provides means to read a whole CSV file via csv-read and to write data in CSV format via csv-write.
(csv-read csvp) (csv-read csvp readheader?)
Reads the header from CSV port csvp if readheader? is #t, and then reads all lines until the end of the input is reached. Procedure csv-read returns two values: the header line (a list of strings representing the column names) and a vector of all data lines, each of which is a list of strings representing the individual field values. The default for readheader? is #t. If readheader? is #f, the first result of csv-read is always #f.
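A minimal sketch of csv-read with readheader? set to #f, using hypothetical data: no header is read, so the first line of the input becomes the first data line.

```scheme
(import (lispkit base) (lispkit csv))

(define csv-text "id,score\n1,90\n2,75")

;; readheader? = #f: the first line is returned as ordinary data
(define-values (header rows)
  (csv-read (make-csv-port (open-input-string csv-text)) #f))

(write header) (newline)
(write (vector-ref rows 0)) (newline)
```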
(csv-write csvp header lines)
Writes the header (a list of strings representing the column names) to CSV port csvp, unless header is #f. Then procedure csv-write writes each line of lines, which is a vector of lists representing the individual field values in string form.
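A minimal sketch of writing CSV data into a string port and reading it back. It assumes that csv-write quotes a field such as "New York, NY", which contains the separator, so that csv-read recovers the original value. The data and variable names are hypothetical.

```scheme
(import (lispkit base) (lispkit csv))

;; Write a header and two data lines into a string port
(define out (open-output-string))
(csv-write (make-csv-port out)
           '("name" "city")
           (vector '("Alice" "Zurich") '("Bob" "New York, NY")))
(define csv-text (get-output-string out))
(display csv-text)

;; Read the text back to check that the values survive the round trip
(define-values (header rows)
  (csv-read (make-csv-port (open-input-string csv-text))))
```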
The higher-level API has a notion of records. By default, records are represented as association lists. The procedures for reading and writing records are csv-read-records and csv-write-records:
(csv-read-records csvp) (csv-read-records csvp make-col) (csv-read-records csvp make-col make-record)
Reads the header from CSV port csvp and then all data lines until the end of the input is reached. Header names (strings) are mapped via procedure make-col into column identifiers or column factories (i.e. procedures that take one argument, a column value, and return either a representation of this column if the value is valid, or #f if the column value is invalid). Procedure make-record then maps the list of column identifiers and column factories, together with the list of column values (strings) of a data line, into a record. Procedure csv-read-records returns a vector of records.
The default make-col procedure is make-symbol-column. The default make-record procedure is make-alist-record/excess.
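A minimal sketch of csv-read-records with its defaults, using hypothetical input: the header name " age" is trimmed into the symbol age by make-symbol-column, and the value beyond the header's columns in the second line is kept by make-alist-record/excess under the key #f.

```scheme
(import (lispkit base) (lispkit csv))

;; Hypothetical input: " age" has a leading space, and the second data
;; line has one value more than there are columns
(define csv-text "name, age\nAlice,30\nBob,25,extra")

;; Defaults: make-symbol-column and make-alist-record/excess
(define records (csv-read-records (make-csv-port (open-input-string csv-text))))

(define alice (vector-ref records 0))
(write (cdr (assq 'name alice))) (newline)
(write (cdr (assq 'age alice))) (newline)
;; The excess value of the second line is stored under the key #f
(write (assq #f (vector-ref records 1))) (newline)
```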
(csv-write-records csvp header records) (csv-write-records csvp header records col->str) (csv-write-records csvp header records col->str field->str)
First writes the header to CSV port csvp by mapping header, which is a list of column identifiers, to a list of header names using procedure col->str. Then csv-write-records writes all the records from the vector records, mapping each record to a data line. This is done by applying field->str to all column identifiers for the record. field->str takes two arguments: a column identifier and the record.
The default for procedure col->str is symbol->string. The default for procedure field->str is alist-field->string.
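A minimal sketch of csv-write-records with the default col->str and field->str procedures. The records are hypothetical association lists with string values, which is what alist-field->string expects.

```scheme
(import (lispkit base) (lispkit csv))

;; Records as association lists with string values, as assumed by the
;; default field->str procedure alist-field->string
(define records
  (vector '((name . "Alice") (age . "30"))
          '((name . "Bob") (age . "25"))))

(define out (open-output-string))
;; The header is a list of column identifiers; the default col->str
;; (symbol->string) turns them into header names
(csv-write-records (make-csv-port out) '(name age) records)
(display (get-output-string out))
```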
(make-symbol-column str)
Returns a symbol representing the trimmed string str. If the trimmed string is empty, make-symbol-column returns #t. This procedure can be used for creating column identifiers from column names in procedure csv-read-records.
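A minimal sketch of make-symbol-column, plus a hypothetical custom make-col procedure (lowercase-column, not part of the library) that normalizes the case of header names before delegating to make-symbol-column.

```scheme
(import (lispkit base) (lispkit csv))

;; Surrounding whitespace is trimmed before the symbol is created
(write (make-symbol-column "  first name ")) (newline)

;; Hypothetical custom make-col: normalize case, then delegate
(define (lowercase-column str)
  (make-symbol-column (string-downcase str)))

(define records
  (csv-read-records (make-csv-port (open-input-string "Name,AGE\nAlice,30"))
                    lowercase-column))
(write (vector-ref records 0)) (newline)
```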
(make-alist-record cols fields)
Returns a new record given a list cols of column identifiers or column factories (i.e. procedures that take one argument, a column value, and return either a representation of this column if the value is valid, or #f if the column value is invalid), and a list fields of column values.
This procedure represents records as association lists, iterating through all cols and fields values. If there are more fields values than cols expressions, the excess fields values are skipped. If there are more cols expressions than fields values, #f is used as the default for missing fields values. If a cols expression is a procedure, the association entry is created by calling the procedure with the corresponding fields value. For all other cols expression types, a pair is created with the cols expression as its car and the fields value as its cdr.
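A minimal sketch of make-alist-record with a column factory, an excess value, and a missing value. The factory age-column is hypothetical: following the description above, it returns the association entry itself, here with the value converted to a number.

```scheme
(import (lispkit base) (lispkit csv))

;; Hypothetical column factory: builds the association entry for the age
;; column and converts the field value into a number
(define (age-column value)
  (cons 'age (string->number value)))

;; The excess value "extra" is skipped
(define r1 (make-alist-record (list 'name age-column) '("Alice" "30" "extra")))
;; The missing value for city defaults to #f
(define r2 (make-alist-record '(name city) '("Bob")))

(write r1) (newline)
(write r2) (newline)
```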
(make-alist-record/excess cols fields)
Returns a new record given a list cols of column identifiers or column factories (i.e. procedures that take one argument, a column value, and return either a representation of this column if the value is valid, or #f if the column value is invalid), and a list fields of column values.
This procedure represents records as association lists, iterating through all cols and fields values. If there are more fields values than cols expressions, #f is used as the default cols expression for the excess fields values. If there are more cols expressions than fields values, #f is used as the default for missing fields values. If a cols expression is a procedure, the association entry is created by calling the procedure with the corresponding fields value. For all other cols expression types, a pair is created with the cols expression as its car and the fields value as its cdr.
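A minimal sketch contrasting make-alist-record and make-alist-record/excess on the same hypothetical line, which has two more values than there are columns.

```scheme
(import (lispkit base) (lispkit csv))

(define cols '(name age))
(define fields '("Alice" "30" "extra1" "extra2"))

;; make-alist-record drops the two excess values ...
(define strict (make-alist-record cols fields))
;; ... while make-alist-record/excess keeps them, paired with the key #f
(define loose (make-alist-record/excess cols fields))

(write strict) (newline)
(write loose) (newline)
```

Because csv-read-records uses make-alist-record/excess by default, values beyond the header's columns are kept rather than silently dropped; passing make-alist-record as make-record discards them instead.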
(alist-field->string record col)
Returns the column value of column col from association list record. alist-field->string assumes that record is an association list whose values are strings. This is how the procedure is defined:
(csv-output-port? obj)
(make-csv-port) (make-csv-port tport) (make-csv-port tport sep) (make-csv-port tport sep quote)
(csv-base-port csvp)
(csv-separator csvp)
(csv-quotechar csvp)
(csv-read csvp) (csv-read csvp readheader?)
(csv-write csvp header lines)
(csv-read-records csvp) (csv-read-records csvp make-col) (csv-read-records csvp make-col make-record)
(csv-write-records csvp header records) (csv-write-records csvp header records col->str) (csv-write-records csvp header records col->str field->str)
(make-symbol-column str)
(make-alist-record cols fields)
(make-alist-record/excess cols fields)
(alist-field->string record col)