A small API to read and analyze CSV files by inferring types for each column of data.
Currently, only int, float and string types are supported.
def cast(table)
cast converts every value in table to the type recorded for its column in table.types.
The only special case here is missing values or NULL columns. If a value is missing or a column has type NULL (i.e., all values are missing), then the value is replaced with None.
N.B. cast is idempotent, i.e., cast(x) = cast(cast(x)).
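The following is a minimal sketch of cast, not the module's actual code. It assumes that table.types holds the Python types int, float and str, that None stands for a NULL column, and that a missing cell is an empty string (or is already None). Under those assumptions, casting a second time changes nothing, which is the idempotence property stated above.

```python
from collections import namedtuple

# Field names follow the Table class documented at the end of this page.
Table = namedtuple('Table', ['types', 'names', 'rows'])


def cast(table):
    """Return a new Table with every cell converted to its column's type."""
    # Assumption: types are int/float/str, None marks a NULL column,
    # and a missing cell is '' (before casting) or None (after casting).
    def cast_cell(typ, value):
        if typ is None or value is None or value == '':
            return None  # missing value or NULL column
        return typ(value)  # int('1') -> 1, and int(1) -> 1 on a second pass

    rows = [[cast_cell(typ, value) for typ, value in zip(table.types, row)]
            for row in table.rows]
    return Table(table.types, table.names, rows)


raw = Table([int, float, str, None], ['a', 'b', 'c', 'd'],
            [['1', '2.5', 'x', ''], ['', '3', '', '']])
once = cast(raw)
print(once.rows)  # [[1, 2.5, 'x', None], [None, 3.0, None, None]]
print(cast(once) == once)  # True
```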
def cell_str(cell_contents)
cell_str is a convenience function for converting cell contents to a string when there are still NULL values.
N.B. If you choose to work with data while keeping NULL values, you will likely need to write more functions similar to this one.
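A sketch of cell_str, assuming NULL cells are held as None. Rendering them as the literal text NULL is an arbitrary choice made here for illustration; the real function may use a different placeholder.

```python
def cell_str(cell_contents):
    """Convert a cell to a string, tolerating NULL (None) cells."""
    # Assumption: None marks a NULL cell; 'NULL' is an illustrative placeholder.
    if cell_contents is None:
        return 'NULL'
    return str(cell_contents)


# str.join only accepts strings, so each cell is converted first.
row = [1, None, 'x']
print(', '.join(cell_str(c) for c in row))  # 1, NULL, x
```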
def column(table, colname)
column returns the column in table with name colname as a Column named tuple.
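One way column could be written, assuming table.names is a list of header strings and each row is a list of cells in header order. This sketch raises ValueError (from list.index) when no column has the requested name; the real function may behave differently in that case.

```python
from collections import namedtuple

# Field names follow the Table and Column classes documented below.
Table = namedtuple('Table', ['types', 'names', 'rows'])
Column = namedtuple('Column', ['type', 'name', 'cells'])


def column(table, colname):
    """Return the Column of table whose header is colname."""
    i = table.names.index(colname)  # raises ValueError for an unknown name
    return Column(table.types[i], colname, [row[i] for row in table.rows])


people = Table([str, int], ['name', 'age'], [['ann', 30], ['bob', 41]])
print(column(people, 'age'))  # Column(type=<class 'int'>, name='age', cells=[30, 41])
```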
def columns(table)
columns returns a list of all columns in the data set, where each column has type Column.
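A sketch of columns under the same assumption (rows are lists of cells in header order). It simply transposes the row-oriented data into one Column tuple per header.

```python
from collections import namedtuple

Table = namedtuple('Table', ['types', 'names', 'rows'])
Column = namedtuple('Column', ['type', 'name', 'cells'])


def columns(table):
    """Return every column of table, in header order, as Column tuples."""
    return [Column(typ, name, [row[i] for row in table.rows])
            for i, (typ, name) in enumerate(zip(table.types, table.names))]


people = Table([str, int], ['name', 'age'], [['ann', 30], ['bob', 41]])
for col in columns(people):
    print(col.name, col.type.__name__, col.cells)
```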
def convert_columns(table, **kwargs)
convert_columns applies converter functions to specific columns: each keyword in kwargs names a column, and its value is a function of one parameter that takes a cell's contents and returns a single converted value.
For example, convert_columns(table, colname=lambda s: s.lower()) would convert all values in the column with name colname to lowercase.
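A minimal sketch of convert_columns. The description above does not say whether the table is modified in place; this version returns a new Table, as map_data and map_names are documented to do. Keywords that match no column are silently ignored here, which is also a choice of this sketch.

```python
from collections import namedtuple

Table = namedtuple('Table', ['types', 'names', 'rows'])


def convert_columns(table, **kwargs):
    """Return a new Table where kwargs[name] is applied to each cell of column name."""
    # One converter per column position; None means "leave this column alone".
    converters = [kwargs.get(name) for name in table.names]
    rows = [[cell if f is None else f(cell) for f, cell in zip(converters, row)]
            for row in table.rows]
    return Table(table.types, table.names, rows)


people = Table([str, int], ['name', 'age'], [['Ann', 30], ['BOB', 41]])
lowered = convert_columns(people, name=lambda s: s.lower())
print(lowered.rows)  # [['ann', 30], ['bob', 41]]
```

Because column names arrive as keyword arguments, a header that is not a valid Python identifier (one containing a space, say) can still be targeted with dictionary unpacking, e.g. convert_columns(people, **{'first name': str.title}).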
def convert_missing_cells(table, dstr='', dint=0, dfloat=0.0)
convert_missing_cells changes the values of all NULL cells to the values specified by dstr, dint and dfloat. For example, all NULL cells in columns with type str will be replaced with the value given to dstr.
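A sketch of convert_missing_cells, assuming NULL cells are None and column types are int, float, str, or None for a NULL column. No default is listed for NULL-typed columns, so this sketch leaves their cells as None; the real function may handle that case differently.

```python
from collections import namedtuple

Table = namedtuple('Table', ['types', 'names', 'rows'])


def convert_missing_cells(table, dstr='', dint=0, dfloat=0.0):
    """Replace None cells with the default for their column's type."""
    defaults = {str: dstr, int: dint, float: dfloat}
    # defaults.get(None) is None, so cells of NULL-typed columns stay None.
    rows = [[defaults.get(typ) if cell is None else cell
             for typ, cell in zip(table.types, row)]
            for row in table.rows]
    return Table(table.types, table.names, rows)


t = Table([str, int, float], ['name', 'age', 'score'],
          [[None, 30, None], ['bob', None, 2.5]])
print(convert_missing_cells(t, dstr='?').rows)  # [['?', 30, 0.0], ['bob', 0, 2.5]]
```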
def convert_types(table, fstr=None, fint=None, ffloat=None)
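convert_types works just like convert_columns, but selects columns by type instead of by name: fstr, fint and ffloat are applied to the cells of columns with type str, int and float, respectively.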
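A sketch of convert_types that picks each column's converter from its type. It assumes types are the Python types str, int and float, and that a converter left as None means those columns are not touched. In this sketch a converter also receives missing (None) cells, so filling them first may be necessary.

```python
from collections import namedtuple

Table = namedtuple('Table', ['types', 'names', 'rows'])


def convert_types(table, fstr=None, fint=None, ffloat=None):
    """Apply fstr/fint/ffloat to every cell of str/int/float columns."""
    by_type = {str: fstr, int: fint, float: ffloat}
    converters = [by_type.get(typ) for typ in table.types]  # None: no change
    rows = [[cell if f is None else f(cell) for f, cell in zip(converters, row)]
            for row in table.rows]
    return Table(table.types, table.names, rows)


people = Table([str, int], ['name', 'age'], [['ann', 30], ['bob', 41]])
print(convert_types(people, fstr=str.upper).rows)  # [['ANN', 30], ['BOB', 41]]
```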
def frequencies(column)
frequencies returns a dictionary where the keys are the unique values in the column, and the values correspond to the frequency of each value in the column.
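A sketch of frequencies built on collections.Counter. It assumes column is a Column named tuple and that "frequency" means an absolute count; if relative frequencies were intended, each count would be divided by len(column.cells).

```python
from collections import Counter, namedtuple

Column = namedtuple('Column', ['type', 'name', 'cells'])


def frequencies(column):
    """Map each distinct cell value to the number of times it appears."""
    return dict(Counter(column.cells))


colors = Column(str, 'color', ['red', 'blue', 'red', None])
print(frequencies(colors))  # {'red': 2, 'blue': 1, None: 1}
```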
def map_data(table, f)
map_data executes f on every cell in table with five arguments, in order: column type, column name, row index, column index, contents. The result of the function is placed in the corresponding cell location.
A new Table is returned with the converted values.
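A minimal sketch of map_data, assuming rows are lists of cells in header order. Row and column indices are zero-based here, which is an assumption rather than something stated above.

```python
from collections import namedtuple

Table = namedtuple('Table', ['types', 'names', 'rows'])


def map_data(table, f):
    """Return a new Table whose cells are f(type, name, row_index, col_index, contents)."""
    rows = [[f(table.types[c], table.names[c], r, c, cell)
             for c, cell in enumerate(row)]
            for r, row in enumerate(table.rows)]
    return Table(table.types, table.names, rows)


people = Table([str, int], ['name', 'age'], [['ann', 30], ['bob', 41]])
# Double every cell in int columns and leave the rest unchanged.
doubled = map_data(people, lambda typ, name, r, c, v: v * 2 if typ is int else v)
print(doubled.rows)  # [['ann', 60], ['bob', 82]]
```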
def map_names(table, f)
map_names executes f on every column header in table, with three arguments, in order: column type, column index, column name. The result of the function is placed in the corresponding header location.
A new Table is returned with the new column names.
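A minimal sketch of map_names under the same assumptions as the other sketches (zero-based indices, types as Python type objects). The new Table shares its rows with the input rather than copying them.

```python
from collections import namedtuple

Table = namedtuple('Table', ['types', 'names', 'rows'])


def map_names(table, f):
    """Return a new Table whose headers are f(type, index, name)."""
    names = [f(typ, i, name)
             for i, (typ, name) in enumerate(zip(table.types, table.names))]
    return Table(table.types, names, table.rows)  # rows are shared, not copied


people = Table([str, int], ['name', 'age'], [['ann', 30], ['bob', 41]])
renamed = map_names(people, lambda typ, i, name: '%s_%s' % (name, typ.__name__))
print(renamed.names)  # ['name_str', 'age_int']
```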
def print_data_table(table)
print_data_table is a convenience function for pretty-printing the data in tabular format, including header names and type annotations.
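The exact layout print_data_table produces is not described here, so the sketch below picks one for illustration: left-aligned columns, headers of the form name (type), and NULL cells shown as NULL. The helper format_data_table is hypothetical; it builds the text so the layout can be checked without printing.

```python
from collections import namedtuple

Table = namedtuple('Table', ['types', 'names', 'rows'])


def format_data_table(table):
    """Hypothetical helper: render table as aligned text with typed headers."""
    headers = ['%s (%s)' % (name, 'NULL' if typ is None else typ.__name__)
               for typ, name in zip(table.types, table.names)]
    cells = [['NULL' if c is None else str(c) for c in row] for row in table.rows]
    # Each column is as wide as its longest header or cell.
    widths = [max([len(h)] + [len(row[i]) for row in cells])
              for i, h in enumerate(headers)]
    lines = [headers] + cells
    return '\n'.join('  '.join(text.ljust(w) for text, w in zip(line, widths)).rstrip()
                     for line in lines)


def print_data_table(table):
    print(format_data_table(table))


print_data_table(Table([str, int], ['name', 'age'], [['ann', 30], ['bob', None]]))
```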
def read(fname, delimiter=',', skip_header=False)
Given a file path to a CSV formatted file, read loads the cell data, the column headers and the type information for each column. A Table namedtuple is returned with fields types, names and rows.
All cells have left and right whitespace trimmed.
All rows must be the same length.
delimiter is the string that separates each field in a row.
If skip_header is set, then no column headers are read, and column names are set to their corresponding indices (as strings).
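A sketch of how read might load a file and infer column types. It assumes types are tried from narrowest to widest (int, then float, then str), that an empty cell counts as missing, and that a column with no non-missing cells is NULL (represented as None). It uses the standard csv module, whose reader requires delimiter to be a single character; the real implementation may split lines differently. The helpers infer_type and _parses_as are hypothetical names.

```python
import csv
from collections import namedtuple

Table = namedtuple('Table', ['types', 'names', 'rows'])


def _parses_as(typ, text):
    """True if typ(text) succeeds, e.g. int('3') but not int('3.5')."""
    try:
        typ(text)
    except ValueError:
        return False
    return True


def infer_type(values):
    """Return int, float or str for a column's raw strings, or None if all are missing."""
    present = [v for v in values if v != '']
    if not present:
        return None  # every cell is missing: a NULL column
    for typ in (int, float):  # narrowest type first
        if all(_parses_as(typ, v) for v in present):
            return typ
    return str


def read(fname, delimiter=',', skip_header=False):
    with open(fname, newline='') as f:
        # Trim whitespace around every cell; skip completely blank lines.
        rows = [[cell.strip() for cell in row]
                for row in csv.reader(f, delimiter=delimiter) if row]
    if not rows:
        return Table([], [], [])
    if skip_header:
        names = [str(i) for i in range(len(rows[0]))]
    else:
        names, rows = rows[0], rows[1:]
    if any(len(row) != len(names) for row in rows):
        raise ValueError('all rows must have the same length')
    types = [infer_type([row[i] for row in rows]) for i in range(len(names))]
    return Table(types, names, rows)
```

For instance, a column holding 30 and an empty cell is inferred as int, while one holding 1.5 and 2 is inferred as float, because int('1.5') raises ValueError. Cells stay as strings here; cast would then convert them.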
class Column
Column(type, name, cells)
class Table
Table(types, names, rows)
Documentation generated by pdoc.