Module qcsv

A small API to read and analyze CSV files by inferring types for each column of data.

Currently, only int, float and string types are supported.


Functions

def cast(table)

cast converts every value in table to the type inferred for its column, as recorded in table.types.

The only special case here is missing values or NULL columns. If a value is missing or a column has type NULL (i.e., all values are missing), then the value is replaced with None.

N.B. cast is idempotent, i.e., cast(x) = cast(cast(x)).
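The casting rule and its idempotence can be illustrated with a minimal sketch. It assumes a Table namedtuple with fields types, names and rows (as described under Classes), that types is keyed by column name, and that missing cells arrive as empty strings. The name cast_sketch is hypothetical; this is not qcsv's own implementation.

```python
from collections import namedtuple

# Assumption: mirrors qcsv's Table; types maps column name -> int/float/str/None.
Table = namedtuple('Table', ['types', 'names', 'rows'])

def cast_sketch(table):
    """Hypothetical stand-in for cast, for illustration only."""
    new_rows = []
    for row in table.rows:
        new_row = []
        for name, value in zip(table.names, row):
            typ = table.types[name]
            # Missing cells (assumed to be '' or None) and NULL columns become None.
            if typ is None or value is None or value == '':
                new_row.append(None)
            else:
                new_row.append(typ(value))
        new_rows.append(new_row)
    return Table(table.types, table.names, new_rows)

raw = Table({'a': int, 'b': float, 'c': None}, ['a', 'b', 'c'],
            [['1', '2.5', ''], ['', '3', '']])
once = cast_sketch(raw)
twice = cast_sketch(once)
```

Calling a type constructor on a value that already has that type returns an equal value (int(3) == 3, float(2.5) == 2.5), and None stays None, so a second pass changes nothing.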

def cell_str(cell_contents)

cell_str is a convenience function for converting cell contents to a string when there are still NULL values.

N.B. If you choose to work with data while keeping NULL values, you will likely need to write more functions similar to this one.
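A minimal sketch of such a conversion follows. The name cell_str_sketch is hypothetical, and rendering NULL cells as the empty string is an assumption; qcsv may use a different placeholder.

```python
def cell_str_sketch(cell_contents):
    """Hypothetical cell_str: convert a cell to str, tolerating NULL (None) cells."""
    # Assumption: None is rendered as the empty string.
    if cell_contents is None:
        return ''
    return str(cell_contents)

# Joining a row that still contains a NULL value.
line = ', '.join(cell_str_sketch(c) for c in [1, None, 'x'])
```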

def column(table, colname)

column returns the column in table named colname as a Column namedtuple.

def columns(table)

columns returns a list of all columns in the data set, where each column has type Column.
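The relationship between the row-oriented Table and the column-oriented Column can be sketched as follows. The namedtuple definitions mirror the classes documented below; column_sketch and columns_sketch are hypothetical names, and keying types by column name is an assumption.

```python
from collections import namedtuple

# Assumption: these mirror qcsv's Table and Column namedtuples.
Table = namedtuple('Table', ['types', 'names', 'rows'])
Column = namedtuple('Column', ['type', 'name', 'cells'])

def column_sketch(table, colname):
    """Hypothetical column: collect one column's cells from every row."""
    i = table.names.index(colname)  # raises ValueError if colname is absent
    return Column(type=table.types[colname], name=colname,
                  cells=[row[i] for row in table.rows])

def columns_sketch(table):
    """Hypothetical columns: one Column per header, in header order."""
    return [column_sketch(table, name) for name in table.names]

t = Table({'x': int, 'y': str}, ['x', 'y'], [[1, 'a'], [2, None]])
```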

def convert_columns(table, **kwargs)

convert_columns applies converter functions to specific columns. Each keyword argument name in kwargs is a column name, and its value is a function of one parameter that returns a single value.

For example

convert_columns(table, colname=lambda s: s.lower())

would convert all values in the column with name colname to lowercase.
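A minimal sketch of this keyword-driven conversion, assuming types is keyed by column name and that a new Table is returned rather than the input being modified. Leaving None cells untouched is also an assumption (the lowercase converter would fail on None); the name convert_columns_sketch is hypothetical.

```python
from collections import namedtuple

Table = namedtuple('Table', ['types', 'names', 'rows'])

def convert_columns_sketch(table, **kwargs):
    """Hypothetical convert_columns: apply kwargs[name] to cells of column name."""
    # One converter (or None) per column position, looked up by column name.
    funcs = [kwargs.get(name) for name in table.names]
    new_rows = [[f(cell) if f is not None and cell is not None else cell
                 for f, cell in zip(funcs, row)]
                for row in table.rows]
    return Table(table.types, table.names, new_rows)

t = Table({'colname': str, 'n': int}, ['colname', 'n'], [['ABC', 1], [None, 2]])
out = convert_columns_sketch(t, colname=lambda s: s.lower())
```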

def convert_missing_cells(table, dstr='', dint=0, dfloat=0.0)

convert_missing_cells changes the values of all NULL cells to the values specified by dstr, dint and dfloat. For example, all NULL cells in columns with type str will be replaced with the value given to dstr.
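The per-type defaults can be sketched as a lookup from column type to replacement value. This assumes types is keyed by column name and that cells in NULL columns (type None) have no default and therefore stay None; convert_missing_cells_sketch is a hypothetical name.

```python
from collections import namedtuple

Table = namedtuple('Table', ['types', 'names', 'rows'])

def convert_missing_cells_sketch(table, dstr='', dint=0, dfloat=0.0):
    """Hypothetical convert_missing_cells: fill None cells by column type."""
    defaults = {str: dstr, int: dint, float: dfloat}
    new_rows = []
    for row in table.rows:
        new_row = []
        for name, cell in zip(table.names, row):
            if cell is None:
                # defaults.get(None) is None, so NULL columns keep None.
                new_row.append(defaults.get(table.types[name]))
            else:
                new_row.append(cell)
        new_rows.append(new_row)
    return Table(table.types, table.names, new_rows)

t = Table({'s': str, 'i': int, 'f': float, 'z': None}, ['s', 'i', 'f', 'z'],
          [[None, None, None, None], ['x', 5, 1.5, None]])
out = convert_missing_cells_sketch(t, dstr='N/A')
```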

def convert_types(table, fstr=None, fint=None, ffloat=None)

convert_types works just like convert_columns, but selects columns by type instead of by name: fstr, fint and ffloat are applied to columns of type str, int and float, respectively.
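A minimal sketch of type-based conversion, assuming that each of fstr, fint and ffloat is applied to the cells of columns with the matching type, that a converter left as None means "no change", and that None cells are passed through untouched. The name convert_types_sketch is hypothetical.

```python
from collections import namedtuple

Table = namedtuple('Table', ['types', 'names', 'rows'])

def convert_types_sketch(table, fstr=None, fint=None, ffloat=None):
    """Hypothetical convert_types: pick each column's converter by its type."""
    by_type = {str: fstr, int: fint, float: ffloat}
    funcs = [by_type.get(table.types[name]) for name in table.names]
    new_rows = [[f(cell) if f is not None and cell is not None else cell
                 for f, cell in zip(funcs, row)]
                for row in table.rows]
    return Table(table.types, table.names, new_rows)

t = Table({'s': str, 'f': float}, ['s', 'f'], [['a', 1.25], [None, 2.5]])
out = convert_types_sketch(t, fstr=str.upper, ffloat=lambda x: x * 2)
```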

def frequencies(column)

frequencies returns a dictionary where the keys are unique values in the column, and the values correspond to the frequency of each value in the column.
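Counting occurrences like this maps directly onto collections.Counter from the standard library. The sketch below assumes a Column namedtuple as documented under Classes; frequencies_sketch is a hypothetical name, and None cells are counted like any other value here.

```python
from collections import Counter, namedtuple

Column = namedtuple('Column', ['type', 'name', 'cells'])

def frequencies_sketch(column):
    """Hypothetical frequencies: map each distinct cell value to its count."""
    return dict(Counter(column.cells))

color = Column(str, 'color', ['red', 'blue', 'red', None])
counts = frequencies_sketch(color)
```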

def map_data(table, f)

map_data executes f on every cell in table with five arguments, in order: column type, column name, row index, column index, contents. The result of the function is placed in the corresponding cell location.

A new Table is returned with the converted values.
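The calling convention for f can be sketched as a nested enumeration over rows and columns. The argument order follows the description above; keying types by column name and the name map_data_sketch are assumptions.

```python
from collections import namedtuple

Table = namedtuple('Table', ['types', 'names', 'rows'])

def map_data_sketch(table, f):
    """Hypothetical map_data: f(type, name, row_index, col_index, contents)."""
    new_rows = []
    for ri, row in enumerate(table.rows):
        new_rows.append([f(table.types[name], name, ri, ci, cell)
                         for ci, (name, cell) in enumerate(zip(table.names, row))])
    return Table(table.types, table.names, new_rows)

t = Table({'a': int, 'b': str}, ['a', 'b'], [[1, 'x'], [2, 'y']])
# Double every non-missing int; leave other cells unchanged.
doubled = map_data_sketch(
    t, lambda typ, name, r, c, v: v * 2 if typ is int and v is not None else v)
```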

def map_names(table, f)

map_names executes f on every column header in table, with three arguments, in order: column type, column index, column name. The result of the function is placed in the corresponding header location.

A new Table is returned with the new column names.
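A minimal sketch of header renaming with the argument order given above. It assumes types is a dictionary keyed by column name, so the sketch re-keys it to the new names; whether qcsv does the same is not stated. The name map_names_sketch is hypothetical.

```python
from collections import namedtuple

Table = namedtuple('Table', ['types', 'names', 'rows'])

def map_names_sketch(table, f):
    """Hypothetical map_names: f(type, col_index, name) returns the new name."""
    new_names = [f(table.types[name], i, name) for i, name in enumerate(table.names)]
    # Assumption: keep types keyed by the (new) column names.
    new_types = {new: table.types[old] for old, new in zip(table.names, new_names)}
    return Table(new_types, new_names, table.rows)

t = Table({'First Name': str, 'Age': int}, ['First Name', 'Age'], [['Ann', 30]])
out = map_names_sketch(t, lambda typ, i, name: name.lower().replace(' ', '_'))
```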

def print_data_table(table)

print_data_table is a convenience function for pretty-printing the data in tabular format, including header names and type annotations.

def read(fname, delimiter=',', skip_header=False)

Given a file path to a CSV-formatted file, read loads the cell data, the column headers and the type information for each column. It returns a Table namedtuple with fields types, names and rows.

All cells have left and right whitespace trimmed.

All rows must be the same length.

delimiter is the string that separates each field in a row.

If skip_header is set, then no column headers are read, and column names are set to their corresponding indices (as strings).
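The header handling, whitespace trimming, row-length check and per-column type inference can be sketched together. Several details are assumptions, not qcsv's documented behavior: fields are split with str.split (so quoted fields are not handled), blank lines are skipped, empty cells count as missing, a column is int if every non-missing cell parses as int, else float if every one parses as float, else str, and a column with no non-missing cells gets type None. Cells are left as strings for a later cast. read_sketch and its helpers are hypothetical names.

```python
from collections import namedtuple

Table = namedtuple('Table', ['types', 'names', 'rows'])

def _parses_as(typ, s):
    try:
        typ(s)
        return True
    except ValueError:
        return False

def _infer_type(cells):
    present = [c for c in cells if c != '']
    if not present:
        return None  # NULL column: every value is missing
    for typ in (int, float):  # try the narrowest type first
        if all(_parses_as(typ, c) for c in present):
            return typ
    return str

def read_sketch(fname, delimiter=',', skip_header=False):
    """Hypothetical read: returns a Table of trimmed string cells plus inferred types."""
    with open(fname) as f:
        lines = [line.rstrip('\r\n') for line in f if line.strip()]
    rows = [[cell.strip() for cell in line.split(delimiter)] for line in lines]
    if not rows:
        return Table({}, [], [])
    if skip_header:
        names = [str(i) for i in range(len(rows[0]))]
    else:
        names, rows = rows[0], rows[1:]
    if any(len(row) != len(names) for row in rows):
        raise ValueError('all rows must be the same length')
    types = {name: _infer_type([row[i] for row in rows])
             for i, name in enumerate(names)}
    return Table(types, names, rows)
```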

def type_str(typ)

type_str returns a string representation of a column type.

Classes

class Column

Column(type, name, cells)

Ancestors (in MRO)

  • qcsv.Column
  • __builtin__.tuple
  • __builtin__.object

Static methods

def __new__(_cls, type, name, cells)

Create new instance of Column(type, name, cells)

Instance variables

var cells

A list of all data in this column. Each datum is guaranteed to be a float, int, str or None.

var name

The name of this column.

var type

The type of this column as a Python type constructor, or None if the type could not be inferred.

class Table

Table(types, names, rows)

Ancestors (in MRO)

  • qcsv.Table
  • __builtin__.tuple
  • __builtin__.object

Static methods

def __new__(_cls, types, names, rows)

Create new instance of Table(types, names, rows)

Instance variables

var names

A list of column names from the header row of the source data.

var rows

A list of rows, where each row is a list of data. Each datum is guaranteed to be a float, int, str or None.

var types

Contains the inferred type information for each column in the table, as a dictionary mapping column name to a Python type constructor. When a column's type cannot be inferred, it maps to None.
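The __new__ signatures and the tuple ancestry listed above match what collections.namedtuple generates, so both classes can plausibly be recreated as below. This is a sketch under that assumption, not qcsv's source; keying types by column name is also assumed.

```python
from collections import namedtuple

# Assumption: qcsv defines its record types with collections.namedtuple.
Column = namedtuple('Column', ['type', 'name', 'cells'])
Table = namedtuple('Table', ['types', 'names', 'rows'])

t = Table(types={'id': int, 'city': str}, names=['id', 'city'],
          rows=[[1, 'Oslo'], [2, None]])
# Build a Column for 'id' by pulling position 0 out of every row.
first = Column(type=t.types['id'], name='id', cells=[row[0] for row in t.rows])
```

Because both are tuples, fields can be read by name (t.rows) or by position (t[2]), and instances are immutable, which fits the functions above returning a new Table rather than modifying one.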


Documentation generated by pdoc.