Classes of User-Specified Functions for operating on CEROs

The set of functions that could be applied to a CERO, or to the data series within it, is unbounded, so ConCERO cannot provide them all. Users are therefore expected to supply functions as they are needed, by writing the appropriate Python 3 code and adding the function to libfuncs.py. To reduce the difficulty of doing so, ConCERO includes 3 classes of wrapper functions that make it much simpler to extend the power of FromCERO.

A wrapper function is a function that encapsulates another function, and therefore has access to both the inputs and outputs of the encapsulated function. Because the wrapper function has access to the inputs, it can provide pre-processing on the input to reshape it into a specific form, and because it has access to the output of the function, it can post-process the output of the function - mutating it into a desirable form.
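The idea can be illustrated with a minimal, generic sketch. The names example_wrapper and mean are hypothetical and are not part of ConCERO; the sketch only shows a wrapper pre-processing the input and post-processing the output of the function it encapsulates.

```python
def example_wrapper(func):
    """Hypothetical wrapper: converts inputs to floats, rounds the output."""
    def wrapped(values):
        prepared = [float(v) for v in values]  # pre-process the input
        result = func(prepared)                # call the encapsulated function
        return round(result, 2)                # post-process the output
    return wrapped


def mean(values):
    # The encapsulated function only ever sees a clean list of floats.
    return sum(values) / len(values)


wrapped_mean = example_wrapper(mean)
print(wrapped_mean(["1", "2", "2"]))
```

Here mean never has to deal with string inputs or rounding, because the wrapper handles both.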

A wrapper function can be directed to encapsulate a function by preceding the function definition with a decorator. A decorator is a one-line statement consisting of the ‘@’ symbol followed by the name of the wrapper function. For example, to encapsulate func with the dataframe_op wrapper, the code is:

@dataframe_op
def func(*args, **kwargs):
    ...
    return cero
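As a general Python note, the decorator syntax is shorthand for passing the function to the wrapper and rebinding the name. The sketch below uses a hypothetical wrapper, shout, rather than a ConCERO wrapper, to show that the two spellings behave identically.

```python
def shout(func):
    """Hypothetical wrapper that upper-cases the wrapped function's result."""
    def wrapped(*args, **kwargs):
        return func(*args, **kwargs).upper()
    return wrapped


@shout
def greet(name):
    return "hello " + name


def greet_plain(name):
    return "hello " + name


# Equivalent to decorating greet_plain with @shout.
greet_manual = shout(greet_plain)

print(greet("cero"), greet_manual("cero"))
```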

The wrapper functions are stored in the libfuncs_wrappers module and should never be altered by the user.

The 3 classes of wrappers, and how to apply them, are explained below, along with the case where no wrapper/decorator is provided.

Class 1 Functions - DataFrame Operations

Class 1 functions are the most general type of wrapper functions, and can be considered a superset of the other two. Class 1 functions operate on a pandas.DataFrame object, and therefore can operate on an entire CERO if need be. A class 1 function must have the following function signature:

@dataframe_op
def func_name(df, *args, **kwargs):
    ...
    return cero

Note the following key features:

  • The function is preceded by the dataframe_op decorator (imported from libfuncs_wrappers).
  • The first argument provided to func_name, that is df, will be a CERO (an instance of a pandas.DataFrame), reduced by the arrays/inputs options.
  • The returned object (cero) must be a valid CERO. A valid CERO is a pandas.DataFrame object with a DatetimeIndex for columns and tuples/string-type values for the index values.

The libfuncs function merge provides a simple example of how to apply this wrapper:

@dataframe_op
def merge(df):
    df.iloc[0, :] = df.sum(axis=0) # Replaces the first series with the sum of all the series
    return df
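To see what the body of merge does to the data, the sketch below runs the same two lines on a small toy CERO. The decorator is deliberately omitted so that the example runs without ConCERO installed; the name merge_undecorated and the toy values are assumptions made for illustration only.

```python
import pandas as pd


def merge_undecorated(df):
    # Same body as merge above, without the dataframe_op wrapper.
    df.iloc[0, :] = df.sum(axis=0)  # first row becomes the column-wise sum
    return df


# Toy CERO: string identifiers as the index, a DatetimeIndex as the columns.
cero = pd.DataFrame(
    [[1.0, 2.0], [3.0, 4.0]],
    index=["a", "b"],
    columns=pd.to_datetime(["2018-01-01", "2019-01-01"]),
)

result = merge_undecorated(cero.copy())
print(result)
```

The first row now holds the sum of both series for each year, and the second row is unchanged.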

Class 2 Functions - Series Operations

Class 2 functions operate on a single pandas.Series object. Note that a single row of a pandas.DataFrame is an instance of a pandas.Series. The series operations class can be considered a subset of DataFrame operations, and a superset of all recursive operations (discussed below).

Similar to class 1 functions, class 2 functions must fit the form:

@series_op
def func(series, *args, **kwargs):
    ...
    return pandas_series

With similar features:

  • The function is preceded by the @series_op decorator (imported from libfuncs_wrappers).
  • The first argument (series) must be of pandas.Series type.
  • The function must return an object of pandas.Series type (pandas_series), which must have the same shape as series.
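As an illustration of a function body that satisfies these requirements, the sketch below rescales a series relative to its first value. The function name index_to_first is hypothetical, and the series_op decorator is omitted so the example runs standalone; in libfuncs.py the function would be preceded by @series_op.

```python
import pandas as pd


def index_to_first(series):
    # Takes a pandas.Series and returns a pandas.Series of the same shape.
    return series / series.iloc[0]


series = pd.Series(
    [2.0, 4.0, 5.0],
    index=pd.to_datetime(["2018-01-01", "2019-01-01", "2020-01-01"]),
)

result = index_to_first(series)
print(result)
```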

Class 3 Functions - Recursive Operations

Recursive operations must fit the form:

@recursive_op
def func(*args, **kwargs):
    ...
    return calc

Noting that:

  • Positional arguments are provided in the same order as their sequence in the data series.
  • The return value calc must be a single floating-point value.

Note that options can be provided to an operation object to alter the behaviour of the recursive operation. Those options are:

  • init: list(float) - values that precede the data series that serve as initialisation values.
  • post: list(float) - values that follow the data series for non-causal recursive functions.
  • auto_init: int - automatically prepend the first value in the array an auto_init number of times to the series (and therefore using that as the initial conditions).
  • auto_post: int - automatically postpend the last value in the array an auto_post number of times to the series (and therefore using that as the post conditions).
  • init_cols: list(int) - specifies the year(s) to use as initialisation values.
  • post_cols: list(int) - specifies the year(s) to use as post-pended values.
  • init_icols: list(int) - specifies the index (zero-indexed) to use as initialisation values.
  • post_icols: list(int) - specifies the index (zero-indexed) to use as post-pended values.
  • inplace: bool - If True, then the recursive operation will be applied on the array inplace, such that the result from a previous iteration is used in subsequent iterations. If False, the operation proceeds ignorant of the results of previous iterations. True by default.
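The effect of auto_init and auto_post, as described in the list above, can be sketched in plain Python. The helper extend_series is hypothetical and is not ConCERO's implementation; it only shows how the series would be padded under that description.

```python
def extend_series(series, auto_init=0, auto_post=0):
    """Hypothetical helper: pad a series with copies of its end values."""
    values = list(series)
    head = [values[0]] * auto_init   # first value repeated auto_init times
    tail = [values[-1]] * auto_post  # last value repeated auto_post times
    return head + values + tail


extended = extend_series([2, 1, 3], auto_init=2, auto_post=1)
print(extended)
```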

How these options are applied is best explained with an example. Consider a recursive operation that is a 3-sample moving average filter. This can be implemented by including mv_avg_3() (below) in libfuncs.py:

@recursive_op
def mv_avg_3(a, b, c):
    return (a + b + c)/3

It is also necessary to provide the arguments init and post in the configuration file, so the operation object looks something like:

func: mv_avg_3
init:
    - 1
post:
    - 2

This operation would transform the data series [2, 1, 3] to the values [1.3333, 1.7778, 2.2593] - i.e. [(1+2+1)/3, (1.3333+1+3)/3, (1.7778+3+2)/3]. If, instead, the configuration file looks like:

func: mv_avg_3
init:
    - 1
post:
    - 2
inplace: False

Then the output of the same series would be [1.3333, 2, 2] - that is, [(1+2+1)/3, (2+1+3)/3, (1+3+2)/3].
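Both results can be reproduced with a short pure-Python simulation. This is a sketch of the behaviour described in this section, not the recursive_op implementation; the helper apply_recursive and its signature are assumptions made for illustration.

```python
import inspect


def mv_avg_3(a, b, c):
    return (a + b + c) / 3


def apply_recursive(func, series, init=(), post=(), inplace=True):
    """Hypothetical stand-in: slide func over init + series + post."""
    n_args = len(inspect.signature(func).parameters)
    work = list(init) + list(series) + list(post)
    if len(work) - n_args + 1 != len(series):
        raise ValueError("init/post lengths do not match the function arity")
    out = []
    for i in range(len(series)):
        value = func(*work[i:i + n_args])
        out.append(value)
        if inplace:
            # Later windows see this result instead of the original value.
            work[i + len(init)] = value
    return out


inplace_result = apply_recursive(mv_avg_3, [2, 1, 3], init=[1], post=[2])
plain_result = apply_recursive(
    mv_avg_3, [2, 1, 3], init=[1], post=[2], inplace=False
)
print([round(v, 4) for v in inplace_result])
print([round(v, 4) for v in plain_result])
```

With inplace set to True each result is written back before the next window is read, which is why the second and third values differ between the two runs.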

Wrapper-less Functions

It is strongly recommended that a user use the defined wrappers to encapsulate functions. This section should only be used as guidance to understand how the wrappers operate with the FromCERO module, and for understanding how to write additional wrappers (which is a non-trivial exercise).

A function that is not decorated with a pre-defined wrapper (as discussed previously) must have the following function signature to be compatible with the FromCERO module:

def func_name(df, *args, locs=None, **kwargs):
    ...
    return cero

Where:

  • df receives the entire CERO (as handled by the calling class), and
  • locs receives a list of all identifiers specifying which series of the CERO have been specified, and
  • cero is the returned dataframe and must be of CERO type. The FromCERO module will overwrite any values of its own CERO with those provided by cero, based on an index match (after renaming takes place).
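A minimal sketch of a function with this signature is given below. The function double_selected and the toy data are hypothetical, and the example is run directly rather than through FromCERO, so it only demonstrates the shape of the inputs and the output.

```python
import pandas as pd


def double_selected(df, *args, locs=None, **kwargs):
    # Hypothetical wrapper-less function: doubles only the selected series.
    if locs is None:
        locs = df.index.tolist()  # assumption: no selection means all series
    return df.loc[locs] * 2       # returned frame keeps a CERO-style layout


cero = pd.DataFrame(
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    index=["a", "b", "c"],
    columns=pd.to_datetime(["2018-01-01", "2019-01-01"]),
)

result = double_selected(cero, locs=["a", "c"])
print(result)
```

Only the rows named in locs are returned, which fits the index-matched overwrite described above.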

Other Notes

  • Avoid trying to create a renaming function - use the cero.rename_index_values() method - it has been designed to work around a bug in Pandas (Issue #19497).
  • The system module libfuncs serves as a source of examples for how to use the function wrappers.

Technical Specifications of Decorators

libfuncs_wrappers.dataframe_op(func)

This decorator is designed to provide func (the encapsulated function) with a restricted form of df (a CERO). A restricted df is the original df limited to a subset of rows and/or columns. Note that a restriction on df.columns will be compact (the mathematical property), but this is not necessarily the case for restriction on df.index.

libfuncs_wrappers.series_op(func)

This decorator provides func (the encapsulated function) with the first pandas.Series in a pandas.DataFrame (i.e. the first row in df). Note that this wrapper is encapsulated within the dataframe_op wrapper.

libfuncs_wrappers.recursive_op(func)

Applies the encapsulated function (func) iteratively to the elements of array from left to right, with init prepended to array and post postpended.

libfuncs_wrappers.log_func(func)

Logging decorator - for debugging purposes. To apply to function func:

@log_func
def func(*args, **kwargs):
    ...
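What log_func records is not specified here. For readers unfamiliar with logging decorators in general, the sketch below shows one common pattern; the name log_calls and the message format are assumptions and do not describe ConCERO's log_func.

```python
import functools
import logging

logger = logging.getLogger("log_sketch")


def log_calls(func):
    """Hypothetical logging decorator: records each call and its result."""
    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapped(*args, **kwargs):
        logger.debug("calling %s args=%r kwargs=%r", func.__name__, args, kwargs)
        result = func(*args, **kwargs)
        logger.debug("%s returned %r", func.__name__, result)
        return result
    return wrapped


@log_calls
def add(a, b):
    return a + b


logging.basicConfig(level=logging.DEBUG)
print(add(1, 2))
```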

Created on Thu Dec 21 16:36:02 2017

@author: Lyle Collins @email: Lyle.Collins@csiro.au