CADISHI modules

The following documentation was automatically generated from the docstrings present in the source code files.

cadishi

Cadishi distance histogram calculation package.

cadishi.base

Cadishi base library.

Provides the basic data container class and some more base classes that are used throughout the Cadishi (and Capriqorn) code.

Moreover, the loc_* strings are defined here centrally to point to the default locations for various data in dictfs and, consequently, in HDF5 files.

class cadishi.base.Container(number=-1, mkdir=[])[source]

Central container to hold/accumulate data while it is propagated through Cadishi’s workers or Capriqorn’s pipelines.

append_data(other, location, skip_keys=['radii'])[source]

Append data at location from other to self. If location does not exist in the current instance, it is created.

contains_key(location)[source]

Check if the current object instance has data stored at location.

del_data(location)[source]

Delete data at location from the container.

get_data(location)[source]

Get data at location from the container.

get_geometry(valid_geom=['Sphere', 'Cuboid', 'Ellipsoid', 'ReferenceStructure', 'MultiReferenceStructure', 'Voxels'])[source]

Search the pipeline log backwards for a geometry filter that was potentially used, and return the result as a string, or return None.

get_keys(location, skip_keys=None)[source]

Get a list of the keys of the data stored at location.

get_meta()[source]

Return the instance’s pipeline log list.

put_data(location, data)[source]

Add data to the container at location.

put_meta(meta)[source]

Append pipeline log information to the instance’s log list.

query_meta(path)[source]

Obtain a value from the pipeline log list by using a Unix-path-like string identifier.

scale_data(C, location, skip_keys=['radii', 'frame'])[source]

Scale (i.e. multiply) the data at location by the factor C.

sum_data(other, location, skip_keys=['radii', 'frame'])[source]

Add (+) data at location from other to self. If location does not exist in the current instance, it is created.
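
A minimal usage sketch assembled from the method signatures above; the ‘histograms’ location and the element-pair keys are assumptions chosen for illustration, and the exact data layout expected by put_data() may differ in the real code.

    import numpy as np
    from cadishi import base

    # Two containers holding per-frame histogram data at an illustrative location.
    frm = base.Container(number=0)
    frm.put_data('histograms', {'H,H': np.zeros(10), 'radii': np.linspace(0.1, 1.0, 10)})

    other = base.Container(number=1)
    other.put_data('histograms', {'H,H': np.ones(10), 'radii': np.linspace(0.1, 1.0, 10)})

    # Accumulate the second frame into the first and rescale (e.g. to average);
    # 'radii' is excluded via the default skip_keys.
    frm.sum_data(other, 'histograms')
    frm.scale_data(0.5, 'histograms')

    if frm.contains_key('histograms'):
        print(frm.get_keys('histograms', skip_keys=['radii']))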

class cadishi.base.Filter(source=-1, verbose=False)[source]

Filter base class, to be overloaded by an actual implementation.

get_meta()[source]

Return information on the present filter, ready to be added to a frame object’s list of pipeline meta information.

set_input(source)[source]
class cadishi.base.PipelineElement[source]

Base class common to Filter, Reader, and Writer. Provides methods needed to implement dependency checking between pipeline elements. Note: Deriving from “object” makes this a new-style class, which is necessary for the “super()” mechanism used to inherit the _depends and _conflicts lists.

conflicts()[source]
depends()[source]
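
The note above hints that subclasses record their requirements in _depends and _conflicts lists, which depends() and conflicts() then expose. The following is a speculative sketch of that pattern; the class name and the attribute handling are assumptions, not taken from the Cadishi sources.

    from cadishi import base

    class CenterFilter(base.Filter):
        """Hypothetical filter used only to illustrate the dependency lists."""
        _depends = ['H5Reader']        # assumed: names of elements required upstream
        _conflicts = ['DummyReader']   # assumed: names of incompatible elements

    f = CenterFilter()
    print(f.depends())     # expected to report the inherited plus own dependencies
    print(f.conflicts())
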
class cadishi.base.Reader[source]
get_meta()[source]

Return information on the present filter, ready to be added to a frame object’s list of pipeline meta information.

class cadishi.base.TrajectoryInformation[source]

Handle trajectory meta data.

get_pipeline_parameter(_id)[source]

Return the value of the last occurrence of “id” in the pipeline, i.e. the pipeline is searched in reverse order.

class cadishi.base.Writer[source]
set_input(source)[source]

cadishi.io.ascii

Cadishi ASCII IO library.

ASCII data writer for base.Container instances, mainly designed for debugging purposes. A reader is currently not implemented.

class cadishi.io.ascii.ASCIIReader[source]

ASCII data reader, currently not implemented.

class cadishi.io.ascii.ASCIIWriter(directory='.', source=-1, verbose=False)[source]

ASCII data writer for base.Container instances.

dump()[source]
write_frame(frm)[source]

cadishi.io.dummy

Cadishi dummy IO module, useful to test and develop Capriqorn pipelines.

class cadishi.io.dummy.DummyReader(n_frames, n_objects=[], verbose=False)[source]

Dummy reader, generates random coordinate data on the fly. Intended for development/testing purposes.

get_frame(i)[source]
next()[source]
class cadishi.io.dummy.DummyWriter(source, verbose=False)[source]

Dummy trajectory writer, provides a sink for a pipeline and discards the frames it receives. Mainly for development purposes.

dump()[source]
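
A short sketch of a minimal pipeline built from the dummy classes; the n_objects values are arbitrary and are assumed to set the number of particles per species.

    from cadishi.io import dummy

    # Ten random frames with three particle species of the given sizes.
    reader = dummy.DummyReader(10, n_objects=[128, 256, 512])

    # The writer consumes the reader's frames and discards them, which is
    # sufficient to exercise a pipeline end to end.
    writer = dummy.DummyWriter(reader, verbose=True)
    writer.dump()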

cadishi.io.hdf5

Cadishi HDF5 IO library.

HDF5 data reader/writer for base.Container instances. Heavily used by Cadishi and Capriqorn.

class cadishi.io.hdf5.H5Reader(file=['default.h5'], first=1, last=None, step=1, shuffle=False, shuffle_reproducible=False, verbose=False)[source]

HDF5 reader returning base.Container instances.

close_h5fp()[source]
get_frame(idx_tuple)[source]

Read a frame identified by its (file_idx, frame_idx) and return a Container object.

get_h5fp(file_idx)[source]
get_meta()[source]

Return information on the HDF5 reader, ready to be added to a frame object’s list of pipeline meta information.

get_trajectory_information()[source]

Collect information from the first frame, assume it to be representative of all frames, and return it via a trajectory information object.

class cadishi.io.hdf5.H5Writer(file='default.hdf5', source=-1, compression=None, mode='w', verbose=False)[source]

HDF5 writer for base.Container instances.

close_file_safely()[source]
default_compression = 'lzf'
dump()[source]

Save a series of frames (== trajectory). The dump() method saves all the frames pending from the Writer’s data source (self.src). If the user code sets up a data processing pipeline, the dump() routine drives it by providing the final sink.

get_meta()[source]

Return information on the HDF5 writer, ready to be added to a frame object’s list of pipeline meta information.

put_frame(frm)[source]

Save a single frame into a HDF5 group labeled with the frame number.

valid_compression = [None, 'gzip', 'lzf']
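
A round-trip sketch combining the dummy reader with the HDF5 writer and reader; the file name, the compression choice, and the (0, 1) frame index are assumptions made for illustration.

    from cadishi.io import dummy, hdf5

    # Write a short random trajectory to an HDF5 file using LZF compression.
    reader = dummy.DummyReader(5, n_objects=[64, 128])
    writer = hdf5.H5Writer(file='random.h5', source=reader, compression='lzf')
    writer.dump()

    # Read it back: frames are addressed by (file_idx, frame_idx); indexing
    # the first frame of the first file as (0, 1) is an assumption here.
    h5_reader = hdf5.H5Reader(file=['random.h5'])
    info = h5_reader.get_trajectory_information()
    frm = h5_reader.get_frame((0, 1))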

cadishi.io.pickel

Cadishi IO library using pickle.

The module name was deliberately spelled ‘pickel’ to avoid a name conflict with Python’s pickle module.

May be used as a fallback in case HDF5 is not available. It is, however, significantly slower than HDF5.

class cadishi.io.pickel.PickleReader(file='default_', first=None, last=None, step=1, verbose=False)[source]

Pickle reader for base.Container instances.

get_frame(number)[source]

Read a frame identified by its number and return a container object.

get_meta()[source]

Return information on the pickle reader, ready to be added to a frame object’s list of pipeline meta information.

class cadishi.io.pickel.PickleWriter(file='default_', source=-1, verbose=False)[source]

Pickle writer for base.Container instances.

dump()[source]

Save a series of base.Container instances pending from the writer’s data source to individual pickle files.

get_meta()[source]

Return information on the pickle writer, ready to be added to a frame object’s list of pipeline meta information.

put_frame(frm)[source]

Save a single frame into a pickle file labeled with the frame number.
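
A sketch of the pickle fallback, mirroring the HDF5 example above; the file prefix and the 1-based frame numbering are assumptions.

    from cadishi.io import dummy, pickel

    reader = dummy.DummyReader(3, n_objects=[32])

    # dump() writes one pickle file per frame, labeled with the frame number.
    writer = pickel.PickleWriter(file='frame_', source=reader)
    writer.dump()

    # Read a single frame back by its number.
    p_reader = pickel.PickleReader(file='frame_', first=1, last=3)
    frm = p_reader.get_frame(1)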

cadishi.dict_util

Various NumPy- and dictionary-related utilities.

Implements add, append, and scale operations for numerical data (i.e. NumPy arrays) stored in dictionaries. In addition, an ASCII output routine is provided.

cadishi.dict_util.append_values(X, Y, skip_keys=['radii'])[source]

Implement X.append(Y) where X and Y are Python dictionaries that contain NumPy data types. The operation is applied to X for any value in Y, excluding keys that are in the list skip_keys. Typically, the values of X, Y are NumPy arrays (e.g. particle numbers) that are appended.

Parameters:
  • X (dict) – X is a dictionary with string keys that contains NumPy arrays.
  • Y (dict) – Y is a dictionary with string keys that contains NumPy arrays.
  • skip_keys (list of strings) – skip_keys is a list of strings for which the append operation is skipped.
Returns:

The function append_values operates on X directly and does not return anything.

Return type:

None
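
A small example of the in-place append; the key names are illustrative, and concatenation along the first axis is assumed to be the underlying append semantics.

    import numpy as np
    from cadishi import dict_util

    radii = np.linspace(0.1, 10.0, 100)
    X = {'H': np.array([100]), 'O': np.array([50]), 'radii': radii}
    Y = {'H': np.array([102]), 'O': np.array([49]), 'radii': radii}

    # 'radii' is in the default skip_keys, so the bin centers are not duplicated.
    dict_util.append_values(X, Y)
    print(X['H'])   # expected (under the assumption above): [100 102]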

cadishi.dict_util.scale_values(X, C, skip_keys=['radii', 'frame'])[source]

Implement X = X times C where X is a Python dictionary that contains supported data types. The operation is applied to any value in X, excluding keys that are in the list skip_keys. Typically, the values of X are NumPy arrays (histograms) that are rescaled after summation using a scalar C (e.g. to implement averaging operation).

Parameters:
  • X (dict) – X is a dictionary with string keys that contains NumPy arrays.
  • C (scalar, NumPy array) – C is a multiplier, either a scalar or a NumPy array of size compatible with the contents of X.
  • skip_keys (list of strings) – skip_keys is a list of strings for which the scale operation is skipped.
Returns:

The function scale_values operates on X directly and does not return anything.

Return type:

None

cadishi.dict_util.sum_values(X, Y, skip_keys=['radii', 'frame'])[source]

Implement X += Y where X and Y are Python dictionaries (with string keys) that contain summable data types. The operation is applied to X for any value in Y, excluding keys that are in the list skip_keys. Typically, the values of X, Y are NumPy arrays (e.g. histograms) that are summed.

Parameters:
  • X (dict) – X is a dictionary with string keys that contains NumPy arrays.
  • Y (dict) – Y is a dictionary with string keys that contains NumPy arrays.
  • skip_keys (list of strings) – skip_keys is a list of strings for which the sum operation is skipped.
Returns:

The function sum_values operates on X directly and does not return anything.

Return type:

None
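
A typical use of sum_values() and scale_values() together, accumulating per-frame histograms and averaging them afterwards; the element-pair keys are illustrative.

    import numpy as np
    from cadishi import dict_util

    n_frames, n_bins = 10, 100
    frames = [{'H,H': np.random.rand(n_bins), 'H,O': np.random.rand(n_bins)}
              for _ in range(n_frames)]

    # Accumulate in place, then rescale to obtain the average histogram.
    total = {'H,H': np.zeros(n_bins), 'H,O': np.zeros(n_bins)}
    for frm in frames:
        dict_util.sum_values(total, frm)
    dict_util.scale_values(total, 1.0 / n_frames)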

cadishi.dict_util.write_dict(dic, path, level=0)[source]

Write a dictionary containing NumPy arrays or other Python data structures to text files. In case the dictionary contains other dictionaries, the function is called recursively. The keys should be strings to guarantee successful operation.

Parameters:
  • dic (dictionary) – A dictionary containing NumPy arrays or other Python data structures.
  • path (string) – Path where the dictionary and its data shall be written to.
  • level (int, optional) – Level in the nested-dictionary hierarchy during recursive operation. This parameter was added for debugging purposes and does not have any practical relevance.
Returns:

The function write_dict does not return anything.

Return type:

None
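
A minimal write_dict() example; how nested dictionaries map to the on-disk layout (sub-directories vs. prefixed file names) is an assumption not confirmed by the text above.

    import numpy as np
    from cadishi import dict_util

    data = {
        'histograms': {'H,H': np.random.rand(100)},  # nested dict, written recursively
        'frame': 1,
    }
    dict_util.write_dict(data, './ascii_dump/')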

cadishi.dictfs

dictfs, the dictionary-based in-memory “file system”.

Store and retrieve data from nested dictionaries in memory using path name strings similar to a UNIX file system. Can be used in tandem with HDF5 IO.

cadishi.dictfs.delete(node, path)[source]

Delete the object at path relative to node.

cadishi.dictfs.exists(node, path)[source]

Query the object’s existence at path relative to node.

cadishi.dictfs.load(node, path)[source]

Return the object at path relative to node.

cadishi.dictfs.save(node, path, obj)[source]

Save a deepcopy of obj at path relative to node.
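
A self-contained dictfs sketch; the path string is illustrative, and whether a leading slash is accepted or required is not stated above, so it is left out.

    import numpy as np
    from cadishi import dictfs

    root = {}   # any plain dictionary can serve as the root node

    dictfs.save(root, 'histograms/H,H', np.zeros(100))   # stores a deep copy
    assert dictfs.exists(root, 'histograms/H,H')

    histo = dictfs.load(root, 'histograms/H,H')
    dictfs.delete(root, 'histograms/H,H')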

cadishi.h5pickle

Python-to-HDF5 serialization.

h5pickle.py provides load() and save() routines to write Python data structures into HDF5 files. It works with NumPy arrays and basic Python data types. Nested dictionaries are used to map HDF5 group hierarchies.

Note: The code is likely to fail with more complicated Python data types.

Working with the typical data sets used with Cadishi and Capriqorn, the HDF5 serialization implemented by h5pickle turns out to be a factor of 10 faster than Python’s native Pickle.

cadishi.h5pickle.load(h5_grp)[source]

Load a HDF5 group recursively into a Python dictionary, and return the dictionary.

cadishi.h5pickle.save(h5_grp, key, data, compression=None)[source]

Save commonly used Python data structures to a HDF5 file/group. For dictionaries, this function is called recursively, using the keys as labels to create sub-groups.
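
A round-trip sketch for h5pickle using h5py directly; the file name and the ‘trajectory’ group label are arbitrary choices for this example.

    import h5py
    import numpy as np
    from cadishi import h5pickle

    data = {'coordinates': np.random.rand(128, 3), 'frame': 1}

    # Nested dictionaries become HDF5 groups; NumPy arrays become datasets.
    with h5py.File('example.h5', 'w') as fp:
        h5pickle.save(fp, 'trajectory', data, compression='gzip')

    with h5py.File('example.h5', 'r') as fp:
        restored = h5pickle.load(fp['trajectory'])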

cadishi.util

Miscellaneous useful and convenient functions used by Cadishi and Capriqorn, of potential general use.

class cadishi.util.PrintWrapper[source]

Wrapper to implement infrequent message printing.

every(context, msg)[source]

Print at every nth invocation.

once(context, msg, time_stamp=None)[source]

Print context and message exactly once.

cadishi.util.appendLineToFile(file_name, string)[source]

Append string as a line to the end of the file identified by file_name.

cadishi.util.check_parameter(parameters, label, dtype, default_value, valid_values=None, min_value=None, max_value=None, file_existence=False)[source]

Check a parameter for validity.

Throws ValueError or IOError with useful end-user-friendly messages.

cadishi.util.check_parameter_labels(input, reference)[source]

Check the keys of a two-level nested dictionary structure against a reference structure.

Raises KeyError in case of an invalid key.

cadishi.util.compare(histo1, histo2)[source]

Check if two NumPy arrays (e.g. histograms) are identical.

cadishi.util.compare_approximately(histo1, histo2, ks_stat_max=0.01, p_value_min=0.99)[source]

Check if two histograms (1D numpy arrays) or two sets of histograms (2D numpy arrays) are reasonably similar.

The routine can be used to check the results of single precision computations against a reference file. Computes the two-sample Kolmogorov-Smirnov statistic using scipy.stats.ks_2samp, see http://docs.scipy.org/doc/scipy-0.15.1/reference/generated/scipy.stats.ks_2samp.html

cadishi.util.compare_strictly(histo1, histo2)[source]

Check if two histograms (1D numpy arrays) or two sets of histograms (2D numpy arrays) are identical.

Only suitable to check the results of double precision computations.
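
A sketch contrasting the strict and the approximate comparison; whether these routines raise, assert, or return a boolean on mismatch is not stated above, so the calls are shown without using their results.

    import numpy as np
    from cadishi import util

    # Reference histogram vs. a copy round-tripped through single precision.
    ref = np.random.rand(1000)
    approx = ref.astype(np.float32).astype(np.float64)

    util.compare_strictly(ref, ref)          # exact, double-precision comparison
    util.compare_approximately(ref, approx)  # Kolmogorov-Smirnov based similarity check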

cadishi.util.do_cprofile(func)[source]

Decorator to run a function through cProfile.

cadishi.util.do_lprofile(follow=[])[source]

Decorator to run the line profiler on a function.

cadishi.util.dump_histograms(file_name, histograms, r_max, n_bins)[source]

Save histograms into a NumPy text file. Legacy routine.

cadishi.util.generate_random_coordinate_set(n_atoms=[512, 1024, 2048], coord_min=(0.0, 0.0, 0.0), coord_max=(1.0, 1.0, 1.0), blowup_factor=1.0)[source]

Return pseudo-random coordinate sets in a box.

cadishi.util.generate_random_point_in_sphere(R)[source]

Return a coordinate triple of a randomly located point inside a sphere of radius R.

cadishi.util.generate_random_point_on_spherical_surface(R)[source]

Return a coordinate triple of a randomly located point on the surface of a sphere of radius R.

cadishi.util.get_elements(header)[source]

Return a complete list of all the chemical elements present in a header. Header may be a string or a list containing single element IDs or pair combinations thereof. “#” and “radii” are skipped automatically.

cadishi.util.get_executable_name()[source]

Return the name of the present executable.

cadishi.util.get_n_cpu_cores()[source]

Determine the number of CPU cores on a Linux host.

Returns:

The number of CPU cores.

Return type:

int

cadishi.util.get_n_cpu_sockets()[source]

Determine the number of CPU sockets on a Linux host.

Returns:

The number of CPU sockets. May throw an IOError exception in case /proc/cpuinfo does not exist.

Return type:

int

cadishi.util.get_numa_domains()[source]

Parse and return the output of the numactl --hardware command.

cadishi.util.load_class(module_name, class_name)[source]

Load a class from a module, where class and module are specified as strings. Useful to dynamically build Capriqorn pipelines.
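
A small load_class() example that fetches one of the classes documented above by name, as a pipeline builder might do when reading element names from a parameter file.

    from cadishi import util

    # Resolve the HDF5 writer class from string identifiers.
    H5Writer = util.load_class('cadishi.io.hdf5', 'H5Writer')
    writer = H5Writer(file='out.h5')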

cadishi.util.load_json(file_name)[source]
cadishi.util.load_parameter_file(file_name)[source]

Load parameters from a JSON or YAML file and return them as a nested structure of dictionaries.

cadishi.util.load_yaml(file_name)[source]
cadishi.util.ls(resource, files=True, directories=False)[source]

Return a list of files and optionally directories located at resource.

cadishi.util.make_iterable(obj)[source]

Pack obj into a list if it is not already iterable.

cadishi.util.md(resource)[source]

Create a directory (for a file), if necessary. The behaviour depends on the form of the resource (string) parameter; see the sketch after the cases below.

  • If resource is of the form “string”, nothing is done.
  • If resource is of the form “string/”, a directory labeled “string” is created.
  • If resource is of the form “string/foo”, a directory labeled “string” is created.
  • If resource is of the form “string/foo/”, a directory structure “string/foo” is created.
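
The four cases above, spelled out as calls (a sketch; the directory and file names are arbitrary).

    from cadishi import util

    util.md('output')            # plain name: nothing is created
    util.md('output/')           # creates the directory 'output'
    util.md('output/data.h5')    # creates 'output'; 'data.h5' is treated as a file name
    util.md('output/run1/')      # creates the directory structure 'output/run1'
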
cadishi.util.open_r(file_name)[source]

Open an uncompressed or GZIP-compressed text file for reading. Return the file pointer.

cadishi.util.pipeline_entry(label, param)[source]

Return a dictionary with a single key-value pair, where the key is the string label and the value param is a dictionary with string keys and arbitrary values. The return value may be appended, e.g., to a pipeline_log list.

cadishi.util.quote(string)[source]

Add quotes to the beginning and the end of a string if not already present.

cadishi.util.redirectOutput(file_name)[source]

Redirect stdout and stderr of the present process to the file specified by file_name.

cadishi.util.rm(resource)[source]

Remove a file. If the file does not exist, no error is raised.

cadishi.util.rmrf(resource)[source]

Remove file or directory tree. No error is raised if the target does not exist.

cadishi.util.save_json(data, file_name)[source]
cadishi.util.save_yaml(data, file_name)[source]
cadishi.util.savetxtHeader(file_name, header, array)[source]

Save data including its header.

cadishi.util.scratch_dir()[source]

Return and create a per-user scratch directory for unit test data.

cadishi.util.search_pipeline(label, pipeline)[source]

Iterate through the pipeline list backwards in order to find the entry (dict) identified by label (string).

cadishi.util.set_numa_domain(numa_id, numa_topology)[source]

Pin the current process onto a NUMA domain.

cadishi.util.testcase()[source]

Try to locate the test case that comes with cadishi. Works for a check-out (or tarball) of the source files as well as for an installation. Returns the full path to the testcase including a trailing slash.

cadishi.util.timeStamp(dateAndTime=False, t0=0.0)[source]

Return a string with a time stamp suitable to prefix log lines with.

cadishi.util.timefunc(f)[source]

Decorator to run a simple timer on a function.

cadishi.util.tokenize(path, sep='/')[source]

Remove any separators sep from path and return a list of the strings in between. If path is empty or ‘/’, the list has the empty string as a single entry.
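
Two tokenize() calls matching the description above; the behaviour for paths with a leading separator is not documented here and is therefore not shown.

    from cadishi import util

    print(util.tokenize('histograms/H,H'))   # expected: ['histograms', 'H,H']
    print(util.tokenize('/'))                # expected: ['']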

cadishi.util.write_xyzFile(coords, names, file_name)[source]

Write coordinates in xyz format to the file labeled file_name.

cadishi.version

cadishi.version.get_printable_version_string()[source]
cadishi.version.get_short_version_string()[source]

Return the version number without the patchlevel.

cadishi.version.get_version_string()[source]

Return the full version number.