CADISHI modules
The following documentation was automatically generated from the docstrings present in the source code files.
cadishi
Cadishi distance histogram calculation package.
cadishi.base
Cadishi base library.
Provides the basic data container class and some more base classes that are used throughout the Cadishi (and Capriqorn) code.
Moreover, the loc_* strings are defined here centrally to point to the default locations for various data in dictfs and, consequently, in HDF5 files.
class cadishi.base.Container(number=-1, mkdir=[])
Central container to hold/accumulate data while it is propagated through Cadishi’s workers or Capriqorn’s pipelines.
Container.append_data(other, location, skip_keys=['radii'])
Append the data at location from other to self. If location does not exist in the current instance, it is created.
Container.get_geometry(valid_geom=['Sphere', 'Cuboid', 'Ellipsoid', 'ReferenceStructure', 'MultiReferenceStructure', 'Voxels'])
Search the pipeline log backwards for a geometry filter that may have been used, and return the result as a string, or return None if no such filter is found.
Container.query_meta(path)
Obtain a value from the pipeline log list using a Unix-path-like string identifier.
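A minimal sketch of such a path lookup follows. It assumes, hypothetically, that the pipeline log is a list of single-key dictionaries (label mapped to a parameter dictionary, as produced by cadishi.util.pipeline_entry), that the first path component selects the most recent entry with that label, and that the remaining components descend into its parameters. query_meta_sketch is an illustrative name, not part of Cadishi.

```python
from typing import Any, Dict, List


def query_meta_sketch(pipeline_log: List[Dict[str, Any]], path: str) -> Any:
    """Hypothetical path-based lookup in a pipeline log.

    The first path component is treated as a pipeline-element label; the log
    is searched backwards so that the most recent matching entry wins. The
    remaining components descend into nested dictionaries.
    """
    parts = [p for p in path.split("/") if p]
    if not parts:
        return None
    label, keys = parts[0], parts[1:]
    for entry in reversed(pipeline_log):
        if label in entry:
            value = entry[label]
            for key in keys:
                if not isinstance(value, dict) or key not in value:
                    return None
                value = value[key]
            return value
    return None


log = [
    {"H5Reader": {"file": ["input.h5"]}},
    {"Sphere": {"radius": 25.0}},
]
print(query_meta_sketch(log, "Sphere/radius"))  # 25.0
```

Returning None for missing entries mirrors the None convention that get_geometry documents; whether query_meta behaves the same way is an assumption.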
class cadishi.base.Filter(source=-1, verbose=False)
Filter base class, to be subclassed by an actual implementation.
class cadishi.base.PipelineElement
Base class common to Filter, Reader, and Writer. Provides the methods needed to implement dependency checking between pipeline elements. Note: Inheriting from “object” makes it a new-style class, which is necessary for the “super()” mechanism used to implement inheritance of the _depends and _conflicts lists.
cadishi.io.ascii
Cadishi ASCII IO library.
ASCII data writer for base.Container instances, mainly intended for debugging. No reader is currently implemented.
cadishi.io.dummy
Cadishi dummy IO module, useful to test and develop Capriqorn pipelines.
cadishi.io.hdf5
Cadishi HDF5 IO library.
HDF5 data reader/writer for base.Container instances. Heavily used by Cadishi and Capriqorn.
class cadishi.io.hdf5.H5Reader(file=['default.h5'], first=1, last=None, step=1, shuffle=False, shuffle_reproducible=False, verbose=False)
HDF5 reader returning base.Container instances.
H5Reader.get_frame(idx_tuple)
Read a frame identified by its (file_idx, frame_idx) tuple and return a Container object.
class cadishi.io.hdf5.H5Writer(file='default.hdf5', source=-1, compression=None, mode='w', verbose=False)
HDF5 writer for base.Container instances.
H5Writer.default_compression = 'lzf'
H5Writer.dump()
Save a series of frames (i.e., a trajectory). The dump() method saves all frames pending from the writer’s data source (self.src). If the user code sets up a data processing pipeline, dump() drives it by acting as the final sink.
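To illustrate how a writer’s dump() can drive a pipeline as its final sink, here is a minimal pull-based sketch using Python generators. The classes DummyReader, ScaleFilter, and ListWriter and their next() methods are hypothetical stand-ins, not Cadishi’s actual interfaces; only the idea of the writer pulling frames from its source (self.src) is taken from the description above.

```python
from typing import Dict, Iterator, List


class DummyReader:
    """Hypothetical source that yields a fixed number of frames."""

    def __init__(self, n_frames: int) -> None:
        self.n_frames = n_frames

    def next(self) -> Iterator[Dict[str, int]]:
        for i in range(self.n_frames):
            yield {"frame": i, "value": i * 10}


class ScaleFilter:
    """Hypothetical filter: pulls frames from its source and modifies them."""

    def __init__(self, source, factor: int) -> None:
        self.src = source
        self.factor = factor

    def next(self) -> Iterator[Dict[str, int]]:
        for frame in self.src.next():
            frame["value"] *= self.factor
            yield frame


class ListWriter:
    """Hypothetical writer acting as the final sink of the pipeline."""

    def __init__(self, source) -> None:
        self.src = source
        self.written: List[Dict[str, int]] = []

    def dump(self) -> None:
        # Iterating over the source's generator pulls every pending frame
        # through all upstream elements; no work happens before dump() runs.
        for frame in self.src.next():
            self.written.append(frame)


writer = ListWriter(ScaleFilter(DummyReader(3), factor=2))
writer.dump()
print([f["value"] for f in writer.written])  # [0, 20, 40]
```

Because each element only pulls from its source on demand, frames stream through the pipeline one at a time instead of being held in memory all at once.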
H5Writer.get_meta()
Return information on the HDF5 writer, ready to be added to a frame object’s list of pipeline meta information.
H5Writer.valid_compression = [None, 'gzip', 'lzf']
cadishi.io.pickel
Cadishi IO library using pickle.
The module name is deliberately spelled ‘pickel’ to avoid name conflicts.
May be used as a fallback in case HDF5 is not available. It is, however, significantly slower than HDF5.
class cadishi.io.pickel.PickleReader(file='default_', first=None, last=None, step=1, verbose=False)
Pickle reader for base.Container instances.
class cadishi.io.pickel.PickleWriter(file='default_', source=-1, verbose=False)
Pickle writer for base.Container instances.
PickleWriter.dump()
Save a series of base.Container instances pending from the writer’s data source to individual pickle files.
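A minimal sketch of writing each frame to its own pickle file and reading the series back is shown below. The file-naming scheme (prefix, zero-padded frame index, “.pickle” suffix) and the helper names dump_frames and load_frames are assumptions for illustration; the naming actually used by PickleWriter is not specified here.

```python
import glob
import os
import pickle
import tempfile
from typing import Any, Iterable, List


def dump_frames(frames: Iterable[Any], prefix: str) -> int:
    """Write each frame to '<prefix><index>.pickle' (hypothetical naming)."""
    count = 0
    for idx, frame in enumerate(frames):
        with open(f"{prefix}{idx:06d}.pickle", "wb") as fp:
            pickle.dump(frame, fp, protocol=pickle.HIGHEST_PROTOCOL)
        count += 1
    return count


def load_frames(prefix: str) -> List[Any]:
    """Read the frames back; zero padding makes lexical order match index order."""
    frames = []
    for name in sorted(glob.glob(glob.escape(prefix) + "*.pickle")):
        with open(name, "rb") as fp:
            frames.append(pickle.load(fp))
    return frames


with tempfile.TemporaryDirectory() as tmp:
    prefix = os.path.join(tmp, "default_")
    n = dump_frames([{"frame": i} for i in range(3)], prefix)
    restored = load_frames(prefix)
print(n, restored)  # 3 [{'frame': 0}, {'frame': 1}, {'frame': 2}]
```

One file per frame keeps each write independent, which is simple but creates many small files; this is one reason a single HDF5 file is usually preferable.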
cadishi.dict_util
Various NumPy- and dictionary-related utilities.
Implements add, append, and scale operations for numerical data (i.e., NumPy arrays) stored in dictionaries. In addition, an ASCII output routine is provided.
cadishi.dict_util.append_values(X, Y, skip_keys=['radii'])
Implement X.append(Y) where X and Y are Python dictionaries that contain NumPy data types. The operation is applied to X for every value in Y, except for keys listed in skip_keys. Typically, the values of X and Y are NumPy arrays (e.g. particle numbers) that are appended.
Parameters:
- X (dict) – X is a dictionary with string keys that contains NumPy arrays.
- Y (dict) – Y is a dictionary with string keys that contains NumPy arrays.
- skip_keys (list of strings) – skip_keys is a list of strings for which the append operation is skipped.
Returns: The function append_values operates on X directly and does not return anything.
Return type: None
cadishi.dict_util.scale_values(X, C, skip_keys=['radii', 'frame'])
Implement X = X * C where X is a Python dictionary that contains supported data types. The operation is applied to every value in X, except for keys listed in skip_keys. Typically, the values of X are NumPy arrays (histograms) that are rescaled after summation using a scalar C (e.g. to implement an averaging operation).
Parameters:
- X (dict) – X is a dictionary with string keys that contains NumPy arrays.
- C (scalar, NumPy array) – C is a multiplier, either a scalar or a NumPy array of a size compatible with the contents of X.
- skip_keys (list of strings) – skip_keys is a list of strings for which the scale operation is skipped.
Returns: The function scale_values operates on X directly and does not return anything.
Return type: None
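A minimal sketch of the in-place scaling is shown below. scale_values_sketch is a hypothetical name; recursing into nested dictionaries is an assumption, and the default skip_keys is a tuple here only to avoid a mutable default argument.

```python
from typing import Any, Dict, Sequence, Union

import numpy as np


def scale_values_sketch(X: Dict[str, Any], C: Union[float, np.ndarray],
                        skip_keys: Sequence[str] = ("radii", "frame")) -> None:
    """Multiply every value in X by C, except values stored under skip_keys.

    X is modified in place (each value is rebound to value * C). Recursing into
    nested dictionaries is an assumption of this sketch.
    """
    for key, value in X.items():
        if key in skip_keys:
            continue
        if isinstance(value, dict):
            scale_values_sketch(value, C, skip_keys)
        else:
            X[key] = value * C


X = {"radii": np.array([0.5, 1.5]), "histogram": np.array([4, 8]), "frame": 7}
scale_values_sketch(X, 0.25)  # e.g. averaging over 4 accumulated frames
print(X["histogram"], X["radii"], X["frame"])  # [1. 2.] [0.5 1.5] 7
```

Rebinding X[key] = value * C instead of using value *= C avoids a NumPy casting error when an integer histogram is multiplied by a float factor.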
cadishi.dict_util.sum_values(X, Y, skip_keys=['radii', 'frame'])
Implement X += Y where X and Y are Python dictionaries (with string keys) that contain summable data types. The operation is applied to X for every value in Y, except for keys listed in skip_keys. Typically, the values of X and Y are NumPy arrays (e.g. histograms) that are summed.
Parameters:
- X (dict) – X is a dictionary with string keys that contains NumPy arrays.
- Y (dict) – Y is a dictionary with string keys that contains NumPy arrays.
- skip_keys (list of strings) – skip_keys is a list of strings for which the sum operation is skipped.
Returns: The function sum_values operates on X directly and does not return anything.
Return type: None
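The accumulate-then-average pattern described for sum_values and scale_values can be sketched as follows. sum_values_sketch is a hypothetical name; copying keys that are missing from X and recursing into nested dictionaries are assumptions, not confirmed Cadishi behaviour.

```python
import copy
from typing import Any, Dict, Sequence

import numpy as np


def sum_values_sketch(X: Dict[str, Any], Y: Dict[str, Any],
                      skip_keys: Sequence[str] = ("radii", "frame")) -> None:
    """Add every value of Y to X in place, except values under skip_keys.

    Copying keys that are missing from X and recursing into nested
    dictionaries are assumptions of this sketch.
    """
    for key, value in Y.items():
        if key in skip_keys:
            continue
        if isinstance(value, dict):
            sum_values_sketch(X.setdefault(key, {}), value, skip_keys)
        elif key in X:
            X[key] = X[key] + value
        else:
            X[key] = copy.deepcopy(value)


# Accumulate three per-frame histograms, then average them.
frames = [{"histogram": np.array([1, 2, 3]), "frame": i} for i in range(3)]
total: Dict[str, Any] = {}
for frame in frames:
    sum_values_sketch(total, frame)
total["histogram"] = total["histogram"] / len(frames)
print(sorted(total), total["histogram"])  # ['histogram'] [1. 2. 3.]
```

The "frame" key is skipped, so the per-frame counter does not end up summed into the accumulator.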
cadishi.dict_util.write_dict(dic, path, level=0)
Write a dictionary containing NumPy arrays or other Python data structures to text files. If the dictionary contains other dictionaries, the function calls itself recursively. The keys should be strings to guarantee successful operation.
Parameters:
- dic (dictionary) – A dictionary containing NumPy arrays or other Python data structures.
- path (string) – Path where the dictionary and its data shall be written.
- level (int, optional) – Level in the nested-dictionary hierarchy during recursive operation. This parameter was added for debugging purposes and has no practical relevance.
Returns: The function write_dict does not return anything.
Return type: None
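A minimal recursive sketch is shown below, assuming that each nested dictionary becomes a subdirectory, that NumPy arrays are written with numpy.savetxt to “<key>.dat”, and that other values are written via repr() to “<key>.txt”. This file layout and the name write_dict_sketch are hypothetical; the real write_dict may use a different layout.

```python
import os
import tempfile
from typing import Any, Dict

import numpy as np


def write_dict_sketch(dic: Dict[str, Any], path: str, level: int = 0) -> None:
    """Recursively write a nested dictionary to text files below path.

    Layout (an assumption): sub-dictionaries become subdirectories, NumPy
    arrays go to '<key>.dat' via numpy.savetxt, other values to '<key>.txt'.
    level only tracks the recursion depth, mirroring the documented parameter.
    """
    os.makedirs(path, exist_ok=True)
    for key, value in dic.items():
        target = os.path.join(path, str(key))
        if isinstance(value, dict):
            write_dict_sketch(value, target, level + 1)
        elif isinstance(value, np.ndarray):
            np.savetxt(target + ".dat", value)
        else:
            with open(target + ".txt", "w") as fp:
                fp.write(repr(value) + "\n")


with tempfile.TemporaryDirectory() as tmp:
    write_dict_sketch({"histograms": {"histogram": np.arange(3.0)}, "frame": 1}, tmp)
    written = sorted(os.path.relpath(os.path.join(root, name), tmp)
                     for root, _, names in os.walk(tmp) for name in names)
    loaded = np.loadtxt(os.path.join(tmp, "histograms", "histogram.dat"))
print(written, loaded)
```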
cadishi.dictfs
dictfs, the dictionary-based in-memory “file system”.
Store and retrieve data in nested in-memory dictionaries using path name strings similar to a UNIX file system. Can be used in tandem with HDF5 IO.
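The core idea can be sketched in a few lines: path strings are split on “/” and used to walk (and, when saving, create) nested dictionaries. The function names dictfs_save and dictfs_load are hypothetical and not taken from the dictfs module.

```python
from typing import Any, Dict


def dictfs_save(root: Dict[str, Any], path: str, value: Any) -> None:
    """Store value at a UNIX-like path, creating intermediate dicts as needed."""
    parts = [p for p in path.split("/") if p]
    if not parts:
        raise ValueError("path must name at least one entry")
    node = root
    for name in parts[:-1]:
        node = node.setdefault(name, {})
    node[parts[-1]] = value


def dictfs_load(root: Dict[str, Any], path: str) -> Any:
    """Retrieve the value stored at a UNIX-like path (KeyError if absent)."""
    node: Any = root
    for name in (p for p in path.split("/") if p):
        node = node[name]
    return node


fs: Dict[str, Any] = {}
dictfs_save(fs, "/histograms/C,C", [0, 1, 2])
dictfs_save(fs, "/log/pipeline", [])
print(fs)  # {'histograms': {'C,C': [0, 1, 2]}, 'log': {'pipeline': []}}
print(dictfs_load(fs, "histograms/C,C"))  # [0, 1, 2]
```

Since nested dictionaries mirror HDF5 group hierarchies, a tree built this way maps naturally onto groups and datasets when it is written to an HDF5 file.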
cadishi.h5pickle
Python-to-HDF5 serialization.
h5pickle.py provides load() and save() routines to read and write Python data structures from and to HDF5 files. It works with NumPy arrays and basic Python data types. Nested dictionaries are used to map HDF5 group hierarchies.
Note: The code is likely to fail with more complicated Python data types.
For the typical data sets used with Cadishi and Capriqorn, the HDF5 serialization implemented by h5pickle turns out to be a factor of 10 faster than Python’s native pickle.
cadishi.util
Miscellaneous useful and convenient functions used by Cadishi and Capriqorn, of potential general use.
cadishi.util.appendLineToFile(file_name, string)
Append string as a line to the end of the file identified by file_name.
cadishi.util.check_parameter(parameters, label, dtype, default_value, valid_values=None, min_value=None, max_value=None, file_existence=False)
Check a parameter for validity.
Raises ValueError or IOError with useful, end-user-friendly messages.
cadishi.util.check_parameter_labels(input, reference)
Check the keys of a two-level nested dictionary structure against a reference structure.
Raises KeyError in case of an invalid key.
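A minimal sketch of such a check follows. It assumes that every first-level key of the input must exist in the reference, and that every second-level key must exist in the corresponding reference sub-dictionary. The function name and the section and parameter labels in the usage are hypothetical.

```python
from typing import Any, Dict


def check_parameter_labels_sketch(input_dict: Dict[str, Dict[str, Any]],
                                  reference: Dict[str, Dict[str, Any]]) -> None:
    """Raise KeyError if input_dict uses a section or label absent from reference.

    input_dict corresponds to the documented 'input' argument (renamed here to
    avoid shadowing the built-in).
    """
    for section, params in input_dict.items():
        if section not in reference:
            raise KeyError(f"invalid section label: {section!r}")
        for label in params:
            if label not in reference[section]:
                raise KeyError(f"invalid parameter label {label!r} in section {section!r}")


reference = {"histogram": {"r_max": 10.0, "dr": 0.01}, "cpu": {"threads": 1}}
check_parameter_labels_sketch({"histogram": {"r_max": 20.0}}, reference)  # passes
try:
    check_parameter_labels_sketch({"histogram": {"rmax": 20.0}}, reference)
except KeyError as err:
    print("rejected:", err)
```

Checking labels against a reference in this way catches misspelled keys (such as "rmax" above) that would otherwise be silently ignored in favour of defaults.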
cadishi.util.compare(histo1, histo2)
Check whether two NumPy arrays (e.g. histograms) are identical.
cadishi.util.compare_approximately(histo1, histo2, ks_stat_max=0.01, p_value_min=0.99)
Check whether two histograms (1D NumPy arrays) or two sets of histograms (2D NumPy arrays) are reasonably similar.
The routine can be used to check the results of single-precision computations against a reference file. It computes the two-sample Kolmogorov-Smirnov statistic, see http://docs.scipy.org/doc/scipy-0.15.1/reference/generated/scipy.stats.ks_2samp.html
cadishi.util.compare_strictly(histo1, histo2)
Check whether two histograms (1D NumPy arrays) or two sets of histograms (2D NumPy arrays) are identical.
Only suitable to check the results of double-precision computations.
cadishi.util.dump_histograms(file_name, histograms, r_max, n_bins)
Save histograms into a NumPy text file. Legacy routine.
cadishi.util.generate_random_coordinate_set(n_atoms=[512, 1024, 2048], coord_min=(0.0, 0.0, 0.0), coord_max=(1.0, 1.0, 1.0), blowup_factor=1.0)
Return pseudo-random coordinate sets in a box.
cadishi.util.generate_random_point_in_sphere(R)
Return a coordinate triple of a randomly located point inside a sphere of radius R.
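One common way to draw a uniformly distributed point inside a sphere is rejection sampling: draw points uniformly in the enclosing cube and keep the first one that falls inside the sphere. The sketch below illustrates that technique; the source does not state which method Cadishi uses, and random_point_in_sphere is a hypothetical name.

```python
import random
from typing import Optional, Tuple


def random_point_in_sphere(R: float,
                           rng: Optional[random.Random] = None) -> Tuple[float, float, float]:
    """Uniformly distributed random point inside a sphere of radius R."""
    if rng is None:
        rng = random.Random()
    while True:
        x, y, z = (rng.uniform(-R, R) for _ in range(3))
        # Accept only points inside the sphere; about pi/6 (~52%) of cube samples pass.
        if x * x + y * y + z * z <= R * R:
            return (x, y, z)


rng = random.Random(0)
points = [random_point_in_sphere(2.0, rng) for _ in range(1000)]
print(all(x * x + y * y + z * z <= 4.0 for x, y, z in points))  # True
```

Rejection sampling avoids the clustering near the centre that results from naively drawing a uniform radius and uniform angles.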
cadishi.util.generate_random_point_on_spherical_surface(R)
Return a coordinate triple of a randomly located point on the surface of a sphere of radius R.
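A standard technique for a uniformly distributed point on a sphere’s surface is to normalize a vector of three independent standard normal variates and scale it to radius R. The sketch below illustrates that technique; the source does not state which method Cadishi uses, and random_point_on_sphere is a hypothetical name.

```python
import math
import random
from typing import Optional, Tuple


def random_point_on_sphere(R: float,
                           rng: Optional[random.Random] = None) -> Tuple[float, float, float]:
    """Uniformly distributed random point on the surface of a sphere of radius R."""
    if rng is None:
        rng = random.Random()
    while True:
        # An isotropic Gaussian vector has a uniformly distributed direction.
        x, y, z = (rng.gauss(0.0, 1.0) for _ in range(3))
        norm = math.sqrt(x * x + y * y + z * z)
        if norm > 0.0:  # guard against the (practically impossible) zero vector
            return (R * x / norm, R * y / norm, R * z / norm)


rng = random.Random(1)
x, y, z = random_point_on_sphere(3.0, rng)
print(round(math.sqrt(x * x + y * y + z * z), 9))  # 3.0
```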
cadishi.util.get_elements(header)
Return a complete list of all chemical elements present in a header. The header may be a string or a list containing single element IDs or pair combinations thereof. “#” and “radii” are skipped automatically.
cadishi.util.get_n_cpu_cores()
Determine the number of CPU cores on a Linux host.
Returns: number of CPU cores
Return type: int
cadishi.util.get_n_cpu_sockets()
Determine the number of CPU sockets on a Linux host. May raise an IOError exception if /proc/cpuinfo does not exist.
Returns: number of CPU sockets
Return type: int
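On Linux, /proc/cpuinfo lists one block per logical CPU, and on typical x86 hosts each block carries a “physical id” field identifying its socket. The sketch below counts distinct physical ids; it parses text passed in rather than reading the file so it can be tried anywhere. It is an assumption that Cadishi’s routine works this way, and count_sockets is a hypothetical name.

```python
from typing import Set


def count_sockets(cpuinfo_text: str) -> int:
    """Count distinct 'physical id' values in /proc/cpuinfo-style text."""
    ids: Set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "physical id":
            ids.add(value.strip())
    # Assumption: if no 'physical id' field is present (e.g. some VMs), report one socket.
    return max(len(ids), 1)


sample = "processor\t: 0\nphysical id\t: 0\n\nprocessor\t: 1\nphysical id\t: 1\n"
print(count_sockets(sample))  # 2
```

On a real host, the text would come from reading /proc/cpuinfo, which raises an OSError (IOError in Python 2) where that file does not exist.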
cadishi.util.get_numa_domains()
Parse and return the output of the command numactl --hardware.
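The output of numactl --hardware contains lines such as “node 0 cpus: 0 1 2 3”. A minimal sketch that extracts a mapping from NUMA node to CPU list from such text follows; the returned structure and the name parse_numactl_hardware are hypothetical choices, not necessarily what get_numa_domains returns.

```python
import re
from typing import Dict, List

# Matches e.g. "node 1 cpus: 4 5 6 7"; other numactl lines are ignored.
_NODE_CPUS = re.compile(r"^node\s+(\d+)\s+cpus:\s*(.*)$")


def parse_numactl_hardware(text: str) -> Dict[int, List[int]]:
    """Map each NUMA node ID to the list of CPU IDs it contains."""
    domains: Dict[int, List[int]] = {}
    for line in text.splitlines():
        match = _NODE_CPUS.match(line.strip())
        if match:
            domains[int(match.group(1))] = [int(c) for c in match.group(2).split()]
    return domains


sample = """available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3
node 0 size: 16000 MB
node 1 cpus: 4 5 6 7
node 1 size: 16000 MB
"""
print(parse_numactl_hardware(sample))  # {0: [0, 1, 2, 3], 1: [4, 5, 6, 7]}
```

On a host with numactl installed, the text could be obtained with subprocess.run(["numactl", "--hardware"], capture_output=True, text=True).stdout.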
cadishi.util.load_class(module_name, class_name)
Load a class from a module, where class and module are specified as strings. Useful to dynamically build Capriqorn pipelines.
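Dynamic class loading of this kind is usually built on importlib.import_module and getattr from the standard library. The sketch below shows that pattern with a standard-library class; it demonstrates the technique and is not necessarily Cadishi’s exact code.

```python
import importlib
from typing import Any


def load_class_sketch(module_name: str, class_name: str) -> Any:
    """Import module_name and return its attribute class_name."""
    module = importlib.import_module(module_name)
    # Raises AttributeError if the module has no such class.
    return getattr(module, class_name)


Fraction = load_class_sketch("fractions", "Fraction")
print(Fraction(2, 6))  # 1/3
```

With this pattern, class names given as strings in a parameter file can be turned into pipeline element classes at run time.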
cadishi.util.load_parameter_file(file_name)
Load parameters from a JSON or YAML file and return them as a nested structure of dictionaries.
cadishi.util.ls(resource, files=True, directories=False)
Return a list of files and, optionally, directories located at resource.
cadishi.util.md(resource)
Create a directory (for a file), if necessary. The behaviour depends strongly on the form of the resource (string) parameter:
- If resource is of the form “string”, nothing is done.
- If resource is of the form “string/”, a directory labeled “string” is created.
- If resource is of the form “string/foo”, a directory labeled “string” is created.
- If resource is of the form “string/foo/”, a directory structure “string/foo” is created.
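These four cases match taking os.path.dirname of resource and creating that directory when it is non-empty. A minimal sketch under that reading follows; md_sketch is a hypothetical name, not the Cadishi source.

```python
import os
import tempfile


def md_sketch(resource: str) -> None:
    """Create the directory part of resource, if any.

    os.path.dirname gives "" for "string", "string" for "string/" and
    "string/foo", and "string/foo" for "string/foo/".
    """
    directory = os.path.dirname(resource)
    if directory:
        os.makedirs(directory, exist_ok=True)


with tempfile.TemporaryDirectory() as tmp:
    md_sketch(os.path.join(tmp, "string", "foo"))     # "string/foo": creates string/
    md_sketch(os.path.join(tmp, "data", "run1", ""))  # "data/run1/": creates data/run1/
    top_level = sorted(os.listdir(tmp))
    run1_exists = os.path.isdir(os.path.join(tmp, "data", "run1"))
    foo_exists = os.path.exists(os.path.join(tmp, "string", "foo"))
print(top_level, run1_exists, foo_exists)  # ['data', 'string'] True False
```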
cadishi.util.open_r(file_name)
Open an uncompressed or GZIP-compressed text file for reading. Return the file pointer.
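A minimal sketch is shown below. It assumes that GZIP files are recognized by their two-byte magic number (0x1f 0x8b) rather than by file extension. The source does not say how open_r decides, and open_r_sketch is a hypothetical name.

```python
import gzip
import os
import tempfile
from typing import TextIO


def open_r_sketch(file_name: str) -> TextIO:
    """Open a plain or GZIP-compressed text file for reading."""
    with open(file_name, "rb") as fp:
        magic = fp.read(2)
    if magic == b"\x1f\x8b":  # GZIP magic number
        return gzip.open(file_name, "rt")
    return open(file_name, "r")


with tempfile.TemporaryDirectory() as tmp:
    plain = os.path.join(tmp, "a.txt")
    packed = os.path.join(tmp, "a.txt.gz")
    with open(plain, "w") as fp:
        fp.write("hello\n")
    with gzip.open(packed, "wt") as fp:
        fp.write("hello\n")
    with open_r_sketch(plain) as f1, open_r_sketch(packed) as f2:
        contents = (f1.read(), f2.read())
print(contents)  # ('hello\n', 'hello\n')
```

Checking the magic number also handles compressed files whose names lack a “.gz” suffix.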
cadishi.util.pipeline_entry(label, param)
Return a dictionary with a single key-value pair, where label (key) is a string label and param is a dictionary with string keys and arbitrary values. The return value may be appended, e.g., to a pipeline_log list.
cadishi.util.quote(string)
Add quotes to the beginning and end of a string if they are not already present.
cadishi.util.redirectOutput(file_name)
Redirect stdout and stderr of the present process to the file specified by file_name.
cadishi.util.rmrf(resource)
Remove a file or directory tree. No error is raised if the target does not exist.
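The same “rm -rf” semantics can be sketched with shutil.rmtree and os.remove from the standard library. This is an illustration, not the Cadishi source, and rmrf_sketch is a hypothetical name.

```python
import os
import shutil
import tempfile


def rmrf_sketch(resource: str) -> None:
    """Remove a file or directory tree; do nothing if it does not exist."""
    if os.path.isdir(resource) and not os.path.islink(resource):
        shutil.rmtree(resource)
    elif os.path.lexists(resource):
        # Plain files and symbolic links (the link itself, not its target).
        os.remove(resource)


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "a", "b"))
    single = os.path.join(tmp, "file.txt")
    with open(single, "w") as fp:
        fp.write("x")
    rmrf_sketch(os.path.join(tmp, "a"))
    rmrf_sketch(single)
    rmrf_sketch(os.path.join(tmp, "does-not-exist"))  # silently ignored
    remaining = os.listdir(tmp)
print(remaining)  # []
```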
cadishi.util.scratch_dir()
Create and return a per-user scratch directory for unit test data.
cadishi.util.search_pipeline(label, pipeline)
Iterate backwards through the pipeline list to find the entry (dict) identified by label (string).
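Together with pipeline_entry, the backward search can be sketched as follows. The single-key-dictionary layout of the log follows the pipeline_entry description. The _sketch suffix marks both functions as hypothetical re-implementations, and returning None when nothing matches is an assumption.

```python
from typing import Any, Dict, List, Optional


def pipeline_entry_sketch(label: str, param: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """A single key-value pair: string label -> parameter dictionary."""
    return {label: param}


def search_pipeline_sketch(label: str,
                           pipeline: List[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    """Return the most recent entry whose key is label, or None."""
    for entry in reversed(pipeline):
        if label in entry:
            return entry
    return None


log = [
    pipeline_entry_sketch("Sphere", {"radius": 20.0}),
    pipeline_entry_sketch("Sphere", {"radius": 25.0}),
]
print(search_pipeline_sketch("Sphere", log))  # {'Sphere': {'radius': 25.0}}
```

Searching backwards ensures that, when the same element appears more than once in a pipeline, the most recently applied settings are found.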
cadishi.util.set_numa_domain(numa_id, numa_topology)
Pin the current process to a NUMA domain.
cadishi.util.testcase()
Try to locate the test case that comes with Cadishi. Works for a checkout (or tarball) of the source files as well as for an installation. Returns the full path to the test case, including a trailing slash.
cadishi.util.timeStamp(dateAndTime=False, t0=0.0)
Return a string with a time stamp suitable for prefixing log lines.