Analysis

Overview

Every Simulation has a CreativeAnalyzer attached. This CreativeAnalyzer logs data from the simulation as it runs. If all the data were stored in memory, long simulations would suffer from memory creep and eventually bog down. Instead, a logfile is created for each type of data, and each subsequent generation simply appends to those logfiles. The analysis data is therefore always available.
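
The append-per-generation idea can be sketched in a few lines of Python. This is only an illustration, not the CreativeAnalyzer implementation: the helper name append_generation and the one-row-per-generation text layout are assumptions made for the example.

```python
import os
import tempfile


def append_generation(log_dir, metric, values):
    """Append one generation's values for `metric` as a single text row.

    Hypothetical helper: the real log layout may differ.
    """
    os.makedirs(log_dir, exist_ok=True)
    path = os.path.join(log_dir, metric)
    # Append mode means earlier generations are never rewritten and
    # nothing has to be kept in memory between generations.
    with open(path, "a") as log:
        log.write(" ".join(repr(float(v)) for v in values) + "\n")
    return path


# Usage: three generations of an average-fitness metric.
demo_dir = tempfile.mkdtemp()
for avg_fitness in (1.0, 1.5, 2.25):
    fitness_log = append_generation(demo_dir, "fitness", [avg_fitness])

with open(fitness_log) as log:
    rows = log.read().splitlines()
```

Because each write is a short append, memory use stays flat no matter how many generations are simulated.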

The data can be read by any analyzer, by pointing it to the correct directory (“Test/analyzer” by default). That means that the data can be read and plotted after the simulation finishes. Partial plots can even be generated while the simulation is still running, by creating an Analyzer in another terminal and reading the data generated thus far.
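
As a rough sketch of what reading a partially written log might look like, the snippet below loads whatever rows exist so far with NumPy. The file name and the whitespace-separated layout are assumptions for illustration; in practice the Analyzer class does the reading, and a row that is still being written could make a raw read like this fail.

```python
import os
import tempfile

import numpy as np


def read_partial_log(path):
    """Load every generation written so far, one row per generation."""
    # ndmin=2 keeps the result two-dimensional even when only one
    # generation has been logged.
    return np.loadtxt(path, ndmin=2)


# Usage: simulate a log that currently holds two generations.
demo_path = os.path.join(tempfile.mkdtemp(), "chemicals")
with open(demo_path, "w") as log:
    log.write("0.5 1.0 0.0\n")
    log.write("0.75 1.25 0.1\n")

so_far = read_partial_log(demo_path)
```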

Data Collection

In general I’ve tried to collect only the data necessary. If you try to collect all the data that might ever be desired, you’ll create log bloat very quickly. If too much data is collected (especially if the environment is logged each generation), the data can easily exceed 20 GB. Instead, do as much processing as possible beforehand and store only the aggregate data (e.g. average fitness rather than each organism’s fitness).
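
To make the "aggregate first, then store" advice concrete, here is a minimal sketch. The function name summarize_fitness and the choice of statistics are assumptions for the example, not the set of values the analyzer actually logs.

```python
import statistics


def summarize_fitness(fitnesses):
    """Reduce per-organism fitness values to a few aggregate numbers."""
    return {
        "min": min(fitnesses),
        "mean": statistics.mean(fitnesses),
        "max": max(fitnesses),
    }


# Usage: five organisms collapse to three logged numbers.
summary = summarize_fitness([2.0, 4.0, 4.0, 6.0, 9.0])
```

The size of the logged record no longer depends on the population size, which is what keeps the logfiles small.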

Computing Complexity

Each chemical can be assigned a complexity number based on how much of its gathering is due to complexes. For instance if an organism’s function vector for a chemical is 100, 1 due to default, 50 due to proteins, 10 due to len = 2 complexes and 39 due to len = 3 complexes, then for that organism, that chemical would have a complexity rating of \((1*0+50*1+10*2+39*3)/100 = 1.87\). This data can be plotted for each chemical individually, or in aggregate as an organism-specific complexity score.
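
The weighted average above can be written as a small function. This is a sketch of the arithmetic only; the function name and the dictionary input are assumptions, and the library's own compute_complexity() may organize the data differently.

```python
def chemical_complexity(contributions):
    """Weighted average complexity level for one chemical.

    `contributions` maps a complexity level (0 = default, 1 = protein,
    n = complex of length n) to the amount of function it provides.
    """
    total = sum(contributions.values())
    if total == 0:
        return 0.0  # assumption: no function means zero complexity
    weighted = sum(level * amount for level, amount in contributions.items())
    return weighted / total


# The worked example from the text: 1 default, 50 protein,
# 10 from length-2 complexes, 39 from length-3 complexes.
rating = chemical_complexity({0: 1, 1: 50, 2: 10, 3: 39})
```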

We can also compute “irreducible complexity”. Each functional complex can be assigned a component complexity rating based on how many of its components are functional. The algorithm is similar to computing complexity. In this case, for each chemical, each functional complex contributes \(amount*function*N/orgfunction\) to the chemical score, where \(N\) is the number of non-functional components of that complex. The chemical scores are then averaged to get the organism score.
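
The formula can be sketched as follows. The tuple layout (amount, function, number of non-functional components), the function names, and the zero-division guards are all assumptions for illustration; only the formula itself comes from the description above.

```python
def irreducible_chemical_score(complexes, org_function):
    """Sum amount * function * N / org_function over functional complexes.

    Each entry of `complexes` is a tuple (amount, function, n_nonfunctional).
    """
    if org_function == 0:
        return 0.0  # assumption: avoid dividing by zero
    return sum(amount * function * n / org_function
               for amount, function, n in complexes)


def irreducible_organism_score(per_chemical):
    """Average the chemical scores to get the organism score."""
    scores = [irreducible_chemical_score(c, f) for c, f in per_chemical]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical organism with two chemicals: (complexes, organism function).
chem_a = ([(2.0, 5.0, 1), (1.0, 10.0, 2)], 40.0)
chem_b = ([(4.0, 2.5, 0)], 10.0)
org_score = irreducible_organism_score([chem_a, chem_b])
```

Here chemical A scores 0.25 + 0.5 = 0.75 and chemical B scores 0, because its only complex has no non-functional components.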

Documentation

The analysis module is responsible for tracking and displaying the data from a BC simulation. There are two classes: Analyzer, which is used to read and display stored data, and CreativeAnalyzer, which extends Analyzer to allow tracking and storing data.

class pykaryote.utils.analysis.Analyzer(data_folders)

Reads and plots data collected from a simulation.

The Analyzer class defines methods for reading and plotting simulation data. Data is read from the files created by a CreativeAnalyzer.

src/analyze.py is a command-line script that uses Analyzer to display stored data.

Note that the various plot_... methods provided by Analyzer are functionally equivalent to a pylab plot([some data]) command, although they do set their own title.

Args: data_folders (list): A list of strings naming the directory or directories containing the analysis data. Can be a single simulation’s analyzer/ directory or several of them, provided they have compatible config files.
Data:

The Analyzer provides access to the analysis data. This access is provided through quick_read(data) and export_as_csv(data). The options for data are:

  • fitness - Average fitness
  • genome_length - Average genome length
  • chemicals - Average amount of each chemical owned
  • time_spent - Breakdown of proportional time spent gathering, moving, building
  • failed_builds - Moles of attempted-protein that were returned because of a lack of resources to finish
  • complexes - Average amount of complexes, broken down by length
  • complex_diversity - Average number of unique complexes, broken down by length
  • complexes_invented - Total number of complexes invented, broken down by length
  • family_sizes - Average family size, broken down by length
  • family_lengths - Number of families, broken down by length
  • chemical_complexity - Complexity score of each chemical. See compute_chem_complexity() documentation for details
  • percent_functional - Percent of complexes in use that are functional. See percent_functional() for more details
  • component_percent_functional - Percent of components of functional complexes that are themselves functional.
  • <#>ancestors where # is the number of generations back specified by globals - Number of organisms contributing to the genome # generations ahead
  • complexity - Complexity score of each organism. See compute_complexity()
  • irreducible_complexity - Irreducible complexity score of each organism. See compute_complexity()
  • num_distinct_complexes_current
  • num_distinct_complexes_current_functional
  • num_distinct_complexes_current_nonfunctional
  • num_distinct_complexes_ever - synonymous with complexes_invented
  • num_distinct_complexes_ever_functional
  • num_distinct_complexes_ever_nonfunctional
  • num_families_current
  • num_families_current_functional
  • num_families_current_nonfunctional
  • num_families_ever
  • num_families_ever_functional
  • num_families_ever_nonfunctional
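
A minimal sketch of reading several of these metrics in one go is shown below. It relies only on the documented quick_read(data) call. FakeAnalyzer and its canned values are stand-ins invented so the example runs without simulation data, and collect_metrics is a hypothetical helper, not part of pykaryote.

```python
def collect_metrics(analyzer, names):
    """Read several metrics through an Analyzer-like object.

    Only `quick_read(name)` is used, which is the call documented above.
    """
    return {name: analyzer.quick_read(name) for name in names}


class FakeAnalyzer:
    """Stand-in for Analyzer so the sketch runs without simulation data."""

    def quick_read(self, data, mode="standard", ndmin=1, format="text"):
        canned = {"fitness": [1.0, 1.5, 2.0], "genome_length": [30, 32, 35]}
        return canned[data]


metrics = collect_metrics(FakeAnalyzer(), ["fitness", "genome_length"])
```

With a real installation, an Analyzer constructed from a list of data folders would presumably take the place of FakeAnalyzer.
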
export_all_as_csv(output_file)

Exports all the analyzer data to a csv with headings. Note that for data that is separated by chemical number or complex length, a column is created for each chemical or length, e.g. chemicals_0, chemicals_1, ...

All headings match the data variables discussed in the Analyzer class documentation.

Args:

output_file (str): The filename of the new csv data
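
The heading scheme described above might look like the sketch below, using the standard csv module. The helper expand_headings and the sample rows are assumptions for illustration and say nothing about how export_all_as_csv is actually implemented.

```python
import csv
import io


def expand_headings(metric, n_columns):
    """Return one heading per column, e.g. chemicals_0, chemicals_1."""
    if n_columns == 1:
        return [metric]
    return ["%s_%d" % (metric, i) for i in range(n_columns)]


# Usage: one single-column metric and one metric split by chemical.
headings = expand_headings("fitness", 1) + expand_headings("chemicals", 3)
rows = [[1.0, 0.5, 0.2, 0.0], [1.5, 0.6, 0.3, 0.1]]

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(headings)
writer.writerows(rows)
csv_text = buffer.getvalue()
```
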
export_varialble_as_csv(data, output_file)

Converts an analyzer data log to csv.

Args:

data (str): the data file to read from.

output_file (str): The filename of the new csv data

plot_ancestry(**kwargs)

Plots the number of organisms that contribute to the gene pool a few generations ahead. The number of generations is controlled by track_ancestors.

Args:
**kwargs: a list of keyword args that can be passed to the plotting function
../../_images/ancestry.png
plot_avg_time_spent(**kwargs)

Plots the average time spent on each type of action. The white section corresponds to the time spent switching modes in which no useful work is done.

Args:
**kwargs: a list of keyword args that can be passed to the plotting function
../../_images/time_spent.png
plot_chemical_complexity(**kwargs)

Plots the average complexity value for each chemical

Args:
**kwargs: a list of keyword args that can be passed to the plotting function
../../_images/chemical_complexity.png
plot_chemicals(**kwargs)

Plots the average number of each chemical owned by organisms in each generation

Args:
**kwargs: a list of keyword args that can be passed to the plotting function
../../_images/chemicals.png
plot_complex_diversity(**kwargs)

Plots the average number of distinct complexes of each length owned by an organism

Args:
**kwargs: a list of keyword args that can be passed to the plotting function
../../_images/complex_diversity.png
plot_complexes(**kwargs)

Plots the average units of complexes owned by an organism, broken down by complex length.

Args:
**kwargs: a list of keyword args that can be passed to the plotting function
../../_images/complexes.png
plot_complexity(breakdown=True, mass_mode=False, **kwargs)

Plots the complexity, averaging over all chemicals. Does not weight based on chemical amounts or effectiveness.

Args:

breakdown (bool): For single runs, this plot can show regions marking min, lower Q, average, upper Q, max. To disable this feature and plot only the average line, set breakdown to False

mass_mode (bool): Special options for mass_run’s plot. Default: False

**kwargs: a list of keyword args that can be passed to the plotting function

../../_images/complexity.png
plot_complexity_and_fitness(breakdown=True, mass_mode=False, **kwargs)

Plots two graphs: one of fitness and one of complexity

Args:

breakdown (bool): For single runs, this plot can show regions marking min, lower Q, average, upper Q, max. To disable this feature and plot only the average line, set breakdown to False

**kwargs: a list of keyword args that can be passed to the plotting function

plot_component_percent_functional(**kwargs)

Plots the percent of components of complexes that are functional

Args:
**kwargs: a list of keyword args that can be passed to the plotting function
../../_images/comp_perc_func.png
plot_failed_builds(**kwargs)

Plots the average number of moles of attempted-protein that were returned due to insufficient resources to finish.

Args:
**kwargs: a list of keyword args that can be passed to the plotting function
plot_family_lengths(**kwargs)

Plots the number of families of each length

Args:
**kwargs: a list of keyword args that can be passed to the plotting function
../../_images/fam_length.png
plot_family_sizes(**kwargs)

Plots the average size of families of each length

Args:
**kwargs: a list of keyword args that can be passed to the plotting function
../../_images/fam_size.png
plot_fitness(breakdown=True, **kwargs)

Plots the average fitness

Args:

breakdown (bool): For single runs, this plot can show regions marking min, lower Q, average, upper Q, max. To disable this feature and plot only the average line, set breakdown to False

**kwargs: a list of keyword args that can be passed to the plotting function

../../_images/fitness.png
plot_gen_time(**kwargs)

Plots the time taken by each generation.

Args:
**kwargs: a list of keyword args that can be passed to the plotting function
plot_genome_length(breakdown=True, **kwargs)

Plots the average genome length

Args:

breakdown (bool): For single runs, this plot can show regions marking min, lower Q, average, upper Q, max. To disable this feature and plot only the average line, set breakdown to False

**kwargs: a list of keyword args that can be passed to the plotting function

../../_images/genome.png
plot_invented_complexes(**kwargs)

Plots the total number of complexes invented

Args:
**kwargs: a list of keyword args that can be passed to the plotting function
../../_images/invented_complexes.png
plot_num_distinct(comp_or_fam, timeframe='current', functional=None, derivative=False, **kwargs)

Plots the number of complexes or families that currently exist or have ever existed, optionally restricted to functional or non-functional ones.

Args:

comp_or_fam (str): Either “complexes” or “families” (only the first letter is necessary).

timeframe (str): Either “current” or “ever”.

functional: Whether to plot only functional (True) or only non-functional (False) complexes or families. Both are plotted if this is None.

derivative: Whether to plot the derivative.

**kwargs: a list of keyword args that can be passed to the plotting function

plot_percent_functional(**kwargs)

Plots the percent of complexes that are functional

Args:
**kwargs: a list of keyword args that can be passed to the plotting function
../../_images/percent_functional.png
plot_time_spent_in_gaussian()

Plot how long each organism spent on each chemical.

Creates a series of subplots where each bar corresponds to the amount of time an organism spent in each chemical gaussian. Each subplot in the series represents a single generation.

data is a three-dimensional array where:
data[run_number, organism, gaussian_index] = number_of_codon_reads
(the number of codon reads while in that gaussian)
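
The indexing convention can be illustrated with a small NumPy array. The shape and values below are made up for the example; only the axis order (run, organism, gaussian) is taken from the description above.

```python
import numpy as np

# Hypothetical data: 2 runs, 3 organisms, 4 chemical gaussians.
data = np.arange(24).reshape(2, 3, 4)

# Codon reads for run 0, organism 1, gaussian 2.
single = data[0, 1, 2]

# Total codon reads per gaussian in run 0, summed over organisms.
per_gaussian = data[0].sum(axis=0)
```
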
quick_read(data, mode='standard', ndmin=1, format='text')

Loads data as saved by the CreativeAnalyzer class.

Reads from a simulation or batch of simulations.

Average, mean, and standard deviation are calculated using the middle 80%. Number of blanks uses the whole data set.

Args:

data (str): the data file to read from.

mode (str): The reading mode. Options are:
  • standard returns the mean of multiple runs
  • threshold returns the mean, standard deviation, and number of simulations which did not reach the threshold. Used to aggregate data specified in the [threshold-metrics] section of a simulation’s configuration file.
  • split returns data from multiple runs separately
  • stddev returns a tuple (avg, stddev)
  • breakdown returns a tuple (min, lower_part, avg, upper_part, max)

format (str): File format to expect data in. Either ‘text’ or ‘binary’.

Returns:

A numpy array or a tuple of arrays
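
One plausible reading of "the middle 80%" is to sort the values and drop the lowest and highest 10% before computing statistics. The sketch below follows that assumption. The rounding of the cut and the use of the population standard deviation are choices made for the example, not details confirmed by the source.

```python
import statistics


def middle_80(values):
    """Return the values left after trimming 10% from each end."""
    ordered = sorted(values)
    cut = len(ordered) // 10  # assumption: round the cut down
    return ordered[cut:len(ordered) - cut]


# Usage: ten runs, one of them an outlier.
final_fitness = [1.0, 2.0, 2.0, 3.0, 3.0, 3.0, 4.0, 4.0, 5.0, 100.0]
kept = middle_80(final_fitness)
trimmed_mean = statistics.mean(kept)
trimmed_stddev = statistics.pstdev(kept)  # assumption: population stddev
```

Trimming this way keeps a single runaway simulation from dominating the averaged curve.
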
save_all(out_dir)

Plots and saves graphs of simulation data.

save_environment_drawing(out_dir)

Draws a picture of the chemical environment.

This function does not use matplotlib.

class pykaryote.utils.analysis.CreativeAnalyzer(sim)

Saves data from a running simulation.

Data, in the form of text files containing numpy arrays, are saved to an ‘analyzer’ directory.

Args:
sim: The simulation for which data will be recorded.
close_all()

Closes metric data files.

Also writes threshold metric data files.

quick_append(metric, new_data)

Appends to the analyzer data logs.

Args:

metric (str): the name of the metric being recorded

new_data: The new data to add

save_as_data(ndarray, name, format='text')

Saves the given array as a datafile with name name.

format (str): either ‘text’ or ‘binary’

If it is a 1 or 2d array, save in text format. If it has three or more dimensions, save in .np binary format.
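
The dimension rule can be sketched with NumPy's own writers. The function name save_by_dimension is hypothetical, and the use of np.savetxt and np.save is an assumption about a reasonable way to get text and binary output, not a statement about what save_as_data calls internally.

```python
import os
import tempfile

import numpy as np


def save_by_dimension(array, path):
    """Write 1-D/2-D arrays as text and higher-rank arrays as binary."""
    array = np.asarray(array)
    if array.ndim <= 2:
        np.savetxt(path, array)
        return path
    # np.save appends ".npy" when the name lacks that extension.
    np.save(path, array)
    return path + ".npy"


out_dir = tempfile.mkdtemp()
text_path = save_by_dimension(np.ones((4, 3)),
                              os.path.join(out_dir, "chemicals"))
binary_path = save_by_dimension(np.zeros((2, 4, 3)),
                                os.path.join(out_dir, "time_in_gaussian"))
restored = np.load(binary_path)
```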

update()

Updates the Analyzer with data from a new generation

pykaryote.utils.analysis.compute_complexity(org)

Computes a score for the complexity and irreducible complexity of the organism based on the complexity level of each chemical function.

More information on computing complexities can be found in the Computing Complexity section.
