Every Simulation has a CreativeAnalyzer attached, which logs data from the simulation as it runs. If all the data were stored in memory, long simulations would suffer from memory creep and eventually bog down. Instead, a logfile is created for each type of data, and each subsequent generation simply appends to those logfiles. The analysis data is therefore always available.
The data can be read by any analyzer, by pointing it to the correct directory (“Test/analyzer” by default). That means that the data can be read and plotted after the simulation finishes. Partial plots can even be generated while the simulation is still running, by creating an Analyzer in another terminal and reading the data generated thus far.
In general, I’ve tried to collect only the data necessary. If you try to collect all the data that might ever be desired, you’ll create log bloat very quickly: if too much data is collected (especially if the environment is logged each generation), the logs can easily exceed 20 GB. Instead, do as much processing as possible beforehand and store only the aggregate data (e.g. average fitness rather than each organism’s fitness).
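As a rough illustration of the "aggregate first, then append" idea, here is a minimal sketch. The function names and the log file name are hypothetical and are not part of the analysis module; the sketch only shows one value per generation being appended to a log rather than the whole population being kept in memory.

```python
import os
import tempfile


def append_generation(log_path, fitnesses):
    """Append one generation's average fitness as a single line of the log."""
    average = sum(fitnesses) / len(fitnesses)
    # Mode "a" appends, so earlier generations are never rewritten.
    with open(log_path, "a") as log:
        log.write(f"{average}\n")
    return average


def read_log(log_path):
    """Read back every value logged so far (works while a run is in progress)."""
    with open(log_path) as log:
        return [float(line) for line in log if line.strip()]


if __name__ == "__main__":
    # Hypothetical location; the real analyzer writes to its own directory.
    log_path = os.path.join(tempfile.mkdtemp(), "fitness.txt")
    append_generation(log_path, [1.0, 2.0, 3.0])
    append_generation(log_path, [2.0, 4.0])
    print(read_log(log_path))
```

Because each generation costs one short line, the log grows linearly with the number of generations and not with the population size.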
Each chemical can be assigned a complexity number based on how much of its gathering is due to complexes. For instance, if an organism’s function vector for a chemical is 100, of which 1 is due to default, 50 to proteins, 10 to len = 2 complexes and 39 to len = 3 complexes, then for that organism, that chemical would have a complexity rating of \((1*0+50*1+10*2+39*3)/100 = 1.87\). This data can be plotted for each chemical individually, or in aggregate as an organism-specific complexity score.
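The worked example above can be reproduced with a short sketch. The function name and the dictionary layout (complexity level mapped to the amount of function it contributes) are assumptions made for illustration, not the module's actual data structures.

```python
def complexity_rating(contributions):
    """Weighted average of complexity levels.

    `contributions` maps a complexity level (0 = default, 1 = protein,
    2 = len-2 complex, ...) to the amount of function it provides.
    This layout is a hypothetical one chosen for the example.
    """
    total = sum(contributions.values())
    if total == 0:
        return 0.0  # no function at all: treat the rating as zero
    weighted = sum(level * amount for level, amount in contributions.items())
    return weighted / total


if __name__ == "__main__":
    # The numbers from the example: 1 default, 50 protein, 10 len-2, 39 len-3.
    example = {0: 1, 1: 50, 2: 10, 3: 39}
    print(complexity_rating(example))  # 1.87
```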
We can also compute “irreducible complexity”. Each functional complex can be assigned a component complexity rating based on how many of its components are functional. The algorithm is similar to computing complexity. In this case, for each chemical, each functional complex contributes \(amount*function*N/orgfunction\) to the chemical score, where \(N\) is the number of non-functional components of that complex. The chemical scores are then averaged to get the organism score.
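A minimal sketch of this scoring follows. The class and function names are hypothetical, and the readings of the symbols are assumptions: `amount` is taken as the units of the complex owned, `function` as its function value for the chemical, and `orgfunction` as the organism's total function for that chemical.

```python
from dataclasses import dataclass


@dataclass
class FunctionalComplex:
    amount: float                   # assumed: units of this complex owned
    function: float                 # assumed: function value for the chemical
    nonfunctional_components: int   # N in the formula above


def chemical_score(complexes, org_function):
    """Sum amount * function * N over the complexes, divided by orgfunction."""
    if org_function == 0:
        return 0.0
    total = sum(c.amount * c.function * c.nonfunctional_components
                for c in complexes)
    return total / org_function


def organism_score(per_chemical):
    """Average the chemical scores; `per_chemical` is a list of
    (complexes, org_function) pairs, one pair per chemical."""
    scores = [chemical_score(cs, f) for cs, f in per_chemical]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    chem_a = ([FunctionalComplex(2, 5, 1), FunctionalComplex(1, 10, 2)], 40)
    chem_b = ([], 10)
    print(chemical_score(*chem_a))           # (2*5*1 + 1*10*2) / 40 = 0.75
    print(organism_score([chem_a, chem_b]))  # (0.75 + 0) / 2 = 0.375
```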
The analysis module is responsible for tracking and displaying the data from a BC simulation. There are two classes: Analyzer, which is used to read and display stored data, and CreativeAnalyzer, which extends Analyzer to allow tracking and storing data.
Reads and plots data collected from a simulation.
The Analyzer class defines methods for reading and plotting simulation data. Data is read from the files created by a CreativeAnalyzer.
src/analyze.py is a command-line script that uses Analyzer to display stored data.
Note that the various plot_... methods provided by Analyzer are functionally equivalent to a pylab plot([some data]) command (although they do set their own title).
The Analyzer provides access to the analysis data. This access is provided through quick_read(data) and export_as_csv(data). The options for data are:
Exports all the analyzer data to a csv with headings. Note that for data that is separated by chemical number or complex length, a column is created for each chemical or length, e.g. chemicals_0, chemicals_1, ...
All headings match the data variables discussed in the Analyzer class documentation.
Args:
output_file (str): The filename of the new csv data
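To make the column naming concrete, here is a hedged sketch of how per-chemical data might be flattened into csv headings. It is not the module's export code: the function name, the in-memory data layout, and the "fitness" data name are assumptions; only the chemicals_0, chemicals_1 naming pattern comes from the description above.

```python
import csv
import io


def write_csv(data, out):
    """Write `data` to `out`, one row per generation.

    `data` maps a data name to a list with one entry per generation.
    An entry is either a scalar or a list (one value per chemical or
    per complex length). This layout is hypothetical.
    """
    headings = []
    for name, series in data.items():
        first = series[0]
        if isinstance(first, (list, tuple)):
            # One column per chemical/length: name_0, name_1, ...
            headings.extend(f"{name}_{i}" for i in range(len(first)))
        else:
            headings.append(name)

    writer = csv.writer(out)
    writer.writerow(headings)
    generations = len(next(iter(data.values())))
    for gen in range(generations):
        row = []
        for series in data.values():
            value = series[gen]
            if isinstance(value, (list, tuple)):
                row.extend(value)
            else:
                row.append(value)
        writer.writerow(row)


if __name__ == "__main__":
    buffer = io.StringIO()
    write_csv({"fitness": [1.0, 2.0], "chemicals": [[3, 4], [5, 6]]}, buffer)
    print(buffer.getvalue())
```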
Converts an analyzer data log to csv.
Args:
data (str): the data file to read from.
output_file (str): The filename of the new csv data
Plots the number of organisms that contribute to the gene pool a few generations ahead. The number of generations is controlled by track_ancestors.
Plots the average time spent on each type of action. The white section corresponds to the time spent switching modes in which no useful work is done.
Plots the average complexity value for each chemical
Plots the average number of each chemical owned by organisms in each generation
Plots the average number of distinct complexes of each length owned by an organism
Plots the average units of complexes owned by an organism, broken down by complex length.
Plots the complexity, averaging over all chemicals. Does not weight based on chemical amounts or effectiveness.
Args:
breakdown (bool): For single runs, this plot can show regions marking min, lower Q, average, upper Q, max. To disable this feature and plot only the average line, set breakdown to False
mass_mode (bool): Special options for mass_run’s plot. Default: False
**kwargs: a list of keyword args that can be passed to the plotting function
Plots two graphs: one of fitness and one of complexity
Args:
breakdown (bool): For single runs, this plot can show regions marking min, lower Q, average, upper Q, max. To disable this feature and plot only the average line, set breakdown to False
**kwargs: a list of keyword args that can be passed to the plotting function
Plots the percent of components of complexes that are functional
Plots the average number of moles of attempted-protein that were returned due to insufficient resources to finish.
Plots the number of families of each length
Plots the average size of families of each length
Plots the average fitness
Args:
breakdown (bool): For single runs, this plot can show regions marking min, lower Q, average, upper Q, max. To disable this feature and plot only the average line, set breakdown to False
**kwargs: a list of keyword args that can be passed to the plotting function
Plots the time taken by each generation.
Plots the average genome length
Args:
breakdown (bool): For single runs, this plot can show regions marking min, lower Q, average, upper Q, max. To disable this feature and plot only the average line, set breakdown to False
**kwargs: a list of keyword args that can be passed to the plotting function
Plots the total number of complexes invented
Plots the number of complexes or families that currently exist or have ever existed, optionally restricted to functional or non-functional ones.
Args:
comp_or_fam (str): either “complexes” or “families” (only the first letter is necessary).
timeframe (str): either “current” or “ever”
functional: whether to plot only functional ones (True) or only non-functional ones (False). Both are plotted if this is None.
derivative: whether to plot the derivative.
**kwargs: a list of keyword args that can be passed to the plotting function
Plots the percent of complexes that are functional
Plot how long each organism spent on each chemical.
Creates a series of subplots where each bar corresponds to the amount of time an organism spent in each chemical gaussian. Each subplot in the series represents a single generation.
Loads data as saved by the CreativeAnalyzer class.
Reads from a simulation or batch of simulations.
Average, mean, and standard deviation are calculated using the middle 80% of the data. The number of blanks is counted over the whole data set.
Args:
data (str): the data file to read from.
mode (str): The reading mode. Options are:
standard: returns the mean of multiple runs
threshold: returns the mean, standard deviation, and number of simulations which did not reach the threshold. Used to aggregate data specified in the [threshold-metrics] section of a simulation’s configuration file.
split: returns data from multiple runs separately
stddev: returns a tuple (avg, stddev)
breakdown: returns a tuple (min, lower_part, avg, upper_part, max)
format (str): File format to expect data in. Either ‘text’ or ‘binary’
Returns:
A numpy array or a tuple of arrays
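The "middle 80%" statistics mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the reader's actual code: it assumes the trimming drops the lowest 10% and highest 10% of the sorted values, and it uses the population standard deviation. The function name is hypothetical.

```python
import statistics


def middle_80_stats(values):
    """Return (mean, standard deviation) of the middle 80% of `values`.

    Assumption: the lowest and highest 10% of the sorted values are
    discarded. With fewer than 10 values nothing is discarded.
    """
    ordered = sorted(values)
    cut = int(len(ordered) * 0.1)
    trimmed = ordered[cut:len(ordered) - cut]
    return statistics.mean(trimmed), statistics.pstdev(trimmed)


if __name__ == "__main__":
    # Ten runs, two of which are outliers.
    runs = [0.0] + [10.0] * 8 + [1000.0]
    print(middle_80_stats(runs))  # (10.0, 0.0)
```

Trimming in this way keeps a single failed or runaway simulation in a batch from dominating the aggregated curve.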
Plots and saves graphs of simulation data.
Draws a picture of the chemical environment.
This function does not use matplotlib.
Saves data from a running simulation.
Data, in the form of text files containing numpy arrays, are saved to an ‘analyzer’ directory.
Closes metric data files.
Also writes threshold metric data files.
Appends to the analyzer data logs.
Args:
metric (str): the name of the metric being recorded
new_data: The new data to add
Saves the given array as a datafile with name name.
format (str): either ‘text’ or ‘binary’
If it is a 1 or 2d array, save in text format. If it has three or more dimensions, save in .np binary format.
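The dimension-based choice of format could look roughly like the sketch below. The function name, its signature, and the file extensions are assumptions for illustration; it relies only on the standard NumPy calls `np.savetxt`, `np.save`, `np.loadtxt` and `np.load`.

```python
import os
import tempfile

import numpy as np


def save_array(directory, name, array):
    """Save `array` as text if it has at most two dimensions, else as binary.

    Hypothetical helper: the extensions used here are assumptions.
    Returns the path that was written.
    """
    array = np.atleast_1d(np.asarray(array))
    if array.ndim <= 2:
        path = os.path.join(directory, name + ".txt")
        np.savetxt(path, array)   # human-readable, one row per line
    else:
        path = os.path.join(directory, name + ".npy")
        np.save(path, array)      # text format cannot hold 3+ dimensions
    return path


if __name__ == "__main__":
    out_dir = tempfile.mkdtemp()
    flat = save_array(out_dir, "fitness", np.zeros((2, 3)))
    deep = save_array(out_dir, "environment", np.zeros((2, 3, 4)))
    print(np.loadtxt(flat).shape, np.load(deep).shape)
```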
Updates the Analyzer with data from a new generation
Computes a score for the complexity and irreducible complexity of the organism based on the complexity level of each chemical function.
More information on computing complexities can be found in Computing Complexity.