A family is a group of similar complexes.
All members of a family have variations on a common function vector (see Function Vectors). Each variation is computed by multiplying every entry of the common vector by a random factor scaled by mutate_factor. All members of a family bind with the same other families, although the strength of each binding is determined separately by the resulting Complex’s binding affinity.
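The variation step can be sketched as follows. This is only an illustration, not the Pykaryote implementation: the helper name mutate_vector is hypothetical, and drawing the factor uniformly from 1 - mutate_factor to 1 + mutate_factor is an assumption, since the text only says the factor is random and scaled by mutate_factor.

```python
import random

def mutate_vector(base_vector, mutate_factor, rng=None):
    """Return a member's variation on a family's common function vector.

    Hypothetical helper. Each entry is multiplied by a random factor in
    [1 - mutate_factor, 1 + mutate_factor]; the uniform distribution is
    an assumption made for this sketch.
    """
    rng = rng or random.Random()
    return [x * (1.0 + mutate_factor * rng.uniform(-1.0, 1.0))
            for x in base_vector]

# Example: a family vector and one member's variation on it.
family_vector = [1.0, 2.0, 0.0, 4.0]
member_vector = mutate_vector(family_vector, mutate_factor=0.1,
                              rng=random.Random(0))
```

Because the factor is multiplicative, entries that are zero in the family vector stay zero in every member.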
Families are the solution to the problem of tracking “point mutations” in proteins. In nature, a point mutation in a protein is unlikely to have any effect on its behavior. Although our proteins tend to be far shorter than real ones, we want similar behavior. Therefore, whenever a new protein is created, it is checked against existing proteins to see if it is related to one of them, that is, if their compositions are identical at all but sibling_distance points. Next, the new protein is checked against the base composition vector of the sibling’s family to see if it is within max_relation_distance, which prevents excessive family sprawl. If the new protein is related to any existing proteins, it still has a chance, given by separation_chance, to form a new family; otherwise it randomly joins one of the families that it is related to.
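The family-assignment rule above can be sketched in a few lines. The sketch makes several assumptions that the text does not confirm: compositions are equal-length sequences, “identical at all but sibling_distance points” means differing at no more than sibling_distance positions, and max_relation_distance is measured the same way. The names choose_family and differing_points, and the two dictionaries, are hypothetical.

```python
import random

def differing_points(a, b):
    """Count the positions at which two equal-length compositions differ."""
    return sum(x != y for x, y in zip(a, b))

def choose_family(new_comp, known_proteins, family_bases, sibling_distance,
                  max_relation_distance, separation_chance, rng):
    """Return the family to join, or None if a new family should be formed.

    known_proteins maps composition -> family name (hypothetical layout).
    family_bases maps family name -> base composition (hypothetical layout).
    """
    related = set()
    for comp, fam in known_proteins.items():
        base = family_bases[fam]
        if len(comp) != len(new_comp) or len(base) != len(new_comp):
            continue  # assumption: only equal-length compositions compare
        is_sibling = differing_points(new_comp, comp) <= sibling_distance
        near_base = differing_points(new_comp, base) <= max_relation_distance
        if is_sibling and near_base:
            related.add(fam)
    # Unrelated proteins, and related ones that "separate", found a family.
    if not related or rng.random() < separation_chance:
        return None
    return rng.choice(sorted(related))

known = {"AAAA": "F1", "AAAB": "F1", "CCCC": "F2"}
bases = {"F1": "AAAA", "F2": "CCCC"}
joined = choose_family("AABB", known, bases, 1, 2, 0.0, random.Random(0))
```

Here "AABB" is a sibling of "AAAB" (one differing point) and lies within two points of the F1 base, so it joins F1. Lowering max_relation_distance to 1 would reject it, which is how the second check limits family sprawl.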
Complex families consist of all the bindings of each of their component protein families. That is, if we have two families:
A = {A', A'', A'''}
B = {B', B''}
then the complex family will potentially have all of the members:
AB = {A'B', A'B'', A''B', A''B'', A'''B', A'''B''}
However, each member complex is only created when an organism has both of its components simultaneously, thus while all the above could be members of AB, it is likely that only a few of them will actually have existed.
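The difference between potential and realized members can be illustrated with a short sketch. Representing members as strings and organisms as sets of protein names is an assumption made purely for illustration; realized_members is a hypothetical helper.

```python
from itertools import product

A = ["A'", "A''", "A'''"]
B = ["B'", "B''"]

# Every pairing of a member of A with a member of B is a potential member.
potential_AB = [a + b for a, b in product(A, B)]

def realized_members(organisms):
    """Return the complexes that some organism could actually have formed.

    Each organism is modeled as the set of protein names it holds; a
    complex exists only if one organism holds both components at once.
    """
    realized = set()
    for proteins in organisms:
        for a, b in product(A, B):
            if a in proteins and b in proteins:
                realized.add(a + b)
    return realized

organisms = [{"A'", "B''"}, {"A'''", "B'"}, {"A''"}]
existing_AB = realized_members(organisms)
```

With these three organisms only A'B'' and A'''B' are ever created, even though AB has six potential members.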
A new family can be created in one of two ways. A new family for length 1 complexes is formed when a new protein is created that isn’t put in an existing family, as described above. New families for complexes of length greater than 1 are created when family bindings are checked. When an organism wants to see if two families bind, it checks the combination with the FamilyTracker. If those two families have never been checked before, the tracker decides whether or not they should bind based on family_binding_chance. If they do bind, a new family is created.
The only exception is when the new family matches an existing family. Complex families only care about the component families, not about order. In other words, both bindings AB + C and AC + B produce complexes in family ABC. Thus if 2 families bind to form another family that already exists, a reference to that family is returned instead of making a brand new family.
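A minimal sketch of this lookup, assuming family ids are frozensets of protein family ids (as the API reference below describes). The class TrackerSketch and its internals are hypothetical and are not the real FamilyTracker; in particular, the layout of the family_bindings and family_lookup dictionaries is a guess.

```python
import random

class TrackerSketch:
    """Illustrative stand-in for a family tracker (not the real class)."""

    def __init__(self, family_binding_chance, rng=None):
        self.family_binding_chance = family_binding_chance
        self.rng = rng or random.Random()
        self.family_bindings = {}  # unordered pair of ids -> bound id or None
        self.family_lookup = {}    # family id -> family record

    def get_family(self, fam_id):
        # Reuse the existing family if there is one, else create it.
        if fam_id not in self.family_lookup:
            self.family_lookup[fam_id] = {"id": fam_id}
        return self.family_lookup[fam_id]

    def check_binding(self, fam1id, fam2id):
        key = frozenset((fam1id, fam2id))
        if key not in self.family_bindings:
            # First check of this pair: decide once and remember the result.
            binds = self.rng.random() < self.family_binding_chance
            self.family_bindings[key] = (fam1id | fam2id) if binds else None
        bound_id = self.family_bindings[key]
        return None if bound_id is None else self.get_family(bound_id)

tracker = TrackerSketch(family_binding_chance=1.0)
A, B, C = frozenset({1}), frozenset({2}), frozenset({3})
AB = tracker.check_binding(A, B)
AC = tracker.check_binding(A, C)
ABC_first = tracker.check_binding(AB["id"], C)   # AB + C
ABC_second = tracker.check_binding(AC["id"], B)  # AC + B
```

Because the id of a bound family is the union of its component ids, AB + C and AC + B both resolve to the id {1, 2, 3}, and the second binding returns the family object created by the first.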
Pykaryote Families
Family(id, environment, base=None)
A family contains a set of related complexes.
Args:
id (frozenset): a Python frozenset containing the id numbers of the protein families that make up this family
environment: a reference to the simulation Environment.
base: list
environment: object
function_vector: numpy.ndarray
id: object
length: int
name: str
size: int
FamilyTracker(environment)
Tracks and manages all existing families.
Args:
environment: A reference to the simulation Environment.
Returns the family matching the given ID. If no such family exists, creates the family.
Args:
id (frozenset or None): For complexes, a frozenset of proteins making up the family. For proteins, None; an id will be generated for the protein.
Checks if two families bind and returns the bound family.
Returns None if the families do not bind. If their binding hasn’t been checked before, it is initialized first.
Args:
fam1id, fam2id: two families
family_bindings: dict
family_lookup: dict
Generates a function vector for a family with ‘length’ proteins (length = 1 => a family of proteins)
Args:
length (int): The complex length that this function vector corresponds to.
Converts sets (Family ids) into strings for comparison/hashing.
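One plausible way to perform such a conversion is sketched below. The function name set_to_key and the output format are assumptions, not the library's actual behavior. Sorting matters because sets have no defined order, so equal sets must always produce the same string.

```python
def set_to_key(family_id):
    """Build an order-independent string key from a set of integer ids.

    Hypothetical helper: sorts the ids, then joins them with dashes.
    """
    return "-".join(str(i) for i in sorted(family_id))

key = set_to_key(frozenset({3, 1, 2}))
```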