Bloom Filter Trie
bft.h File Reference

Interface containing all functions to use a BFT.

Data Structures

struct  BFT_annotation
 Annotation associated with a BFT_kmer.
 

Typedefs

typedef BFT_Root BFT
 Root vertex of a BFT.

typedef size_t(* BFT_func_ptr) (BFT_kmer *bft_kmer, BFT *bft, va_list args)
 Pointer to a function used by iterate_over_kmers() and v_iterate_over_kmers().
 

Functions

Annotation functions

These functions manipulate annotations (color sets).

uint8_t intersection_annots (const uint8_t a, const uint8_t b)

uint8_t union_annots (const uint8_t a, const uint8_t b)

uint8_t sym_difference_annots (const uint8_t a, const uint8_t b)

BFT_annotation * create_BFT_annotation ()
 Function creating an empty BFT_annotation.

void free_BFT_annotation (BFT_annotation *bft_annot)
 Function freeing a BFT_annotation.

BFT_annotation * get_annotation (BFT_kmer *bft_kmer)
 Function extracting the annotation (set of colors) associated with a k-mer of a BFT.

bool presence_genome (uint32_t id_genome, BFT_annotation *bft_annot, BFT *bft)
 Function testing if a k-mer occurred in a genome.

BFT_annotation * intersection_annotations (BFT *bft, uint32_t nb_annotations,...)
 Function computing the intersection of a set of annotations.

BFT_annotation * union_annotations (BFT *bft, uint32_t nb_annotations,...)
 Function computing the union of a set of annotations.

BFT_annotation * sym_difference_annotations (BFT *bft, uint32_t nb_annotations,...)
 Function computing the symmetric difference of a set of annotations.

uint32_t * get_list_id_genomes (BFT_annotation *bft_annot, BFT *bft)
 Function extracting a list of genome identifiers from an annotation.

uint32_t get_count_id_genomes (BFT_annotation *bft_annot, BFT *bft)
 Function counting the number of genome identifiers in an annotation.

uint32_t * intersection_list_id_genomes (uint32_t *list_a, uint32_t *list_b)
 
Graph functions

These functions manipulate a colored de Bruijn graph stored in a BFT.

BFT * create_cdbg (int k, int treshold_compression)
 Function creating a colored de Bruijn graph stored in a BFT.

void free_cdbg (BFT *bft)
 Free an allocated colored de Bruijn graph stored in a BFT.
 
Insertion functions

These functions insert genomes in a colored de Bruijn graph stored in a BFT.

void insert_genomes_from_files (int nb_files, char **paths, BFT *bft, char *prefix_bft_filename)
 Function inserting genomes (k-mer file) in a BFT.

void insert_kmers_new_genome (int nb_kmers, char **kmers, char *genome_name, BFT *bft)
 Function inserting k-mers of a new genome in a BFT.

void insert_kmers_last_genome (int nb_kmers, char **kmers, BFT *bft)
 Function inserting k-mers of the last inserted genome in a BFT.
 
K-mer functions

These functions manipulate k-mers.

BFT_kmer * create_kmer (const char *kmer, int k)
 Function creating a BFT_kmer object from a k-mer encoded as an ASCII string (char*).

BFT_kmer * create_empty_kmer ()
 Function creating an empty BFT_kmer object (all its components are NULL).

void free_BFT_kmer (BFT_kmer *bft_kmer, int nb_bft_kmer)
 Function freeing allocated BFT_kmers.

void free_BFT_kmer_content (BFT_kmer *bft_kmer, int nb_bft_kmer)
 Function freeing the content of allocated BFT_kmers.

void extract_kmers_to_disk (BFT *bft, char *filename_output, bool compressed_output)
 Function extracting the k-mers of a BFT to a file.

size_t write_kmer_ascii_to_disk (BFT_kmer *bft_kmer, BFT *bft, va_list args)
 Function writing an ASCII k-mer to a file.

size_t write_kmer_comp_to_disk (BFT_kmer *bft_kmer, BFT *bft, va_list args)
 Function writing a 2-bit encoded k-mer to a file.
 
Query functions

These functions query for k-mers or sequences.

BFT_kmer * get_kmer (const char *kmer, BFT *bft)
 Function searching for a k-mer in a BFT.

bool is_kmer_in_cdbg (BFT_kmer *bft_kmer)
 Function testing if a k-mer is in a BFT.

uint32_t * query_sequence (BFT *bft, char *sequence, double threshold, bool canonical_search)
 Function querying a BFT for a sequence.
 
Pattern matching functions

These functions provide pattern matching functionalities over the k-mers or paths of a colored de Bruijn graph stored as a BFT.

bool prefix_matching (BFT *bft, char *prefix, BFT_func_ptr f,...)
 Function for prefix matching over the k-mers of a BFT.
 
Marking functions

These functions mark k-mers of a colored de Bruijn graph with flags.

void set_marking (BFT *bft)
 Function locking and preparing the graph for vertex marking (no insertion can happen before unlocking).

void unset_marking (BFT *bft)
 Function unlocking a graph locked for vertex marking.

void set_flag_kmer (uint8_t flag, BFT_kmer *bft_kmer, BFT *bft)
 Function marking a k-mer of a BFT with a flag.

uint8_t get_flag_kmer (BFT_kmer *bft_kmer, BFT *bft)
 Function getting the flag of a k-mer of a BFT.
 
Traversal functions

These functions traverse a colored de Bruijn graph stored as a BFT.

void set_neighbors_traversal (BFT *bft)
 Function locking the graph for traversal.

void unset_neighbors_traversal (BFT *bft)
 Function unlocking a graph locked for traversal.

BFT_kmer * get_neighbors (BFT_kmer *bft_kmer, BFT *bft)
 Function extracting the neighbors of a k-mer.

BFT_kmer * get_predecessors (BFT_kmer *bft_kmer, BFT *bft)
 Function extracting the predecessors of a k-mer.

BFT_kmer * get_successors (BFT_kmer *bft_kmer, BFT *bft)
 Function extracting the successors of a k-mer.
 
Iteration functions

These functions iterate over the k-mers of a colored de Bruijn graph stored as a BFT.

void iterate_over_kmers (BFT *bft, BFT_func_ptr f,...)
 Function iterating over the k-mers of a BFT.

void v_iterate_over_kmers (BFT *bft, BFT_func_ptr f, va_list args)
 Function iterating over the k-mers of a BFT.
 
Disk I/O functions

These functions write a BFT to disk and load a BFT from disk.

void write_BFT (BFT *bft, char *filename, bool compress_annotations)
 Function writing a BFT to disk.

BFT * load_BFT (char *filename)
 Function loading a BFT from disk.
 

Detailed Description

Interface containing all functions to use a BFT.

Code snippets using this interface are provided in snippets.h.

Typedef Documentation

typedef BFT_Root BFT

Root vertex of a BFT.

A BFT_Root contains the k-mer size as well as the number and names of the inserted genomes. Other contained structures and variables are for internal use only and must not be modified.

typedef size_t(* BFT_func_ptr) (BFT_kmer *bft_kmer, BFT *bft, va_list args)

Pointer to a function used by iterate_over_kmers() and v_iterate_over_kmers().

Such a user-written function is called on every k-mer of a BFT.

Parameters
bft_kmer is a k-mer from a BFT.
bft is the BFT that bft_kmer comes from.
args contains all additional parameters given to iterate_over_kmers() / v_iterate_over_kmers().
Returns
a value of type size_t. It can be used to return an unsigned integer or a pointer.
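
As an illustration of the callback contract described above, the following is a minimal, self-contained sketch rather than code from the library. The types demo_kmer and demo_bft and the driver demo_iterate() are hypothetical stand-ins for BFT_kmer, BFT and iterate_over_kmers(); only the callback signature and the return convention (a return value of 0 stops the iteration) are taken from this page. The driver copies the va_list before each call, which is an assumption about how a caller can safely hand the same arguments to the callback several times.

```c
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical stand-ins for BFT_kmer and BFT (not the real definitions). */
typedef struct { const char *seq; } demo_kmer;
typedef struct { demo_kmer *kmers; size_t nb_kmers; } demo_bft;

/* Same shape as BFT_func_ptr, expressed with the stand-in types. */
typedef size_t (*demo_func_ptr)(demo_kmer *kmer, demo_bft *bft, va_list args);

/* Callback counting the k-mers that start with a given nucleotide.
 * Expected extra arguments, in order: an int (the nucleotide) and a
 * size_t * (the counter to increment). */
size_t count_starting_with(demo_kmer *kmer, demo_bft *bft, va_list args)
{
    (void)bft; /* unused in this sketch */
    int nucleotide = va_arg(args, int);
    size_t *counter = va_arg(args, size_t *);

    if (kmer->seq[0] == (char)nucleotide) (*counter)++;
    return 1; /* non-zero: keep iterating */
}

/* Stand-in driver: calls f on every k-mer and stops as soon as f returns 0. */
void demo_iterate(demo_bft *bft, demo_func_ptr f, ...)
{
    va_list args;
    va_start(args, f);
    for (size_t i = 0; i < bft->nb_kmers; i++) {
        va_list call_args;
        va_copy(call_args, args); /* each call reads the arguments from the start */
        size_t keep_going = f(&bft->kmers[i], bft, call_args);
        va_end(call_args);
        if (keep_going == 0) break;
    }
    va_end(args);
}
```

Called as demo_iterate(&bft, count_starting_with, 'A', &counter), the sketch leaves in counter the number of k-mers whose first nucleotide is 'A'.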

Function Documentation

BFT_annotation* create_BFT_annotation ( )
inline

Function creating an empty BFT_annotation.

BFT* create_cdbg ( int  k,
int  treshold_compression
)

Function creating a colored de Bruijn graph stored in a BFT.

Parameters
k is the length of k-mers.
treshold_compression indicates when the color compression should be triggered (every treshold_compression genomes inserted).
Returns
a BFT pointer.

BFT_kmer* create_empty_kmer ( )

Function creating an empty BFT_kmer object (all its components are NULL).

Returns
a BFT_kmer pointer.

BFT_kmer* create_kmer ( const char *  kmer,
int  k
)

Function creating a BFT_kmer object from a k-mer encoded as an ASCII string (char*).

Parameters
kmer is an ASCII encoded k-mer string (char*).
k is the k-mer length.
Returns
a BFT_kmer pointer.

void extract_kmers_to_disk ( BFT *  bft,
char *  filename_output,
bool  compressed_output
)

Function extracting the k-mers of a BFT to a file.

Parameters
bft is a BFT containing the k-mers to iterate over.
filename_output is the name of a file to which the k-mers are written. The file is overwritten if it already exists.
compressed_output is a boolean indicating if the k-mers should be written in their 2-bit form (true) or ASCII form (false).
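
The 2-bit form selected by compressed_output stores each nucleotide in two bits instead of one byte. The sketch below shows the general idea only: the mapping (A=0, C=1, G=2, T=3), the bit order within a byte and the function names are assumptions made for illustration, not the on-disk format used by extract_kmers_to_disk().

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed mapping A=0, C=1, G=2, T=3; returns -1 for any other character. */
static int nucleotide_to_2bits(char c)
{
    switch (c) {
    case 'A': return 0;
    case 'C': return 1;
    case 'G': return 2;
    case 'T': return 3;
    default:  return -1;
    }
}

/* Packs the k nucleotides of kmer into out, four per byte, with the first
 * nucleotide of each group in the two low-order bits.
 * out must hold at least (k + 3) / 4 bytes.
 * Returns the number of bytes written, or 0 on an invalid character. */
size_t pack_kmer_2bits(const char *kmer, size_t k, uint8_t *out)
{
    size_t nb_bytes = (k + 3) / 4;

    for (size_t i = 0; i < nb_bytes; i++) out[i] = 0;
    for (size_t i = 0; i < k; i++) {
        int code = nucleotide_to_2bits(kmer[i]);
        if (code < 0) return 0;
        out[i / 4] |= (uint8_t)(code << (2 * (i % 4)));
    }
    return nb_bytes;
}
```

Under these assumptions a 31-mer occupies 8 bytes instead of 31, which is the reason a compressed output is smaller than the ASCII one.
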
void free_BFT_annotation ( BFT_annotation *  bft_annot)

Function freeing a BFT_annotation.

Parameters
bft_annot is a pointer to the BFT_annotation to free.

void free_BFT_kmer ( BFT_kmer *  bft_kmer,
int  nb_bft_kmer
)

Function freeing allocated BFT_kmers.

Parameters
bft_kmer is a pointer to an array of at least one BFT_kmer.
nb_bft_kmer is the number of BFT_kmer in bft_kmer.

void free_BFT_kmer_content ( BFT_kmer *  bft_kmer,
int  nb_bft_kmer
)

Function freeing the content of allocated BFT_kmers.

Parameters
bft_kmer is a pointer to an array of at least one BFT_kmer.
nb_bft_kmer is the number of BFT_kmer in bft_kmer.

void free_cdbg ( BFT *  bft)

Free an allocated colored de Bruijn graph stored in a BFT.

Parameters
bft is an allocated BFT.

BFT_annotation* get_annotation ( BFT_kmer *  bft_kmer)

Function extracting the annotation (set of colors) associated with a k-mer of a BFT.

Parameters
bft_kmer is a k-mer obtained via search or iteration over a BFT (via get_kmer() for example).
Returns
a BFT_annotation pointer.

uint32_t get_count_id_genomes ( BFT_annotation *  bft_annot,
BFT *  bft
)

Function counting the number of genome identifiers in an annotation.

Parameters
bft_annot is an annotation.
bft is a BFT from which the annotation was extracted.
Returns
a count of genome identifiers.

uint8_t get_flag_kmer ( BFT_kmer *  bft_kmer,
BFT *  bft
)

Function getting the flag of a k-mer of a BFT.

Parameters
bft_kmer is a k-mer obtained via search/iteration over a BFT for which the function returns the flag.
bft is a BFT locked for vertex marking.

BFT_kmer* get_kmer ( const char *  kmer,
BFT *  bft
)

Function searching for a k-mer in a BFT.

Parameters
kmer is an ASCII encoded k-mer string (char*) to search for in the BFT.
bft is the BFT in which kmer is searched.
Returns
a BFT_kmer pointer.

uint32_t* get_list_id_genomes ( BFT_annotation *  bft_annot,
BFT *  bft
)

Function extracting a list of genome identifiers from an annotation.

Parameters
bft_annot is an annotation from which the ids must be extracted.
bft is a BFT from which the annotation was extracted.
Returns
a pointer to an array of genome identifiers (uint32_t). The first element of this array (position 0) indicates how many ids there are in this array. Therefore, the length of the array is array[0] + 1.
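
The count-prefixed layout of the returned array can be read as in the sketch below. The helper name list_contains_id() is hypothetical and the code does not call the library; it only assumes the layout stated above (count at position 0, ids at positions 1 to array[0]). Whether the caller owns the returned array is not stated on this page.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if id_genome appears in a list laid out as described above:
 * list[0] holds the number of ids, and the ids occupy list[1] .. list[list[0]].
 * A NULL list is treated as empty. */
bool list_contains_id(const uint32_t *list, uint32_t id_genome)
{
    if (list == NULL) return false;

    uint32_t nb_ids = list[0];
    for (uint32_t i = 1; i <= nb_ids; i++) {
        if (list[i] == id_genome) return true;
    }
    return false;
}
```

Note that the loop starts at index 1: the value at index 0 is a count, not a genome identifier.
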
BFT_kmer* get_neighbors ( BFT_kmer *  bft_kmer,
BFT *  bft
)

Function extracting the neighbors of a k-mer.

Parameters
bft_kmer is a k-mer obtained via search/iteration over a BFT.
bft is the BFT from which bft_kmer was extracted.
Returns
a pointer to an array of 8 BFT_kmer: positions 0 to 3 are the possible predecessors and 4 to 7 the possible successors.
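
To make the 8-slot layout concrete, the sketch below builds the possible neighbors of a k-mer as plain strings: a predecessor overlaps the first k-1 nucleotides of the k-mer, a successor overlaps its last k-1 nucleotides. The function name and the order of the nucleotides inside each half (A, C, G, T) are assumptions; the library returns BFT_kmer objects, not strings.

```c
#include <stddef.h>
#include <string.h>

/* Fills out[0..3] with the possible predecessors of kmer and out[4..7] with
 * its possible successors, as NUL-terminated strings of length k.
 * Requires k >= 1; each out[i] must point to a buffer of at least k + 1 bytes. */
void possible_neighbors(const char *kmer, size_t k, char *out[8])
{
    static const char alphabet[4] = {'A', 'C', 'G', 'T'};

    for (size_t i = 0; i < 4; i++) {
        /* Predecessor: one new nucleotide followed by the first k-1 of kmer. */
        out[i][0] = alphabet[i];
        memcpy(out[i] + 1, kmer, k - 1);
        out[i][k] = '\0';

        /* Successor: the last k-1 nucleotides of kmer followed by a new one. */
        memcpy(out[4 + i], kmer + 1, k - 1);
        out[4 + i][k - 1] = alphabet[i];
        out[4 + i][k] = '\0';
    }
}
```

A possible neighbor is not necessarily present in the graph; is_kmer_in_cdbg(), documented on this page, tests whether a BFT_kmer is in a BFT.
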
BFT_kmer* get_predecessors ( BFT_kmer *  bft_kmer,
BFT *  bft
)

Function extracting the predecessors of a k-mer.

Parameters
bft_kmer is a k-mer obtained via search/iteration over a BFT.
bft is the BFT from which bft_kmer was extracted.
Returns
a pointer to an array of 4 BFT_kmer that are the possible predecessors.

BFT_kmer* get_successors ( BFT_kmer *  bft_kmer,
BFT *  bft
)

Function extracting the successors of a k-mer.

Parameters
bft_kmer is a k-mer obtained via search/iteration over a BFT.
bft is the BFT from which bft_kmer was extracted.
Returns
a pointer to an array of 4 BFT_kmer that are the possible successors.

void insert_genomes_from_files ( int  nb_files,
char **  paths,
BFT *  bft,
char *  prefix_bft_filename
)

Function inserting genomes (k-mer file) in a BFT.

Parameters
nb_files is the number of files to insert.
paths is an array of nb_files strings (char*). Each string is the name of a file to insert, optionally including its path.
bft is a BFT where the genomes are inserted.
prefix_bft_filename is a filename prefix (including path) under which temporary data can be written. The prefix must be unique in its directory.

void insert_kmers_last_genome ( int  nb_kmers,
char **  kmers,
BFT *  bft
)

Function inserting k-mers of the last inserted genome in a BFT.

Parameters
nb_kmers is the number of k-mers to insert.
kmers is a pointer to an array of strings (char *) that are the k-mers to insert. The array is of length nb_kmers.
bft is a colored de Bruijn graph stored as a BFT.

void insert_kmers_new_genome ( int  nb_kmers,
char **  kmers,
char *  genome_name,
BFT *  bft
)

Function inserting k-mers of a new genome in a BFT.

Parameters
nb_kmers is the number of k-mers to insert.
kmers is a pointer to an array of strings (char *) that are the k-mers to insert. The array is of length nb_kmers.
genome_name is the name of the new genome from which the inserted k-mers come.
bft is a colored de Bruijn graph stored as a BFT.

BFT_annotation* intersection_annotations ( BFT *  bft,
uint32_t  nb_annotations,
  ...
)

Function computing the intersection of a set of annotations.

Parameters
bft is a BFT from which the input annotations originate.
nb_annotations indicates how many annotations must be included in the intersection.
... is a list of nb_annotations BFT_annotation pointers whose intersection is computed.
Returns
a BFT_annotation pointer to an annotation which is the intersection of the input annotations.
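
The calling convention used here (a count followed by that many pointers) can be sketched without the library as follows. The type demo_colors, a 64-bit mask in which bit i stands for genome i, and the function demo_intersection() are hypothetical stand-ins; the real BFT_annotation is an opaque structure and its representation is not described on this page.

```c
#include <stdarg.h>
#include <stdint.h>

/* Hypothetical stand-in for an annotation: bit i set means genome i is present. */
typedef uint64_t demo_colors;

/* Intersection of nb_sets color sets, each passed as a demo_colors pointer.
 * Returns the empty set (0) when nb_sets is 0, a choice made for this sketch. */
demo_colors demo_intersection(uint32_t nb_sets, ...)
{
    va_list args;
    demo_colors result = 0;

    va_start(args, nb_sets);
    for (uint32_t i = 0; i < nb_sets; i++) {
        demo_colors *set = va_arg(args, demo_colors *);
        /* The first set initializes the result, the others narrow it down. */
        result = (i == 0) ? *set : (result & *set);
    }
    va_end(args);
    return result;
}
```

The count must match the number of pointers actually passed, since a variadic function has no other way to know where the argument list ends. In this stand-in representation, replacing & with | or ^ gives the union or the symmetric difference.
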
bool is_kmer_in_cdbg ( BFT_kmer *  bft_kmer)

Function testing if a k-mer is in a BFT.

Parameters
bft_kmer is a k-mer obtained via search or iteration over a BFT (via get_kmer() for example).
Returns
a boolean indicating the presence (true) or absence (false) of the k-mer in a BFT.

void iterate_over_kmers ( BFT *  bft,
BFT_func_ptr  f,
  ...
)

Function iterating over the k-mers of a BFT.

Parameters
bft is a BFT containing the k-mers to iterate over.
f is a pointer to a function that is called on each k-mer. If f returns 0, the calling function returns.
... are the additional arguments that must be transmitted to f. They can be extracted in f via its parameter of type va_list.

BFT* load_BFT ( char *  filename_and_path)

Function loading a BFT from disk.

Parameters
filename_and_path is the path and name of the file in which the BFT to load is written.

bool prefix_matching ( BFT *  bft,
char *  prefix,
BFT_func_ptr  f,
  ...
)

Function for prefix matching over the k-mers of a BFT.

Parameters
bft is a BFT containing the k-mers to match.
prefix is a string containing a prefix the k-mers must match.
f is a pointer to a function that is called on each k-mer matching the prefix. If f returns 0, the calling function returns.
... are the additional arguments that must be transmitted to f. They can be extracted in f via its parameter of type va_list.
Returns
a boolean indicating whether at least one k-mer matched the prefix (true) or not (false).

bool presence_genome ( uint32_t  id_genome,
BFT_annotation *  bft_annot,
BFT *  bft
)

Function testing if a k-mer occurred in a genome.

Parameters
id_genome is the genome identifier.
bft_annot is the annotation of the k-mer whose presence in the genome is tested.
bft is a BFT in which the k-mer is stored.
Returns
a boolean indicating the presence (true) or absence (false) of the k-mer in the genome.

uint32_t* query_sequence ( BFT *  bft,
char *  sequence,
double  threshold,
bool  canonical_search
)

Function querying a BFT for a sequence.

Parameters
bft is a BFT to be queried.
sequence is a string to query.
threshold is a floating-point value (0 < threshold <= 1) indicating the minimum fraction of k-mers from the queried sequence that must be present in a genome to have the queried sequence reported present in this genome.
canonical_search is a boolean indicating if the searched k-mers of the queried sequence must be canonical (lexicographically smaller one between a k-mer and its reverse-complement) or not.
Returns
a pointer to a sorted array of genome identifiers in which the queried sequence occurs (according to parameter threshold) or NULL if the queried sequence is not present in at least one genome (according to parameter threshold). The first element of the array (position 0) indicates how many ids are in this array.
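
The two notions used by this function, canonical k-mers and the threshold, can be illustrated with the self-contained sketch below. The helper names are hypothetical, and the comparison in passes_threshold() is an assumed reading of the threshold parameter; how the library rounds the required number of k-mers is not stated on this page.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Watson-Crick complement; any other character is returned unchanged. */
static char complement(char c)
{
    switch (c) {
    case 'A': return 'T';
    case 'C': return 'G';
    case 'G': return 'C';
    case 'T': return 'A';
    default:  return c;
    }
}

/* Writes into out (at least k + 1 bytes) the canonical form of kmer: the
 * lexicographically smaller of kmer and its reverse-complement. */
void canonical_kmer(const char *kmer, size_t k, char *out)
{
    /* Build the reverse-complement in out. */
    for (size_t i = 0; i < k; i++) out[i] = complement(kmer[k - 1 - i]);
    out[k] = '\0';

    /* Keep the original k-mer when it is smaller than or equal to it. */
    if (strncmp(kmer, out, k) <= 0) memcpy(out, kmer, k);
}

/* Assumed reading of threshold: a genome is reported when it contains at
 * least threshold * nb_kmers_query of the k-mers of the queried sequence. */
bool passes_threshold(size_t nb_kmers_found, size_t nb_kmers_query, double threshold)
{
    return (double)nb_kmers_found >= threshold * (double)nb_kmers_query;
}
```

A sequence of length n contains n - k + 1 k-mers, so under this reading a query of 4 k-mers with a threshold of 0.75 needs at least 3 of them in a genome.
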
void set_flag_kmer ( uint8_t  flag,
BFT_kmer *  bft_kmer,
BFT *  bft
)

Function marking a k-mer of a BFT with a flag.

Parameters
flag is the mark to add to a k-mer. It can have value 0, 1, 2 or 3.
bft_kmer is a k-mer obtained via search/iteration over a BFT that must be marked.
bft is a BFT locked for vertex marking.

void set_marking ( BFT *  bft)

Function locking and preparing the graph for vertex marking (no insertion can happen before unlocking).

By default, all k-mers of the graph are initialized with a 0 flag value.

Parameters
bft is a BFT to lock and prepare for vertex marking.

void set_neighbors_traversal ( BFT *  bft)

Function locking the graph for traversal.

Locking the graph is not required for traversal, but traversing a locked graph is faster than traversing an unlocked one. No insertion can happen while the graph is locked.

Parameters
bft is a BFT to lock for traversal.

BFT_annotation* sym_difference_annotations ( BFT *  bft,
uint32_t  nb_annotations,
  ...
)

Function computing the symmetric difference of a set of annotations.

Parameters
bft is a BFT from which the input annotations originate.
nb_annotations indicates how many annotations must be included in the symmetric difference.
... is a list of nb_annotations BFT_annotation pointers whose symmetric difference is computed.
Returns
a BFT_annotation pointer to an annotation which is the symmetric difference of the input annotations.

BFT_annotation* union_annotations ( BFT *  bft,
uint32_t  nb_annotations,
  ...
)

Function computing the union of a set of annotations.

Parameters
bft is a BFT from which the input annotations originate.
nb_annotations indicates how many annotations must be included in the union.
... is a list of nb_annotations BFT_annotation pointers whose union is computed.
Returns
a BFT_annotation pointer to an annotation which is the union of the input annotations.

void unset_marking ( BFT *  bft)

Function unlocking a graph locked for vertex marking.

Parameters
bft is a BFT locked for vertex marking.

void unset_neighbors_traversal ( BFT *  bft)

Function unlocking a graph locked for traversal.

Parameters
bft is a BFT locked for traversal that must be unlocked.

void v_iterate_over_kmers ( BFT *  bft,
BFT_func_ptr  f,
va_list  args
)

Function iterating over the k-mers of a BFT.

This function should be used only when called from a function with a variable number of arguments. Otherwise, use iterate_over_kmers().
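
The split between a variadic function and its va_list counterpart follows the same pattern as printf() and vprintf() in the C standard library: a variadic wrapper starts a va_list and hands it to the v-style function. The sketch below shows that pattern on an array of integers. All names in it are hypothetical and it does not call the library.

```c
#include <stdarg.h>
#include <stddef.h>

/* Callback type for this sketch: returns 0 to stop the iteration. */
typedef size_t (*demo_callback)(int item, va_list args);

/* v-style function: receives an already started va_list, in the way
 * v_iterate_over_kmers() receives args. */
void v_demo_for_each(const int *items, size_t nb_items, demo_callback f, va_list args)
{
    for (size_t i = 0; i < nb_items; i++) {
        va_list call_args;
        va_copy(call_args, args); /* give each call a fresh view of the arguments */
        size_t keep_going = f(items[i], call_args);
        va_end(call_args);
        if (keep_going == 0) return;
    }
}

/* Variadic wrapper: starts the va_list and forwards it to the v-style function. */
void demo_for_each(const int *items, size_t nb_items, demo_callback f, ...)
{
    va_list args;
    va_start(args, f);
    v_demo_for_each(items, nb_items, f, args);
    va_end(args);
}

/* Callback adding each item to the long pointed to by the first extra
 * argument; it stops the iteration at the first negative item. */
size_t add_to_total(int item, va_list args)
{
    long *total = va_arg(args, long *);
    if (item < 0) return 0;
    *total += item;
    return 1;
}
```

A function that already holds a va_list cannot pass it through "..." again, which is why the v-style entry point is needed in that situation.
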

Parameters
bft is a BFT containing the k-mers to iterate over.
f is a pointer to a function that is called on each k-mer. If f returns 0, the calling function returns.
args should contain all additional arguments to pass to f. They can be extracted in f via its parameter of type va_list.

void write_BFT ( BFT *  bft,
char *  filename,
bool  compress_annotations
)

Function writing a BFT to disk.

Parameters
bft is the BFT to write on disk.
filename is the name of the file in which bft will be written.
compress_annotations is a boolean indicating if the annotations of the BFT must be compressed before writing to disk.

size_t write_kmer_ascii_to_disk ( BFT_kmer *  bft_kmer,
BFT *  bft,
va_list  args
)

Function writing an ASCII k-mer to a file.

This function is of type BFT_func_ptr and is intended to be a parameter of iterate_over_kmers() or v_iterate_over_kmers().

Parameters
bft_kmer is a k-mer to write to disk.
bft is a BFT from which bft_kmer was extracted.
args is a variable list of arguments. It contains a pointer to the file to which bft_kmer is written.

size_t write_kmer_comp_to_disk ( BFT_kmer *  bft_kmer,
BFT *  bft,
va_list  args
)

Function writing a 2-bit encoded k-mer to a file.

This function is of type BFT_func_ptr and is intended to be a parameter of iterate_over_kmers() or v_iterate_over_kmers().

Parameters
bft_kmer is a k-mer to write to disk.
bft is a BFT from which bft_kmer was extracted.
args is a variable list of arguments. It contains a pointer to the file to which bft_kmer is written.