libPoMo.vcf
This module provides functions to read, write and access VCF files.
Objects
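- Classes:
  - NucBase, stores a nucleotide base
  - VCFStream, stores base data from a VCF file line per line
  - VCFSeq, stores data retrieved from a VCF file
- Exception Classes:
  - NotANucBaseError
  - NotAVariantCallFormatFileError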
- Functions:
  - update_base(), read a line into a base
  - get_nuc_base_from_line(), create a new NucBase from a line
  - check_fixed_field_header(), check a VCF fixed field header string
  - get_indiv_from_field_header(), extract list of individuals from header
  - init_seq(), open VCF file and initialize VCFStream
  - open_seq(), open VCF file and save it to a VCFSeq
  - get_header_line_string(), print VCF header line
exception libPoMo.vcf.NotANucBaseError
Exception raised if the given nucleotide base is not valid.
exception libPoMo.vcf.NotAVariantCallFormatFileError
Exception raised if the given VCF file is not valid.
class libPoMo.vcf.NucBase
Stores a nucleotide base.
FIXME: Bases are split by ‘/’. They should also be split by ‘|’.
A class that stores a single nucleotide base and related information retrieved from a VCF file. Please see http://www.1000genomes.org/ for a detailed description of the VCF format.
Variables:
- chrom (str) – Chromosome name.
- pos (int) – 1-based position on the chromosome.
- id (str) – ID.
- ref (str) – Reference base.
- alt (str) – Alternative base(s).
- qual (str) – Quality.
- filter (str) – Filter.
- info (str) – Additional information.
- format (str) – String with format specification.
- speciesData ([str]) – List with strings of the species data (e.g. 0/1:...).
- ploidy (int) – Ploidy (number of sets of chromosomes) of the sequenced individuals. Can be set with set_ploidy().
get_base_ind(iI, iC)
Return the base of a specific individual.
Parameters:
- iI (int) – 0-based index of individual.
- iC (int) – 0-based index of chromosome (for n-ploid individuals).
Return type: character with nucleotide base.
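To illustrate the lookup described above, the following minimal sketch resolves the base of one individual on one chromosome copy from VCF-style fields. It is not libPoMo's implementation: the function name `base_of_individual` and its arguments are hypothetical, and it assumes genotype strings such as `0/1`, where `0` refers to the reference base and `1`, `2`, ... refer to the comma-separated alternative bases.

```python
import re


def base_of_individual(ref, alt, species_data, i_indiv, i_chrom):
    """Return the base of individual i_indiv on chromosome copy i_chrom.

    Hypothetical helper for illustration; not part of libPoMo.
    """
    # Allele 0 is the reference base; 1, 2, ... index the ALT bases.
    alleles = [ref] + alt.split(",")
    # The genotype is the first colon-separated subfield, e.g. "0/1" in "0/1:35".
    genotype = species_data[i_indiv].split(":")[0]
    # Accept both the unphased ("/") and the phased ("|") separator.
    calls = re.split(r"[/|]", genotype)
    return alleles[int(calls[i_chrom])]


print(base_of_individual("A", "G,T", ["0/1:35", "2|0:40"], 1, 0))
```

Both indices are 0-based, matching the parameter description above.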
get_speciesData()
Return species data as a list.
- data[0][0] = data of first species/individual on chromatid A
- data[0][1] = only set for non-haploids; data of first species/individual on chromatid B
Sets data[i][j] to None if the base of individual i on chromosome j could not be read (e.g. it is not valid).
Return type: matrix of integers
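The shape of such a matrix can be sketched as follows. This is an illustration under stated assumptions, not the library's code: `genotype_matrix` is a hypothetical name, the integers are assumed to be allele indices taken from genotype strings like `0/1:...`, and the sketch splits on both `/` and `|`.

```python
import re


def genotype_matrix(species_data):
    """Turn genotype strings into a matrix of allele indices.

    Hypothetical helper for illustration; not part of libPoMo.
    Entries that cannot be read as integers (for example the
    missing-call marker ".") become None.
    """
    matrix = []
    for field in species_data:
        # Keep only the genotype subfield, e.g. "0/1" from "0/1:35".
        genotype = field.split(":")[0]
        row = []
        for call in re.split(r"[/|]", genotype):
            row.append(int(call) if call.isdigit() else None)
        matrix.append(row)
    return matrix


# One row per individual, one column per chromosome copy.
print(genotype_matrix(["0/1:35", "1|1", "./0", "0"]))
```

A haploid individual yields a row with a single entry, so only `data[i][0]` is set in that case.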
class libPoMo.vcf.VCFSeq
Store data retrieved from a VCF file.
Initialized with open_seq().
get_header_line_string(indiv)
Return a standard VCF file header string with individuals indiv.
class
libPoMo.vcf.
VCFStream
(seqName, vcfFileObject, speciesList, firstBase)[source]¶ Store base data from a VCF file line per line.
It can be initialized with
init_seq()
. This class stores a single base retrieved from a VCF file and the file itself. It is used to parse through a VCF file line by line processing the bases without having to read the whole file at one.Parameters: Variables: - name (str) – Name of the stream.
- fo (fo) – Stored VCF file object.
- speciesL ([str]) – List with species / individuals.
- nSpecies (int) – Number of species / individuals.
- base (NusBase) – Stored
NucBase
.
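The line-by-line idea behind such a stream can be sketched with a plain Python generator. This is only an illustration of the pattern, not the VCFStream class itself: `iter_bases` is a hypothetical name, and the example text is made up.

```python
import io


def iter_bases(file_object):
    """Yield (chrom, pos) for each data line, one line at a time.

    Hypothetical generator for illustration; not part of libPoMo.
    """
    for line in file_object:
        # Meta-information and header lines start with "#"; skip blank lines too.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        yield fields[0], int(fields[1])


vcf_text = (
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tInd1\n"
    "chr1\t10\t.\tA\tG\t.\t.\t.\tGT\t0/1\n"
    "chr1\t25\t.\tC\tT\t.\t.\t.\tGT\t1/1\n"
)
stream = iter_bases(io.StringIO(vcf_text))
# Only the lines up to and including the first data line have been consumed here.
print(next(stream))
```

Because the generator holds one line at a time, memory use stays constant regardless of file size.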
libPoMo.vcf.check_fixed_field_header(ln)
Check if the given line ln is the header of the fixed fields.
Sample header line:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SpeciesL
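A check of this kind could look like the sketch below. It assumes tab-separated columns, as in the VCF format, and returns a boolean; `looks_like_fixed_field_header` is a hypothetical name, and the library function may report a mismatch differently (the source does not say how).

```python
FIXED_FIELDS = ["#CHROM", "POS", "ID", "REF", "ALT",
                "QUAL", "FILTER", "INFO", "FORMAT"]


def looks_like_fixed_field_header(line):
    """Return True if line starts with the nine fixed VCF column names.

    Hypothetical helper for illustration; not part of libPoMo.
    """
    columns = line.rstrip("\n").split("\t")
    return columns[:len(FIXED_FIELDS)] == FIXED_FIELDS


header = "\t".join(FIXED_FIELDS + ["Ind1", "Ind2"]) + "\n"
print(looks_like_fixed_field_header(header))
```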
libPoMo.vcf.get_header_line_string(indiv)
Return a standard VCF file header string with individuals indiv.
libPoMo.vcf.get_indiv_from_field_header(ln)
Return species from a fixed field header line ln.
Sample header line:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SpeciesL
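Extracting the individuals amounts to taking every column after the nine fixed ones. The sketch below assumes a tab-separated header line; `individuals_from_header` is a hypothetical name and not the library function.

```python
N_FIXED_COLUMNS = 9  # "#CHROM" through "FORMAT"


def individuals_from_header(line):
    """Return the sample names that follow the fixed columns.

    Hypothetical helper for illustration; not part of libPoMo.
    """
    columns = line.rstrip("\n").split("\t")
    return columns[N_FIXED_COLUMNS:]


line = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tHuman\tChimp\n"
print(individuals_from_header(line))
```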
libPoMo.vcf.get_nuc_base_from_line(ln, info=False, ploidy=None)
Retrieve base data from a VCF file line ln.
Splits a given VCF file line and returns a NucBase object. If info is set to False, only #CHROM, POS, REF, ALT and speciesData will be read.
Parameters:
- info (Bool) – Determines if info is retrieved from ln.
- ploidy (int) – If ploidy is known and given, it is set.
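The effect of the info flag can be sketched with a small stand-in record type. `BaseRecord` and `record_from_line` are hypothetical names that only mirror the fields listed for NucBase; which fields the real function fills when info is True is an assumption here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BaseRecord:
    """Hypothetical stand-in for NucBase; not the libPoMo class."""
    chrom: str
    pos: int
    ref: str
    alt: str
    species_data: List[str]
    id: Optional[str] = None
    qual: Optional[str] = None
    filter: Optional[str] = None
    info: Optional[str] = None
    format: Optional[str] = None


def record_from_line(line, info=False):
    """Split a tab-separated VCF data line into a BaseRecord."""
    f = line.rstrip("\n").split("\t")
    record = BaseRecord(chrom=f[0], pos=int(f[1]), ref=f[3], alt=f[4],
                        species_data=f[9:])
    if info:
        # Assumed behaviour: the remaining fixed fields are read only on request.
        record.id, record.qual, record.filter = f[2], f[5], f[6]
        record.info, record.format = f[7], f[8]
    return record


data_line = "chr1\t10\trs1\tA\tG\t50\tPASS\tDP=3\tGT\t0/1\t1/1\n"
print(record_from_line(data_line))
```

Skipping the optional fields keeps the record small when only positions and genotypes are needed.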
libPoMo.vcf.init_seq(VCFFileName, maxskip=100, name=None)
Open a (gzipped) VCF4.1 file.
Tries to open the given VCF file and checks if it is in VCF format. Initializes a VCFStream object that contains the first base.
Please close the associated file object with VCFStream.close() when you don’t need it anymore.
Parameters:
- VCFFileName (str) – Name of the VCF file.
- maxskip (int) – Only look maxskip lines for the start of the bases (defaults to 100).
- name (str) – Set the name of the sequence to name, otherwise set it to the filename.
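The opening steps described above (handle gzip, search a limited number of lines for the header, close the file afterwards) can be sketched with the standard library alone. `open_text` and `skip_to_header` are hypothetical helpers, not libPoMo functions, and raising ValueError on failure is an assumption of this sketch.

```python
import gzip
import os
import tempfile


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def skip_to_header(file_object, maxskip=100):
    """Read at most maxskip lines and return the "#CHROM" header line.

    Hypothetical helper for illustration; not part of libPoMo.
    Raises ValueError if no header is found within maxskip lines.
    """
    for _ in range(maxskip):
        line = file_object.readline()
        if line.startswith("#CHROM"):
            return line.rstrip("\n")
        if not line:  # end of file
            break
    raise ValueError("no fixed field header within %d lines" % maxskip)


# Write a tiny gzipped example file, then read it back.
path = os.path.join(tempfile.mkdtemp(), "example.vcf.gz")
with gzip.open(path, "wt") as handle:
    handle.write("##fileformat=VCFv4.1\n")
    handle.write("#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tInd1\n")
    handle.write("chr1\t10\t.\tA\tG\t.\t.\t.\tGT\t0/1\n")

vcf_file = open_text(path)
try:
    header = skip_to_header(vcf_file)
    first_data_line = vcf_file.readline()  # the first base follows the header
finally:
    vcf_file.close()  # close the file object once it is no longer needed
```

Limiting the search with maxskip keeps the function from scanning an entire non-VCF file before giving up.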
libPoMo.vcf.open_seq(VCFFileName, maxskip=100, name=None)
Open a VCF4.1 file.
Tries to open the given VCF file, checks if it is in VCF format and reads the base(s). It returns a VCFSeq object that contains all the information.
Parameters:
- VCFFileName (str) – Name of the VCF file.
- maxskip (int) – Only look maxskip lines for the start of the bases (defaults to 100).
- name (str) – Set the name of the sequence to name, otherwise set it to the filename.