Detect CRISPR Cas9 editing effect
Coniguration
gRNA_region_coordinates_ori.txt
Column 1: chromosome (same as in bam file) Column 2: start (1-based) Column 3: end (1-based) Column 4: Name (unique) eg. Region2, Region3 … Column 5: +/- strand sgRNA targets to which strand Column 6: designed protospacer sequence. By default, PAM is also given following protospacer sequence. Column 7: target gene (gene symbol)
By default, cut site is defined as 3nt to 4nt upstream (5’) of the Protospacer Adjacent Motif (PAM).
Requiremenets
2bit genome downloaded from UCSC
umi_tools
Simplified process
The script first generates coordinates based on input window size, and reference sequence and later they are used by Samtools to split alignment file (bam file) into bam files which contain reads that aligned to window region. FeatureCount from UMI-tools is employed to filter out reads that not align properly to this target gene. Consensus sequences are generated by union reads within detection window with the same UMI and cell barcode. Reference sequence for each region is compared with the consensus sequence to identify mutations. Deletions that pass cutsite are defined as editing effect.
Output files
See section