from Bio import pairwise2
seqA = "ATCCCGGCAGC"
seqB = "ATCCACGGTCAGC"
align = pairwise2.align.globalms(seqA, seqB, 2, -1, -1, -.5, one_alignment_only=True)[0]
align2mut(align)
[('-', 4, 'A'), ('-', 8, 'T')]
The is_dgrec() function tests for the adenine-bias signature characteristic of DGR mutagenesis (≥70% A mutations, ≥2 mutations).
Functions to align sequences and extract mutations.
Converts a sequence alignment result from Bio.pairwise2.align.globalms into a list of mutations. Positions are those of the alignment, not of the original sequence.
[('-', 4, 'A'), ('-', 8, 'T')]
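The idea can be sketched without Biopython by walking the two gapped strings column by column. The function name `aligned_to_mutations` is hypothetical and this is not the package's implementation; it only assumes that both aligned strings have the same length and use `-` for gaps.

```python
def aligned_to_mutations(aligned_ref, aligned_query):
    """Return (ref_base, alignment_position, query_base) for each differing column."""
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have the same length")
    return [
        (ref_base, column, query_base)
        for column, (ref_base, query_base) in enumerate(zip(aligned_ref, aligned_query))
        if ref_base != query_base
    ]


mutations = aligned_to_mutations("ATCC-CGG-CAGC", "ATCCACGGTCAGC")
print(mutations)  # [('-', 4, 'A'), ('-', 8, 'T')]
```

A gap in the reference row marks an insertion, and a gap in the query row marks a deletion.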
Reindexes the positions of the mutations to go from their position in the sequence alignment to their position in the original sequence.
ATCC-CGG-CAGC
|||| ||| ||||
ATCCACGGTCAGC
Score=20
Output of align2mut:
[('-', 4, 'A'), ('-', 8, 'T')]
Output of mut_rix:
[('-', 4, 'A'), ('-', 7, 'T')]
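One way to reproduce this reindexing, consistent with the outputs shown here, is to subtract the number of insertions already seen from each alignment position, since every gap in the reference row shifts later columns by one. The name `reindex_to_reference` is hypothetical, and the sketch assumes the mutations are sorted by alignment position.

```python
def reindex_to_reference(mutations):
    """Shift alignment positions back to positions in the ungapped reference."""
    reindexed = []
    insertions_seen = 0  # gaps in the reference row encountered so far
    for ref_base, position, query_base in mutations:
        reindexed.append((ref_base, position - insertions_seen, query_base))
        if ref_base == "-":
            insertions_seen += 1
    return reindexed


print(reindex_to_reference([('-', 4, 'A'), ('-', 8, 'T')]))
# [('-', 4, 'A'), ('-', 7, 'T')]
```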
Aligns two sequences and returns a list of mutations. Each mutation is a tuple of (reference_base, position, query_base).
ATC--CGG-CAGCAGGTCGTGAGC
||| ||| |||||.|||||| ||
ATCCACGGTCAGCACGTCGTG-GC
Score=33.5
[('-', 3, 'C'), ('-', 3, 'A'), ('-', 6, 'T'), ('G', 11, 'C'), ('A', 18, '-')]
Compares two equal-length sequences and returns a list of mutations without alignment. Each mutation is a tuple of (reference_base, position_str, query_base).
Convert between mutation lists and genotype string format (e.g., A50T,G75C).
Converts a list of mutations to a comma-separated string.
'-3C,-3A,-6T,G11C,A18-'
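A minimal sketch of this formatting step, using a hypothetical function name and assuming each mutation is a (reference_base, position, query_base) tuple:

```python
def mutations_to_string(mutations):
    """Join mutations as <ref><position><query> tokens separated by commas."""
    return ",".join(f"{ref}{pos}{alt}" for ref, pos, alt in mutations)


genotype = mutations_to_string(
    [('-', 3, 'C'), ('-', 3, 'A'), ('-', 6, 'T'), ('G', 11, 'C'), ('A', 18, '-')]
)
print(genotype)  # -3C,-3A,-6T,G11C,A18-
```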
Converts a genotype string to a list of mutations.
[['-', 4, 'A'], ['-', 7, 'T'], ['G', 12, 'C']]
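The reverse conversion can be sketched with a regular expression. The function name and the accepted character set (letters, `*`, and `-`) are assumptions made for illustration, not the package's definition of a valid token.

```python
import re

# One token: a reference symbol, an integer position, a query symbol.
_TOKEN = re.compile(r"([A-Za-z*\-])(\d+)([A-Za-z*\-])")


def string_to_mutations(genotype):
    """Parse 'A50T,G75C' style strings into [ref, position, query] lists."""
    mutations = []
    for token in genotype.split(","):
        token = token.strip()
        if not token:
            continue  # an empty genotype string means no mutations
        match = _TOKEN.fullmatch(token)
        if match is None:
            raise ValueError(f"not a mutation token: {token!r}")
        ref, pos, alt = match.groups()
        mutations.append([ref, int(pos), alt])
    return mutations


print(string_to_mutations("-4A,-7T,G12C"))
# [['-', 4, 'A'], ['-', 7, 'T'], ['G', 12, 'C']]
```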
Checks if a genotype string is a valid dgrec genotype.
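The adenine-bias criterion stated for is_dgrec() (≥70% A mutations, ≥2 mutations) could be checked roughly as follows. This is a sketch under a loud assumption: "A mutations" is read here as mutations whose reference base is A. The function name and the threshold parameters are hypothetical, and the package may define the test differently.

```python
def has_adenine_bias(genotype, min_a_fraction=0.7, min_mutations=2):
    """True when enough mutations exist and enough of them start from an A."""
    tokens = [token for token in genotype.split(",") if token]
    if len(tokens) < min_mutations:
        return False
    # Assumption: the first character of each token is the reference base.
    from_a = sum(1 for token in tokens if token[0].upper() == "A")
    return from_a / len(tokens) >= min_a_fraction


print(has_adenine_bias("A72G,A79T,A91G"))  # True
print(has_adenine_bias("T67G,A91G"))       # False
```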
Reconstruct full sequences from reference and genotype strings.
Reconstructs a mutated sequence by applying mutations from a genotype string to a reference sequence. Handles substitutions, insertions, and deletions.
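A minimal sketch of such a reconstruction is given below. It assumes 0-based reference positions and that an insertion token `-4A` places the new base immediately before reference position 4, which is consistent with the alignment examples in this section. The function name is hypothetical.

```python
import re
from collections import defaultdict


def apply_genotype(reference, genotype):
    """Apply substitutions, insertions and deletions to a reference sequence."""
    insertions = defaultdict(list)  # position -> bases inserted before it
    replacements = {}               # position -> new base ("" for a deletion)
    for token in filter(None, genotype.split(",")):
        match = re.fullmatch(r"(.)(\d+)(.)", token)
        if match is None:
            raise ValueError(f"not a mutation token: {token!r}")
        ref_base, pos, alt = match.group(1), int(match.group(2)), match.group(3)
        if ref_base == "-":
            insertions[pos].append(alt)
        else:
            replacements[pos] = "" if alt == "-" else alt

    out = []
    for pos, base in enumerate(reference):
        out.extend(insertions.get(pos, []))
        out.append(replacements.get(pos, base))
    out.extend(insertions.get(len(reference), []))  # insertions past the end
    return "".join(out)


print(apply_genotype("ATCCCGGCAGC", "-4A,-7T"))  # ATCCACGGTCAGC
```

Collecting insertions per position keeps several insertions at the same site (for example `-3C,-3A`) in their listed order.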
Returns the reverse complement of a DNA sequence.
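For reference, a reverse complement can be computed with a translation table. This sketch handles only A, C, G and T (upper or lower case) and leaves any other character unchanged.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Complement every base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


print(reverse_complement("AACG"))  # CGTT
```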
Translate DNA-level mutations to amino acid changes.
Translates DNA-level mutations into protein-level mutations. Applies the genotype to the reference, translates both in the given reading frame and orientation, and returns a genotype string of amino acid changes.
T18A,C20A,T21A,G22C,C41T
'T6K,A7P,A13V'
'D19E,D25E,K26Q,G27R,H28P,K29Q,Y30I,V32S,F33I,E34*,A35S,N36K,T37H,G38W,T39N,E40*,D41R,G42W,Y43L,Q44P,G45R'
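The protein-level comparison can be sketched as below for the simplest case: forward orientation, substitutions only, and the standard genetic code. Residue positions are 0-based, which matches the first example above (DNA position 18 maps to residue 6). The function names and the `frame` parameter are hypothetical, and the package's own function also handles orientation and applies the genotype itself.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq, frame=0):
    """Translate complete codons starting at offset `frame`; unknown codons give X."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(frame, len(seq) - 2, 3)
    )


def protein_changes(ref_dna, mutant_dna, frame=0):
    """Compare two translations residue by residue and report the differences."""
    ref_aa = translate(ref_dna, frame)
    mut_aa = translate(mutant_dna, frame)
    return ",".join(
        f"{r}{i}{m}" for i, (r, m) in enumerate(zip(ref_aa, mut_aa)) if r != m
    )


print(translate("ATGGCTAAA"))                     # MAK
print(protein_changes("ATGGCTAAA", "ATGGTTAAA"))  # A1V
```

Synonymous DNA changes produce no entry, so several DNA mutations can collapse into fewer amino acid changes, as in the examples above.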
Read and write genotype data and FASTQ files.
Reads a tab-separated genotypes file and returns a list of (genotype_string, count) tuples.
20 A72G,A79G
19 A72G,A79T,A91G
17 T67G,A91G
17 A76G,A79T
17 A68C,A72G
17 A111G
16 A68G,A91G
16 A86G,A91T
15 A72G,A91T
15 A79G,A86G
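Reading such a file might look like the sketch below. The column order is an assumption (genotype string first, integer count second); the actual file layout is not specified in this section, and the function name is hypothetical.

```python
import csv
import io


def read_genotype_table(handle):
    """Parse tab-separated lines into (genotype_string, count) tuples."""
    rows = []
    for fields in csv.reader(handle, delimiter="\t"):
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        # Assumed layout: genotype in column 1, count in column 2.
        rows.append((fields[0], int(fields[1])))
    return rows


# An in-memory stand-in for a file; the empty first field is the wild type.
example = io.StringIO("\t5\nA72G,A79G\t20\nA72G,A79T,A91G\t19\n")
table = read_genotype_table(example)
print(table)
```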
Converts a DNA genotype list to an amino acid genotype list. Excludes genotypes with indels or Ns. Returns a sorted list of (aa_genotype_string, count) tuples.
[('', 43341),
('Y22H', 351),
('H15Q', 277),
('D19E', 246),
('L17P', 200),
('V23A', 162),
('S11P', 117),
('D25E', 113),
('D19E,Y22H', 75),
('T16P', 61)]
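The aggregation step can be sketched independently of the translation itself. Here `to_protein` is a hypothetical callable that maps a DNA genotype string to an amino acid genotype string. Distinct DNA genotypes that give the same protein genotype have their counts summed, which is why the empty (wild-type) genotype can grow large.

```python
from collections import Counter


def collapse_to_protein(gen_list, to_protein):
    """Sum counts per amino acid genotype, skipping indels and Ns."""
    totals = Counter()
    for genotype, count in gen_list:
        if "-" in genotype or "N" in genotype.upper():
            continue  # indels and ambiguous bases are excluded
        totals[to_protein(genotype)] += count
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Toy lookup standing in for a real DNA-to-protein conversion.
toy_lookup = {"": "", "A3G": "", "C4T": "A1V"}
collapsed = collapse_to_protein(
    [("", 10), ("A3G", 2), ("C4T", 5), ("-5A", 7), ("A6N", 1)], toy_lookup.get
)
print(collapsed)  # [('', 12), ('A1V', 5)]
```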
Downsamples a compressed FASTQ file to the specified number of reads.
Args:
input_file (str): Path to the input FASTQ.gz file.
output_file (str): Path to the output FASTQ.gz file.
num_reads (int, optional): Number of reads to keep. Defaults to 10000.
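A minimal sketch of downsampling with the standard library follows. It makes two assumptions that may not match the package: each FASTQ record spans exactly four lines, and the first `num_reads` records are kept rather than a random sample. The function name is hypothetical.

```python
import gzip
import os
import tempfile
from itertools import islice


def keep_first_reads(input_file, output_file, num_reads=10000):
    """Copy the first num_reads four-line FASTQ records to a new .gz file."""
    with gzip.open(input_file, "rt") as src, gzip.open(output_file, "wt") as dst:
        dst.writelines(islice(src, 4 * num_reads))


# Small self-contained demonstration on a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "reads.fastq.gz")
    dst_path = os.path.join(tmp, "subset.fastq.gz")
    with gzip.open(src_path, "wt") as fh:
        for i in range(5):
            fh.write(f"@read{i}\nACGT\n+\nIIII\n")
    keep_first_reads(src_path, dst_path, num_reads=2)
    with gzip.open(dst_path, "rt") as fh:
        kept = fh.read().splitlines()

print(len(kept) // 4)  # 2
```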
Extracts the basename of a file without the extension.
Args: file_path (str): The path to the file.
Returns: str: The basename of the file without the extension.
my_file
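With the standard library this can be written in one line. The sketch strips only the last extension, so a name such as `reads.fastq.gz` becomes `reads.fastq`. Whether the package strips compound extensions is not stated here, and the function name is hypothetical.

```python
import os


def basename_without_extension(file_path):
    """Drop the directory part and the final extension."""
    return os.path.splitext(os.path.basename(file_path))[0]


print(basename_without_extension("/path/to/my_file.txt"))  # my_file
```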
Saves data to a file using pickle.
Loads data from a pickle file.
{'key': [1, 2, 3], 'value': 'test'}
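A round trip through pickle can be sketched as follows, with hypothetical function names. Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.

```python
import os
import pickle
import tempfile


def save_pickle(data, file_path):
    """Serialize data to file_path in binary mode."""
    with open(file_path, "wb") as fh:
        pickle.dump(data, fh)


def load_pickle(file_path):
    """Read back an object written by save_pickle."""
    with open(file_path, "rb") as fh:
        return pickle.load(fh)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.pkl")
    save_pickle({"key": [1, 2, 3], "value": "test"}, path)
    restored = load_pickle(path)

print(restored)  # {'key': [1, 2, 3], 'value': 'test'}
```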
Splits the TR target into the requested number of parts and generates the oligos to order.
['ATAACCTCAGATACAAGCCGGCATAAATAATAACA',
'AATATGTTATTATTTATGCCGGCTTGTATCTGAGG',
'TATTCTATGACCATGATAATAGTGTAGGTGCAA',
'GCGTTTGCACCTACACTATTATCATGGTCATAG',
'ACGCCAACGCTAAAAACACTGGAACCATGAACG',
'TTACCGTTCATGGTTCCAGTGTTTTTAGCGTTG',
'GTAATACTGCAGGGACGAATATAGCCAAAACTTCT',
'CAGAAGAAGTTTTGGCTATATTCGTCCCTGCAGTA']
Filter mutations by position.
Converts a genotype list to its reverse complement. Mirrors mutation positions and complements bases relative to the reference sequence length.
Removes mutations at the specified positions from a genotype string.
Removes mutations at specified positions from all genotypes in a list.
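Position filtering can be sketched as below, with hypothetical function names. Entries are not merged after filtering, so a genotype that loses all its mutations becomes the empty string and keeps its own count.

```python
import re


def drop_positions(genotype, positions):
    """Remove every mutation whose position is in `positions`."""
    positions = set(positions)
    kept = [
        token
        for token in genotype.split(",")
        if token and int(re.search(r"\d+", token).group()) not in positions
    ]
    return ",".join(kept)


def drop_positions_from_list(gen_list, positions):
    """Apply drop_positions to each (genotype, count) entry."""
    return [(drop_positions(genotype, positions), count) for genotype, count in gen_list]


print(drop_positions("T19C,T64A,T65A", [19]))  # T64A,T65A
print(drop_positions_from_list([("T19C", 176012), ("T19C,G36T", 40)], [19]))
# [('', 176012), ('G36T', 40)]
```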
ref_genome='AACGTATACGGCGGAATATTTGCCGAATGCCGTGTGGACGTAAGCGTGAACGTCAGGATCACGTTTCCCCGACCCGCTGGCATGTCAACAATACGGGAGAACACCTGTACCGCCTCGTTCGCCGCGC'
geno_list_test=[('T19C', 176012),
('T19C,T64A,T65A', 169),
('T19C,G36T', 40),
('T19C,T58G,T63A,T64G', 4),
('T19C,A42C', 14),
('T19C,T63G', 13),
('T19C,T52A,T64A,T65A', 19),
('T19C,A41C,A57C,T58C', 1),
('T19C,T52A', 94),
('T19C,T64A,T65G', 214),
('T19C,T63A,T64C,T65A', 2),
('T19C,A49C,T64C,T91C', 2),
('T19C,T32C,T52C,T64A,T65G,T84C,T91A', 1),
('T19C,T82C,T84A', 8),
('T19C,T64C', 308),
('T19C,T40C,G43A,T58A,A71C,T91C', 1),
('T52A,T77C', 1),
('T19C,T52C,T58A', 9),
('T20C,-52C,T58C,A71C,T77C', 1),
('T19C,G70T', 110),
('T19C,T32C,T65C,T82C', 1),
('T19C,T32C,T46C', 4),
('T19C,T32C,A41C,A49C,T64A,T65G', 1),
('T19C,T64A', 115),
('T19C,T58C', 68)]
geno_list_test=remove_position_list(geno_list_test,[19])
reverse_comp_geno_list(geno_list_test,ref_genome)
[('', 176012),
('A61T,A62T', 169),
('C90A', 40),
('A62C,A63T,A68C', 4),
('T84G', 14),
('A63C', 13),
('A61T,A62T,A74T', 19),
('A68G,T69G,T85G', 1),
('A74T', 94),
('A61C,A62T', 214),
('A61T,A62G,A63T', 2),
('A35G,A62G,T77G', 2),
('A35T,A42G,A61C,A62T,A74G,A94G', 1),
('A42T,A44G', 8),
('A62G', 308),
('A35G,T55G,A68T,C83T,A86G', 1),
('A49G,A74T', 1),
('A68T,A74G', 9),
('A49G,T55G,A68G,-74G,A106G', 1),
('C56A', 110),
('A44G,A61G,A94G', 1),
('A80G,A94G', 4),
('A61C,A62T,T77G,T85G,A94G', 1),
('A62T', 115),
('A68G', 68)]
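The mapping visible in this output can be sketched as follows: each position `p` becomes `len(ref) - 1 - p`, both bases are complemented, and the mutation order is reversed so positions stay ascending. The reference above is 127 bases long, so `T64A` becomes `A62T`. The function name is hypothetical, it takes the reference length rather than the sequence, and it applies the same formula to indel tokens because that reproduces the indel entry shown above.

```python
import re

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")  # '-' is left unchanged


def reverse_complement_genotype(genotype, ref_length):
    """Mirror positions and complement bases for one genotype string."""
    flipped = []
    tokens = [token for token in genotype.split(",") if token]
    for token in reversed(tokens):
        match = re.fullmatch(r"(.)(\d+)(.)", token)
        if match is None:
            raise ValueError(f"not a mutation token: {token!r}")
        ref_base, pos, alt = match.group(1), int(match.group(2)), match.group(3)
        new_pos = ref_length - 1 - pos
        flipped.append(
            f"{ref_base.translate(_COMPLEMENT)}{new_pos}{alt.translate(_COMPLEMENT)}"
        )
    return ",".join(flipped)


print(reverse_complement_genotype("T64A,T65A", 127))  # A61T,A62T
```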
Convert between nucleotide sequences, codons, and amino acids.
Converts a nucleotide sequence into a sequence of amino acids.
Converts a list of codons into a sequence of amino acids.
Converts a nucleotide sequence into a list of codons.
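Splitting into codons can be sketched with simple slicing. The function name and the `frame` offset are hypothetical, and the sketch drops any trailing bases that do not form a complete codon, which may differ from the package's behavior.

```python
def split_codons(seq, frame=0):
    """Return consecutive complete triplets starting at offset `frame`."""
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


print(split_codons("ATGGCTAAAT"))           # ['ATG', 'GCT', 'AAA']
print(split_codons("ATGGCTAAAT", frame=1))  # ['TGG', 'CTA', 'AAT']
```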