predictions

This module defines the functions used to predict the score of a given TR sequence.

score

 score (TR_seq:str, features=1)

Calculates the predicted score of a given TR sequence, from 0 (poor TR) to 1 (ideal TR). With features=2, it returns one score per feature instead; ideally both should be high.

Type Default Details
TR_seq str A string of the TR DNA sequence
features int 1 The classifier model to use: 1 (default) for the one-feature model, 2 for the two-feature model. Usually does not need to be set.

TR_bad = 'TTAGCGAATGGCGAAATTCGTAAACGCCCTCTGATCGAAACCAACGGCGAAACGGGTGAGATCGTGTGGG'
print('TR bad score =', score(TR_bad))
TR_good = 'AAATGATCGCCAAATCTGAACAGGAAATTGGCAAAGCAACCGCTAAATACTTTTTCTACTCAAACATTAT'
print('TR good score =', score(TR_good))
TR bad score = 0.23
TR good score = 0.84
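Since scores lie between 0 and 1, a common next step is to turn them into a pass/fail call. The sketch below is a hypothetical helper, not part of this module: it assumes score returns a single float with features=1 and a sequence of per-feature floats with features=2, and the 0.5 cut-off is an arbitrary choice. Requiring every feature to pass follows the advice that both feature scores should be high.

```python
from typing import Sequence, Union


def passes_threshold(scores: Union[float, Sequence[float]], threshold: float = 0.5) -> bool:
    """Hypothetical helper: True if every score is at or above `threshold`.

    Assumes a single float for the one-feature model and a sequence of
    floats (one per feature) for the two-feature model.
    """
    if isinstance(scores, (int, float)):
        scores = [scores]
    return all(s >= threshold for s in scores)


# Values taken from the examples on this page.
print(passes_threshold(0.23))          # TR_bad, one feature  -> False
print(passes_threshold(0.84))          # TR_good, one feature -> True
print(passes_threshold((0.81, 0.34)))  # TR_good_7, two features -> False
```

Taking the minimum over features (as all() does here) means a single weak feature is enough to reject a TR.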

score_list

 score_list (TR_seq_list:list, TR_name_list:list, features=1)

Calculates the score of every TR in the list and returns the results as a DataFrame, one row per TR. With features=2, it returns one score per feature for each TR; ideally both should be high.

Type Default Details
TR_seq_list list A list of strings of TRs DNA sequences
TR_name_list list A list of strings of TRs names
features int 1 The number of features the classifier uses: 1 (default) or 2
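score_list pairs each sequence with its name and collects the results in a table. The sketch below illustrates that pattern with the standard library only; it is not the module's implementation. The score_rows helper and its scorer argument are hypothetical, and the GC-fraction scorer is a toy stand-in used only so the example runs. The column names mirror the one-feature output of score_list.

```python
from typing import Callable, Dict, List


def score_rows(
    seqs: List[str],
    names: List[str],
    scorer: Callable[[str], float],
) -> List[Dict[str, object]]:
    """Hypothetical helper: score each sequence and keep its name alongside it."""
    # Names and sequences are matched by position, so their lengths must agree.
    if len(seqs) != len(names):
        raise ValueError(f"got {len(seqs)} sequences but {len(names)} names")
    return [
        {"TR_Name": name, "TR_Seq": seq, "TR_Score": round(scorer(seq), 2)}
        for name, seq in zip(names, seqs)
    ]


def gc_fraction(seq: str) -> float:
    """Toy stand-in scorer (GC content), NOT the TR classifier."""
    return (seq.count("G") + seq.count("C")) / len(seq)


rows = score_rows(["GGCC", "ATAT"], ["TR_1", "TR_2"], gc_fraction)
print(rows)
```

Passing the resulting list of dicts to pandas.DataFrame gives a table with the same columns.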

TR_bad = [
    'TTAGCGAATGGCGAAATTCGTAAACGCCCTCTGATCGAAACCAACGGCGAAACGGGTGAGATCGTGTGGG',
    'AAACGCCCTCTGATCGAAACCAACGGCGAAACGGGTGAGATCGTGTGGGACAAAGGTCGTGATTTCGCTA',
    'GGTTTCTCTAAGGAGTCCATTCTGCCGAAGCGCAACTCCGACAAGCTGATCGCGCGTAAGAAGGACTGGG',
    'CAAGCTGATCGCGCGTAAGAAGGACTGGGATCCGAAGAAGTACGGTGGCTTCGATTCTCCGACCGTGGCG',
    'ACCCGATTGACTTCCTCGAGGCGAAGGGGTACAAGGAGGTGAAGAAGGATCTGATTATCAAGCTGCCGAA',
    'AGTACTCCCTGTTCGAGCTGGAGAATGGTCGTAAGCGTATGCTGGCGTCTGCGGGTGAGCTGCAGAAGGG',
    'CAGCACAAGCACTACCTGGACGAGATTATTGAGCAGATTTCTGAGTTTTCTAAGCGCGTGATTCTGGCGG',
    'ACGCGAATCTGGATAAGGTCCTGTCTGCCTACAATAAGCACCGTGATAAGCCGATCCGTGAGCAGGCGGA',
]

score_list(TR_bad, ['TR_bad_' + str(k) for k in range(1, 9)])
TR_Name TR_Seq TR_Score
0 TR_bad_1 TTAGCGAATGGCGAAATTCGTAAACGCCCTCTGATCGAAACCAACG... 0.23
1 TR_bad_2 AAACGCCCTCTGATCGAAACCAACGGCGAAACGGGTGAGATCGTGT... 0.05
2 TR_bad_3 GGTTTCTCTAAGGAGTCCATTCTGCCGAAGCGCAACTCCGACAAGC... 0.00
3 TR_bad_4 CAAGCTGATCGCGCGTAAGAAGGACTGGGATCCGAAGAAGTACGGT... 0.00
4 TR_bad_5 ACCCGATTGACTTCCTCGAGGCGAAGGGGTACAAGGAGGTGAAGAA... 0.01
5 TR_bad_6 AGTACTCCCTGTTCGAGCTGGAGAATGGTCGTAAGCGTATGCTGGC... 0.08
6 TR_bad_7 CAGCACAAGCACTACCTGGACGAGATTATTGAGCAGATTTCTGAGT... 0.06
7 TR_bad_8 ACGCGAATCTGGATAAGGTCCTGTCTGCCTACAATAAGCACCGTGA... 0.12

TR_good = [
    'AAATGATCGCCAAATCTGAACAGGAAATTGGCAAAGCAACCGCTAAATACTTTTTCTACTCAAACATTAT',
    'TCAAACATTATGAATTTCTTCAAAACCGAAATCACCTTAGCGAATGGCGAAATTCGTAAACGCCCTCTGA',
    'ATGCCTCAAGTAAACATCGTTAAAAAGACTGAGGTGCAGACTGGCGGTTTCTCTAAGGAGTCCATTCTGC',
    'GGATCCGAAGAAGTACGGTGGCTTCGATTCTCCGACCGTGGCGTACTCTGTTCTGGTGGTCGCCAAGGTC',
    'AGCGTATGCTGGCGTCTGCGGGTGAGCTGCAGAAGGGGAACGAGTTGGCCCTTCCGTCCAAGTACGTGAA',
    'GCAGAAGGGGAACGAGTTGGCCCTTCCGTCCAAGTACGTGAACTTCCTGTACCTGGCCTCGCACTACGAG',
    'CAGAAGCAGCTGTTCGTGGAGCAGCACAAGCACTACCTGGACGAGATTATTGAGCAGATTTCTGAGTTTT',
    'CTAAGCGCGTGATTCTGGCGGACGCGAATCTGGATAAGGTCCTGTCTGCCTACAATAAGCACCGTGATAA',
]

score_list(TR_good, ['TR_good_' + str(k) for k in range(1, 9)])
TR_Name TR_Seq TR_Score
0 TR_good_1 AAATGATCGCCAAATCTGAACAGGAAATTGGCAAAGCAACCGCTAA... 0.84
1 TR_good_2 TCAAACATTATGAATTTCTTCAAAACCGAAATCACCTTAGCGAATG... 0.82
2 TR_good_3 ATGCCTCAAGTAAACATCGTTAAAAAGACTGAGGTGCAGACTGGCG... 0.76
3 TR_good_4 GGATCCGAAGAAGTACGGTGGCTTCGATTCTCCGACCGTGGCGTAC... 0.74
4 TR_good_5 AGCGTATGCTGGCGTCTGCGGGTGAGCTGCAGAAGGGGAACGAGTT... 0.83
5 TR_good_6 GCAGAAGGGGAACGAGTTGGCCCTTCCGTCCAAGTACGTGAACTTC... 0.55
6 TR_good_7 CAGAAGCAGCTGTTCGTGGAGCAGCACAAGCACTACCTGGACGAGA... 0.81
7 TR_good_8 CTAAGCGCGTGATTCTGGCGGACGCGAATCTGGATAAGGTCCTGTC... 0.81
score_list(TR_bad, ['TR_bad_' + str(k) for k in range(1, 9)], features=2)
TR_Name TR_Seq TR_Score_Sp TR_Score_Avd
0 TR_bad_1 TTAGCGAATGGCGAAATTCGTAAACGCCCTCTGATCGAAACCAACG... 0.23 0.63
1 TR_bad_2 AAACGCCCTCTGATCGAAACCAACGGCGAAACGGGTGAGATCGTGT... 0.05 0.42
2 TR_bad_3 GGTTTCTCTAAGGAGTCCATTCTGCCGAAGCGCAACTCCGACAAGC... 0.00 0.30
3 TR_bad_4 CAAGCTGATCGCGCGTAAGAAGGACTGGGATCCGAAGAAGTACGGT... 0.00 0.51
4 TR_bad_5 ACCCGATTGACTTCCTCGAGGCGAAGGGGTACAAGGAGGTGAAGAA... 0.01 0.59
5 TR_bad_6 AGTACTCCCTGTTCGAGCTGGAGAATGGTCGTAAGCGTATGCTGGC... 0.08 0.54
6 TR_bad_7 CAGCACAAGCACTACCTGGACGAGATTATTGAGCAGATTTCTGAGT... 0.06 0.29
7 TR_bad_8 ACGCGAATCTGGATAAGGTCCTGTCTGCCTACAATAAGCACCGTGA... 0.12 0.04
score_list(TR_good, ['TR_good_' + str(k) for k in range(1, 9)], features=2)
TR_Name TR_Seq TR_Score_Sp TR_Score_Avd
0 TR_good_1 AAATGATCGCCAAATCTGAACAGGAAATTGGCAAAGCAACCGCTAA... 0.84 0.80
1 TR_good_2 TCAAACATTATGAATTTCTTCAAAACCGAAATCACCTTAGCGAATG... 0.82 0.78
2 TR_good_3 ATGCCTCAAGTAAACATCGTTAAAAAGACTGAGGTGCAGACTGGCG... 0.76 0.84
3 TR_good_4 GGATCCGAAGAAGTACGGTGGCTTCGATTCTCCGACCGTGGCGTAC... 0.74 0.75
4 TR_good_5 AGCGTATGCTGGCGTCTGCGGGTGAGCTGCAGAAGGGGAACGAGTT... 0.83 0.88
5 TR_good_6 GCAGAAGGGGAACGAGTTGGCCCTTCCGTCCAAGTACGTGAACTTC... 0.55 0.58
6 TR_good_7 CAGAAGCAGCTGTTCGTGGAGCAGCACAAGCACTACCTGGACGAGA... 0.81 0.34
7 TR_good_8 CTAAGCGCGTGATTCTGGCGGACGCGAATCTGGATAAGGTCCTGTC... 0.81 0.82
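Because both feature scores should be high, one way to shortlist candidates from the two-feature output is to keep only the rows where both scores clear a cut-off and rank them by the weaker of the two. The sketch below copies the TR_Score_Sp and TR_Score_Avd values from the TR_good table above; the shortlist helper and the 0.5 cut-off are illustrative assumptions, not part of the module.

```python
from typing import Dict, List, Tuple

# (TR_Score_Sp, TR_Score_Avd) for TR_good_1..8, copied from the table above.
good_scores: Dict[str, Tuple[float, float]] = {
    "TR_good_1": (0.84, 0.80),
    "TR_good_2": (0.82, 0.78),
    "TR_good_3": (0.76, 0.84),
    "TR_good_4": (0.74, 0.75),
    "TR_good_5": (0.83, 0.88),
    "TR_good_6": (0.55, 0.58),
    "TR_good_7": (0.81, 0.34),
    "TR_good_8": (0.81, 0.82),
}


def shortlist(scores: Dict[str, Tuple[float, float]], cutoff: float = 0.5) -> List[str]:
    """Hypothetical helper: names whose scores all reach `cutoff`, best minimum first."""
    kept = [name for name, pair in scores.items() if min(pair) >= cutoff]
    # Rank by the weaker feature, since a TR is only as good as its lowest score.
    return sorted(kept, key=lambda name: min(scores[name]), reverse=True)


print(shortlist(good_scores))
# -> ['TR_good_5', 'TR_good_8', 'TR_good_1', 'TR_good_2', 'TR_good_3', 'TR_good_4', 'TR_good_6']
```

TR_good_7 drops out despite its high first score, because its second score (0.34) is below the cut-off.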

DGR_percentage

 DGR_percentage (TR_seq:str)

Calculates the predicted DGR mutagenesis percentage of a given TR sequence, from 0 (poor TR) to 100 (ideal TR).

Type Details
TR_seq str A string of the TR DNA sequence

DGR_percentage_list

 DGR_percentage_list (TR_seq_list:list, TR_name_list:list)

Calculates the predicted DGR mutagenesis percentage of every TR in the list and returns the results as a DataFrame, one row per TR.

Type Details
TR_seq_list list A list of strings of TRs DNA sequences
TR_name_list list A list of strings of TRs names
DGR_percentage_list(TR_bad, ['TR_bad_' + str(k) for k in range(1, 9)])
TR_Name TR_Seq
0 TR_bad_1 0.427049
1 TR_bad_2 0.110769
2 TR_bad_3 0.019986
3 TR_bad_4 0.025612
4 TR_bad_5 0.045752
5 TR_bad_6 0.172833
6 TR_bad_7 0.092111
7 TR_bad_8 0.039965
DGR_percentage_list(TR_good, ['TR_good_' + str(k) for k in range(1, 9)])
TR_Name TR_Seq
0 TR_good_1 3.067841
1 TR_good_2 2.721600
2 TR_good_3 2.808427
3 TR_good_4 1.919474
4 TR_good_5 4.142115
5 TR_good_6 0.811313
6 TR_good_7 0.940305
7 TR_good_8 3.040041
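The two predictors can be compared side by side by joining their outputs on the TR name. The sketch below does this with plain dictionaries, using the one-feature scores and the DGR percentages reported on this page for the TR_good set; the join_by_name helper is hypothetical. With two pandas DataFrames, DataFrame.merge with on='TR_Name' performs the same inner join.

```python
from typing import Dict, List, Tuple

# One-feature scores and DGR percentages for TR_good_1..8, copied from the outputs on this page.
scores: Dict[str, float] = {
    "TR_good_1": 0.84, "TR_good_2": 0.82, "TR_good_3": 0.76, "TR_good_4": 0.74,
    "TR_good_5": 0.83, "TR_good_6": 0.55, "TR_good_7": 0.81, "TR_good_8": 0.81,
}
dgr_percent: Dict[str, float] = {
    "TR_good_1": 3.067841, "TR_good_2": 2.721600, "TR_good_3": 2.808427, "TR_good_4": 1.919474,
    "TR_good_5": 4.142115, "TR_good_6": 0.811313, "TR_good_7": 0.940305, "TR_good_8": 3.040041,
}


def join_by_name(a: Dict[str, float], b: Dict[str, float]) -> List[Tuple[str, float, float]]:
    """Hypothetical helper: inner join on TR name, sorted by the value from `b` (descending)."""
    common = a.keys() & b.keys()  # names present in both tables
    rows = [(name, a[name], b[name]) for name in common]
    return sorted(rows, key=lambda row: row[2], reverse=True)


for name, s, pct in join_by_name(scores, dgr_percent):
    print(f"{name}\tscore={s:.2f}\tDGR%={pct:.2f}")
```

An inner join silently drops names missing from either table, so compare the row count with the input lengths if every TR is expected to appear.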