encoding

Functions to encode TR (Template Repeat) sequences as feature vectors for machine-learning models. ViennaRNA is used to compute RNA secondary-structure folding energies. The key feature is ΔE = E(TR+Spacer) - E(TR), the change in folding energy when the spacer is present alongside the TR. ΔE captures whether the spacer interaction disrupts the TR fold; this affects bRT accessibility and therefore predicts TR activity.
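To make the ΔE definition concrete, here is a minimal sketch of the arithmetic, not this module's implementation. It assumes the TR and spacer are joined by plain concatenation (TR first) and that DNA input is converted to RNA by replacing T with U. The helper names `to_rna` and `delta_e` are hypothetical. The folding-energy function is passed in as `mfe` so the sketch runs without ViennaRNA; with ViennaRNA's Python bindings installed, `RNA.fold(seq)` returns a `(structure, mfe)` tuple, so `lambda s: RNA.fold(s)[1]` would supply real minimum free energies.

```python
from typing import Callable


def to_rna(seq: str) -> str:
    """Hypothetical helper: uppercase a DNA string and replace T with U."""
    return seq.upper().replace("T", "U")


def delta_e(tr: str, spacer: str, mfe: Callable[[str], float]) -> float:
    """Hypothetical helper computing dE = E(TR+Spacer) - E(TR).

    Assumption: TR and spacer are concatenated (TR first) before folding.
    `mfe` maps an RNA sequence to its folding energy (kcal/mol).
    """
    e_combined = mfe(to_rna(tr + spacer))  # energy of the TR+spacer fold
    e_tr = mfe(to_rna(tr))                 # energy of the TR alone
    return e_combined - e_tr


# With ViennaRNA installed, a real energy function would be:
#   import RNA
#   mfe = lambda s: RNA.fold(s)[1]
# The toy energy below (one unit per G) is NOT a folding model;
# it only demonstrates the subtraction.
toy_mfe = lambda s: -float(s.count("G"))
print(delta_e("GGAA", "CCG", toy_mfe))  # -3.0 - (-2.0) = -1.0
```

Passing the energy function as a parameter keeps the ΔE arithmetic separate from the folding backend, so the same helper can be reused with different ViennaRNA settings.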


encode_tr_list


def encode_tr_list(
    list_TRs:list, # A list of DNA sequences (strings) to encode.
    feat:int=1 # 1 for the single-feature model, 2 for the two-feature model.
):

Encodes a list of TR sequences. If feat is 1, only the single-feature model is used; if feat is 2, the two-feature model is used.

TR_list=['AACTTGCTAGTAAACACAGTGCAGCTAAAGTGCATTCTAATGGGGTACTAAATGTGTTGTTCCATCTGGG','CCAGAGGTTAGGATTATGAGTGTAGTGCGTATCACGTGGGTTGCAGATTAAACAGGTCGTGCCTCATTTA']
encode_tr_list(TR_list,feat=2)
array([[ -8.17911339, -15.36869621],
       [-11.47970581, -13.57634544]])