LSTM

A module for generating VR sequences from TR sequences, and for estimating the likelihood of a VR given a TR, using a Long Short-Term Memory (LSTM) model.


sequences_same_length

 sequences_same_length (sequences)

*Check if all sequences in a list have the same length.

Parameters:
- sequences: list of sequences (e.g., strings, lists, or arrays)

Returns:
- bool: True if all sequences have the same length, False otherwise.*
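The check described above can be sketched in a few lines. This is a minimal illustration, not the module's actual implementation; the name `sequences_same_length_sketch` is hypothetical, and treating an empty list as "same length" is an assumption.

```python
def sequences_same_length_sketch(sequences):
    """Return True if every sequence has the same length.

    Assumption: an empty list counts as "all the same length".
    """
    # Collect the distinct lengths; at most one distinct value means they all match
    lengths = {len(s) for s in sequences}
    return len(lengths) <= 1


same = sequences_same_length_sketch(["ATCG", "GGCC", [0, 1, 2, 3]])
different = sequences_same_length_sketch(["ATCG", "GG"])
```

Because only `len()` is used, strings, lists, and arrays can be mixed in the same call.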



generate_sequence_from_onehot

 generate_sequence_from_onehot (X, firstmodel, secondmodel)

*Generate a sequence iteratively using one cell of the second LSTM.

Parameters:
- X: np.array, input sequences (one-hot encoded over the vocabulary, e.g. 4 symbols for A, T, C, G; reversed).
- firstmodel, secondmodel: trained Keras models used for generation.

Returns: - generated_sequence: np.array, Generated sequence (one-hot encoded).*



separate_model

 separate_model (model)


one_hot_decode

 one_hot_decode (encoded_sequence)


one_hot_encode

 one_hot_encode (sequence, vocab_size=4)
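Since `one_hot_encode` and `one_hot_decode` are listed without a description, the following sketch shows what a one-hot round trip for a 4-letter DNA alphabet typically looks like. The function names, the `ALPHABET` constant, and the column order A, T, C, G are assumptions made for illustration; the module's real encoding may differ.

```python
import numpy as np

ALPHABET = "ATCG"  # assumed column order; the module's real ordering may differ


def one_hot_encode_sketch(sequence, vocab_size=4):
    """Map a string over ALPHABET to an array of shape (len(sequence), vocab_size)."""
    index = {base: i for i, base in enumerate(ALPHABET[:vocab_size])}
    encoded = np.zeros((len(sequence), vocab_size), dtype=np.float32)
    for pos, base in enumerate(sequence):
        encoded[pos, index[base]] = 1.0  # exactly one 1 per row
    return encoded


def one_hot_decode_sketch(encoded_sequence):
    """Invert the encoding by taking the index of the largest value in each row."""
    return "".join(ALPHABET[i] for i in np.argmax(encoded_sequence, axis=-1))


encoded = one_hot_encode_sketch("GATTACA")
decoded = one_hot_decode_sketch(encoded)
```

Decoding with `argmax` also works on a matrix of predicted probabilities, in which case it returns the most probable base at each position.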


compute_likelihood_matrix

 compute_likelihood_matrix (TR_list, VR_list, batch_size=64)

Compute log-likelihood matrix of generating each VR from each TR using batching.
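One plausible way to organise the batching is to enumerate every (TR, VR) pair, score the pairs in chunks of `batch_size`, and reshape the flat scores into one row per TR. The sketch below illustrates that layout only. `likelihood_matrix_sketch` and `toy_batch_scorer` are hypothetical names, and the toy scorer (negative mismatch count) is a stand-in for the LSTM-based `compute_likelihood_batch`, not a description of it.

```python
from itertools import product


def toy_batch_scorer(tr_batch, vr_batch):
    # Stand-in for the real model: negative number of mismatching positions
    return [float(-sum(a != b for a, b in zip(tr, vr)))
            for tr, vr in zip(tr_batch, vr_batch)]


def likelihood_matrix_sketch(TR_list, VR_list, batch_size=64, scorer=toy_batch_scorer):
    """Return a nested list m where m[k][j] scores VR_list[j] against TR_list[k]."""
    # Row-major pair order: all VRs for TR 0, then all VRs for TR 1, and so on
    pairs = list(product(TR_list, VR_list))
    scores = []
    for start in range(0, len(pairs), batch_size):
        chunk = pairs[start:start + batch_size]
        trs, vrs = zip(*chunk)
        scores.extend(scorer(list(trs), list(vrs)))
    n_vr = len(VR_list)
    # Cut the flat score list back into one row per TR
    return [scores[k * n_vr:(k + 1) * n_vr] for k in range(len(TR_list))]


matrix = likelihood_matrix_sketch(["AAAA", "AATT"], ["AAAA", "AAAT", "TTTT"], batch_size=2)
```

With this layout a batch may span several TRs, so the batch size is independent of how many VRs there are per TR.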



compute_likelihood_batch

 compute_likelihood_batch (TR_batch, VR_batch)

*Compute log-likelihoods of generating each VR[i] from TR[i] (batched).

TR_batch, VR_batch: lists of same-length strings.

Returns: list of log-likelihoods.*



compute_likelihood

 compute_likelihood (TR, VR)

*Compute log-likelihood of VR from TR.

TR, VR: same-length strings.

Returns: list of log-likelihoods.*
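For a sequence model that outputs a probability distribution over bases at each position, the log-likelihood of an observed sequence is usually the sum, over positions, of the log of the probability assigned to the observed base. The sketch below shows that arithmetic on toy inputs. It assumes the per-position probabilities are already available as an array; how the module obtains them from the LSTM, and whether it aggregates them exactly this way, is not stated in this page. The name `log_likelihood_sketch` and the `eps` guard are illustrative.

```python
import numpy as np


def log_likelihood_sketch(probs, target_onehot, eps=1e-12):
    """Sum of log-probabilities assigned to the observed base at each position."""
    probs = np.asarray(probs, dtype=np.float64)
    target_onehot = np.asarray(target_onehot, dtype=np.float64)
    # Probability the predictor gave to the base actually present at each position
    p_observed = np.sum(probs * target_onehot, axis=-1)
    # eps avoids log(0) when a position gets zero probability
    return float(np.sum(np.log(p_observed + eps)))


# Toy check: a uniform predictor over 4 bases scored on a 3-base target
uniform = np.full((3, 4), 0.25)
target = np.eye(4)[[0, 2, 1]]  # one-hot rows for column indices 0, 2, 1
ll_uniform = log_likelihood_sketch(uniform, target)  # close to 3 * log(0.25)
```

Values closer to zero mean the observed sequence was more probable under the predictor, which matches how the log-likelihoods in the example at the end of this page are read.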



to_tensor_inputs

 to_tensor_inputs (*args)


generate_sequences_oneTR

 generate_sequences_oneTR (TR, n=1000)

Generate a list of VRs from one TR (one TR -> n VRs).

Parameters:
- TR: one TR sequence (e.g., string, list, or array)
- n: integer, the number of VRs to generate

Returns:
- list: list of n VR sequence strings generated from the given TR sequence.
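Drawing several different VRs from the same TR implies some form of random sampling. As a simplified illustration, the sketch below samples `n` sequences from a fixed matrix of per-position base probabilities. This is an assumption-laden stand-in: the module generates sequences iteratively with an LSTM, whereas here each position is sampled independently, and the names `sample_sequences_sketch`, `ALPHABET`, and `seed` are hypothetical.

```python
import numpy as np

ALPHABET = "ATCG"  # assumed column order for the probability matrix


def sample_sequences_sketch(probs, n=1000, seed=0):
    """Draw n sequences, sampling each position from its own categorical distribution."""
    rng = np.random.default_rng(seed)
    probs = np.asarray(probs, dtype=np.float64)
    samples = []
    for _ in range(n):
        # One categorical draw per position (row of probs)
        idx = [rng.choice(len(ALPHABET), p=row) for row in probs]
        samples.append("".join(ALPHABET[i] for i in idx))
    return samples


# Position 0 is always "A"; position 1 is "A" or "G" with equal probability
toy_probs = [[1.0, 0.0, 0.0, 0.0],
             [0.5, 0.0, 0.0, 0.5]]
toy_samples = sample_sequences_sketch(toy_probs, n=20)
```

Positions with a sharply peaked distribution come out identical across samples, while uncertain positions vary.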



generate_sequences

 generate_sequences (X_seq)

Generate a list of VRs from a list of TRs (one TR -> one VR).

Parameters:
- X_seq: list of TR sequences (e.g., strings, lists, or arrays)

Returns:
- list: list of VR sequence strings, one per TR, in the same order as the input TR list.

Details
X_seq: a list of ATCG sequences (faster if all have the same length)
import numpy as np
# Draw a random 100-base TR and generate 5 VRs from it
seq = ''.join(np.random.choice(["A", "T", "C", "G"], size=100, replace=True))
list_vr = generate_sequences_oneTR(seq, 5)
list_vr
['CGCTTCCATTTATTCCCAGCTCGAAAGCTCCTCGCCCACTATTTCCAGCGTCACAATTGGTGAGGGCATGCTTCTGCCACGTATAACGAAAATGTACGAA',
 'CGCTTCCATTTATTCCCAGCTCGAAAACACCTCGCCCACTATCTCCAGCGTCACAGTTGGTGAGGACATGCTTCTGCCACGTATAACGAAAATGTACGAA',
 'CGCTTCCATTTATTCCCAGCTCGAAAGCCCCTCGACCACTATATCCAGCGTCACAATTGGTGAAGACATGCTTCTGCCACGTATAACGAAAATGTACGAA',
 'CGCTTCCATTTATTCCCAGCTCGAAAACCCCTCGCCCACTATATCCAGCGTCACAATTGGTGAAGACATGCTTCTGCCACGTATAACGAAAATGTACGAA',
 'CGCTTCCATTTATTCCCAGCTCGATTACACCTCGACCACTATATCCAGCGTCACAATTGGTGAAGGCATGCTTCTGCCACGTATAACGAAAATGTACGAA']
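# Build a second TR by replacing positions 10-12 of seq with "CCC"
seq_2 = seq[:10] + "CCC" + seq[13:]
# ll[k][j] is the log-likelihood of generating list_vr[j] from TR k
ll = compute_likelihood_matrix([seq, seq_2], list_vr)
for k in range(2):
    print(f'log likelihoods of VRs with TR {k}:')
    print(ll[k])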
log likelihoods of VRs with TR 0:
[-15.088246688449884, -13.283765104251316, -7.054487097791395, -6.822223379890877, -10.39629542414491]
log likelihoods of VRs with TR 1:
[-38.96067896598159, -37.36270110571473, -31.011657058638708, -30.822617378527305, -34.504065737226]
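A matrix like the one printed above can be used to ask which TR best explains each VR. The sketch below does this with NumPy on the printed values, rounded to three decimals. The variable names are illustrative, and the interpretation is only a reading of this one example.

```python
import numpy as np

# Log-likelihoods from the output above, rounded to three decimals
# Rows: TR 0 (original) and TR 1 (with the "CCC" substitution); columns: the 5 VRs
ll_rounded = np.array([
    [-15.088, -13.284, -7.054, -6.822, -10.396],
    [-38.961, -37.363, -31.012, -30.823, -34.504],
])

# For each VR (column), the index of the TR with the highest log-likelihood
best_tr = np.argmax(ll_rounded, axis=0)

# Log-likelihood gap between the original and the modified TR for each VR
gap = ll_rounded[0] - ll_rounded[1]

print(best_tr)
print(np.round(gap, 2))
```

Every VR is assigned to TR 0, the sequence it was generated from, and the gap is close to 24 log units for all five VRs, so in this example the three-base substitution lowers the likelihood by a similar amount regardless of the VR.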