from import get_example_data_dir
import os
Contains the GenomeBrowser and GenomeStack classes
GenomeBrowser (gff_path:str=None, fasta_path:str=None, gb_path:str=None, seq_id:str=None, init_pos:int=None, init_win:int=10000, bounds:tuple=None, max_interval:int=100000, show_seq:bool=True, search:bool=True, attributes:Union[list,Dict[str,Optional[list]]]=None, feature_name:Union[str,Dict[str,str],NoneType]=None, feature_types:list=None, glyphs:dict=None, height:int=150, width:int=600, label_angle:int=45, label_font_size:str='10pt', label_justify:str='center', label_vertical_offset:float=0.03, label_horizontal_offset:float=-5, show_labels:bool=True, feature_height:float=0.15, features:pandas.core.frame.DataFrame=None, seq:Bio.Seq.Seq=None, color_attribute:str=None, z_stack:bool=False, **kwargs)
Initialize a GenomeBrowser object.
Parameter | Type | Default | Details |
gff_path | str | None | path to the gff3 file of the annotations (also accepts gzip files) |
fasta_path | str | None | path to the fasta file of the genome sequence |
gb_path | str | None | path to a genbank file |
seq_id | str | None | id of the sequence to load, for genomes with multiple contigs; defaults to the first sequence in the genbank or gff file |
init_pos | int | None | initial position to display |
init_win | int | 10000 | initial window size (max=20000) |
bounds | tuple | None | bounds can be specified. This helps preserve memory by not loading the whole genome if not needed. |
max_interval | int | 100000 | maximum size of the field of view in bp |
show_seq | bool | True | creates an HTML div that shows the sequence when zooming in |
search | bool | True | enables a search bar |
attributes | Union | None | list of attribute names from the GFF attributes column to be extracted. If dict then keys are feature types and values are lists of attributes. If None, then all attributes will be used. |
feature_name | Union | None | attribute to be displayed as the feature name. If str then use the same field for every feature type. If dict then keys are feature types and values are feature name attribute. |
feature_types | list | None | list of feature types to display |
glyphs | dict | None | dictionary defining the type and color of glyphs to display for each feature type |
height | int | 150 | height of the annotation track |
width | int | 600 | width of the inner frame of the browser |
label_angle | int | 45 | angle of the feature names displayed on top of the features |
label_font_size | str | 10pt | font size of the feature names |
label_justify | str | center | justification of the feature labels: center or left |
label_vertical_offset | float | 0.03 | how far above a feature to draw the label |
label_horizontal_offset | float | -5 | how far to shift the feature label on the x-axis |
show_labels | bool | True | if False, then don’t show feature labels |
feature_height | float | 0.15 | fraction of the annotation track height occupied by the features |
features | DataFrame | None | DataFrame with columns: [“seq_id”, “source”, “type”, “start”, “end”, “score”, “strand”, “phase”, “attributes”], where “attributes” is a dict of attributes. |
seq | Seq | None | keeps the Biopython sequence object |
color_attribute | str | None | feature attribute to be used as patch color |
z_stack | bool | False | if True, features that overlap will be stacked on top of each other |
kwargs |
Additional keyword arguments are passed as is to bokeh.plotting.figure
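To make the attributes and feature_name options more concrete, here is a minimal sketch (not the library's own code) of how a GFF3 attributes column can be split into a dict, and how a feature_name given either as a str or as a dict keyed by feature type could be resolved to a label. The helper names parse_gff_attributes and pick_feature_name, and the sample attribute string, are hypothetical and only serve as illustration.

```python
from typing import Dict, Optional, Union
from urllib.parse import unquote


def parse_gff_attributes(column: str) -> Dict[str, str]:
    """Split a GFF3 attributes column ("tag=value;tag=value") into a dict.

    Hypothetical helper: GenomeBrowser may parse attributes differently.
    """
    attributes = {}
    for field in column.strip().split(";"):
        if not field:
            continue  # tolerate a trailing ";"
        tag, _, value = field.partition("=")
        # GFF3 percent-encodes reserved characters such as ";" and "="
        attributes[unquote(tag)] = unquote(value)
    return attributes


def pick_feature_name(
    attributes: Dict[str, str],
    feature_type: str,
    feature_name: Union[str, Dict[str, str], None],
) -> Optional[str]:
    """Resolve the label of one feature (assumed semantics).

    str  -> the same attribute is used for every feature type
    dict -> keys are feature types, values are attribute names
    """
    if isinstance(feature_name, dict):
        key = feature_name.get(feature_type)
    else:
        key = feature_name
    return attributes.get(key) if key is not None else None


# Made-up attribute string, for illustration only
attrs = parse_gff_attributes("ID=cds-1;gene=thrL;product=thr operon leader peptide")
print(pick_feature_name(attrs, "CDS", "gene"))  # thrL
print(pick_feature_name(attrs, "CDS", {"CDS": "product", "gene": "gene"}))  # thr operon leader peptide
```

With a dict, a feature type that has no entry simply gets no label in this sketch; how GenomeBrowser treats that case is not stated in the table above.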
Upon initialization, a GenomeBrowser object parses the data and creates the GenomeBrowser.patches pandas DataFrame, which contains the data to be plotted.
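# Locate the example E. coli MG1655 annotation and sequence files
data_path = get_example_data_dir()
gff_path = os.path.join(data_path, "MG1655_U00096.gff3")
fasta_path = os.path.join(data_path, "MG1655_U00096.fasta")
# Load only the first 50 kb to limit memory use
g = GenomeBrowser(gff_path=gff_path, fasta_path=fasta_path, bounds=(0, 50000))
print(g.seq_id, g.seq[:10])
# First rows of the DataFrame that holds the glyphs to be plotted
g.patches.head()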
index | names | xs | ys | xbox_min | color | alpha | pos | attributes | type | label_y | label_x |
0 | thrL | (190, 190, 190, 255, 190) | (0.05, 0.2, 0.2, 0.125, 0.05) | 190 | purple | 0.8 | 222.5 | <span style="color:FireBrick">CDS</span><br><s... | CDS | 0.23 | 222.5 |
1 | thrA | (337, 337, 2699, 2799, 2699) | (0.05, 0.2, 0.2, 0.125, 0.05) | 337 | purple | 0.8 | 1568.0 | <span style="color:FireBrick">CDS</span><br><s... | CDS | 0.23 | 1568.0 |
2 | thrB | (2801, 2801, 3633, 3733, 3633) | (0.05, 0.2, 0.2, 0.125, 0.05) | 2801 | purple | 0.8 | 3267.0 | <span style="color:FireBrick">CDS</span><br><s... | CDS | 0.23 | 3267.0 |
3 | thrC | (3734, 3734, 4920, 5020, 4920) | (0.05, 0.2, 0.2, 0.125, 0.05) | 3734 | purple | 0.8 | 4377.0 | <span style="color:FireBrick">CDS</span><br><s... | CDS | 0.23 | 4377.0 |
4 | yaaX | (5234, 5234, 5430, 5530, 5430) | (0.05, 0.2, 0.2, 0.125, 0.05) | 5234 | purple | 0.8 | 5382.0 | <span style="color:FireBrick">CDS</span><br><s... | CDS | 0.23 | 5382.0 |
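The xs and ys columns hold the corner coordinates of one arrow-shaped polygon per feature. The sketch below reproduces the numbers in the rows above for these CDS features; it is reverse-engineered from the table and is not the library's implementation. The function name arrow_patch, the 100 bp arrow head and the 0.05 baseline are assumptions read off the displayed values, and only the orientation shown in the table is handled.

```python
ARROW_HEAD_BP = 100  # assumption: inferred from the rows above (e.g. 2799 - 2699)
BASE_Y = 0.05        # assumption: lowest value seen in the ys column


def arrow_patch(start, end, feature_height=0.15, label_vertical_offset=0.03):
    """Return polygon and label coordinates for one right-pointing arrow.

    start and end are the x coordinates as they appear in the patches table.
    """
    # The arrow head begins 100 bp before the end, but never before the start,
    # so short features (such as thrL) collapse into a triangle.
    neck = max(start, end - ARROW_HEAD_BP)
    top = round(BASE_Y + feature_height, 3)
    mid = round(BASE_Y + feature_height / 2, 3)
    xs = (start, start, neck, end, neck)
    ys = (BASE_Y, top, top, mid, BASE_Y)
    pos = (start + end) / 2  # midpoint, also used as label_x in the table
    label_y = round(top + label_vertical_offset, 3)
    return {"xs": xs, "ys": ys, "pos": pos, "label_x": pos, "label_y": label_y}


thrA = arrow_patch(337, 2799)
print(thrA["xs"])                      # (337, 337, 2699, 2799, 2699)
print(thrA["pos"], thrA["label_y"])    # 1568.0 0.23
```

Read this way, feature_height sets the vertical extent of the polygon (0.05 to 0.2) and label_vertical_offset lifts the label slightly above it (0.2 + 0.03 = 0.23), which matches the default values listed in the parameter table.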
GenomeBrowser.show ()
Shows the plot in an interactive Jupyter notebook
# GFF + FASTA input
g = GenomeBrowser(fasta_path=fasta_path, gff_path=gff_path, bounds=(0, 50000), width=600)
# GenBank input
gb_path = os.path.join(data_path, "")
#Providing GFF file as the only input