.. _visualisation:

Visualisation and Analysis
==========================

ORILINX produces bedGraph output that can be loaded into genome browsers to explore predicted replication origins. This section covers tools and methods for visualising results.

Overview
--------

bedGraph files from ORILINX are compatible with most genome browsers, including:

- **UCSC Genome Browser** - Web-based, no installation needed
- **IGV (Integrative Genomics Viewer)** - Desktop application with advanced features
- **JBrowse** - Lightweight web browser for genomic data
- **Gviz (R)** or **pyBigWig (Python)** - Programmatic visualisation

The basic workflow is:

1. Run ORILINX to generate bedGraph files
2. Load them into a genome browser
3. Compare with other genomic features (genes, regulatory elements, etc.)
4. Interpret results in biological context

UCSC Genome Browser
--------------------

The UCSC Genome Browser is a free, web-based tool that requires no installation.

**Basic Usage**

1. Generate your ORILINX results:

   .. code-block:: console

      orilinx --fasta_path hg38.fa --output_dir results --sequence_names chr8

2. Go to `https://genome.ucsc.edu/ <https://genome.ucsc.edu/>`_
3. Select your genome (e.g., "Human" and "Dec. 2013 (GRCh38/hg38)")
4. Navigate to your region of interest (e.g., type "chr8:128862888-128870405" in the search box)
5. Click "Add Custom Tracks" and upload your bedGraph file:

   - Copy the contents of ``results/chr8.bedGraph`` or upload the file directly
   - Set the display height and colour
   - Click "Submit"

6. Your ORILINX scores will appear as a histogram track

**Customising Track Appearance**

You can customise how your track appears by adding a track line as the first line of your bedGraph file. UCSC expects ``type=bedGraph`` for this format and spells the attribute ``color``:

.. code-block:: text

   track type=bedGraph name="ORILINX Origins" description="Predicted replication origins" color=50,50,200 viewLimits=0:1

Then prepend this to your bedGraph file:

.. code-block:: bash

   echo 'track type=bedGraph name="ORILINX Origins" description="Predicted replication origins" color=50,50,200 viewLimits=0:1' > formatted.bedGraph
   cat results/chr8.bedGraph >> formatted.bedGraph

Then upload ``formatted.bedGraph`` to UCSC.

**Converting to BigWig for faster loading**

For large files, convert bedGraph to BigWig format for faster loading:

.. code-block:: bash

   # Download BigWig tools if needed (Linux x86_64 build)
   wget http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/bedGraphToBigWig
   chmod +x bedGraphToBigWig

   # Obtain chrom sizes
   curl https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes > hg38.chrom.sizes

   # bedGraphToBigWig expects input sorted by chromosome, then start, with no track line
   sort -k1,1 -k2,2n results/chr8.bedGraph > results/chr8.sorted.bedGraph

   # Convert
   ./bedGraphToBigWig results/chr8.sorted.bedGraph hg38.chrom.sizes results/chr8.bw

UCSC does not accept BigWig files as direct uploads. Place the ``.bw`` file at a web-accessible URL and reference it from a custom track line with ``type=bigWig`` and ``bigDataUrl`` set to that URL.

IGV (Integrative Genomics Viewer)
---------------------------------

IGV is a desktop application that offers more control and more advanced features than web-based browsers.

**Installation**

1. Download from `http://software.broadinstitute.org/software/igv/ <http://software.broadinstitute.org/software/igv/>`_
2. Install for your operating system (Mac, Windows, Linux)
3. Launch IGV

**Loading ORILINX Data**

1. Open IGV and select your genome (use the genome drop-down, or Genomes → Load Genome from File for a custom FASTA)
2. Load your bedGraph file:

   - File → Load from File
   - Select ``results/chr8.bedGraph``

3. Navigate to your region of interest using the search box
4. IGV will display your ORILINX scores as a bar graph

**Tips for IGV**

- **Zoom in/out**: Use the zoom controls or scroll wheel
- **Compare tracks**: Load multiple bedGraph files simultaneously to compare regions
- **Overlay with annotations**: Load gene annotations, ChIP-seq, or other genomic data for context
- **Export images**: Right-click on tracks to save publication-quality figures
- **Coverage view**: Change display mode to "expanded" to see individual windows

**Coloured tracks**

A bedGraph file has only four columns and cannot carry per-window colours. To colour windows by score threshold, write a BED file with an ``itemRgb`` column instead:

.. code-block:: bash

   # High confidence origins (score > 0.7) in red
   # Medium confidence (0.3-0.7) in yellow
   # Low confidence (<= 0.3) in blue
   # BED scores must be integers in 0-1000, so the probability is rescaled
   echo 'track name="ORILINX Origins" itemRgb="On"' > coloured.bed
   awk 'BEGIN {FS=OFS="\t"} {if ($4 > 0.7) colour="255,0,0"; else if ($4 > 0.3) colour="255,255,0"; else colour="0,0,255"; print $1, $2, $3, "window", int($4 * 1000), ".", $2, $3, colour}' results/chr8.bedGraph >> coloured.bed

Then load ``coloured.bed`` in IGV.

JBrowse
-------

JBrowse is a lightweight, embeddable genome browser suitable for web-based visualisation.

**Using JBrowse Online**

1. Visit `https://jbrowse.org/jbrowse/ <https://jbrowse.org/jbrowse/>`_
2. Select your reference genome
3. Add tracks:

   - Click "Add Track"
   - Paste the URL to your bedGraph file or upload it directly

4. Navigate to your region to visualise

**Self-hosted JBrowse**

For more control, you can host JBrowse on your own server:

.. code-block:: bash

   # Install JBrowse (see documentation)
   wget https://github.com/GMOD/jbrowse/releases/download/1.11.6/JBrowse-1.11.6.zip
   unzip JBrowse-1.11.6.zip

   # Configure data directory
   cd JBrowse-1.11.6
   ./bin/prepare-refseqs.pl --fasta hg38.fa

   # Add ORILINX track. JBrowse 1 serves quantitative tracks from BigWig, so
   # convert the bedGraph first (see the UCSC section) and copy chr8.bw into data/.
   # Check the options against the documentation for your JBrowse version.
   ./bin/add-bw-track.pl --label orilinx_chr8 --key "ORILINX chr8" --bw_url chr8.bw --plot

In Python
---------

Use Python for programmatic visualisation and analysis of ORILINX results.

**Basic plot with Matplotlib**

.. code-block:: python

   import pandas as pd
   import matplotlib.pyplot as plt

   # Read CSV output
   df = pd.read_csv('results/chr8.csv')

   # Plot probability scores
   plt.figure(figsize=(14, 4))
   plt.plot(df['start'], df['probability'], linewidth=0.5)
   plt.fill_between(df['start'], df['probability'], alpha=0.3)
   plt.xlabel('Genomic Position (bp)')
   plt.ylabel('Origin Probability')
   plt.title('ORILINX Predictions - Chr8')
   plt.tight_layout()
   plt.savefig('origins_plot.png', dpi=300)
   plt.show()

**Finding high-confidence origins**

.. code-block:: python

   import pandas as pd

   df = pd.read_csv('results/chr8.csv')

   # Filter for high-confidence origins (>0.7 probability)
   origins = df[df['probability'] > 0.7]

   print(f"Found {len(origins)} high-confidence origins")
   print(origins[['start', 'end', 'probability']])

   # Export for further analysis
   origins.to_csv('high_confidence_origins.csv', index=False)

**Interactive visualisation with Plotly**

.. code-block:: python

   import pandas as pd
   import plotly.graph_objects as go

   df = pd.read_csv('results/chr8.csv')

   fig = go.Figure()
   fig.add_trace(go.Scatter(
       x=df['start'],
       y=df['probability'],
       mode='lines',
       name='ORILINX Probability',
       fill='tozeroy'
   ))

   # Add threshold lines
   fig.add_hline(y=0.7, line_dash="dash", line_color="red",
                 annotation_text="High confidence", annotation_position="right")
   fig.add_hline(y=0.3, line_dash="dash", line_color="orange",
                 annotation_text="Low confidence", annotation_position="right")

   fig.update_layout(
       title='ORILINX Predictions with Confidence Thresholds',
       xaxis_title='Genomic Position (bp)',
       yaxis_title='Origin Probability',
       hovermode='x unified'
   )

   fig.show()
   fig.write_html('origins_interactive.html')

**Comparison of multiple regions**

.. code-block:: python

   import pandas as pd
   import matplotlib.pyplot as plt

   # Load multiple regions
   regions = {}
   for region in ['chr1', 'chr8', 'chrX']:
       regions[region] = pd.read_csv(f'results/{region}.csv')

   # Plot comparison, one panel per region
   fig, axes = plt.subplots(len(regions), 1, figsize=(14, 3*len(regions)))

   for idx, (region, df) in enumerate(regions.items()):
       axes[idx].plot(df['start'], df['probability'], linewidth=0.5)
       axes[idx].fill_between(df['start'], df['probability'], alpha=0.3)
       axes[idx].set_title(f'{region} ORILINX Predictions')
       axes[idx].set_ylabel('Probability')
       if idx == len(regions) - 1:
           axes[idx].set_xlabel('Genomic Position (bp)')

   plt.tight_layout()
   plt.savefig('multiregion_comparison.png', dpi=300)
   plt.show()

In R
----

Use R for statistical analysis and publication-quality figures.


**Basic plot with ggplot2**

.. code-block:: r

   library(ggplot2)

   # Read CSV output
   df <- read.csv('results/chr8.csv')

   # Create plot (linewidth replaces the deprecated size argument in ggplot2 >= 3.4)
   ggplot(df, aes(x=start, y=probability)) +
     geom_line(linewidth=0.2) +
     geom_area(alpha=0.3) +
     theme_minimal() +
     labs(
       title = 'ORILINX Predictions - Chr8',
       x = 'Genomic Position (bp)',
       y = 'Origin Probability'
     ) +
     theme(text=element_text(size=12))

   ggsave('origins_plot.png', width=14, height=4, dpi=300)

**Finding and annotating peaks**

.. code-block:: r

   library(dplyr)
   library(ggplot2)

   df <- read.csv('results/chr8.csv')

   # Group consecutive windows above the threshold into candidate origins.
   # peak_id numbers each run of windows; it is 0 for windows below the threshold.
   df <- df %>%
     mutate(
       is_peak = probability > 0.7,
       peak_id = cumsum(c(TRUE, diff(is_peak) != 0)) * is_peak
     )

   # Summarise each run by its extent and maximum probability
   peaks <- df %>%
     filter(is_peak) %>%
     group_by(peak_id) %>%
     summarise(
       peak_start = min(start),
       peak_end = max(end),
       peak_probability = max(probability),
       .groups = 'drop'
     )

   print(peaks)
   write.csv(peaks, 'predicted_origins.csv', row.names=FALSE)

**Genome browser-style visualisation with Gviz**

.. code-block:: r

   library(Gviz)
   library(GenomicRanges)

   # Read ORILINX data
   df <- read.csv('results/chr8.csv')

   # Convert to GRanges
   gr <- GRanges(
     seqnames = df$chrom,
     ranges = IRanges(start = df$start, end = df$end),
     score = df$probability
   )

   # Create DataTrack
   dtrack <- DataTrack(
     range = gr,
     name = "ORILINX",
     type = "histogram",
     col.histogram = "steelblue",
     fill.histogram = "steelblue"
   )

   # Plot
   plotTracks(dtrack, from=128862888, to=128870405, chromosome="chr8")

Combining with other genomic features
--------------------------------------

For biological interpretation, visualise ORILINX results alongside:

- Gene annotations
- ChIP-seq peaks
- Copy number variation
- Evolutionary conservation
- Chromatin accessibility

**Example: Adding genes to your plot**

.. code-block:: python

   import pandas as pd
   import matplotlib.pyplot as plt

   # Load ORILINX results and gene annotations
   origins = pd.read_csv('results/chr8.csv')
   genes = pd.read_csv('gene_annotations.csv')  # Your gene file (columns: name, start, end)

   fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(14, 6), sharex=True)

   # Plot ORILINX scores
   ax1.plot(origins['start'], origins['probability'], linewidth=0.5)
   ax1.fill_between(origins['start'], origins['probability'], alpha=0.3)
   ax1.set_ylabel('ORILINX Probability')
   ax1.set_title('Chr8 - Origins and Gene Structure')

   # Plot each gene as a horizontal bar spanning its coordinates
   for _, gene in genes.iterrows():
       ax2.barh(0, gene['end'] - gene['start'], left=gene['start'], height=0.5,
                label=gene['name'])

   ax2.set_ylabel('Genes')
   ax2.set_xlabel('Genomic Position (bp)')

   plt.tight_layout()
   plt.savefig('origins_with_genes.png', dpi=300)
   plt.show()
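
**Example: Listing genes that overlap high-scoring windows**

Beyond plotting, it can help to tabulate which annotated genes overlap high-scoring windows. The following is a minimal sketch, not part of ORILINX itself. It assumes the ``start``, ``end`` and ``probability`` columns used in the examples above, a gene table with ``name``, ``start`` and ``end`` columns, half-open coordinates, and an illustrative threshold of 0.7. The function name ``genes_overlapping_origins`` is hypothetical, and small in-memory tables stand in for the real files.

.. code-block:: python

   import pandas as pd


   def genes_overlapping_origins(origins, genes, threshold=0.7):
       """Return one row per (window, gene) pair where a high-scoring window overlaps a gene."""
       # Keep only windows above the (illustrative) probability threshold
       hits = origins[origins["probability"] > threshold]
       # Pair every remaining window with every gene (requires pandas >= 1.2)
       pairs = hits.merge(genes, how="cross", suffixes=("", "_gene"))
       # Two half-open intervals overlap when each starts before the other ends
       overlap = (pairs["start"] < pairs["end_gene"]) & (pairs["start_gene"] < pairs["end"])
       columns = ["start", "end", "probability", "name"]
       return pairs.loc[overlap, columns].reset_index(drop=True)


   # Toy data standing in for results/chr8.csv and gene_annotations.csv
   origins = pd.DataFrame({
       "start": [0, 1000, 2000, 3000],
       "end": [1000, 2000, 3000, 4000],
       "probability": [0.1, 0.8, 0.9, 0.2],
   })
   genes = pd.DataFrame({
       "name": ["geneA", "geneB"],
       "start": [1500, 3500],
       "end": [2500, 5000],
   })

   result = genes_overlapping_origins(origins, genes)
   print(result)

The cross join keeps the sketch short but grows with the number of windows multiplied by the number of genes, so for whole-chromosome tables a dedicated interval tool is likely a better fit.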