Visualisation and Analysis

ORILINX produces bedGraph output that can be visualised in genome browsers for intuitive exploration of predicted replication origins. This section covers genome browsers (UCSC, IGV, JBrowse) and programmatic approaches in Python and R for visualising and analysing results.

Overview

ORILINX bedGraph files can be viewed in most genome browsers and processed with programmatic tools, including:

  • UCSC Genome Browser - Web-based, no installation needed

  • IGV (Integrative Genomics Viewer) - Desktop application with advanced features

  • JBrowse - Lightweight web browser for genomic data

  • Gviz (R) or pyBigWig (Python) - Programmatic visualisation (pyBigWig reads the bigWig format, so convert the bedGraph first)
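For quick programmatic checks, an ORILINX bedGraph can also be read without any genome-browser library. The sketch below assumes the standard four-column bedGraph layout (chromosome, 0-based start, end, score, tab-separated) and skips track, browser and comment lines. The function name parse_bedgraph is illustrative and not part of ORILINX.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[str, int, int, float]


def parse_bedgraph(lines: Iterable[str]) -> List[Interval]:
    """Parse bedGraph text into (chrom, start, end, score) tuples.

    Assumes the four-column bedGraph layout with 0-based, half-open
    coordinates. Header lines (track/browser) and comments are skipped.
    """
    records: List[Interval] = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("track", "browser", "#")):
            continue
        chrom, start, end, score = line.split("\t")[:4]
        records.append((chrom, int(start), int(end), float(score)))
    return records


# A file handle works too: with open("results/chr8.bedGraph") as fh: parse_bedgraph(fh)
example = parse_bedgraph([
    'track type=bedGraph name="ORILINX"\n',
    "chr8\t0\t2000\t0.12\n",
    "chr8\t2000\t4000\t0.85\n",
])
```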

The basic workflow is:

  1. Run ORILINX to generate bedGraph files

  2. Load them into a genome browser

  3. Compare with other genomic features (genes, regulatory elements, etc.)

  4. Interpret results in biological context

UCSC Genome Browser

The UCSC Genome Browser is a free, web-based tool that requires no installation.

Basic Usage

  1. Generate your ORILINX results:

    orilinx --fasta_path hg38.fa --output_dir results --sequence_names chr8
    
  2. Go to https://genome.ucsc.edu/

  3. Select your genome (e.g., “Human” and “Dec. 2013 (GRCh38/hg38)”)

  4. Navigate to your region of interest (e.g., type “chr8:128862888-128870405” in the search box)

  5. Click “Add Custom Tracks” and upload your bedGraph file:

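    • Copy the contents of results/chr8.bedGraph or upload the file directly. If the file does not already begin with a track line containing type=bedGraph, add one (see below); without it, UCSC reads the file as plain BED rather than as a signal

    • Optionally, control the display height and colour in the same track line (for example with the maxHeightPixels and color attributes)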

    • Click “Submit”

  6. Your ORILINX scores will appear as a histogram track

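Customising Track Appearance

You can customise how your track appears by adding a track line as the first line of your bedGraph file. The type=bedGraph attribute is required for UCSC to draw the fourth column as a signal, autoScale=off keeps the y-axis fixed at the viewLimits range of 0 to 1, and UCSC attribute names use American spelling (color, not colour):

track type=bedGraph name="ORILINX Origins" description="Predicted replication origins" visibility=full color=50,50,200 autoScale=off viewLimits=0:1

To prepend it, write the track line to a new file and then append the ORILINX output:

echo 'track type=bedGraph name="ORILINX Origins" description="Predicted replication origins" visibility=full color=50,50,200 autoScale=off viewLimits=0:1' > formatted.bedGraph
cat results/chr8.bedGraph >> formatted.bedGraph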

Then upload formatted.bedGraph to UCSC.
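When there is one bedGraph per chromosome, the same step can be scripted. This is a minimal sketch, assuming plain-text bedGraph input; the names with_track_line, add_track_line and TRACK_LINE are hypothetical. It also drops any track line already present so the header is never duplicated.

```python
from pathlib import Path
from typing import Iterable, Iterator

# UCSC track line; type=bedGraph tells UCSC to draw column 4 as a signal.
TRACK_LINE = ('track type=bedGraph name="ORILINX Origins" '
              'description="Predicted replication origins" '
              'visibility=full color=50,50,200 autoScale=off viewLimits=0:1')


def with_track_line(lines: Iterable[str], track_line: str = TRACK_LINE) -> Iterator[str]:
    """Yield track_line first, then the data lines, dropping any old track line."""
    yield track_line + "\n"
    for line in lines:
        if not line.startswith("track"):
            yield line


def add_track_line(src: Path, dst: Path, track_line: str = TRACK_LINE) -> None:
    """Write a copy of the bedGraph src to dst with a single track line on top."""
    with open(src) as fin, open(dst, "w") as fout:
        fout.writelines(with_track_line(fin, track_line))
```

For example, looping over ["chr1", "chr8", "chrX"] and calling add_track_line(Path(f"results/{c}.bedGraph"), Path(f"{c}.formatted.bedGraph")) prepares every file for upload.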

Converting to BigWig for faster loading

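For large files, convert the bedGraph to bigWig, an indexed binary format that browsers can load region by region. bedGraphToBigWig requires input sorted by chromosome and start position, with no overlapping intervals and no track line:

# Download bedGraphToBigWig if needed (Linux x86_64 build)
wget https://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/bedGraphToBigWig
chmod +x bedGraphToBigWig

# Obtain chromosome sizes
curl -o hg38.chrom.sizes https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes

# Drop any track/browser header lines, sort, then convert
grep -v -E '^(track|browser)' results/chr8.bedGraph | sort -k1,1 -k2,2n > results/chr8.sorted.bedGraph
./bedGraphToBigWig results/chr8.sorted.bedGraph hg38.chrom.sizes results/chr8.bw

UCSC does not accept bigWig files by direct upload. Place the .bw file on a publicly accessible web server and paste a track line pointing to it into the custom track box, replacing the example URL with the location of your file:

track type=bigWig name="ORILINX Origins" bigDataUrl=https://example.org/path/to/chr8.bw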
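If bedGraphToBigWig complains about the input, a quick pre-flight check helps to locate the problem. The sketch below sorts records the same way as sort -k1,1 -k2,2n and reports overlapping neighbours. It assumes records are (chrom, start, end, score) tuples with half-open coordinates; the function names are illustrative.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int, float]


def sort_bedgraph(records: List[Interval]) -> List[Interval]:
    """Sort like `sort -k1,1 -k2,2n`: chromosome name, then numeric start."""
    return sorted(records, key=lambda r: (r[0], r[1]))


def find_overlaps(records: List[Interval]) -> List[Tuple[Interval, Interval]]:
    """Return consecutive pairs on the same chromosome whose intervals overlap.

    Expects sorted input. bedGraph intervals are half-open, so a window that
    ends exactly where the next one starts does not count as an overlap.
    """
    overlaps = []
    for prev, cur in zip(records, records[1:]):
        if prev[0] == cur[0] and cur[1] < prev[2]:
            overlaps.append((prev, cur))
    return overlaps
```

Checking consecutive pairs is enough to detect whether any overlap exists: once starts are sorted, an interval that overlaps a later one also overlaps every interval in between. If pairs are reported (for example, if windows were scored with a stride smaller than the window size), the scores must be merged into non-overlapping intervals before conversion.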

IGV (Integrative Genomics Viewer)

IGV is a desktop application that offers more control and advanced features than web-based genome browsers.

Installation

  1. Download from http://software.broadinstitute.org/software/igv/

  2. Install for your operating system (Mac, Windows, Linux)

  3. Launch IGV

Loading ORILINX Data

  1. Open IGV and select your genome from the genome drop-down menu at the top left (e.g. hg38)

  2. Load your bedGraph file:

    • File → Load from File

    • Select results/chr8.bedGraph

  3. Navigate to your region of interest using the search box

  4. IGV will display your ORILINX scores as a bar graph

Tips for IGV

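  • Zoom in/out: Use the zoom slider at the top right, double-click in the data panel to zoom in, or press + and -

  • Compare tracks: Load multiple bedGraph files simultaneously to compare regions

  • Overlay with annotations: Load gene annotations, ChIP-seq, or other genomic data for context

  • Export images: Use File → Save SVG Image (or Save PNG Image) to export the current view; SVG is suitable for publication-quality figures

  • Track display: Right-click the track to switch between bar chart, line plot, points and heatmap, or use Set Data Range to fix the y-axis (e.g. 0 to 1)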
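IGV can also capture views of many regions unattended through a batch script (Tools → Run Batch Script). The sketch below writes one using IGV's batch commands new, genome, load, snapshotDirectory, goto, snapshot and exit. The function name, output folder and PNG naming scheme are illustrative choices. Regions are taken in 0-based bedGraph coordinates and converted to IGV's 1-based locus strings.

```python
from typing import Iterable, Tuple

Region = Tuple[str, int, int]  # (chrom, 0-based start, end), as in bedGraph


def igv_batch_script(bedgraph: str, regions: Iterable[Region],
                     genome: str = "hg38", outdir: str = "snapshots") -> str:
    """Build an IGV batch script that saves one PNG per region.

    goto expects 1-based browser coordinates, so 1 is added to each start.
    """
    lines = ["new", f"genome {genome}", f"load {bedgraph}",
             f"snapshotDirectory {outdir}"]
    for chrom, start, end in regions:
        lines.append(f"goto {chrom}:{start + 1}-{end}")
        lines.append(f"snapshot {chrom}_{start + 1}_{end}.png")
    lines.append("exit")  # closes IGV when the script finishes
    return "\n".join(lines) + "\n"


script = igv_batch_script("results/chr8.bedGraph", [("chr8", 128862887, 128870405)])
```

Save the returned text to a file such as snapshots.txt and run it from the Tools menu. Remove the final exit line if you want IGV to stay open.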

Coloured tracks

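Create a colour-coded track based on score thresholds. bedGraph has no per-interval colour field, so this example writes BED9 instead. The RGB colour goes in the itemRgb column (column 9), and the score is rescaled to the 0 to 1000 BED range:

# High confidence origins (score > 0.7) in red
# Medium confidence (0.3-0.7) in yellow
# Low confidence (<= 0.3) in blue
awk 'BEGIN {FS=OFS="\t"; print "track name=\"ORILINX confidence\" itemRgb=\"On\""}
     /^(track|browser|#)/ {next}
     {if ($4 > 0.7) colour="255,0,0";
      else if ($4 > 0.3) colour="255,255,0";
      else colour="0,0,255";
      print $1, $2, $3, sprintf("%.2f", $4), int($4 * 1000), ".", $2, $3, colour}' results/chr8.bedGraph > coloured.bed

Then load coloured.bed in IGV. The windows appear as coloured features rather than as a bar chart, and each window's score is shown as its name.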
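Before loading the track, it can help to know how many windows fall into each colour class. The minimal sketch below applies the same cut-offs as the awk example (above 0.7 high, above 0.3 medium, otherwise low). These tiers are a convention of this guide, not ORILINX defaults.

```python
from collections import Counter
from typing import Iterable


def confidence_tier(score: float, high: float = 0.7, low: float = 0.3) -> str:
    """Classify a window score with the same rules as the awk example."""
    if score > high:
        return "high"
    if score > low:
        return "medium"
    return "low"


def count_tiers(scores: Iterable[float]) -> Counter:
    """Count windows per confidence tier."""
    return Counter(confidence_tier(s) for s in scores)


tiers = count_tiers([0.95, 0.71, 0.5, 0.3, 0.05])
# tiers -> Counter({'high': 2, 'low': 2, 'medium': 1})
```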

JBrowse

JBrowse is a lightweight, embeddable genome browser suitable for web-based visualisation.

Using JBrowse Online

  1. Visit https://jbrowse.org/jbrowse/

  2. Select your reference genome

  3. Add tracks:

    • Click “Add Track”

    • Paste the URL to your bedGraph file or upload it directly (if your JBrowse version does not read bedGraph, use a bigWig file instead)

  4. Navigate to your region to visualise

Self-hosted JBrowse

For more control, you can host JBrowse on your own server:

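# Install JBrowse 1 (see the JBrowse documentation)
wget https://github.com/GMOD/jbrowse/releases/download/1.11.6/JBrowse-1.11.6.zip
unzip JBrowse-1.11.6.zip
cd JBrowse-1.11.6
./setup.sh

# Configure the reference sequence
./bin/prepare-refseqs.pl --fasta ../hg38.fa

# Add the ORILINX track. JBrowse 1 displays quantitative data from bigWig,
# so convert the bedGraph first and copy the .bw file into data/
cp ../results/chr8.bw data/
./bin/add-bw-track.pl --label ORILINX --bw_url chr8.bw

Then serve the JBrowse-1.11.6 directory with a web server and open index.html.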

In Python

Use Python for programmatic visualisation and analysis of ORILINX results.

Basic plot with Matplotlib

import pandas as pd
import matplotlib.pyplot as plt

# Read CSV output
df = pd.read_csv('results/chr8.csv')

# Plot probability scores
plt.figure(figsize=(14, 4))
plt.plot(df['start'], df['probability'], linewidth=0.5)
plt.fill_between(df['start'], df['probability'], alpha=0.3)
plt.xlabel('Genomic Position (bp)')
plt.ylabel('Origin Probability')
plt.title('ORILINX Predictions - Chr8')
plt.tight_layout()
plt.savefig('origins_plot.png', dpi=300)
plt.show()
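Whole-chromosome plots hide individual origins. To zoom in on a locus such as chr8:128862888-128870405, you can filter the windows before plotting. The sketch below assumes the CSV start and end columns use bedGraph-style 0-based, half-open coordinates. The locus string is 1-based, as in browser search boxes, so its start is shifted by one. The helper names are illustrative.

```python
import re
from typing import Tuple


def parse_region(region: str) -> Tuple[str, int, int]:
    """Turn a browser locus like 'chr8:128,862,888-128,870,405' into
    (chrom, start, end) with 0-based, half-open coordinates."""
    match = re.fullmatch(r"(\w+):([\d,]+)-([\d,]+)", region.strip())
    if match is None:
        raise ValueError(f"not a locus: {region!r}")
    chrom, start, end = match.groups()
    return chrom, int(start.replace(",", "")) - 1, int(end.replace(",", ""))


def window_overlaps(w_start: int, w_end: int, start: int, end: int) -> bool:
    """True if the half-open window [w_start, w_end) overlaps [start, end)."""
    return w_start < end and w_end > start


chrom, start, end = parse_region("chr8:128862888-128870405")
# With the DataFrame from the plot above (assumed columns: start, end):
# sub = df[(df['start'] < end) & (df['end'] > start)]
```

Plotting sub instead of df with the same Matplotlib calls then shows only the windows in that locus.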

Finding high-confidence origins

import pandas as pd

df = pd.read_csv('results/chr8.csv')

# Keep windows with probability > 0.7. Adjacent windows can belong to
# the same origin, so this counts windows rather than distinct origins
origins = df[df['probability'] > 0.7]

print(f"Found {len(origins)} high-confidence windows")
print(origins[['start', 'end', 'probability']])

# Export for further analysis
origins.to_csv('high_confidence_origins.csv', index=False)
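Rows that pass the threshold are windows, not origins, and one origin often spans several consecutive windows. The minimal sketch below merges above-threshold windows that touch or lie within max_gap bp of each other into one region, keeping the highest probability. The function name and the max_gap parameter are illustrative assumptions, not ORILINX options.

```python
from typing import Iterable, List, Tuple

Window = Tuple[int, int, float]  # (start, end, probability) on one chromosome


def merge_windows(windows: Iterable[Window], threshold: float = 0.7,
                  max_gap: int = 0) -> List[Window]:
    """Merge above-threshold windows that are at most max_gap bp apart.

    Returns (region_start, region_end, peak_probability) tuples.
    """
    regions: List[List] = []
    for start, end, prob in sorted(w for w in windows if w[2] > threshold):
        if regions and start - regions[-1][1] <= max_gap:
            last = regions[-1]
            last[1] = max(last[1], end)
            last[2] = max(last[2], prob)
        else:
            regions.append([start, end, prob])
    return [tuple(r) for r in regions]


merged = merge_windows([(0, 1000, 0.2), (1000, 2000, 0.8), (2000, 3000, 0.9),
                        (3000, 4000, 0.1), (4000, 5000, 0.75)])
# merged -> [(1000, 3000, 0.9), (4000, 5000, 0.75)]
```

With the DataFrame above, merge_windows(df[['start', 'end', 'probability']].itertuples(index=False, name=None)) produces one row per candidate origin.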

Interactive visualisation with Plotly

import pandas as pd
import plotly.graph_objects as go

df = pd.read_csv('results/chr8.csv')

fig = go.Figure()

fig.add_trace(go.Scatter(
    x=df['start'],
    y=df['probability'],
    mode='lines',
    name='ORILINX Probability',
    fill='tozeroy'
))

# Add threshold lines
fig.add_hline(y=0.7, line_dash="dash", line_color="red",
              annotation_text="High confidence", annotation_position="right")
fig.add_hline(y=0.3, line_dash="dash", line_color="orange",
              annotation_text="Low confidence", annotation_position="right")

fig.update_layout(
    title='ORILINX Predictions with Confidence Thresholds',
    xaxis_title='Genomic Position (bp)',
    yaxis_title='Origin Probability',
    hovermode='x unified'
)

fig.show()
fig.write_html('origins_interactive.html')
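Plotting every window of a whole chromosome makes the HTML file large and the figure slow to pan. One common remedy is max-pooling: split the chromosome into fixed-size bins and keep the highest probability in each, so narrow peaks survive the reduction. This is offered as an assumption, not something ORILINX does. The 100 kb default bin size is arbitrary.

```python
from typing import Dict, Iterable, List, Tuple


def max_pool(windows: Iterable[Tuple[int, float]], bin_size: int = 100_000
             ) -> List[Tuple[int, float]]:
    """Reduce (start, probability) pairs to one (bin_start, max_probability) per bin."""
    best: Dict[int, float] = {}
    for start, prob in windows:
        b = (start // bin_size) * bin_size
        if b not in best or prob > best[b]:
            best[b] = prob
    return sorted(best.items())


pooled = max_pool([(0, 0.1), (50_000, 0.9), (120_000, 0.3), (150_000, 0.2)])
# pooled -> [(0, 0.9), (100000, 0.3)]
# With the DataFrame above: xs, ys = zip(*max_pool(zip(df['start'], df['probability'])))
```

Passing xs and ys to go.Scatter in place of the full columns keeps the figure responsive. Zooming into a locus with the full-resolution data is still the way to inspect individual windows.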

Comparison of multiple regions

import pandas as pd
import matplotlib.pyplot as plt

# Load multiple regions
regions = {}
for region in ['chr1', 'chr8', 'chrX']:
    regions[region] = pd.read_csv(f'results/{region}.csv')

# Plot comparison
fig, axes = plt.subplots(len(regions), 1, figsize=(14, 3*len(regions)))

for idx, (region, df) in enumerate(regions.items()):
    axes[idx].plot(df['start'], df['probability'], linewidth=0.5)
    axes[idx].fill_between(df['start'], df['probability'], alpha=0.3)
    axes[idx].set_title(f'{region} ORILINX Predictions')
    axes[idx].set_ylabel('Probability')
    if idx == len(regions) - 1:
        axes[idx].set_xlabel('Genomic Position (bp)')

plt.tight_layout()
plt.savefig('multiregion_comparison.png', dpi=300)
plt.show()
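A side-by-side plot is easier to interpret alongside a few summary numbers. The sketch below reports, for each chromosome, the window count, mean probability and fraction of windows above a threshold. The 0.7 cut-off follows the examples above and is an assumption, not an ORILINX default.

```python
from statistics import mean
from typing import Dict, Iterable


def summarise(scores_by_chrom: Dict[str, Iterable[float]], threshold: float = 0.7
              ) -> Dict[str, Dict[str, float]]:
    """Per-chromosome window count, mean score and fraction above threshold."""
    summary = {}
    for chrom, scores in scores_by_chrom.items():
        scores = list(scores)
        above = sum(s > threshold for s in scores)
        summary[chrom] = {
            "windows": len(scores),
            "mean": mean(scores) if scores else float("nan"),
            "frac_above": above / len(scores) if scores else float("nan"),
        }
    return summary


stats = summarise({"chr1": [0.1, 0.8, 0.9, 0.2], "chrX": [0.05, 0.75]})
# With the dict built above: summarise({k: v['probability'] for k, v in regions.items()})
```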

In R

Use R for statistical analysis and publication-quality figures.

Basic plot with ggplot2

library(ggplot2)
library(dplyr)

# Read CSV output
df <- read.csv('results/chr8.csv')

# Create plot
p <- ggplot(df, aes(x=start, y=probability)) +
  geom_line(linewidth=0.2) +   # ggplot2 >= 3.4.0 uses linewidth, not size, for lines
  geom_area(alpha=0.3) +
  theme_minimal() +
  labs(
    title = 'ORILINX Predictions - Chr8',
    x = 'Genomic Position (bp)',
    y = 'Origin Probability'
  ) +
  theme(text=element_text(size=12))

# Save the plot explicitly (width and height are in inches)
ggsave('origins_plot.png', plot=p, width=14, height=4, dpi=300)

Finding and annotating peaks

library(dplyr)
library(ggplot2)

df <- read.csv('results/chr8.csv')

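# Group consecutive windows with probability > 0.7 into candidate origin
# regions (this finds runs above a threshold, not local maxima)
df <- df %>%
  arrange(start) %>%
  mutate(
    is_peak = probability > 0.7,
    peak_id = cumsum(c(TRUE, diff(is_peak) != 0)) * is_peak
  )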

peaks <- df %>%
  filter(is_peak) %>%
  group_by(peak_id) %>%
  summarise(
    peak_start = min(start),
    peak_end = max(end),
    peak_probability = max(probability),
    .groups = 'drop'
  )

print(peaks)
write.csv(peaks, 'predicted_origins.csv', row.names=FALSE)

Genome browser-style visualisation with Gviz

library(Gviz)
library(GenomicRanges)

# Read ORILINX data
df <- read.csv('results/chr8.csv')

# Convert to GRanges
gr <- GRanges(
  seqnames = df$chrom,
  ranges = IRanges(start = df$start, end = df$end),
  score = df$probability
)

# Create DataTrack
dtrack <- DataTrack(
  range = gr,
  name = "ORILINX",
  type = "histogram",
  col.histogram = "steelblue",
  fill.histogram = "steelblue"
)

# Plot
plotTracks(dtrack, from=128862888, to=128870405, chromosome="chr8")

Combining with other genomic features

For biological interpretation, visualise ORILINX results alongside:

  • Gene annotations

  • ChIP-seq peaks

  • Copy number variation

  • Evolutionary conservation

  • Chromatin accessibility

Example: Adding genes to your plot

import pandas as pd
import matplotlib.pyplot as plt

# Load ORILINX results and gene annotations
origins = pd.read_csv('results/chr8.csv')
genes = pd.read_csv('gene_annotations.csv')  # Your gene file (chr8 genes; columns start, end, name)

fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(14, 6), sharex=True,
                               gridspec_kw={'height_ratios': [3, 1]})

# Plot ORILINX scores
ax1.plot(origins['start'], origins['probability'], linewidth=0.5)
ax1.fill_between(origins['start'], origins['probability'], alpha=0.3)
ax1.set_ylabel('ORILINX Probability')
ax1.set_title('Chr8 - Origins and Gene Structure')

# Draw each gene as a grey bar and label it with its name
for _, gene in genes.iterrows():
    ax2.barh(0, gene['end'] - gene['start'], left=gene['start'],
             height=0.5, color='grey')
    ax2.text((gene['start'] + gene['end']) / 2, 0.35, gene['name'],
             ha='center', va='bottom', fontsize=8)
ax2.set_ylim(-0.5, 1)
ax2.set_yticks([])
ax2.set_ylabel('Genes')
ax2.set_xlabel('Genomic Position (bp)')

plt.tight_layout()
plt.savefig('origins_with_genes.png', dpi=300)
plt.show()
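Beyond plotting, the same two tables support a simple quantitative check, such as how many high-confidence regions fall within an annotated gene. The sketch below assumes both inputs are lists of half-open (start, end) intervals on one chromosome. It sorts the genes once and uses binary search with a running maximum of gene ends, so nested or overlapping genes are handled. The function name is illustrative.

```python
from bisect import bisect_left
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome


def overlaps_any(regions: List[Interval], genes: List[Interval]) -> List[bool]:
    """For each region, report whether it overlaps at least one gene."""
    genes = sorted(genes)
    starts = [g[0] for g in genes]
    # Running maximum of gene ends over the sorted genes.
    max_end, running = [], float("-inf")
    for _, end in genes:
        running = max(running, end)
        max_end.append(running)
    result = []
    for r_start, r_end in regions:
        i = bisect_left(starts, r_end)  # genes[:i] start before the region ends
        result.append(i > 0 and max_end[i - 1] > r_start)
    return result


flags = overlaps_any([(0, 100), (150, 200), (500, 600)], [(90, 160), (300, 400)])
# flags -> [True, True, False]
```

sum(flags) / len(flags) then gives the fraction of regions that overlap a gene.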