Output Format
ORILINX produces prediction results in two formats: bedGraph (default) and CSV (optional). Each sequence (chromosome) is written to a separate output file.
bedGraph Format
The bedGraph format is the default output and is compatible with genome browsers like UCSC Genome Browser and IGV.
Format specification:
Each line contains four tab-separated fields:
chrom start end score
Where:
chrom: Sequence name (e.g.,
chr1,NC_000001.11)start: 0-based start position (inclusive)
end: 0-based end position (exclusive)
score: Prediction score (probability by default, logit if
--score logitspecified)
Example output:
chr8 500 1500 0.586426
chr8 1501 2500 0.925781
chr8 2501 3500 0.029205
chr8 3501 4500 0.868652
chr8 4501 5500 0.006096
chr8 5501 6500 0.006123
Window coordinate behavior:
The output coordinate system depends on the stride parameter used:
When stride < 2000 bp (overlapping windows)
Windows are output as non-overlapping intervals to avoid gaps in genome coverage. Each interval is centered on the window’s center position with width equal to the stride:
chr8 500 1500 0.586426
chr8 1501 2500 0.925781
chr8 2501 3500 0.029205
With stride=1000, each output interval spans 1000 bp (center ± 500 bp).
When stride >= 2000 bp (non-overlapping windows)
Windows are output using their full 2000 bp coordinates:
chr8 128862888 128864888 0.654321
chr8 128864888 128866888 0.723456
chr8 128866888 128868888 0.812345
File naming:
Output files are named after the sequence: {sequence_name}.bedGraph
Examples:
chr1.bedGraph- For a human chromosomeNC_000001.11.bedGraph- For RefSeq accessionsMT.bedGraph- For mitochondrial DNA
CSV Format
The CSV format is optional and will be written when the --output_csv flag is used. It provides a more detailed record of all windows and predictions.
Format specification:
CSV files contain a header row followed by data rows with comma-separated values:
chrom,start,end,probability,logit
Where:
chrom: Sequence name
start: Window start position (0-based)
end: Window end position (exclusive)
probability: Sigmoid probability (0.0 to 1.0)
logit: Raw logit score from the model (unbounded)
Example output:
chrom,start,end,probability,logit
chr8,10000,12000,0.586426,0.343521
chr8,11000,13000,0.925781,2.876543
chr8,12000,14000,0.029205,-3.543210
chr8,13000,15000,0.868652,1.987654
chr8,14000,16000,0.006096,-5.098765
chr8,15000,17000,0.006123,-5.087654
Key differences from bedGraph:
Shows full window coordinates (always in 2000 bp window format)
Includes both probability and logit scores
Header row for column identification
Easier to parse in R, Python, etc.
File naming:
Output files are named after the sequence: {sequence_name}.csv
Examples:
chr1.csvNC_000001.11.csvMT.csv
Generating both formats
ORILINX will always give an output in bedGraph format. Use the --output_csv flag to, in addition, generate CSV files:
orilinx --fasta_path genome.fa \
--output_dir results \
--output_csv \
--sequence_names chr8:128862888-128870405
This will produce:
results/chr8.bedGraph- For genome browser visualisationresults/chr8.csv- For downstream analysis
Interpreting the scores
Probability scores (default):
Range: 0.0 to 1.0
Interpretation: Likelihood that the region contains a replication origin
0.5: Equal probability of being an origin or non-origin
> 0.7: High confidence origin prediction
< 0.3: Low confidence origin prediction
Logit scores:
Range: Unbounded (typically -5 to +5)
Interpretation: Log-odds ratio of being an origin
0: No preference (equivalent to 0.5 probability)
Positive: Likely origin
Negative: Likely non-origin
Use when you need raw model confidence
Choosing a score type:
Use probability (default) for: - Genome browser visualisation - Intuitive interpretation - Most downstream analyses
Use logit (--score logit) for:
- Statistical analyses requiring raw model output
- Machine learning pipelines
- Detailed statistical modeling
File organization
For large analyses, bedGraph and CSV files are organised by sequence:
output_dir/
├── chr1.bedGraph
├── chr1.csv
├── chr2.bedGraph
├── chr2.csv
├── chr3.bedGraph
├── chr3.csv
└── ...
└── chrX.bedGraph
└── chrX.csv
You can process all files together:
# Concatenate all bedGraphs into single file
cat output_dir/*.bedGraph > all_origins.bedGraph
# Concatenate all CSVs (preserving header from first file)
(head -1 output_dir/chr1.csv; tail -q -n +2 output_dir/*.csv) > all_origins.csv