Troubleshooting
Installation Issues
“Command not found: orilinx”
The orilinx command is not recognized after installation.
Solutions:
Verify you installed ORILINX correctly:
cd /path/to/ORILINX
pip install -e .
Make sure you’re using the correct Python environment if you use conda or virtualenv:
conda activate myenv  # or source venv/bin/activate
pip install -e .
Try using the Python module directly instead:
python -m orilinx --help
“No module named transformers” or other missing packages
Installation completed but ORILINX is missing dependencies.
Solutions:
Install all required packages:
pip install torch transformers pysam pandas numpy peft
If you’re using conda, install the dependencies with conda instead:
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
conda install transformers pysam pandas numpy
Verify installation by importing modules:
python -c "import torch, transformers, pysam, pandas; print('All modules imported successfully')"
“Index file not found”
ORILINX cannot find the FASTA index file (.fai).
Solutions:
Create the index file using samtools:
samtools faidx your_genome.fa
Verify the index file was created:
ls -la your_genome.fa*
You should see both your_genome.fa and your_genome.fa.fai files.
Make sure the .fai file is in the same directory as the FASTA file.
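The same check can be scripted before launching a long run. The following is a minimal sketch in Python, assuming the index follows the samtools naming convention (the FASTA file name plus a .fai suffix, in the same directory). The helper name find_fasta_index is hypothetical and not part of ORILINX:

```python
from pathlib import Path
from typing import Optional


def find_fasta_index(fasta_path: str) -> Optional[Path]:
    """Return the path of the .fai index next to the FASTA file, or None if missing.

    Assumption: samtools convention, i.e. "<fasta file name>.fai" in the same directory.
    """
    fasta = Path(fasta_path)
    index = fasta.with_name(fasta.name + ".fai")
    return index if index.is_file() else None
```

If find_fasta_index("your_genome.fa") returns None, run samtools faidx your_genome.fa and try again.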
Memory Issues
Out of Memory (OOM) errors
ORILINX crashes with an “out of memory” or “CUDA out of memory” error.
Solutions:
Reduce batch size (decrease GPU memory usage):
orilinx --fasta_path genome.fa \
--output_dir results \
--batch_size 32
Reduce number of workers (decrease CPU memory and parallelism):
orilinx --fasta_path genome.fa \
--output_dir results \
--batch_size 32 \
--num_workers 2
Use CPU instead of GPU (slower but uses system memory):
CUDA_VISIBLE_DEVICES="" orilinx --fasta_path genome.fa \
--output_dir results \
--batch_size 16
Memory limit exceeded on cluster
The job is killed for exceeding its memory limit when running on an HPC cluster.
Solutions:
Request more memory in your job script:
#SBATCH --mem=32G
orilinx --fasta_path genome.fa --output_dir results --batch_size 32
Use conservative settings:
orilinx --fasta_path genome.fa \
--output_dir results \
--batch_size 16 \
--num_workers 2
Performance Issues
Slow performance
ORILINX is running but much slower than expected.
Check whether PyTorch can see a GPU:
python -c "import torch; print('GPU available:', torch.cuda.is_available()); print('GPU name:', torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'N/A')"
If GPU is available but not being used:
Try running with verbose output to see what device is being used:
orilinx --fasta_path genome.fa \
--output_dir results \
--verbose
Solutions for slow performance:
Increase batch size if you have GPU memory available:
orilinx --fasta_path genome.fa \
--output_dir results \
--batch_size 128
Increase number of workers for data loading:
orilinx --fasta_path genome.fa \
--output_dir results \
--num_workers 16
GPU and Computation Errors
“Triton compilation error” or “CUDA error”
ORILINX encounters GPU-related compilation or memory errors.
Solutions:
Use the --disable_flash flag to fall back to standard PyTorch attention:
orilinx --fasta_path genome.fa \
--output_dir results \
--disable_flash
This is slower but more stable and compatible with more GPUs.
Update your GPU drivers and CUDA:
nvidia-smi  # Check current CUDA version
# Update drivers from nvidia.com
Downgrade PyTorch if you have compatibility issues:
pip install torch==2.0.1
Model loading fails or crashes
ORILINX cannot find or load the model files.
Solutions:
Check that your models directory exists and has the correct structure:
ls -la models/
ls -la models/DNABERT-2-117M-Flash/
ls -la models/model_epoch_6.pt
Manually specify paths with environment variables:
export ORILINX_DNABERT_PATH=/full/path/to/DNABERT-2-117M-Flash
export ORILINX_MODEL=/full/path/to/model_epoch_6.pt
orilinx --fasta_path genome.fa --output_dir results
Run with verbose output to see where ORILINX is searching:
orilinx --fasta_path genome.fa --output_dir results --verbose
Data Input Issues
“Sequence not found in FASTA”
ORILINX cannot find the specified chromosome or sequence name.
Solutions:
List all available sequences in your FASTA file:
# Sequence names are the first column of the FASTA index (.fai)
cut -f1 genome.fa.fai
Make sure you’re using the correct sequence name format:
# First part of FASTA file is:
# >chr1
# ATCGAATCGGATA...
# Correct - use exact names from FASTA
orilinx --fasta_path genome.fa --output_dir results --sequence_names chr1
# Wrong - sequence name doesn't exist
orilinx --fasta_path genome.fa --output_dir results --sequence_names chromosome1
If sequences have RefSeq accessions instead of chr names:
# First check what names are in the FASTA (first column of the .fai index)
cut -f1 genome.fa.fai | head
# Then use those exact names
orilinx --fasta_path genome.fa --output_dir results --sequence_names NC_000001.11
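When a name is rejected, it can help to compare it against the names in the index automatically. The following is a minimal sketch, assuming a standard samtools .fai file (tab-separated, sequence name in the first column). The function names read_fai_names and suggest_names are hypothetical helpers, not part of ORILINX:

```python
import difflib
from typing import List


def read_fai_names(fai_path: str) -> List[str]:
    """Return the sequence names from a .fai index (first tab-separated column)."""
    with open(fai_path) as handle:
        return [line.split("\t")[0].rstrip("\n") for line in handle if line.strip()]


def suggest_names(requested: str, available: List[str], limit: int = 3) -> List[str]:
    """Return [requested] if it is present, otherwise up to `limit` similar names."""
    if requested in available:
        return [requested]
    # The 0.3 cutoff is an arbitrary, permissive choice for short contig names.
    return difflib.get_close_matches(requested, available, n=limit, cutoff=0.3)
```

For example, suggest_names("chromosome1", read_fai_names("genome.fa.fai")) returns the closest names actually present in the index, which you can then pass to --sequence_names.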
“Range is too small” error
The specified region is smaller than 2000 bp.
Solution:
Since ORILINX analyses 2000 bp windows, the minimum region size is 2000 bp. Expand your region:
# Wrong - only 1000 bp
orilinx --fasta_path genome.fa --sequence_names chr1:1000000-1001000
# Correct - 2000 bp minimum
orilinx --fasta_path genome.fa --sequence_names chr1:1000000-1002000
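Region strings can be checked before they are passed to ORILINX. The following is a minimal sketch, assuming regions are written as name:start-end, that sequence names contain no colon, and that length is counted as end minus start (which matches the two examples above; ORILINX itself may count differently). The names region_length and is_large_enough are hypothetical:

```python
import re

WINDOW_BP = 2000  # window size stated in this guide

# Assumption: "name:start-end" with no colon inside the sequence name.
_REGION = re.compile(r"^(?P<name>[^:]+):(?P<start>\d+)-(?P<end>\d+)$")


def region_length(region: str) -> int:
    """Return end - start for a 'name:start-end' region string."""
    match = _REGION.match(region)
    if match is None:
        raise ValueError(f"Not a name:start-end region: {region!r}")
    start, end = int(match["start"]), int(match["end"])
    if end <= start:
        raise ValueError(f"End must be greater than start: {region!r}")
    return end - start


def is_large_enough(region: str, window_bp: int = WINDOW_BP) -> bool:
    """True if the region spans at least one full window."""
    return region_length(region) >= window_bp
```

A bare sequence name such as chr1 raises ValueError here, because the sketch only handles explicit coordinate ranges.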
Output Issues
Empty output files
Output files are created but contain no data.
Causes and solutions:
Too many ‘N’ bases in the region: Regions with >5% ‘N’ bases are skipped by default:
# Count 'N' bases in the region (header line excluded)
samtools faidx genome.fa chr1:1000000-2000000 | grep -v '^>' | grep -o 'N' | wc -l
# Reduce strictness
orilinx --fasta_path genome.fa \
--output_dir results \
--sequence_names chr1:1000000-2000000 \
--max_N_frac 0.2
Region is smaller than window: The region must be at least 2000 bp.
Analyse a different region to verify the pipeline works.
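To judge whether the ‘N’ filter explains an empty output, you can compute the fraction of ‘N’ bases directly from the extracted sequence. The following is a minimal sketch, assuming the default threshold of 0.05 implied by the ">5%" rule above and case-insensitive counting. Whether ORILINX applies the threshold per window or per region is not stated here, so treat the result as a rough indicator. The function names are hypothetical:

```python
def n_fraction(sequence: str) -> float:
    """Fraction of bases that are N (case-insensitive); whitespace is ignored."""
    bases = "".join(sequence.split())
    if not bases:
        return 0.0
    return bases.upper().count("N") / len(bases)


def would_be_skipped(sequence: str, max_n_frac: float = 0.05) -> bool:
    """True if the N fraction exceeds the threshold (strictly greater, per '>5%')."""
    return n_fraction(sequence) > max_n_frac
```

Pass in the sequence lines printed by samtools faidx (without the header line), and raise --max_N_frac only if the reported fraction is slightly above the default.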
Debugging and Getting Help
Enable verbose output
Get detailed information about what ORILINX is doing:
orilinx --fasta_path genome.fa \
--output_dir results \
--verbose
This shows:
- Which model files are being loaded
- What device (CPU/GPU) is being used
- Runtime settings and batch configuration
Test with a small region
Before running genome-wide, test with a small region to verify everything works:
orilinx --fasta_path genome.fa \
--output_dir test_results \
--sequence_names chr1:50000000-51000000 \
--verbose
Check system resources
Monitor CPU and GPU usage while running:
# In a separate terminal
nvidia-smi -l 1 # Update GPU stats every second (NVIDIA GPUs)
# Or for CPU usage
top
Report issues
If you encounter an issue not covered here, open a GitHub issue and include:
The command you used:
orilinx --fasta_path ... (your full command)
The verbose output:
orilinx --fasta_path genome.fa --output_dir results --verbose
Your system information:
python --version
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
nvidia-smi  # If using GPU
samtools --version
python -c "import transformers; print(transformers.__version__)"
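The version information can also be gathered in one step. The following is a minimal sketch using only the Python standard library. It reports installed package versions and the samtools version, but it does not replace the torch.cuda.is_available() check or the nvidia-smi output. The function names are hypothetical, and the package list is taken from the pip command earlier in this guide:

```python
import platform
import shutil
import subprocess
from importlib import metadata
from typing import Dict


def package_version(name: str) -> str:
    """Installed version of a Python package, or 'not installed'."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def tool_version(command: str) -> str:
    """First line of `<command> --version`, or 'not found' if it is not on PATH."""
    if shutil.which(command) is None:
        return "not found"
    result = subprocess.run([command, "--version"], capture_output=True, text=True)
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else "unknown"


def collect_report() -> Dict[str, str]:
    """Collect Python, package, and samtools versions for a bug report."""
    report = {"python": platform.python_version()}
    for package in ("torch", "transformers", "pysam", "pandas", "numpy", "peft"):
        report[package] = package_version(package)
    report["samtools"] = tool_version("samtools")
    return report
```

Printing each key and value of collect_report() gives a block of text that can be pasted into the issue alongside your command and the verbose output.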