fgbio

Tools for working with genomic and high throughput sequencing data.

View the Project on GitHub

SortBam

Overview

Group: SAM/BAM

Sorts a SAM or BAM file. Several sort orders are available:

  1. Coordinate: sorts reads by their reference sequence and left-most aligned coordinate
  2. Queryname: sort the reads by their query (i.e. read) name
  3. Random: sorts the reads into a random order. The output is deterministic for any given input. and several
  4. RandomQuery: sorts the reads into a random order but keeps reads with the same queryname together. The ordering is deterministic for any given input.
  5. TemplateCoordinate: The sort order used by GroupReadByUmi. Sorts reads by the earlier unclipped 5’ coordinate of the read pair, the higher unclipped 5’ coordinate of the read pair, library, the molecular identifier (MI tag), read name, and if R1 has the lower coordinates of the pair.

Uses a temporary directory to buffer sets of sorted reads to disk. The number of reads kept in memory affects memory use and can be changed with the --max-records-in-ram option. The temporary directory to use can be set with the fgbio global option --tmp-dir.

An example invocation might look like:

java -Xmx4g -jar fgbio.jar --tmp-dir=/my/big/scratch/volume \
  SortBam --input=queryname.bam --sort-order=Coordinate --output coordinate.bam

Arguments

Name Flag Type Description Required? Max # of Values Default Value(s)
input i PathToBam Input SAM or BAM. Required 1  
output o PathToBam Output SAM or BAM. Required 1  
sort-order s SamOrder Order into which to sort the records. Optional 1 Coordinate
max-records-in-ram m Int Max records in RAM. Optional 1 1000000