Tools for working with genomic and high-throughput sequencing data.
Group: SAM/BAM
Downsamples a BAM in a biased way to achieve uniform coverage across regions.
Attempts to downsample a BAM such that every base in the genome (or in the target `regions`, if provided) is covered by at least `coverage` reads. When computing coverage:
Reads are first sorted into a random order (by hashing read names). Reads are then consumed one template at a time; if any read adds coverage to a base that is under the target coverage, all reads (including secondary, unmapped, etc.) for that template are emitted into the output.
Given the procedure used for downsampling, the output BAM is likely to have coverage of up to 2X the requested coverage in regions of the input BAM that are i) well covered and ii) close to regions that are poorly covered.
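
The procedure described above can be illustrated with a minimal Python sketch. It is not the tool's implementation: the function name `downsample_to_coverage`, the representation of a template as a list of half-open `(start, end)` intervals on a single contig, and the MD5-based ordering are all assumptions made for illustration. Mapping-quality filtering and target regions are left out.

```python
import hashlib


def downsample_to_coverage(templates, target, seed=42):
    """Greedy, template-at-a-time downsampling sketch.

    templates: dict mapping a read name to the list of half-open (start, end)
               intervals covered by that template's reads (hypothetical model).
    target:    desired minimum coverage per base.
    seed:      mixed into the hash so the pseudo-random order is reproducible.

    Returns (kept_names, depth), where depth maps position -> kept coverage.
    """

    def order_key(name):
        # Hashing the read name gives a reproducible pseudo-random order.
        digest = hashlib.md5(f"{seed}:{name}".encode()).hexdigest()
        return digest, name

    depth = {}
    kept = []
    for name in sorted(templates, key=order_key):
        reads = templates[name]
        # Keep the template if any base it covers is still under the target.
        needed = any(
            depth.get(pos, 0) < target
            for start, end in reads
            for pos in range(start, end)
        )
        if needed:
            kept.append(name)
            # Every read of a kept template adds coverage, even to bases
            # that have already reached the target.
            for start, end in reads:
                for pos in range(start, end):
                    depth[pos] = depth.get(pos, 0) + 1
    return kept, depth
```

Because a kept template adds coverage to every base it spans, bases that already meet the target can be pushed above it whenever a neighbouring base is still short. This is the source of the overshoot near poorly covered regions.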
Name | Flag | Type | Description | Required? | Max # of Values | Default Value(s) |
---|---|---|---|---|---|---|
input | i | PathToBam | Input SAM or BAM file. | Required | 1 | |
output | o | PathToBam | Output SAM or BAM file. | Required | 1 | |
coverage | c | Int | Desired minimum coverage. | Required | 1 | |
min-map-q | m | Int | Minimum mapping quality to count a read as covering. | Optional | 1 | 0 |
seed | s | Int | Random seed to use when randomizing order of reads/templates. | Optional | 1 | 42 |
regions | l | PathToIntervals | Optional set of regions for coverage targeting. | Optional | 1 | |
max-in-memory | M | Int | Maximum records to be held in memory while sorting. | Optional | 1 | 1000000 |