Tools for working with genomic and high-throughput sequencing data.
Group: Unique Molecular Identifiers (UMIs)
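Filters consensus reads generated by CallMolecularConsensusReads or CallDuplexConsensusReads. Two kinds of filtering are performed: base-level filtering, which masks individual bases to N, and read-level filtering, which removes reads from the output.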
Base-level filtering/masking is only applied if per-base tags are present (see CallDuplexConsensusReads and CallMolecularConsensusReads for descriptions of these tags). Read-level filtering is always applied. When filtering reads, secondary alignments and supplementary records may be removed independently if they fail one or more filters; if either R1 or R2 primary alignments fail a filter then all records for the template will be filtered out.
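The template-level rule above can be illustrated with a toy sketch. This is not fgbio's implementation: the three-column record format, the template names, and the pass/fail values are all hypothetical, and the only point is to show that a failing R1 or R2 primary alignment drops the whole template, while other records are dropped one at a time.

```shell
# Hypothetical toy input. Columns: template name, record kind, filter result.
# "supp" stands in for any secondary or supplementary record.
records='t1 R1 pass
t1 R2 fail
t1 supp pass
t2 R1 pass
t2 R2 pass
t2 supp fail'

kept=$(printf '%s\n' "$records" | awk '
  # Remember every record so it can be revisited once all templates are known.
  { name[NR] = $1; kind[NR] = $2; result[NR] = $3 }
  # A failing primary alignment (R1 or R2) marks the whole template for removal.
  ($2 == "R1" || $2 == "R2") && $3 == "fail" { drop[$1] = 1 }
  END {
    for (i = 1; i <= NR; i++)
      # Keep a record only if its template survives and the record itself passes.
      if (!(name[i] in drop) && result[i] == "pass")
        print name[i], kind[i]
  }')

printf '%s\n' "$kept"
```

In this toy input every record of `t1` is removed because its R2 fails, whereas `t2` loses only its failing supplementary record.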
The filters applied correspond to the thresholds described in the argument table at the end of this section.
When filtering single-UMI consensus reads generated by CallMolecularConsensusReads, a single value each should be supplied for `--min-reads`, `--max-read-error-rate`, and `--max-base-error-rate`.
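As a rough sketch of the single-UMI case, the command below passes one value per threshold. The option names are those listed in the argument table; the file names and threshold values are placeholders chosen for illustration, not recommendations. The command is assembled and printed rather than executed so that it can be reviewed first.

```shell
# Assemble the command as positional parameters (placeholder file names and values).
set -- fgbio FilterConsensusReads \
  --input consensus.bam \
  --output filtered.bam \
  --ref ref.fa \
  --min-reads 3 \
  --max-read-error-rate 0.025 \
  --max-base-error-rate 0.1 \
  --min-base-quality 40

cmd_line="$*"
echo "$cmd_line"   # print the command for review

# "$@"             # uncomment to actually run it
```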
When filtering duplex consensus reads generated by CallDuplexConsensusReads each of the three parameters may independently take 1-3 values. For example:
FilterConsensusReads ... --min-reads 10 5 3 --max-base-error-rate 0.1
In each case, if fewer than three values are supplied, the last value is repeated (e.g. `80 40` -> `80 40 40` and `0.1` -> `0.1 0.1 0.1`). The first value applies to the final consensus read, the second value to one single-strand consensus, and the last value to the other single-strand consensus. If values two and three differ, the more stringent value must come first.
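The padding rule can be sketched in a few lines of shell. Both helpers below are hypothetical and are not part of fgbio. The ordering check covers only `--min-reads`, under the assumption that a larger minimum read count is the more stringent setting.

```shell
# Hypothetical helper: pad 1-3 values to exactly three by repeating the last one.
expand_to_three() {
  if [ "$#" -lt 1 ] || [ "$#" -gt 3 ]; then
    echo "expected 1-3 values, got $#" >&2
    return 1
  fi
  first=$1
  second=${2:-$first}    # a missing second value repeats the first
  third=${3:-$second}    # a missing third value repeats the second
  echo "$first $second $third"
}

# Hypothetical check for --min-reads: after padding, value two must be
# greater than or equal to value three (assumes larger means more stringent).
min_reads_order_ok() {
  expanded=$(expand_to_three "$@") || return 1
  set -- $expanded
  [ "$2" -ge "$3" ]
}

expand_to_three 80 40   # prints 80 40 40
expand_to_three 0.1     # prints 0.1 0.1 0.1
```

With these helpers, `min_reads_order_ok 10 5 3` succeeds while `min_reads_order_ok 10 3 5` fails, because the second single-strand threshold would be stricter than the first.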
In order to correctly filter reads in or out by template, the input BAM must be either queryname sorted or query grouped. If your BAM is not already in an appropriate order, it can be sorted in streaming fashion with:
samtools sort -n -u in.bam | fgbio FilterConsensusReads -i /dev/stdin ...
The output sort order may be specified with `--sort-order`. If not given, the output will be in the same order as the input.
The `--reverse-per-base-tags` option controls whether per-base tags should be reversed before being used on reads marked as being mapped to the negative strand. This is necessary if the reads have been mapped and the bases/quals reversed but the consensus tags have not. If true, the tags written to the output BAM will be reversed where necessary in order to line up with the bases and quals.
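To make the reversal concrete, the sketch below flips a comma-separated list of per-base values so that it lines up with bases that were reversed during mapping. The helper and the sample values are hypothetical, and the sketch only handles numeric lists; tags that hold bases would also need complementing, which is not shown.

```shell
# Hypothetical helper (not part of fgbio): reverse a comma-separated list.
reverse_per_base() {
  printf '%s\n' "$1" | awk -F, '{
    out = $NF
    for (i = NF - 1; i >= 1; i--) out = out "," $i
    print out
  }'
}

# Toy per-base depths stored in sequencing order for a negative-strand read.
reverse_per_base "9,8,7,3"   # prints 3,7,8,9
```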
Name | Flag | Type | Description | Required? | Max # of Values | Default Value(s) |
---|---|---|---|---|---|---|
input | i | PathToBam | The input SAM or BAM file of consensus reads. | Required | 1 | |
output | o | PathToBam | Output SAM or BAM file. | Required | 1 | |
ref | r | PathToFasta | Reference fasta file. | Required | 1 | |
reverse-per-base-tags | R | Boolean | Reverse [complement] per base tags on reverse strand reads. | Optional | 1 | false |
min-reads | M | Int | The minimum number of reads supporting a consensus base/read. | Required | 3 | |
max-read-error-rate | E | Double | The maximum raw-read error rate across the entire consensus read. | Required | 3 | 0.025 |
max-base-error-rate | e | Double | The maximum error rate for a single consensus base. | Required | 3 | 0.1 |
min-base-quality | N | PhredScore | Mask (make `N`) consensus bases with quality less than this threshold. | Required | 1 | |
max-no-call-fraction | n | Double | Maximum fraction of no-calls in the read after filtering. | Optional | 1 | 0.2 |
min-mean-base-quality | q | PhredScore | The minimum mean base quality across the consensus read. | Optional | 1 | |
require-single-strand-agreement | s | Boolean | Mask (make `N`) consensus bases where the AB and BA consensus reads disagree (for duplex-sequencing only). | Optional | 1 | false |
sort-order | S | SamOrder | The sort order of the output. If not given, output will be in the same order as input if the input is queryname sorted or query grouped, otherwise queryname order. | Optional | 1 | |