Tools for working with genomic and high throughput sequencing data.
Group: Unique Molecular Identifiers (UMIs)
Copies the UMI at the end of each read name in a BAM file to the `RX` tag.
The read name is split on `:` characters (configurable with `--field-delimiter`), with the last field assumed to be the UMI sequence. The UMI is copied to the `RX` tag as per the SAM specification. If any read does not have a UMI composed of valid bases (`ACGTN`), the program reports the error and fails.
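The parsing rule above can be sketched in Python. This is a minimal illustration under stated assumptions, not the tool's own implementation: the function name `extract_umi` is hypothetical, it handles a single UMI only (no `+`/`-` delimited UMIs and no `r` prefixes), it treats only uppercase `ACGTN` as valid bases, and the example read name is made up.

```python
# Hypothetical sketch of the documented parsing rule; not the tool's own code.
VALID_BASES = frozenset("ACGTN")


def extract_umi(read_name: str, field_delimiter: str = ":") -> str:
    """Return the last field of `read_name`, checked to be a valid UMI.

    Raises ValueError when the name has no delimiter, or when the last
    field is empty or contains anything other than A, C, G, T or N,
    mirroring the documented "report the error and fail" behaviour.
    """
    if field_delimiter not in read_name:
        raise ValueError(f"no {field_delimiter!r} in read name: {read_name}")
    # Only the final field is treated as the UMI; earlier fields are ignored.
    umi = read_name.rsplit(field_delimiter, 1)[1]
    if not umi or not set(umi) <= VALID_BASES:
        raise ValueError(f"invalid UMI {umi!r} in read name: {read_name}")
    return umi


name = "M00123:42:000000000-ABCDE:1:1101:15589:1331:ACGTNACG"
print(extract_umi(name))  # ACGTNACG
```

Because only the text after the last delimiter is inspected, colons or hyphens earlier in the name (instrument, run, flowcell fields) do not affect the result.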
If a read name contains multiple UMIs, they may be delimited, typically by a hyphen (`-`) or a plus sign (`+`). The `--umi-delimiter` option specifies the delimiter on which to split (default `+`). The resulting UMI in the `RX` tag is always hyphen-delimited.
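Splitting multiple UMIs and re-joining them with hyphens can be sketched as follows. This is a hypothetical helper (`rx_from_read_name` is not part of the tool); it assumes a single-character delimiter, checks each segment against uppercase `ACGTN`, and does not handle `r`-prefixed segments. The read names are made up.

```python
# Hypothetical sketch; not the tool's own code. 'r'-prefixed segments are not handled.
VALID_BASES = frozenset("ACGTN")


def rx_from_read_name(read_name: str,
                      field_delimiter: str = ":",
                      umi_delimiter: str = "+") -> str:
    """Build an RX value from the UMI field at the end of a read name."""
    if field_delimiter not in read_name:
        raise ValueError(f"no {field_delimiter!r} in read name: {read_name}")
    raw_umi = read_name.rsplit(field_delimiter, 1)[1]
    # Split the last field into one segment per UMI.
    segments = raw_umi.split(umi_delimiter)
    for segment in segments:
        if not segment or not set(segment) <= VALID_BASES:
            raise ValueError(f"invalid UMI {raw_umi!r} in read name: {read_name}")
    # Whatever delimiter the input used, the RX value is hyphen-delimited.
    return "-".join(segments)


print(rx_from_read_name("read1:ACGTACGT+TTGCAAGC"))                     # ACGTACGT-TTGCAAGC
print(rx_from_read_name("read1:ACGTACGT-TTGCAAGC", umi_delimiter="-"))  # ACGTACGT-TTGCAAGC
```

In this sketch, passing the wrong `umi_delimiter` (for example the default `+` for a hyphen-delimited name) leaves the `-` inside a single segment, so the base check fails rather than silently producing a malformed value.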
Some tools (e.g. BCL Convert) may reverse-complement UMIs on R2 and add an `r` prefix to indicate that the sequence has been reverse-complemented. By default, the `r` prefix is removed and the sequence is reverse-complemented back to the forward orientation. The `--override-reverse-complement-umis` option disables the second step, so that the `r` prefix is still removed but the UMI sequence is left reverse-complemented.
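The `r`-prefix handling can be sketched per UMI segment. This is a hypothetical illustration, not the tool's code: `normalize_segment` and `rx_value` are invented names, the `override_reverse_complement` parameter stands in for `--override-reverse-complement-umis`, the raw UMI is assumed to be already extracted from the read name, base validation is omitted, and the example UMIs are made up.

```python
# Hypothetical sketch of the 'r'-prefix handling; not the tool's own code.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def normalize_segment(segment: str, override_reverse_complement: bool = False) -> str:
    """Strip a leading 'r' and, unless overridden, reverse-complement the rest."""
    if not segment.startswith("r"):
        return segment
    sequence = segment[1:]  # the 'r' prefix is removed in both modes
    if override_reverse_complement:
        return sequence  # leave the UMI in its reverse-complemented orientation
    # Complement each base, then reverse, to restore the forward orientation.
    return sequence.translate(COMPLEMENT)[::-1]


def rx_value(raw_umi: str, umi_delimiter: str = "+",
             override_reverse_complement: bool = False) -> str:
    """Normalize each UMI segment and join the results with hyphens."""
    return "-".join(normalize_segment(s, override_reverse_complement)
                    for s in raw_umi.split(umi_delimiter))


print(rx_value("ACGTACGT+rAACGGTTT"))                                    # ACGTACGT-AAACCGTT
print(rx_value("ACGTACGT+rAACGGTTT", override_reverse_complement=True))  # ACGTACGT-AACGGTTT
```

Handling the prefix per segment (rather than per read name) lets a forward UMI and an `r`-prefixed UMI coexist in the same name, as in the example.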
Name | Flag | Type | Description | Required? | Max # of Values | Default Value(s) |
---|---|---|---|---|---|---|
input | i | PathToBam | The input BAM file. | Required | 1 | |
output | o | PathToBam | The output BAM file. | Required | 1 | |
remove-umi | | Boolean | Remove the UMI from the read name. | Optional | 1 | false |
field-delimiter | | Char | Delimiter between the read name and UMI. | Optional | 1 | : |
umi-delimiter | | Char | Delimiter between UMI sequences. | Optional | 1 | + |
override-reverse-complement-umis | | Boolean | Do not reverse-complement UMIs prefixed with `r`. | Optional | 1 | false |
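The combined effect of `remove-umi` and `field-delimiter` on a read name can be sketched as below. This assumes that removing the UMI means cutting the name at the last occurrence of the single-character field delimiter (dropping both the delimiter and the UMI); `split_read_name` is a hypothetical function and the example names are made up.

```python
# Hypothetical sketch of remove-umi / field-delimiter; not the tool's own code.
from typing import Tuple


def split_read_name(read_name: str, field_delimiter: str = ":",
                    remove_umi: bool = False) -> Tuple[str, str]:
    """Return (new_read_name, raw_umi) for a read name ending in a UMI.

    Assumes a single-character delimiter, as the Char type in the table
    suggests. The raw UMI is the text after the last delimiter; when
    `remove_umi` is True the delimiter and UMI are cut from the name,
    otherwise the name is returned unchanged.
    """
    idx = read_name.rfind(field_delimiter)
    if idx == -1:
        raise ValueError(f"no {field_delimiter!r} in read name: {read_name}")
    raw_umi = read_name[idx + 1:]
    new_name = read_name[:idx] if remove_umi else read_name
    return new_name, raw_umi


name = "M00123:42:000000000-ABCDE:1:1101:15589:1331:ACGT+TTGA"
print(split_read_name(name))                   # name unchanged, raw UMI 'ACGT+TTGA'
print(split_read_name(name, remove_umi=True))  # ('M00123:42:000000000-ABCDE:1:1101:15589:1331', 'ACGT+TTGA')
print(split_read_name("read1_ACGT", field_delimiter="_", remove_umi=True))  # ('read1', 'ACGT')
```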