fgbio

Tools for working with genomic and high throughput sequencing data.


CopyUmiFromReadName

Overview

Group: Unique Molecular Identifiers (UMIs)

Copies the UMI at the end of each read name in a BAM file to the RX tag.

Each read name is split on the : character (or on the character given by --field-delimiter), and the last field is assumed to be the UMI sequence. The UMI is copied to the RX tag as per the SAM specification. If any read does not have a UMI composed of valid bases (ACGTN), the program reports the error and fails.
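The extraction and validation rule above can be illustrated with a minimal Python sketch. It is not fgbio's implementation: the function name umi_from_read_name is hypothetical, it works on a single read-name string rather than a BAM record, and it handles only a single UMI (no multi-UMI delimiter or 'r' prefix handling).

```python
VALID_UMI_BASES = frozenset("ACGTN")


def umi_from_read_name(read_name: str, field_delimiter: str = ":") -> str:
    """Hypothetical helper: return the last field of a read name as the UMI.

    Raises ValueError if the name has no delimiter or if the UMI contains
    anything other than A, C, G, T or N (mirroring the "report and fail" rule).
    """
    if field_delimiter not in read_name:
        raise ValueError(f"read name has no '{field_delimiter}' delimiter: {read_name}")
    # Everything after the last delimiter is treated as the UMI.
    umi = read_name.rsplit(field_delimiter, 1)[1]
    if not umi or any(base not in VALID_UMI_BASES for base in umi):
        raise ValueError(f"invalid UMI '{umi}' in read name: {read_name}")
    return umi
```

For example, umi_from_read_name("M00123:15:000000000-A1B2C:1:1101:15589:1332:ACGTACGT") returns "ACGTACGT", while a name ending in ":ACXT" raises an error.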

If a read name contains multiple UMIs, they may be separated by a delimiter, typically a hyphen (-) or plus (+). The --umi-delimiter option specifies the character on which to split (+ by default). The UMIs written to the RX tag are always joined with a hyphen.
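A minimal sketch of the multi-UMI normalization, assuming the raw UMI field has already been extracted from the read name. The function name rx_from_umi_field is hypothetical and the code only illustrates the documented behavior (split on the UMI delimiter, validate each part, rejoin with a hyphen); it is not taken from fgbio.

```python
VALID_UMI_BASES = frozenset("ACGTN")


def rx_from_umi_field(raw_umi: str, umi_delimiter: str = "+") -> str:
    """Hypothetical helper: split a raw UMI field on `umi_delimiter`,
    check each part for valid bases, and rejoin the parts with '-'."""
    segments = raw_umi.split(umi_delimiter)
    for segment in segments:
        if not segment or any(base not in VALID_UMI_BASES for base in segment):
            raise ValueError(f"invalid UMI segment '{segment}' in '{raw_umi}'")
    # The RX value is always hyphen-delimited, whatever the input delimiter was.
    return "-".join(segments)
```

With the default delimiter, rx_from_umi_field("ACGT+TTGC") returns "ACGT-TTGC"; a name that already uses hyphens would be handled by passing umi_delimiter="-".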

Some tools (e.g., BCL Convert) may reverse-complement UMIs on R2 and add an ‘r’ prefix to indicate that the sequence has been reverse-complemented. By default, the ‘r’ prefix is removed and the sequence is reverse-complemented back to the forward orientation. The --override-reverse-complement-umis option disables the latter behavior: the ‘r’ prefix is still removed, but the UMI sequence is left reverse-complemented.
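The two behaviors can be contrasted with a small Python sketch. It assumes the ‘r’ prefix is applied per UMI segment and that the alphabet is limited to ACGTN; the names reverse_complement and resolve_r_prefix are hypothetical and are not fgbio functions.

```python
# Complement table for the valid UMI alphabet; N pairs with N.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def resolve_r_prefix(umi_segment: str, override_reverse_complement: bool = False) -> str:
    """Hypothetical helper for one UMI segment.

    The 'r' prefix is always dropped; the sequence is reverse-complemented back
    to the forward orientation unless the override flag is set.
    """
    if not umi_segment.startswith("r"):
        return umi_segment
    seq = umi_segment[1:]
    return seq if override_reverse_complement else reverse_complement(seq)
```

Here resolve_r_prefix("rAACG") returns "CGTT" (default behavior), resolve_r_prefix("rAACG", override_reverse_complement=True) returns "AACG", and an unprefixed segment is returned unchanged.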

Arguments

| Name | Flag | Type | Description | Required? | Max # of Values | Default Value(s) |
|---|---|---|---|---|---|---|
| input | i | PathToBam | The input BAM file. | Required | 1 | |
| output | o | PathToBam | The output BAM file. | Required | 1 | |
| remove-umi | | Boolean | Remove the UMI from the read name. | Optional | 1 | false |
| field-delimiter | | Char | Delimiter between the read name and UMI. | Optional | 1 | : |
| umi-delimiter | | Char | Delimiter between UMI sequences. | Optional | 1 | + |
| override-reverse-complement-umis | | Boolean | Do not reverse-complement UMIs prefixed with ‘r’. | Optional | 1 | false |
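To show how the optional arguments interact, here is a minimal end-to-end sketch that applies all of them to one read-name string. It is a hypothetical illustration, not fgbio's code, and it makes two assumptions: the ‘r’ prefix is handled per UMI segment, and --remove-umi drops both the UMI and its preceding field delimiter. Applying it to actual BAM records would require a BAM library such as pysam (for example, rewriting AlignedSegment.query_name and calling set_tag("RX", ...)), which is not shown here.

```python
from typing import Tuple

VALID_UMI_BASES = frozenset("ACGTN")
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def copy_umi_from_read_name(
    read_name: str,
    remove_umi: bool = False,
    field_delimiter: str = ":",
    umi_delimiter: str = "+",
    override_reverse_complement_umis: bool = False,
) -> Tuple[str, str]:
    """Hypothetical helper: return (new read name, RX tag value) for one read."""
    # Split off the last field; rpartition returns an empty separator if absent.
    prefix, sep, raw_umi = read_name.rpartition(field_delimiter)
    if not sep:
        raise ValueError(f"no '{field_delimiter}' in read name: {read_name}")

    segments = []
    for segment in raw_umi.split(umi_delimiter):
        if segment.startswith("r"):
            segment = segment[1:]  # the 'r' prefix is always removed
            if not override_reverse_complement_umis:
                # Reverse-complement back to the forward orientation.
                segment = segment.translate(COMPLEMENT)[::-1]
        if not segment or any(base not in VALID_UMI_BASES for base in segment):
            raise ValueError(f"invalid UMI '{raw_umi}' in read name: {read_name}")
        segments.append(segment)

    # Assumption: removing the UMI also removes the trailing field delimiter.
    new_name = prefix if remove_umi else read_name
    return new_name, "-".join(segments)
```

For a name ending in ":ACGT+rAACG", the defaults yield an RX value of "ACGT-CGTT" and leave the name unchanged; setting remove_umi=True also trims ":ACGT+rAACG" from the name.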