Tools for working with genomic and high throughput sequencing data.
Group: Unique Molecular Identifiers (UMIs)
Collects a suite of metrics to QC duplex sequencing data.
The input to this tool must be a BAM file that is either:

1. The BAM output by the GroupReadsByUmi tool (in the sort-order it was produced in), or
2. A BAM file grouped by GroupReadsByUmi that has been sorted with SortBam into TemplateCoordinate order.

Calculation of metrics may be restricted to a set of regions using the --intervals parameter. This can significantly affect results, as off-target reads in duplex sequencing experiments often have very different properties than on-target reads due to the lack of enrichment.
Several metrics are calculated related to the fraction of tag families that have duplex coverage. The definition of “duplex” is controlled by the --min-ab-reads and --min-ba-reads parameters. The default is to treat any tag family with at least one observation of each strand as a duplex, but this could be made more stringent, e.g. by setting --min-ab-reads=3 --min-ba-reads=3. If different thresholds are used then --min-ab-reads must be the higher value.
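
The duplex rule described above can be sketched in a few lines of Python. This is an illustrative sketch and not the tool's implementation: the function name `is_duplex` is hypothetical, and it assumes that the strand with more reads is the one compared against --min-ab-reads, which is one reading of the requirement that --min-ab-reads be the higher value.

```python
def is_duplex(strand1_reads: int, strand2_reads: int,
              min_ab_reads: int = 1, min_ba_reads: int = 1) -> bool:
    """Return True if a tag family meets the duplex thresholds.

    Hypothetical helper for illustration only.
    """
    if min_ab_reads < min_ba_reads:
        # Mirrors the documented constraint on the two thresholds.
        raise ValueError("min_ab_reads must be >= min_ba_reads")
    # Assumption: the better-supported strand is checked against min_ab_reads
    # and the other strand against min_ba_reads.
    ab = max(strand1_reads, strand2_reads)
    ba = min(strand1_reads, strand2_reads)
    return ab >= min_ab_reads and ba >= min_ba_reads
```

With the default thresholds, a family with one read on each strand counts as a duplex, while a family with five reads on one strand and none on the other does not.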
The following output files are produced:
Within the metrics files the prefixes CS, SS and DS are used to mean:
To generate plots, R must be installed along with the ggplot2 package and its suggested dependencies. Successfully executing the following in R will ensure a working installation:
install.packages("ggplot2", repos="http://cran.us.r-project.org", dependencies=TRUE)
Name | Flag | Type | Description | Required? | Max # of Values | Default Value(s) |
---|---|---|---|---|---|---|
input | i | PathToBam | Input BAM file generated by GroupReadsByUmi. | Required | 1 | |
output | o | PathPrefix | Prefix of output files to write. | Required | 1 | |
intervals | l | PathToIntervals | Optional set of intervals over which to restrict analysis. | Optional | 1 | |
description | d | String | Description of data set used to label plots. Defaults to sample/library. | Optional | 1 | |
duplex-umi-counts | u | Boolean | If true, produce the .duplex_umi_counts.txt file with counts of duplex UMI observations. | Optional | 1 | false |
min-ab-reads | a | Int | Minimum AB reads to call a tag family a ‘duplex’. | Optional | 1 | 1 |
min-ba-reads | b | Int | Minimum BA reads to call a tag family a ‘duplex’. | Optional | 1 | 1 |
umi-tag | t | String | The tag containing the raw UMI. | Optional | 1 | RX |
mi-tag | T | String | The output tag for UMI grouping. | Optional | 1 | MI |
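
To show how the arguments in the table fit together, the sketch below assembles a command line in Python. The long flag names come from the table; everything else is an assumption made for illustration: the helper name `build_command`, the `fgbio` launcher name, the tool name `CollectDuplexSeqMetrics` (not stated in this section), and the file paths in the usage note.

```python
from typing import List, Optional


def build_command(input_bam: str,
                  output_prefix: str,
                  intervals: Optional[str] = None,
                  min_ab_reads: int = 1,
                  min_ba_reads: int = 1,
                  duplex_umi_counts: bool = False) -> List[str]:
    """Assemble an argument list for the metrics tool (hypothetical helper)."""
    if min_ab_reads < min_ba_reads:
        # The documentation requires --min-ab-reads to be the higher value.
        raise ValueError("min_ab_reads must be >= min_ba_reads")

    cmd = [
        "fgbio",                    # assumed launcher name
        "CollectDuplexSeqMetrics",  # assumed tool name
        f"--input={input_bam}",
        f"--output={output_prefix}",
        f"--min-ab-reads={min_ab_reads}",
        f"--min-ba-reads={min_ba_reads}",
    ]
    if intervals is not None:
        # Optional: restrict analysis to the given intervals.
        cmd.append(f"--intervals={intervals}")
    if duplex_umi_counts:
        # Optional: also request the .duplex_umi_counts.txt output.
        cmd.append("--duplex-umi-counts=true")
    return cmd
```

For example, `build_command("grouped.bam", "sample1", min_ab_reads=3, min_ba_reads=3)` returns a list that can be passed to `subprocess.run`; the paths here are placeholders.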