Tools for working with genomic and high-throughput sequencing data.


fgbio tools

The following tools are available in fgbio version 0.8.1.


Tools for manipulating basecalling data.

Tool Description
ExtractBasecallingParamsForPicard Extracts sample and library information from a sample sheet for a given lane
ExtractIlluminaRunInfo Extracts information about an Illumina sequencing run from the RunInfo.xml file


Tools for manipulating FASTA files.

Tool Description
HardMaskFasta Converts soft-masked sequence to hard-masked in a FASTA file
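
To illustrate what converting soft-masked sequence to hard-masked means, here is a minimal Python sketch. It assumes the usual convention that soft-masked bases are written in lowercase and hard-masked bases are written as N. The function names are hypothetical, and this is not fgbio's implementation.

```python
def hard_mask(sequence: str) -> str:
    """Replace soft-masked (lowercase) bases with 'N'; keep everything else as-is."""
    return "".join("N" if base.islower() else base for base in sequence)


def hard_mask_fasta_lines(lines):
    """Yield FASTA lines with sequence lines hard-masked and header lines untouched.

    Hypothetical helper for illustration only: header lines start with '>'.
    """
    for line in lines:
        text = line.rstrip("\n")
        yield text if text.startswith(">") else hard_mask(text)


if __name__ == "__main__":
    example = [">chr1 example\n", "ACGTacgtNNac\n"]
    for out in hard_mask_fasta_lines(example):
        print(out)
```

Uppercase bases and existing N characters pass through unchanged, so applying the function twice gives the same result as applying it once.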


Tools for manipulating FASTQ files.

Tool Description
DemuxFastqs Performs sample demultiplexing on FASTQs
FastqToBam Generates an unmapped BAM (or SAM or CRAM) file from FASTQ files
SortFastq Sorts a FASTQ file
TrimFastq Trims reads in one or more line-matched FASTQ files to a specific read length


Various personal programs (not supported).

Tool Description
FindSwitchbackReads Finds reads where a template switch occurred during library construction
GenerateRegionsFromFasta Generates a list of ‘freebayes’/‘bamtools’ region specifiers
SplitTag Splits an optional tag in a SAM or BAM into multiple optional tags
StripFastqReadNumbers Removes trailing /# from read names in FASTQ files
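
As a rough illustration of removing a trailing /# from a read name, the Python sketch below assumes that "#" stands for one or more digits (for example /1 or /2). The function name is hypothetical and the pattern is an assumption, not taken from fgbio.

```python
import re

# Assumption: the read number is a slash followed by digits at the very end of the name.
_READ_NUMBER = re.compile(r"/\d+$")


def strip_read_number(name: str) -> str:
    """Return the read name without a trailing '/<digits>' suffix, if present."""
    return _READ_NUMBER.sub("", name)


if __name__ == "__main__":
    for name in ["read42/1", "read42/2", "read42"]:
        print(name, "->", strip_read_number(name))
```

Names without such a suffix are returned unchanged, and only the final suffix is removed.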


Tools for RNA-Seq data.

Tool Description
CollectErccMetrics Collects metrics for ERCC spike-ins for RNA-Seq experiments
EstimateRnaSeqInsertSize Computes the insert size for RNA-Seq experiments


Tools for manipulating SAM, BAM, or related data.

Tool Description
AnnotateBamWithUmis Annotates existing BAM files with UMIs (Unique Molecular Indices, aka Molecular IDs, Molecular barcodes) from a separate FASTQ file
AutoGenerateReadGroupsByName Adds read groups to a BAM file for a single sample by parsing the read names
ClipBam Clips reads from the same template
ErrorRateByReadPosition Calculates the error rate by read position on coordinate sorted mapped BAMs
EstimatePoolingFractions Examines sequence data generated from a pooled sample and estimates the fraction of sequence data coming from each constituent sample
ExtractUmisFromBam Extracts unique molecular indexes from reads in a BAM file into tags
FilterBam Filters reads out of a BAM file
FindTechnicalReads Finds reads that are from technical or synthetic sequences in a BAM file
RandomizeBam Randomizes the order of reads in a SAM or BAM file
RemoveSamTags Removes SAM tags from a SAM or BAM file
SetMateInformation Adds and/or fixes mate information on paired-end reads
SortBam Sorts a SAM or BAM file
SplitBam Splits a BAM into multiple BAMs, one per read group (or library)
TrimPrimers Trims primers from reads post-alignment
UpdateReadGroups Updates one or more read groups and their identifiers

Unique Molecular Identifiers (UMIs)

Tools for manipulating UMIs and reads tagged with UMIs.

Tool Description
CallDuplexConsensusReads Calls duplex consensus sequences from reads generated from the same double-stranded source molecule
CallMolecularConsensusReads Calls consensus sequences from reads with the same unique molecular tag
CollectDuplexSeqMetrics Collects a suite of metrics to QC duplex sequencing data
CorrectUmis Corrects UMIs stored in BAM files when a set of fixed UMIs is in use
FilterConsensusReads Filters consensus reads generated by CallMolecularConsensusReads or CallDuplexConsensusReads
GroupReadsByUmi Groups reads together that appear to have come from the same original molecule
ReviewConsensusVariants Extracts data to make reviewing of variant calls from consensus reads easier


Various utility programs.

Tool Description
PickIlluminaIndices Picks a set of molecular indices that should work well together
PickLongIndices Picks a set of molecular indices that have at least a given number of mismatches between them
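
The idea of a set of indices with at least a given number of mismatches between them can be sketched in Python as follows. Mismatches are counted position by position (Hamming distance) between equal-length sequences. The greedy selection shown here is an assumption chosen for brevity, the function names are hypothetical, and none of this is claimed to be the algorithm fgbio uses.

```python
from itertools import combinations


def mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length index sequences differ."""
    if len(a) != len(b):
        raise ValueError("index sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_mismatches(indices) -> int:
    """Smallest mismatch count over all pairs; requires at least two indices."""
    return min(mismatches(a, b) for a, b in combinations(indices, 2))


def pick_indices(candidates, min_mismatches: int):
    """Greedily keep each candidate that is far enough from every index kept so far."""
    picked = []
    for candidate in candidates:
        if all(mismatches(candidate, kept) >= min_mismatches for kept in picked):
            picked.append(candidate)
    return picked


if __name__ == "__main__":
    chosen = pick_indices(["AAAA", "AAAT", "TTTT", "ATTT"], min_mismatches=3)
    print(chosen, min_pairwise_mismatches(chosen))
```

A greedy pass depends on the order of the candidates and does not necessarily find the largest possible set.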


Tools for manipulating VCF, BCF, or related data.

Tool Description
AssessPhasing Assesses the accuracy of phasing for a set of variants
FilterSomaticVcf Applies one or more filters to a VCF of somatic variants
HapCutToVcf Converts the output of ‘HAPCUT’ (‘HapCut1’/‘HapCut2’) to a VCF
MakeMixtureVcf Creates a VCF with one sample whose genotypes are a mixture of other samples’
MakeTwoSampleMixtureVcf Creates a simulated tumor or tumor/normal VCF by in-silico mixing genotypes from two samples