Tools for working with genomic and high throughput sequencing data.

fgbio tools

The following tools are available in fgbio version 2.2.1.


Tools for manipulating basecalling data.

Tool Description
ExtractBasecallingParamsForPicard Extracts sample and library information from a sample sheet for a given lane
ExtractIlluminaRunInfo Extracts information about an Illumina sequencing run from the RunInfo


Tools for manipulating FASTA files.

Tool Description
CollectAlternateContigNames Collates the alternate contig names from an NCBI assembly report
HardMaskFasta Converts soft-masked sequence to hard-masked in a FASTA file
SortSequenceDictionary Sorts a sequence dictionary file in the order of another sequence dictionary
UpdateFastaContigNames Updates the sequence names in a FASTA
UpdateIntervalListContigNames Updates the sequence names in an Interval List file
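
As an illustration of what the HardMaskFasta description means, here is a minimal Python sketch of the soft-masked to hard-masked conversion. It assumes the usual convention that soft-masked bases are written in lowercase and hard-masked bases are written as N. The function names are hypothetical and this is not fgbio's implementation.

```python
def hard_mask_sequence(seq: str) -> str:
    """Replace soft-masked (lowercase) bases with 'N'; leave other characters unchanged."""
    return "".join("N" if base.islower() else base for base in seq)


def hard_mask_fasta_lines(lines):
    """Yield FASTA lines with sequence lines hard-masked and header lines untouched."""
    for line in lines:
        text = line.rstrip("\n")
        if text.startswith(">"):
            # Header lines carry the sequence name and are passed through as-is.
            yield text
        else:
            yield hard_mask_sequence(text)


# Hypothetical in-memory FASTA used only for demonstration.
example = [">chr1 example", "ACGTacgtNNAC", "ttgaCCA"]
masked = list(hard_mask_fasta_lines(example))
print("\n".join(masked))
```

Uppercase bases and existing N characters are left alone, so applying the conversion twice gives the same result as applying it once.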


Tools for manipulating FASTQ files.

Tool Description
DemuxFastqs Performs sample demultiplexing on FASTQs
FastqToBam Generates an unmapped BAM (or SAM or CRAM) file from FASTQ files
SortFastq Sorts a FASTQ file
TrimFastq Trims reads in one or more line-matched FASTQ files to a specific read length
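
To make the idea of trimming reads to a fixed length concrete, the following Python sketch truncates the sequence and quality lines of each four-line FASTQ record. It is an assumption-laden illustration only: the function name is hypothetical, records are assumed to be exactly four lines each, and the handling of reads shorter than the target length (kept unchanged here) may differ from what TrimFastq does.

```python
def trim_fastq_records(lines, length):
    """Trim the sequence and quality lines of four-line FASTQ records to `length` bases."""
    cleaned = [line.rstrip("\n") for line in lines]
    if len(cleaned) % 4 != 0:
        raise ValueError("FASTQ input must contain a multiple of four lines")
    trimmed = []
    for i in range(0, len(cleaned), 4):
        header, seq, separator, qual = cleaned[i:i + 4]
        # Sequence and quality strings must be truncated together to stay in sync.
        trimmed.extend([header, seq[:length], separator, qual[:length]])
    return trimmed


# Hypothetical record used only for demonstration.
record = ["@read1", "ACGTACGT", "+", "IIIIHHHH"]
print("\n".join(trim_fastq_records(record, 5)))
```

For line-matched files, the same function would be applied to each file with the same length, so that record N in every file still refers to the same template.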


Tools for RNA-Seq data.

Tool Description
CollectErccMetrics Collects metrics for ERCC spike-ins for RNA-Seq experiments
EstimateRnaSeqInsertSize Computes the insert size for RNA-Seq experiments


Tools for manipulating SAM, BAM, or related data.

Tool Description
AnnotateBamWithUmis Annotates existing BAM files with UMIs (Unique Molecular Indices, aka Molecular IDs, Molecular barcodes) from separate FASTQ files
AssignPrimers Assigns reads to primers post-alignment
AutoGenerateReadGroupsByName Adds read groups to a BAM file for a single sample by parsing the read names
CallOverlappingConsensusBases Consensus calls overlapping bases in read pairs
ClipBam Clips reads from the same template
DownsampleAndNormalizeBam Downsamples a BAM in a biased way to a uniform coverage across regions
ErrorRateByReadPosition Calculates the error rate by read position on coordinate sorted mapped BAMs
EstimatePoolingFractions Examines sequence data generated from a pooled sample and estimates the fraction of sequence data coming from each constituent sample
ExtractUmisFromBam Extracts unique molecular indexes from reads in a BAM file into tags
FilterBam Filters reads out of a BAM file
FindSwitchbackReads Finds reads where a template switch occurred during library construction
FindTechnicalReads Finds reads that are from technical or synthetic sequences in a BAM file
RandomizeBam Randomizes the order of reads in a SAM or BAM file
RemoveSamTags Removes SAM tags from a SAM or BAM file
SetMateInformation Adds and/or fixes mate information on paired-end reads
SortBam Sorts a SAM or BAM file
SplitBam Splits a BAM into multiple BAMs, one per read group (or library)
TrimPrimers Trims primers from reads post-alignment
UpdateReadGroups Updates one or more read groups and their identifiers
ZipperBams Zips together an unmapped and mapped BAM to transfer metadata into the output BAM

Unique Molecular Identifiers (UMIs)

Tools for manipulating UMIs & reads tagged with UMIs

Tool Description
CallDuplexConsensusReads Calls duplex consensus sequences from reads generated from the same double-stranded source molecule
CallMolecularConsensusReads Calls consensus sequences from reads with the same unique molecular tag
CollectDuplexSeqMetrics Collects a suite of metrics to QC duplex sequencing data
CopyUmiFromReadName Copies the UMI at the end of the BAM’s read name to the RX tag
CorrectUmis Corrects UMIs stored in BAM files when a set of fixed UMIs is in use
FilterConsensusReads Filters consensus reads generated by CallMolecularConsensusReads or CallDuplexConsensusReads
GroupReadsByUmi Groups reads together that appear to have come from the same original molecule
ReviewConsensusVariants Extracts data to make reviewing of variant calls from consensus reads easier


Various utility programs.

Tool Description
PickIlluminaIndices Picks a set of molecular indices that should work well together
PickLongIndices Picks a set of molecular indices that have at least a given number of mismatches between them
UpdateDelimitedFileContigNames Updates the contig names in columns of a delimited data file
UpdateGffContigNames Updates the contig names in a GFF
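
The PickLongIndices description refers to indices that have at least a given number of mismatches between them. The Python sketch below shows one simple way to express that constraint, using Hamming distance and a greedy selection. It is a hypothetical illustration and is not the algorithm used by PickLongIndices; all function names and the candidate list are invented for the example.

```python
from itertools import combinations


def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length indices differ."""
    if len(a) != len(b):
        raise ValueError("indices must have the same length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_mismatches(indices):
    """Return the smallest Hamming distance between any two indices in the set."""
    return min(hamming_distance(a, b) for a, b in combinations(indices, 2))


def greedy_pick(candidates, min_mismatches):
    """Keep each candidate that is far enough from every index picked so far."""
    picked = []
    for candidate in candidates:
        if all(hamming_distance(candidate, p) >= min_mismatches for p in picked):
            picked.append(candidate)
    return picked


# Hypothetical candidate indices used only for demonstration.
candidates = ["AAAA", "AAAT", "TTTT", "AATT", "CCCC"]
picked = greedy_pick(candidates, min_mismatches=3)
print(picked, min_pairwise_mismatches(picked))
```

A greedy pass depends on candidate order and does not guarantee the largest possible set; it is used here only because it keeps the mismatch constraint easy to see.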


Tools for manipulating VCF, BCF, or related data.

Tool Description
AssessPhasing Assesses the accuracy of phasing for a set of variants
FilterSomaticVcf Applies one or more filters to a VCF of somatic variants
FixVcfPhaseSet Adds/fixes the phase set (PS) genotype field
HapCutToVcf Converts the output of ‘HAPCUT’ (‘HapCut1’/‘HapCut2’) to a VCF
MakeMixtureVcf Creates a VCF with one sample whose genotypes are a mixture of other samples’
MakeTwoSampleMixtureVcf Creates a simulated tumor or tumor/normal VCF by in-silico mixing genotypes from two samples
UpdateVcfContigNames Updates the contig names in a VCF