fgbio

Tools for working with genomic and high throughput sequencing data.

View the Project on GitHub

UpdateDelimitedFileContigNames

Overview

Group: Utilities

Updates the contig names in columns of a delimited data file (e.g. CSV, TSV).

The name of each sequence must match one of the names (including aliases) in the given sequence dictionary. The new name will be the primary (non-alias) name in the sequence dictionary. Use --skip-missing to ignore lines where a contig name could not be updated (i.e. missing from the sequence dictionary).

Arguments

Name Flag Type Description Required? Max # of Values Default Value(s)
input i FilePath Input delimited data file. Required 1  
dict d PathToSequenceDictionary The path to the sequence dictionary with contig aliases. Required 1  
columns c Int The column indices for the contig names (0-based). Required Unlimited  
delimiter T Char The delimiter Optional 1 \t
comment H String Treat lines with this starting string as comments (always printed) Optional 1 #
output o FilePath Output delimited data file. Required 1  
output-first-num-lines n Int Output the first N lines as-is (always printed). Optional 1 0
skip-missing   Boolean Skip lines where a contig name could not be updated (i.e. missing from the sequence dictionary). Optional 1 false
sort-order s SortOrder Sort the output based on the following order. Optional 1 Unsorted
contig   Int The column index for the contig (0-based) for sorting. Use the first column if not given. Optional 1  
position   Int The column index for the genomic position (0-based) for sorting by coordinate. Optional 1  
max-objects-in-ram   Int The maximum number of objects to store in memory Optional 1 1000000