A tab-separated table containing the estimated abundances of GTDB taxons in a metagenome. It is in TSV format with 3 columns, with each row corresponding to a taxon. A taxonomic profile may also be called a condensed profile, since it is the output of the condense
algorithm within the main pipe
workflow. Taxonomic profiles can be converted to other formats using singlem summarise
. Columns:
sample coverage taxonomy
ERR1914274 0 Root
ERR1914274 3.16 Root; d__Bacteria
ERR1914274 0 Root; d__Bacteria; p__Pseudomonadota
ERR1914274 0.06 Root; d__Bacteria; p__Pseudomonadota; c__Gammaproteobacteria
ERR1914274 0 Root; d__Bacteria; p__Bacillota_A
ERR1914274 0.61 Root; d__Bacteria; p__Bacillota_A; c__Clostridia
ERR1914274 0 Root; d__Bacteria; p__Bacteroidota
ERR1914274 0.39 Root; d__Bacteria; p__Bacteroidota; c__Bacteroidia
ERR1914274 0 Root; d__Bacteria; p__Bacillota
...
A table containing window sequences per metagenome/contig and marker gene. It may be in default form (a TSV with 6 columns, like below), or an extended form with more detail in further columns. The default OTU table output from pipe, renew and summarise subcommands has 6 columns, with one sequence per row. The extended form OTU table and archive OTU tables have further information (see below). Columns of a default OTU table:
gene sample sequence num_hits coverage taxonomy
4.21.ribosomal_protein_S19_rpsS my_sequences TGGTCGCGCCGTT... 1 1.64 Root; d__Bacteria; p__Proteobacteria; c__Deltaproteobacteria; o__Desulfuromonadales
4.21.ribosomal_protein_S19_rpsS my_sequences TGGTCGCGGCGCT... 1 1.64 Root; d__Bacteria; p__Acidobacteria; c__Solibacteres; o__Solibacterales; f__Solibacteraceae; g__Candidatus_Solibacter; s__Candidatus_Solibacter_usitatus
The extended OTU table form generated with the --output-extras
option to the pipe, renew and summarise subcommands, has all the columns of a regular OTU table, but with several additional columns which contain more information about each OTU:
Similar to an extended form OTU table, but in JSON form for machine readability and with formatting version recorded. The renew subcommand which re-analyses a dataset requires this format of OTU table rather than the default tab-separated OTU table format. The canonical file extension for SingleM packages is .json
.
First, to define coverage. Coverage of a single species is defined in the standard way as in e.g. CoverM. It is the average number of reads covering each position in that species' genome. SingleM (and Lyrebird) do not directly calculate this via mapping, but instead assume that the coverage of a species' single copy marker genes is representative of the coverage of the whole genome. In particular, it is estimated from the reads that encode sequences recruited to the highly conserved marker gene windows.
In a standard taxonomic profile output by pipe, the coverage column is unfilled coverage. This means that the coverage of a taxon does not include the coverage of its sub-taxons. For example, in this profile:
sample coverage taxonomy
ERR1914274 0 Root
ERR1914274 3.1 Root; d__Bacteria
ERR1914274 4.0 Root; d__Bacteria; p__Pseudomonadota
ERR1914274 2.9 Root; d__Archaea
The unfilled coverage of d__Bacteria
is 3.1. This means that SingleM estimates that 3.1x read coverage of bacterial genomes that do not belong to p__Pseudomonadota
(the only sub-taxon of d__Bacteria
listed).
The filled coverage of d__Bacteria
is 7.1, calculated as the coverage of d__Bacteria
(3.1) plus the coverage of all of its descendents (Here just p__Pseudomonadota
so 4.0).
To calculate relative abundance, the filled coverage is used. For example, the relative abundance of d__Bacteria
in the above profile is 7.1 / (7.1 + 2.9) = 0.71 or 71%. The relative abundance of p__Pseudomonadota
is 4.0 / (7.1 + 2.9) = 0.40 or 40%.
The coverage of a genus is the sum of the coverages of all its species, and so on up the taxonomic tree.
Reference data for one particular marker gene and its window position. The canonical file extension for SingleM packages is .spkg
.
A collection of SingleM packages, with additional indices. The canonical file extension for SingleM metapackages is .smpkg
.
An OTU table which has been converted to SQLite3 format and sequence similarity search indexes. Canonically SingleM databases are named with the .sdb
extension, but this is not enforced. SingleM databases are created with the makedb subcommand, and queried with the query subcommand.
Powered by Doctave