The difference between metagenomics and amplicon sequencing
(i.e. why a 16S survey is NOT metagenomics)
This post was triggered by a manuscript in which the authors referred to their 16S rRNA gene survey as “amplicon-metagenomics” and “amplicon-based metagenomics”. A subsequent Google search revealed that these misleading terms have been used in several scientific papers, including in Nature journals, e.g. (Xu et al. 2015). The range of chimeric terms invented for 16S amplicon studies includes misnomers such as “16S rRNA gene-based metagenomic analysis” and “16S rRNA metagenomics-based survey”.
A large sequencing company takes the cake by jumping on this misleading bandwagon and naming their app, designed to classify 16S rRNA amplicon reads, the “16S Metagenomics app”. They even provide their own curated version of the Greengenes 16S database and call it a “16S Metagenomics Database”.
Let us give these scientists the benefit of the doubt and assume they are not familiar with the differences between metagenomics and amplicon sequencing. Indeed, there are similarities, since both terms refer to techniques designed to bypass the traditional culturing bottleneck and hence work on DNA extracted directly from the environment. However, the differences in (1) underlying methodology, (2) data utilisation, and (3) definitions are vast.
In contrast to amplicon sequencing, metagenomics targets all DNA in a sample, with the aim of recovering genomes or at least large fragments thereof. Nowadays shotgun sequencing, a technique to sequence random DNA fragments, has become the main workhorse in metagenomics. Shotgun metagenomics results in short DNA sequences (reads). These reads are bioinformatically assembled into larger fragments (contigs and scaffolds), which are then binned to recover entire genomes, termed metagenome-assembled genomes (MAGs). Recently, long-read sequencing technologies (e.g. by Oxford Nanopore and PacBio) have been paired with shotgun sequencing to generate MAGs of superior quality, but that is again another story.
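To make the methodological contrast concrete, here is a toy Python sketch. It is an illustration only, not a real sequencing or assembly pipeline: the random "genome", the read length, the region coordinates, and the function names (`shotgun_reads`, `amplicon_reads`, `covered_fraction`) are all made up for this example. It compares how much of a genome is touched by reads drawn from random positions (shotgun) with reads that all come from one primer-defined region (amplicon).

```python
import random


def shotgun_reads(genome, read_len, n_reads, seed=0):
    """Toy shotgun sequencing: reads start at random positions in the genome."""
    rng = random.Random(seed)
    last_start = len(genome) - read_len
    starts = [rng.randrange(0, last_start + 1) for _ in range(n_reads)]
    return [(s, genome[s:s + read_len]) for s in starts]


def amplicon_reads(genome, region_start, region_end, n_reads):
    """Toy amplicon sequencing: every read is a copy of the same target region."""
    target = genome[region_start:region_end]
    return [(region_start, target) for _ in range(n_reads)]


def covered_fraction(reads, genome_len):
    """Fraction of genome positions covered by at least one read."""
    covered = [False] * genome_len
    for start, seq in reads:
        for pos in range(start, start + len(seq)):
            covered[pos] = True
    return sum(covered) / genome_len


# Hypothetical 10 kb genome of random bases (assumption for illustration).
rng = random.Random(42)
genome = "".join(rng.choice("ACGT") for _ in range(10_000))

shotgun = shotgun_reads(genome, read_len=150, n_reads=500, seed=1)
amplicon = amplicon_reads(genome, region_start=4_000, region_end=4_250, n_reads=500)

shotgun_cov = covered_fraction(shotgun, len(genome))
amplicon_cov = covered_fraction(amplicon, len(genome))

print(f"shotgun reads cover  {shotgun_cov:.1%} of the genome")
print(f"amplicon reads cover {amplicon_cov:.1%} of the genome")
```

With the same number of reads, the amplicon set never leaves its 250 bp window, whereas the shotgun set spreads across nearly the whole sequence. That spread is what makes assembly into contigs and, eventually, MAGs possible in the first place.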
2) Data utilisation
In summary, based on the differences in 1) the applied methodology, 2) the resulting data/information, and 3) the definition of the term, metagenomics is clearly distinguished from amplicon sequencing. Therefore, disguising an amplicon study, e.g. a 16S survey, as a metagenomics analysis is not only bad practice, it also misleads the reader into believing that genomic data with all its possible inferences will be presented, when in fact the analysis is based on a single gene alone.
I hope this post clarifies what sets metagenomics apart from amplicon sequencing and that it will help reduce the practice of using misleading characterisations for 16S rRNA gene based and other amplicon studies.
Q: What if I target multiple genes with amplicon sequencing? If I develop primers to amplify tens or even hundreds of genes, is it metagenomics then?