Short-read gene-centric metagenomics, including contig assembly, detection and functional annotation of protein-coding sequences, gene abundance profiles (KEGG), pairwise dissimilarity metrics, MDS and various visualizations. The client provides raw shotgun metagenomic sequences and sample metadata.
Log in to see availability and payment modalities.
▾ General introduction
Metagenomic sequencing is a well-established and powerful technique in modern microbiology, in which random DNA fragments obtained from a biological sample are sequenced and analyzed.
These DNA fragments generally originate from many different taxa and from all genomic regions, thus offering a deep view of a microbial community.
Metagenomics has been used to study microbial communities in virtually all ecosystems, ranging from the deep ocean and subsurface sediments, to soil, bioreactors and the human gut.
In gene-centric metagenomics, one specifically focuses on the detection and identification of protein-coding genes, or fragments of such genes,
in order to determine the likely biochemical functions or metabolic potential of resident microbes. This information becomes very powerful when combined with experimental treatments, or surveys across space or time.
For example, one can use gene-centric metagenomics to determine the proportions of microbes capable of photosynthesis, iron respiration or denitrification,
or the prevalence of antibiotic resistance genes, separately in each sample. One could then examine whether these variables change across an environmental gradient, or between patient treatment groups.
It is important to note that these "functional profiles" summarize the overall microbial community in each sample, and do not resolve which taxon happens to encode which biochemical function.
For many studies this is plenty and sufficient information. If an identification of individual taxa involved with specific biochemical functions is needed, one can instead use genome-resolved metagenomics, which is a separate analysis that we also offer.
A typical gene-centric metagenomic study proceeds as follows:
Collection of small amounts (<1 g) of material from each sample by the researcher.
Extraction of DNA from each sample using an in-house of commercial kit. This step is sometimes outsourced to an academic or commercial service provider.
DNA fragment size selection, library preparation and sequencing of the fragments. This step is commonly performed by an academic or commercial service provider. The most widespread technology is short read Illumina sequencing, which yields large numbers of sequences around 150-300 bp long.
Sequencing ultimately yields a separate set of DNA sequences for each sample, ranging from thousands to billions of sequences per sample, with each sequence covering some random part of some genome. These data are commonly stored in fastq files, which are delivered by the sequencing service provider to the researcher.
Computational analysis of the sequences, including trimming and removal of poor quality (i.e., likely erroneous) sequences, assembly of sequences into longer contiguous segments ("contigs"), detection and annotation of protein-coding genes, and estimation of gene abundances based on reads mapped to contigs.
Statistical analysis, hypothesis testing and visualization of gene/functional profiles. This step generally incorporates additional sample metadata, such as information about treatment groups, chemical measurements at each site, disease symptoms in human subjects, and so on.
Caveats and limitations: Before using gene-centric metagenomics for your project, you should consider the following caveats and limitations:
Gene-centric metagenomics is not the most accurate nor cost-effective option for taxonomic profiling, i.e., determining which taxa are present in your system. If the latter is your goal then you should consider metabarcoding (e.g., 16S or 18S), both of which we can also help you with.
Gene-centric metagenomics does not yield information on which taxa exhibit which biochemical pathways, i.e., the functional profiles represent the overall genetic content of the community.
Functional identification of detected genes only work as well as our existing gene reference databases; a lot of genes found in nature remain unaccounted for in databases or have unknown functions.
We are eager to help you with your metagenomic analysis. Simply configure the analysis to your preferences, upload your raw sequences and metadata, and we can handle it from there.
▸ Overview of provided analysis
This analysis starts with raw short-read Illumina metagenomic sequences, which are provided by the client and typically obtained from a sequencing service provider.
We deliver a summary report and key data products for presentations and downstream investigations.
Main steps and deliverables:
Basic quality filtering and trimming of sequences to improve overall data quality.
Assembly of reads into contiguous sequences (contigs).
Detection, functional annotation and quantification of protein-coding genes.
Multiple common pairwise dissimilarity metrics (aka. β-diversities) are computed between samples, measuring the differences in genetic compositions.
Two-dimensional multidimensional scaling is performed based on the pairwise dissimilarities.
Common visualizations, such as barplots of relative gene abundances and MDS plots.
In addition, we also provide a thorough Materials & Methods writeup for use in your publications.
▸ Input requirements
Short-read DNA sequences from an Illumina next-generation sequencing platform or similar. Sequences must be provided as demultiplexed fastq files, one file per sample and per read direction. For paired-end reads, you will thus need to provide two fastq files per sample.
Metadata for all samples, in the form of a table file (e.g., CSV or Excel). This table must at the very least specify sample IDs.
▸ Examples of data products
EC_proportions_vs_sample.tsv
Table listing relative abundances of genes associated with various Enzyme Commission numbers in each sample.
KEGG_B_proportions_vs_sample.tsv
Table listing relative abundances of KEGG level-B gene groups in each sample.
▸ Examples of generated figures
×<>
▸ Used 3rd party resources
Main databases and software used in this analysis:
Habibi-Soufi, H., Porch, R., Korchagina, M. V., Abrams, J. A., Schnider, J. S., Carr, B. D. et al. (2024). Taxonomic variability and functional stability across Oregon coastal subsurface microbiomes. Communications Biology 7:1663
Habibi-Soufi, H., Tran, D., Louca, S. (2024). Microbiology of Big Soda Lake, a multi-extreme meromictic volcanic crater lake in the Nevada desert. Environmental Microbiology 26:e16578
Louca, S., Jacques, S. M. S., Pires, A. P. F., Leal, J. S., Srivastava, D. S., Parfrey, L. W. et al. (2016). High taxonomic variability despite stable functional structure across microbial communities. Nature Ecology & Evolution 1:0015
▸ Price and billing
Price starts at $50 base + $5 per sample + applicable taxes. Final price may differ depending on user settings, and will be available prior to order submission. Log in to see availability and payment modalities.