Here is a list of the most commonly used assemblers for metagenomic reads. The list is extensive but by no means complete. I will try to update it as soon as I come across a new one. Help me keep the list updated if you come across any new and interesting assembler I have missed.
MetaVelvet: http://metavelvet.dna.bio.keio.ac.jp/
An extension of Velvet assembler to de novo metagenome assembly from short
sequence reads. An important step in ‘metagenomics’ analysis is the assembly of
multiple genomes from mixed sequence reads of multiple species in a microbial
community. Most conventional pipelines use a single-genome assembler with
carefully optimized parameters. A limitation of a single-genome assembler for
de novo metagenome assembly is that sequences of highly abundant species are
likely misidentified as repeats in a single genome, resulting in a number of
small fragmented scaffolds. We extended a single-genome assembler for short
reads, known as ‘Velvet’, to metagenome assembly, which we called ‘MetaVelvet’,
for mixed short reads of multiple species. Our fundamental concept was to first
decompose a de Bruijn graph constructed from mixed short reads into individual
sub-graphs, and second, to build scaffolds based on each decomposed de Bruijn
sub-graph as an isolate species genome. We made use of two features, the
coverage (abundance) difference and graph connectivity, for the decomposition
of the de Bruijn graph. For simulated datasets, MetaVelvet succeeded in
generating significantly higher N50 scores than any single-genome assemblers.
MetaVelvet also reconstructed relatively low-coverage genome sequences as
scaffolds. On real datasets of human gut microbial read data, MetaVelvet
produced longer scaffolds and increased the number of predicted genes.
http://nar.oxfordjournals.org/content/40/20/e155.short
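To make the decomposition idea concrete, here is a toy Python sketch. It is not MetaVelvet's actual algorithm. It assumes a single hypothetical `coverage_cutoff` that splits k-mers into a high-coverage and a low-coverage class, links k-mers that overlap by k-1 bases only when they fall in the same class, and reports the connected components. The function names are my own, and reverse complements and sequencing errors are ignored.

```python
from collections import Counter, defaultdict


def kmer_counts(reads, k):
    """Count every k-mer occurring in the reads (a crude coverage estimate)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def decompose(counts, coverage_cutoff):
    """Split k-mers by coverage class, then into connected components.

    Two k-mers are linked when the (k-1)-suffix of one equals the
    (k-1)-prefix of the other AND both are in the same coverage class.
    """
    # Assumption: one cutoff, two classes (True = high coverage).
    label = {kmer: c >= coverage_cutoff for kmer, c in counts.items()}

    # Index k-mers by (class, prefix) so successors can be looked up quickly.
    by_prefix = defaultdict(list)
    for kmer in counts:
        by_prefix[(label[kmer], kmer[:-1])].append(kmer)

    # Undirected adjacency restricted to same-class neighbors.
    adj = defaultdict(set)
    for kmer in counts:
        for nxt in by_prefix[(label[kmer], kmer[1:])]:
            adj[kmer].add(nxt)
            adj[nxt].add(kmer)

    # Depth-first search to collect connected components.
    seen, components = set(), []
    for start in counts:
        if start in seen:
            continue
        stack, comp = [start], set()
        seen.add(start)
        while stack:
            node = stack.pop()
            comp.add(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        components.append(comp)
    return components
```

On two toy genomes that share a (k-1)-mer, a cutoff between their coverages yields two sub-graphs, whereas a cutoff of 1 puts every k-mer in the same class and leaves a single connected graph.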
MetAMOS: https://github.com/marbl/metAMOS
A metagenomic assembly and analysis pipeline for AMOS. We describe MetAMOS, an
open source and modular metagenomic assembly and analysis pipeline. MetAMOS
represents an important step towards fully automated metagenomic analysis,
starting with next-generation sequencing reads and producing genomic scaffolds,
open-reading frames and taxonomic or functional annotations. MetAMOS can aid in
reducing assembly errors, commonly encountered when assembling metagenomic
samples, and improves taxonomic assignment accuracy while also reducing
computational cost. MetAMOS can be downloaded from: https://github.com/treangen/MetAMOS.
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4053804/
IDBA-UD: http://www.cs.hku.hk/~alse/idba_ud
A de novo assembler for single-cell and metagenomic sequencing data with highly
uneven depth. Motivation: Next-generation sequencing allows us to sequence
reads from a microbial environment using single-cell sequencing or metagenomic
sequencing technologies. However, both technologies suffer from the problem
that sequencing depth of different regions of a genome or genomes from
different species are highly uneven. Most existing genome assemblers usually
have an assumption that sequencing depths are even. These assemblers fail to
construct correct long contigs. Results: We introduce the IDBA-UD algorithm
that is based on the de Bruijn graph approach for assembling reads from
single-cell sequencing or metagenomic sequencing technologies with uneven
sequencing depths. Several non-trivial techniques have been employed to tackle
the problems. Instead of using a simple threshold, we use multiple
depth-relative thresholds to remove erroneous k-mers in both low-depth and
high-depth regions. The technique of local assembly with paired-end information
is used to solve the branch problem of low-depth short repeat regions. To speed
up the process, an error correction step is conducted to correct reads of
high-depth regions that can be aligned to high-confidence contigs. Comparison of
the performances of IDBA-UD and existing assemblers (Velvet, Velvet-SC,
SOAPdenovo and Meta-IDBA) for different datasets, shows that IDBA-UD can
reconstruct longer contigs with higher accuracy. Availability: The IDBA-UD
toolkit is available at our website http://www.cs.hku.hk/~alse/idba_ud.
http://bioinformatics.oxfordjournals.org/content/28/11/1420.short
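The difference between a single fixed threshold and a depth-relative one can be illustrated with a short Python sketch. This is my own simplification, not the IDBA-UD implementation: it assumes the "local depth" of a k-mer is the highest count among the k-mer and its de Bruijn graph neighbors, and that a hypothetical `ratio` parameter sets the relative cutoff.

```python
def neighbors(kmer):
    """All k-mers that could follow or precede `kmer` in a de Bruijn graph."""
    for base in "ACGT":
        yield kmer[1:] + base   # possible successor
        yield base + kmer[:-1]  # possible predecessor


def filter_fixed(counts, threshold):
    """Keep k-mers seen at least `threshold` times, regardless of context."""
    return {kmer for kmer, c in counts.items() if c >= threshold}


def filter_relative(counts, ratio):
    """Keep k-mers whose count is at least `ratio` times the local depth.

    Assumption: local depth = max count over the k-mer and its neighbors.
    """
    kept = set()
    for kmer, c in counts.items():
        local = max([counts.get(nb, 0) for nb in neighbors(kmer)] + [c])
        if c >= ratio * local:
            kept.add(kmer)
    return kept
```

With a genome at depth 100 carrying an error k-mer seen 5 times, next to a genome at depth 3, no single fixed threshold both removes the error and keeps the rare genome, while a relative cutoff can do both.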
Meta-IDBA: http://www.cs.hku.hk/~alse/metaidba
A de novo assembler for metagenomic data. Motivation: Next-generation
sequencing techniques allow us to generate reads from a microbial environment
in order to analyze the microbial community. However, assembling of a set of
mixed reads from different species to form contigs is a bottleneck of metagenomic
research. Although there are many assemblers for assembling reads from a single
genome, there are no assemblers for assembling reads in metagenomic data
without reference genome sequences. Moreover, the performances of these
assemblers on metagenomic data are far from satisfactory, because of the
existence of common regions in the genomes of subspecies and species, which
make the assembly problem much more complicated. Results: We introduce the
Meta-IDBA algorithm for assembling reads in metagenomic data, which contain
multiple genomes from different species. There are two core steps in Meta-IDBA.
It first tries to partition the de Bruijn graph into isolated components of
different species based on an important observation. Then, for each component,
it captures the slight variants of the genomes of subspecies from the same
species by multiple alignments and represents the genome of one species, using
a consensus sequence. Comparison of the performances of Meta-IDBA and existing
assemblers, such as Velvet and Abyss for different metagenomic datasets shows
that Meta-IDBA can reconstruct longer contigs with similar accuracy. Availability:
Meta-IDBA toolkit is available at our website http://www.cs.hku.hk/~alse/metaidba.
http://bioinformatics.oxfordjournals.org/content/27/13/i94.short
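The second Meta-IDBA step, representing several subspecies variants by one consensus sequence, can be sketched in Python as a column-wise majority vote. This is only an illustration: it assumes the variants are already aligned to equal length with `-` as the gap character, and the function name is hypothetical.

```python
from collections import Counter


def consensus(aligned):
    """Majority-vote consensus of equal-length, pre-aligned sequences.

    Columns where a gap ('-') is the most common symbol are dropped.
    """
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("need a non-empty list of equal-length sequences")
    out = []
    for column in zip(*aligned):
        base, _ = Counter(column).most_common(1)[0]
        if base != "-":
            out.append(base)
    return "".join(out)
```

A real multiple alignment would also have to produce the aligned rows in the first place, which is the harder part and is left out here.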
Ray Meta: http://denovoassembler.sf.net
Voluminous parallel sequencing datasets, especially metagenomic experiments,
require distributed computing for de novo assembly and taxonomic profiling. Ray
Meta is a massively distributed metagenome assembler that is coupled with Ray
Communities, which profiles microbiomes based on uniquely-colored k-mers. It
can accurately assemble and profile a three billion read metagenomic experiment
representing 1,000 bacterial genomes of uneven proportions in 15 hours with
1,024 processor cores, using only 1.5 GB per core. The software will facilitate
the processing of large and complex datasets, and will help in generating
biological insights for specific environments. Ray Meta is open source and
available at http://denovoassembler.sf.net.
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4056372/
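Profiling with uniquely-colored k-mers can be sketched roughly as follows in Python. This is a toy version under my own assumptions, not Ray Communities itself: each reference genome is one "color", a k-mer counts only if it occurs in exactly one reference, and reverse complements are ignored. All function names are hypothetical.

```python
from collections import Counter, defaultdict


def kmers(seq, k):
    """Yield every k-mer of a sequence."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def unique_colors(references, k):
    """Map each k-mer found in exactly one reference to that reference's name."""
    colors = defaultdict(set)
    for name, seq in references.items():
        for kmer in kmers(seq, k):
            colors[kmer].add(name)
    return {kmer: next(iter(names))
            for kmer, names in colors.items() if len(names) == 1}


def profile(reads, unique, k):
    """Count, per reference, how many read k-mers carry its unique color."""
    hits = Counter()
    for read in reads:
        for kmer in kmers(read, k):
            if kmer in unique:
                hits[unique[kmer]] += 1
    return hits
```

K-mers shared between references are simply skipped, so only unambiguous evidence contributes to the profile.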
MAP: http://bioinfo.ctb.pku.edu.cn/MAP/
Motivation: A high-quality assembly of reads generated from shotgun sequencing
is a substantial step in metagenome projects. Although traditional assemblers
have been employed in initial analysis of metagenomes, they cannot surmount the
challenges created by the features of metagenomic data. Result: We present a de
novo assembly approach and its implementation named MAP (metagenomic assembly
program). Based on an improved overlap/layout/consensus (OLC) strategy
incorporated with several special algorithms, MAP uses the mate pair
information, resulting in being more applicable to shotgun DNA reads
(recommended as >200 bp) currently widely used in metagenome projects.
Results of extensive tests on simulated data show that MAP can be superior to
both Celera and Phrap for typical longer reads by Sanger sequencing, as well as
has an evident advantage over Celera, Newbler and the newest Genovo, for
typical shorter reads by 454 sequencing. Availability and implementation: The
source code of MAP is distributed as open source under the GNU GPL license, the
MAP program and all simulated datasets can be freely available at http://bioinfo.ctb.pku.edu.cn/MAP/.
http://bioinformatics.oxfordjournals.org/content/28/11/1455.short
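For readers new to the overlap/layout/consensus idea, here is a bare-bones Python sketch of the overlap and layout steps. It is a generic greedy merge, not MAP's improved OLC strategy: it assumes error-free reads, exact suffix-prefix matches, and a hypothetical `min_len` parameter, and it ignores mate-pair information and the consensus step entirely.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of `a` equal to a prefix of `b`.

    Returns 0 when no overlap of at least `min_len` bases exists.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_layout(reads, min_len):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(reads)
    while True:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]
```

The all-against-all comparison is quadratic in the number of reads, which is one reason real OLC assemblers rely on indexing and on longer reads.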
Genovo: http://cs.stanford.edu/group/genovo/
Next-generation sequencing
technologies produce a large number of noisy reads from the DNA in a sample.
Metagenomics and population sequencing aim to recover the genomic sequences of
the species in the sample, which could be of high diversity. Methods geared
towards single sequence reconstruction are not sensitive enough when applied in
this setting. We introduce a generative probabilistic model of read generation
from environmental samples and present Genovo, a novel de novo sequence
assembler that discovers likely sequence reconstructions under the model. A
nonparametric prior accounts for the unknown number of genomes in the sample.
Inference is performed by applying a series of hill-climbing steps iteratively
until convergence. We compare the performance of Genovo to three other short
read assembly programs in a series of synthetic experiments and across nine
metagenomic datasets created using the 454 platform, the largest of which has
311k reads. Genovo's reconstructions cover more bases and recover more genes
than the other methods, even for low-abundance sequences, and yield a higher
assembly score.
http://online.liebertpub.com/doi/abs/10.1089/cmb.2010.0244
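A standard nonparametric prior for an unknown number of groups is the Chinese Restaurant Process (CRP). The Python sketch below shows only the textbook CRP, not Genovo's full model or its inference: "customers" stand for reads, "tables" for candidate genome sequences, and the concentration parameter `alpha` (which must be positive) is a name I chose for illustration.

```python
import random


def crp_probabilities(table_sizes, alpha):
    """Probabilities of joining each existing table; the last entry is a new table."""
    total = sum(table_sizes) + alpha
    return [size / total for size in table_sizes] + [alpha / total]


def sample_partition(n_items, alpha, rng):
    """Seat `n_items` customers one by one; return assignments and table sizes."""
    tables = []      # tables[t] = number of customers at table t
    assignment = []  # assignment[i] = table index of customer i
    for _ in range(n_items):
        probs = crp_probabilities(tables, alpha)
        choice = rng.choices(range(len(probs)), weights=probs)[0]
        if choice == len(tables):
            tables.append(1)     # open a new table
        else:
            tables[choice] += 1  # join an existing, popular table
        assignment.append(choice)
    return assignment, tables
```

Calling `sample_partition(20, 1.0, random.Random(0))` seats 20 reads. Larger `alpha` values tend to open more tables, which is how the prior stays agnostic about the number of genomes.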
Extended Genovo: http://xgenovo.dna.bio.keio.ac.jp
Metagenomes
present assembly challenges, when assembling multiple genomes from mixed reads
of multiple species. An assembler for single genomes can’t adapt well when
applied in this case. A metagenomic assembler, Genovo, is a de novo assembler
for metagenomes under a generative probabilistic model. Genovo assembles all
reads without discarding any reads in a preprocessing step, and is therefore
able to extract more information from metagenomic data and, in principle,
generate better assembly results. Paired end sequencing is currently widely-used
yet Genovo was designed for 454 single end reads. In this research, we
attempted to extend Genovo by incorporating paired-end information, named
Xgenovo, so that it generates higher quality assemblies with paired end reads. First,
we extended Genovo by adding a bonus parameter in the Chinese Restaurant
Process used to get prior accounts for the unknown number of genomes in the
sample. This bonus parameter intends for a pair of reads to be in the same
contig and as an effort to solve chimera contig case. Second, we modified the
sampling process of the location of a read in a contig. We used relative
distance for the number of trials in the symmetric geometric distribution
instead of using distance between the offset and the center of contig used in
Genovo. Using this relative distance, a read sampled in the appropriate
location has higher probability. Therefore a read will be mapped in the correct
location. Results of extensive experiments on simulated metagenomic datasets
from simple to complex with species coverage setting following uniform and
lognormal distribution showed that Xgenovo can be superior to the original
Genovo and the recently proposed metagenome assembler for 454 reads, MAP.
Xgenovo successfully generated longer N50 than Genovo and MAP while maintaining
the assembly quality even for very complex metagenomic datasets consisting of
115 species. Xgenovo also demonstrated the potential to decrease the
computational cost. This means that our strategy worked well. The software and
all simulated datasets are publicly available online at http://xgenovo.dna.bio.keio.ac.jp.
https://peerj.com/articles/196/
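Several of the abstracts above compare assemblers by N50. As a reminder of what that number means, here is a small Python sketch using the usual definition: the length of the contig at which the running total, taken from longest to shortest, first reaches half of the total assembly length. The function name is my own.

```python
def n50(lengths):
    """Return the N50 of a list of contig (or scaffold) lengths."""
    if not lengths:
        raise ValueError("need at least one contig length")
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
```

A higher N50 means longer contigs, but says nothing about correctness, which is why the abstracts also report accuracy or assembly quality.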
SmashCommunity: http://www.bork.embl.de/software/smash
A metagenomic annotation and analysis tool. SmashCommunity is a stand-alone
metagenomic annotation and analysis pipeline suitable for data from Sanger and
454 sequencing technologies. It supports state-of-the-art software for
essential metagenomic tasks such as assembly and gene prediction. It provides
tools to estimate the quantitative phylogenetic and functional compositions of
metagenomes, to compare compositions of multiple metagenomes and to produce
intuitive visual representations of such analyses. Availability: SmashCommunity
source code and documentation are available at http://www.bork.embl.de/software/smash.
http://bioinformatics.oxfordjournals.org/content/26/23/2977.short
Bambus 2: http://amos.sf.net
Motivation: Sequencing
projects increasingly target samples from non-clonal sources. In particular,
metagenomics has enabled scientists to begin to characterize the structure of
microbial communities. The software tools developed for assembling and analyzing
sequencing data for clonal organisms are, however, unable to adequately process
data derived from non-clonal sources. Results: We present a new scaffolder,
Bambus 2, to address some of the challenges encountered when analyzing
metagenomes. Our approach relies on a combination of a novel method for
detecting genomic repeats and algorithms that analyze assembly graphs to
identify biologically meaningful genomic variants. We compare our software to
current assemblers using simulated and real data. We demonstrate that the
repeat detection algorithms have higher sensitivity than current approaches
without sacrificing specificity. In metagenomic datasets, the scaffolder avoids
false joins between distantly related organisms while obtaining long-range
contiguity. Bambus 2 represents a first step toward automated metagenomic
assembly. Availability: Bambus 2 is open source and available from http://amos.sf.net.
http://bioinformatics.oxfordjournals.org/content/27/21/2964.short
MetaCAA: https://metagenomics.atc.tcs.com/MetaCAA
A clustering-aided methodology for efficient assembly of metagenomic datasets. A
key challenge in analyzing metagenomics data pertains to assembly of sequenced
DNA fragments (i.e. reads) originating from various microbes in a given
environmental sample. Several existing methodologies can assemble reads
originating from a single genome. However, these methodologies cannot be
applied for efficient assembly of metagenomic sequence datasets. In this study,
we present MetaCAA — a clustering-aided methodology which helps in improving
the quality of metagenomic sequence assembly. MetaCAA initially groups
sequences constituting a given metagenome into smaller clusters. Subsequently,
sequences in each cluster are independently assembled using CAP3, an existing
single genome assembly program. Contigs formed in each of the clusters along
with the unassembled reads are then subjected to another round of assembly for
generating the final set of contigs. Validation using simulated and real-world
metagenomic datasets indicates that MetaCAA aids in improving the overall
quality of assembly. A software implementation of MetaCAA is available at https://metagenomics.atc.tcs.com/MetaCAA.
http://www.sciencedirect.com/science/article/pii/S0888754314000135