Friday, May 27, 2016

Consensus Phylogenetic trees of Fifteen Prokaryotic Aminoacyl-tRNA Synthetase Polypeptides based on Euclidean Geometry of All-Pairs Distances and Concatenation

An interesting one from my friend. 

For comments and queries please write to rbargaje[at]systemsbiology[dot]org

Most molecular phylogenetic trees depict the relative closeness or the extent of similarity among a set of taxa based on comparison of sequences of homologous genes or proteins. Because the tree topology for individual monogenic traits varies among the same set of organisms and does not match the taxonomic hierarchy, there is a need to generate multidimensional phylogenetic trees. Phylogenetic trees were constructed for 119 prokaryotes representing 2 phyla under Archaea and 11 phyla under Bacteria after comparing multiple sequence alignments for 15 different aminoacyl-tRNA synthetase polypeptides. The topology of Neighbor Joining (NJ) trees for individual tRNA synthetase polypeptides varied substantially. We used Euclidean geometry to estimate all-pairs distances in order to construct phylogenetic trees. Further, we used a novel 'Taxonomic fidelity' algorithm to estimate clade-by-clade similarity between the phylogenetic tree and the taxonomic tree. We find that, compared to trees for individual tRNA synthetase polypeptides and rDNA sequences, the topology of our Euclidean tree and that of the tree for aligned and concatenated sequences of 15 proteins are closer to the taxonomic trees and offer the best consensus. We also aligned sequences after concatenation, and find that changing the order in which sequences are joined prior to alignment changes the tree topologies. In contrast, changing the types of polypeptides in the grouping for Euclidean trees does not affect the tree topologies. We show that a consensus phylogenetic tree of 15 polypeptides from 14 aminoacyl-tRNA synthetases for 119 prokaryotes using Euclidean geometry exhibits better taxonomic fidelity than trees for individual tRNA synthetase polypeptides as well as 16S rDNA. We also examined Euclidean N-dimensional trees for 15 tRNA synthetase polypeptides, which give the same topology as that constructed after amalgamating 3-dimensional Euclidean trees for groups of 3 polypeptides.
Euclidean N-dimensional trees offer a reliable way forward for multi-genic molecular phylogenetics.
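
To make the idea of an N-dimensional Euclidean distance concrete, here is a minimal sketch. It assumes (this is not stated in the abstract) that each protein contributes one pairwise distance matrix over the same ordered set of taxa, and that the combined distance between two taxa is the square root of the sum of squared per-protein distances. The function name `combine_euclidean` and the toy matrices are hypothetical and only for illustration. Under this assumption, combining groups of matrices first and then combining the group results gives the same distances as combining everything at once, which is consistent with the abstract's remark about amalgamating lower-dimensional trees.

```python
import math

def combine_euclidean(matrices):
    """Combine per-protein distance matrices into one Euclidean matrix.

    Assumption (not from the paper): D[i][j] = sqrt(sum_k d_k[i][j] ** 2),
    where every d_k is a square matrix over the same ordered taxa.
    """
    n = len(matrices[0])
    return [
        [math.sqrt(sum(m[i][j] ** 2 for m in matrices)) for j in range(n)]
        for i in range(n)
    ]

# Toy pairwise distances for three taxa from three hypothetical proteins.
d1 = [[0.0, 0.3, 0.6], [0.3, 0.0, 0.4], [0.6, 0.4, 0.0]]
d2 = [[0.0, 0.4, 0.8], [0.4, 0.0, 0.3], [0.8, 0.3, 0.0]]
d3 = [[0.0, 1.2, 0.5], [1.2, 0.0, 0.7], [0.5, 0.7, 0.0]]

# Combine all three proteins in one step.
direct = combine_euclidean([d1, d2, d3])

# Combine two proteins first, then fold in the third.
nested = combine_euclidean([combine_euclidean([d1, d2]), d3])

# Both routes should agree up to floating-point rounding.
same = all(
    math.isclose(direct[i][j], nested[i][j], abs_tol=1e-12)
    for i in range(3)
    for j in range(3)
)
print(direct[0][1], same)
```

The resulting matrix could then be passed to any distance-based tree builder such as Neighbor Joining; that step is left out here.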

Prediction of peptidoglycan hydrolases- a new class of antibacterial proteins

Recent article from our lab: 

 Read more at:

For comments and queries please write to me or ashok[at]


The efficacy of antibiotics against bacterial infections is decreasing due to the development of resistance in bacteria, and thus there is a need to search for potential alternatives to antibiotics. In this scenario, peptidoglycan hydrolases can be used as alternative antibacterial agents due to their unique property of cleaving the peptidoglycan cell wall present in both gram-positive and gram-negative bacteria. Along with their role in maintaining overall peptidoglycan turnover in a cell and in daughter cell separation, peptidoglycan hydrolases also play a crucial role in bacterial pathophysiology. This calls for the development of a computational tool for the identification and classification of novel peptidoglycan hydrolases from genomic and metagenomic data.


In this study, the known peptidoglycan hydrolases were divided into multiple classes based on their site of action and were used to develop a computational tool, ‘HyPe’, for the identification and classification of novel peptidoglycan hydrolases from genomic and metagenomic data. Various classification models were developed using amino acid and dipeptide composition features by training and optimizing Random Forest and Support Vector Machine classifiers. The Random Forest multiclass model was selected for the HyPe tool as it showed up to 71.12 % sensitivity, 99.98 % specificity, 99.55 % accuracy and 0.80 MCC in four different classes of peptidoglycan hydrolases. The tool was validated on 24 independent genomic datasets and showed up to 100 % sensitivity and 0.94 MCC. The ability of HyPe to identify novel peptidoglycan hydrolases was also demonstrated on 24 metagenomic datasets.
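
As an illustration of the amino acid and dipeptide composition features mentioned above, here is a minimal sketch of how they are commonly computed. This is not HyPe's actual code. The function names, the use of percentages rather than fractions, and the handling of non-standard residues are all assumptions made for the example.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def amino_acid_composition(seq):
    """Percentage of each standard residue in the sequence (20 values)."""
    residues = [aa for aa in seq.upper() if aa in AMINO_ACIDS]
    total = len(residues)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [100.0 * residues.count(aa) / total for aa in AMINO_ACIDS]

def dipeptide_composition(seq):
    """Percentage of each adjacent residue pair (400 values).

    Assumption: pairs containing a non-standard residue are skipped.
    """
    seq = seq.upper()
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(counts)
    return [100.0 * c / total for c in counts.values()]

def composition_features(seq):
    """Concatenate both compositions into one 420-value feature vector."""
    return amino_acid_composition(seq) + dipeptide_composition(seq)

# Hypothetical toy peptide, used only to show the shape of the output.
features = composition_features("MKKA")
print(len(features))
```

A vector like this, computed for every ORF, is the kind of fixed-length input that Random Forest or Support Vector Machine classifiers can be trained on.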


The present tool helps in the identification and classification of novel peptidoglycan hydrolases from complete genomic or metagenomic ORFs. To our knowledge, this is the only tool available for the prediction of peptidoglycan hydrolases from genomic and metagenomic data.


Keywords: Peptidoglycan hydrolase, N-acetylglucosaminidase, N-acetylmuramidases, Lytic transglycosylases, Endopeptidase, N-acetylmuramoyl-L-alanine, Carboxypeptidase, Cell wall hydrolases, Support Vector Machine, Random Forest

Reconstruction of bacterial and viral genomes from multiple metagenomes

Recent article from our lab:

Several metagenomic projects have been accomplished or are in progress. However, in most cases, it is not feasible to generate complete genomic assemblies of species from the metagenomic sequencing of a complex environment. Only a few studies have reported the reconstruction of bacterial genomes from complex metagenomes. In this work, a Binning-Assembly approach is proposed and demonstrated for the reconstruction of bacterial and viral genomes from 72 human gut metagenomic datasets. A total of 1156 bacterial genomes belonging to 219 bacterial families and 279 viral genomes belonging to 84 viral families could be identified. Draft genome sequences that were more than 80% complete could be reconstructed for a total of 126 bacterial and 11 viral genomes. Selected draft assembled genomes could be validated with 99.8% accuracy using their ORFs. The study provides useful information on the assembly expected for a species given its number of reads and abundance. This approach, along with spiking, was also demonstrated to be useful in improving the draft assembly of a bacterial genome. The Binning-Assembly approach can be successfully used to reconstruct bacterial and viral genomes from multiple metagenomic datasets obtained from similar environments.

For comments and queries please write to me or ankitgmeister[at]gmail[dot]com.