Sequencing of the 16S rRNA gene is commonly performed to estimate microbial diversity in metagenomic studies. Rapid developments in genome sequencing technologies have shifted the focus to sequencing selected hypervariable regions (HVRs) of the 16S rRNA gene instead of the complete gene. Recent metagenomic projects sequence only a single HVR or a combination of two or more HVRs. At present, no specialized method is available for the correct identification and classification of species from short, variable 16S rRNA sequences. We have therefore developed 16S Classifier, which uses a machine learning method, Random Forest, for fast and accurate taxonomic classification of short hypervariable regions of the 16S rRNA sequence. It displayed precision values of up to 0.91 on training datasets and up to 0.98 on the first test dataset. On real metagenomic datasets, it showed up to 99.7% accuracy at the phylum level and up to 99.0% accuracy at the genus level.
16S Classifier displayed up to 42.9%, 40.7%, 41.0%, 57.9% and 73.8% higher accuracy at the phylum, class, order, family and genus levels, respectively, compared with the commonly used RDP Classifier program. In addition, it is 7.5 times faster than RDP Classifier and 800 times faster than BLAST. 16S Classifier can easily be used with the QIIME pipeline, which is commonly used for 16S rRNA analysis.
To the best of our knowledge, 16S Classifier is the only available tool that can carry out efficient, sensitive and accurate taxonomic assignment of any of the 16S rRNA hypervariable regions commonly used in metagenomic projects. For the complete 16S rRNA as well, it displayed exceptional performance (precision of 0.97) on the test dataset. Wide usage of this tool is therefore anticipated in metagenomic projects.
Instructions for running the stand-alone version of 16S Classifier on a Linux PC:
1. Download the zip file for a particular hypervariable region or for the complete 16S, freely available at http://metagenomics.iiserb.ac.in/16Sclassifier/download.html
2. Extract the zip file, which contains a model file (*.Rdata), a script file (*.sh) and an executable file (16sclassifier.exe).
Other dependencies:
1. Install R from http://cran.r-project.org/
2. Install the randomForest package: type R in a terminal to start R, then run install.packages('randomForest')
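Before running the classifier, it can help to confirm that both dependencies are in place. The following is a minimal sketch, not part of the 16S Classifier distribution: the function names (have_cmd, have_randomforest, report_deps) are illustrative, and it assumes that the Rscript front end shipped with a standard R installation is on the PATH.

```shell
#!/bin/sh
# Sketch: check that R and the randomForest package are available.
# All function names here are illustrative, not part of 16S Classifier.

# Succeeds if the given command is found on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Succeeds if Rscript exists and can load the randomForest package.
have_randomforest() {
    have_cmd Rscript && Rscript -e 'library(randomForest)' >/dev/null 2>&1
}

# Print one status line per dependency.
report_deps() {
    if have_cmd Rscript; then
        echo "R: found"
    else
        echo "R: missing (install from http://cran.r-project.org/)"
    fi
    if have_randomforest; then
        echo "randomForest: found"
    else
        echo "randomForest: missing (run install.packages('randomForest') in R)"
    fi
}
```

Calling report_deps after sourcing the file prints the status of each dependency without changing anything on the system.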
# Command line usage #
./16sclassifier.exe 'queryfile' 'modelname'
The query file should be in FASTA format, and the model name can be one of v2, v3, v4, v5, v6, v7, v8, v23, v34, v35, v45, v56, v67, v78 or Complete16S.
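The usage above can be wrapped in a small script that checks the two arguments before calling the executable. This is a hedged sketch under stated assumptions: the helper names (valid_model, looks_like_fasta, run_classifier) are hypothetical, the FASTA check only tests that the first non-empty line starts with '>', and the script is assumed to run from the directory that holds the extracted 16sclassifier.exe. Only the final command line is taken from the usage shown above.

```shell
#!/bin/sh
# Sketch: validate the inputs, then call 16sclassifier.exe.
# Helper names are illustrative; only the last command in run_classifier
# comes from the documented usage.

# Model names listed in the instructions.
MODELS="v2 v3 v4 v5 v6 v7 v8 v23 v34 v35 v45 v56 v67 v78 Complete16S"

# Succeeds if $1 is one of the documented model names.
valid_model() {
    for m in $MODELS; do
        [ "$1" = "$m" ] && return 0
    done
    return 1
}

# Succeeds if file $1 is readable and its first non-empty line starts with '>'.
looks_like_fasta() {
    [ -r "$1" ] || return 1
    first=$(grep -v '^[[:space:]]*$' "$1" | head -n 1)
    case "$first" in
        '>'*) return 0 ;;
        *)    return 1 ;;
    esac
}

# Validate both arguments, then run the classifier from the current directory.
run_classifier() {
    query=$1
    model=$2
    looks_like_fasta "$query" || { echo "not a FASTA file: $query" >&2; return 2; }
    valid_model "$model"      || { echo "unknown model: $model" >&2; return 2; }
    ./16sclassifier.exe "$query" "$model"
}
```

For example, after sourcing the file, run_classifier query.fasta v34 would classify a hypothetical query.fasta with the V3-V4 model, while an unknown model name such as v9 is rejected before the executable is started.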