Bioinformatics has evolved into a great tool for molecular biologists. Various tools are available that save the time required to analyze biological materials, be it DNA, RNA, proteins, etc. I wish to list here a few of the commonly used tools. Please send me suggestions to improve the content. If you like or dislike something, let me know; your input matters. Contact me: drsanjivk[at]gmail[dot]com
Sunday, January 1, 2012
Homology modeling of proteins
Tuesday, December 27, 2011
Circular dichroism code to help in data analysis
Steps:
1. Install Python (if you do not already have it: http://www.python.org/getit/)
2. Copy all CSV files into one folder, keeping their names
3. Write the names of the CSV files in one text file and save it as file_name.txt in the same folder as your data and code
a. To do this, get to the MS-DOS prompt or the Windows command line and navigate to the directory whose contents you wish to list. If you're new to the command line, familiarize yourself with the cd command and the dir command. Once in that directory, type this command: dir /b > file_name.txt
b. Open the newly created file_name.txt in the same folder and check the file names. If file_name.txt itself (or anything else that is not a data file) is listed, remove that line so that only the CSV file names remain in the text file.
4. Copy the code given below into Notepad and save it as a .py file (it's Python code) in the same folder
5. Right-click the Python file, open it in Python IDLE, and run it (press F5)
6. You will get a result file named final_file.txt. It is a tab-delimited text file with your mdeg and HV data ordered from 350 nm to 200 nm; open it with Excel. You can change the code to suit your needs: for example, if you are taking data from 260 nm down to 200 nm, change x=range(151) to x=range(61) and outfile.write(str(350-j)) to outfile.write(str(260-j)). Note that the code labels the rows by counting down 1 nm at a time from the starting wavelength, so it expects each file to start at that wavelength.
7. Hope that helps. If it works, thank Rhishikesh Bargaje, who wrote the code for me; if you face a problem, write to me and I will try to help.
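Steps 3a and 3b can also be done from Python instead of the command line. The following is only a minimal sketch: the function name write_file_list is made up for illustration, and it assumes that every file ending in .csv in the folder is a data file you want listed.

```python
import os


def write_file_list(folder=".", list_name="file_name.txt"):
    """Write the names of all .csv files in `folder` to `list_name`, one per line.

    Hypothetical helper, not part of the original code.
    """
    names = sorted(
        name for name in os.listdir(folder)
        if name.lower().endswith(".csv")
    )
    # No trailing newline, so splitting the file on '\n' gives no empty entry.
    with open(os.path.join(folder, list_name), "w") as handle:
        handle.write("\n".join(names))
    return names
```

Calling write_file_list() from the data folder writes file_name.txt there. Because only .csv names are written, there is nothing to remove by hand afterwards.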
Code:
# Read the list of CSV file names, skipping blank lines
infile = open('file_name.txt', 'r')
s = [name.strip() for name in infile.read().split('\n') if name.strip()]
infile.close()

outfile = open('final_file.txt', 'w')
# Header row: one mdeg column and one HV column per input file
outfile.write('Wavelength')
for k in s:
    label = k.replace('.csv', '').replace(' ', '_')
    outfile.write('\t' + label + '_mdeg')
    outfile.write('\t' + label + '_HV')
outfile.write('\n')

# One output row per wavelength, 350 nm down to 200 nm (151 points)
x = range(151)
for j in x:
    outfile.write(str(350-j))
    for i in s:
        infile = open(i, 'r')
        t = infile.read().split('XYDATA\n')
        infile.close()
        # Row j of the XYDATA block: wavelength, mdeg, HV
        row = t[1].split('\n\n')[0].split('\n')[j].split(',')
        outfile.write('\t' + row[1] + '\t' + row[2])
    outfile.write('\n')
outfile.close()
##end of the code##
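The Python code above reopens every CSV file once per wavelength. As a hedged alternative, here is a sketch that parses each file only once. It assumes the file layout the code above relies on: a line reading XYDATA, then comma-separated rows of wavelength, mdeg and HV, ended by a blank line. The function names extract_xydata and merge_tables are made up for illustration.

```python
def extract_xydata(text):
    """Return (wavelength, mdeg, hv) string triples from one exported CD file."""
    # Assumed layout: 'XYDATA' line, data rows, then a blank line.
    block = text.split("XYDATA\n")[1].split("\n\n")[0]
    rows = []
    for line in block.split("\n"):
        fields = [field.strip() for field in line.split(",")]
        if len(fields) >= 3:
            rows.append((fields[0], fields[1], fields[2]))
    return rows


def merge_tables(tables):
    """Build tab-separated lines from {label: rows}.

    Wavelengths are taken from the first table, so all files are
    assumed to cover the same wavelength range.
    """
    labels = list(tables)
    header = ["Wavelength"]
    for label in labels:
        header += [label + "_mdeg", label + "_HV"]
    lines = ["\t".join(header)]
    for index, (wavelength, _, _) in enumerate(tables[labels[0]]):
        cells = [wavelength]
        for label in labels:
            _, mdeg, hv = tables[label][index]
            cells += [mdeg, hv]
        lines.append("\t".join(cells))
    return lines
```

Unlike the code above, this sketch reads the wavelength from the file itself instead of computing 350-j, so no change is needed for a different wavelength range.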
Alternatively, if you are acquainted with R (download it from http://cran.r-project.org/ if you haven't), you can use the following script to get the same result. It labels the columns with temperatures for a thermal melt from 10 degrees to 70 degrees; edit the code to customize it for your use if needed. Remember that this R code does not need the file_name.txt listing, and that it may not work properly if there are other files in the data folder. Thank Shrikant if you find it useful.
Code:
##Start of the code##
# All CSV files in the current folder
CSV_Files=list.files(path=".",pattern="\\.csv",full.names=FALSE);
# First column: wavelengths 350 nm down to 200 nm (151 rows)
ResultantMatrix=matrix(nrow=151);
ResultantMatrix[,1]=c(350:200);
for(i in 1:length(CSV_Files))
{
  Current_File=read.table(CSV_Files[[i]],header=FALSE,blank.lines.skip=FALSE);
  tempM=matrix(nrow=151,ncol=2);
  k=1;
  # Lines 21 to 171 of each file are taken as the 151 data rows
  for(j in 21:171)
  {
    temp=strsplit(as.character(Current_File[j,1]),split=",");
    tempM[k,1]=temp[[1]][2];
    tempM[k,2]=temp[[1]][3];
    k=k+1;
  }
  # Column label: the number taken from the file name, plus 9
  t=as.numeric(gsub(".*(\\d+.+?)\\.csv","\\1",CSV_Files[[i]]))+9;
  colnames(tempM)=c(t,t);
  ResultantMatrix=cbind(ResultantMatrix,tempM);
}
write.csv(ResultantMatrix,file="Result.csv");
##End of the code##
Sunday, December 4, 2011
Protein-Protein Docking Servers
Wednesday, November 23, 2011
Folder list to text file, text file to folders
You could do this:
1. Make sure all your entries are in column A of your spreadsheet.
2. Edit/copy column A
3. Click Start / Run / notepad c:\folders.txt {OK}
4. Click Edit / paste. You now have a text file with all the folder names inside.
5. Click Start / run / cmd {OK}
6. Type this test command:
for /F "tokens=*" %* in (c:\folders.txt) do @echo md "D:\My Folders\%*"
{Enter}
If you're happy with the result, make it happen by typing this command:
for /F "tokens=*" %* in (c:\folders.txt) do @md "D:\My Folders\%*"
{Enter}
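If you prefer Python to the command prompt, the same job can be sketched as below. This is only an illustration: the function name make_folders is made up, and the two paths in the usage note are simply the ones used in the commands above.

```python
import os


def make_folders(list_path, parent):
    """Create one folder under `parent` for each non-empty line of `list_path`."""
    created = []
    with open(list_path) as handle:
        for line in handle:
            name = line.strip()
            if not name:
                continue  # skip blank lines in the list
            target = os.path.join(parent, name)
            # exist_ok=True means folders that already exist are left alone
            os.makedirs(target, exist_ok=True)
            created.append(target)
    return created
```

For example, make_folders(r"c:\folders.txt", r"D:\My Folders") would create the same folders as the md command above and return their paths.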
How do I print a listing of files in a directory?
Get to the MS-DOS prompt or the Windows command line.
Navigate to the directory you wish to print the contents of. If you're new to the command line, familiarize yourself with the cd command and the dir command.
Once in the directory you wish to print the contents of, type one of the below commands.
dir > print.txt
The above command will take the list of all the files and all of the information about the files, including size, modified date, etc., and send that output to the print.txt file in the current directory.
dir /b > print.txt
This command would print only the file names and not the file information of the files in the current directory.
dir /s /b > print.txt
This command would print only the names (with full paths) of the files in the current directory and in all of its subdirectories.
After doing any of the above steps the print.txt file is created. Open this file in any text editor (e.g. Notepad) and print the file. You can also do this from the command prompt by typing notepad print.txt.
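A listing like the one from dir /s /b can also be produced in Python. The sketch below is an illustration only; the function name list_files_recursive is made up, and unlike dir it lists files only, not the subdirectories themselves.

```python
import os


def list_files_recursive(folder="."):
    """Return the full paths of all files in `folder` and its subdirectories."""
    paths = []
    for root, _dirs, files in os.walk(folder):
        for name in files:
            paths.append(os.path.abspath(os.path.join(root, name)))
    return sorted(paths)
```

Writing "\n".join(list_files_recursive()) to a text file gives a file you can open and print in the same way as print.txt.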
Saturday, November 5, 2011
In-silico characterization of proteins
SignalP: SignalP 4.0 server predicts the presence and location of signal peptide cleavage sites in amino acid sequences from different organisms: Gram-positive prokaryotes, Gram-negative prokaryotes, and eukaryotes. The method incorporates a prediction of cleavage sites and a signal peptide/non-signal peptide prediction based on a combination of several artificial neural networks.
GlobPlot Webservice:
- GlobPlot webservice - link to GlobPlot WSDL file.
Prediction of disorder:
- DisEMBL - DisEMBL is our neural network based predictor.
- DISOPRED - Predictor from David Jones' lab.
Function prediction in non-globular protein space:
- ELM - The Eukaryotic Linear Motif Resource.
- NetworKIN - Systematic Discovery of In Vivo Phosphorylation Networks.
Thesis on disorder and linear motifs
Function prediction in globular protein space:
- SMART - SMART/Pfam domains
Domain boundaries:
Synthetic Biology
Synthetic Biology Project @ SLRI - Applying GlobPlot.
Resources
- CELLO (Yu et al, 2004)
- ESLPred (Bhasin and Raghava, 2004)
- LOCnet and LOCtarget (Nair and Rost, 2004)
- LOCSVMPSI (Xie et al, 2005, NAR in press)
- NucPred (Heddad et al, 2004)
- Predotar
- SecretomeP (Bendtsen et al, 2004)
- SignalP (Bendtsen et al, 2004)
- SubLoc (Hua and Sun, 2001)
- TargetP (Emanuelsson et al, 2000)
Wednesday, February 24, 2010
Weight to Molar Quantity (for proteins)
Saturday, May 16, 2009
Wolfram|Alpha: Future of search and analysis
Tuesday, March 31, 2009
miRex: A web based resource for miRNA expression profiles
Background: A few hundred miRNAs carry the potential to regulate thousands of target genes in eukaryotes. The expression profiles of miRNAs convey important information regarding tissue-specific gene expression and can be used as biomarkers for disease progression and cancer classification, among other rational interpretations pertaining to miRNA-gene interactions. There are several individual reports of miRNA expression profiling; however, there is no server that allows cross-comparison of all these datasets.
Description: We have developed miRex, a database and analysis tool for comparing miRNA expression profiles generated by high-throughput methods. Currently data from public repositories have been pre-normalized and provided with visual representation to aid comparison between experiments. miRNA ID converter: a tool for mapping miRNA IDs from one system of nomenclature to another has also been included.
Data: Currently, 614 experiments spanning 25 datasets deposited in Gene Expression Omnibus (GEO), the public repository for high-throughput gene expression data hosted by NCBI, and 1132 experiments from 18 datasets from ArrayExpress, another resource for expression data, are available through miRex. Besides the microarray-based data, there is a set of 40 experiments carried out by real-time PCR.
URL: miRex is available at http://miracle.igib.res.in/mirex/
Wednesday, September 24, 2008
Reverse Complement
Reverse complement is commonly used in bioinformatics for various purposes. Here is a tool that does the job without much effort; there are also simple Perl programs that can be run locally for the purpose. This tool is provided by GENE INFINITY and can also do reverse and complement separately. Hope this helps. The tool is located here: Reverse Complement
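For running the operation locally, here is a minimal Python sketch (not the GENE INFINITY tool's own code). It assumes a plain DNA sequence using A, C, G, T and N in upper or lower case; any other character is passed through unchanged.

```python
# Translation table for DNA bases; case is preserved.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse(seq):
    """Return the sequence read backwards."""
    return seq[::-1]


def complement(seq):
    """Return the complementary base at each position."""
    return seq.translate(_COMPLEMENT)


def reverse_complement(seq):
    """Return the complement of the sequence, read backwards."""
    return complement(seq)[::-1]


print(reverse_complement("ATGC"))  # prints GCAT
```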
Protein Blast against another set of proteins
This tool is provided by NCBI/ BLAST/ blastp suite: BLASTP programs search protein databases using a protein query. It runs BLAST for a query protein against a set of other proteins. I found it useful when you don't wish to BLAST your query against the whole protein database, but only against a set of proteins given by the user. This tool is located here: Protein Blast against another set of proteins
PeptideCutter
This tool is provided by ExPASy. It predicts potential sites cleaved by proteases or chemicals in a given protein sequence.
PeptideCutter returns the query sequence with the possible cleavage sites mapped on it and/or a table of cleavage site positions. Single or multiple enzymes can be selected for the purpose. PeptideCutter
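To give a feel for what a table of cleavage site positions looks like, here is a small Python sketch for a single enzyme. It is not PeptideCutter's method: it uses only the simple textbook rule for trypsin (cut after K or R unless the next residue is P), and the function names and the example sequence are made up for illustration.

```python
def trypsin_sites(sequence):
    """Return 1-based positions after which the simple trypsin rule cuts.

    Simplified rule (an assumption of this sketch): cut after K or R,
    except when the next residue is P. Expects upper-case one-letter codes.
    """
    sites = []
    for index, residue in enumerate(sequence[:-1]):
        if residue in "KR" and sequence[index + 1] != "P":
            sites.append(index + 1)
    return sites


def digest(sequence, sites):
    """Split the sequence into peptides at the given cleavage positions."""
    pieces = []
    start = 0
    for site in sites:
        pieces.append(sequence[start:site])
        start = site
    pieces.append(sequence[start:])
    return pieces


example = "AKRPGKA"  # made-up sequence
print(trypsin_sites(example))                   # prints [2, 6]
print(digest(example, trypsin_sites(example)))  # prints ['AK', 'RPGK', 'A']
```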
Predicting Antigenic Peptides
This is a program that predicts those segments from within a protein sequence that are likely to be antigenic by eliciting an antibody response. The method used here is the method of Kolaskar and Tongaonkar (1990).
Predictions are based on a table that reflects the occurrence of amino acid residues in experimentally known segmental epitopes. Segments are only reported if they have a minimum size of 8 residues. The reported accuracy of the method is about 75%.
The program is located here Predicting Antigenic Peptides
Friday, February 1, 2008
Sequence analyzer
Nucleic Acid Sequence Massager is a very easy-to-use tool for conversion of DNA to RNA, RNA to DNA, and upper case to lower case and vice versa, and for removal of FASTA format, HTML tags, numbers, white spaces and line breaks.
I find this tool very handy.
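A few of these clean-up operations are easy to sketch in Python for local use. This is not the Sequence Massager's own code; the function names are made up, and the sketch assumes that FASTA header lines start with ">" and that HTML tags look like <...>.

```python
import re


def strip_fasta_header(text):
    """Drop every line that starts with '>' (FASTA header lines)."""
    return "\n".join(
        line for line in text.splitlines() if not line.startswith(">")
    )


def clean_sequence(text):
    """Remove FASTA headers, HTML tags, numbers, white space and line breaks."""
    text = strip_fasta_header(text)
    text = re.sub(r"<[^>]*>", "", text)  # HTML tags
    return re.sub(r"[\d\s]", "", text)   # digits and all white space


def dna_to_rna(seq):
    """Replace T with U, preserving case."""
    return seq.replace("T", "U").replace("t", "u")


def rna_to_dna(seq):
    """Replace U with T, preserving case."""
    return seq.replace("U", "T").replace("u", "t")
```

Case conversion needs no helper: seq.upper() and seq.lower() do it directly.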
Wednesday, October 3, 2007
List of CSIR Institutes
Following is the list of CSIR institutes for life science. A comprehensive list of the CSIR institutes can be downloaded from the link provided at the end of the list.
Center for Cellular & Molecular Biology
Central Drug Research Institute
Central Food Technological Research Institute
Central Institute of Medicinal and Aromatic Plants
Central Leather Research Institute
Indian Institute of Chemical Biology
Indian Institute of Chemical Technology
Institute of Microbial Technology
Industrial Toxicology Research Center
National Botanical Research Institute
National Environmental Engineering Research Institute
National
Indian Institute of Integrative Medicine
National Institute for Interdisciplinary Science & Technology (NIIST)