Bioinformatics Tools

Wednesday, February 1, 2012

Working with multiple URLs from text file to tabs and vice versa

Copy or save multiple tabs to a text file

There are various methods; I use this one for Firefox:

1. Install the send-tab-urls add-on for Firefox: https://addons.mozilla.org/en-US/firefox/addon/send-tab-urls/

2. Open all the URLs in different tabs.

3. Go to File --> Send tab urls --> select your options and send to clipboard.

4. The URLs open in all tabs will be copied to your clipboard. You can paste them into a text file and save it, or reuse them to open multiple tabs again.

Use: While looking for published articles on PubMed or Google, you may want to save the list of relevant articles for each search as a text file and reuse it whenever required. At home I did not have access to various journals, so I used to save the links in text files and then, at the institute, open them all and save the articles I needed.

Open multiple URLs in different tabs from a text file

1. Copy all the URLs from the text file.

2. Open http://www.urlopener.com/index.php

3. Paste them in the space provided.

4. Click submit and then click open all.

Use: Opening all the URLs in one go, for faster work. It makes sense to open a new Firefox window for this so that the new tabs do not clutter your ongoing work.
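If you would rather not depend on a website, the same idea can be sketched in a few lines of Python with the standard webbrowser module. This is only a minimal sketch under my own assumptions: the function names and the file name urls.txt are hypothetical, and the tabs open in whatever browser your system treats as the default.

```python
import webbrowser


def parse_urls(text):
    """Return the URLs found in text, one per line, skipping blank lines and '#' comments."""
    urls = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            urls.append(line)
    return urls


def open_urls(urls, opener=webbrowser.open_new_tab):
    """Pass each URL to opener (a new browser tab by default); return how many were opened."""
    count = 0
    for url in urls:
        opener(url)
        count += 1
    return count


# Example use (urls.txt is a hypothetical file with one URL per line):
# with open("urls.txt", encoding="utf-8") as handle:
#     open_urls(parse_urls(handle.read()))
```

The opener argument is there only so the function can be tried without actually launching a browser.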

Meanwhile, I also found a very useful tool for comparing lists and making Venn diagrams:

Please cite: Oliveros, J.C. (2007) VENNY. An interactive tool for comparing lists with Venn Diagrams. http://bioinfogp.cnb.csic.es/tools/venny/index.html
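For a quick check of two lists without drawing a diagram, the regions of a two-set Venn diagram can be computed with Python sets. This is a minimal sketch of the idea only, not how VENNY itself works; the function name and the gene names in the example are made up for illustration.

```python
def venn_regions(list_a, list_b):
    """Split two lists into the three regions of a two-set Venn diagram."""
    a, b = set(list_a), set(list_b)
    return {
        "only_a": sorted(a - b),   # items found only in the first list
        "both": sorted(a & b),     # items shared by both lists
        "only_b": sorted(b - a),   # items found only in the second list
    }


# Example with made-up gene names:
regions = venn_regions(["geneA", "geneB", "geneC"], ["geneB", "geneC", "geneD"])
print(regions)
```

Duplicates within a list are collapsed, because each list is treated as a set.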

Happy surfing.

Sunday, January 1, 2012

Homology modeling of proteins

CPHmodels: http://www.cbs.dtu.dk/services/CPHmodels/ CPHmodels-3.0 is a web server that predicts protein 3D structure by single-template homology modeling. The server employs a hybrid of the scoring functions of CPHmodels-2.0 and a novel remote homology-modeling algorithm. It first attempts to model a query sequence using the fast CPHmodels-2.0 profile-profile scoring function, which is suitable for close homology modeling. The new, computationally costly remote homology-modeling algorithm is engaged only if no suitable PDB template is identified in the initial search. CPHmodels-3.0 was benchmarked in the CASP8 competition and produced models for 94% of the targets (117 out of 128); 74% of these were predicted as high-reliability models (87 out of 117), which achieved an average RMSD of 4.6 Å when superimposed on the true 3D structure. The remaining 26% low-reliability models (30 out of 117) superimposed on the true 3D structure with an average RMSD of 9.3 Å. These performance values place CPHmodels-3.0 in the group of high-performing 3D-prediction tools. Besides its accuracy, one of the important features of the method is its speed: for most queries, the response time of the server is less than 20 minutes. The web server is available at http://www.cbs.dtu.dk/services/CPHmodels/.
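Since RMSD comes up in most of these benchmarks, here is a minimal sketch of how it is calculated. It assumes the two structures are already superimposed and that the atoms are listed in the same order; the superposition step itself is not shown, and the function name is my own. It is not the code used by CPHmodels or CASP.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z) tuples."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # squared distance between the matched atoms
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Two atoms, one of them displaced by 2 units along z:
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
print(rmsd(model, reference))
```

If the coordinates are in Å, the result is in Å as well.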

MODELLER: http://www.salilab.org/modeller/ MODELLER is used for homology or comparative modeling of protein three-dimensional structures (1,2). The user provides an alignment of a sequence to be modeled with known related structures and MODELLER automatically calculates a model containing all non-hydrogen atoms. MODELLER implements comparative protein structure modeling by satisfaction of spatial restraints (3,4), and can perform many additional tasks, including de novo modeling of loops in protein structures, optimization of various models of protein structure with respect to a flexibly defined objective function, multiple alignment of protein sequences and/or structures, clustering, searching of sequence databases, comparison of protein structures, etc. MODELLER is available for download for most Unix/Linux systems, Windows, and Mac.

SWISS-MODEL: http://swissmodel.expasy.org/ SWISS-MODEL is a fully automated protein structure homology-modeling server, accessible via the ExPASy web server, or from the program DeepView (Swiss Pdb-Viewer). The purpose of this server is to make Protein Modeling accessible to all biochemists and molecular biologists worldwide.

Phyre2: http://www.sbg.bio.ic.ac.uk/phyre2/html/page.cgi?id=index Phyre and Phyre2 (Protein Homology/AnalogY Recognition Engine; pronounced as 'fire') are web-based services for protein structure prediction that are free for non-commercial use. Phyre is among the most popular methods for protein structure prediction, having been cited over 1000 times. Like other remote homology recognition techniques (see protein threading), it is able to regularly generate reliable protein models when other widely used methods such as PSI-BLAST cannot. Phyre2 has been designed (funded by the BBSRC) to ensure a user-friendly interface for users inexpert in protein structure prediction methods.

HHpred: http://toolkit.tuebingen.mpg.de/hhpred The primary aim in developing HHpred was to provide biologists with a method for sequence database searching and structure prediction that is as easy to use as BLAST or PSI-BLAST and that is at the same time much more sensitive in finding remote homologs. In fact, HHpred's sensitivity is competitive with the most powerful servers for structure prediction currently available. HHpred is the first server that is based on the pairwise comparison of profile hidden Markov models (HMMs). Whereas most conventional sequence search methods search sequence databases such as UniProt or the NR, HHpred searches alignment databases, like Pfam or SMART. This greatly simplifies the list of hits to a number of sequence families instead of a clutter of single sequences. All major publicly available profile and alignment databases are available through HHpred. HHpred accepts a single query sequence or a multiple alignment as input. Within only a few minutes it returns the search results in an easy-to-read format similar to that of PSI-BLAST. Search options include local or global alignment and scoring secondary structure similarity. HHpred can produce pairwise query-template sequence alignments, merged query-template multiple alignments (e.g. for transitive searches), as well as 3D structural models calculated by the MODELLER software from HHpred alignments.

LOMETS: http://zhanglab.ccmb.med.umich.edu/LOMETS/ LOMETS (Local Meta-Threading-Server) is an on-line web service for protein structure prediction. It generates 3D models by collecting high-scoring target-to-template alignments from 8 locally-installed threading programs (FUGUE, HHsearch, MUSTER, PPA, PROSPECT2, SAM-T02, SPARKS, SP3). A detailed description of the server can be found in the Readme file.

MODBASE: MODBASE (http://salilab.org/modbase) is a database of annotated comparative protein structure models. The models are calculated by MODPIPE, an automated modeling pipeline that relies primarily on MODELLER for fold assignment, sequence–structure alignment, model building and model assessment (http://salilab.org/modeller). MODBASE currently contains 5 152 695 reliable models for domains in 1 593 209 unique protein sequences; only models based on statistically significant alignments and/or models assessed to have the correct fold are included. MODBASE also allows users to calculate comparative models on demand, through an interface to the MODWEB modeling server (http://salilab.org/modweb). Other resources integrated with MODBASE include databases of multiple protein structure alignments (DBAli), structurally defined ligand binding sites (LIGBASE), predicted ligand binding sites (AnnoLyze), structurally defined binary domain interfaces (PIBASE) and annotated single nucleotide polymorphisms and somatic mutations found in human proteins (LS-SNP, LS-Mut). MODBASE models are also available through the Protein Model Portal (http://www.proteinmodelportal.org/).

Robetta: http://www.robetta.org/ Robetta provides both ab initio and comparative models of protein domains. It uses the ROSETTA fragment insertion method (Simons et al. (1997) J Mol Biol. 268:209-225). Domains without a detectable PDB homolog are modeled with the Rosetta de novo protocol (Bonneau et al. (2002) J Mol Biol. 322:65-78). Comparative models are built from Parent PDBs detected by UW-PDB-BLAST or HHSEARCH and aligned by various methods which include HHSEARCH, Compass, and Promals. Loop regions are assembled from fragments and optimized to fit the aligned template structure (Rohl et al. (2004) Proteins 55:656-677). The procedure is fully automated. Robetta is evaluated in the blind benchmarking experiment CASP. Robetta uses ROSETTA software which is developed and maintained by the Rosetta Commons.

chunk-TASSER: http://cssb.biology.gatech.edu/skolnick/webservice/chunk-TASSER/index.html A protein structure prediction method that combines threading templates from SP3 with ab initio folded chunk structures (three consecutive segments of regular secondary structure). It is better suited to extremely hard targets.

PSiFR (Protein Structure and Function prediction Resource) http://psifr.cssb.biology.gatech.edu/ provides integrated tools for protein tertiary structure prediction and for structure- and sequence-based function annotation. The methods used are described below:

Protein structure prediction methods

TASSER (Threading/ASSembly/Refinement) is an automated protein structure prediction and modeling method. TASSER employs a hierarchical approach consisting of template identification by threading, followed by tertiary structure assembly by rearranging continuous template fragments (Zhang, Y. and Skolnick, J., 2004, PNAS).

TASSER-Lite is a comparative protein tertiary structure modeling tool. It is presently optimized for the modeling of single-domain (41-200 residues) homologous protein sequences; that is, proteins with a sequence identity greater than 25% with respect to the best threading template (Pandit et al., 2006, Biophysical Journal). The templates for modeling the query sequence are identified using the threading program PROSPECTOR_3 (Skolnick et al., 2004, Proteins). Subsequently, the structure is refined using the TASSER program with optimized parameters.

METATASSER is a protein tertiary structure prediction method that employs the 3D-Jury approach to select threading templates from SPARKS2 (Zhou H. and Zhou Y., 2004, Proteins), SP3 (Zhou H. and Zhou Y., 2005, Proteins) and PROSPECTOR_3 (Skolnick et al., 2004, Proteins), which provide aligned fragments and tertiary restraints as input to the TASSER procedure to generate full-length models. In the CASP7 and CASP8 assessments of server performance, METATASSER was among the top-performing servers (Zhou et al., 2007, Proteins; Zhou et al., 2009, Proteins (in press)).

ESyPred3D: http://www.fundp.ac.be/sciences/biologie/urbm/bioinfo/esypred/ ESyPred3D is a new automated homology modeling program. The method benefits from the increased alignment performance of a new alignment strategy using neural networks. Alignments are obtained by combining, weighting and screening the results of several multiple alignment programs. The final three-dimensional structure is built using the modeling package MODELLER.

Protein Model Portal (PMP): http://www.proteinmodelportal.org/ PMP gives access to various models computed by comparative modeling methods provided by different partner sites, and provides access to various interactive services for model building, and quality assessment.

ProModel: http://www.vlifesciences.com/products/VLifeMDS/Protein_Modeller.php ProModel is a complete package for modeling proteins whose crystal structure is unknown, based on the amino acid sequence of a close homologue. ProModel allows homology modeling from either a selected template or a user-defined template. Users can perform automated homology modeling simply by reading in the template file, or perform knowledge-based manual modeling through specific loop insertions or by changing specific amino acid residues. A local BLAST speeds up the modeling process. ProModel enables an exhaustive analysis of the target protein structure, active site and channels. The user can conveniently view, edit and superimpose proteins with ProModel. Facilities to distribute the secondary structure elements, plot the Phi-Psi angles of residues in a Ramachandran plot, and identify and visualize cavities and channels make it a very useful product. ProModel is available for both Linux and Windows® operating systems.

SCWRL4: http://dunbrack.fccc.edu/scwrl4/index.php SCWRL4 is based on a new algorithm and new potential function that results in improved accuracy at reasonable speed. This has been achieved through: 1) a new backbone-dependent rotamer library based on kernel density estimates; 2) averaging over samples of conformations about the positions in the rotamer library; 3) a fast anisotropic hydrogen bonding function; 4) a short-range, soft van der Waals atom-atom interaction potential; 5) fast collision detection using k-discrete oriented polytopes; 6) a tree decomposition algorithm to solve the combinatorial problem; and 7) optimization of all parameters by determining the interaction graph within the crystal environment using symmetry operators of the crystallographic space group. Accuracies as a function of electron density of the side chains demonstrate that side chains with higher electron density are easier to predict than those with low electron density and presumed conformational disorder. For a testing set of 379 proteins, 86% of chi1 angles and 75% of chi1+2 are predicted correctly within 40 degrees of the X-ray positions. Among side chains with higher electron density (25th-100th percentile), these numbers rise to 89% and 80%. The new program maintains its simple command-line interface, designed for homology modeling. To achieve higher accuracy, SCWRL4 is somewhat slower than SCWRL3 when run in the default flexible rotamer model (FRM) by a factor of 3-6, depending on the protein. When run in the rigid rotamer model (RRM), SCWRL4 is about the same speed as SCWRL3. In both cases, SCWRL4 will converge on very large proteins or protein complexes or those with very dense interaction graphs, while SCWRL3 sometimes would not. The SCWRL4 paper has been published in Proteins: Structure, Function, Bioinformatics. A reprint is available. Please cite the paper: G. G. Krivov, M. V. Shapovalov, and R. L. Dunbrack, Jr. Improved prediction of protein side-chain conformations with SCWRL4. Proteins (2009).

VADAR: http://vadar.wishartlab.com/ VADAR (Volume, Area, Dihedral Angle Reporter) is a compilation of more than 15 different algorithms and programs for analyzing and assessing peptide and protein structures from their PDB coordinate data. The results have been validated through extensive comparison to published data and careful visual inspection. The VADAR web server supports the submission of either PDB formatted files or PDB accession numbers. VADAR produces extensive tables and high quality graphs for quantitatively and qualitatively assessing protein structures determined by X-ray crystallography, NMR spectroscopy, 3D-threading or homology modelling. Please cite the following: Leigh Willard, Anuj Ranjan, Haiyan Zhang, Hassan Monzavi, Robert F. Boyko, Brian D. Sykes, and David S. Wishart "VADAR: a web server for quantitative evaluation of protein structure quality" Nucleic Acids Res. 2003 July 1; 31 (13): 3316-3319

IntFOLD: http://www.reading.ac.uk/bioinf/IntFOLD/ The IntFOLD server provides a unified interface for tertiary structure prediction/3D modeling, 3D model quality assessment, intrinsic disorder prediction, domain prediction, and prediction of protein-ligand binding residues.

PEPstr: http://www.imtech.res.in/raghava/pepstr/ The Pepstr server predicts the tertiary structure of small peptides with sequence lengths between 7 and 25 residues. The prediction strategy is based on the realization that the β-turn is an important and consistent feature of small peptides, in addition to regular structures. Thus, the method uses both the regular secondary structure information predicted by PSIPRED and the β-turn information predicted by BetaTurns. The side-chain angles are placed using a standard backbone-dependent rotamer library. The structure is further refined with energy minimization and molecular dynamics simulations using Amber version 6.

BSR: http://cssb.biology.gatech.edu/BSR Binding Site Refinement employs a new template-based method for the local refinement of ligand-binding regions in protein models using closely as well as distantly related templates identified by threading. A Support Vector Regression (SVR) model is used to select likely correct binding site geometries in a large ensemble of multiple receptor conformations. The SVR model employs several scoring functions that impose geometrical restraints on the Cα positions, account for a specific chemical environment within a binding site and optimize the interactions with putative ligands.

KeyRecep: http://www.immd.co.jp/en/product_2.html KeyRecep is the best-suited solution for rational molecular design when the 3D structure of the target protein is unknown. Users can estimate the characteristics of the binding site of the target protein by superposing multiple active compounds in 3D space so that the physicochemical properties of the compounds match maximally with each other. (Estimation of virtual receptor model) Users can also examine relationship between chemical structures and the activities based on the multiple regression analysis with indices of conformity of each compound to the virtual receptor model and the activity values. (3D-SAR function) For compounds whose activities are unknown, users can estimate the activities based on the indices of conformity to the virtual receptor model and can perform virtual screening. (DB search function) KeyRecep rationally and strategically accelerates the molecular design projects based on hit compounds discovered by high throughput screening (HTS) or based on information on compounds from literature or patents. KeyRecep facilitates the structural expansion of such compounds to obtain lead compounds and further drug candidates.

PROTEUS2: http://wks16338.biology.ualberta.ca/proteus2/ PROTEUS2 is a web server designed to support comprehensive protein structure prediction and structure-based annotation. PROTEUS2 accepts either single sequences (for directed studies) or multiple sequences (for whole proteome annotation) and predicts the secondary and, if possible, tertiary structure of the query protein(s). Unlike most other tools or servers, PROTEUS2 bundles signal peptide identification, transmembrane helix prediction, transmembrane β-strand prediction, secondary structure prediction (for soluble proteins) and homology modeling (i.e. 3D structure generation) into a single prediction pipeline. Using a combination of progressive multi-sequence alignment, structure-based mapping, hidden Markov models, multi-component neural nets and up-to-date databases of known secondary structure assignments, PROTEUS2 is able to achieve among the highest reported levels of predictive accuracy for signal peptides (Q2=94%), membrane spanning helices (Q2=87%) and secondary structure (Q3 score of 81.3%). PROTEUS2's homology modeling services also provide high quality 3D models that compare favorably with those generated by SWISS-MODEL (within 0.2 Å RMSD). The average PROTEUS2 prediction takes ~2 minutes per query sequence. Source code is also freely available here.

PSIPRED: http://bioinf.cs.ucl.ac.uk/psipred/ PSIPRED is a simple and accurate secondary structure prediction method, incorporating two feed-forward neural networks which perform an analysis on output obtained from PSI-BLAST (Position Specific Iterated - BLAST). Using a very stringent cross validation method to evaluate the method's performance, PSIPRED 2.6 achieves an average Q3 score of 80.7%. Predictions produced by PSIPRED were also submitted to the CASP4 evaluation and assessed during the CASP4 meeting, which took place in December 2000 at Asilomar. PSIPRED 2.0 achieved an average Q3 score of 80.6% across all 40 submitted target domains with no obvious sequence similarity to structures present in PDB, which ranked PSIPRED top out of 20 evaluated methods (an earlier version of PSIPRED was also ranked top in CASP3 held in 1998). It is important to realize, however, that due to the small sample sizes, the results from CASP are not statistically significant, although they do give a rough guide as to the current "state of the art". For a more reliable evaluation, the EVA web site at Columbia University provides a continuous evaluation. Also see the EVA servlet to visualize a breakdown of specific types of errors made by PSIPRED and other secondary structure prediction methods. NOTE that at the time of writing, the EVA site is no longer being updated. The PSIPRED V2.6 software can be downloaded from HERE. Please note that you should read the license terms given in the README file if you wish to incorporate PSIPRED in another program or Web server. Older releases of PSIPRED can be downloaded HERE.
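The Q3 score quoted for these secondary structure predictors is simply the percentage of residues whose three-state assignment (helix, strand or coil) matches the observed one. A minimal sketch, assuming both assignments are given as equal-length strings of H, E and C; the function name and the example strings are my own and not taken from PSIPRED.

```python
def q3_score(predicted, observed):
    """Percentage of positions where the predicted state equals the observed state."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty strings of equal length")
    matches = sum(1 for p, o in zip(predicted, observed) if p == o)
    return 100.0 * matches / len(observed)


# Made-up example: 7 of the 8 residues agree.
print(q3_score("HHHEECCC", "HHHEEECC"))
```

A Q2 score, as quoted for signal peptides or membrane helices, is the same calculation with two states instead of three.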

I-TASSER: http://zhanglab.ccmb.med.umich.edu/I-TASSER/ The I-TASSER server is an Internet service for protein structure and function predictions. 3D models are built based on multiple-threading alignments by LOMETS and iterative TASSER assembly simulations; function insights are then derived by matching the predicted models with protein function databases. I-TASSER (as 'Zhang-Server') was ranked as the No 1 server for protein structure prediction in the recent CASP7, CASP8 and CASP9 experiments. It was also ranked as the best for function prediction in CASP9. The server is in active development with the goal of providing the most accurate structural and functional predictions using state-of-the-art algorithms.

JPred: http://www.compbio.dundee.ac.uk/www-jpred/ Jpred is a protein secondary structure prediction server that has been in operation since approximately 1998. Jpred incorporates the Jnet algorithm in order to make more accurate predictions. In addition to protein secondary structure, Jpred also makes predictions of solvent accessibility and coiled-coil regions (Lupas method). The current version of Jpred (v3) follows on from previous versions of Jpred developed and maintained by James Cuff and Jonathan Barber.

Verifying your modeled protein with online servers:

Structural Analysis and Verification Server (SAVS): http://nihserver.mbi.ucla.edu/SAVES/ SAVS uses the following programs to check the quality of protein structures:

    Procheck: Checks the stereochemical quality of a protein structure by analyzing residue-by-residue geometry and overall structure geometry.

    What_Check: Derived from a subset of protein verification tools from the WHATIF program (Vriend, 1990), this does extensive checking of many stereochemical parameters of the residues in the model.

    ERRAT: Analyzes the statistics of non-bonded interactions between different atom types and plots the value of the error function versus position of a 9-residue sliding window, calculated by a comparison with statistics from highly refined structures.

    Verify3D: Determines the compatibility of an atomic model (3D) with its own amino acid sequence (1D) by assigning a structural class based on its location and environment (alpha, beta, loop, polar, nonpolar etc) and comparing the results to good structures.

    Prove: Calculates the volumes of atoms in macromolecules using an algorithm which treats the atoms like hard spheres and calculates a statistical Z-score deviation for the model from highly resolved (2.0 Å or better) and refined (R-factor of 0.2 or better) PDB-deposited structures.

COLORADO-3D: http://asia2.genesilico.pl/colorado3d/ COLORADO-3D is a web tool that greatly facilitates the visual analysis of various features in three-dimensional protein structures, directly at the level of the protein structure, with the aid of commonly used viewers such as RASMOL or SWISSPDBVIEWER. Among the features most important to the structural biologist that the server can visualize in color are potential errors in protein structure (detected by ANOLEA, PROSA, PROVE, VERIFY3D), regions buried in the protein core and inaccessible to the solvent, and regions of high or low sequence conservation (e.g. detected by RATE4SITE). In particular, COLORADO-3D may serve to visualize the results of protein structure quality assessment at various stages of model building and refinement (both in experimental structure determination and in homology modeling).

Tuesday, December 27, 2011

Circular dichroism code to help in data analysis

I was looking for code to rearrange the data I get from thermal melts on a CD (circular dichroism) spectrometer. I could not find a way to convert .jsw files to CSV in batch, and JASCO’s Spectrum Analysis software does not help with that either; let me know if there is a batch conversion option for .jsw files. For now you have to convert each .jsw file to CSV individually and collect the CSV files in one folder. What I could get is a script that, once the .jsw files are converted, gathers the data from all the CSV files into a single file, which helps with data analysis. The code given below copies the data from all the files into one file, from 350 nm to 200 nm, using each file name as the column header for mdeg and tension (HV).

Steps:

1. Install Python if you do not already have it: http://www.python.org/getit/

2. Copy all the CSV files into one folder, keeping their names.

3. Write the names of the CSV files in a text file and save it as file_name.txt in the same folder as your data and code.

    a. You can do this from the MS-DOS prompt or the Windows command line. Navigate to the folder whose contents you want to list (if you're new to the command line, familiarize yourself with the cd and dir commands), then type this command: dir /b > file_name.txt

    b. Open the newly created file_name.txt in the same folder and check the file names. If file_name.txt itself is listed, remove that entry so that only the CSV file names remain in the text file.

4. Copy the code given below into Notepad and save it as a .py file (it is Python code) in the same folder.

5. Right-click the Python file, open it in Python IDLE and run it (press F5).

6. You will get a result file named final_file.txt. It is a tab-delimited text file with your mdeg and HV data sorted from 350 nm to 200 nm; open it with Excel. You can change the code to suit your needs. For example, if your data run from 260 nm down to 200 nm, change x=range(151) to x=range(61) and outfile.write(str(350-j)) to outfile.write(str(260-j)).

7. Hope that helps. Thank Rhishikesh Bargaje (he wrote the code for me) if it works, and write to me if you run into a problem; I can try to help.

Code:

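# file_name.txt lists one CSV file name per line.
infile = open('file_name.txt','r')
# Drop blank entries (for example after a trailing newline) so open('') is never attempted.
s = [name.strip() for name in infile.read().split('\n') if name.strip()]
infile.close()

# Read every CSV once and keep the rows between 'XYDATA' and the first blank line.
data = []
for i in s:
    infile = open(i,'r')
    t = infile.read().split('XYDATA\n')
    infile.close()
    data.append(t[1].split('\n\n')[0].split('\n'))

outfile = open('final_file.txt','w')
# Header row: one mdeg column and one HV column per input file.
outfile.write('Wavelength')
for k in s:
    label = k.replace('.csv','').replace(' ','_')
    outfile.write('\t' + label + '_mdeg')
    outfile.write('\t' + label + '_HV')
outfile.write('\n')

# One output row per wavelength, from 350 nm down to 200 nm in 1 nm steps.
x = range(151)
for j in x:
    outfile.write(str(350-j))
    for rows in data:
        fields = rows[j].split(',')
        # Field 1 is the CD signal (mdeg), field 2 is the tension (HV).
        outfile.write('\t' + fields[1] + '\t' + fields[2])
    outfile.write('\n')
outfile.close()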

##end of the code##
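If you change the wavelength range often, editing numbers inside the script gets tedious. The same merging step can be sketched as a function that takes the range as parameters. This is only a sketch under my own assumptions: the function names merge_cd and extract_xydata are hypothetical, the CSV layout (an XYDATA line, then wavelength,mdeg,HV rows, then a blank line) is taken from what the script above expects, and the data points are assumed to be 1 nm apart.

```python
def extract_xydata(text):
    """Return the rows between the 'XYDATA' line and the next blank line as lists of fields."""
    block = text.replace("\r\n", "\n").split("XYDATA\n")[1].split("\n\n")[0]
    return [row.split(",") for row in block.split("\n") if row]


def merge_cd(named_texts, start_nm=350, end_nm=200):
    """Merge (file name, file contents) pairs into one tab-delimited table, returned as a string."""
    n_points = start_nm - end_nm + 1
    header = ["Wavelength"]
    tables = []
    for name, text in named_texts:
        label = name.replace(".csv", "").replace(" ", "_")
        header += [label + "_mdeg", label + "_HV"]
        rows = extract_xydata(text)
        if len(rows) < n_points:
            raise ValueError(name + " has fewer than " + str(n_points) + " data rows")
        tables.append(rows)
    lines = ["\t".join(header)]
    for j in range(n_points):
        fields = [str(start_nm - j)]
        for rows in tables:
            fields += [rows[j][1], rows[j][2]]   # mdeg, then HV
        lines.append("\t".join(fields))
    return "\n".join(lines) + "\n"


# Example use, reading every CSV in the current folder (no file_name.txt needed):
# import glob
# pairs = [(name, open(name).read()) for name in sorted(glob.glob("*.csv"))]
# open("final_file.txt", "w").write(merge_cd(pairs, start_nm=260, end_nm=200))
```

Because the function works on text rather than on open files, it can be tried on a small pasted sample before running it on a whole folder.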

Alternatively, if you are acquainted with R (download it from http://cran.r-project.org/ if you haven't), you can use the following script to get the same result, with the columns labelled by temperature for a thermal melt from 10 degrees to 70 degrees. Edit the code to customize it for your use if needed. Note that this R code does not need the printed list of file names, and it may not work properly if there are other files in the data folder. Thank Shrikant if you find it useful.

Code:

##Start of the code##

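# Collect every CSV file in the working directory.
CSV_Files=list.files(path=".",pattern="\\.csv",full.names=FALSE);
# The first column holds the wavelengths, 350 nm down to 200 nm.
ResultantMatrix=matrix(nrow=151);
ResultantMatrix[,1]=c(350:200);
for(i in 1:length(CSV_Files))
{
    Current_File=read.table(CSV_Files[[i]],header=FALSE,blank.lines.skip=FALSE);
    tempM=matrix(nrow=151,ncol=2);
    k=1;
    # Lines 21 to 171 of each file are taken as the 151 data rows.
    for(j in 21:171)
    {
        temp=strsplit(as.character(Current_File[j,1]),split=",");
        tempM[k,1]=temp[[1]][2];   # second field: mdeg
        tempM[k,2]=temp[[1]][3];   # third field: HV
        k=k+1;
    }
    # Column label: the number taken from the file name plus 9.
    t=as.numeric(gsub(".*(\\d+.+?)\\.csv","\\1",CSV_Files[[i]]))+9;
    colnames(tempM)=c(t,t);
    ResultantMatrix=cbind(ResultantMatrix,tempM);
}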

write.csv(ResultantMatrix,file="Result.csv");

##End of the code##

Sunday, December 4, 2011

Protein-Protein Docking Servers

I was looking for protein-protein docking servers to use in my study; here is a list of online servers that are popular and commonly used. Other software also gives good results for protein-protein docking. I have not listed it here because I am still compiling that list, and I will post it as soon as I am done. Have fun.

ClusPro: (http://nrc.bu.edu/cluster) represents the first fully automated, web-based program for the computational docking of protein structures. Users may upload the coordinate files of two protein structures through ClusPro's web interface, or enter the PDB codes of the respective structures, which ClusPro will then download from the PDB server (http://www.rcsb.org/pdb/). The docking algorithms evaluate billions of putative complexes, retaining a preset number with favorable surface complementarities. A filtering method is then applied to this set of structures, selecting those with good electrostatic and desolvation free energies for further clustering. The program output is a short list of putative complexes ranked according to their clustering properties, which is automatically sent back to the user via email.

RosettaDock: The RosettaDock protein-protein docking server predicts the structure of protein complexes given the structures of the individual components and an approximate binding orientation. The server uses the Rosetta 2.1 protein structure modeling suite. The RosettaDock server (http://rosettadock.graylab.jhu.edu) identifies low-energy conformations of a protein–protein interaction near a given starting configuration by optimizing rigid-body orientation and side-chain conformations. The server requires two protein structures as inputs and a starting location for the search. RosettaDock generates 1000 independent structures, and the server returns pictures, coordinate files and detailed scoring information for the 10 top-scoring models. A plot of the total energy of each of the 1000 models created shows the presence or absence of an energetic binding funnel. RosettaDock has been validated on the docking benchmark set and through the Critical Assessment of PRedicted Interactions blind prediction challenge.

ZDOCK, RDOCK: ZDOCK uses a fast Fourier transform to search all possible binding modes for the proteins, evaluating based on shape complementarity, desolvation energy, and electrostatics. The top 2000 predictions from ZDOCK are then given to RDOCK where they are minimized by CHARMM to improve the energies and eliminate clashes, and then the electrostatic and desolvation energies are recomputed by RDOCK (in a more detailed fashion than the calculations performed by ZDOCK). We then tested these programs with a benchmark of 49 non-redundant unbound test cases, where we identified a near-native structure (within 2.5 angstrom from the experimental structure) as the top prediction for 37% of the test cases, and within the top 4 predictions for 49% of the test cases. The superior performance of ZDOCK and RDOCK has also been demonstrated in a community-wide protein docking blind test, CAPRI. Check this out for more details. All software, as well as the benchmark is freely available to academic users. For basic information on running ZDOCK, see this site.

GPU.proton.DOCK: GPU.proton.DOCK (Genuine Protein Ultrafast proton equilibria consistent DOCKing) is a state of the art service for in silico prediction of protein–protein interactions via rigorous and ultrafast docking code. It is unique in providing a stringent account of the self-consistency of electrostatic interactions and of the mutual effects of the docking partners on proton equilibria. GPU.proton.DOCK is the first server offering such a crucial supplement to protein docking algorithms, a step toward more reliable and high accuracy docking results. The code (especially the Fast Fourier Transform bottleneck and electrostatic fields computation) is parallelized to run on a GPU supercomputer. The high performance will be of use for large-scale structural bioinformatics and systems biology projects, thus bridging physics of the interactions with analysis of molecular networks. We propose workflows for exploring in silico charge mutagenesis effects. Special emphasis is given to an intuitive and user-friendly interface. The input is comprised of the atomic coordinate files in PDB format. The advanced user is provided with a special input section for addition of non-polypeptide charges, extra ionogenic groups with intrinsic pK_a values or fixed ions. The output is comprised of docked complexes in PDB format as well as interactive visualization in a molecular viewer. GPU.proton.DOCK server can be accessed at http://gpudock.orgchm.bas.bg/.

GRAMM-X: Protein docking software GRAMM-X and its web interface (http://vakser.bioinformatics.ku.edu/resources/gramm/grammx) extend the original GRAMM Fast Fourier Transformation methodology by employing smoothed potentials, refinement stage, and knowledge-based scoring. The web server frees users from complex installation of database-dependent parallel software and maintaining large hardware resources needed for protein docking simulations. Docking problems submitted to GRAMM-X server are processed by a 320 processor Linux cluster. The server was extensively tested by benchmarking, several months of public use, and participation in the CAPRI server track.

HexServer: HexServer (http://hexserver.loria.fr/) is the first fast Fourier transform (FFT)-based protein docking server to be powered by graphics processors. Using two graphics processors simultaneously, a typical 6D docking run takes ∼15 s, which is up to two orders of magnitude faster than conventional FFT-based docking approaches using comparable resolution and scoring functions. The server requires two protein structures in PDB format to be uploaded, and it produces a ranked list of up to 1000 docking predictions. Knowledge of one or both protein binding sites may be used to focus and shorten the calculation when such information is available. The first 20 predictions may be accessed individually, and a single file of all predicted orientations may be downloaded as a compressed multi-model PDB file. The server is publicly available and does not require any registration or identification by the user.

3D-Garden: A system for modelling protein–protein complexes based on conformational refinement of ensembles generated with the marching cubes algorithm. 3D-Garden is an integrated software suite for performing protein-protein and protein-polynucleotide docking. For any pair of biomolecular structures specified by the user, 3D-Garden's primary function is to generate an ensemble of putative complexed structures and rank them. The highest-ranking candidates constitute predictions for the structure of the complex. 3D-Garden cannot be used to decide whether or not a particular pair of biomolecules interacts. Complexes of protein and nucleic acid chains can also be specified as individual interactors for docking purposes.

Wednesday, November 23, 2011

Folder list to text file, text file to folders

How to make folders with names from a text file?

You could do this:
1. Make sure all your entries are in column A of your spreadsheet.
2. Edit/copy column A
3. Click Start / Run / notepad c:\folders.txt {OK}
4. Click Edit / paste. You now have a text file with all the folder names inside.
5. Click Start / run / cmd {OK}
6. Type this test command, which only prints the md commands it would run:
for /F "tokens=*" %a in (c:\folders.txt) do @echo md "D:\My Folders\%a"
{Enter}

If you're happy with the result, make it happen by typing this command (inside a batch file, write %%a instead of %a):
for /F "tokens=*" %a in (c:\folders.txt) do @md "D:\My Folders\%a"
{Enter}
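
The same dry-run-then-create idea can be sketched in Python for machines without the Windows command prompt. This is only a minimal sketch: the function name make_folders and its dry_run flag are made up for illustration, and the list file is assumed to be UTF-8 text with one folder name per line.

```python
from pathlib import Path


def make_folders(list_file, base_dir, dry_run=True):
    """Create one folder under base_dir for each non-empty line of list_file.

    With dry_run=True nothing is created; the planned paths are only returned,
    which mirrors the "@echo md" test command.
    """
    base = Path(base_dir)
    planned = []
    for line in Path(list_file).read_text(encoding="utf-8").splitlines():
        name = line.strip()
        if not name:
            continue  # skip blank lines
        target = base / name
        planned.append(target)
        if not dry_run:
            # parents=True also creates base_dir itself if it is missing
            target.mkdir(parents=True, exist_ok=True)
    return planned
```

Call make_folders("folders.txt", "My Folders") first to inspect the returned paths, then repeat the call with dry_run=False to actually create the folders.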

How do I print a listing of files in a directory?

    Get to the MS-DOS prompt or the Windows command line.
    Navigate to the directory you wish to print the contents of. If you're new to the command line, familiarize yourself with the cd command and the dir command.
    Once in the directory you wish to print the contents of, type one of the below commands.

    dir > print.txt

    The above command will take the list of all the files and all of the information about the files, including size, modified date, etc., and send that output to the print.txt file in the current directory.

    dir /b > print.txt

    This command would print only the file names and not the file information of the files in the current directory.

    dir /s /b > print.txt

    This command would print only the names (with full paths) of the files in the current directory and in all of its subdirectories.

    After doing any of the above steps the print.txt file is created. Open this file in any text editor (e.g. Notepad) and print the file. You can also do this from the command prompt by typing notepad print.txt.
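
For comparison, here is a rough Python equivalent of dir /b and dir /s /b. It is a sketch only: list_files and write_listing are hypothetical helper names, and the output is sorted alphabetically, which the dir command does not guarantee.

```python
from pathlib import Path


def list_files(directory, recursive=False):
    """Return entry names like "dir /b", or full paths like "dir /s /b"."""
    root = Path(directory).resolve()
    if recursive:
        # rglob("*") walks every subdirectory; folders are listed too
        return sorted(str(p) for p in root.rglob("*"))
    return sorted(p.name for p in root.iterdir())


def write_listing(directory, out_file, recursive=False):
    """Write the listing to out_file, one entry per line; return the count."""
    lines = list_files(directory, recursive)
    Path(out_file).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)
```

For example, write_listing(".", "print.txt", recursive=True) writes a file that can then be opened in any text editor and printed.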

Saturday, November 5, 2011

In-silico characterization of proteins

BLAST: In bioinformatics, Basic Local Alignment Search Tool, or BLAST, is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences. A BLAST search enables a researcher to compare a query sequence with a library or database of sequences, and identify library sequences that resemble the query sequence above a certain threshold. Different types of BLAST are available depending on the type of query sequence. For example, following the discovery of a previously unknown gene in the mouse, a scientist will typically perform a BLAST search of the human genome to see if humans carry a similar gene; BLAST will identify sequences in the human genome that resemble the mouse gene based on similarity of sequence. The BLAST program was designed by Eugene Myers, Stephen Altschul, Warren Gish, David J. Lipman, and Webb Miller at the NIH and was published in the Journal of Molecular Biology in 1990.

CDD search: Conserved Domain Database (CDD) is a protein annotation resource that consists of a collection of well-annotated multiple sequence alignment models for ancient domains and full-length proteins. These are available as position-specific score matrices (PSSMs) for fast identification of conserved domains in protein sequences via RPS-BLAST. CDD content includes NCBI-curated domains, which use 3D-structure information to explicitly define domain boundaries and provide insights into sequence/structure/function relationships, as well as domain models imported from a number of external source databases (Pfam, SMART, COG, PRK, TIGRFAM).

PFAM: The Pfam database is a large collection of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs). Proteins are generally composed of one or more functional regions, commonly termed domains. Different combinations of domains give rise to the diverse range of proteins found in nature. The identification of domains that occur within proteins can therefore provide insights into their function. There are two components to Pfam: Pfam-A and Pfam-B. Pfam-A entries are high quality, manually curated families. Although these Pfam-A entries cover a large proportion of the sequences in the underlying sequence database, in order to give a more comprehensive coverage of known proteins we also generate a supplement using the ADDA database. These automatically generated entries are called Pfam-B. Although of lower quality, Pfam-B families can be useful for identifying functionally conserved regions when no Pfam-A entries are found. Pfam also generates higher-level groupings of related families, known as clans. A clan is a collection of Pfam-A entries which are related by similarity of sequence, structure or profile-HMM.

TMHMM: TMHMM predicts transmembrane helices in proteins using a hidden Markov model. A variety of tools are available to predict the topology of transmembrane proteins. To date no independent evaluation of the performance of these tools has been published. A better understanding of the strengths and weaknesses of the different tools would guide both the biologist and the bioinformatician to make better predictions of membrane protein topology.

SignalP: SignalP 4.0 server predicts the presence and location of signal peptide cleavage sites in amino acid sequences from different organisms: Gram-positive prokaryotes, Gram-negative prokaryotes, and eukaryotes. The method incorporates a prediction of cleavage sites and a signal peptide/non-signal peptide prediction based on a combination of several artificial neural networks.

STRING: STRING is a database of known and predicted protein interactions. The interactions include direct (physical) and indirect (functional) associations; they are derived from four sources: genomic context, high-throughput experiments, coexpression, and previous knowledge. STRING quantitatively integrates interaction data from these sources for a large number of organisms, and transfers information between these organisms where applicable. The database currently covers 5'214'234 proteins from 1133 organisms.

PROTPARAM: ProtParam is a tool which allows the computation of various physical and chemical parameters for a given protein stored in Swiss-Prot or TrEMBL or for a user-entered sequence. The computed parameters include the molecular weight, theoretical pI, amino acid composition, atomic composition, extinction coefficient, estimated half-life, instability index, aliphatic index and grand average of hydropathicity (GRAVY).
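
Three of these parameters are simple enough to compute by hand. The sketch below is not ProtParam's own code; it assumes the usual definitions: GRAVY as the mean Kyte-Doolittle hydropathy value per residue, and the aliphatic index as X(Ala) + 2.9 X(Val) + 3.9 (X(Ile) + X(Leu)), with X in mole percent. The function names are made up for illustration, and only the 20 standard one-letter residue codes are handled.

```python
from collections import Counter

# Kyte-Doolittle hydropathy values for the 20 standard amino acids
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def composition(seq):
    """Fraction of each residue type present in the sequence."""
    seq = seq.upper()
    counts = Counter(seq)
    return {aa: counts[aa] / len(seq) for aa in sorted(counts)}


def gravy(seq):
    """Grand average of hydropathicity: mean hydropathy per residue."""
    seq = seq.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq):
    """Aliphatic index from the mole percent of Ala, Val, Ile and Leu."""
    seq = seq.upper()

    def mole_percent(aa):
        return 100.0 * seq.count(aa) / len(seq)

    return (mole_percent("A") + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))
```

A positive GRAVY value suggests a more hydrophobic protein overall and a negative one a more hydrophilic protein.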

PROSITE: Search your query sequence for protein motifs, rapidly compare your query protein sequence against all patterns stored in the PROSITE pattern database and determine what the function of an uncharacterised protein is. This tool requires a protein sequence as input, but DNA/RNA may be translated into a protein sequence using transeq and then queried.
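
To give a feel for what a PROSITE pattern scan does, the sketch below translates a pattern into a Python regular expression and reports every match. It is an illustration rather than the PROSITE scanner itself: the helper names are hypothetical, and only the common syntax is covered (x for any residue, [..] for allowed residues, {..} for forbidden residues, (n) or (n,m) for repeats, < and > as terminus anchors). An anchor placed inside square brackets is not handled.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regex string."""
    regex = pattern.strip().rstrip(".")        # patterns may end with a period
    regex = regex.replace("-", "")             # '-' only separates elements
    regex = regex.replace("{", "[^").replace("}", "]")  # {P} -> [^P]
    regex = regex.replace("(", "{").replace(")", "}")   # x(2,4) -> x{2,4}
    regex = regex.replace("x", ".")            # x matches any residue
    regex = regex.replace("<", "^").replace(">", "$")   # terminus anchors
    return regex


def find_motifs(pattern, seq):
    """Return (1-based start, matched text) for every match, overlaps included."""
    # The lookahead lets matches that overlap each other all be reported.
    rx = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in rx.finditer(seq.upper())]
```

For instance, the N-glycosylation site pattern N-{P}-[ST]-{P} becomes the regular expression N[^P][ST][^P].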

InterPro: InterPro is an integrated database of predictive protein "signatures" used for the classification and automatic annotation of proteins and genomes. InterPro classifies sequences at superfamily, family and subfamily levels, predicting the occurrence of functional domains, repeats and important sites. InterPro adds in-depth annotation, including GO terms, to the protein signatures.

GlobPlot Webservice:

GlobPlot webservice - link to GlobPlot WSDL file.

Prediction of disorder:

DisEMBL - A neural network based predictor of protein disorder.
DISOPRED - Predictor from David Jones' lab.

Function prediction in non-globular protein space:

ELM - The Eukaryotic Linear Motif Resource.
NetworKIN - Systematic Discovery of In Vivo Phosphorylation Networks.

Thesis on disorder and linear motifs

See my homepage

Function prediction in globular protein space:

SMART - SMART/Pfam domains

Domain boundaries:

DomCut - A domain boundary detector
DomPred - Domain predictor from David Jones' lab.

Synthetic Biology

Synthetic Biology Project @ SLRI - Applying GlobPlot.

Resources

Subcellular localization predictors:

CELLO (Yu et al, 2004)

ESLPred (Bhasin and Raghava, 2004)

LOCnet and LOCtarget (Nair and Rost, 2004)

LOCSVMPSI (Xie et al, 2005, NAR in press)

NucPred (Heddad et al, 2004)

Predotar

SecretomeP (Bendtsen et al, 2004)

SignalP (Bendtsen et al, 2004)

SubLoc (Hua and Sun, 2001)

TargetP (Emanuelsson et al, 2000)

Subcellular localization databases:

Wednesday, February 24, 2010

Weight to Molar Quantity (for proteins)

This tool converts a protein quantity from weight to molar concentration. Here is the tool
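
The arithmetic behind such a converter is short: molar concentration (mol/L) equals mass concentration (g/L) divided by molecular weight (g/mol), and mg/mL is the same as g/L. Below is a minimal sketch with made-up function names; it is not the linked tool's code.

```python
def mg_per_ml_to_micromolar(conc_mg_per_ml, mw_da):
    """Convert a protein concentration from mg/mL to micromolar.

    mg/mL equals g/L, so dividing by the molecular weight (g/mol) gives mol/L;
    the factor 1e6 converts mol/L to micromol/L.
    """
    return conc_mg_per_ml / mw_da * 1e6


def micrograms_to_picomoles(mass_ug, mw_da):
    """Convert an amount of protein from micrograms to picomoles."""
    # 1 ug = 1e-6 g and 1 mol = 1e12 pmol, hence the net factor of 1e6
    return mass_ug / mw_da * 1e6
```

As a check, a 50 kDa protein at 1 mg/mL is 20 µM, and 1 µg of it is 20 pmol.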

Saturday, May 16, 2009

Wolfram|Alpha: Future of search and analysis

This is something very interesting I came across recently. I have only seen the demo so far, and it is mind-blowing; the actual site is located here: Wolfram|Alpha. We are a growing species, and the human mind can achieve the unimaginable through the labyrinthine network of its neuronal circuits. It is inevitable that more tools like this will emerge, and they will become more useful in the days to come. This is the future of how we search the internet and analyze the results, which today means working through hundreds of pages to find the required details on each one. Wolfram|Alpha looks so good that I wonder why it came a bit late. It is an amazing piece of work, and I am sure it is going to be very useful to almost every internet user. I am also hopeful that, compared with Google search results, it gives more analyzed results, although I am not really comparing the two. It handles lots of mathematical calculation and lots of general analysis. I am sure the Wolfram|Alpha people are having, and will have, a tough time maintaining such a brilliant technology.

Tuesday, March 31, 2009

miRex: A web based resource for miRNA expression profiles

Background: A few hundred miRNAs carry the potential to regulate thousands of target genes in eukaryotes. The expression profiles of miRNAs convey important information regarding tissue-specific gene expression and can be used as biomarkers for disease progression and cancer classification, among other interpretations pertaining to miRNA-gene interactions. There are several individual reports of miRNA expression profiling; however, there is no server that supports cross-comparison of all these datasets.

Description: We have developed miRex, a database and analysis tool for comparing miRNA expression profiles generated by high-throughput methods. Currently, data from public repositories have been pre-normalized and provided with visual representation to aid comparison between experiments. An miRNA ID converter, a tool for mapping miRNA IDs from one system of nomenclature to another, has also been included.

Data: Currently, 614 experiments spanning 25 datasets deposited in Gene Expression Omnibus (GEO), the public repository for high-throughput gene expression data hosted by NCBI, and 1132 experiments from 18 datasets from ArrayExpress, another resource for expression data, are available through miRex. Besides the microarray-based data, there is a set of 40 experiments carried out by real-time PCR.

URL: miRex is available at http://miracle.igib.res.in/mirex/

Wednesday, September 24, 2008

Reverse Complement

Reverse complement is commonly used in bioinformatics for various purposes. Here is a tool that does the job without much effort, although simple Perl programs can also be run locally for the purpose. The tool is provided by GENE INFINITY, and it can also produce the reverse or the complement of a sequence separately. Hope this helps. The tool is located here, Reverse Complement
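
For running this locally, a few lines of Python do the same job. This is a minimal sketch, not the GENE INFINITY code; it assumes a plain DNA string containing only A, C, G, T and N (case is preserved, and any other character is passed through unchanged).

```python
# Translation table mapping each base to its complement, in both cases
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def complement(seq):
    """Complement each base, keeping the original order."""
    return seq.translate(_COMPLEMENT)


def reverse(seq):
    """Reverse the sequence without complementing it."""
    return seq[::-1]


def reverse_complement(seq):
    """Complement every base, then reverse the result."""
    return complement(seq)[::-1]
```

For example, reverse_complement("ATGC") gives "GCAT".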

Protein Blast against another set of proteins

This tool is part of the NCBI BLAST blastp suite. BLASTP programs search protein databases using a protein query. This tool runs a BLAST of a query protein against a set of other proteins. I found it useful when you don't wish to BLAST your query against the whole protein database, but only against a set of proteins supplied by the user. This tool is located here, Protein Blast against another set of proteins

PeptideCutter

PeptideCutter: http://expasy.org/tools/peptidecutter/

This tool is provided by ExPASy. It predicts the sites in a given protein sequence that are likely to be cleaved by proteases or chemicals.

PeptideCutter returns the query sequence with the possible cleavage sites mapped on it and/or a table of cleavage site positions. Single or multiple enzymes can be selected for the purpose.
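
To show what a cleavage-site table looks like, here is a small sketch for a single enzyme. It uses the common textbook rule for trypsin (cut after K or R unless the next residue is P); PeptideCutter's own enzyme models are more detailed, so this is an assumption-laden illustration and not a reimplementation. The function names are hypothetical.

```python
def trypsin_sites(seq):
    """Return 1-based positions after which the simplified rule cleaves."""
    seq = seq.upper()
    return [
        i + 1
        for i in range(len(seq) - 1)          # the last residue has no bond after it
        if seq[i] in "KR" and seq[i + 1] != "P"
    ]


def digest(seq, sites):
    """Split the sequence into peptides at the given cleavage positions."""
    peptides, start = [], 0
    for site in sites:
        peptides.append(seq[start:site])
        start = site
    peptides.append(seq[start:])              # C-terminal peptide
    return peptides
```

With the sequence AKRPGKA, the rule cleaves after positions 2 and 6, giving the peptides AK, RPGK and A.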

Predicting Antigenic Peptides

This is a program that predicts those segments from within a protein sequence that are likely to be antigenic by eliciting an antibody response. The method used here is the method of Kolaskar and Tongaonkar (1990).

Predictions are based on a table that reflects the occurrence of amino acid residues in experimentally known segmental epitopes. Segments are only reported if they have a minimum size of 8 residues. The reported accuracy of the method is about 75%.

The program is located here Predicting Antigenic Peptides

Friday, February 1, 2008

Sequence analyzer

Sequence Massager: http://www.attotron.com/cybertory/analysis/seqMassager.htm

Nucleic Acid Sequence Massager is a very easy to use tool for converting DNA to RNA and RNA to DNA, converting upper case to lower case and vice versa, and removing FASTA headers, HTML tags, numbers, white space and line breaks.

I find this tool very handy.
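
Most of these clean-up steps are one-liners in a scripting language. The sketch below is not the Sequence Massager code, and the function names are made up; it treats every line starting with ">" as a FASTA header and every "<...>" run as an HTML tag.

```python
import re


def clean_sequence(text):
    """Strip FASTA header lines, HTML tags, digits and all white space."""
    # Drop header lines first, so their leading '>' cannot confuse the tag regex
    lines = [ln for ln in text.splitlines() if not ln.lstrip().startswith(">")]
    body = "\n".join(lines)
    body = re.sub(r"<[^>]*>", "", body)    # HTML tags
    return re.sub(r"[\d\s]+", "", body)    # numbers, spaces, line breaks


def dna_to_rna(seq):
    """Replace thymine with uracil, preserving case."""
    return seq.replace("T", "U").replace("t", "u")


def rna_to_dna(seq):
    """Replace uracil with thymine, preserving case."""
    return seq.replace("U", "T").replace("u", "t")
```

Case conversion needs no helper at all: seq.upper() and seq.lower() are built into Python strings.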