BLAST: In bioinformatics, the Basic Local Alignment Search Tool, or BLAST, is an
algorithm for comparing primary biological sequence information, such as the
amino-acid sequences of different proteins or the nucleotides of DNA sequences.
A BLAST search enables a researcher to compare a query sequence with a library
or database of sequences, and identify library sequences that resemble the
query sequence above a certain threshold. Different BLAST programs are
available depending on the type of query sequence (nucleotide or protein). For
example, following the discovery of a previously unknown gene in the mouse, a
scientist will typically perform a BLAST search of the human genome to see if
humans carry a similar gene; BLAST will identify sequences in the human genome
that resemble the mouse gene based on similarity of sequence. The BLAST program
was designed by Eugene Myers, Stephen Altschul, Warren Gish, David J. Lipman,
and Webb Miller at the NIH and was published in the Journal of Molecular
Biology in 1990.
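The idea of reporting library sequences that resemble the query "above a certain threshold" can be illustrated with a small sketch. BLAST itself relies on a faster heuristic; the code below swaps in exhaustive Smith-Waterman dynamic programming, which computes the best local alignment score directly. The function names and the scoring values (match +2, mismatch -1, gap -2) are assumptions chosen for illustration, not BLAST defaults.

```python
def local_alignment_score(query, subject, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score (linear gap penalty)."""
    prev = [0] * (len(subject) + 1)
    best = 0
    for q in query:
        curr = [0]
        for j, s in enumerate(subject, start=1):
            diag = prev[j - 1] + (match if q == s else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def search(query, library, threshold):
    """Return (name, score) pairs scoring at or above threshold, best first."""
    hits = []
    for name, seq in library.items():
        score = local_alignment_score(query, seq)
        if score >= threshold:
            hits.append((name, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical toy library; real searches run against large sequence databases.
library = {"seq_a": "GGACGTGG", "seq_b": "TTTT"}
print(search("ACGT", library, threshold=6))  # prints [('seq_a', 8)]
```

Exhaustive dynamic programming is too slow for genome-scale databases, which is why a heuristic such as BLAST is used in practice.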
CDD search: The Conserved Domain Database (CDD) is a protein annotation
resource that consists of a collection of well-annotated multiple sequence
alignment models for ancient domains and full-length proteins. These are
available as position-specific score matrices (PSSMs) for fast identification
of conserved domains in protein sequences via RPS-BLAST. CDD content includes
NCBI-curated domains, which use 3D-structure information to explicitly define
domain boundaries and provide insights into sequence/structure/function
relationships, as well as domain models imported from a number of external
source databases (Pfam, SMART, COG, PRK, TIGRFAM).
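To make the role of a position-specific score matrix concrete, here is a minimal sketch of scoring a protein sequence against one PSSM by sliding it along the sequence. It shows only the scoring idea; RPS-BLAST adds its own search heuristics and statistics on top. The function name, the default score for unlisted residues, and the toy matrix values are all made up for illustration and are not taken from CDD.

```python
def best_pssm_hit(sequence, pssm, default=-4):
    """Slide a PSSM along a sequence; return (start, score) of the best window.

    pssm is a list with one dict per model position, mapping a residue
    letter to its score. Residues missing from a column get `default`.
    Returns None when the sequence is shorter than the matrix.
    """
    width = len(pssm)
    best = None
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(col.get(res, default) for col, res in zip(pssm, window))
        if best is None or score > best[1]:
            best = (start, score)
    return best


# Toy three-column matrix with made-up scores (not a real CDD model).
toy_pssm = [
    {"G": 5, "A": 1},
    {"K": 4, "R": 3},
    {"S": 5, "T": 4},
]
print(best_pssm_hit("MAGKSTL", toy_pssm))  # prints (2, 14)
```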
PFAM: The Pfam database is a large collection of protein families, each
represented by multiple sequence alignments and hidden Markov models (HMMs).
Proteins are generally composed of one or more functional regions, commonly
termed domains. Different combinations of domains give rise to the diverse
range of proteins found in nature. The identification of domains that occur
within proteins can therefore provide insights into their function. There are
two components to Pfam: Pfam-A and Pfam-B. Pfam-A entries are high-quality,
manually curated families. Although these Pfam-A entries cover a large
proportion of the sequences in the underlying sequence database, a supplement
is generated automatically using the ADDA database to give more comprehensive
coverage of known proteins. These automatically generated entries are called
Pfam-B. Although of lower quality, Pfam-B families can be useful for
identifying functionally conserved regions when no Pfam-A entries are found.
Pfam also generates higher-level groupings of related families, known as
clans. A clan is a collection of Pfam-A entries which are related by
similarity of sequence, structure or profile-HMM.
TMHMM: TMHMM predicts transmembrane helices in proteins using a hidden Markov
model. A variety of tools are available to predict the topology of
transmembrane proteins. To date no independent evaluation of the performance
of these tools has been published. A better understanding of the strengths and
weaknesses of the different tools would guide both the biologist and the
bioinformatician to make better predictions of membrane protein topology.
SignalP: The SignalP 4.0 server predicts the presence and location of signal peptide cleavage sites in amino acid sequences from different organisms: Gram-positive prokaryotes, Gram-negative prokaryotes, and eukaryotes. The method combines a prediction of cleavage sites with a signal peptide/non-signal peptide prediction, both based on a combination of several artificial neural networks.
STRING: STRING is a database of known and predicted protein interactions. The
interactions include direct (physical) and indirect (functional) associations;
they are derived from four sources: genomic context, high-throughput
experiments, coexpression, and previous knowledge. STRING quantitatively
integrates interaction data from these sources for a large number of
organisms, and transfers information between these organisms where
applicable. The database currently covers 5,214,234 proteins from 1133
organisms.
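As a rough illustration of what integrating evidence from several sources can look like, the sketch below combines per-source confidence scores with a noisy-OR rule, which treats the sources as independent. This is an assumed, simplified scheme for illustration only and is not presented as STRING's actual scoring formula; the function name and example scores are hypothetical.

```python
from math import prod


def combine_evidence(channel_scores):
    """Noisy-OR combination of per-channel confidence scores in [0, 1].

    Assumes the channels are independent: the combined score is the
    probability that at least one channel supports the interaction.
    """
    for name, score in channel_scores.items():
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score for {name!r} must be in [0, 1]")
    return 1.0 - prod(1.0 - score for score in channel_scores.values())


# Hypothetical scores for one protein pair, one entry per evidence source.
example = {
    "genomic_context": 0.2,
    "experiments": 0.5,
    "coexpression": 0.3,
    "previous_knowledge": 0.0,
}
print(round(combine_evidence(example), 2))  # prints 0.72
```

Under this rule, adding a source can only raise the combined score, so weak but independent lines of evidence reinforce one another.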
PROTPARAM: ProtParam is a tool which allows the computation of various
physical and chemical parameters for a given protein stored in Swiss-Prot or
TrEMBL or for a user-entered sequence. The computed parameters include the
molecular weight, theoretical pI, amino acid composition, atomic composition,
extinction coefficient, estimated half-life, instability index, aliphatic
index and grand average of hydropathicity (GRAVY).
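Three of the parameters listed above are simple enough to sketch directly. The code below computes amino acid composition, GRAVY (the mean Kyte-Doolittle hydropathy of all residues) and the aliphatic index (Ikai's formula: mole percent of Ala, plus 2.9 times Val, plus 3.9 times the sum of Ile and Leu). The function names are hypothetical, and the sketch is assumed to follow the standard definitions rather than being ProtParam's own code.

```python
from collections import Counter

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def composition(sequence):
    """Fraction of each residue in the sequence (fractions sum to 1)."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    counts = Counter(sequence.upper())
    return {res: n / len(sequence) for res, n in counts.items()}


def gravy(sequence):
    """Mean hydropathy per residue; raises KeyError on non-standard residues."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    return sum(KYTE_DOOLITTLE[res] for res in sequence.upper()) / len(sequence)


def aliphatic_index(sequence):
    """Aliphatic index from the mole percent of Ala, Val, Ile and Leu."""
    comp = composition(sequence)
    pct = {res: 100.0 * comp.get(res, 0.0) for res in "AVIL"}
    return pct["A"] + 2.9 * pct["V"] + 3.9 * (pct["I"] + pct["L"])


print(composition("AAGG"))  # prints {'A': 0.5, 'G': 0.5}
```

A positive GRAVY value indicates an overall hydrophobic sequence and a negative value a hydrophilic one.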
PROSITE: Searches a query sequence for protein motifs by rapidly comparing the
query protein sequence against all patterns stored in the PROSITE pattern
database, which can help suggest the function of an uncharacterised protein.
This tool requires a protein sequence as input, but DNA/RNA may be translated
into a protein sequence using transeq and then queried.
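Comparing a sequence against a PROSITE pattern amounts to pattern matching, which can be sketched by translating the pattern into a regular expression. The converter below handles the common syntax elements: `x` for any residue, `[...]` for allowed residues, `{...}` for forbidden residues, `(n)` or `(n,m)` repeat counts, and the `<` and `>` terminal anchors. It is a simplified sketch with hypothetical function names; rarer constructs, such as `>` inside square brackets, are not supported.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE pattern such as 'N-{P}-[ST]-{P}.' into a regex."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):      # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):        # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Split off an optional repeat count such as (3) or (2,4).
        m = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, repeat = m.group(1), m.group(2)
        if core == "x":                  # any residue
            core = "."
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"   # forbidden residues
        parts.append(prefix + core + ("{" + repeat + "}" if repeat else "") + suffix)
    return "".join(parts)


def find_motifs(sequence, pattern):
    """Return (start, match) pairs, 0-based, including overlapping matches."""
    regex = "(?=(" + prosite_to_regex(pattern) + "))"
    return [(m.start(), m.group(1)) for m in re.finditer(regex, sequence)]


# N-glycosylation site pattern (PROSITE PS00001).
print(prosite_to_regex("N-{P}-[ST]-{P}."))            # prints N[^P][ST][^P]
print(find_motifs("MNGSAANPSA", "N-{P}-[ST]-{P}."))  # prints [(1, 'NGSA')]
```

The lookahead wrapper lets matches overlap, which matters for short motifs that can start at adjacent positions.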
InterPro: InterPro is an integrated database of predictive protein "signatures"
used for the classification and automatic annotation of proteins and genomes.
InterPro classifies sequences at superfamily, family and subfamily levels,
predicting the occurrence of functional domains, repeats and important sites.
InterPro adds in-depth annotation, including GO terms, to the protein signatures.
GlobPlot Webservice:
- GlobPlot webservice - link to GlobPlot WSDL file.
Prediction of disorder:
- DisEMBL - A neural-network-based predictor of protein disorder.
- DISOPRED - Predictor from David Jones' lab.
Function prediction in non-globular protein space:
- ELM - The Eukaryotic Linear Motif Resource.
- NetworKIN - Systematic Discovery of In Vivo Phosphorylation Networks.
- Thesis on disorder and linear motifs
Function prediction in globular protein space:
- SMART - SMART/Pfam domains
Domain boundaries:
Synthetic Biology:
- Synthetic Biology Project @ SLRI - Applying GlobPlot.
Subcellular localization predictors:
- CELLO (Yu et al, 2004)
- ESLPred (Bhasin and Raghava, 2004)
- LOCnet and LOCtarget (Nair and Rost, 2004)
- LOCSVMPSI (Xie et al, 2005, NAR in press)
- NucPred (Heddad et al, 2004)
- Predotar
- SecretomeP (Bendtsen et al, 2004)
- SignalP (Bendtsen et al, 2004)
- SubLoc (Hua and Sun, 2001)
- TargetP (Emanuelsson et al, 2000)