Bioinformatics Tools

Sunday, February 26, 2012

Pharmacological compound databases

ZINC: http://zinc.docking.org/ ZINC is a free database of commercially available compounds for virtual screening. It contains over 14 million purchasable compounds in ready-to-dock 3D formats. ZINC is provided by the Shoichet Laboratory in the Department of Pharmaceutical Chemistry at the University of California, San Francisco (UCSF). To cite ZINC, reference: Irwin and Shoichet, J. Chem. Inf. Model. 2005;45(1):177-82. ZINC is financially supported by NIGMS (GM71896).

PubChem: http://pubchem.ncbi.nlm.nih.gov/ PubChem, released in 2004, provides information on the biological activities of small molecules. It is organized as three linked databases within NCBI's Entrez information retrieval system: PubChem Substance, PubChem Compound and PubChem BioAssay. PubChem also provides a fast chemical structure similarity search tool, and the homepage links to more information on using each component database. Links from PubChem's chemical structure records to other Entrez databases provide information on biological properties, including links to the PubMed scientific literature and to NCBI's protein 3D structure resource. Links to PubChem's BioAssay database present the results of biological screening, and links to depositor web sites provide further information. A PubChem FTP site, Download Facility, Power User Gateway (PUG), Standardization Service, Score Matrix Service, Structure Clustering and Deposition Gateway are also available. PubChem provides tips and example code that let users add a free PubChem search tool to their own sites, and a PubChem publication page links to published articles.
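The Power User Gateway mentioned above is PubChem's programmatic interface; PubChem also offers a REST-style variant, PUG-REST, whose requests are plain URLs. The sketch below only builds such a URL for looking up computed properties by compound name. The helper name `property_url` is hypothetical, and the property names shown (`MolecularFormula`, `MolecularWeight`) and the JSON response layout in the comments are assumptions to check against the current PUG-REST documentation.

```python
from urllib.parse import quote

# Base URL of PubChem's PUG-REST interface.
PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"

def property_url(name: str, properties: list[str], fmt: str = "JSON") -> str:
    """Build a PUG-REST URL requesting computed properties for a compound looked up by name.

    Layout: <base>/compound/name/<name>/property/<comma-separated properties>/<format>
    """
    return "/".join([
        PUG_REST_BASE,
        "compound",
        "name",
        quote(name, safe=""),   # percent-encode spaces and other unsafe characters
        "property",
        ",".join(properties),
        fmt,
    ])

if __name__ == "__main__":
    url = property_url("aspirin", ["MolecularFormula", "MolecularWeight"])
    print(url)
    # Fetching requires network access; the response layout below is an assumption to verify:
    # import json, urllib.request
    # with urllib.request.urlopen(url) as resp:
    #     data = json.load(resp)
    # print(data["PropertyTable"]["Properties"][0])
```

Building the URL separately from fetching it keeps the sketch testable offline; the commented lines show one way to fetch it with only the standard library.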

DrugBank: http://www.drugbank.ca/ The DrugBank database is a unique bioinformatics and cheminformatics resource that combines detailed drug data (chemical, pharmacological and pharmaceutical) with comprehensive drug target information (sequence, structure and pathway). The database contains 6712 drug entries, including 1441 FDA-approved small-molecule drugs, 134 FDA-approved biotech (protein/peptide) drugs, 83 nutraceuticals and 5086 experimental drugs. In addition, 4231 non-redundant protein sequences (drug targets, enzymes, transporters and carriers) are linked to these drug entries. Each DrugCard entry contains more than 150 data fields, half devoted to drug/chemical data and the other half to drug target or protein data. DrugBank is supported by David Wishart, Departments of Computing Science & Biological Sciences, University of Alberta, and by The Metabolomics Innovation Centre, a Genome Canada-funded core facility that provides metabolomics expertise and technologies to the scientific community and industry.

ChemDB: http://cdb.ics.uci.edu/index.htm ChemDB offers the following components:
ChemicalSearch (find chemicals by various criteria): search by basic criteria such as molecular weight and predicted logP, or by the more abstract notion of structural similarity.
Virtual Chemical Space (retro-synthesis and combinatorial library design): interactively deconstruct target compounds into component precursors, then recombine similar building blocks into combinatorial libraries that represent the "virtual chemical space" near the target compound.
Reaction Explorer (Synthesis Explorer and Mechanism Explorer): an interactive system for learning and practicing reactions, syntheses and mechanisms in organic chemistry, with support for automatically generating random problems, curved-arrow mechanism diagrams, and inquiry-based learning.
Datasets (for machine learning and searching experiments): chemical datasets annotated with properties of interest, for training and testing machine-learning prediction and search methods.
Supplements (articles and support material): online articles about the system, with the supplementary data and figures they reference.
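ChemicalSearch's property criteria amount to range filters over precomputed descriptors. A minimal sketch, assuming each record already carries a molecular weight and a predicted logP; the `Compound` records, their names and values are hypothetical, and this is not ChemDB's actual search code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Compound:
    name: str
    mol_weight: float   # molecular weight, g/mol
    logp: float         # predicted octanol/water partition coefficient

def search_by_properties(compounds, mw_range, logp_range):
    """Keep compounds whose molecular weight and logP both lie within the inclusive ranges."""
    mw_lo, mw_hi = mw_range
    lp_lo, lp_hi = logp_range
    return [
        c for c in compounds
        if mw_lo <= c.mol_weight <= mw_hi and lp_lo <= c.logp <= lp_hi
    ]

# Hypothetical records with invented values, for illustration only.
library = [
    Compound("cpd-001", 180.2, 1.2),
    Compound("cpd-002", 412.5, 4.8),
    Compound("cpd-003", 651.0, 6.3),
    Compound("cpd-004", 298.3, -0.7),
]

hits = search_by_properties(library, mw_range=(150, 500), logp_range=(0, 5))
print([c.name for c in hits])   # ['cpd-001', 'cpd-002']
```

A real search engine would index the descriptors rather than scan every record, but the filtering condition is the same.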

The Chapman & Hall/CRC Chemical Database is a structured database of information on chemical substances. It includes descriptive and numerical data on the chemical, physical and biological properties of compounds; systematic and common names; literature references; and structure diagrams with their associated connection tables. The Dictionary of Natural Products Online is a subset of this database and includes every compound in the Dictionary of Natural Products (Main Work and Supplements). The Dictionary of Natural Products (DNP) is described by its publisher as the only comprehensive, fully edited database of natural products. It grew out of the well-known Dictionary of Organic Compounds (DOC), which since its inception in the 1930s has, through successive editions, been a leading source of natural-product information. In the early 1980s, following publication of the Fifth Edition of DOC (the first to be built with database methods), the editors and contributors for the various classes of natural products began enlarging, rationalising and classifying the natural-product entries while keeping the coverage up to date. In 1992 the results of this major project, which had grown to match DOC in size, were published separately in book (7 volumes) and CD-ROM form, leaving DOC to cover only the most widely distributed and/or practically important natural products. DNP compilation has continued since then, combining an exhaustive survey of the current literature with historical sources such as reviews, in order to pick up minor natural products and data previously overlooked.

The compilation of DNP is undertaken by a team of academics and freelancers who work closely with the in-house editorial staff at Chapman & Hall. Each contributor specialises in a particular class of natural products (e.g. alkaloids) and can reorganise and classify the data in the light of new research, presenting it as consistently and logically as possible; this lets the team reconcile errors and inconsistencies. The resulting online version is a well-organised dictionary that aims to document virtually every known natural product. A valuable design feature is that closely related natural products (e.g. where one is a glycoside or simple ester of another) are grouped in the same entry, which brings out the underlying structural and biosynthetic relationships between the compounds. Structure diagrams are drawn and numbered as consistently as possible, following stereochemical and biogenetic relationships. In addition, every natural product is indexed by structural/biogenetic type under one of more than 1000 headings, so that all compounds in a category can be located quickly, even those that have undergone biogenetic modification and no longer share exactly the same skeleton. Coverage of natural products of unknown structure is extensive but not complete, and it is being expanded through retrospective searches.

ChemSpider: http://www.chemspider.com/ is a free chemical structure database providing fast text and structure search access to over 26 million structures from hundreds of data sources.

ChemBank: http://chembank.broadinstitute.org/ ChemBank is a public, web-based informatics environment created by the Broad Institute's Chemical Biology Program and funded largely by the National Cancer Institute's Initiative for Chemical Genetics (ICG). It provides freely available data on small molecules and small-molecule screens, together with resources for analyzing those data to gain biological and medical insights. ChemBank is intended to guide chemists synthesizing novel compounds or libraries, to help biologists find small molecules that perturb specific biological pathways, and to speed the process by which drug hunters discover new and effective medicines. ChemBank stores an increasingly varied set of cell measurements derived from, among other biological objects, cell lines treated with small molecules. Analysis tools, some still under development, help determine the relationships between cell states, cell measurements and small molecules. ChemBank currently stores information on hundreds of thousands of small molecules and hundreds of biomedically relevant assays performed at the ICG in collaborations with biomedical researchers worldwide, who have agreed to perform their experiments in an open data-sharing environment. The goal of ChemBank is to give life scientists unfettered access to biomedically relevant data and tools previously available almost exclusively in the private sector. Its developers intend it to be a planning and discovery tool for chemists, biologists and drug hunters anywhere, requiring only a computer, Internet access, and a desire to extract knowledge from public experiments whose greatest value is likely to lie in their collective sum.

SuperDrug: http://bioinf.charite.de/superdrug/ Several resources exist for experimentally determined and computed three-dimensional (3D) structures of low-molecular-weight compounds, but for approved drugs no free, publicly accessible source of 3D structures and conformers was otherwise available. In addition, for selection purposes or for correlating structural similarity with medical application, it is desirable to assign each structure its Anatomical Therapeutic Chemical (ATC) classification code according to the WHO scheme. Results: the database contains approximately 2500 3D structures of the active ingredients of essential marketed drugs. To account for structural flexibility, they are represented by 10^5 structural conformers. The web query system supports searches by drug name, synonyms, trade name, trivial name, formula, CAS number, ATC code, etc. It implements 2D similarity screening (Tanimoto coefficients) and an automatic 3D superposition procedure based on the conformational representation. Drug structures above a similarity threshold, as well as superimposed conformers, can be retrieved in MOL file format through a graphical interface. Availability: the system is accessible for academic use at http://bioinf.charite.de/superdrug. The retrieval system requires MDL's free 'Chime' browser plug-in for visualization.
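The 2D similarity screen rests on the Tanimoto coefficient: for two binary fingerprints A and B, Tc = |A ∩ B| / |A ∪ B| = c / (a + b - c), where a and b count the bits set in each fingerprint and c the bits they share. The sketch below represents fingerprints as Python sets of "on" bit positions and returns library entries above a threshold, mirroring the threshold-based retrieval described above. The fingerprints, identifiers and the 0.7 threshold are invented, and SuperDrug's own fingerprint type is not specified here.

```python
def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto coefficient c / (a + b - c) for fingerprints given as sets of 'on' bits."""
    if not fp_a and not fp_b:
        return 0.0  # convention chosen here: two empty fingerprints score 0
    common = len(fp_a & fp_b)
    return common / (len(fp_a) + len(fp_b) - common)

def similar_above(query: set[int], library: dict[str, set[int]], threshold: float = 0.7):
    """Return (identifier, score) pairs scoring >= threshold, highest score first."""
    scored = ((ident, tanimoto(query, fp)) for ident, fp in library.items())
    hits = [pair for pair in scored if pair[1] >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

# Made-up fingerprints (sets of bit positions), for illustration only.
query = {1, 4, 7, 9, 12}
library = {
    "drug-A": {1, 4, 7, 9, 12, 15},   # 5 shared bits, union of 6 -> 5/6
    "drug-B": {1, 4, 20, 33},         # 2 shared bits, union of 7 -> 2/7
    "drug-C": {1, 4, 7, 9},           # 4 shared bits, union of 5 -> 4/5
}

for ident, score in similar_above(query, library):
    print(f"{ident}: {score:.3f}")
```

With the threshold at 0.7, drug-A and drug-C are returned and drug-B is filtered out.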

Ligand Expo: http://ligand-expo.rutgers.edu/ Ligand Expo (formerly Ligand Depot) provides chemical and structural information about small molecules within the structure entries of the Protein Data Bank. Tools are provided to search the PDB dictionary for chemical components, to identify structure entries containing particular small molecules, and to download the 3D structures of the small-molecule components of a PDB entry. A sketch tool is also provided for building new chemical definitions from reported PDB chemical components.
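Ligand Expo identifies small molecules by the chemical component identifiers that appear in PDB entries. In PDB-format files these components are written as HETATM records, with the component name in columns 18-20, the chain identifier in column 22 and the residue number in columns 23-26. A minimal sketch of listing the ligands in one entry, assuming plain PDB-format text (not mmCIF) and skipping water (HOH); the function name `ligands_in_pdb` and the sample records are hypothetical.

```python
def ligands_in_pdb(pdb_text: str, skip: frozenset = frozenset({"HOH"})) -> list[tuple[str, str, int]]:
    """List distinct (component ID, chain, residue number) groups from HETATM records."""
    found = set()
    for line in pdb_text.splitlines():
        if not line.startswith("HETATM"):
            continue  # ATOM records are polymer residues; other records are ignored
        comp_id = line[17:20].strip()   # columns 18-20: residue / chemical component name
        chain = line[21]                # column 22: chain identifier
        res_seq = int(line[22:26])      # columns 23-26: residue sequence number
        if comp_id not in skip:
            found.add((comp_id, chain, res_seq))
    return sorted(found)

# A few hand-written records in fixed-column PDB format, for illustration only.
SAMPLE = """\
ATOM      1  N   ALA A   1      11.104   6.134  -6.504  1.00  0.00           N
HETATM 1234  C1  ATP A 501      10.000  20.000  30.000  1.00 20.00           C
HETATM 1235  C2  ATP A 501      11.000  20.000  30.000  1.00 20.00           C
HETATM 1300 MG    MG A 502      12.000  21.000  31.000  1.00 20.00          MG
HETATM 1400  O   HOH A 601       5.000   5.000   5.000  1.00 30.00           O
"""

print(ligands_in_pdb(SAMPLE))   # [('ATP', 'A', 501), ('MG', 'A', 502)]
```

The component IDs returned (here ATP and MG) are the keys one would then look up in the PDB chemical component dictionary that Ligand Expo searches.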

Schrödinger has made available a set of ligand decoys used in Glide enrichment studies. 1K Drug-Like Ligand Decoys Set: this collection was created by selecting 1000 ligands with "drug-like" properties from a one-million-compound library. The creation and application of the ligand set are described in the following publications:

Friesner, R. A.; Banks, J. L.; Murphy, R. B.; Halgren, T. A.; Klicic, J. J.; Mainz, D. T.; Repasky, M. P.; Knoll, E. H.; Shaw, D. E.; Shelley, M.; Perry, J. K.; Francis, P.; Shenkin, P. S., "Glide: A New Approach for Rapid, Accurate Docking and Scoring. 1. Method and Assessment of Docking Accuracy", J. Med. Chem. 2004, 47, 1739-1749.

Halgren, T. A.; Murphy, R. B.; Friesner, R. A.; Beard, H. S.; Frye, L. L.; Pollard, W. T.; Banks, J. L., "Glide: A New Approach for Rapid, Accurate Docking and Scoring. 2. Enrichment Factors in Database Screening", J. Med. Chem. 2004, 47, 1750-1759.
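The enrichment factors in the second Glide paper measure how many more actives a method ranks near the top of a list than random selection would. A common definition at a fraction x of the ranked database is EF_x = (actives in the top x / compounds in the top x) / (total actives / total compounds). The sketch below uses that generic definition, which may differ in detail from the variant used in the Glide papers; the ranked list is a toy example of 100 compounds with 5 actives, not Schrödinger's decoy data.

```python
def enrichment_factor(ranked_is_active: list[bool], fraction: float) -> float:
    """Enrichment factor for the top `fraction` of a ranked screening list.

    ranked_is_active is ordered best score first; True marks a known active and
    False a decoy. EF = (actives_top / n_top) / (actives_total / n_total).
    """
    n_total = len(ranked_is_active)
    actives_total = sum(ranked_is_active)
    if n_total == 0 or actives_total == 0:
        raise ValueError("need a non-empty list with at least one active")
    n_top = max(1, round(fraction * n_total))   # size of the top slice, rounded, at least 1
    actives_top = sum(ranked_is_active[:n_top])
    # Same ratio as the docstring formula, rearranged to divide only once.
    return (actives_top * n_total) / (n_top * actives_total)

# Toy ranking: 100 compounds, 5 actives, 3 of them among the 10 best-scored.
ranked = [True, False, True, False, False, True, False, False, False, False]
ranked += [False] * 88 + [True, True]

print(enrichment_factor(ranked, 0.10))   # 6.0
print(enrichment_factor(ranked, 1.00))   # 1.0: screening everything gives no enrichment
```

An EF of 6 at 10% means the top tenth of the ranking holds six times the active rate of the whole database; decoy sets such as the 1K drug-like set supply the inactive "False" entries in this kind of calculation.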

SuperLigands: http://bioinf-tomcat.charite.de/superligands/ SuperLigands is an encyclopedia dedicated to a ligand-oriented view of protein structural space. The database contains small-molecule structures that occur as ligands in the Protein Data Bank, and it integrates information on drug-likeness and binding properties. A 3D superpositioning algorithm allows all ligands to be screened for possible scaffold hoppers, and a fingerprint-based 2D similarity screen is also available.
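SuperLigands' description does not say which drug-likeness criteria it applies. One widely used filter is Lipinski's rule of five: molecular weight ≤ 500, logP ≤ 5, at most 5 hydrogen-bond donors and at most 10 hydrogen-bond acceptors, with poor oral absorption considered more likely when more than one rule is broken. The sketch below applies that rule to precomputed descriptors; it is an assumed stand-in, not SuperLigands' method, and the descriptor values are invented.

```python
from typing import NamedTuple

class Descriptors(NamedTuple):
    """Precomputed descriptors for one ligand (values supplied by the caller)."""
    mol_weight: float   # molecular weight, g/mol
    logp: float         # octanol/water partition coefficient
    h_donors: int       # hydrogen-bond donors
    h_acceptors: int    # hydrogen-bond acceptors

def rule_of_five_violations(d: Descriptors) -> list[str]:
    """List which of Lipinski's four thresholds the ligand exceeds."""
    violations = []
    if d.mol_weight > 500:
        violations.append("molecular weight > 500")
    if d.logp > 5:
        violations.append("logP > 5")
    if d.h_donors > 5:
        violations.append("H-bond donors > 5")
    if d.h_acceptors > 10:
        violations.append("H-bond acceptors > 10")
    return violations

def is_drug_like(d: Descriptors) -> bool:
    """Treat a ligand as drug-like when it breaks at most one rule."""
    return len(rule_of_five_violations(d)) <= 1

# Invented descriptor values, for illustration only.
small_ligand = Descriptors(mol_weight=350.4, logp=2.1, h_donors=2, h_acceptors=5)
large_ligand = Descriptors(mol_weight=720.0, logp=6.2, h_donors=3, h_acceptors=12)

print(is_drug_like(small_ligand))              # True
print(rule_of_five_violations(large_ligand))   # the three rules it breaks
```

Returning the list of broken rules, rather than a bare boolean, keeps the reason for a rejection visible when browsing ligands.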

ChEBI: http://www.ebi.ac.uk/chebi/ Chemical Entities of Biological Interest (ChEBI) is a freely available dictionary of molecular entities focused on ‘small’ chemical compounds. The term ‘molecular entity’ refers to any constitutionally or isotopically distinct atom, molecule, ion, ion pair, radical, radical ion, complex, conformer, etc., identifiable as a separately distinguishable entity. The molecular entities in question are either products of nature or synthetic products used to intervene in the processes of living organisms. ChEBI incorporates an ontological classification, in which the relationships between molecular entities or classes of entities and their parents and/or children are specified. ChEBI uses nomenclature, symbolism and terminology endorsed by the following international scientific bodies:
International Union of Pure and Applied Chemistry (IUPAC)
Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (NC-IUBMB)

Molecules directly encoded by the genome (e.g. nucleic acids, proteins and peptides derived from proteins by cleavage) are not as a rule included in ChEBI. All data in the database is non-proprietary or is derived from a non-proprietary source. It is thus freely accessible and available to anyone. In addition, each data item is fully traceable and explicitly referenced to the original source.
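ChEBI's ontological classification links each entity to its parent classes, so questions such as "which classes does this compound belong to?" become graph traversals. The sketch below walks is_a links upward breadth-first and lists direct children. The tiny hierarchy is a toy written for illustration, not data extracted from ChEBI, and real ChEBI entries carry additional relationship types besides is_a; the function names are hypothetical.

```python
from collections import deque

# Toy is_a hierarchy (child -> list of parents), written for illustration only;
# it is NOT extracted from ChEBI.
IS_A = {
    "ethanol": ["primary alcohol"],
    "propan-1-ol": ["primary alcohol"],
    "primary alcohol": ["alcohol"],
    "alcohol": ["organic hydroxy compound"],
    "organic hydroxy compound": [],
}

def ancestors(entity: str, parents: dict = IS_A) -> list[str]:
    """All classes reachable from `entity` via is_a links, nearest first (breadth-first)."""
    order, seen = [], set()
    queue = deque(parents.get(entity, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # a class reached along two paths is reported once
        seen.add(node)
        order.append(node)
        queue.extend(parents.get(node, []))
    return order

def children(cls: str, parents: dict = IS_A) -> list[str]:
    """Entities that list `cls` as a direct is_a parent."""
    return sorted(child for child, ps in parents.items() if cls in ps)

print(ancestors("ethanol"))          # ['primary alcohol', 'alcohol', 'organic hydroxy compound']
print(children("primary alcohol"))   # ['ethanol', 'propan-1-ol']
```

Because an entity may have several parents, the hierarchy is a directed acyclic graph rather than a tree, which is why the traversal tracks already-visited classes.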
