Bioinformatics Tools: April 2014

Wednesday, April 30, 2014

How to make a protein soluble?

Cloning, expression and purification of difficult to clone, express and purify proteins in E. coli

I have received some mails about the expression of difficult-to-purify proteins, so I thought of putting together a short list of do's and don'ts. Pure bioinformatics people, please bear with me for a couple of posts. First of all, it is important to know your protein: gather as much information about it as you can. All those small pieces of information help a lot if kept in mind while designing the strategy for cloning, expression and purification. Also find out about the source of the protein: eukaryotic, prokaryotic or any other. Basic parameters such as the size of the protein, pI, amino acid composition etc. play a vital role in designing the strategy. I have compiled some tools for finding such information on this blog before: http://bioinformatictools.blogspot.in/2014/04/functional-annotation-of-hypothetical.html and http://bioinformatictools.blogspot.in/2011/11/in-silico-characterization-of-proteins.html. Look for other sources too. The main theme is to find out as much about the protein as one can.

I am not a big fan of purifying proteins under denaturing conditions. If the protein has to be refolded from denaturing conditions, many questions become difficult to answer, such as whether the protein has folded properly, and whether this is its native fold rather than just some random refolding. These are difficult to demonstrate experimentally unless you already have some assay in mind. Since I have tried that too, I will end by suggesting what I have learned on that part.
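As a rough illustration of the basic parameters mentioned above (size and amino acid composition), here is a minimal Python sketch. It uses the standard average residue masses; the example sequence is made up, and the function names are my own, not from any particular tool. For real work a dedicated tool such as ProtParam is the better choice.

```python
from collections import Counter

# Average residue masses in daltons (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.0153  # one water is added back for the free N- and C-termini


def composition(seq):
    """Return the fraction of each amino acid present in the sequence."""
    seq = seq.upper()
    counts = Counter(seq)
    return {aa: counts[aa] / len(seq) for aa in sorted(counts)}


def molecular_weight(seq):
    """Approximate average molecular weight in daltons (no modifications)."""
    seq = seq.upper()
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER_MASS


if __name__ == "__main__":
    example = "MKTAYIAKQRQISFVKSHFSRQ"  # hypothetical sequence, for illustration only
    print(f"length: {len(example)} aa")
    print(f"approx. MW: {molecular_weight(example) / 1000:.2f} kDa")
    for aa, frac in composition(example).items():
        print(f"{aa}: {frac:.1%}")
```

Running the same functions on the sequence with and without a tag gives a quick feel for how much the tag changes the size of the construct.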

Downstream experimental procedures: Before designing the strategy for cloning, expression and purification of the protein, it is wise to decide which downstream experiments you are going to perform, because the strategy mainly depends on them. At times it is possible to purify a very small amount of soluble protein from a very large culture (which is OK if you need only a small amount of protein for downstream experiments), and then one need not go through all the standardization experiments with trials in different vectors and host cells. However, if a large amount of protein is required (such as for crystallization experiments), it is advisable to optimize the whole purification process.

Read as much as you can: There are various resources with suggestions for cloning, expression and purification of proteins in the soluble fraction (e.g. the QIAexpress handbook). But please keep in mind that suggestions are easy to make in wet lab work, while it takes a lot of time and energy to perform the experiments the way one wishes to, so try what you think is logical and, more importantly, easily available to you (do-able).

Membrane or membrane-associated protein: Check if the selected protein is a membrane or membrane-associated protein. This can be done using subcellular localization tools, some of which are listed here http://bioinformatictools.blogspot.in/2007/09/predicting-subcellular-localization-of.html. Also, check if the protein has a transmembrane domain (TMHMM http://www.cbs.dtu.dk/services/TMHMM/) or a signal peptide (SignalP http://www.cbs.dtu.dk/services/SignalP/) in it. These are hydrophobic regions, which is why they interfere with solubility. Membrane proteins are a bit tough to get in soluble form until one removes the transmembrane or signal peptide part. It is logical to remove the initial (normally N-terminal) transmembrane or signal peptide part to get the functional domain or multiple domains in soluble form. (I had a similar problem with a protein I was working on; when I removed the signal peptide and transmembrane domain, it solved everything: the protein went into the soluble fraction, purified like a charm, and crystallized as well.)
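Before submitting a sequence to TMHMM or SignalP, a quick hydropathy scan can give a first hint of hydrophobic stretches. The sketch below uses the Kyte-Doolittle scale with the classic 19-residue window and 1.6 cutoff. It is only a crude pre-check and not a replacement for the prediction servers above; the function names and the example sequence are made up for illustration.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Average hydropathy of every full window along the sequence."""
    scores = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def hydrophobic_windows(seq, window=19, threshold=1.6):
    """0-based start positions of windows whose average reaches the threshold."""
    profile = hydropathy_profile(seq, window)
    return [i for i, avg in enumerate(profile) if avg >= threshold]


if __name__ == "__main__":
    # Hypothetical sequence: a hydrophobic N-terminal stretch, then a polar region.
    example = "MKLLLVVLAILLAVIFLVGA" + "SDEKTPRQNDSGEKTDRSQE"
    starts = hydrophobic_windows(example)
    if starts:
        print(f"hydrophobic windows start at residues {starts[0] + 1} to {starts[-1] + 1}")
    else:
        print("no strongly hydrophobic window found")
```

If the flagged windows sit at the N-terminus, that is a good reason to look closely at the TMHMM and SignalP output before deciding where to start the construct.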

Check for functional domains in the protein, if any: This helps in determining the probable function of the protein. It also points to other proteins with a similar domain and to how they behave during cloning, expression and purification in E. coli. If you can find a protein with a similar domain, try its cloning, expression and purification protocol for your target protein. Also keep in mind that for some proteins the results of sequence-based analysis change once a tag is added; for example, the pI may change.

Optimize the temperature: Try different temperatures for growth and induction. The induction temperature is the more crucial one.

Try growing cells at 37 °C and induction at 37 °C.
Try growing cells at 37 °C and induction at 25 °C for a long time.
Try growing cells at 37 °C and induction at 16 °C for a long time.
Try growing cells at 25 °C and induction at 16 °C for a long time.
Try growing cells at 37 °C followed by chilling at 16 °C for at least one hour before induction.

Low temperature decreases the rate of protein synthesis, and usually more soluble protein is obtained. Also, if the temperature is reduced before induction, the protein is more likely to end up in the soluble fraction; it somehow diverts the protein from the pathway leading to inclusion bodies (sorry, I do not know how).

Optimize the IPTG concentration: It is a good idea to test, at small scale, a gradient of IPTG concentrations (0.1, 0.2, 0.3 … mM) to find the amount required for optimal expression of the protein. Normally, IPTG is needed only at very low levels for optimal expression; using a higher concentration is not only costly but also doesn't improve the expression level much.
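The pipetting volumes for such a gradient follow from C1 × V1 = C2 × V2. Here is a minimal sketch, assuming a 1 M IPTG stock and 10 mL test cultures (both are my assumptions, so change them to whatever you actually use); the small volume added to the culture is ignored.

```python
def iptg_stock_volume_ul(final_mm, culture_ml, stock_mm=1000.0):
    """Microlitres of IPTG stock to add, from C1*V1 = C2*V2.

    final_mm   -- desired final IPTG concentration in mM
    culture_ml -- culture volume in mL
    stock_mm   -- stock concentration in mM (assumed 1 M = 1000 mM by default)
    The volume of stock added is neglected relative to the culture volume.
    """
    if stock_mm <= 0:
        raise ValueError("stock concentration must be positive")
    culture_ul = culture_ml * 1000.0
    return final_mm * culture_ul / stock_mm


if __name__ == "__main__":
    culture_ml = 10.0  # assumed small-scale test culture volume
    for final_mm in (0.1, 0.2, 0.3):  # first points of the gradient in the text
        volume = iptg_stock_volume_ul(final_mm, culture_ml)
        print(f"{final_mm} mM in {culture_ml} mL: add {volume:.1f} uL of 1 M stock")
```

For volumes this small it is usually easier to pipette from a diluted working stock; passing a lower `stock_mm` gives the corresponding volume.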

Use a large tag, but make sure to make an arrangement to remove it once you have the protein: Larger tags like the intein tag, His-SUMO, GST, MBP (maltose binding protein) etc. are known to increase the solubility of proteins; use them if you have the corresponding vectors easily available.

Change the vector: Using a weaker promoter (e.g. trc instead of T7) and a lower copy number plasmid normally increases the chance of getting the protein in the soluble fraction. Also, using N- and/or C-terminal tags (in various vectors) affects the solubility of the protein, especially for proteins whose folding depends on either terminus.

Change the host cells: Some E. coli strains handle toxic or membrane proteins better than others. I had a very good experience working with the C41 and C43 strains, which I came to know through this paper http://www.ncbi.nlm.nih.gov/pubmed/15294299. There are also pLysS versions of these strains; I did not try them, but you can read up and try. Other strains like Rosetta etc. might also be good to try (it depends on the strains you can get your hands on, so beg, borrow or steal ;)). For a new protein I usually test as many changes as I can, one by one, at small scale and then move them to large scale. Also, check if your protein uses codons that are rarely used in E. coli. You can check 'rare codon usage' using the different software available.
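A first pass at the rare codon check can be done with a few lines of Python. The codon set below (AGG, AGA, CGA, CGG, ATA, CTA, CCC, GGA) is a commonly quoted list of codons that are rare in E. coli, but treat it as an illustrative assumption; a proper codon usage table for your host strain is more reliable. The example sequence is made up.

```python
# Codons often cited as rare in E. coli (an illustrative set, not a definitive one).
RARE_CODONS = {"AGG", "AGA", "CGA", "CGG", "ATA", "CTA", "CCC", "GGA"}


def rare_codon_report(cds):
    """Return (hits, fraction) for an in-frame coding sequence.

    hits     -- list of (codon_number, codon), with codon_number starting at 1
    fraction -- rare codons divided by the total number of complete codons
    Trailing bases that do not form a complete codon are ignored.
    """
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    hits = [(n + 1, codon) for n, codon in enumerate(codons) if codon in RARE_CODONS]
    fraction = len(hits) / len(codons) if codons else 0.0
    return hits, fraction


if __name__ == "__main__":
    example = "ATGAGGAGACTGCCCATAGGATAA"  # hypothetical coding sequence
    hits, fraction = rare_codon_report(example)
    print(f"{len(hits)} rare codons ({fraction:.0%} of all codons)")
    for number, codon in hits:
        print(f"codon {number}: {codon}")
```

Clusters of rare codons, especially near the start of the gene, are worth more attention than the overall percentage; they are a good reason to try a strain such as Rosetta.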

Change the culture media: After changing and optimizing as many parameters as I could, I was still getting a low level of protein in the soluble fraction in LB media. I read somewhere that someone had a good yield with Terrific Broth; I tried it and it gave way more protein in the soluble fraction. I was happy to use it thereafter for any protein I had to purify.

Use auto-induction media: It is worthwhile trying auto-induction. The idea is that instead of adding an inducing agent like IPTG, one relies on the cell's native lac regulation, which controls T7-based expression systems. If you grow the cells in media containing both glucose and lactose, they use the glucose first; as the glucose is depleted, they start taking up lactose, which relieves lac repression. This switches on expression from your vector and leads to a much more gradual expression than IPTG does.

Functional Annotation of Hypothetical proteins

Experimental work is time-consuming, but it is the direct approach to functional annotation of hypothetical proteins; however, at times it is difficult to decide on the experimental design for a relatively new class of protein. With the increasing size and quality of the various protein databases, it is becoming easier to find an experimental design for the probable function of a protein. The following steps can be used in choosing the type of experimental analysis to perform and the substrate to use in laboratory tests.

1. If the protein is predicted to be an enzyme, BLAST results normally point to closely related proteins, and the matching hits can be used to find the experimental procedures to perform (look for papers on those proteins, which may indicate the type of related function the protein might perform).

2. With the growing domain databases, it is possible to analyze the protein domain by domain, which indicates its ability to perform certain kinds of biochemical reactions, if any. The NCBI's Conserved Domain Database (CDD), Pfam and InterPro (InterProScan) hold a large number of conserved domains, each of which defines a functional class. The presence of a certain domain indicates the possible activity of the protein and therefore helps in choosing the type of substrate to use for defining its chemical activity in the laboratory.

3. Composition-based analysis of the protein: Various bioinformatics tools are available online for analyzing the amino acid composition of a protein. They report properties of the protein that later help with its functional annotation, e.g. ProtParam, SPAAN, MP3 and a lot more.

4. Homology-based modeling: This is an important step in annotating the function of a protein based on its structure, though it may be difficult for proteins with low identity (<30%) to the already known crystal structures. A good homology model can be an important step towards functional annotation of a protein. Likewise, secondary and tertiary structure prediction points to similar functional categories and thereby helps in designing relevant experimental assays. Some of the commonly used homology modeling tools are listed here http://bioinformatictools.blogspot.in/2012/01/homology-modeling-of-proteins.html.

5. Phylogenetic analysis: Phylogenetic analysis not only shows the evolutionary divergence of the protein but is also an important step towards assessing its functional conservation. It helps in determining the degree of functional similarity with related homologous proteins, and thus in choosing the appropriate experimental assays for functional annotation of the protein. Combined with molecular dynamics simulation, it also helps to assess in silico the ability of a substrate to bind to the protein. In fact, this can cut a large number of candidate substrate molecules down to the top hits, helping to prioritize the experimental analysis and saving time and resources.

6. Working with novel proteins for which hardly any relevant data exist anywhere is sometimes a bit difficult, so you may have to wait until more information becomes available.
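For the homology modeling step (point 4), the 30% identity rule of thumb can be checked quickly once you have a pairwise alignment of your protein with the template. Here is a minimal sketch that assumes the two sequences are already aligned (same length, gaps written as "-") and that identity is counted over the columns where neither sequence has a gap; other tools may use a different denominator. The aligned sequences are made up.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a != gap and b != gap
    ]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    # Hypothetical pre-aligned target and template sequences.
    target = "MKT-AYIAKQRQ"
    template = "MRTGAYLAK-RE"
    identity = percent_identity(target, template)
    print(f"identity: {identity:.1f}%")
    if identity < 30.0:
        print("low identity: treat any homology model with caution")
```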

Let me know if you have more suggestions to add on.