Overview of Protein Expression in E. coli

The study of Escherichia coli during the 1960s and 1970s made it the best understood organism in nature. Today’s recombinant DNA technology is a direct extension of the genetic and biochemical analyses carried out at that time. Even before the advent of molecular cloning, genetically altered E. coli strains were used to produce quantities of proteins of scientific interest. When cloning techniques became available, most cloning vectors utilized E. coli as their host organism. Thus, it is not surprising that the first attempts to express large quantities of proteins encoded by cloned genes were carried out in E. coli.

E. coli has two characteristics that make it ideally suited as an expression system for many kinds of proteins: it is easy to manipulate and it grows quickly in inexpensive media. These characteristics, coupled with more than 10 years’ experience with expression of foreign genes, have established E. coli as the leading host organism for most scientific applications of protein expression.

Despite a growing literature describing successful protein expression from cloned genes, each new gene still presents its own unique expression problems. No one, and certainly no laboratory manual, can provide a set of methods that will guarantee successful production of every protein in a useful form. Nevertheless, the vast body of accumulated knowledge has led to a general approach that often helps to solve specific expression problems. This unit introduces general considerations and strategies, while subsequent units describe procedures that can be applied to specific expression problems.

GENERAL STRATEGY FOR GENE EXPRESSION IN E. COLI

The basic approach used to express all foreign genes in E. coli begins with insertion of the gene into an expression vector, usually a plasmid. This vector generally contains several elements: sequences encoding a selectable marker that ensure maintenance of the vector in the cell; a controllable transcriptional promoter which, upon induction, can produce large amounts of mRNA from the cloned gene; translational control sequences, such as an appropriately positioned ribosome-binding site and initiator ATG; and a polylinker to simplify the insertion of the gene in the correct orientation within the vector. Once constructed, the expression vector containing the gene to be expressed is introduced into an appropriate E. coli strain by transformation.

SPECIFIC EXPRESSION SCENARIOS

Although this general approach—insertion of the gene of interest into an expression vector followed by transformation in E. coli—is common to all expression systems, specific procedures differ greatly. When choosing a procedure, it is helpful to consider the final application of the expressed protein, as this often dictates which expression strategy to use.

Antigen Production

If the goal is to use the expressed protein as an antigen to make antibodies, several approaches are available to make protein reliably and to allow for rapid purification of the antigen. The two best approaches are (1) synthesis of fusion proteins with specific “tag” sequences that can be retrieved by affinity chromatography, and (2) synthesis of the native protein, or a fragment of it, under conditions that cause it to precipitate into insoluble inclusion bodies. These inclusion bodies can be purified sufficiently by differential centrifugation that preparative denaturing polyacrylamide gel electrophoresis will yield an isolated band, which can be cut out and crushed, or electroeluted, to provide antigenic material for injection into an animal.

Biochemical or Cell Biology Studies

If the goal is to use the expressed protein as a reagent in a series of biochemical or cell biology experiments, other considerations are relevant. In this case, the authenticity of the protein’s function is very important, while the ease of preparing the protein matters less. For this application, it is possible to express the protein as a fusion protein containing a specific protease-sensitive cleavage site so that the N-terminal peptide tail can be removed easily, leaving only the native amino acid sequence. Alternatively, direct expression vectors may be used to produce the authentic primary sequence. When expressed, the protein may be soluble and active, as is the case with many intracellular enzymes. If it is insoluble, as is the case for many secreted growth factors when they are made cytoplasmically in E. coli, it may be necessary to isolate inclusion bodies, solubilize the protein using denaturing agents, and refold the protein. Refolding is usually not too difficult when the protein is of moderate size. Whether the protein is expressed in a soluble form or requires refolding, its integrity can usually be checked by specific enzyme assays or by bioassays.

Structural Studies

If the goal is to do structural studies of the expressed protein, the greatest constraints are imposed on the expression system. Because it is nearly impossible to show that a protein of unknown structure has been precisely refolded after denaturing, the protein must generally be made in a soluble form so that its purification does not require a denaturation/renaturation step. Usually, the soluble form of the protein (either intracellular or secreted) must be made in strains and by induction protocols that minimize proteolytic degradation.

Soluble expression of most eukaryotic proteins is best achieved with systems that allow induction of synthesis without changing the temperature; for example, by inducing transcription from the trp or tac promoters. Maximum accumulation of soluble product is best achieved by testing expression in several strains and at several temperatures, and picking the combination that works best. This is an active area of research at present; the rules are not yet understood, so little more than trial and error can be recommended.
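
Because the search is empirical, it can help to lay out the full set of strain and temperature combinations before starting. The following Python sketch is only a bookkeeping aid, not part of any published protocol. The function names, the strain labels, and the three temperatures are placeholders chosen for illustration, and the yield values passed to best_condition are assumed to come from the reader's own measurements.

```python
from itertools import product


def screening_grid(strains, temperatures_c):
    """Return every (strain, temperature) pair to test, in a stable order."""
    return list(product(strains, temperatures_c))


def best_condition(soluble_yield):
    """Pick the condition with the highest measured soluble yield.

    soluble_yield maps (strain, temperature) -> a number measured by the
    experimenter (any consistent unit); nothing here is predicted.
    """
    if not soluble_yield:
        raise ValueError("no measurements supplied")
    return max(soluble_yield, key=soluble_yield.get)


# Placeholder strain labels and temperatures; substitute what is on hand.
grid = screening_grid(["strain_A", "strain_B", "strain_C"], [25, 30, 37])
for strain, temp_c in grid:
    print(f"{strain}\t{temp_c} C")
```

Once each culture has been assayed, the measurements can be collected in a dictionary keyed by the same (strain, temperature) pairs and passed to best_condition.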

TROUBLESHOOTING GENE EXPRESSION

Once an expression strategy has been chosen and the gene is introduced into an appropriate expression vector, several strains of E. coli should be transformed with the vector and protein production should be monitored. Ideally, the protein of interest will be produced in an active form and in sufficient amounts to allow its isolation. Often, however, the protein will be made either in very small amounts or in an insoluble form, or both. If this happens, there are various approaches that may correct the problem.

If not enough protein is produced

1. Reconstruct the 5′-end of the gene, maximizing its A+T content while preserving the protein sequence it encodes. This may reduce secondary structure within the mRNA, or it may alter an as yet undefined parameter of the reaction. Regardless of the underlying cause, this procedure usually increases translation efficiency.
2. Determine if a transcriptional terminator is present. If the vector does not have a transcriptional terminator downstream from the site at which the gene is inserted, put one in. This often aids expression, probably by increasing mRNA stability and by decreasing nucleotide drain on the cell.
3. Examine the sequence of the cloned gene for codons used infrequently in E. coli genes. These so-called rare codons are usually not a rate-limiting problem, but if four or more happen to occur contiguously, they can reduce expression significantly, perhaps by causing ribosomes to pause. Ribosomal pausing can uncouple transcription from translation, leading to premature termination of the message. Even if transcription proceeds normally, the mRNA 3′ to the stalled ribosomes can be exposed to degradation by host ribonucleases, reducing its stability. Thus, if stretches of rare codons are found, they should be altered to codons more favorable to high expression in E. coli.
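
Steps 1 and 3 above lend themselves to a quick computational check before any resynthesis is ordered. The Python sketch below is a minimal illustration under stated assumptions, not an established tool. It recodes the first few codons after the initiator ATG with the most A+T-rich synonymous codon, and it reports runs of four or more contiguous rare codons. The RARE_CODONS set is an illustrative list of codons often cited as rare in E. coli and should be replaced with a codon usage table for the actual host. The default window of 10 codons, the function names, and the example sequence are all made up for the demonstration.

```python
from itertools import product

# Standard genetic code (DNA alphabet), built in the conventional TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# ASSUMPTION: illustrative set of codons often cited as rare in E. coli.
RARE_CODONS = frozenset({"AGA", "AGG", "ATA", "CTA", "CGA", "CGG", "CCC", "GGA"})


def split_codons(cds):
    """Split an in-frame coding sequence into a list of codons."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def translate(cds):
    """Translate with the standard code; '*' marks a stop codon."""
    return "".join(CODON_TABLE[c] for c in split_codons(cds))


def at_count(codon):
    return sum(base in "AT" for base in codon)


def at_rich_recode(cds, n_codons=10, avoid=RARE_CODONS):
    """Recode the n_codons after the start codon for maximal A+T content.

    The encoded protein is unchanged, and codons in `avoid` are never chosen,
    so raising A+T content does not introduce rare codons. On ties, the
    original codon is kept when it is among the best choices.
    """
    codons = split_codons(cds)
    for i in range(1, min(n_codons + 1, len(codons))):
        original = codons[i]
        synonyms = [c for c, aa in CODON_TABLE.items()
                    if aa == CODON_TABLE[original] and c not in avoid]
        if synonyms:
            codons[i] = max(synonyms, key=lambda c: (at_count(c), c == original))
    return "".join(codons)


def find_rare_runs(cds, rare=RARE_CODONS, min_run=4):
    """Return (first_codon_index, run_length) for each run of rare codons."""
    runs, start = [], None
    for i, codon in enumerate(split_codons(cds) + [None]):  # sentinel ends a final run
        if codon in rare:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_run:
                runs.append((start, i - start))
            start = None
    return runs


# Made-up example sequence, for illustration only.
cds = "ATGGCCCTGCGTGGCAGGAGAATACTATAA"
print(find_rare_runs(cds))  # [(5, 4)]
recoded = at_rich_recode(cds, n_codons=4)
print(translate(recoded) == translate(cds))  # True
```

The sketch only flags rare-codon runs; which replacement codons to use is left to the reader, since the best choice depends on the host's codon usage. It also does not model mRNA secondary structure, so a recoded 5′ end still has to be tested experimentally.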

If enough protein is produced, but it is insoluble when the application requires it to be active and soluble

1. Vary the growth temperature. As mentioned above, many proteins are more soluble at lower than at higher temperatures. On the other hand, some enzymes have a higher specific activity when made at temperatures >37°C. E. coli can synthesize proteins at temperatures ranging from 10° to 43°C, so trying expression at different temperatures is often worthwhile.
2. Change fermentation conditions. Many proteins contain metals as structural and catalytic cofactors. If the protein is being made faster than metals can be transported into the cell, the apoprotein without its metal cofactor will accumulate. This apoprotein will not fold correctly and will likely be insoluble. At the very least, the average specific activity of the expressed protein will be lower than expected. Different media and metal supplements can be tested and the best combination used. Clearly, if there is information about the metal content of the protein, these supplements can be designed more rationally. If no information is available, a more random approach must be tried.
3. Alter the rate of expression by using low-copy-number plasmids. This can be done by using the pACYC family or using single-copy chromosomal inserts of the cloned gene into a suitable target gene. Such reductions in gene dosage often reduce the final yield of protein, but the slower kinetics of synthesis they afford can sometimes result in production of soluble proteins.

To restate the obvious, protein expression is an inexact science at present. However, most proteins can be made in E. coli in a form that is useful for a variety of functions. The procedures employed are relatively quick and uncomplicated, and the rewards for success are great.






