The Human Plasma Proteome
2002; Elsevier BV; Volume: 1; Issue: 11; Language: English
10.1074/mcp.r200007-mcp200
ISSN 1535-9484
Authors: N. Leigh Anderson, Norman G. Anderson
Topic(s): Advanced Biosensing Techniques and Applications
Abstract: The human plasma proteome holds the promise of a revolution in disease diagnosis and therapeutic monitoring provided that major challenges in proteomics and related disciplines can be addressed. Plasma is not only the primary clinical specimen but also represents the largest and deepest version of the human proteome present in any sample: in addition to the classical "plasma proteins," it contains all tissue proteins (as leakage markers) plus very numerous distinct immunoglobulin sequences, and it has an extraordinary dynamic range in that more than 10 orders of magnitude in concentration separate albumin and the rarest proteins now measured clinically. Although the restricted dynamic range of conventional proteomic technology (two-dimensional gels and mass spectrometry) has limited its contribution to the list of 289 proteins (tabulated here) that have been reported in plasma to date, very recent advances in multidimensional survey techniques promise at least double this number in the near future. Abundant scientific evidence, from proteomics and other disciplines, suggests that among these are proteins whose abundances and structures change in ways indicative of many, if not most, human diseases. Nevertheless, only a handful of proteins are currently used in routine clinical diagnosis, and the rate of introduction of new protein tests approved by the United States Food and Drug Administration (FDA) has paradoxically declined over the last decade to less than one new protein diagnostic marker per year. We speculate on the reasons behind this large discrepancy between the expectations arising from proteomics and the realities of clinical diagnostics and suggest approaches by which protein-disease associations may be more effectively translated into diagnostic tools in the future.

Blood plasma is an exceptional proteome in many respects. It is the most complex human-derived proteome, containing other tissue proteomes as subsets.
It is collected in huge amounts (millions of liters) for preparation of protein therapeutic products. It is the most difficult protein-containing sample to characterize on account of the large proportion of albumin (55%), the wide dynamic range in abundance of other proteins, and the tremendous heterogeneity of its predominant glycoproteins. And it is the most sampled proteome, with hundreds of millions of tubes withdrawn every year for medical diagnosis, making it clinically the most important. Proteins in plasma have been studied since before we knew genes existed.

Having lived through the recent superlatives of the human genome effort(s) and the expectations these generated, it might be thought unwise to use such hyperbole in the more modest world of proteins and proteomics. In fact, the exceptional nature of plasma does not lead us to the sin of self-congratulation insofar as we are in no imminent danger of completing its analysis or even of making optimal use of its diagnostic possibilities. At this stage, the combination of extreme analytical difficulty with well founded hopes for radical improvements in disease diagnosis provides a strong case for increased research effort and in particular some systematic means of accelerating an exploration that has been in process for many decades while so far yielding only a handful of medically useful nuggets.

Molecular biology, including the genome and proteome projects, is revolutionizing the biological and medical sciences, holding out the promise of both fully understanding and effectively treating all human diseases (1. Yudell M. DeSalle R. The Genomic Revolution: Unveiling the Unity of Life. Joseph Henry Press with the American Museum of Natural History, Washington, D. C., 2002). These projects epitomize the ultimate goal of reductionist biology, which is a complete analysis and description of living systems at the molecular level.
In the one case quasi-completed so far (the human genome), billions of dollars were raised, tens if not hundreds of thousands of patents were filed, and new large integrated laboratories were constructed and operated on a crash basis. And yet this is now generally concluded to have been simply laying the foundation for proteomics, a field that requires completely different technologies, sympathy for a very different sort of molecule, and ultimately a very different scale. We are currently in the phase of seeking shortcuts through proteomics analogous to the path that shotgun sequencing blazed through the genome, but without any guarantee that one exists.

Against this backdrop it may be useful to take a somewhat broader view than might be expected in a review of one particular proteome. Hence we have attempted to survey the larger context of the plasma proteome as well as the history and status of efforts to explore and use it medically. Finally, we have indulged in some speculation as to the kinds of efforts needed to reach the next stage in the analysis of plasma and its diagnostic applications.

In what follows we use the term "plasma" to embrace all the protein components of the blood soluble phase (excluding cells) and not as a prescription for a specific sample processing technique. We could have referred instead to the "serum" proteome but chose plasma because it is in a sense the larger, parent collection from which other related samples are derived.

DEFINING THE PLASMA PROTEOME

In his classic series entitled The Plasma Proteins, Putnam (2. Putnam F.W. The Plasma Proteins: Structure, Function, and Genetic Control. Academic Press, New York, 1975–1987) defined true plasma proteins as those that carry out their functions in the circulation, thus excluding proteins that, for example, serve as messengers between tissues (e.g. peptide hormones) or that leak into the blood as a result of tissue damage (e.g.
cardiac myoglobin released into plasma after a heart attack). This functional definition correctly emphasized the fact that proteins may appear in plasma for a variety of different reasons, but it also hints at the fact that different methods and approaches were originally responsible for discovery of these classes. The "proteome" (or "protein index") concept, which stems from analytical advances promising a broad inventory of proteins in biological samples, suggests instead that we should aim for a general analytical foundation for the plasma proteome as a whole and later extract functional (diagnostic) utility for various proteins based on results of large scale systematic data collection. Hence in this article we assume that there is a reason to discover, characterize, and routinely measure every protein present in human plasma to the limits of detection. This approach is generating something appropriately called the plasma proteome and distinct from the plasma proteins.

Elaborating on Putnam's classification from a functional viewpoint, we can classify the protein content of plasma into the following design/function groups.

Proteins Secreted by Solid Tissues and That Act in Plasma: The classical plasma proteins are largely secreted by the liver and intestines. A key aspect of plasma proteins is a native molecular mass larger than the kidney filtration cutoff (∼45 kDa) and thus an extended residence time in plasma (albumin, which is just larger than the cutoff, has a lifetime of about 21 days).

Immunoglobulins: Although the antibodies typically function in plasma, they represent a unique class of proteins because of their complexity: there are thought to be on the order of 10 million different sequences of antibodies in circulation in a normal adult.

"Long Distance" Receptor Ligands: The classical peptide and protein hormones are included in this group. These proteins come in a range of sizes, which may indicate a range of time scales for their control actions (i.e.
rapid adjustment with small hormones such as insulin and slower adjustments with larger hormones such as erythropoietin).

"Local" Receptor Ligands: These include cytokines and other short distance mediators of cellular responses. In general these proteins have native molecular weights under the kidney filtration cutoff (and hence relatively short residence times in plasma) and appear to be designed to mediate local interactions between cells followed by dilution into plasma at ineffective levels. High plasma levels may cause deleterious effects remote from the site of synthesis, e.g. sepsis.

Temporary Passengers: These include non-hormone proteins that traverse the plasma compartment temporarily on their way to their site of primary function, e.g. lysosomal proteins that are secreted and then taken up via a receptor for sequestration in the lysosomes.

Tissue Leakage Products: These are proteins that normally function within cells but can be released into plasma as a result of cell death or damage. These proteins include many of the most important diagnostic markers, e.g. cardiac troponins, creatine kinase, or myoglobin used in the diagnosis of myocardial infarction.

Aberrant Secretions: These proteins are released from tumors and other diseased tissues, presumably not as a result of a functional requirement of the organism. These include cancer markers, which may be normal, non-plasma-accessible proteins expressed, secreted, or released into plasma by tumor cells.

Foreign Proteins: These are proteins of infectious organisms or parasites that are released into, or exposed to, the circulation.

Given this variety of classes of protein components, how many "proteins" are likely to be present in plasma? A reasonable calculation could be proposed in three stages.
First, assume as a baseline that there is a modest number (say 500) of true "plasma proteins" (the first group indicated above) and that each of these is present in 20 variously glycosylated forms (since most plasma proteins are heavily glycosylated) and in five different sizes (including precursors, "mature" forms, degradation products, and splice variants), yielding a total of 50,000 molecular forms. A second large set of components is contributed by tissue leakage: this is effectively the entire human proteome (say 50,000 gene products), each of these gene products having (on average) 10 splice variants, post-translational modifications, or cleavage products, yielding a further 500,000 protein forms. Finally consider the immunoglobulin class as containing perhaps 10,000,000 different sequences. At least in principle, plasma is thus the most comprehensive and the largest version of the human proteome. In comparison with the genome, its degree of complexity is reminiscent of the real number line as compared with the integers: in other words its complexity is not simply n-fold that of the genome but exists on another level entirely. This immense complexity does not doom current efforts to failure, however, because the measurement methods in many cases automatically simplify the picture, collapsing most of the fine variation to yield measurements of all the forms of a protein as one value or at most as a few special classes.

Serum, the protein solution remaining after plasma (or whole blood) is allowed to clot, is very similar to plasma: prothrombin is cleaved to thrombin, fibrinogen is removed (to form the clot), and a limited series of other protein changes (mainly proteolytic cleavages) take place.
We use the term plasma preferentially to refer to the soluble proteome of the blood because it is the parent mixture and because there may be persuasive reasons to avoid an in vitro proteolysis process (which may unexpectedly alter some proteins) as part of the preferred sample acquisition protocol.

At present we do not know much about the detailed relationship between plasma (the routinely available sample), the much larger extracellular fluid compartment (∼17 liters in the average person but practically unsampleable), and the lymph derived from extracellular fluid. Roughly 2.5 liters of lymph flow through the thoracic duct into the blood each day plus another 500 ml through other channels, bringing into the blood much of the protein output of organs like muscles or the liver. Although the total protein concentration of thoracic duct lymph is only about half that of plasma (and must generally be so to support the Starling equilibrium governing fluid transport out of the capillaries), it transports a great deal of protein and in particular contains 5–10 times as much lipoprotein as plasma. A comprehensive examination of the relationship between lymph and plasma by proteomics methods remains to be done.

A series of other body fluids including cerebrospinal fluid, synovial fluid, and urine (the ultimate destination of most of the <60-kDa protein material in plasma) share some of the protein content of plasma with specific local additions that reveal interesting clinical information.
Unfortunately, these samples are more difficult to obtain in a useful state than plasma: collection of cerebrospinal fluid or synovial fluid is an invasive procedure involving pain and some risk, while urine is more difficult to process to a useful sample quickly in a clinical setting (centrifugation to remove cells that can lyse if left in suspension, prevention of microbial growth, and concentration).

Is there only one plasma proteome, or are there many: arterial, venous, capillary, capillary in different tissues, etc.? This question is, in many ways, one of timing. Pharmacokinetic studies indicate that there is a central volume of vascular blood that circulates (and is presumably mixed) fairly quickly: the almost immediate appearance in most organs of magnetic resonance imaging or computed tomography imaging contrast agents injected as bolus doses into the venous circulation attests to a very rapid (seconds to a minute) homogenization of this volume of blood. Exchange with the larger volume of blood that is not in the major vessels takes longer and depends on transport of blood through the whole path of arteries to arterioles to capillaries to venules to veins and back through the heart. This process generally has a time scale of minutes to a few hours depending on the molecular weight of the protein and on the flow of extracellular fluid from the site of manufacture either to a nearby capillary or to the lymphatics. We would thus expect the "immediacy" of a protein marker in plasma to depend on its site of origin with time scales of minutes to hours possible for different molecules and sites.

A further important feature of the plasma proteome is that it is the furthest removed, among tissue proteomes, from the mRNA level. While many of the major plasma proteins are synthesized in the liver (and comprise many of its most abundant mRNAs (3. Anderson L. Seilhamer J. A comparison of selected mRNA and protein abundances in human liver. Electrophoresis.
1997; 18: 533-537)), it is known that their plasma levels correlate only poorly with message abundance in liver (4. Kawamoto S. Matsumoto Y. Mizuno K. Okubo K. Matsubara K. Expression profiles of active genes in human and mouse livers. Gene (Amst.). 1996; 174: 151-158) and presumably even more poorly for proteins synthesized in smaller organs (individually or collectively). For these reasons, plasma is a biological system that can only be approached with protein methods and thus remains beyond the scope of DNA- or RNA-based diagnostics.

HISTORY OF SYSTEMATIC EXPLORATION

The exploration of the plasma proteome can be divided roughly into six phases: 1) the earliest investigations before the nature of proteins was understood, 2) the era of fractionation (chemical methods), 3) the era of enzymes (biochemical methods), 4) the era of monoclonal antibodies (molecular biology), 5) the era of proteomics (separation technologies), and 6) the era of genomics (predictive proteomics). These approaches overlap in time and are arranged here in only a loosely historical order.

Early History and Chemical Methods (Fractionation)

Blood was first emphasized diagnostically by Hippocrates, who proposed that disease was due to an imbalance of four humors: blood, phlegm, yellow bile, and black bile. The importance of this idea was to propose a physical cause, and not a divine one, for human disease, and it remained basic to medical practice for over a thousand years.
With Wohler's synthesis of urea in 1828, the distinction between living matter and chemicals began to disappear, and with the enunciation of the cell theory by Schleiden and Schwann, the question of the location of disease could be productively revisited: Virchow described the cellular (as opposed to humoral) basis of disease and finally put an end to phlebotomy as general therapy.

Despite not being a humor or "vital principle," plasma remained a subject of interest throughout this period: in the 1830s Liebig and Mulder analyzed a substance called "albumin," in 1862 Schmidt coined the term "globulin" for the proteins that were insoluble in pure water, and in 1894 Gurber crystallized horse serum albumin (5. Putnam F.W. Putnam F.W. The Plasma Proteins: Structure, Function, and Genetic Control. Academic Press, New York, 1975–1987: 1-55). Within the last 100 years, two groups revolutionized plasma protein chemistry. One was the group of Cohn (6. Cohn E.J. The history of plasma fractionation. Adv. Mil. Med. 1948; 1: 364-443) and Edsall working on the preparation and fractionation of plasma; during the Second World War, they generated large amounts of albumin and gamma globulin for therapeutic use. The methods they developed are still used in the plasma fractionation industry described below. The second was the Behring Institute, which, using an unusual rivanol precipitation technique, discovered and prepared numerous human plasma proteins, made antibodies against them, and distributed these world wide (7. Schultze H.E. Heremans J.F. Molecular Biology of Human Proteins with Special Reference to Plasma Proteins. Elsevier, New York, 1966).
This latter work, and the development of simple immunological methods for analysis, meant that researchers around the world could readily discover new correlations between the amounts of specific proteins and disease.

Enzyme Activities

Enzyme activities were detectable in body fluids long before the enzyme proteins could be isolated and studied (8. Moss D.W. Henderson A.R. Burtis C.A. Ashwood E.R. Tietz Textbook of Clinical Chemistry. W. B. Saunders Co., Philadelphia, PA, 1999: 617-721). Alkaline and acid phosphatase activities were related to bone disease and prostate cancer, respectively, in the decades before 1950, and in 1955 the enzyme now called aspartate aminotransferase was detected in serum following acute myocardial infarction. The attraction of enzymes as analytes is the sensitivity with which the products of an enzymatic reaction can be detected and the lack of any necessity to fractionate the sample. Serum "chemistry analyzers" represented some of the first fruits of automation in clinical medicine and made possible the concept of batteries of tests rather than one or a very few tests ordered by the astute diagnostician. These evolved from the autoanalyzer developed by Leonard Skeggs (9. Skeggs Jr., L.T. Persistence and prayer: from the artificial kidney to the AutoAnalyzer. Clin. Chem. 2000; 46: 1425-1436) in the 1950s and commercialized by Technicon, through computer-controlled instruments such as the centrifugal fast analyzer (10. Anderson N.G. Computer interfaced fast analyzers. Science. 1969; 166: 317-324), to the very sophisticated integrated instrument/reagent systems of today.
Up to the present time, enzyme assays persist for protein markers of liver toxicity because they are so inexpensive relative to immunoassays (which require the development of specific antibody reagents instead of simple chemical substrates).

Enzyme assays have the advantage over all the other assay methods that they measure level of function rather than amount of a molecule: unfortunately many of the enzyme activities measured in plasma probably do not have a physiological function there but rather represent leakage of protein from tissues. Additional drawbacks of enzymatic assays are the difficulty of obtaining an estimate of the mass of protein involved (since results are in activity units), the difficulty of associating some activities with a single protein and hence with a specific source, and the lack of isotype information unless some electrophoretic or other separation precedes the enzyme detection. In any case, since a large proportion of proteins in plasma and elsewhere are not enzymes, alternative means are required to discover and measure them.

Antibodies and Monoclonal Antibodies

All proteins have unique surface shapes, and antibodies are nature's answer to accurate shape recognition. It appears to be possible to make a specific antibody to any protein provided that pure protein is available to immunize an animal (the inverse of the limitation encountered with enzymes). Proteins purified by fractionation are useful as antigens for the preparation of classical rabbit and goat polyclonal antibodies, and these antibodies provide the basis for simple immunochemical tests for each protein (e.g. radial immunodiffusion, rocket electrophoresis, or more recently automated nephelometry) as well as more complex and sensitive sandwich assays with enzymatic or radiochemical detection.
These technologies provide a general solution (as demonstrated by the Behring Institute) to the problem of measuring one or more proteins individually in large numbers of samples provided that relatively pure protein is available in significant quantities as antigen.

The requirement for an isolated antigen was circumvented, however, following the introduction of monoclonal mouse antibodies by Kohler and Milstein (11. Kohler G. Milstein C. Continuous cultures of fused cells secreting antibody of predefined specificity. Nature. 1975; 256: 495-497) in 1975. The general supposition, confirmed in many situations, is that a monoclonal "sees" a specific epitope that is likely to occur on one protein (or potentially on its very close relatives). The antibody thus serves as a ready-made detection reagent for constructing a specific assay, a potential drug (if the epitope is on a therapeutic target), and an immunoadsorbent useful for isolating the protein from plasma. Many different monoclonals can be produced after immunization with a complex antigen mixture (which can be a tissue or a body fluid), and one can test each to determine whether it sees an antigen that can distinguish diseased samples from normal ones, for example. The approach can be likened to screening a library of chemicals against a protein target, as is done to discover most new drugs: it allows one to be lucky if not smart.

As a protein (or more properly, epitope) discovery process, the monoclonal approach was more useful in the discovery of new tissue antigens shed into plasma than in finding new plasma proteins. The reason is presumably that many immunodominant proteins are present at very high abundance in plasma, while tissue extracts are not so rich in a few already known antigens. Thus, for example, many of the newer monoclonal-based cancer detection tests are used clinically before the underlying protein is identified by sequence or gene.
A striking example of this phenomenon is cancer antigen 125 (CA 125) [footnote 1: The abbreviations used are: CA, cancer antigen; 2-DE, two-dimensional electrophoresis; LC, liquid chromatography; Ig, immunoglobulin; MS, mass spectrometry; MALDI, matrix-assisted laser desorption ionization; TOF, time of flight; ESI, electrospray ionization; GC, gas chromatography; FDA, Food and Drug Administration; CLIA, Clinical Laboratory Improvement Amendments; CV, coefficient of variation.], used in the diagnosis of ovarian and other cancers. First reported in 1984 (12. Bast Jr., R.C. Klug T.L. St. John E. Jenison E. Niloff J.M. Lazarus H. Berkowitz R.S. Leavitt T. Griffiths C.T. Parker L. Zurawski Jr., V.R. Knapp R.C. A radioimmunoassay using a monoclonal antibody to monitor the course of epithelial ovarian cancer. N. Engl. J. Med. 1983; 309: 883-887), this protein marker is in widespread clinical use and has been the subject of more than 2,000 scientific publications, yet its sequence was not elucidated until recently (13. O'Brien T.J. Beard J.B. Underwood L.J. Dennis R.A. Santin A.D. York L. The CA 125 gene: an extracellular superstructure dominated by repeat sequences. Tumor Biol. 2001; 22: 348-366) in part because the protein is huge: more than a million daltons. Such monoclonal-based discoveries exist initially outside the boundaries of the current genomic/proteomic space. When markers of this type are identified, it is sometimes the case that two epitopes (for which there may be competing commercial tests) are on the same molecule. The CA 27.29, CA 15-3, and CASA assays, for example, all recognize antigenic determinants on the MUC1 mucin protein (14. Devine P.L. McGuckin M.A. Quin R.J. Ward B.G. Serum markers CASA and CA 15-3 in ovarian cancer: all MUC1 assays are not the same. Tumor Biol. 1994; 15: 337-344; 15. Cheung K.L. Graves C.R. Robertson J.F. Tumour marker measurements in the diagnosis and monitoring of breast cancer. Cancer Treat. Rev.
2000; 26: 91-102). Current attempts to invert the monoclonal antibody generation process (i.e. expecting to make good antibodies to a list of specific proteins rather than embracing whatever the immune system selects as immunogenic in a mixture) have revealed that antibodies in general are idiosyncratic and not analogous to oligonucleotide probes in their painless generality.

Profiling and Proteomics

The use of analytical separations to look at the plasma proteome parallels very closely the development of the separations themselves: plasma is always among the first samples to be examined. Shortly after Svedberg, using the analytical ultracentrifuge, found that proteins had unique molecular weights, Tiselius found that serum could be fractionated into multiple components on the basis of electrophoretic mobility. His method of electrophoresis, first in liquid and then later in anticonvective media such as paper, cellulose acetate, starch, agarose, and polyacrylamide, has dominated the separative side of plasma proteome work until very recently, evolving through a series of one- and two-dimensional systems and finally to combinations with chromatography and mass spectrometry that generalize to n dimensions. This evolution has resulted in an almost constant exponential increase in resolved protein species for the past 70 years (Fig. 1) within which one can distinguish at least three separate phases arising as increasingly complex separations were required to continue forward progress. We do not review in detail the "one-dimensional" phase of this development but begin with the era of "proteomics" and two-dimensional gels, which for 20 years have been the core of proteomic technology and the source of most published work on the plasma proteome.

Two-dimensional Electrophoresis

Soon after the introduction of high resolution two-dimensional electrophoresis (2-DE) in 1975 by Klose (16. Klose J.
Protein mapping by combined isoelectric focusing and electrophoresis of mouse tissues. A novel approach to testing for induced point mutations in mammals. Humangenetik. 1975; 26: 231-243), O'Farrell (17. O'Farrell P.H. High resolution two-dimensional electrophoresis of proteins. J. Biol. Chem. 1975; 250: 4007-4021), and others, the technique was applied to the plasma proteins by the present authors (18. Anderson L. Anderson N.G. High resolution two-dimensional electrophoresis of human plasma proteins. Proc. Natl. Acad. Sci. U. S. A. 1977; 74: 5421-5425) with the result that the number of resolved species increased to 300 or more. The 2-DE map of human plasma that resulted is recognizably the same as those produced later by many investigators: in contrast to cell