Identification of the Genetic Basis for Complex Disorders by Use of Pooling-Based Genomewide Single-Nucleotide–Polymorphism Association Studies
2006; Elsevier BV; Volume: 80; Issue: 1; Language: English
10.1086/510686
ISSN: 1537-6605
Authors: John V. Pearson, Matthew J. Huentelman, Rebecca F. Halperin, Waibhav Tembe, Stacey Melquist, Nils Homer, Marcel Brun, Szabolcs Szelinger, Keith D. Coon, Victoria Zismann, Jennifer Webster, Thomas G. Beach, Sigrid Botne Sando, Jan Aasly, Reinhard Heun, Frank Jessen, Heike Kölsch, Magda Tsolaki, Makrina Daniilidou, Eric M. Reiman, Andreas Papassotiropoulos, Michael Hutton, Dietrich A. Stephan, David W. Craig
Topic(s): Genomics and Chromatin Dynamics
Abstract: We report the development and validation of experimental methods, study designs, and analysis software for pooling-based genomewide association (GWA) studies that use high-throughput single-nucleotide–polymorphism (SNP) genotyping microarrays. We first describe a theoretical framework for establishing the effectiveness of pooling genomic DNA as a low-cost alternative to individually genotyping thousands of samples on high-density SNP microarrays. Next, we describe software called "GenePool," which directly analyzes SNP microarray probe intensity data and ranks SNPs by their likelihood of being genetically associated with a trait or disorder. Finally, we apply these methods to experimental case-control data and demonstrate successful identification of published genetic susceptibility loci for a rare monogenic disease (sudden infant death with dysgenesis of the testes syndrome), a rare complex disease (progressive supranuclear palsy), and a common complex disease (Alzheimer disease) across multiple SNP genotyping platforms. On the basis of these theoretical calculations and their experimental validation, our results suggest that pooling-based GWA studies are a logical first step for determining whether major genetic associations exist in diseases with high heritability.
Genomewide association (GWA) studies that use hundreds of thousands of SNPs have the potential to revolutionize our ability to identify the genetic influences on complex traits and diseases. Although they may allow the identification of common variants that contribute to complex disease, GWA studies often cost millions of dollars to complete and are therefore beyond the reach of many research groups. Despite their high costs, these studies will remain one of the best ways to study the genetic basis of complex diseases in a hypothesis-free study design. GWA studies are typically designed with three phases: (I) individual genotyping of ≥250,000 SNPs across hundreds to thousands of individuals; (II) validation of the most significant SNPs (typically tens to thousands of SNPs) by individual genotyping in new cohorts; and (III) fine-mapping of SNPs adjacent to the validated SNPs (generally only a few regions) and/or validation in additional cohorts. One possible approach to reducing the overall cost of GWA studies is to replace individual genotyping in phase I with genotyping (or allelotyping) of pooled genomic DNA. Several previous reports have investigated the feasibility of pooling on SNP genotyping microarrays (or related technologies).
With a few exceptions, these reports have focused on predicting allelic frequencies across thousands of SNPs rather than on the effectiveness of pooling in identifying the genetic basis of complex disorders [refs. 1–21: 1. Johnson C, Drgon T, Liu QR, Walther D, Edenberg H, Rice J, Foroud T, Uhl GR. Pooled association genome scanning for alcohol dependence using 104,268 SNPs: validation and use to identify alcoholism vulnerability loci in unrelated individuals from the collaborative study on the genetics of alcoholism. Am J Med Genet B Neuropsychiatr Genet. 2006;141:844–853; 2. Liu QR, Drgon T, Walther D, Johnson C, Poleskaya O, Hess J, Uhl GR. Pooled association genome scanning: validation and use to identify addiction vulnerability loci in two samples. Proc Natl Acad Sci USA. 2005;102:11864–11869; 3. Craig DW, Stephan DA. Applications of whole-genome high-density SNP genotyping. Expert Rev Mol Diagn. 2005;5:159–170; 4. Butcher LM, Meaburn E, Dale PS, Sham P, Schalkwyk LC, Craig IW, Plomin R. Association analysis of mild mental impairment using DNA pooling to screen 432 brain-expressed single-nucleotide polymorphisms. Mol Psychiatry. 2005;10:384–392; 5. Butcher LM, Meaburn E, Knight J, Sham PC, Schalkwyk LC, Craig IW, Plomin R. SNPs, microarrays and pooled DNA: identification of four loci associated with mild mental impairment in a sample of 6000 children. Hum Mol Genet. 2005;14:1315–1325; 6. Hoogendoorn B, Norton N, Kirov G, Williams N, Hamshere ML, Spurlock G, Austin J, Stephens MK, Buckland PR, Owen MJ, et al. Cheap, accurate and rapid allele frequency estimation of single nucleotide polymorphisms by primer extension and DHPLC in DNA pools. Hum Genet. 2000;107:488–493; 7. Le Hellard S, Ballereau SJ, Visscher PM, Torrance HS, Pinson J, Morris SW, Thomson ML, Semple CA, Muir WJ, Blackwood DH, et al. SNP genotyping on pooled DNAs: comparison of genotyping technologies and a semi automated method for data storage and analysis. Nucleic Acids Res. 2002;30:e74; 8. Hinds DA, Seymour AB, Durham LK, Banerjee P, Ballinger DG, Milos PM, Cox DR, Thompson JF, Frazer KA. Application of pooled genotyping to scan candidate regions for association with HDL cholesterol levels. Hum Genomics. 2004;1:421–434; 9. Norton N, Williams NM, O'Donovan MC, Owen MJ. DNA pooling as a tool for large-scale association studies in complex traits. Ann Med. 2004;36:146–152; 10. Sham P, Bader JS, Craig I, O'Donovan M, Owen M. DNA pooling: a tool for large-scale association studies. Nat Rev Genet. 2002;3:862–871; 11. Barratt BJ, Payne F, Rance HE, Nutland S, Todd JA, Clayton DG. Identification of the sources of error in allele frequency estimations from pooled DNA indicates an optimal experimental design. Ann Hum Genet. 2002;66:393–405; 12. Bansal A, van den Boom D, Kammerer S, Honisch C, Adam G, Cantor CR, Kleyn P, Braun A. Association testing by DNA pooling: an effective initial screen. Proc Natl Acad Sci USA. 2002;99:16871–16874; 13. Buetow KH, Edmonson M, MacDonald R, Clifford R, Yip P, Kelley J, Little DP, Strausberg R, Koester H, Cantor CR, et al. High-throughput development and characterization of a genomewide collection of gene-based single nucleotide polymorphism markers by chip-based matrix-assisted laser desorption/ionization time-of-flight mass spectrometry. Proc Natl Acad Sci USA. 2001;98:581–584; 14. Jurinke C, van den Boom D, Cantor CR, Koster H. Automated genotyping using the DNA MassArray technology. Methods Mol Biol. 2001;170:103–116; 15. Jurinke C, van den Boom D, Cantor CR, Koster H. The use of MassARRAY technology for high throughput genotyping. Adv Biochem Eng Biotechnol. 2002;77:57–74; 16. Kammerer S, Burns-Hamuro LL, Ma Y, Hamon SC, Canaves JM, Shi MM, Nelson MR, Sing CF, Cantor CR, Taylor SS, et al. Amino acid variant in the kinase binding domain of dual-specific A kinase-anchoring protein 2: a disease susceptibility polymorphism. Proc Natl Acad Sci USA. 2003;100:4066–4071; 17. Nelson MR, Marnellos G, Kammerer S, Hoyal CR, Shi MM, Cantor CR, Braun A. Large-scale validation of single nucleotide polymorphisms in gene regions. Genome Res. 2004;14:1664–1668; 18. Tang K, Oeth P, Kammerer S, Denissenko MF, Ekblom J, Jurinke C, van den Boom D, Braun A, Cantor CR. Mining disease susceptibility genes through SNP analyses and expression profiling using MALDI-TOF mass spectrometry. J Proteome Res. 2004;3:218–227; 19. Meaburn E, Butcher LM, Liu L, Fernandes C, Hansen V, Al-Chalabi A, Plomin R, Craig I, Schalkwyk LC. Genotyping DNA pools on microarrays: tackling the QTL problem of large samples and large numbers of SNPs. BMC Genomics. 2005;6:52; 20. Macgregor S, Visscher PM, Montgomery G. Analysis of pooled DNA samples on high density arrays without prior knowledge of differential hybridization rates. Nucleic Acids Res. 2006;34:e55; 21. Zuo Y, Zou G, Zhao H. Two-stage designs in case-control association analysis. Genetics. 2006;173:1747–1760]. Indeed, it is not yet clear whether predicting allelic frequency to within 2% accuracy (as is frequently reported) is sufficient when the allelic frequency differences of ≥250,000 SNPs vary only incrementally, from 0% to perhaps a maximum of 10%–15%. The imprecision of ≥250,000 pooled measurements may move a SNP ranked in the top 100 to a rank that misses a phase II cutoff, for example, dropping it from the top 100 into the top 1,000 SNPs. For instance, if the true allelic frequency difference between cases and controls is 11.0% for the best SNP (of 250,000 SNPs) and 10.0% for the 1,000th-best SNP, can we correctly identify the genetic locus when our measurement error is on the order of 2%? With a 2% measurement error, chance alone may lead us to measure the true best SNP at 9.5%, or to measure any one of several thousand other SNPs falsely at >11.0%. Clearly, multiple testing and imprecise measurements of allelic frequency make it difficult to accurately rank and identify associated SNPs in a pooling-based GWA design. We first investigate the factors influencing pooling-based and individual genotype–based GWA studies. For individual genotyping, a number of factors influence the ability of GWA studies to detect genetic associations. These include, but are not limited to, (1) the allele frequency of the causal variant; (2) its odds ratio (OR) or genetic relative risk; (3) the linkage disequilibrium (LD) between the causal variant and the probed SNPs; (4) the number of individuals in each cohort; (5) the number of probed SNPs in LD with the causal variant; and (6) the analysis approach taken. Specific to pooling, there are additional factors that influence the ability to detect a true association.
These additional factors include (7) the precision of allele frequency measurements made by the SNP genotyping microarray; (8) the accuracy of pool construction by pipetting; (9) the integrity of the pooled genomic DNA; (10) the number of individuals pooled or the overall pooling strategy; and (11) the number of microarray technical replicates. Furthermore, population stratification and admixture can mask true associations in all studies. Beyond these additional factors, pooling sacrifices the ability to compare subphenotypes within pools, to measure genotypes directly, and to detect gene-gene interactions. However, perhaps the most important factor in favor of pooling-based GWA studies is that this study design can be completed for thousands of dollars, whereas individual genotyping may require millions of dollars simply to complete the first phase. Numerous orphan diseases and many small populations cannot realistically be studied with individual genotyping at this time, and for them a pooling-based GWA study is an attractive, cost-effective alternative. Unfortunately, the following questions have not been fully addressed in the context of >300,000 markers: (1) whether a pooling-based GWA study can be effective; (2) how one should design a pooling-based GWA study; (3) what the resolution of the study is; and (4) how one can analyze the data. In this article, we investigate the factors that influence the effectiveness of a pooling-based GWA study, develop analysis tools for completing pooling-based GWA studies (GenePool), and establish the practical capability of pooling-based studies to identify the correct genetic locus using actual case-control pooling data with published associated loci. We show that pooling-based GWA studies are a logical first step for studying many diseases with high heritability and that they provide an opportunity to screen for major genetic associations at a substantially lower cost.
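The 11.0%-versus-10.0% example above can be made concrete with a short calculation. The sketch below is illustrative only and rests on assumptions that the text does not state: the top 1,000 SNPs have true case-control differences evenly spaced from 11.0% down to 10.0%, each SNP's measured difference carries independent Gaussian error with an SD of 2 percentage points, and SNPs outside the top 1,000 are ignored, so the result is a lower bound. By linearity of expectation, the expected rank of the true best SNP is 1 plus the sum, over all other SNPs, of the probability that each is measured higher. The function names are hypothetical.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def expected_rank_of_best(best: float = 11.0, thousandth: float = 10.0,
                          n_top: int = 1000, sigma: float = 2.0) -> float:
    """Expected rank of the true best SNP after noisy measurement.

    Hypothetical setup (illustration only): the top `n_top` SNPs have true
    case-control differences (percentage points) evenly spaced from `best`
    down to `thousandth`, and each measured difference has independent
    Gaussian error with SD `sigma`. SNPs below the top `n_top` are ignored,
    so the returned value is a lower bound on the expected rank.
    """
    step = (best - thousandth) / (n_top - 1)
    # When two noisy SNPs are compared, the difference of their errors has SD sigma*sqrt(2).
    sd_pair = sigma * math.sqrt(2.0)
    expected = 1.0
    for k in range(1, n_top):
        gap = k * step  # true shortfall of the k-th runner-up relative to the best SNP
        expected += normal_cdf(-gap / sd_pair)  # P(runner-up is measured higher)
    return expected


if __name__ == "__main__":
    print(f"expected rank of the true best SNP: {expected_rank_of_best():.0f}")
    # Probability that the best SNP (true 11.0%) is measured at or below 9.5%.
    print(f"P(measured <= 9.5%): {normal_cdf((9.5 - 11.0) / 2.0):.2f}")
```

Under these assumptions, the expected rank of the true best SNP is in the hundreds rather than near 1. This illustrates why a top-100 cutoff could miss the true locus when measurement error is comparable to the spacing between the leading SNPs.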
Before quantitation, all DNA samples were checked for quality by 1% agarose gel electrophoresis, and obviously degraded samples were excluded from the pooling analysis. The genomic DNA concentration of each subject was determined in quintuplicate with the Quant-iT PicoGreen dsDNA Assay Kit (Invitrogen), according to the manufacturer's instructions, and the median of these measurements was used for each individual DNA sample. Alzheimer disease (AD [MIM 104300]) pools were constructed as four subpools divided by region or population, as shown in table 1. Individual DNA samples were then added to their respective pools in equivalent molar amounts. To control for pipetting errors, each AD subpool was created de novo a total of three times, whereas each progressive supranuclear palsy (PSP [MIM 601104]) subpool was created five times. Within a cohort, each replicate subpool contained identical samples, to better assess variance. In the "Discussion" section, we describe potential advantages of each subpool containing independent samples. Once created, each pool was diluted to 50 ng/μl with sterile water, in preparation for the high-density SNP genotype assay. DNA amplified with available whole-genome amplification technologies was avoided, because uneven amplification in some samples may substantially reduce power in regions of high amplification.
Table 1. Single-Marker Analysis of Pooled GWA Data from Three Disorders with Known Associated Genetic Loci
| Analysis | Disorder | Variant | OR | No. of Cases, No. of Controls (Ethnicity) | Pooling Rank (by GenePool) | Approximate No. of SNPs in LD (r² > 0.5) | Platform | Arrays per Cohort |
|---|---|---|---|---|---|---|---|---|
| 1 | ADᵃ | ApoE-ɛ4 | 8.3 | 280, 169 (US) | 6/500,568 | 1 | Affymetrix 500K | 9 |
| 2 | ADᵃ | ApoE-ɛ4 | 8.3 | 280, 169 (US) | 18/317,208; 125/317,208 | 2 | Illumina 300K | 2 |
| 3 | ADᵇ | ApoE-ɛ4 | ∼3–8 | 199, 191 (Norwegian); 214, 129 (German); 168, 69 (Greek) | 1/500,568; 21/500,568; 34/500,568 | 1 | Affymetrix 500K | 9 |
| 4 | PSP | MAPT | 3.3 | 288, 344 (US) | 2/500,568 (best); 32 MAPT SNPs in top 1,000 SNPs | 168 | Affymetrix 500K | 10 |
| 5 | PSP | MAPT | 3.3 | 288, 344 (US) | 1/116,110 (best); 15 MAPT SNPs in top 1,000 SNPs | 38 | Affymetrix 100K | 10 |
| 6 | SIDDT | TSPYLᶜ | … | 3, 100 (Amish) | 6/10,555 | 13 | Affymetrix 10K | 3 |
Note. In each case, SNPs in LD with the previously published associated locus were in the top 50 SNPs overall and would have been flagged for validation.
ᵃ Diagnosed postmortem.
ᵇ Variously clinically diagnosed.
ᶜ Data are provided in the work of Puffenberger et al. [ref. 30: Puffenberger EG, Hu-Lince D, Parod JM, Craig DW, Dobrin SE, Conway AR, Donarum EA, Strauss KA, Dunckley T, Cardenas JF, et al. Mapping of sudden infant death with dysgenesis of the testes syndrome (SIDDT) by a SNP genome scan and identification of TSPYL loss of function. Proc Natl Acad Sci USA. 2004;101:11689–11694].
Pools were assayed on the Affymetrix 500K platform following the Affymetrix protocol for individual genotyping. On the Affymetrix 500K platform, each AD case and control subpool was assayed in three technical replicates, and each PSP subpool was assayed in two technical replicates. The US AD cohort was assayed in two technical replicates on the Illumina 300K platform, by combining the replicate subpools for each cohort and following the individual-genotyping protocol for version 1.0 Illumina HumanHap 300K arrays (Illumina).
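The equimolar pooling step described above amounts to a small volume calculation. This is a hypothetical helper, not the authors' protocol. It assumes that, because every sample is human genomic DNA, equal molar amounts correspond to equal DNA mass. It takes each sample's concentration as the median of its five PicoGreen readings and computes how much sterile water brings the pool to 50 ng/μl. The per-sample mass (`ng_per_sample`) and the readings in the example are illustrative values.

```python
from statistics import median


def pooling_volumes(readings_by_sample: dict[str, list[float]],
                    ng_per_sample: float) -> dict[str, float]:
    """Microliters of each DNA stock to pipette so that every sample
    contributes the same mass (equal mass ~ equal moles for genomic DNA)."""
    volumes = {}
    for sample, readings in readings_by_sample.items():
        conc = median(readings)  # ng/ul, median of the replicate PicoGreen readings
        if conc <= 0:
            raise ValueError(f"non-positive concentration for {sample}")
        volumes[sample] = ng_per_sample / conc
    return volumes


def water_to_target(total_ng: float, total_ul: float,
                    target_ng_per_ul: float = 50.0) -> float:
    """Microliters of sterile water to add so the pool reaches the target
    concentration; raises if the pool is already more dilute than the target."""
    final_ul = total_ng / target_ng_per_ul
    if final_ul < total_ul:
        raise ValueError("pool is already below the target concentration")
    return final_ul - total_ul


if __name__ == "__main__":
    readings = {
        "S1": [102.0, 98.0, 100.0, 101.0, 97.0],  # hypothetical ng/ul readings
        "S2": [49.0, 51.0, 50.0, 52.0, 48.0],
    }
    vols = pooling_volumes(readings, ng_per_sample=500.0)
    total_ul = sum(vols.values())
    total_ng = 500.0 * len(vols)
    print(vols, water_to_target(total_ng, total_ul))
```

Using the median of the replicate readings, rather than the mean, keeps a single aberrant PicoGreen measurement from skewing a sample's contribution to the pool.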
For the US AD cohort, individual-genotype data on the Affymetrix 500K platform were available for all samples in the pool. For individual genotyping, SNPs were called with two genotype-calling algorithms, SNiPer-HD and BRLMM (Affymetrix). SNiPer-HD uses an expectation-maximization training-based algorithm, and BRLMM uses a modified version of the robust linear model with Mahalanobis distance classifier (RLMM) algorithm [ref. 22: Rabbee N, Speed TP. A genotype calling algorithm for affymetrix SNP arrays. Bioinformatics. 2006;22:7–12]. Both algorithms provide better calls than the standard dynamic modeling approach. However, SNiPer-HD uses only a subset of 380,000 SNPs with highly reliable calls. Only SNPs whose calls agreed between BRLMM and SNiPer-HD were used for analysis, and ∼99.8% of the reduced 380,000-SNP set was in agreement. Predicted allelic frequencies were calculated with the k-correction method described by Craig et al. [ref. 23: Craig DW, Huentelman MJ, Hu-Lince D, Zismann VL, Kruer MC, Lee AM, Puffenberger EG, Pearson JM, Stephan DA. Identification of disease causing loci using an array-based genotyping approach on pooled DNA. BMC Genomics. 2005;6:138]. The k-correction values were trained on separate individual-genotype data from ∼900 Affymetrix 500K array sets generated by the same laboratory. By comparing allelic frequencies predicted by pooling with those measured by individual genotyping, we determined experimentally that nine Affymetrix 500K arrays measure allelic frequency with an SD of 2.5% when typical DNA is used. Importantly, this is the measurement error for a single cohort, not the error of the difference between cases and controls; because that difference combines the errors of two pools, its error is larger. Different reports find different accuracies, which may largely reflect differences in the quality of the starting DNA.
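For readers unfamiliar with k-correction, the sketch below uses one common formulation from the DNA-pooling literature. It estimates a correction factor k from individually genotyped heterozygotes, whose A:B allele ratio is 1:1 by definition, and rescales the pooled signal as p = A/(A + kB). Whether this matches the exact training procedure of Craig et al. is an assumption, and the function names are hypothetical. Written in terms of a relative allele score RAS = A/(A + B), the corrected frequency is RAS/[RAS + k(1 − RAS)]. Per-quartet estimates for one SNP can then be averaged.

```python
from statistics import mean


def estimate_k(het_ras: list[float]) -> float:
    """k = mean A/B signal ratio over heterozygous individuals.

    A heterozygote carries A and B in a 1:1 ratio, so A/B != 1 reflects
    unequal hybridization of the two allele probes. RAS values must lie
    strictly between 0 and 1.
    """
    return mean(r / (1.0 - r) for r in het_ras)


def k_corrected_frequency(pool_ras: float, k: float) -> float:
    """A-allele frequency p = A / (A + k*B), rewritten with RAS = A / (A + B)."""
    return pool_ras / (pool_ras + k * (1.0 - pool_ras))


def pooled_frequency(quartet_ras: list[float], quartet_k: list[float]) -> float:
    """Average the k-corrected estimates from each probe quartet of one SNP."""
    return mean(k_corrected_frequency(r, k) for r, k in zip(quartet_ras, quartet_k))


if __name__ == "__main__":
    k = estimate_k([0.58, 0.60, 0.62])  # hypothetical heterozygote RAS values
    print(round(k, 3), round(k_corrected_frequency(0.60, k), 3))
```

A value of k = 1 leaves the raw RAS unchanged. A value of k > 1 means that the A probe hybridizes more strongly, so the raw RAS overstates the A-allele frequency and the correction pulls it down.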
In this study, we used typical DNA, including samples that may have been stored in a freezer for several months or several years as part of a repository. We expect that accuracy would be substantially higher with freshly isolated DNA, and similar to the values reported by other groups. Thus, some groups may obtain "better" results because they use high-quality starting material. Pooling was simulated in Matlab 7.0 (MathWorks) on the basis of experimental measurements of probe intensities from pooled and individual samples run on Affymetrix 500K GeneChip Mapping arrays. Specifically, we generated paired case-control data sets equivalent to those expected from individual genotyping and from measuring allelic frequencies by pooling. Thus, the pooled data sets are the individual-genotype data sets into which noise consistent with pooling measurement error was introduced. Simulated data for each case-control cohort were generated independently on the basis of the number of chromosomes pooled (twice the number of individuals), the number of SNPs assayed, the LD between SNPs, and the minor-allele frequency (MAF) of each SNP. In addition to the probed SNPs, the "associated causal variant" was simulated in the cases by indirect sampling from a neighboring variant, assumed to be in LD with the associated causal variant (r² = 0.8; MAF 10%). Specifically, individual-genotype data were generated by repeated random sampling from a binomial distribution (binornd function) for 250,000 SNPs, assuming no LD between SNPs. For figure 1D, larger SNP sets were generated by assuming that 500,000, 750,000, and 1,000,000 SNPs were measured, with two, three, and four SNPs, respectively, in complete LD. A control data set and a case data set were generated by pooling genotypes under the assumption of Hardy-Weinberg equilibrium.
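The simulation described above, together with the pooling-noise and ranking steps described next, can be sketched in Python; the original work used Matlab's binornd and normrnd. This is a simplified sketch under stated assumptions, not the authors' code. SNPs are independent (no LD), and null SNPs have MAFs drawn uniformly between 5% and 50%. The associated SNP (index 0) is given hypothetical case and control allele frequencies rather than being derived from a causal variant at r² = 0.8. SNPs are ranked by a simple allelic z statistic (a swapped-in choice), whose variance for pools includes the noise of both pools (2σ²). Pooling noise is added independently to each pool's frequency with σ = 2.5%, and the default cohort sizes are borrowed from table 1 for illustration.

```python
import math
import random


def observed_freq(n_individuals: int, p: float, rng: random.Random) -> float:
    """Allele frequency in a cohort sampled under Hardy-Weinberg equilibrium.

    Each genotype is Binomial(2, p), so the cohort's allele count is the sum
    of 2N independent Bernoulli(p) draws.
    """
    n_chrom = 2 * n_individuals
    count = sum(1 for _ in range(n_chrom) if rng.random() < p)
    return count / n_chrom


def z_stat(f_case: float, f_ctrl: float, n_case: int, n_ctrl: int,
           extra_var: float = 0.0) -> float:
    """Allelic z statistic; `extra_var` adds pooling-measurement variance."""
    p_bar = (f_case * n_case + f_ctrl * n_ctrl) / (n_case + n_ctrl)
    p_bar = min(max(p_bar, 1e-6), 1.0 - 1e-6)  # noisy pools can stray outside [0, 1]
    var = p_bar * (1.0 - p_bar) * (1.0 / (2 * n_case) + 1.0 / (2 * n_ctrl)) + extra_var
    return (f_case - f_ctrl) / math.sqrt(var)


def simulate_ranks(n_snps: int = 250_000, n_cases: int = 280, n_controls: int = 169,
                   p_case: float = 0.15, p_control: float = 0.10,
                   sigma: float = 0.025, seed: int = 0) -> tuple[int, int]:
    """Rank of the associated SNP (index 0) under individual genotyping and pooling."""
    rng = random.Random(seed)
    z_ind, z_pool = [], []
    for snp in range(n_snps):
        if snp == 0:
            pc, pn = p_case, p_control  # hypothetical associated SNP
        else:
            pc = pn = rng.uniform(0.05, 0.5)  # null SNP: same MAF in both groups
        f_case = observed_freq(n_cases, pc, rng)
        f_ctrl = observed_freq(n_controls, pn, rng)
        z_ind.append(abs(z_stat(f_case, f_ctrl, n_cases, n_controls)))
        # Pooling: independent N(0, sigma) error on each pool's frequency, so the
        # case-control difference carries error with variance 2 * sigma**2.
        g_case = f_case + rng.gauss(0.0, sigma)
        g_ctrl = f_ctrl + rng.gauss(0.0, sigma)
        z_pool.append(abs(z_stat(g_case, g_ctrl, n_cases, n_controls, 2 * sigma ** 2)))

    def rank_of_first(values: list[float]) -> int:
        return 1 + sum(1 for v in values if v > values[0])

    return rank_of_first(z_ind), rank_of_first(z_pool)


if __name__ == "__main__":
    # Small run for speed; the pure-Python default of 250,000 SNPs is slow.
    print(simulate_ranks(n_snps=2_000, seed=42))
```

Repeating the call over many seeds and recording both ranks gives the distribution of ranks under individual genotyping versus pooling. The pooled statistic includes the extra 2σ² term, so it ranks the same SNP less sharply.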
For both case and control data sets, pooled measurement noise was added separately to the individual-genotype data sets by random sampling from a normal distribution (normrnd function) with σ = 2.5%. The simulated pooling error was approximately equal to that observed experimentally with nine Affymetrix 500K arrays. Cases and controls were each treated as single pools, rather than as subpools. In each simulation, the rank of the "associated SNP" (the SNP in LD with the causal variant) was recorded for both simulated individual genotyping and simulated pooling. If multiple SNPs were in LD with the associated causal variant (as in fig. 1D and 1E), we took the best rank among these SNPs. The number of SNPs in LD with an associated causal variant never exceeded four. For figure 1E, variable LD between all SNPs was added to the simulations. First, each SNP was assigned an r² value with the preceding and following SNPs. Values of r² were drawn from a normal distribution in which 70% of the SNPs exceeded an r² of 0.8 and the average r² was 0.85. This distribution is roughly equivalent to that of the Affymetrix 500K array on CEPH samples. SNP data were then constructed sequentially across 500,000 SNPs. Specifically, genotype data for the first SNP were generated by random sampling from a binomial distribution. Genotype data for each subsequent SNP were then generated by combining genotype information from the neighboring SNPs with random sampling from a binomial distribution, according to the defined r².
For PSP, cases and controls were from a previously described case-control series, and institutional review board (IRB) approval was obtained for all human subjects [ref. 24: Rademakers R, Melquist S, Cruts M, Theuns J, Del-Favero J, Poorkaj P, Baker M, Sleegers K, Crook R, De Pooter T, et al. High-density SNP haplotyping suggests altered regulation of tau gene expression in progressive supranuclear palsy. Hum Mol Genet. 2005;14:3281–3292]. Case patients chosen for pooling had a primary pathological diagnosis of PSP according to standard criteria (n = 288) [refs. 25, 26: 25. Hauw JJ, Daniel SE, Dickson D, Horoupian DS, Jellinger K, Lantos PL, McKee A, Tabaton M, Litvan I. Preliminary NINDS neuropathologic criteria for Steele-Richardson-Olszewski syndrome (progressive supranuclear palsy). Neurology. 1994;44:2015–2019; 26. Litvan I, Hauw JJ, Bartko JJ, Lantos PL, Daniel SE, Horoupian DS, McKee A, Dickson D, Bancher C, Tabaton M, et al. Validity and reliability of the preliminary NINDS neuropathologic criteria for progressive supranuclear palsy and related disorders. J Neuropathol Exp Neurol. 1996;55:97–105]. The case patients had a mean (±SD) age at death of 75 ± 7.6 years and were 51% male. Cognitively normal, age- and sex-matched controls were collected under the Normal and Pathological Aging Protocols at Mayo Clinic Scottsdale (n = 344). All patients and controls used in the study were white [refs. 27, 28: 27. Caselli RJ, Osborne D, Reiman EM, Hentz JG, Barbieri CJ, Saunders AM, Hardy J, Graff-Radford NR, Hall GR, Alexander GE. Preclinical cognitive decline in late middle-aged asymptomatic apolipoprotein E-e4/4 homozygotes: a replication study. J Neurol Sci. 2001;189:93–98; 28. Caselli RJ, Hentz JG, Osborne D, Graff-Radford NR, Barbieri CJ, Alexander GE, Hall GR, Reiman EM, Hardy J, Saunders AM. Apolipoprotein E and intellectual achievement. J Am Geriatr Soc. 2002;50:49–54]. Four white case-control cohorts were available for AD: three clinically characterized cohorts and one postmortem cohort characterized both clinically and neuropathologically, as summarized in table 1. IRB approval was obtained for all human subjects. Both individual-genotype data and pooled-genotype data were available for the US postmortem cohort.
DNA samples were extracted from the brain tissue of 398 brain donors who were at least 65 years of age at the time of death. The donors included 242 patients who satisfied clinical and neuropathological criteria for the diagnosis of AD and 156 persons who did not meet neuropathological criteria for AD. All the brain donors were white. For the German cohort, AD patients were recruited from the Department of Psychiatry, University of Bonn. Patients were diagnosed according to DSM-IV criteria, supported by clinical examination, detailed structured interviews, neuropsychological testing, cognitive screening with the Mini-Mental State Examination, and neuroimaging studies. Healthy controls were recruited with the support of the local census bureau and the regional Board of Data Protection (Nordrhein-Westphalia, Germany), and their diagnosis was made by structured interviews and neuropsychological testing. All patients and control subjects gave informed consent for participation in the study. The study protocol was approved by the Ethics Committee of the Faculty of Medicine at the University of Bonn. For the Norwegian cohort, patients were recruited from the geriatric and neurological outpatient clinics at St. Olav's Hospital in Trondheim and from local nursing homes, as part of a study of the genetics of dementias in central Norway, as described elsewhere [ref. 29: Toft M, Sando SB, Melquist S, Ross OA, White LR, Aasly JO, Farrer MJ. LRRK2 mutations are not common in Alzheimer's disease. Mech Ageing Dev. 2005;126:1201–1205]. In brief, the guidelines of the International Classification of Diseases (ICD-10) were applied to diagnose dementia, and patients diagnosed with AD fulfilled the National Institute of Neurological and Communicative Disorders and Stroke/Alzheimer's Disease and Related Disorders Association (NINCDS-ADRDA) criteria.
Controls from the same geographic area were recruited from societies for retired people or were spouses of patients with dementia. All controls reported subjectively good memory and had no first-degree relatives with dementia, and their status was assessed with a brief interview. For the Greek cohort, patients with AD were recruited from the Department of Neurology, University of Thessaloniki. Patients fulfilled the NINCDS-ADRDA criteria for probable AD after clinical examination and neuropsychological testing. Healthy controls were the patients' spouses and were cognitively intact as assessed by neuropsychological examination. For sudden infant death with dysgenesis of the testes syndrome (SIDDT [MIM 608800]), samples were genotyped and pooled as part of a separate study, and the previously generated data were used to provide additional validation metrics for the analysis procedures [refs. 23, 30: 23. Craig DW, Huentelman MJ, Hu-Lince D, Zismann VL, Kruer MC, Lee AM, Puffenberger EG, Pearson JM, Stephan DA. Identification of disease causing loci using an array-based genotyping approach on pooled DNA. BMC Genomics. 2005;6:138; 30. Puffenberger EG, Hu-Lince D, Parod JM, Craig DW, Dobrin SE, Conway AR, Donarum EA, Strauss KA, Dunckley T, Cardenas JF, et al. Mapping of sudden infant death with dysgenesis of the testes syndrome (SIDDT) by a SNP genome scan and identification of TSPYL loss of function. Proc Natl Acad Sci USA. 2004;101:11689–11694]. GenePool is written in C++ (gpextract) and C (gpanalyze). These programs can be run individually from the Unix command line. GenePool can be downloaded from the GenePool Web site. The software is currently provided as source code and as a precompiled binary for x86 Linux. Man pages for all executables are bundled in both the source and binary distributions and are also available from the GenePool Web site in PDF and HTML formats for online viewing. The SIDDT10K data set for Affymetrix is provided for download.
In the Affymetrix platform, each SNP is interrogated by 6–10 probe quartets, where each quartet contains a perfect match (PM) probe for the A allele, a PM probe for the B allele, a mismatch (MM) probe for the A allele, and an MM probe for the B allele. A relative allele score (RAS) is calculated for each quartet, and we considered each RAS_i to be an independent measure of allele frequency, where i indexes the quartet (up to 10 per SNP). RAS is the ratio of the A-allele signal to the combined A- and B-allele signals for the PM probes:
$$\mathrm{RAS}_i = \frac{\mathrm{PM}_{A,i}}{\mathrm{PM}_{A,i} + \mathrm{PM}_{B,i}}, \qquad i = 1, \ldots, 10.$$
In the Illumina platform, each SNP is interrogated by a variable number of beads, with an average of 16 beads per SNP on an Illumina 300K HumanHap BeadChip. Unlike probes on the Affymetrix platform, beads are assumed to hybridize similarly, so RAS is a one-dimensional vector (i = 1). For each bead, red- and green-channel data corresponding to the two SNP alleles are acquired and stored in 10 text files within a BeadStation-specified data folder. Each channel is normalized simply by dividing by its overall mean intensity, because the green channel has greater overall intensity than the red channel. More-advanced global normalization methods may be developed in the future; note that calculating RAS values already provides SNP-specific normalization. Finally, any SNP probed by fewer than five beads is discarded, because its measurement will be highly variable owing to undersampling. Typically, fewer than a few hundred SNPs were discarded by this filter. Multiple test statistics were evaluated for their effectiveness in ra
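The per-platform RAS calculations described above can be sketched as follows. This is an illustrative sketch, not GenePool's implementation, because the input formats read by gpextract are not described here. For Affymetrix, it assumes each quartet is supplied as a (PM_A, PM_B, MM_A, MM_B) tuple. For Illumina, it assumes that the red channel carries allele A and the green channel allele B, and that normalized bead intensities are averaged within each SNP before the ratio is taken; both choices are hypothetical. The five-bead minimum and the division of each channel by its overall mean follow the text.

```python
from statistics import mean


def affy_ras(quartets: list[tuple[float, float, float, float]]) -> list[float]:
    """RAS_i = PM_A,i / (PM_A,i + PM_B,i) for each probe quartet of one SNP.

    Each quartet is (PM_A, PM_B, MM_A, MM_B); the mismatch probes are not
    used in the RAS formula given in the text.
    """
    return [pm_a / (pm_a + pm_b) for pm_a, pm_b, _mm_a, _mm_b in quartets]


def illumina_ras(beads_by_snp: dict[str, list[tuple[float, float]]],
                 min_beads: int = 5) -> dict[str, float]:
    """One RAS value per SNP from bead-level (red, green) intensities.

    Assumed (hypothetical): red = allele A, green = allele B, and normalized
    bead intensities are averaged within a SNP.
    """
    all_beads = [bead for beads in beads_by_snp.values() for bead in beads]
    red_mean = mean(red for red, _ in all_beads)  # overall channel means
    green_mean = mean(green for _, green in all_beads)
    ras = {}
    for snp, beads in beads_by_snp.items():
        if len(beads) < min_beads:  # undersampled SNP: too variable, discard
            continue
        a = mean(red / red_mean for red, _ in beads)
        b = mean(green / green_mean for _, green in beads)
        ras[snp] = a / (a + b)
    return ras


if __name__ == "__main__":
    print(affy_ras([(300.0, 100.0, 40.0, 35.0), (250.0, 150.0, 30.0, 45.0)]))
    beads = {"rs_a": [(1.0, 3.0)] * 6, "rs_b": [(2.0, 1.0)] * 3}
    print(illumina_ras(beads))  # rs_b has only three beads and is dropped
```

Dividing each channel by its own overall mean removes the global green-over-red brightness difference before the per-SNP ratio is formed. A SNP whose beads are uniformly three times brighter in green than in red, on a chip where that is the global pattern, therefore ends up with RAS = 0.5.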
References