Article | Open Access | Peer Reviewed

Power to the People: Addressing Big Data Challenges in Neuroscience by Creating a New Cadre of Citizen Neuroscientists

2016; Cell Press; Volume: 92; Issue: 3; Language: English

10.1016/j.neuron.2016.10.045

ISSN

1097-4199

Authors

Jane Roskams, Zoran Popović

Topic(s)

Mobile Crowdsensing and Crowdsourcing

Abstract

Global neuroscience projects are producing big data at an unprecedented rate that informatic and artificial intelligence (AI) analytics simply cannot handle. Online games, like Foldit, Eterna, and Eyewire—and now a new neuroscience game, Mozak—are fueling a people-powered research science (PPRS) revolution, creating a global community of “new experts” that over time synergize with computational efforts to accelerate scientific progress, empowering us to use our collective cerebral talents to drive our understanding of our brain.

Understanding how our brain works—and how we use this to tackle a myriad of brain diseases—is one of the biggest challenges facing humanity. Beyond the long-standing genomics and proteomics databases, large-scale federal initiatives, like the BRAIN Initiative and the Precision Medicine Initiative (PMI), are stimulating innovative research producing complex datasets containing the unique patterns that hold the clues to our understanding of human disease and brain function (Bargmann and Newsome, 2014). Significant computational power and expertise are required to tackle this growing big data challenge, and modelers are working around the clock to stay ahead of the staggering amount of data being generated globally (Sejnowski et al., 2014). Bioinformaticians and modelers emerging from the systems biology, math, and physics communities—coupled with the expansion of the technology sector in the last decade—have produced tools to help us begin to crack the brain code. However, they have yet to span the sizeable gulf between the scientists producing the data and the planet full of people who may be capable of helping to analyze it. Overcoming the challenge of data hosting and storage by developing an international cloud interface is proposed elsewhere (Neuro Cloud Consortium, 2016). Here, we are more concerned with how we will find needles of understanding in the data haystack. Advice on how to get there comes from a popular source, President Barack Obama (Obama, 2016, Frontiers conference): “The most important curator to be able to sort through what’s true and false and sustain those scientific values is the human brain.” Historically, solving large-scale data challenges has required relying on a highly trained data science and computational workforce. A new global workforce is now emerging that reaches out to tap into the ingenuity of people who are interested in solving scientific problems but whose talents often lie beyond the fringes of classical academic disciplinary training. This idea of “crowdsourcing”—developing new access routes to engage anyone with motivation and a computer in helping science tackle its biggest challenges—is finally entering the mainstream across multiple disciplines (reviewed in Saez-Rodriguez et al., 2016).
This idea of using crowdsourced talent to bring “non-traditional experts” together with experts, and of finding unexpected talent to discover novel solutions to scientific challenges, is not new. The award-winning book Longitude tells the story of the unknown clockmaker John Harrison, who won the prize established by the British Board of Longitude (in 1714) for inventing the prototype of the marine chronometer, solving what was undoubtedly the most important technological challenge in navigation for the British Empire—determining the longitude of a ship at sea. Similarly (as depicted in the Oscar-winning film The Imitation Game), the breaking of the Enigma cipher at Bletchley Park was aided when creative problem solvers (many of whom were women whose career paths had not allowed them access to the hallowed halls of academia) were recruited by placing puzzles in newspapers, bringing “lay” alternative thinkers onto the teams of mathematicians that Alan Turing and his colleagues had assembled. Collectively, experts and non-experts collaborated and were instrumental in turning the tide in WWII. We all know talented people who could probably change the world if only they had the opportunity; insight and inventiveness are not the exclusive domain of those tenacious enough to be awarded a Ph.D. The broad global reach of the internet has opened up a superhighway of opportunity to people previously excluded from academic science and allowed us to design ways to take large-scale science challenges to the global marketplace instead of hoping they will find us. This growing field—citizen science—engages people from beyond the traditional academic arena either to collect and generate data or to contribute as individuals or teams to analyze it. In the US, the charge to expand citizen science (people-powered research science [PPRS]) to fuel innovation has been led by President Obama.
In addition to installing a rain gauge for a citizen science climate project in the First Lady’s garden, the Office of Science and Technology Policy (OSTP) has developed a toolkit to assist scientists in entering citizen science (https://www.whitehouse.gov/blog/2014/12/02/designing-citizen-science-and-crowdsourcing-toolkit-federal-government) and a website (https://www.citizenscience.gov/) for expanding the practice. This website hosts links to over 300 different citizen science projects in extremely diverse areas, such as astronomy, environmental conservation, genomics, and technology innovation. A forum hosted by the OSTP at the White House, “Open Science and Innovation: Of the People, By the People, For the People,” recently drew together a diverse array of scientists representing projects in conservation, climate science, environmental monitoring, proteomics, and neuroscience to share approaches, tools, challenges, and best practices and to combine expertise to create a synergy that could bring crowdsourcing more broadly into the scientific mainstream (https://www.whitehouse.gov/blog/2015/09/30/accelerating-use-citizen-science-and-crowdsourcing-address-societal-and-scientific). This forum highlighted how science and society can progress on multiple levels symbiotically if we build the inroads that empower learners of all ages to engage in solving society’s biggest problems, and if we re-direct their findings back into the scientific mainstream. At the White House Frontiers Conference on October 13, 2016, which he hosted as guest editor of Wired magazine, the President also highlighted how putting big data into the right hands can drive progress in scientific, educational, and economic ways:

“Whether it’s releasing big data—and the easiest example, I think, for the general public to think about is all the apps that now give us the weather over our phones, and those are all generated from inside government, but what used to be closed data now we let out there. Well, it turns out that we’ve got huge data sets on all kinds of stuff. And the more we’re opening that up and allowing businesses, individuals, to work with that information I think the more they feel empowered. And that makes a huge difference.”—President Barack Obama

By taking scientific challenges online in stimulating ways, we tap into the intellectual energy of the planet by creating access for people of all ages who have the potential to make discoveries from open shared data. In doing so, the brightest minds from all societies and vocations have the chance to contribute, vastly expanding the diverse talent pool that we need to come up with new solutions to our biggest questions. As a result of creating online avenues, we level the scientific playing field, science progresses faster, and everyone gets to play a role and learn, reflected again in the words of President Obama (on online science tools and STEM, October 2016): “We are working to help all of our children understand that they, too, have a place in science and tech—not just boys in hoodies, but girls on Native American reservations and kids whose parents can’t afford personal tutors. We want Jamal and Maria sitting right next to Jimmy and Johnny—because we don't want them overlooked for a job of the future.” Two pioneering approaches to accelerate and expand scientific discoveries in astronomy (Galaxy Zoo; https://www.galaxyzoo.org/) and protein folding (Foldit; http://www.fold.it) have led the way in taking crowdsourcing in the hard-science arena to a new level. Galaxy Zoo—launched in 2007 by Chris Lintott (Oxford University) and colleagues—originally began by taking almost one million images generated from the Sloan Digital Sky Survey and making them available to participants to classify.
By engaging renowned astronomers in developing a tutorial of identification and morphological tracing and classification, players across the globe were able to become adept at identifying different classes of galaxies depicted by celestial images (Clery, 2011). Initially aimed simply at identification and classification, the site received a deluge of over 70,000 classifications/hr within 24 hr of launch, and—with multiple eyes on each galaxy—classifications were deemed equal or superior to those of professional astronomers. This type of problem was ideally suited for crowdsourcing because people can easily observe images and find specific features with minimal training. The entire community continues to rapidly piece together an understanding of the universe, accelerating the pace of discovery far beyond the capabilities of the professional astronomy community alone. The project has since enabled people to view different types of datasets contributed from a wider variety of sources using different technologies (including recent publications from the Hubble telescope). Galaxy Zoo has produced over 50 publications from crowdsourced data, opened new avenues of discovery for astronomers, and inspired a whole new generation of amateur astronomers with the power to explore the universe through the eye of digital data without needing high-tech telescopes (https://www.zooniverse.org/about/publications). The Galaxy Zoo team has now branched out to host a variety of scientific crowdsourcing challenges on their Zooniverse website (https://www.zooniverse.org/). Many other scientific problems require sophisticated problem-solving skills that are honed through years of training during college and graduate school. Foldit, a biochemistry scientific discovery game, has shown that very deep expertise can be developed collectively by people with no prior knowledge of the scientific discipline.
While the path to expertise is drastically shortened compared to the standard academic track, people still need time to hone highly specialized skills aimed at solving a specific science problem. To enable this expertise development, Foldit was designed as an online game that maximizes the long-term engagement of “solvers” who collaborate on problems in proteomics in order to develop their expertise (Horowitz et al., 2016). By using game design elements to provide a personalized and scaffolded way to build scientific skills, Foldit has developed experts that outperform world-class laboratories in predicting protein structures and even in designing new proteins not yet invented by nature. Developed at the Center for Game Science at the University of Washington (UW) with proteomics expertise from the lab of David Baker, a UW Howard Hughes Medical Institute (HHMI) investigator, the Foldit game is continually evolving to maximize the collective problem-solving ability of its community. Foldit players have been successful in solving a range of proteomics challenges that predict the three-dimensional state of proteins. The game presents protein-folding problems as weekly puzzles for players to solve and puts a variety of computational tools, collaboration mechanisms, and visualizations at their disposal. Updates to the game allow the development team to react quickly both to the analysis of the results of scientific experiments in the game and to player engagement metrics. The game effectively allows co-adaptation of computational tools (the Foldit game) and game-developed experts so that together they can solve increasingly challenging scientific problems.
The game was released in 2008 and since then has had over half a million players. The success of Foldit has depended on the active and ongoing engagement of the scientific community—an active protein-folding community of experts and “new experts” is centered on the game, some members of which have been involved for years. The game has attracted and retained players from a wide variety of science and non-science backgrounds; 18 of the 20 highest-ranked Foldit players had no experience with biochemistry prior to playing the game, attesting to the game’s ability to develop experts from novices. It also produces results that continue to have impact even years later: 57,000 of Foldit’s players have produced useful results that matched or outperformed algorithmically computed solutions. Foldit players have achieved and led a number of significant discoveries, many of which have been published in some of the most respected scientific journals (e.g., Nature and PNAS) (Cooper et al., 2010, Gilski et al., 2011, Khatib et al., 2011a). For example, game-based experts resolved a long-standing scientific problem: in less than 10 days, the Foldit community was able to resolve the structure of an HIV coat protein that had remained unsolved by scientists for 15 years (Khatib et al., 2011b). Perhaps more importantly, computational strategies from the teams engaged have been shown to be on par with state-of-the-art biochemistry optimization methods. Both of these are evidence that a well-designed science gaming platform can be truly revolutionary beyond its initial goals, creating a new cadre of domain experts to solve complex problems in which human intuition and learning can outperform computer algorithms. By some estimates, Foldit has increased the worldwide proteomics research community by a factor of four.

Increasing awareness of the prevalence of brain disorders in society has produced a potential global workforce that is highly motivated to learn about the brain and contribute to brain research, but most have little or no direct access to the corridors of neuroscience power that would give them the chance to do this. On the other hand, neuroscientists are generating terabytes a day of imaging, transcriptomic, and physiology data that they are increasingly willing to share openly. But there is a growing backlog in progress because of the lack of effective or freely available programs or manpower for analysis. Bringing these sides together around questions best suited for human insight and intuition will not only reduce the bottleneck and fuel progress, but also give us the chance to develop a whole new generation of neuroscientists of all ages. Global neuroscience becomes not just about sharing each other’s data, but about combining our shared talents to decipher and understand it. Eyewire (http://eyewire.org/explore) was the first neuroscience game to achieve success in engaging an international community.
Players were originally given densely detailed sequential electron microscopy (EM) images of the retina and had to move from one image to another, following the contour of neurons and labeling different parts to illuminate them within the matrix of the electron micrograph. Despite the relatively laborious nature of the task, since 2010, Eyewire has attracted the attention of over 250,000 players from 145 different countries, who have helped contribute to the first-ever 3D reconstruction of high-resolution networks of cells within the mouse retina (Helmstaedter et al., 2013). Over time, improved gaming techniques have enhanced this process to make the game more interactive and increase player retention. In doing so, Eyewire laid the groundwork for public online engagement in neuroscience research and developed the “ground truth” data needed to build a new generation of artificial intelligence (AI) tools that are now a core component of large-scale projects to produce micro-scale maps of the cortex. The early scientific success of Eyewire stems not only from engaging a broad community of people, but also from its original data being produced using highly standardized protocols from a renowned lab. The next generation of Eyewire, Neo—due to be released in 2017—will move on from its success in the light-sensing retinal circuit to the termination of the visual pathway: V1 of the visual cortex. Clinical neuroscience has also benefited from crowdsourcing in the form of challenges (collaborative competitions), in which the computational community is asked to devise and submit proposed analyses of expert-provided, patient-centric big data problems.
The large databanks for challenges are made up of openly shared clinical data sourced from multiple sites, hosted on a common platform, and used to attract a broader community charged with developing new ways to find the gold in the midst of the data mines. In the biomedical research arena, this approach has been pioneered by the team at Sage Bionetworks (http://sagebase.org/), which has used its Synapse open data sharing platform to collaborate with IBM’s Gustavo Stolovitzky to host DREAM challenges (http://sagebase.org/challenges/). Their wide range of health data-driven challenges has recently included neurological disease. In two distinct DREAM ALS challenges to help advance our understanding of amyotrophic lateral sclerosis (ALS) progression, participants were challenged to develop new models not just to predict the future progression of disease in ALS patients based on the patient’s current disease status, but also to stratify patients based on clinical sub-criteria (tinyurl.com/DREAMALS). Data were provided from more than 9,000 patient histories, including demographics, medical and family history data, functional measures, vital signs, and lab data. In total, this project comprised more than 100 scientists (about 20 organizers and 80 participants in 30 teams). Algorithms developed in these challenges have produced predictive machine learning models that perform substantially better than clinical experts and can reduce the variance of disease predictability, so that new drugs that slow ALS can potentially be identified with smaller trials (Küffner et al., 2015).
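The winning challenge entries were sophisticated ensemble models; purely as an illustrative sketch (not the challenge methods, and with made-up numbers), the core task can be framed as fitting a per-patient trend to early functional scores, such as the ALSFRS-R scale used in the challenge, and extrapolating it forward:

```python
# Minimal sketch of progression prediction: fit an ordinary least-squares
# trend to a patient's early functional scores and extrapolate.
# The patient data below are synthetic, for illustration only.

def fit_slope(times, scores):
    """Least-squares slope and intercept of score vs. time (in months)."""
    n = len(times)
    mt = sum(times) / n
    ms = sum(scores) / n
    cov = sum((t - mt) * (s - ms) for t, s in zip(times, scores))
    var = sum((t - mt) ** 2 for t in times)
    slope = cov / var
    return slope, ms - slope * mt

def predict(times, scores, future_t):
    """Extrapolate the fitted linear trend to a future time point."""
    slope, intercept = fit_slope(times, scores)
    return slope * future_t + intercept

# Synthetic patient: scores observed at months 0-3, extrapolated to month 12.
months = [0.0, 1.0, 2.0, 3.0]
scores = [40.0, 38.5, 37.5, 36.0]
print(predict(months, scores, 12.0))
```

Real challenge models combined many more features (vital signs, lab values, demographics) and nonlinear learners; the value of the crowdsourced format was comparing dozens of such approaches on a common benchmark.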
By finding new ways to analyze massive clinical data from 20 finished trials, the challenges also detected new biomarkers and defined new subgroups for stratification, i.e., to assign a patient to a disease subtype and determine, perhaps, whether he or she would be a responder or non-responder to a given treatment. Their Alzheimer’s disease (AD) DREAM challenge, however, revealed some of the inevitable problems with using disparate datasets. Designed to provide an unbiased assessment of our current capability for estimating cognition and predicting cognitive decline using genetic and imaging data from public data resources, it drew 527 individuals from around the world who registered to participate. Fifty-six teams (each with one or more people) submitted models to the leaderboards, but—unlike the ALS challenge—predictive performance across all criteria was modest and less definitive (Allen et al., 2016). Rather than a failure of the submitted modeling methods, the analyses suggested that the data used to address these questions were inadequate to support the tasks. A cautionary tale against assuming that all clinical data can be put into one pot, the challenge confirmed that mandates and guidelines on data sharing, consideration of standardized data collection and processing, and mechanisms to integrate heterogeneous data will be necessary to address issues related to highly complex diseases. How many different cells exist in a mouse or human brain?
Although Santiago Ramón y Cajal was awarded the Nobel Prize over 100 years ago for his pioneering work in neuroanatomy, we still lack a modern classification of neuronal cell types based on a deep understanding of their 3D structure and related function (DeFelipe et al., 2013). Led by the BRAIN Initiative (Jorgenson et al., 2015), new technologies from a variety of different global brain projects are fueling the production of many terabytes of data a day to answer this question, but neuroscientists are hitting successive roadblocks in combining and analyzing these datasets. Eyewire developed an approach for the highest-resolution (EM) level of connectomics, but many other types of mega-data hold the answers to more extensive questions in brain structure and function; there remain significant obstacles in the analysis and integration of these different data modalities. Even with new federal funding now available from multiple sources (e.g., BD2K, BRAIN) to support integrated data analysis efforts, the neuroscience community is not currently equipped to perform all the ground truth validation needed for basic data analysis before feeding it into more complex models. Can we begin by using a game-based neuroscience community approach to figure out how to bring together the complex 3D morphologies of different neurons with their distinct genetic programs and the activity of the circuits to which they belong?
We believe that many neuroscience questions are amenable to a community-based approach to analysis, but given its novelty, a series of challenges must be overcome to make this practice more widespread (see Box 1).

Box 1. Challenges

Neuroscience problems that are amenable to the crowdsourcing approach fit into two main categories: (1) reconstruction of models from massive amounts of raw data, addressing the growing disparity between data collection speed and reconstruction speed, and (2) development of functional models of neural structures that use the structure of, and messaging between, neurons to explain different aspects of how the brain works. Both require new experts to gain experience and expertise in many aspects of the neuroscience research pipeline, so it is not enough to attract new citizen scientists into the community; we must develop them over time into strong contributors to discovery. For either category, we need to build a common infrastructure that can connect every current neuroscience project with the new citizen neuroscience community. Since these two categories touch practically all aspects of neuroscience, they will require dedicated funding to realize. This includes:
•The development of open, common data-sharing platforms that are easily accessible and navigable by experts and non-experts alike.
•Establishment of data standards for “acceptability” of data from content partners that can be reviewed and applied in an unbiased way.
•Fast feedback loops on outcomes, through open sharing of best practices across projects, so that mistakes are not made and repeated and so that promising directions are pursued in parallel by the entire community.
•The creation of new apprenticeship and accreditation pathways for expertise developed and proven through contributions by citizen neuroscientists. There is no reason why strong citizen contributors should not get paid for their productivity and contributions to discoveries.
•The creation of more rapid scaffolding and learning structures that can speed up expertise acquisition.
•Development of a new breed of computer algorithms that symbiotically use the best of human and computer skills to solve problems that neither humans nor computers can currently address.
•The opening up of restrictions on internet access for people of all countries, to ensure the success of crowdsourced and global science.

The 3D shape of a neuronal soma with dendritic and axonal arbors is a core representation of its identity (phenotype), connectivity, synaptic integration, firing properties, and—ultimately—its role in the neural circuit, but we still lack robust ways to analyze even “normal” neurons in sufficient quantities to develop a frame of reference for when things are not working. This has certainly been aided by online resources, like Neuromorpho (http://neuromorpho.org/), that have provided a place for individual labs to upload their images and reconstructions, but we still lack a common platform for robust comparative analysis.
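As a concrete sense of what such comparative analysis involves: reconstructions of the kind hosted on Neuromorpho are typically stored in the plain-text SWC format (one node per line: id, type, x, y, z, radius, parent). A minimal sketch of an automated morphometric measurement, computing total cable length from a toy (made-up) reconstruction, could look like this:

```python
# Sketch of a basic morphometric measurement on an SWC reconstruction.
# SWC columns: id, type, x, y, z, radius, parent (parent = -1 for the root).
# The tiny three-node "neuron" below is invented for illustration.
import math

def parse_swc(text):
    """Return {node_id: (x, y, z, parent_id)} from SWC text."""
    nodes = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comment/header lines
        node_id, _type, x, y, z, _radius, parent = line.split()
        nodes[int(node_id)] = (float(x), float(y), float(z), int(parent))
    return nodes

def total_cable_length(nodes):
    """Sum of Euclidean distances from each node to its parent."""
    length = 0.0
    for x, y, z, parent in nodes.values():
        if parent != -1:
            px, py, pz, _ = nodes[parent]
            length += math.dist((x, y, z), (px, py, pz))
    return length

swc = """# toy neuron
1 1 0 0 0 5 -1
2 3 3 4 0 1 1
3 3 3 4 12 1 2
"""
print(total_cable_length(parse_swc(swc)))  # → 17.0 (5.0 + 12.0)
```

A shared platform could standardize dozens of such metrics (branch counts, Sholl profiles, arbor asymmetry) so that reconstructions from different labs and games become directly comparable.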
Understanding neuronal diversity through capturing 3D morphology in vivo, and classifying each neuron according to form and function, will make a fundamentally important contribution toward our understanding of brain function and dysfunction. This is why a detailed classification of neuronal subtypes—including morphology and transcriptomics—is one of the lead programs of the BRAIN Initiative.
