Emerging technologies and radical collaboration to advance predictive understanding of watershed hydrobiogeochemistry
2020; Wiley; Volume: 34; Issue: 15; Language: English
DOI: 10.1002/hyp.13807
ISSN 1099-1085
Authors: Susan S. Hubbard, Charuleka Varadharajan, Yuxin Wu, Haruko Wainwright, Dipankar Dwivedi,
Topic(s): Soil and Water Nutrient Dynamics
Abstract: Increasing population and resource-intensive lifestyles are driving enhanced demands for clean water, food, and energy. In parallel, land-use change, climate change, and perturbations (including drought, floods, fires, and early snowmelt) are significantly reshaping interactions within watersheds throughout the world. While watersheds are the Earth's key functional unit for assessing and managing water resources, hydrological processes in watersheds also mediate biogeochemical interactions that support terrestrial life on Earth (Kaushal, Gold, Bernal, & Tank, 2018; National Research Council, 2012). Although society is dependent upon clean water availability, tractable prediction of watershed hydrobiogeochemical behavior, including watershed response to perturbations, remains a challenge. Central to the challenge are complex, multiscale interactions between plants, microorganisms, organic matter, minerals, dissolved constituents, and migrating fluids, which occur within and across bedrock-to-canopy compartments and along extensive lateral gradients of a watershed. Several recent community reports have synthesized formidable challenges associated with watershed science and technology (AGU, 2018; Blöschl et al., 2019). Here, we discuss emerging technologies and collaboration modes that are critical for developing generalizable insights about and predictive understanding of complex watershed hydrobiogeochemical behavior, which are important for underpinning optimized natural resource management. Recent developments in field observatories and open-science principles provide foundational pillars for advancing predictive understanding of watershed hydrobiogeochemistry using emerging technologies. Field observatories have fostered crossdisciplinary collaboration and provided platforms for quantifying hydrological, biological, geological, geochemical, and atmospheric processes and their couplings (Bogena, White, Bour, Li, & Jensen, 2018).
Observatory networks in the United States include the Critical Zone Observatories (Brantley et al., 2017), National Ecological Observatory Network (Loescher, Kelly, & Lea, 2017), the Long-Term Ecological Research Network (Hobbie, Carpenter, Grimm, Gosz, & Seastedt, 2003), and the Department of Energy (DOE) Watershed Network (U.S. DOE, 2019). Select international observatory networks include the German Terrestrial Environmental Observatories (Zacharias et al., 2011), the French OZCAR network (Gaillardet et al., 2018), and the Chinese observatories (Li et al., 2013). The observatories are complemented by long-term distributed measurement suites, such as the US Geological Survey stream discharge and concentration measurements (NASEM, 2018a) and the DOE AmeriFlux network carbon, water, and energy flux measurements (Novick et al., 2018). Open-science concepts (NASEM, 2018a), which have recently started to permeate watershed science, provide another foundational pillar. While the open-data FAIR (findable, accessible, interoperable, and reusable) principles (Wilkinson et al., 2016) are perhaps the most recognized aspect of open-science, open-science concepts are also critical for generating and sharing data, knowledge, and models in a manner that promotes transferability and generalizability across watershed networks (U.S. DOE, 2019). Several emerging technologies hold potential to greatly enhance predictive understanding of watershed hydrobiogeochemical behavior, including machine learning (ML) and artificial intelligence, exascale computing, 5G wireless communications, and cloud data storage and compute capacity. Deep learning using large neural networks with multilayered structures can identify abstract concepts about datasets (Schmidhuber, 2015); these methods have been useful in other fields to discover physical concepts from data. 
Several recent publications have also illustrated the potential of ML for advancing Earth sciences (Bergen, Johnson, Maarten, & Beroza, 2019; Nearing et al., 2020; Reichstein et al., 2019; Shen, 2018). Exascale computing will provide computing systems capable of at least a quintillion (or billion billion) calculations per second, representing a thousand-fold increase over the first petascale computers that came into operation about a decade ago. Teams of scientists are currently developing exascale-ready codes for several applications (Alexander et al., 2020), including for Earth sciences (Johansen et al., 2017). In addition, cloud platforms (such as Google Earth Engine and Amazon Web Services) are enabling an era of "Big Data" in the Earth sciences by providing on-demand storage, networking, and mid-range computing capacity. These developments are already spurring advances in geospatial data processing and bioinformatics (Yang, Huang, Li, Liu, & Hu, 2017). 5G technology refers to the fifth generation of digital wireless communication technologies, which will offer significantly enhanced data rates and low latency. 5G will also offer enhanced connectivity of massively parallel Internet of things devices—up to 1 million per square kilometer (Forbes, 2019). Below, we provide examples of how emerging technologies are starting to be used to advance three key elements: watershed hydrobiogeochemical characterization, data management and informatics, and modeling. The desire to characterize and monitor watershed hydrobiogeochemical dynamics at increasing spatial and temporal resolutions has driven an explosion of observational technologies and platforms. 
For example, fiber-optic distributed sensors can now autonomously measure temperature and strain with very high spatiotemporal resolution in terrestrial and aquatic systems (Ajo-Franklin et al., 2017; Joe, Yun, Jo, Jun, & Min, 2018; Slater et al., 2010), and fiber-based approaches for sensing chemical and biological properties are in development (Ding et al., 2015; Lu, Thomas, & Hellevang, 2019). New sensing strategies are being tested to noninvasively monitor active plant-root functions in situ (Benjamin et al., 2020; Peruzzo et al., 2020) and to monitor nutrient fluxes (MacDonald, Levison, & Parker, 2017). Autonomous unmanned aerial vehicles (UAVs) equipped with various instruments can now sense previously difficult-to-reach environments at high resolution. UAV approaches complement airborne and satellite imaging strategies (McCabe et al., 2017), all enabled through cloud storage. Remotely sensed, spatially distributed "data layers" of watershed compartments and associated properties now enable creation of 3D bedrock-to-canopy watershed "digital twins" (Wainwright et al., 2019). Networked sensing systems can now coincidentally and autonomously monitor bedrock-through-canopy fluxes (Dafflon et al., 2017), providing a "window" to remotely track fluxes across watershed compartments. ML is starting to show promise for advancing watershed characterization using these and other diverse datasets (Ahmad, Kalra, & Stephen, 2010; Oroza, Zheng, Glaser, Tuia, & Bales, 2016). One promising approach uses ML-based spatial clustering to characterize functional zones: parcels within a landscape that have unique distributions of properties relative to neighboring regions that influence how that zone functions from a hydrobiogeochemical perspective. The approach can include geomorphic, topographic, vegetation, hydrogeologic, geochemical, and other properties that may influence behavior. For example, Hubbard et al. (2013) and Wainwright et al. 
(2015) used ML with geophysical and other datasets to identify functional zones in an Arctic tundra and to quantify the zone-based property suites important for carbon fluxes. Wainwright et al. (2019) used a suite of remotely sensed bedrock-through-canopy watershed data layers to identify zonation within a mountainous watershed. Current investigations are focused on quantifying how properties associated with distinct functional zones govern water and nitrogen export in response to snow dynamics and in turn contribute to the aggregated watershed concentration-discharge signature (Hubbard et al., 2018). Management, sharing, and reuse of environmental data have greatly increased over the past decade in response to their increasing volume, diversity, and complexity (Blair et al., 2019; Rode et al., 2016). In parallel, advances in environmental data infrastructure, cloud computing, and ML have dramatically improved the ability to store and utilize diverse data for watershed science. Data from research efforts are becoming available through open-data movements based on FAIR principles, which advocate the use of metadata and standards. Data from operational and experimental monitoring networks are now widely available through data systems such as the USGS National Water Information System, the US interagency water quality portal (Blodgett, Read, Lucido, Slawecki, & Young, 2016), and the European Union's Water Information System for Europe (Hering et al., 2010). Several repositories now enable scientists to easily archive and publish data with essential metadata and provide easy-to-use watershed data access mechanisms, such as interactive portals and web service application programming interfaces. Examples in the US include the NSF-supported Hydroshare (Horsburgh et al., 2016) for hydrological data, and the DOE-supported ESS-DIVE (Varadharajan et al., 2019) for watershed hydrological and biogeochemical data from experimental and modeling research. 
In parallel, cyberinfrastructure for synthesizing and processing these increasing amounts of data is being developed. For example, Varadharajan et al. (2019) recently demonstrated the utility of watershed-centric data infrastructure and tools using diverse hydrological, climate, geochemical, and biological data from a mountainous watershed in Colorado. The end-to-end data pipeline required development and use of several novel methods, including new software that allowed seamless, real-time integration of data stemming from a variety of sources with differing metadata, source formats, and vocabularies, through the use of a data broker (BASIN-3D). BASIN-3D, which also provides tools for semiautomated QA/QC, has enabled rapid synthesis of watershed time-series observations and interactive visualizations (Hubbard et al., 2018). The Consortium of Universities for the Advancement of Hydrologic Science hydrologic information system pulls time-series data from over 95 sources, which can be accessed via the Hydroclient interactive portal and the WaterOneFlow web services (Horsburgh et al., 2016). Other efforts, such as Pangeo (an ecosystem of open-source, interoperable, scalable tools), are enabling big data integration for HPC using cloud services (Eynard-Bontemps, Abernathey, Hamman, Ponte, & Rath, 2019). The availability of these data and tools is enabling a new paradigm wherein data-driven methods are being used to probe hydrobiogeochemical scaling, similarity, and function, as well as to generate and test new hypotheses (Peters-Lidard et al., 2017). Classical methods, such as statistical time-series analyses and concentration-discharge relationships, are being applied to large, regional data products to determine streamflow and water quality trends across catchments with different characteristics (Godsey, Hartmann, & Kirchner, 2019; Murphy & Sprague, 2019).
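As an illustration of the concentration-discharge analyses mentioned above, the sketch below fits the commonly used power-law form C = a·Q^b by ordinary least squares in log-log space. It is a minimal example under stated assumptions, not the method of any cited study: the function name `fit_power_law` and the synthetic data are hypothetical, and the power-law form is a widely used convention that the text itself does not specify.

```python
import math


def fit_power_law(discharge, concentration):
    """Fit C = a * Q**b by least squares on log(C) vs. log(Q).

    Hypothetical helper for illustration; inputs must be positive
    and of equal length. Returns the tuple (a, b).
    """
    if len(discharge) != len(concentration) or len(discharge) < 2:
        raise ValueError("need two equal-length series with >= 2 points")
    if min(discharge) <= 0 or min(concentration) <= 0:
        raise ValueError("values must be positive to take logarithms")

    x = [math.log(q) for q in discharge]
    y = [math.log(c) for c in concentration]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Slope of the log-log regression line is the exponent b.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("discharge values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx

    # Intercept of the regression line is log(a).
    a = math.exp(mean_y - b * mean_x)
    return a, b


# Synthetic (made-up) data generated from C = 3.0 * Q**-0.2.
q_obs = [0.5, 1.0, 2.0, 4.0, 8.0]
c_obs = [3.0 * q ** -0.2 for q in q_obs]

a_fit, b_fit = fit_power_law(q_obs, c_obs)
print(f"a = {a_fit:.3f}, b = {b_fit:.3f}")
```

In this convention, an exponent b near zero is usually read as near-chemostatic behavior, while a clearly negative b suggests dilution with increasing flow. Real stream records would also need handling of censored values, units, and seasonality, which this sketch omits.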
Data mining and classification algorithms, such as ensemble decision trees and unsupervised clustering, can recognize patterns in coupled human–natural system behavior and watershed response to disturbance (Hamshaw, Dewoolkar, Schroth, Wemple, & Rizzo, 2018; Smith, Knight, & Fendorf, 2018). Mutual information theory and causal inference approaches are being applied to derive explanatory relationships between environmental variables and to test hypotheses (Goodwell & Kumar, 2017; Nearing, Ruddell, Bennett, Prieto, & Gupta, 2020). While process-based, integrated numerical models have been used to predict watershed hydrological behavior (Fatichi et al., 2016; Maxwell et al., 2014; Troch, Carrillo, Sivapalan, Wagener, & Sawicz, 2013), a significant challenge remains to develop computationally efficient capabilities that also incorporate reactive transport. Advancing a robust predictive understanding of watershed hydrobiogeochemical behavior requires numerical representation and coupling of hydrological and biogeochemical processes, from reaction (mm–cm) to watershed (km) scales and across bedrock-to-canopy compartments and terrestrial–aquatic interfaces (Bao, Li, Shi, & Duffy, 2017; Li, 2019; Li et al., 2017; Steefel, 2019; Troch et al., 2009). Here, we describe the potential for emerging technologies and strategies to meet this objective, including model interoperability, computational meshing strategies and architectures, and ML-based approaches. There are tremendous opportunities to advance interoperability between models that solve for integrated (surface–subsurface) hydrology and reactive transport. While hydrology and reactive transport modeling have historically evolved along different paths, disparate communities have worked together in recent years to couple hydrologic and reactive transport models.
Examples of models used for such coupling include PFLOTRAN (Hammond & Lichtner, 2010), the Advanced Terrestrial Simulator (ATS; Coon, Moulton, & Painter, 2016), ParFlow (Kollet & Maxwell, 2008), and CrunchFlow (Steefel et al., 2015). While these and other codes each offer certain strengths, none can currently simulate full multiscale, multiphysics watershed reactive transport (Dwivedi et al., 2016; Steefel et al., 2015). Recently, adoption of model interoperability approaches has enabled researchers to take advantage of select strengths of different codes to enhance predictive understanding (Heroux et al., 2020). For example, while ATS can simulate coupled surface–subsurface flow but not reactive transport, new interfaces now allow ATS to access the reaction engines of either PFLOTRAN or CrunchFlow (Coon et al., 2016) to enable simulation of surface–subsurface hydrobiogeochemical processes in watersheds. Following the hydrological modeling community's lead in considering the need for hyperresolution (0.1–1 km resolution; Bierkens et al., 2015), efforts are underway to advance the ability to similarly simulate watershed reactive transport in high resolution using leadership-class supercomputers. For example, Dwivedi, Arora, Steefel, Dafflon, and Versteeg (2018) used PFLOTRAN to numerically examine hot spots and hot moments influencing nitrogen cycling in a small watershed floodplain in 3D and with meter-scale spatial resolutions. This simulation required 250,000 CPU hours (or 24 wall clock hours) on the National Energy Research Scientific Computing Center supercomputer to simulate processes occurring over one water year. As exascale computers come online, the reaction calculations are expected to undergo a substantial speedup, rendering watershed-scale reactive transport and associated uncertainty quantification strategies not only feasible, but practical from a research perspective.
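To give a sense of the scale behind the 250,000 CPU hours and 24 wall clock hours quoted above, the short calculation below estimates how many cores would have been busy at once. It is only a back-of-the-envelope sketch: it assumes that CPU hours equal cores multiplied by wall clock hours and that all cores were used for the full run, neither of which is stated in the text, and the function name `implied_cores` is hypothetical.

```python
def implied_cores(cpu_hours, wall_clock_hours):
    """Estimate concurrent cores, assuming CPU hours = cores * wall hours.

    Hypothetical helper; assumes every core is busy for the whole run.
    """
    if wall_clock_hours <= 0:
        raise ValueError("wall_clock_hours must be positive")
    return cpu_hours / wall_clock_hours


# Figures quoted in the text for the floodplain simulation.
cores = implied_cores(cpu_hours=250_000, wall_clock_hours=24)
print(f"Implied concurrent cores: about {round(cores):,}")
```

Under these assumptions the run corresponds to roughly ten thousand cores working in parallel for one day, which helps explain why such simulations are currently tied to leadership-class machines.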
As exascale simulation capabilities are not expected to be accessible to the wider community and practitioners for many years, several strategies hold potential for reducing the computational burden required for simulating watershed reactive transport using current leadership-class computers. For example, approaches for adjusting the resolution in computational grids, such as static and adaptive mesh refinement methods (AMR; Blayo & Debreu, 1999; Wang, Liu, & Kumar, 2018), offer potential for simulating small-scale reactive hotspots and their influence on larger system behavior in a computationally efficient manner. The ability to employ variable resolution in mechanistic watershed reactive transport models, allowing codes to "telescope" into regions that are rapidly evolving, may provide a path forward for balancing accuracy and tractability associated with simulating watershed reactive transport processes. AMR approaches have recently been developed to improve the efficiency of watershed hydrologic simulations (Wang et al., 2018) and to enable simulation of coupled hydrobiogeochemical processes (Özgen-Xian et al., 2020). Recently, data-driven approaches have gained momentum in watershed modeling because of their computational efficiency and agility to incorporate diverse and multiscale data that are often difficult to incorporate into current process-based models (Shen, 2018). The value of ML for watershed modeling has been illustrated by applications focused on streamflow prediction (Kratzert, Klotz, Brenner, Schulz, & Herrnegger, 2018), early warning of droughts and floods (Mosavi, Ozturk, & Chau, 2018; Park, Im, Jang, & Rhee, 2016), groundwater level fluctuations (Müller et al., 2019), and chemical equilibrium calculations (Leal, Kulik, & Saar, 2017). However, as data-driven models are developed directly from observations, their effectiveness is limited when data are sparse.
Data-driven modeling also does not provide insight about processes, which limits transferability of results. A strategy for taking advantage of increasing data availability while honoring mechanistic process representation in a computationally efficient manner is hybrid modeling, also known as physics-based ML (Bergen et al., 2019; Reichstein et al., 2019). Hybrid modeling strives to marry complementary aspects of mechanistic process-based models and ML, data mining, and genetic algorithms. Given the complexity of hydrobiogeochemical data and processes that occur across scales and compartments of a watershed, this strategy holds significant promise for advancing predictive understanding of watershed hydrobiogeochemical behavior. Nearing, Kratzert, et al. (2020) advocated the importance of integrating ML into hydrological workflows. We contend that ML, and in particular the hybrid modeling strategy, holds significant potential for advancing prediction of watershed hydrobiogeochemistry. Early research is illustrating the promise of ML, 5G, computational strategies and architectures, and cloud-based technologies for improving watershed characterization, data handling, and modeling. We envision a future where the emerging technologies will be able to seamlessly unify sensing systems, data infrastructure, and computational tools to allow near real-time, autonomous communication and feedback. To realize this vision, "codesign" strategies are needed whereby watershed sensing, data, and modeling systems are "born" to communicate with each other across multiple scales (Varadharajan et al., 2019). Enabled by the emerging technologies, codesign strategies hold potential for rapid synthesis and assimilation of increasingly diverse and autonomous data streams into data systems and models, and near real-time feedback from models to observing systems, including instructions regarding what data should be collected where.
We urge the community to work together to advance observation-data-modeling system codesign strategies, with an aim to improve watershed characterization and prediction within and across observatories, and eventually enable near real-time information for resource managers. We recognize that building and maintaining elaborate codesign strategies is challenging for individual research efforts, and that incorporation of technologies alone will not allow us to address the many existing watershed scientific questions. This brings us to a second recommendation. We submit that new modes of "radical" collaboration are needed to facilitate teams-of-teams to work together in a coordinated fashion across watershed networks using "Open Science by Design" principles (NASEM, 2018b; U.S. DOE, 2019). The collaborations could focus on addressing specific watershed science questions; advancing common measurements, models, and codesign strategies; and discovering generalizable metrics and transferable insights. For example, collaborations working across diverse, globally distributed watersheds that span hydroclimatic and property gradients could investigate questions such as: How do different types of watersheds respond to different stressors, such as climate change, droughts, floods, wildfire, and land-use change? How will multiple stressors impact sustainability of the water, food, and energy systems that rely on water? Can generalizable metrics of resilience be identified and tracked? What is the minimum but sufficient amount of information needed to predict watershed behavior at temporal and spatial scales critical for underpinning resource management decisions? In addition to working across gradients, international collaborations could use distributed watersheds to address more focused questions relevant to specific types of watersheds, such as pristine mountainous watersheds, contaminated watersheds, agriculture-dominated watersheds, or urban watersheds.
Figure 1 illustrates how investigations carried out across geographically distributed mountainous or agriculture-dominated watersheds could be useful for exploring the influence of various stressors on the functioning of these systems, and the associated impacts on water supply and water quality, power generation, agricultural productivity, and other societal benefits. Systematic incorporation of emerging technologies and adoption of radical modes of collaboration require substantial coordination, resources and commitment to overcome technical, social, and organizational barriers. We are encouraged by the many recent efforts focused on advancing collaborations and tools across watershed communities, observatories, and government agencies (U.S. DOE, 2020; NSF critical zone collaborative network opportunity; U.S. DOE, in prep.). We are also encouraged by a new generation of watershed scientists who embrace open watershed science philosophies and approaches to advance connectivity across scales, sites, disciplines, and nations (Arora et al., 2019; Wymore et al., 2017). As resource managers struggle to make increasingly difficult decisions in the coming decades, we envision that the emerging technologies and radical collaborations described here will mobilize the scientific enterprise toward providing actionable information over space and time scales useful for such decisions (Figure 1). This material is based upon work supported as part of the Watershed Function Scientific Focus Area and the Early Career Research Program funded by the U.S. Department of Energy, Office of Science, Office of Biological and Environmental Research under Award Number DE-AC02-05CH11231.
Reference(s)