Article · Open access · Peer-reviewed

Science gateway workshops 2014 special issue conference publications

2015; Wiley; Volume: 27; Issue: 16; Language: English

10.1002/cpe.3615

ISSN

1532-0634

Authors

Sandra Gesing, Nancy Wilkins‐Diehr

Topic(s)

Research Data Management Practices

Abstract

Science gateways are a solution for user communities to access applications and data via a graphical user interface. These graphical user interfaces hide the underlying infrastructure as far as is feasible and as far as users desire. In general, science gateways offer a single point of entry to create and/or analyze domain-specific data. Their core goal is to increase the usability and accessibility of computational tools and digitized data and to improve the reproducibility of scientific processes. While the user interfaces are tailored to the specific demands of a user community, the underlying infrastructures, for example, national or international distributed computing infrastructures such as XSEDE, generally serve a wide range of use cases. Science gateway frameworks and science gateway APIs, which offer building blocks for managing jobs and data within such infrastructures, therefore ease the implementation of science gateways: developers can focus on domain-specific demands while reusing or extending the available building blocks. The contributions to this special issue present current state-of-the-art research, elucidate trends in the area of science gateways, and demonstrate solutions available to users. Submissions are grouped into five general areas: science gateway use and sustainability, generic development frameworks, novel workflow-oriented approaches, data management, and use cases from diverse domains. Among other topics, the statistics reported in this issue illuminate the increased usage of science gateways, which is also reflected in the high number of submissions demonstrating specific use cases. Consequently, sustainability approaches have found their way into the special issue, reflected not only in a submission about a sustainability model but also in numerous submissions on developments and enhancements of generic building blocks in diverse existing, mature science gateway frameworks and APIs.
While novel approaches for workflow management and data management can also be considered enhancements of generic building blocks that address new technologies such as mobile applications, they have been core subjects for some years and are presented in their own sections to emphasize their importance for the science gateway community.
Close collaboration among user communities, developers, and providers is crucial for developing and offering effective science gateways that are widely used by scientific communities. Insights into the demands of user communities and into ways of supporting science gateway developers can substantially improve both the efficiency of creating science gateways and their long-term sustainability. Lawrence et al. 1 present results from a large-scale survey of nearly 5000 researchers from diverse research domains and computer science departments involved in science gateway provisioning, who answered a questionnaire about their demands on science gateways and about existing courses for developing them. Major topics include supporting user communities, helping developers choose a suitable science gateway technology, and involving specific expertise. The paper also notes that, for the first time in the National Science Foundation's (NSF's) supercomputing program, more users have accessed resources via gateways than via the command line. The manuscript ‘Reflections on Science Gateways Sustainability Through the Business Model Canvas: Case Study of a Neuroscience Gateway’ 2 examines sustainability approaches for science gateways by applying the Business Model Canvas, a methodology from lean business development. The authors applied the model to the Amsterdam Medical Center Computational Neuroscience Gateway and suggest using it as a template for science gateways in general, to structure the various factors that must be considered not only in industry but also by academic providers of science gateways.
Over the last 10 years, a number of mature and reliable science gateway frameworks and APIs have evolved that provide developers with building blocks for generic tasks such as authentication and job, data, and workflow management. These tasks can thus be implemented efficiently in diverse science gateways without developing them from scratch for each one. Marru et al. 3 present the science gateway API Apache Airavata and its roadmap. The open-source API offers rich features, ranging from connectors for multiple infrastructures (clusters, clouds, and grids) through workflow support and data management capabilities to multi-language support. The use of Apache Airavata as middleware is described in ‘The GenApp Framework Integrated with Airavata for Managed Compute Resource Submissions’ 4. The GenApp Framework is designed for creating flexible user interfaces, both in general and for science gateways, while considering aspects such as resubmission. Through the integration with Apache Airavata, the authors thus deliver a complete science gateway framework covering all layers of a science gateway: frontend, middleware, and connectors to diverse infrastructures. Cholia et al. 5 aim to ease the development of science gateways by targeting their backend. They demonstrate the Nice and Easy Web Toolkit (NEWT) platform, which consists of standards-based RESTful services for using high-performance computing (HPC) infrastructures. Once the services are integrated with an HPC infrastructure, its resources can be exploited via a common web API. NEWT has been in use at the National Energy Research Scientific Computing Center since 2010. Caballer et al. 6 follow a similar approach: they have developed services for scientific virtual infrastructures provisioned on Infrastructure-as-a-Service (IaaS) clouds.
The services allow flexible allocation of resources, with features for sharing them with other users, deploying and undeploying them, and adding or removing resources dynamically. The manuscript details three successful use cases. ‘Enabling Cloud Bursting for Life Sciences within Galaxy’ 7 also targets the provision of cloud services. The authors describe ongoing efforts to create a ubiquitous platform capable of simultaneously using dedicated and on-demand cloud resources. While Galaxy is widely used, especially by the life sciences community, the technologies developed are applicable to diverse research domains. The use of Galaxy as a generic science gateway framework is also addressed in 8. The authors describe a domain-independent, cloud-based science gateway platform, the Globus Galaxies platform, which provides a set of hosted services that directly address the needs of science gateway developers and deliver a science gateway as a service.
Science gateways are often tailor-made to provide a harness for, and an interface to, advanced workflow tools. Two submissions present novel developments integrated with workflow solutions. In ‘Mobile Application Development Exploiting Science Gateway Technologies’ 9, the authors present a mobile application connected to a workflow-enabled framework. Mobile devices are, of course, increasingly common and can be invaluable in some domains, for example, those requiring extensive fieldwork. The paper describes how the capabilities of mobile devices can be extended by using distributed computing infrastructures for the visualization and analysis of large astrophysics datasets, and it highlights areas where web-based gateways must be adapted to be mobile friendly. The interfaces will be adapted further as usage by astrophysicists increases. ‘WorkWays: Interacting with scientific workflows’ 10 also discusses how workflows can assist human endeavors.
Here, users interact with the workflow through a dynamic I/O model in which they can insert data into, or export data from, a continuously running workflow. Previous WorkWays papers have demonstrated interactivity in the analysis of magnetic resonance imaging (MRI) images and in aerospace design optimization 10. This submission adds ParaViewWeb for visualization, which is used to analyze the fluid flow through a ‘micromixer’. The user can steer the computation by selecting a region of the parameter space and having the optimization workflow focus on that region only.
Data management is an increasingly time-consuming and error-prone task for scientists. Here again, gateways can provide a natural interface for managing data effectively. The first submission in this area, ‘Remote Storage Management in Science Gateways via Data Bridging’ 11, connects data management challenges with the challenges of workflows and distributed computing infrastructures described in the preceding section. A data bridging service called Data Avenue has been integrated into the WS-PGRADE/gUSE portal framework to provide a common interface to a wide variety of storage resources, thus streamlining workflows that use many different underlying infrastructures. While not strictly a data management application, Araport 12 is an open-source online community resource for discovering both data and applications that support the study of the Arabidopsis thaliana genome, using an app-store-like approach. Users can both choose tools from Araport and contribute their own. User registrations have doubled since the launch of the Science Apps Workspace, which has over 30 registered app developers; with all of this parallel effort, the original investment will gain tremendous leverage.
Five submissions addressed science gateway use cases.
For the editors, the practical use of science gateways in a variety of fields is often the highlight of these workshops. ‘FACE-IT: A Science Gateway for Food Security Research’ 13 develops a framework for crop and climate impact assessments. Data are ingested from diverse geospatial archives and often require regridding and additional processing. Large-scale climate simulations, including agricultural models, are then conducted, and regional and global models are compared. Workflows are executed through the Globus Galaxies platform, and outputs are captured in well-defined, reusable, and comparable formats. FACE-IT will be used to achieve the goals of the Agricultural Model Intercomparison and Improvement Project at the Center for Robust Decision-making on Climate and Energy Policy. Hu et al. also address agricultural issues in their submission, ‘CyberGIS-BioScope: A Cyberinfrastructure-based Spatial Decision-Making Environment for Biomass-to-Biofuel Supply Chain Optimization’ 14, although here the focus is on agricultural production in support of biofuels, and the entire supply chain must be considered. This requires collaborative data integration, model specification, analysis, and coordinated implementation and management. As in the FACE-IT paper, data preprocessing is crucial for the interoperability of data sources, in this case between bioenergy models and geographic information systems. Next, interactive scenarios are developed for evaluation and sharing, and as a final step, the optimization problem can be solved. CyberGIS-BioScope, the resulting environment, accomplishes these tasks through an interface tailored to both agricultural scientists and decision makers. In a completely different field, the IMP Science Gateway 15 is used to advance virtual experimental laboratories and their use in multiscale e-learning courses. IMP uses WS-PGRADE and gUSE technologies for molecular dynamics simulations of nanostructures.
Workflow components serve as Lego-style construction units for learning modules of varying duration and complexity. These learning modules can be used in a variety of settings, such as lifelong learning and vocational training. The final two use case submissions come from the medical field. ‘Building a Medical Research Cloud in the EASI-CLOUDS Project’ 16 describes a European research project that supports a group at Charité University Hospital with high computational demands for analyzing MRI images. To meet the needs of this customer, the gateway had to integrate, monitor, and manage services, all within a defined service level agreement (SLA). The paper describes how this requirement influenced the design and reports the authors' experiences in meeting the SLAs. Science gateways are also used in computer-aided drug design, in a process called virtual drug screening, which requires large amounts of high-throughput computation that is often difficult to manage. The Docking gateway 17 was developed to allow biochemists to conduct these screenings easily. Notably, the authors were able to reuse generic layers developed for a neuroimaging gateway: data and computation management, as well as operational support processes, could all be reused. Such reuse should become one of the hallmarks of science gateways. The submission highlights the user-centered design process, a contribution that should interest many developers, and also includes a performance assessment of three different analysis approaches: a gLite grid, Hadoop (running on the Dutch Hadoop cluster), and a local cluster.
The editors are once again pleased with the quality and variety of submissions to IWSG14 and GCE14, and with the interest in submitting extended papers for this special issue and the effort taken to do so. We are grateful to both authors and reviewers for their invaluable help in making this possible.

References