GCE15 Special Issue Conference Publications
2015; Wiley; Volume: 28; Issue: 7; Language: English
10.1002/cpe.3743
ISSN 1532-0634
Authors: Sílvia D. Olabarriaga, Nancy Wilkins‐Diehr
Topic(s): Distributed and Parallel Computing Systems
Abstract: Gateway workshops were held on multiple continents in 2015, including the International Workshop on Science Gateways in Budapest in June and the recent Australasian edition in October. This special issue presents contributions from the tenth Gateway Computing Environments workshop, GCE15, held in Boulder, CO, September 30–October 1, 2015. The workshop was held in conjunction with the 3rd Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE3). GCE15 attracted 40 attendees, including international researchers from academia and industry, program officers, gateway developers, and students and faculty from underrepresented communities.
Science gateways are web portals developed by and for communities of practice. They provide access to all types of resources: remote instruments, curated data collections, supercomputers, and many others. They are used by scientists, engineers, researchers, and educators in many fields. The complex, digital, collaborative nature of research has accelerated the growth of gateways.
The purpose of science gateway workshops is to provide a forum to showcase science gateway projects and related technologies. The workshops feature keynote talks, contributed talks, and posters. This year, attendees were able to set a portion of the agenda interactively through the use of Open Space Technology. The workshops also provide opportunities to publish, thereby furthering the exchange of knowledge in the gateway community. At the workshops, developers can learn from one another and about new technologies, and principal investigators can keep abreast of the state of the field.
This special issue features contributions on technologies for building and operating gateways as well as experience papers describing fully functional gateways. Several submissions highlighted tools for building and operating science gateways.
Apache Airavata is a framework for managing jobs and workflows on distributed computing resources such as supercomputers, campus resources, and international grids. Airavata can also include a graphical user interface and web service APIs. Pankaj et al. [1] highlight work integrating Airavata with Docker [2], Marathon [3], and Mesos [4]. Docker is a container platform that lets applications run in isolated environments on shared hardware; it is an example of operating-system-level virtualization. Marathon is a control system for launching and managing containers. It can provide fault tolerance and scale services with demand. Mesos provides finer-grained scheduling across what may be diverse machines in a datacenter. Integrating Airavata with Docker, Marathon, and Mesos therefore improves fault tolerance and streamlines the use of diverse resources by Airavata workflows. Of great interest to other gateway developers are the authors' experiences both using open source technologies to manage Airavata and integrating Airavata with existing open source packages.
Brookes et al. [5] describe work with Google Summer of Code students who contributed to both GenApp and Apache Airavata. Airavata is described above; GenApp is a tool for generating graphical user interfaces for scientific applications. GenApp supports a variety of target languages and was developed when a lab faced the challenge of a rapidly changing hardware infrastructure and the lack of a dedicated software team. The 2014 Google Summer of Code students developed GenApp execution models in HTML5/PHP, Qt3/C++, and Qt4/C++. The 2015 students extended the work to include Qt5/C++, Qt5/Android, and Java. They also updated the API integration between Airavata and GenApp. While GenApp was initially developed for the small angle scattering field, its use is not restricted to that discipline.
Portability of a gateway across languages and devices, with a limited software development team, is an issue facing many gateway developers.
Finally, in their paper on failure analysis and prediction in the CIPRES science gateway, Singh et al. [6] describe their methodology for effectively operating a gateway with 3000 active users. While an increasing number of computational resources is needed to meet user demand, this growth can also affect the reliability of the services delivered to users. The work analyzes historical job data using a machine learning algorithm to predict job success. The authors were able to detect 50% of job failures, with a false detection rate of only 5%. Anticipating these failures would have saved 900,000 CPU hours. The authors classified errors as system errors, user errors, portal errors, or capacity errors, and some assumptions can be made about each: for example, system or portal errors may be expected to affect the next several jobs, whereas user errors may be more isolated. These sorts of tests are clearly generalizable to other gateways, and the authors plan to address this in future work.
While research and development of new science gateway technologies continues, fully functional science gateways are being created and offered to large numbers of users from various scientific domains. Four papers in this special issue describe how science gateways are being created and are evolving for communities as diverse as materials science, environmental management, risk communication, and hydrometeorological research.
The Materials Project (MP) [7] offers an open, collaborative, and data-rich ecosystem through a science gateway for accelerated materials design. The MP science gateway is visited by over 13,000 users in academia, government, and industry, who can explore materials data through applications. Cholia et al. [8] present how the MP science gateway has been extended to enable user contributions of both data and applications. The proposed framework, MPContrib, features a text format and associated parsing tools for flexibly incorporating user-defined data; a RESTful API to receive and update records in this text format in a back-end database; and a display framework for showing the data in the context of the MP core databases. In addition, a code base, MPContribsUser, is available to facilitate the development and submission of contributions through reusable functions and examples. Examples illustrate how the framework was used in X-ray spectroscopy and nanoporous exploration, and the authors conclude that the approach is valuable for creating a common platform to host a rich set of applications and datasets for different user communities.
Romosan et al. [9] describe a gateway for environmental management within the Advanced Simulation Capability for Environmental Management (ASCEM) project [10, 11]. The gateway is built on top of established web service technologies as a data management layer that handles complex spatiotemporal data and provides various data access mechanisms and visualization of spatiotemporal data records. The data gateway also integrates seamlessly with the Akuna distributed workflow system [12]. The main distinguishing feature of this gateway relative to others in this field is its sophisticated spatiotemporal model. ASCEM project scientists have been using this data gateway since 2011.
The science gateway described by Kar [13] covers a different field, namely ‘risk communication’, which is the exchange of information among stakeholders about an impending disaster and its risks to help individuals take appropriate actions to mitigate hazard impacts.
The CIGIR Gateway (CyberInfrastructure for GeoInformatics and Community Resilience) is a contributory citizen science project with the goal of increasing public participation in emergency management, specifically in the risk communication and community resilience building efforts of Mississippi Gulf Coast residents. CIGIR aims to reduce rumors by serving as an alternative to social media, providing means for citizens to share data and information about a hazard and to participate in building community resilience. The paper describes the gateway's main functions: data collection and storage, data processing and query, data visualization, data and tools download, and user access. It concludes with a reflection on future challenges, for example, translating the gateway from proprietary to open software.
The last paper in this special issue, by D'Agostino et al. [14], covers yet another field: hydrometeorology. The authors describe experiences with the development and maintenance of the gateway for the Distributed Research Infrastructure for Hydro-Meteorology (DRIHM) project [15], which is based on the gUSE/WS-PGRADE science gateway toolkit [16]. Besides presenting an overview of the gateway, the paper also discusses lessons learned from its implementation. In conclusion, the authors highlight the need for coherent policies in the management of the data, computational resources, and software components that make up the ecosystem for developing science gateways.
The editors are grateful to both the authors and the reviewers of the GCE15 workshop for their invaluable help in making this issue possible.
Reference(s)