XSEDE13 Special Issue Conference Publications
2014; Wiley; Volume: 26; Issue: 13; Language: English
DOI: 10.1002/cpe.3308
ISSN: 1532-0634
Authors: Nancy Wilkins‐Diehr, A. Majumdar
This special issue is based on presentations at the XSEDE13 conference held in San Diego in July 2013. The conference is an annual gathering of the extended community of those interested in advancing research cyberinfrastructure (CI) and digital resources, including researchers using XSEDE and similar resources, XSEDE staff members, Campus Champions, and especially students. Seven hundred twenty people attended the conference, from 50 states, 1 territory, and 14 countries. A summertime conference with terrific student programs spanning high school to the postgraduate level, XSEDE13 was fortunate to attract 200 students. As stated in the Call for Participation, the conference 'showcases the discoveries, innovations, challenges and achievements of those who utilize and support XSEDE resources and services, as well as other digital resources and services throughout the world'. The conference included tutorials, birds-of-a-feather sessions, a poster session, a visualization showcase, and a job fair. Each conference has a unique theme; in 2013, the focus was the impact of science gateways and the relevance of computation and high-end analysis in the biosciences. The core technical program consisted of four tracks: science and engineering; technology; software and software environments; and training, education, and outreach. Authors of selected papers from the core technical program were invited to expand their content and submit to this special issue.

XSEDE13's science track received 47 submissions, 27 of which were accepted for the conference. Eight of the best were asked to develop extended papers for inclusion in this special journal issue. Papers in the science track covered a variety of research topics, all of which involve using XSEDE resources in innovative ways. The topics span wide-ranging areas such as biological sequence analysis, modeling of energy usage in buildings, analysis of error-correcting code (ECC) memory for simulations on graphics processing units (GPUs), the impact of Campus Bridging for researchers transitioning to XSEDE resources, algorithms for computational finance analyses, methods enabling improved search capabilities over large digitized document archives, and the implementation of algorithms on Intel Xeon Phi coprocessors.

The paper on functional annotation of newly sequenced genomes [1] describes an optimized workflow to enable large-scale protein annotation. It combines a specialized classification algorithm with high-performance computing (HPC), and the demonstrated results show capabilities that scientists will be able to use to annotate large genomic data sets. The Building Energy Modeling (BEM) approach is combined with machine learning methods in another paper [2] to enable efficient modeling of US buildings; the size of the parametric space makes supercomputers a necessity, and the simulation results are used to train machine learning algorithms (a schematic sketch of this simulate-then-train pattern appears below). Another paper [3] examines the ECC memory available on many of the modern GPUs used in HPC machines; it quantifies the performance penalty of enabling ECC, compares molecular dynamics simulation results with respect to ECC events triggered during such simulations, and discusses whether error checking is necessary for these simulations given the associated cost. Transitioning from campus resources to XSEDE resources is not an easy task for researchers; Making Campus Bridging Work for Researchers [4] proposes the use of Campus Bridging experts to ease this transition while requiring minimal investment from the organizing body.
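The simulate-then-train pattern behind the building energy modeling work [2] can be illustrated with a small, purely hypothetical sketch. The toy "simulator", the parameter names, and the choice of a random forest regressor below are illustrative assumptions of ours, not the authors' actual tools or workflow; the point is only the shape of the approach: parametric simulation runs (which at the scale of the paper require supercomputers) produce input/output pairs, and a machine learning model is then trained on those pairs to act as a fast surrogate.

```python
# Minimal, hypothetical sketch of the simulate-then-train surrogate pattern.
# The "building simulation" here is a stand-in toy function, not the authors'
# actual simulation code; at full scale this sampling step runs on HPC systems.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def toy_energy_simulation(params):
    """Stand-in for one expensive parametric building simulation run.

    params: [insulation_r_value, window_area_m2, setpoint_c] (illustrative only)
    Returns a synthetic annual energy-use figure.
    """
    insulation, window_area, setpoint = params
    return 5000.0 / insulation + 120.0 * window_area + 80.0 * abs(setpoint - 20.0)

# 1. Sample the parametric space (the step that requires supercomputers at scale).
X = np.column_stack([
    rng.uniform(1.0, 10.0, 2000),   # insulation R-value
    rng.uniform(5.0, 60.0, 2000),   # window area (m^2)
    rng.uniform(16.0, 26.0, 2000),  # thermostat setpoint (degrees C)
])
y = np.apply_along_axis(toy_energy_simulation, 1, X)

# 2. Train a machine learning surrogate on the simulation results.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
surrogate = RandomForestRegressor(n_estimators=200, random_state=0)
surrogate.fit(X_train, y_train)

# 3. The surrogate now predicts energy use far faster than re-simulating.
print("surrogate R^2 on held-out runs:", surrogate.score(X_test, y_test))
```

In the spirit of the paper, the trained surrogate is what makes exploring the full parametric space of building configurations tractable once the expensive ensemble of simulations has been run.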
The paper on computational finance [5] investigates the impact of high-frequency trading on the stock market and discusses how the simulation for this research was sped up by two orders of magnitude on XSEDE HPC resources. The biological sequence analyses [6] involve the use of a large shared-memory machine and the application of various bioinformatics tools to different kinds of sequence data sets, with a new parallel command execution program managing these analyses; the resulting capabilities will allow researchers to tackle challenging scientific problems. In Using Lucene to Index and Search the Digitized 1940 US Census [7], a framework was implemented to provide automatic, searchable access to census data, enabling search capabilities over large digitized document archives; both successes and challenges are described in this paper. The Partial Correlation Coefficient with Information Theory (PCIT) method, an important technique for detecting interactions within networks, is implemented on Intel Xeon processors and Xeon Phi coprocessors as part of another research effort; the paper [8] presents the optimized performance and analyzes the results.

XSEDE13's technology track received 26 submissions, 16 of which were accepted for the conference. Four of the best were asked to develop extended papers for inclusion in this special issue. A range of important topics for high-end CI were highlighted in this track, from sophisticated analysis of machine statistics (both for storage and HPC machines) to federated authentication to the technology needed to operate a network of 14 earthquake engineering labs.

The XDMoD (XSEDE Metrics on Demand) project [9] provides detailed information on computing system utilization and performance for all XSEDE-allocated systems. Data are collected at the job, application, user, and system levels and presented through a very usable, customizable interface. New additions to XDMoD described in this paper include 'TACC_Stats', which collects comprehensive resource-use metrics on all compute nodes, and additional data mapping and analysis tools. These node-level data can reveal interesting performance characteristics not captured in job-level statistics. Work at the National Center for Atmospheric Research (NCAR) highlights not the more common compute-system monitoring but the accounting system for NCAR's archival storage system and its recent implementation on a new 11 PB disk system [10]. The growth of data holdings among users and of very large systems at data centers makes analysis and monitoring an increasingly critical part of center operations; the accounting systems provide information that helps with system management and guides policies. As CI systems continue to grow, researchers often find that they must make use of multiple systems. With federated authentication, researchers can use existing identities, for example, those from their university, Google, or Facebook, to obtain XSEDE certificates. CILogon [11] provides a federated certificate authority, serving a national-scale user community without a large network of authorities performing manual user identification; the paper in this journal describes three years of experience operating CILogon. A final paper describes technology serving a research community through a network of 14 shared-use earthquake engineering laboratories: NEES, the Network for Earthquake Engineering Simulation [12], connects these labs and a central data repository through the NEEShub science gateway.
Researchers also access computing systems at XSEDE, the Open Science Grid, Purdue supercomputers, and NEEShub servers through this interface.

The software track covers many aspects of software, from science gateway development to code optimization on XSEDE resources. XSEDE13's software track received 27 submissions, 16 of which were accepted for the conference. Four of the best were asked to develop extended papers for inclusion in this special issue. Several papers in this section highlight data analysis solutions, all of which run on XSEDE resources. FluMapper [13] exploits the geospatial characteristics of social media data; by analyzing location-based Twitter data, it demonstrates how geographic information systems (GIS) and CI combine (in a concept called CyberGIS) through a data-driven framework to understand the spread of the flu. Challenges here include the huge volume, dynamic generation, and unstructured nature of these data. Globus Genomics [14] also analyzes large volumes of data, in this case genomic data from next-generation sequencers. This unique environment supports a full analysis pipeline, from data acquisition to on-demand computing, including reuse of the Galaxy [15-17] workflow system; researchers need only a web browser to make use of it. The UltraScan science gateway [18] provides remote access to high-speed ultracentrifuges and their associated analysis software and has been supporting analytical ultracentrifugation (AUC) experiments since 2006. Developments described in this paper include a standardized job management system and production deployment in Europe, including on the European Grid Infrastructure (EGI) and Partnership for Advanced Computing in Europe (PRACE) infrastructures, using the Apache Airavata framework. The final contribution from the software track describes software infrastructure for a non-domain-specific set of resources while addressing the expectations of today's researchers: mobile devices are in wide use, and researchers using supercomputers expect mobile interfaces as well. The TACC user portal [19] is an interface to local, state, and national resources housed at the Texas Advanced Computing Center (TACC); this submission describes its recent redesign and the inclusion of mobile device support.

Training, education, and outreach are a core part of XSEDE's mission; fittingly for a summer conference, 200 of the 720 attendees were students. XSEDE13's training, education, and outreach track received 23 submissions, 15 of which were accepted for the conference. Four of the best were asked to develop extended papers for inclusion in this special issue. One challenge in developing the next generation of computational scientists is fostering early and continued interest, and papers in this section focused on both instructors and students. A yearlong game design project involving high school students [20] covered topics including bioinformatics, parallel programming, and the collaborative nature of today's CI-based research; the success of this approach demonstrated the utility of games in computing and CI education, with this particular implementation serving as a model. INSTANCES, Incorporating Computational Scientific Thinking Advances into Education & Science Courses [21], is an NSF-funded program that introduces computational thinking into the science education curriculum by developing modules for use in the university-level classes taken by teachers.
Mathematics, programming, algorithmic thinking, and computational accuracy are all included. At the college level, several universities present shared experiences developing undergraduate computational science programs [22]. There are significant challenges to expanding existing curricula to include the requirements necessary for a computational science focus, but the authors point out that there is a pressing need for a workforce trained in this area. The shared experiences presented by Clark Atlanta University, the University of Mary Washington, and Southern University may help others facing similar challenges. XSEDE as an organization is interested in broadening both the use and the impact of its resources. Its regional workshop series, held at Minority Serving Institution campuses and aimed broadly at researchers, instructors, and students, is part of this strategy. This final paper describes the first of these larger workshops and its impact on the execution and evaluation of subsequent events [23].

We would like to acknowledge the XSEDE13 organizing committees, who worked tirelessly to put together a tremendous event. Particular recognition goes to the technical program committee, whose influence is evident in this special journal issue.