2017 Smoky Mountains Conference Sets Record, Sights on Big Data
Thought leaders from across the high-performance computing (HPC) spectrum once again descended on Gatlinburg, Tennessee, for the 2017 Smoky Mountains Computational Sciences and Engineering Conference (SMC).
The conference, which took place August 29–31 and was hosted by Oak Ridge National Laboratory (ORNL), explored “the integration of experiment, data analytics, and modeling and simulation into instruments for discoveries in science and engineering.”
HPC and big data have greatly increased the potential for scientific discovery, but they have also created immense challenges that require cooperation and collaboration among the scientific and industrial communities.
SMC is unique in its approach to these challenges: it brings together the four primary HPC stakeholders (national laboratories, academia, vendors, and industry) to apply the discovery paradigms of big data, deep learning, modeling and simulation, and experiment to some of science’s most pressing problems.
“Smoky Mountains looks at big problems from a diverse set of perspectives,” said ORNL’s Associate Laboratory Director for Computing and Computational Sciences and conference organizer Jeff Nichols. “It’s unlike any other HPC conference out there, and it gets bigger and better every year.”
In fact, 2017’s more than 150 attendees set an SMC record.
The conference’s keynote address was given by Two Sigma Vice President Donour Sizemore. Other speakers included Peter Ungaro from Cray, Inc.; Jim Sexton of IBM; Peter Highnam of the National Geospatial-Intelligence Agency; Chakra Chennubhotla of the University of Pittsburgh; and Noa Marom of Carnegie Mellon University.
“The Smoky Mountains Conference brings together an impressive collection of leaders representing the computational science research community along with experts on the underlying technologies,” said Greg Peterson, Director of the University of Tennessee’s National Institute for Computational Sciences. “The program’s presentations are thought-provoking, and the informal discussions during the breaks and meals always prove to be invaluable. In short, it’s a great venue to meet and reconnect with the HPC community.”
Although 2017 marked SMC’s 15th consecutive year, it was the first to host a Data Challenge, an event aimed at students, faculty, laboratory scientists, and industry professionals interested in performing novel analysis on real scientific data sets.
The inclusion of a Data Challenge parallels the laboratory’s growing expertise in deep learning, data analytics, and visualization.
While ORNL is widely known for its headline-grabbing hardware such as Titan, the nation’s fastest computer for open science, the laboratory also has a rich history of groundbreaking research in data science. Much of that research has been geared toward classified applications in defense and national security, where it has been tried and tested, and it is now being harnessed to tackle some of science’s most pressing problems.
The laboratory’s data portfolio has also been buttressed by a partnership with the University of Tennessee’s Bredesen Center on a joint Ph.D. program in data sciences and engineering. This unique program, with a curriculum focused on specific science domains, addresses the need for data scientists to make sense of and analyze the massive scientific data sets being generated via today’s most complex experiments and simulations, such as those taking place at ORNL.
The Data Challenge featured teams of one to four people divided into “student/novice” and “non-student/advanced” categories. Each challenge included a data set donated by ORNL researchers, known as “data sponsors,” and a set of progressively harder questions related to the donated data sets given to the teams in advance of the conference. Of the 41 teams registered to compete, 11 teams reached the finals.
Contestants analyzed data from one of the following five ORNL research projects:
· Uncovering Explanatory Power of Large-scale Data Expression
· Permutations for Model Selection of Genetic Regulatory Pathways
· Data Mining Atomically Resolved Images for Material Properties
· Automated Discovery of Temperature-Dependent Structural Change
· Scientific Publication Mining
Winners from the novice and advanced categories, both of whom happened to be solo competitors, were invited to SMC17 to present their work.
Sebastian Klaasen of the University of Vienna won the student/novice category with a presentation of a complete code and algorithm for “data mining atomically resolved images for material properties,” and ORNL postdoc Travis Johnston took home the non-student/advanced category with a unique approach and illuminating results for the “automated discovery of temperature-dependent structural change,” a data set completely outside of his science domain.
Honorable mentions included Team Starving Interns, the summer students of Arvind Ramanathan of ORNL’s Computational Science and Engineering (CSE) Division, who received praise for the originality of their work, which used methods well beyond current standards. Team A&M, the summer students of Gina Tourassi and Folami Alamudun, both also of CSE, was recognized for “best data story” for achieving correct results despite very little programming experience.
“Data challenges are an exciting way to learn about other science domains or gain knowledge of analysis techniques,” said Oak Ridge Leadership Computing Facility User Support Specialist Suzanne Parete-Koon, who, alongside Jayson Hines, project manager for ORNL’s Computing and Computational Sciences directorate, and ORNL Computer Scientist Tiffany Mintz, helped organize the challenge. “We really enjoyed engaging with the students and researchers who took up SMC challenges and provided the data.”
Added Mintz: “The data challenge was an excellent opportunity for us to engage research scientists across the laboratory and highlight the ever-expanding capabilities in data analytics and machine learning enabled by advanced computing technologies. The enthusiasm of our data sponsors and the creativity and innovation of the work submitted by the data challenge participants made this a memorable and worthwhile experience.”
Data sponsors included ORNL’s Ryan McCormick, Sandra Truong, and Daniel Jacobson (Uncovering Explanatory Power of Large-scale Data Expression and Permutations for Model Selection of Genetic Regulatory Pathways); Alex Belianinov, Sergei Kalinin, and Stephen Jesse (Data Mining Atomically Resolved Images for Material Properties); Garrett Granroth, Thomas Proffen, and Peter Peterson (Automated Discovery of Temperature-Dependent Structural Change); and Drahomira Herrmannova and Robert Patton (Scientific Publication Mining).
“Perhaps the most exciting aspect for us as judges was to see the quality of the student submissions,” said data sponsors Alex Belianinov and Stephen Jesse. “They often rivaled those submitted by experts, and their creativity in approaching and solving the problem set was top-notch. During our deliberations to pick winners, the judges unanimously agreed to expand the award criteria to include a number of honorable mentions, as even though some teams were less competitive than others, every team showcased absolute brilliance in some aspect of the task they were working on.”
To watch the summaries and solutions for all 11 finalists, visit https://www.youtube.com/channel/UCwoC2F6mPD5ssnItCiPd69w.