Christian Engelmann
Senior Scientist and Group Leader, Intelligent Systems and Facilities Research
Contact: 865.574.3132 | ENGELMANNC@ORNL.GOV

All Publications
Privacy-Preserving Federated Learning for Science: Challenges and Research Directions
Shaping the Future of Self-Driving Autonomous Laboratories Workshop...
A Microservices Architecture Toolkit for Interconnected Science Ecosystems
Understanding GPU Memory Corruption at Extreme Scale: The Summit Case Study
Science Use Case Design Patterns for Autonomous Experiments
INTERSECT Architecture Specification: Use Case Design Patterns (Version 0.9)
INTERSECT Architecture Specification: Microservice Architecture (Version 0.9)...
Resilience Design Patterns: A Structured Approach to Resilience at Extreme Scale (Version 2.0)
INTERSECT Architecture Specification: Microservice Architecture (Version 0.5)
INTERSECT Architecture Specification: Use Case Design Patterns (Version 0.5)
RDPM: An Extensible Tool for Resilience Design Patterns Modelling
Resiliency in numerical algorithm design for extreme scale simulations
Study of Interconnect Errors, Network Congestion, and Applications Characteristics for Throttle Prediction on a Large Scale HPC System
PLEXUS: A Pattern-Oriented Runtime System Architecture for Resilient Extreme-Scale High-Performance Computing Systems
GPU Lifetimes on Titan Supercomputer: Survival Analysis and Reliability
Models for Resilience Design Patterns
3D Coded SUMMA: Communication-Efficient and Robust Parallel Matrix Multiplication
Self-stabilizing Connected Components
Concepts for OpenMP Target Offload Resilience
Performance Efficient Multiresilience Using Checkpoint Recovery In Iterative Algorithms
A Comprehensive Informative Metric for Analyzing HPC System Status Using the LogSCAN Platform
Analyzing the Impact of System Reliability Events on Applications In the Titan Supercomputer
A Big Data Analytics Framework for HPC Log Data: Three Case Studies Using the Titan Supercomputer Log
Machine Learning Models for GPU Error Prediction in a Large Scale HPC System
Understanding and Analyzing Interconnect Errors and Network Congestion on a Large Scale HPC System

Key Links
Curriculum Vitae
Google Scholar
Web of Science
ORCID
LinkedIn
IMPACT@ORNL
GitHub
Researcher Website
INTERSECT Initiative

Organizations
Computing and Computational Sciences Directorate
Computer Science and Mathematics Division
Advanced Computing Systems Research Section
Intelligent Systems and Facilities Group
Research Highlight
PLEXUS: A Pattern-Oriented Runtime System Architecture for Resilient Extreme-Scale High-Performance Computing Systems