Terry R Jones
I’m a scientist who’s found a love of working on powerful computing systems, particularly the kind that use multiple computers to tackle massive problems by working in parallel. I started my career as a computer scientist worrying about time-critical real-time programming in the aerospace industry. In the 1990s, I moved into programming environments at a DOE national laboratory in California, Lawrence Livermore National Laboratory (LLNL), where I worked with domain scientists in specialties like physics, chemistry, and mathematics. Over time, I drifted into working on runtime systems and operating systems for large parallel computers. In 2008, I joined Oak Ridge National Laboratory (ORNL) to work on system software for supercomputers. I’m probably best known for technical contributions in middleware and system software, and lately I’ve been applying my knowledge of distributed clock synchronization to new areas like our nation’s power grid. While I’ve kept my love for the technical aspects of the work, I’ve gradually assumed larger roles in conceiving and pitching new ideas, forming effective R&D teams, managing projects with distributed teams, and making sure our projects deliver milestones on time and on budget.
Areas of expertise: High-performance parallel computing, operating systems, parallel middleware & runtime systems, distributed clock synchronization, parallel file systems, resource scheduling and coordination, performance analysis & optimization tools.
In a parallel computing environment comprising a network of SMP (symmetric multiprocessing) nodes each having at least one processor, this patent presents a parallel-aware co-scheduling method and system for improving the performance and scalability of a dedicated parallel job having synchronizing collective operations. The method and system use a global co-scheduler and an operating system kernel dispatcher adapted to coordinate interfering system and daemon activities on a node and across nodes. The goal is to promote intra-node and inter-node overlap of said interfering system and daemon activities, as well as intra-node and inter-node overlap of said synchronizing collective operations. In this manner, the impact of random short-lived interruptions, such as timer-decrement processing and periodic daemon activity, on synchronizing collective operations is minimized for large-processor-count SPMD (single program, multiple data) jobs written in a bulk-synchronous style.
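To make the effect described above concrete, here is a minimal Python sketch of a toy model. It is not the patented implementation. It assumes each node runs a fixed amount of work per step, every step ends in a barrier-style collective, and each node is preempted by a periodic daemon for a fixed duration. The function names (`finish_time`, `run_job`, `compare`) and all parameter values (64 nodes, 1000 steps, integer time units standing in for microseconds) are hypothetical and chosen only for illustration. The sketch compares daemons that fire at random, uncoordinated phases on each node against daemons aligned to fire at the same time on every node.

```python
import random


def finish_time(start, work, phase, period, dur):
    """Time at which a node starting at `start` has received `work` units of CPU.

    A daemon preempts the node for `dur` units at the start of every `period`,
    shifted by `phase`. All values are integers (think microseconds).
    """
    t, remaining = start, work
    while True:
        pos = (t - phase) % period  # position inside the current daemon period
        if pos < dur:
            t += dur - pos  # inside a daemon window: wait until it ends
            continue
        run = min(remaining, period - pos)  # run until done or next window
        t += run
        remaining -= run
        if remaining == 0:
            return t


def run_job(phases, steps, work, period, dur):
    """Total time for a bulk-synchronous job: each step ends when the slowest node does."""
    t = 0
    for _ in range(steps):
        t = max(finish_time(t, work, p, period, dur) for p in phases)
    return t


def compare(nodes=64, steps=1000, work=1000, period=10000, dur=500, seed=0):
    """Return (uncoordinated_total, co_scheduled_total) for the toy model."""
    rng = random.Random(seed)
    random_phases = [rng.randrange(period) for _ in range(nodes)]
    aligned_phases = [0] * nodes  # every node's daemon fires at the same time
    return (
        run_job(random_phases, steps, work, period, dur),
        run_job(aligned_phases, steps, work, period, dur),
    )


if __name__ == "__main__":
    uncoordinated, co_scheduled = compare()
    print("uncoordinated daemons:", uncoordinated)
    print("co-scheduled daemons: ", co_scheduled)
```

In this toy model, uncoordinated daemons mean that on almost every step at least one node is interrupted, and the barrier makes every other node wait for it. When the interruptions are aligned, all nodes pay for each interruption at the same moment, so the job as a whole loses that time only once per daemon period. A real system also has to handle clock agreement across nodes and kernel dispatcher priorities, which this sketch does not model.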