A Case Study of MPI Over Long Distance Connections

by Nageswara S Rao, Neena Imam, Swen Boehm

Publication Type

Conference Paper

Book Title

Proceedings of SYSCON2019

Publication Date

April 2019

Page Numbers

1 to 4

Conference Name

13th Annual IEEE International Systems Conference (SysCon 2019)

Conference Location

Orlando, Florida, United States of America

Conference Sponsor

IEEE

Conference Date

Apr 8, 2019 - Apr 11, 2019

Abstract

Scientific workflows are increasingly distributed across wide-area networks, and their code executions are expected to span geographically dispersed computing systems. MPI has been used extensively to support communications for distributed computations, typically over compute clusters and high-performance systems within a single facility. We present a case study of the performance of basic MPI operations over long-distance connections, wherein TCP is used as the underlying transport. We present measurements of execution times of MPI codes that use MPI_Sendrecv operations over emulated 10 Gbps connections with round-trip times of 0-366 ms, the longest of which corresponds to a connection spanning the globe. The measurements demonstrate that basic MPI codes can be sustained over long-distance connections under external packet loss rates of up to 10%. They also highlight the qualitative effects of losses, which manifest as increased execution times as a consequence of TCP's loss recovery process.
