Performance and Scalability Analysis of Cray X1 Vectorization and Multistreaming Optimization...

by Sadaf R Alam, Jeffrey S Vetter

Publication Type

Conference Paper

Book Title

Computational Science -- ICCS 2005

Publication Date

May, 2005

Page Numbers

304 to 312

Conference Name

International Conference on Computational Science 2005

Conference Location

Atlanta, Georgia, United States of America

Conference Date

May 22, 2005 - May 25, 2005

Abstract

Cray X1 Fortran and C/C++ compilers provide a number of loop
transformations, notably vectorization and multistreaming, in order to
exploit the multistreaming processor (MSP) hardware resources and
its high memory bandwidth. A Cray X1 node is
composed of four MSPs, each of which is composed of four single-streaming
processors (SSPs). Each SSP contains a superscalar processing unit and
two vector processing units. Compiler vectorization provides loop-level
parallelization and uses the vector processing
hardware. Multistreaming code generation by the compiler permits a
block of code to execute across the SSPs of an MSP. In this paper,
we analyze the
overall impact of loop-level compiler optimization on a scientific
application called the Parallel Ocean Program (POP). POP has been
extensively optimized for X1 by instrumenting the code using X1
compiler directives. We compare and contrast automatic and manual
optimization schemes available on X1 and analyze their impact on the
code performance and scalability. Our results show that the addition
of compiler directives increases the average vector length, thereby
improving the single node performance significantly. However, this
code scales at a slower rate as the local workload volume decreases
and the communication costs increase.
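
The link between local workload and average vector length mentioned above can be illustrated with a toy model. The sketch below is not the paper's methodology and does not reproduce the Cray compiler's actual scheduling. It assumes that a loop's iterations are divided evenly across the four SSPs of one MSP (multistreaming) and that each SSP then processes its share in vector operations of at most 64 elements (vectorization). The value 64 and all function names are illustrative assumptions, not taken from the abstract.

```python
SSPS_PER_MSP = 4         # from the abstract: four SSPs per MSP
MAX_VECTOR_LENGTH = 64   # assumption: illustrative maximum vector length


def split_across_ssps(trip_count, ssps=SSPS_PER_MSP):
    """Divide loop iterations across SSPs as evenly as possible (toy multistreaming)."""
    base, extra = divmod(trip_count, ssps)
    return [base + (1 if i < extra else 0) for i in range(ssps)]


def vector_chunks(iterations, max_vl=MAX_VECTOR_LENGTH):
    """Break one SSP's share into vector operations of at most max_vl elements."""
    full, remainder = divmod(iterations, max_vl)
    return [max_vl] * full + ([remainder] if remainder else [])


def average_vector_length(trip_count):
    """Mean number of elements per vector operation over all SSPs of one MSP."""
    chunks = [
        chunk
        for share in split_across_ssps(trip_count)
        for chunk in vector_chunks(share)
    ]
    return sum(chunks) / len(chunks) if chunks else 0.0


if __name__ == "__main__":
    # Shrinking the per-MSP trip count mimics a smaller local workload.
    for n in (1024, 512, 256, 100, 40):
        print(n, average_vector_length(n))
```

Under these assumptions, a loop of 1024 iterations gives each SSP 256 iterations and an average vector length of 64.0, while a loop of 100 iterations gives each SSP only 25 iterations and an average vector length of 25.0. The model ignores communication costs entirely, so it illustrates only the workload side of the scaling behavior described in the abstract.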
