Num. Meth. Prog., 2021 Volume 22, Issue 1, Pages 14–28 (Mi vmp1024)

Parallel software tools and technologies

Developing a model for holistic workload analysis of large supercomputer systems

P. A. Shvets, Vad. V. Voevodin, S. A. Zhumatii

Lomonosov Moscow State University, Research Computing Center

Abstract: Modern supercomputers have extremely complex architectures, and using their resources efficiently is often difficult even for experienced users. At the same time, demand for high-performance computing keeps growing, which makes efficient supercomputer utilization an urgent issue. Users therefore need to know everything important about the performance of their jobs in order to optimize them, and administrators need to be able to monitor and analyze every aspect of how efficiently such systems operate. However, there is currently no complete understanding of which data should be studied, and how they should be analyzed, to obtain a full picture of the state of a supercomputer and the processes running on it. In this paper, we make a first attempt to answer this question. To this end, we develop a model that describes all the potential factors that may be important when analyzing the performance of supercomputer applications and of the HPC system as a whole. The paper provides a detailed description of this model for both users and administrators, together with real-life examples discovered on the Lomonosov-2 supercomputer using a software implementation based on the proposed model.

Keywords: high-performance computing; supercomputer; workload analysis; application performance; model development.

Received: 22.12.2020

DOI: 10.26089/NumMet.v22r102


