Vestnik Yuzhno-Ural'skogo Gosudarstvennogo Universiteta. Seriya "Vychislitelnaya Matematika i Informatika"

Vestn. YuUrGU. Ser. Vych. Matem. Inform., 2021 Volume 10, Issue 1, Pages 62–74 (Mi vyurv253)

This article is cited in 1 paper

Modeling influence of monitoring system on performance of MPI collective operations

A. A. Khudoleeva, K. S. Stefanov

Lomonosov Moscow State University (GSP-1, Leninskie Gory 1, Moscow, 119991 Russia)

Abstract: Studying parallel programs by means of monitoring systems is common practice. To collect data about an application, the monitoring system agent activates periodically while the application runs, occupying resources and perturbing its execution. Developers of monitoring systems often neglect the interference of monitoring tools with application performance, and the problem remains poorly examined. This article discusses ways to study the influence of a supercomputer monitoring system on users' applications. We propose using MPI collective operations as a tool for measuring this influence. The method also makes it possible to estimate the influence of monitoring system noise on a synchronized application. MPI collective operations are measured in the presence of injected noise, generated by a program that imitates the interference of a monitoring tool. We estimate the noise level that each of the collective operations used can detect in the chosen configuration. All-to-All, All-Reduce, and Barrier are used in the noise detection tool. We find parameters with which the All-to-All and Barrier operations perform stably and detect low noise levels.
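
The effect described in the abstract, periodic agent activity delaying a synchronized application, can be illustrated with a small toy simulation. This is only a sketch under stated assumptions and not the authors' noise generator or measurement tool: the names `finish_time`, `simulate` and `random_phases` are hypothetical, the barrier itself is assumed to cost nothing, and the noise on each process is assumed to be strictly periodic with a fixed burst duration and a per-process phase.

```python
import math
import random


def finish_time(start, work, phase, period, duration):
    """Time at which `work` seconds of CPU are accumulated, starting at `start`.

    Assumption: the noise agent blocks the core during the intervals
    [phase + k*period, phase + k*period + duration) for k = 0, 1, 2, ...
    """
    if duration == 0:
        return start + work
    # Index of the last burst that begins at or before `start` (-1 if none).
    k = max(math.floor((start - phase) / period), -1)
    t = start
    if k >= 0:
        # If that burst is still running, the process must wait for it.
        t = max(t, phase + k * period + duration)
    k += 1
    remaining = work
    while True:
        next_burst = phase + k * period
        if t + remaining <= next_burst:
            return t + remaining
        remaining -= next_burst - t   # work done before the burst begins
        t = next_burst + duration     # the burst steals `duration` seconds
        k += 1


def simulate(phases, n_iters, work, period, duration):
    """Total wall time of `n_iters` compute steps, each ended by a barrier."""
    if not 0 <= duration < period:
        raise ValueError("need 0 <= duration < period")
    now = 0.0
    for _ in range(n_iters):
        # The (zero-cost) barrier releases everyone when the slowest arrives.
        now = max(finish_time(now, work, ph, period, duration) for ph in phases)
    return now


def random_phases(n_procs, period, seed=0):
    """Hypothetical helper: independent noise phases, one per process."""
    rng = random.Random(seed)
    return [rng.uniform(0.0, period) for _ in range(n_procs)]


if __name__ == "__main__":
    work, period, duration, n_iters = 1e-3, 1e-2, 1e-4, 1000
    for n_procs in (1, 8, 64):
        phases = random_phases(n_procs, period)
        total = simulate(phases, n_iters, work, period, duration)
        print(n_procs, "processes: slowdown", total / (n_iters * work))
```

In this model a single delayed process holds back every other process at the barrier, which is why a synchronizing collective can act as a detector for noise that would be hard to see in a single-process measurement.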

Keywords: supercomputer, performance monitoring, monitoring system noise, parallel job slowdown, modeling influence of monitoring system.

UDC: 519.6

Received: 02.10.2020

DOI: 10.14529/cmse210105


