Abstract:
The goal of this study is to develop and verify a model for monitoring the reliability and availability of distributed systems. The model is based on the probabilistic characteristics of components and accounts for dependent failures. Modern distributed systems require accurate failure prediction methods that capture complex dependencies between nodes and sustain reliable performance under high load. Traditional approaches based on the analysis of empirical data often fail to predict system states under changing load, which limits their applicability. In this work, the proposed probabilistic model was verified by numerical simulation, and its accuracy was assessed using the Kullback–Leibler divergence and the mean squared error (MSE). The results confirm that the model is accurate and of practical value. Experiments demonstrated the model's versatility: it adapts to various types of distributed systems and provides accurate real-time predictions of availability and resilience. Numerical experiments also showed that the model can serve as a reliable tool for managing fault tolerance and load balancing. The developed model therefore offers an effective means of enhancing the reliability of distributed systems across a wide range of applications.
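The abstract names two accuracy metrics but does not say how they were computed. The sketch below shows one plausible way to apply them to a model's predictions. It rests on several assumptions not taken from the study. The Kullback–Leibler divergence is computed as D_KL(P || Q) over a discrete distribution, with P taken from simulation and Q from the model. Here that distribution is assumed to be the number of simultaneously failed nodes. The MSE is computed over a series of availability values. The function names `kl_divergence` and `mse` and all input numbers are hypothetical and for illustration only.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """Discrete KL divergence D_KL(P || Q) = sum_i p_i * ln(p_i / q_i).

    Assumption: p is the reference (e.g. simulated) distribution and q the
    model's prediction. Terms with p_i == 0 contribute 0 by convention; q_i is
    clamped to eps so a zero model probability yields a large finite value
    instead of a division error.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same support")
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def mse(predicted, observed):
    """Mean squared error between two equal-length sequences."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have the same length")
    if not predicted:
        raise ValueError("sequences must be non-empty")
    return sum((a - b) ** 2 for a, b in zip(predicted, observed)) / len(predicted)


if __name__ == "__main__":
    # Hypothetical distribution of the number of failed nodes (0, 1, 2).
    simulated = [0.90, 0.08, 0.02]
    model = [0.88, 0.10, 0.02]
    print(f"KL divergence: {kl_divergence(simulated, model):.6f}")

    # Hypothetical availability values at three observation points.
    predicted_availability = [0.995, 0.990, 0.985]
    observed_availability = [0.994, 0.991, 0.983]
    print(f"MSE: {mse(predicted_availability, observed_availability):.2e}")
```

The two metrics measure different things. KL divergence compares the shape of a whole predicted distribution against the reference. MSE penalizes pointwise errors in a time series of availability estimates. This is one reason a study might report both.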