Abstract:
This study develops and validates a model for assessing the resilience of distributed systems that accounts for both the structural characteristics of the network and the dynamic behavior of its connections. The proposed model combines graph analysis (based on the Erdős–Rényi model) with statistical modeling of link quality (the Gilbert–Elliott model), integrating connectivity probabilities and successful-connection metrics to evaluate network resilience. The primary objective was an approach that describes real-world network processes accurately and identifies potential points of degradation. The model was tested on a local Kubernetes cluster, where a test CRUD service was deployed and kept under load for 24 hours. The collected metrics, including packet loss, latency, and throughput, were compared with the model's predictions. The deviations between theoretical predictions and empirical data were minimal, which supports the model's adequacy. The study concludes that the proposed approach not only describes current network processes accurately but can also serve as a foundation for decisions about scaling and replication. Because the model can be re-evaluated when the network topology or connection quality changes, it remains applicable to the analysis of modern distributed systems.
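
To make the two ingredients named above concrete, the following is a minimal sketch rather than the model evaluated in the study. It computes the stationary packet-loss rate of a two-state Gilbert–Elliott channel, checks it by simulation, and estimates by Monte Carlo the probability that an Erdős–Rényi graph G(n, p) is connected. All function names, parameter names, and numeric values are hypothetical, and the coupling of the two parts in the usage example (thinning each edge by the link's delivery probability) is an assumption made for illustration only.

```python
import random


def ge_stationary_loss(p_gb, p_bg, loss_good, loss_bad):
    """Stationary loss rate of a Gilbert-Elliott channel.

    p_gb: probability of moving from the good to the bad state per packet.
    p_bg: probability of moving from the bad to the good state per packet.
    """
    pi_bad = p_gb / (p_gb + p_bg)  # long-run share of time in the bad state
    return (1.0 - pi_bad) * loss_good + pi_bad * loss_bad


def ge_simulated_loss(n_packets, p_gb, p_bg, loss_good, loss_bad, seed=0):
    """Empirical loss rate from simulating the same two-state chain."""
    rng = random.Random(seed)
    bad = False
    lost = 0
    for _ in range(n_packets):
        # Drop the packet with the loss probability of the current state.
        if rng.random() < (loss_bad if bad else loss_good):
            lost += 1
        # Then decide whether the channel switches state.
        if rng.random() < (p_bg if bad else p_gb):
            bad = not bad
    return lost / n_packets


def er_connectivity_probability(n_nodes, p_edge, trials=200, seed=0):
    """Monte Carlo estimate of P(G(n, p) is connected), using union-find."""
    rng = random.Random(seed)
    connected = 0
    for _ in range(trials):
        parent = list(range(n_nodes))

        def find(x):
            # Find the component root, halving the path as we go.
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        components = n_nodes
        for i in range(n_nodes):
            for j in range(i + 1, n_nodes):
                # Each possible edge exists independently with p_edge.
                if rng.random() < p_edge:
                    ri, rj = find(i), find(j)
                    if ri != rj:
                        parent[ri] = rj
                        components -= 1
        if components == 1:
            connected += 1
    return connected / trials


if __name__ == "__main__":
    # Illustrative values only; none of them come from the study.
    loss = ge_stationary_loss(p_gb=0.05, p_bg=0.5, loss_good=0.01, loss_bad=0.5)
    # Assumed coupling: an edge counts only if the link also delivers.
    p_effective = 0.4 * (1.0 - loss)
    print("stationary loss rate:", loss)
    print("estimated connectivity:", er_connectivity_probability(20, p_effective))
```

In this sketch the simulated loss rate should approach the closed-form value as the number of packets grows, which mirrors, on a small scale, the comparison of theoretical predictions against measured metrics described above.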