Abstract:
A new approach to estimating the fault tolerance of parallel control computing systems is based on a mathematical model. The model determines the probability that an arbitrary set of interdependent jobs (tasks) completes successfully within a given schedule time, where the jobs have random execution times and asynchronous job redundancy. The estimates were obtained for two cases. The first is normal execution of the job set. The second is a single malfunction (fault or failure) of any processor of the computing system, detected during the execution of any job in the set. The key distinction of this approach is that the numerical values of the reliability parameters of the computing resources (the probabilities or intensities of faults and failures) are neither given nor used.
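The central quantity here is the probability that a set of interdependent jobs with random execution times finishes within a given schedule time. The sketch below estimates that probability by Monte Carlo simulation. This is a swapped-in technique for illustration only; it is not the analytic model the abstract refers to. It rests on several assumptions. Jobs form a directed acyclic graph and are supplied in topological order. Enough processors are available that every job starts as soon as its predecessors finish. Asynchronous redundancy is not modelled. A single malfunction is represented, hypothetically, by executing the affected job a second time. The function names `completion_time` and `estimate_success_probability` are hypothetical. Like the approach described above, the sketch needs distributions of job execution times but no fault probabilities or intensities.

```python
import random


def completion_time(durations, preds):
    """Finish time of the whole job set (longest path through the DAG).

    Assumes `durations` lists jobs in topological order (dicts keep
    insertion order) and that processors are never a bottleneck.
    """
    finish = {}
    for job, duration in durations.items():
        # A job starts once all of its predecessors have finished.
        start = max((finish[p] for p in preds.get(job, ())), default=0.0)
        finish[job] = start + duration
    return max(finish.values())


def estimate_success_probability(samplers, preds, deadline,
                                 trials=10_000, fault_job=None, seed=0):
    """Monte Carlo estimate of P(job set completes within `deadline`).

    samplers  : job -> function(rng) returning a random execution time
    preds     : job -> list of predecessor jobs
    fault_job : hypothetical single-malfunction model; if given, that job
                is executed once more (e.g. restarted on another processor)
    """
    rng = random.Random(seed)
    successes = 0
    for _ in range(trials):
        durations = {job: sample(rng) for job, sample in samplers.items()}
        if fault_job is not None:
            # Add the time of one extra execution of the faulty job.
            durations[fault_job] += samplers[fault_job](rng)
        if completion_time(durations, preds) <= deadline:
            successes += 1
    return successes / trials
```

For example, take jobs `a` and `b` that must both finish before job `c`, with fixed times of 1, 2 and 1. The job set then always completes at time 3. A malfunction during `b` pushes completion to 5. To obtain a worst-case estimate over any single malfunction, in the spirit of "any job from the set", one could take the minimum of `estimate_success_probability` over every choice of `fault_job`.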