Abstract:
This paper addresses fault-tolerance challenges in distributed computing environments. As modern computational clusters scale up, the probability of an interruption occurring during a computation increases. Some computational algorithms, such as genetic algorithms and Monte Carlo based algorithms, have mathematical properties that allow them to reach the correct answer despite the occurrence of faults in the system. This paper proposes methods for implementing this class of algorithms in the presence of software and hardware faults. An example of a monotonous reducing object is implemented using the C++ template class library T-Sim, and several test implementations are provided.
Key words and phrases: fault tolerance, T-Sim C++ template library, monotonous object, local synchronization.