Abstract:
The authors compare two methods of implementing transparent fault tolerance for application servers with non-deterministic behavior. The first method, snapshot/restore, builds on the well-known checkpoint (snapshot) mechanism and supplements it with resource histories: logs of the events that occur on resources and affect the determinism of the application's behavior. After a failure, the application state is restored from a snapshot and execution then proceeds under control, guided by the resource histories. The second method, lock-step, relies on event logging alone, accompanied by continuous controlled execution on a reserve (standby) node of the application server. Arguments in favor of the snapshot/restore method are presented.
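The difference between the two methods can be illustrated with a minimal sketch. It is not the authors' implementation: the `Counter` application, the functions `run_primary`, `recover_snapshot_restore`, and `recover_lock_step`, and the use of random numbers as the non-deterministic resource are all hypothetical names and assumptions chosen for illustration. For brevity, the lock-step replica is modelled as consuming the whole resource history in one pass rather than continuously alongside the primary.

```python
import copy
import random


class Counter:
    """Toy application whose state depends on values read from a
    non-deterministic resource (hypothetical example)."""

    def __init__(self):
        self.total = 0
        self.steps = 0

    def step(self, value):
        self.total += value
        self.steps += 1


def run_primary(n_steps, snapshot_at, rng):
    """Run the primary, logging every non-deterministic value it reads
    (the resource history) and taking one snapshot before step snapshot_at."""
    app = Counter()
    history = []
    snapshot = None
    for i in range(n_steps):
        if i == snapshot_at:
            # Save the state together with the history position it matches.
            snapshot = (copy.deepcopy(app), len(history))
        value = rng.randint(1, 100)  # non-deterministic input
        history.append(value)        # log it before applying it
        app.step(value)
    return app, snapshot, history


def recover_snapshot_restore(snapshot, history):
    """Snapshot/restore: restore the saved state, then replay only the
    events logged after the snapshot was taken."""
    saved_app, position = snapshot
    app = copy.deepcopy(saved_app)
    for value in history[position:]:
        app.step(value)
    return app


def recover_lock_step(history):
    """Lock-step (simplified): the reserve node starts from the initial
    state and executes every logged event, with no snapshot involved."""
    replica = Counter()
    for value in history:
        replica.step(value)
    return replica


primary, snapshot, history = run_primary(10, 4, random.Random())
restored = recover_snapshot_restore(snapshot, history)
replica = recover_lock_step(history)
```

In this sketch both recovered instances end in the same state as the primary, because the logged values replace the non-deterministic reads during controlled execution. The snapshot/restore variant replays only the tail of the history, while the lock-step variant must process every event.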