Restoring Consistency after Network Partitions

The western world has already left the stage of one computer in every home. Nowadays, computers are literally everywhere, and we use them daily for chatting, watching movies, booking tickets and so on. Each of these activities uses some service that is accessible through the Internet. So in effect, we have made ourselves dependent on these computer systems being available.

As systems increase in size and complexity, so does the risk that some part will fail. Unfortunately, it has proven hard to tackle faults in distributed systems without a rigorous approach. Therefore, it is crucial that the scientific community can provide answers to how distributed computer systems can continue functioning despite faults. Our contribution in this thesis concerns a special class of faults that occurs when network links fail in such a way that parts of the network become isolated; such faults are termed network partitions. We consider the problem of how systems that have integrity constraints on data can continue operating in the presence of a network partition…

Contents

1 Introduction
1.1 Motivation
1.2 Problem Formulation
1.3 Contribution
1.4 Publications
1.5 Outline
2 Background
2.1 Dependability
2.1.1 Measuring Availability
2.1.2 Dependability threats
2.1.3 Dealing with faults
2.2 Fault Tolerance in Distributed Systems
2.2.1 Fault models
2.2.2 Timing models
2.2.3 Consensus
2.2.4 Failure Detectors
2.2.5 Group communication and group membership
2.2.6 Fault-tolerant middleware
2.3 Consistency
2.3.1 Replica consistency
2.3.2 Ordering constraints
2.3.3 Integrity constraints
2.4 Partition tolerance
2.4.1 Limiting Inconsistency
2.4.2 State and operation-based reconciliation
2.4.3 Operation Replay Ordering
2.4.4 Partition-tolerant Middleware Systems
2.4.5 Databases and File Systems
3 Overview and System Model
3.1 Overview
3.2 Terminology
3.2.1 Objects
3.2.2 Order
3.2.3 Consistency
3.2.4 Utility
3.2.5 Processes
3.3 Fault and Timing Model
3.4 Operation Ordering
3.5 Integrity Constraints
3.6 Notation Summary
4 Reconciliation Algorithms
4.1 Help Functions
4.2 Choose states
4.3 Merging operation sequences
4.4 Greatest Expected Utility
4.5 Continuous Service
4.5.1 Reconciliation Manager
4.5.2 The Continuous Server Process
4.6 Additional Notes
5 Correctness
5.1 StopTheWorld-Merge
5.2 Assumptions
5.2.1 Notation
5.2.2 Some basic properties
5.2.3 Termination
5.2.4 Correctness
5.3 StopTheWorld-GEU
5.3.1 Basic properties
5.3.2 Termination
5.3.3 Correctness
5.4 Continuous Service Protocol
5.4.1 Assumptions
5.4.2 Termination
5.4.3 Correctness
6 CS Implementation
6.1 DeDiSys
6.2 Overview
6.3 Group Membership and Group Communication
6.4 Replication Support
6.4.1 Replication Protocols
6.4.2 Replication Protocol Implementation
6.5 Constraint Consistency Manager
6.6 Ordering
6.7 Sandbox
6.8 Test application
7 Evaluation
7.1 Performance metrics
7.1.1 Time-based metrics
7.1.2 Operation-based metrics
7.2 Evaluation of Reconciliation Approaches
7.3 Simulation-based Evaluation of CS
7.3.1 Simulation setup
7.3.2 Results
7.4 CORBA-based Evaluation of CS
7.4.1 Experimental Setup
7.4.2 Results
8 Conclusions and Future Work
8.1 Conclusions
8.1.1 Optimistic Replication with Integrity Constraints
8.1.2 State vs Operation-based Reconciliation
8.1.3 Optimising Reconciliation
8.1.4 Continuous Service
8.2 Future work
8.2.1 Algorithm Improvements
8.2.2 Mobility and Scale
8.2.3 Overloads

Author: Asplund, Mikael

Source: Linköping University
