COLLABORATE
Developing a Strongly Consistent, Long-Lived, Fault-Tolerant, Distributed Storage System with a Failure Prediction Mechanism
Strong Consistency (Atomicity)
Our goal is to design algorithms for read/write objects that, despite concurrent operations, asynchrony, and node failures, guarantee that each read operation returns a value no older than the value written by its latest preceding write and no older than the value returned by any preceding read. This consistency guarantee is known as atomicity. Atomicity is the most natural consistency guarantee, as it provides the illusion of a centralized, sequentially accessed storage.
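To make the guarantee concrete, here is a minimal sketch of one well-known way to obtain an atomic read/write object from replicated hosts: majority quorums with tagged values, in the style of the classic ABD algorithm. It is an illustration under stated assumptions, not the project's own algorithm. The names Replica and QuorumRegister are hypothetical, and the simulation runs in a single process with one operation at a time, so it shows the quorum logic only and does not model concurrency or message delays.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    # Hypothetical in-memory stand-in for one data host.
    tag: tuple = (0, 0)   # (sequence number, writer id), compared lexicographically
    value: object = None
    alive: bool = True

    def query(self):
        return (self.tag, self.value)

    def store(self, tag, value):
        # Keep only the newest (tag, value) pair.
        if tag > self.tag:
            self.tag, self.value = tag, value


class QuorumRegister:
    """Single-process simulation of a majority-quorum register (assumption:
    operations run one at a time, so concurrency is not modeled)."""

    def __init__(self, n):
        self.replicas = [Replica() for _ in range(n)]
        self.majority = n // 2 + 1

    def _quorum(self):
        live = [r for r in self.replicas if r.alive]
        if len(live) < self.majority:
            raise RuntimeError("no majority of replicas is reachable")
        return live[: self.majority]

    def _latest(self, quorum):
        # Highest-tagged (tag, value) pair seen in the quorum.
        return max((r.query() for r in quorum), key=lambda pair: pair[0])

    def write(self, writer_id, value):
        quorum = self._quorum()
        (seq, _writer), _value = self._latest(quorum)
        tag = (seq + 1, writer_id)      # strictly newer than anything seen
        for r in quorum:
            r.store(tag, value)

    def read(self):
        quorum = self._quorum()
        tag, value = self._latest(quorum)
        for r in quorum:                # write-back so later reads never go backwards
            r.store(tag, value)
        return value
```

Because any two majorities share at least one replica, a read always encounters the tag of the latest completed write. The write-back step in read is what keeps a later read from returning an older value than an earlier one.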
Fault Tolerance
The service will guarantee that read/write operations terminate (that is, complete) despite transient or persistent failures of data hosts in the system. In this project, we focus on crash failures.
Long Liveness
To ensure that persistent faults will not affect the operation of the service in the future, the service will implement mechanisms to remove faulty data hosts, insert new healthy alternatives, and migrate the data, giving clients a seamless, uninterrupted experience. Such mechanisms are known as reconfigurations, since they result in updating the membership of the host nodes.
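The three steps named above (remove faulty hosts, insert replacements, migrate the data) can be sketched as follows. This is a simplified illustration, not the project's reconfiguration protocol: Host, latest, and reconfigure are hypothetical names, tags are plain integers, and the sketch assumes that no client operation runs while the membership changes. A real reconfiguration service must also handle concurrent reads and writes and agree on the order of configurations.

```python
from dataclasses import dataclass


@dataclass
class Host:
    # Hypothetical in-memory stand-in for one data host.
    name: str
    tag: int = 0          # version of the stored value
    value: object = None
    alive: bool = True


def latest(hosts):
    """Return the (tag, value) pair with the highest tag among reachable hosts.

    Requires a live majority, so that the newest completed write is seen."""
    live = [h for h in hosts if h.alive]
    if len(live) <= len(hosts) // 2:
        raise RuntimeError("old configuration has no live majority")
    newest = max(live, key=lambda h: h.tag)
    return newest.tag, newest.value


def reconfigure(old_config, faulty_names, replacements):
    """Build a new configuration and migrate the newest value into it."""
    # 1. Read the newest value from the old configuration.
    tag, value = latest(old_config)
    # 2. Update the membership: drop faulty hosts, add healthy ones.
    new_config = [h for h in old_config if h.name not in faulty_names]
    new_config += replacements
    # 3. Migrate: bring every host of the new configuration up to date.
    for h in new_config:
        if h.alive and tag > h.tag:
            h.tag, h.value = tag, value
    return new_config
```

The order of the steps matters: the newest value is read from the old configuration before any host is dropped, so the value survives even when a replacement host starts empty.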
Failure Prediction
It is one thing to reconfigure and another to know when to reconfigure. The last characteristic of the service is the use of Machine Learning algorithms to predict which storage devices are about to fail. This will allow the service to determine which hosts will become unavailable and thus how it needs to reconfigure to maintain functionality.
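As a rough illustration of how a prediction could drive reconfiguration, the sketch below trains a tiny logistic-regression classifier and flags hosts whose estimated failure probability crosses a threshold. Everything specific here is an assumption: the text does not say which learning algorithm or device telemetry the project uses, the two features are hypothetical health indicators normalized to [0, 1], and the training samples are synthetic toy values invented for the example, not measurements.

```python
import math

# Synthetic toy data (NOT real measurements): two hypothetical, normalized
# health indicators per device; label 1 means the device failed soon after.
TOY_SAMPLES = [(0.0, 0.1), (0.1, 0.0), (0.05, 0.05), (0.1, 0.1),
               (0.8, 0.7), (0.9, 0.9), (0.7, 0.8), (1.0, 0.6)]
TOY_LABELS = [0, 0, 0, 0, 1, 1, 1, 1]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, labels, lr=0.5, epochs=5000):
    """Fit logistic regression with plain batch gradient descent."""
    n_features = len(samples[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(samples, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / len(samples) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(samples)
    return w, b


def failure_probability(model, x):
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def hosts_to_replace(model, telemetry, threshold=0.5):
    """Names of hosts whose predicted failure probability meets the threshold."""
    return sorted(name for name, x in telemetry.items()
                  if failure_probability(model, x) >= threshold)
```

The list returned by hosts_to_replace is the kind of input a reconfiguration step would need: the hosts to remove before they actually crash. The threshold trades off unnecessary reconfigurations against missed failures, and 0.5 is only a placeholder.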
Minimum Viable Prototype
Essentially, we would like to devise an efficient prototype of an atomic distributed storage system by combining the following key services:
- Distributed Object Management,
- Data Fragmentation,
- Object Reconfiguration, and
- Failure Prediction