What Are Redundancy and Replication?
Redundancy means duplicating critical data or services so that a system keeps working when one component fails. For example, if a customer database is stored on a single server, losing that server means losing access to the entire database. To prevent this, copies of the database can be stored on multiple servers so the data remains available even when a server fails. Keeping those copies in sync with one another is known as replication.
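The idea of keeping redundant copies of data can be sketched in a few lines of Python. This is a minimal in-memory illustration, not a real database setup: the names Replica and ReplicatedStore are hypothetical, each replica is just a dictionary standing in for a database server, and the "up" flag simulates a server failure.

```python
class Replica:
    """One in-memory copy of the data; stands in for a database server."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.up = True  # set to False to simulate a server failure


class ReplicatedStore:
    """Writes every record to all reachable replicas and reads from any of them."""

    def __init__(self, replicas):
        self.replicas = replicas

    def put(self, key, value):
        # Copy the value to every replica that is currently reachable.
        written = 0
        for replica in self.replicas:
            if replica.up:
                replica.data[key] = value
                written += 1
        if written == 0:
            raise RuntimeError("no replica available for write")
        return written

    def get(self, key):
        # Return the value from the first reachable replica that holds it.
        for replica in self.replicas:
            if replica.up and key in replica.data:
                return replica.data[key]
        raise KeyError(key)


store = ReplicatedStore([Replica("db-1"), Replica("db-2"), Replica("db-3")])
store.put("customer:1", "Ada")
store.replicas[0].up = False      # the first server goes down
print(store.get("customer:1"))    # the record is still readable from another copy
```

The sketch deliberately skips what makes real replication hard, such as bringing a recovered replica back up to date or handling writes that reach only some of the copies.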
The same approach applies to services. Suppose an application relies on a payment processing service; running multiple instances of this service ensures that if one instance encounters an issue, the system can still process payments using another instance.
Redundancy helps eliminate single points of failure and gives the system a fallback when a component fails. For example, in a cloud-based application with two active web servers, if one crashes or becomes unresponsive, traffic can be redirected to the other server. These transitions, known as failovers, can happen automatically without human intervention or be triggered manually by an operator.
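Automatic failover between redundant instances can be illustrated with a small Python sketch. Everything here is an assumption made for the example: the servers are plain functions, InstanceDown is a hypothetical exception that stands in for a crash or timeout, and call_with_failover is an illustrative helper rather than a part of any particular load balancer.

```python
class InstanceDown(Exception):
    """Raised by an instance that has crashed or stopped responding (hypothetical)."""


def call_with_failover(instances, request):
    """Try each instance in order and return the first successful response."""
    errors = []
    for instance in instances:
        try:
            return instance(request)
        except InstanceDown as exc:
            # This instance failed; remember why and move on to the next one.
            errors.append(str(exc))
    raise RuntimeError(f"all {len(instances)} instances failed: {errors}")


def crashed_server(request):
    raise InstanceDown("web-1 is unresponsive")


def healthy_server(request):
    return f"web-2 handled {request}"


# The first server is down, so the request fails over to the second one.
print(call_with_failover([crashed_server, healthy_server], "GET /checkout"))
```

In production this decision is usually made by a load balancer or service mesh using health checks, so that failed instances are skipped before a request is ever sent to them.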
Another key element of service redundancy is a shared-nothing architecture, in which each node operates independently and does not share memory or storage with the others. For instance, a distributed file storage system might spread files across multiple servers, with each server handling requests for its own files. Because nodes do not contend for shared resources, new servers can be added with little coordination, and the failure of one node does not bring down the rest.
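A shared-nothing layout for the file storage example might look like the following Python sketch. It assumes a simple hash-based placement rule, which is one common choice and not the only one; StorageNode, node_for, store_file, and read_file are hypothetical names, and each node's dictionary stands in for that server's private disk.

```python
import hashlib


class StorageNode:
    """A server that owns its files outright; nothing is shared with other nodes."""

    def __init__(self, name):
        self.name = name
        self.files = {}  # private to this node


def node_for(filename, nodes):
    """Pick the owning node from a stable hash of the file name."""
    digest = hashlib.sha256(filename.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


def store_file(filename, content, nodes):
    """Store the file on its owning node and return that node's name."""
    node = node_for(filename, nodes)
    node.files[filename] = content
    return node.name


def read_file(filename, nodes):
    """Read the file back from the node that owns it."""
    return node_for(filename, nodes).files[filename]


nodes = [StorageNode("fs-1"), StorageNode("fs-2"), StorageNode("fs-3")]
owner = store_file("report.pdf", b"quarterly numbers", nodes)
print(owner, read_file("report.pdf", nodes))
```

Two caveats apply to this sketch. With modulo placement, changing the number of nodes remaps most files, which is why systems that grow often use consistent hashing instead. And partitioning alone keeps only one copy of each file, so a shared-nothing design is typically combined with redundant copies so that a failed node's files remain available elsewhere.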