Understanding Paxos: A Practical Guide
·3 min read
Introduction
Paxos is one of the most influential algorithms in distributed systems, yet it's notoriously difficult to understand. After implementing Multi-Paxos for my distributed datastore project, I want to share some insights that helped me grasp this powerful algorithm.
The Problem: Consensus
In distributed systems, we often need multiple nodes to agree on a single value. This is harder than it sounds because:
- Network Failures: Messages can be lost, delayed, or duplicated
- Node Failures: Any node can crash at any time
- Asynchrony: We can't assume anything about timing
Basic Paxos Roles
Paxos defines three roles (a single node can play multiple roles):
Proposers
- Propose values to be chosen
- Drive the protocol forward
- Handle conflicts between competing proposals
Acceptors
- Vote on proposals
- Form a quorum to accept values
- Provide durability through persistence
Learners
- Learn the chosen value
- Can be any node that needs to know the result
The Two Phases
Phase 1: Prepare
Proposer Acceptors
| |
|--- Prepare(n) ----------->|
| |
|<-- Promise(n, v?) --------|
| |
- Proposer picks a proposal number
n - Sends
Prepare(n)to a majority of acceptors - Each acceptor promises not to accept proposals <
n - Acceptors reply with any value they've already accepted
Phase 2: Accept
Proposer Acceptors
| |
|--- Accept(n, v) --------->|
| |
|<-- Accepted(n, v) --------|
| |
- If proposer received promises from a majority, it sends
Accept(n, v) - Value
vis either the highest-numbered accepted value from Phase 1, or proposer's own value - Acceptors accept if they haven't promised to a higher proposal
Key Insight: Why It Works
The beauty of Paxos lies in how it handles competing proposals:
- Proposal numbers create a total ordering
- A proposer must learn about previous accepted values before proposing
- This ensures once a value is chosen, no other value can be chosen
Multi-Paxos Optimization
Basic Paxos requires two round-trips per value. Multi-Paxos optimizes this:
- Elect a stable leader
- Leader skips Phase 1 for subsequent values
- Only need one round-trip in the common case
Lessons from Implementation
- Handle edge cases carefully: Especially around leadership changes
- Persistence matters: Acceptors must persist their state before responding
- Timeouts are tricky: Too short causes thrashing, too long reduces availability
- Testing is essential: Fault injection helped find many bugs
Conclusion
Paxos is challenging but elegant. Understanding it opens doors to understanding Raft, Zab, and other consensus protocols that power modern distributed databases.
Further Reading
- Paxos Made Simple by Leslie Lamport
- Paxos Made Live by Google