Unclean Leader Election & Availability vs Durability Trade-Off
The Scenario: Leader Broker Failure and Data Loss Risk
Consider a situation where you have a topic partition P2 with replicas on Broker 1, Broker 2, and Broker 3. Suppose Broker 3 holds the leader for P2, and Broker 1 and Broker 2 hold its followers. Initially, all three are in-sync (ISR: 3, 2, 1).
New Leader Election
If the leader broker (Broker 3) goes down, a new leader must be chosen from the remaining in-sync replicas. For example, Broker 2 might be elected as the new leader. At this point, the ISR would only include the active, in-sync replicas (e.g., ISR: 2, 1), as the original leader's broker is down.
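The clean election described above can be sketched in a few lines of Python. This is a simplified model, not Kafka's controller code: the function name `elect_clean_leader` and the data structures are hypothetical, and the sketch assumes the leader is the first replica in assignment order that is both alive and in the ISR.

```python
def elect_clean_leader(assigned_replicas, isr, live_brokers):
    """Return the first assigned replica that is alive and in the ISR, or None."""
    for broker in assigned_replicas:
        if broker in isr and broker in live_brokers:
            return broker
    return None


replicas = [3, 2, 1]   # replica assignment for P2 (Broker 3 was the leader)
isr = {3, 2, 1}        # all three replicas start in-sync
live = {1, 2}          # Broker 3 has just failed

leader = elect_clean_leader(replicas, isr, live)
isr = isr & live       # the failed broker drops out of the ISR

print(leader, sorted(isr, reverse=True))  # 2 [2, 1]
```

Because the candidate must be in the ISR, the new leader is guaranteed to hold everything the old leader had fully replicated.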
The Critical Data Loss Scenario
Now, let's say the new leader (Broker 2) receives Message 3 from a producer, but before Message 3 can be fully replicated to the remaining follower (Broker 1), Broker 2 also goes down.
At this point:
- Message 3 was written to Broker 2's partition.
- Broker 2 is now down, making Message 3 inaccessible via its original location.
- Broker 1 is still up, but it only has Message 1 and Message 2, not Message 3, because replication was incomplete.
Result: The only live replica (Broker 1) is missing Message 3, so no available replica holds all of the latest data. This is where the trade-off comes in.
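To make the state of each replica concrete, here is a minimal Python sketch of the three logs at this point. The dictionary layout and the helper `missing_on_live` are hypothetical, and the contents of Broker 3's log are an assumption (the scenario only says it failed before Message 3 arrived).

```python
# Per-broker copy of partition P2 at the moment Broker 2 fails.
logs = {
    1: ["Message 1", "Message 2"],               # follower; never fetched Message 3
    2: ["Message 1", "Message 2", "Message 3"],  # leader when it went down
    3: ["Message 1", "Message 2"],               # assumed: failed before Message 3
}
live = {1}  # only Broker 1 is still running


def missing_on_live(logs, live):
    """Messages that exist on some replica but on no live replica."""
    everywhere = {m for log in logs.values() for m in log}
    on_live = {m for broker in live for m in logs[broker]}
    return sorted(everywhere - on_live)


print(missing_on_live(logs, live))  # ['Message 3']
```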
The Trade-Off: Availability vs. Durability
When there are no in-sync replicas available for a partition, Kafka faces a critical decision:
Option 1: Prioritize Durability (No Data Loss)
Action: The partition P2 becomes unavailable, and producers cannot publish new messages to it.
Outcome: Data loss is prevented because the partition stays offline until a replica from the last known ISR (here Broker 2, which holds Message 3) comes back online and can resume leadership. Only then does the partition accept new writes.
System State: The system is highly durable (reliable) but potentially less available.
Option 2: Prioritize Availability (Potential Data Loss)
Action: A replica that is not in-sync (e.g., Broker 1's copy of P2, which is missing Message 3) is chosen as the new leader. Producers can then immediately start publishing new messages (Message 4, Message 5, etc.) to this new leader.
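Outcome: Message 3 is permanently lost because it was only present on the now-down Broker 2 and never fully replicated to Broker 1. When Broker 2 later rejoins as a follower, it truncates its log to match the new leader, discarding Message 3.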
System State: The system is highly available (continuous operation) but potentially less durable due to data loss.
This choice is controlled by a Kafka configuration property: unclean.leader.election.enable.
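The two options can be expressed as a single decision function. The following Python sketch is a simplified model under stated assumptions, not Kafka's implementation: `elect_leader` and its return labels are hypothetical, and the example assumes Broker 1 had already dropped out of the ISR while Broker 2, as the last leader, remains recorded in it.

```python
def elect_leader(assigned_replicas, isr, live_brokers, unclean_enabled):
    """Return (leader, kind), where kind is 'clean', 'unclean', or 'offline'."""
    # Prefer a live replica that is still in the ISR.
    for broker in assigned_replicas:
        if broker in isr and broker in live_brokers:
            return broker, "clean"
    # Otherwise fall back to any live replica, but only if the flag permits it.
    if unclean_enabled:
        for broker in assigned_replicas:
            if broker in live_brokers:
                return broker, "unclean"
    # No eligible replica: the partition stays offline.
    return None, "offline"


replicas = [3, 2, 1]
isr = {2}    # assumed: only the failed leader is still listed as in-sync
live = {1}   # Broker 1 is the sole survivor and lacks Message 3

print(elect_leader(replicas, isr, live, unclean_enabled=True))   # (1, 'unclean')
print(elect_leader(replicas, isr, live, unclean_enabled=False))  # (None, 'offline')
```

With the flag on, Broker 1 takes over and Message 3 is gone; with it off, P2 rejects writes until Broker 2 returns.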
unclean.leader.election.enable Configuration
This crucial Kafka property determines which of the two options (durability or availability) Kafka will prioritize during a leader failure when no in-sync replicas are available.
- unclean.leader.election.enable = true
Behavior: Allows Kafka to elect a replica that is not in-sync (i.e., it's "unclean" because it doesn't have all the latest data) as the new leader.
Benefit: Keeps the system highly available. Producers can continue writing immediately.
Risk: There might be some amount of data loss (as seen with Message 3 in our example).
Use Cases: Suitable for scenarios where a small amount of data loss is acceptable, such as log aggregation and metrics calculation.
- unclean.leader.election.enable = false
Behavior: Prevents a replica that is not in-sync from becoming the leader. The system will wait until an in-sync replica becomes available or the original leader recovers and synchronizes.
Benefit: Keeps the system highly durable, because a replica that is missing data can never take over as leader and overwrite the partition's history.
Risk: The partition will be unavailable for a period, blocking producers from publishing messages until an in-sync leader is established.
Use Cases: Essential for scenarios where data loss is absolutely unacceptable, such as transaction-related information in the banking or financial sectors, or any system where monetary transactions are involved.
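Since the property can also be overridden per topic, the use cases above could map to different settings within one cluster. The Python sketch below only builds the `key=value` override strings; the topic names `app-metrics` and `payments` and the helper `to_add_config` are hypothetical, while the property name is the real Kafka config key.

```python
# Hypothetical topics mapped to the setting that matches their tolerance for loss.
TOPIC_OVERRIDES = {
    "app-metrics": {"unclean.leader.election.enable": "true"},   # favor availability
    "payments": {"unclean.leader.election.enable": "false"},     # favor durability
}


def to_add_config(overrides):
    """Render overrides as a comma-separated key=value list."""
    return ",".join(f"{key}={value}" for key, value in sorted(overrides.items()))


for topic, overrides in TOPIC_OVERRIDES.items():
    print(topic, to_add_config(overrides))
# app-metrics unclean.leader.election.enable=true
# payments unclean.leader.election.enable=false
```

The rendered string uses the same comma-separated `key=value` form that Kafka's `kafka-configs.sh` tool accepts for its `--add-config` option.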