
Overview

Apache Kafka is a distributed streaming platform that was developed at LinkedIn in 2010 and later donated to the Apache Software Foundation. It is designed for high-throughput, fault-tolerant, real-time data streaming, and is often described as a distributed commit log.

Messaging Systems

The main task of a messaging system is to transfer data from one application to another, so that applications can focus on processing data without worrying about how to share it. Distributed messaging is based on reliable message queuing: messages are queued asynchronously between the client applications and the messaging system.

There are two types of messaging patterns available:

  1. Point-to-point messaging system: Messages are persisted in a queue. More than one consumer can read from the queue, but each message is consumed by only one consumer. Once a consumer reads a message, it is removed from the queue.

  2. Publish-subscribe messaging system: Messages are persisted in a topic. Unlike point-to-point messaging, consumers can subscribe to one or more topics and receive every message in those topics. Message producers are called publishers, and message consumers are called subscribers.
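To make the difference between the two patterns concrete, here is a minimal in-memory sketch. `PointToPointQueue` and `PubSubTopic` are hypothetical classes invented for illustration, not part of Kafka or any messaging library: the queue removes a message as soon as one consumer receives it, while the topic keeps every message and tracks a separate read position for each subscriber.

```python
from __future__ import annotations

from collections import deque


class PointToPointQueue:
    """Hypothetical in-memory queue: each message is delivered to exactly one consumer."""

    def __init__(self) -> None:
        self._messages: deque[str] = deque()

    def send(self, message: str) -> None:
        self._messages.append(message)

    def receive(self) -> str | None:
        # Receiving removes the message, so no other consumer can read it.
        return self._messages.popleft() if self._messages else None


class PubSubTopic:
    """Hypothetical in-memory topic: every subscriber receives every message."""

    def __init__(self) -> None:
        self._log: list[str] = []
        self._positions: dict[str, int] = {}

    def publish(self, message: str) -> None:
        self._log.append(message)

    def subscribe(self, subscriber: str) -> None:
        # In this sketch a new subscriber starts from the beginning of the topic.
        self._positions.setdefault(subscriber, 0)

    def poll(self, subscriber: str) -> list[str]:
        # Messages stay in the topic; only this subscriber's position moves forward.
        start = self._positions[subscriber]
        self._positions[subscriber] = len(self._log)
        return self._log[start:]


queue = PointToPointQueue()
queue.send("order-1")
first = queue.receive()   # "order-1"
second = queue.receive()  # None: the message was already consumed

topic = PubSubTopic()
topic.subscribe("billing")
topic.subscribe("shipping")
topic.publish("order-1")
billing_view = topic.poll("billing")    # ["order-1"]
shipping_view = topic.poll("shipping")  # ["order-1"]: same message, second subscriber
```

Keeping a read position per subscriber instead of deleting messages is loosely the idea Kafka builds on with consumer offsets, which it tracks per consumer group.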

Key characteristics

  1. Scalable: It supports horizontal scaling by allowing you to add new brokers (servers) to the cluster.
  2. Fault-tolerant: It can handle failures effectively due to its distributed nature and replication mechanisms.
  3. Durable: Kafka uses a "distributed commit log," which means messages are persisted to disk and replicated across brokers. This ensures data is not lost even if a server goes down.
  4. Fast: Designed for low-latency message delivery, relying on sequential disk writes and batching.
  5. Performance: Achieves high throughput for both publishing (producers) and subscribing (consumers).
  6. No data loss: Guarantees that committed messages are not lost as long as at least one in-sync replica of the partition remains alive.
  7. Zero downtime: Designed for continuous operation without interruption.
  8. Reliability: Provides reliable message delivery, backed by producer acknowledgments and replication.
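The durability idea can be sketched as an append-only file in which each record gets a sequential offset and reading never deletes anything. `FileCommitLog` is a hypothetical, single-writer illustration, not Kafka's storage format: Kafka stores each partition as a series of segment files and, by default, leans on replication rather than syncing every write to disk. This sketch calls `os.fsync` after each append only to keep the example simple.

```python
from __future__ import annotations

import json
import os
import tempfile


class FileCommitLog:
    """Hypothetical single-writer, append-only log; offsets are sequential record numbers."""

    def __init__(self, path: str) -> None:
        self.path = path
        # Count existing records so offsets continue where they left off after a restart.
        self._next_offset = 0
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as f:
                self._next_offset = sum(1 for _ in f)

    def append(self, value: str) -> int:
        offset = self._next_offset
        with open(self.path, "a", encoding="utf-8") as f:
            # One JSON object per line; json.dumps escapes any newlines in the value.
            f.write(json.dumps({"offset": offset, "value": value}) + "\n")
            f.flush()
            os.fsync(f.fileno())  # force the record onto disk before returning
        self._next_offset += 1
        return offset

    def read(self, from_offset: int = 0) -> list[str]:
        # Reading never removes records; they stay in the file for later readers.
        if not os.path.exists(self.path):
            return []
        with open(self.path, "r", encoding="utf-8") as f:
            records = [json.loads(line) for line in f]
        return [r["value"] for r in records if r["offset"] >= from_offset]


path = os.path.join(tempfile.mkdtemp(), "orders.log")
log = FileCommitLog(path)
log.append("order-1")
log.append("order-2")

# Simulate a server restart: a fresh object opens the same file.
restarted = FileCommitLog(path)
before = restarted.read()                # ["order-1", "order-2"]
offset3 = restarted.append("order-3")   # 2: numbering continues after the restart
```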

Kafka vs Traditional Messaging Systems

| Feature | Traditional Messaging System | Kafka Streaming Platform |
|---|---|---|
| Message Persistence | The broker keeps track of consumed messages and removes them once they are read. | Messages are retained in Kafka topics for a configurable period of time, even after they have been consumed, which makes the data durable. |
| Scalability | Often not designed as a distributed system, so horizontal scaling is limited. | A distributed streaming system: adding brokers and partitions lets it scale horizontally. |
| Data Model | Primarily point-to-point messaging over queues (some systems also offer topics). | Built around a publish-subscribe model on top of logs, which lets multiple consumers subscribe to topics and process data concurrently. |
| Ordering of Messages | Ordering can be guaranteed within a single queue or topic, but not necessarily across different queues or topics. | Order is maintained within a partition: messages in a partition are consumed in the order they were written. There is no ordering guarantee across the partitions of a topic. |
| Message Replay | Limited or no built-in support for replay. Once consumed, messages are gone unless a custom solution is implemented. | Consumers can replay messages from a specified offset to reprocess past data, which is valuable for debugging, analytics, and data reprocessing. |
| Use Cases | Traditional enterprise messaging, remote procedure calls (RPC), and task queues. | Real-time analytics, log aggregation, event sourcing, and data-intensive, real-time applications. |
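Per-partition ordering and offset-based replay from the comparison above can be illustrated with a small in-memory model. `PartitionedTopic`, `Record`, and `partition_for` are hypothetical names for this sketch, not Kafka's API. Records with the same key are routed to the same partition (here `zlib.crc32` stands in for the murmur2 hash used by Kafka's default partitioner), offsets are assigned per partition, and reading from an earlier offset replays data because consuming never removes records.

```python
from __future__ import annotations

import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    offset: int
    key: str
    value: str


class PartitionedTopic:
    """Hypothetical in-memory topic split into append-only partitions."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions: list[list[Record]] = [[] for _ in range(num_partitions)]

    def partition_for(self, key: str) -> int:
        # Same key -> same partition, so events for one key keep their order.
        # crc32 is a deterministic stand-in for Kafka's murmur2-based partitioner.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def send(self, key: str, value: str) -> tuple[int, int]:
        p = self.partition_for(key)
        offset = len(self.partitions[p])  # offsets are counted per partition
        self.partitions[p].append(Record(offset, key, value))
        return p, offset

    def read(self, partition: int, from_offset: int) -> list[Record]:
        # Consuming does not delete records, so any earlier offset can be replayed.
        return self.partitions[partition][from_offset:]


topic = PartitionedTopic(num_partitions=3)
for event in ["created", "paid", "shipped"]:
    partition, offset = topic.send("order-42", event)

# All events for order-42 landed in one partition, in send order.
in_order = [r.value for r in topic.read(partition, 0)]   # ["created", "paid", "shipped"]
# Replay from offset 1 to reprocess everything after the first event.
replayed = [r.value for r in topic.read(partition, 1)]   # ["paid", "shipped"]
```

Because ordering is only guaranteed inside one partition, choosing a key (such as an order ID) that groups related events is what keeps those events in sequence.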