Kafka – Workflow
Kafka Messaging: Pub-Sub vs Queue-Based Systems
Kafka supports both pub-sub and queue-based messaging systems, offering a fast, reliable, persistent, fault-tolerant, and zero-downtime solution. Producers send messages to a topic, and consumers can choose the messaging model based on their needs. Let’s explore how consumers can select the appropriate messaging system and understand the workflows for each.
Workflow of Pub-Sub Messaging
The following is the step-by-step workflow for Pub-Sub Messaging:
- Producers send messages to a topic at regular intervals.
- Kafka brokers store messages in the partitions configured for that topic. When messages carry no key, Kafka distributes them across partitions in round-robin fashion. For example, if two messages are sent and the topic has two partitions, Kafka places one message in each partition.
- Consumers subscribe to a specific topic to begin receiving messages.
- Upon subscribing, Kafka provides the current offset to the consumer and stores this offset in the Zookeeper ensemble.
- Consumers regularly request new messages (typically every 100ms) from Kafka.
- Kafka forwards the new messages received from producers to the consumers.
- Once a consumer receives and processes the message, it sends an acknowledgment back to Kafka.
- Upon receiving the acknowledgment, Kafka updates the offset to the new value and stores it in Zookeeper. This ensures that in case of a server failure, the consumer can resume from the correct message offset.
- This cycle repeats until the consumer stops requesting messages.
- Consumers can rewind or skip to a desired offset at any time, allowing them to re-read messages or jump forward in the stream.
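The poll/acknowledge/offset cycle above can be sketched with a minimal in-memory simulation. This is not a real Kafka client; all class and method names (`Broker`, `poll`, `commit`, `seek`) are illustrative stand-ins for the behavior the steps describe.

```python
# Minimal in-memory sketch of the pub-sub poll/commit cycle.
# Names are illustrative, not part of any real Kafka client API.

class Broker:
    """Holds an append-only message log and the committed offset per consumer."""
    def __init__(self):
        self.log = []          # messages appended by producers
        self.committed = {}    # consumer_id -> next offset to read

    def produce(self, message):
        self.log.append(message)

    def poll(self, consumer_id, max_messages=10):
        """Return new messages starting at the consumer's committed offset."""
        offset = self.committed.get(consumer_id, 0)
        return offset, self.log[offset:offset + max_messages]

    def commit(self, consumer_id, offset):
        """Acknowledge processing of everything before `offset`."""
        self.committed[consumer_id] = offset

    def seek(self, consumer_id, offset):
        """Rewind or skip ahead: future polls resume from `offset`."""
        self.committed[consumer_id] = offset

broker = Broker()
for i in range(5):
    broker.produce(f"msg-{i}")

# Consumer polls, processes, then acknowledges by committing the new offset.
start, batch = broker.poll("c1")
broker.commit("c1", start + len(batch))   # offset now 5; survives a restart

# Rewind to offset 2 and re-read messages 2..4.
broker.seek("c1", 2)
_, replay = broker.poll("c1")
print(replay)  # ['msg-2', 'msg-3', 'msg-4']
```

Because the committed offset is stored on the broker side (in real Kafka, in Zookeeper or an internal topic), a consumer that crashes and reconnects resumes exactly where it left off.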
Workflow of Queue Messaging (Consumer Group)
In queue-based messaging, instead of a single consumer, a group of consumers with the same Group ID subscribes to the topic. This means that consumers within the same group share the messages. Here’s how the system works:
- Producers send messages to a topic at regular intervals.
- Kafka stores the messages in the topic’s partitions, similar to the Pub-Sub messaging model.
- A single consumer subscribes to a topic (e.g., Topic-01) with a Group ID (e.g., Group-1).
- Kafka handles the subscription exactly as in Pub-Sub messaging. When a second consumer with the same Group ID subscribes to the same topic, Kafka switches to share mode and divides the topic's partitions between the two consumers.
- This data sharing continues until the number of consumers equals the number of partitions for the topic.
- If the number of consumers exceeds the number of partitions, new consumers will not receive any messages until an existing consumer unsubscribes. This is because each consumer must be assigned at least one partition, and once all partitions are allocated, new consumers must wait.
- This setup is known as a Consumer Group, where Kafka efficiently manages the distribution of messages to consumers within a group, providing the benefits of both pub-sub and queue-based messaging systems.
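The partition-sharing rule above can be sketched as a simple round-robin assignment. This is a simplification (real Kafka supports several pluggable assignment strategies), and the function name is illustrative, not a Kafka API.

```python
# Sketch of how partitions are shared within a consumer group
# (round-robin assignment, simplified; not a real Kafka API).

def assign_partitions(num_partitions, consumers):
    """Each partition is owned by exactly one consumer in the group;
    consumers beyond the partition count receive nothing."""
    assignment = {c: [] for c in consumers}
    for partition in range(num_partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment

# 4 partitions, 3 consumers: every consumer gets at least one partition.
print(assign_partitions(4, ["c1", "c2", "c3"]))
# → {'c1': [0, 3], 'c2': [1], 'c3': [2]}

# 4 partitions, 6 consumers: c5 and c6 stay idle until a consumer leaves.
print(assign_partitions(4, ["c1", "c2", "c3", "c4", "c5", "c6"]))
# → {'c1': [0], 'c2': [1], 'c3': [2], 'c4': [3], 'c5': [], 'c6': []}
```

The second call shows why adding consumers beyond the partition count gains nothing: each partition is consumed by only one member of the group, so the extra consumers have no partition to read from.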
Role of ZooKeeper in Kafka
Apache Kafka heavily depends on Apache Zookeeper, a distributed configuration and synchronization service. Zookeeper acts as the coordination layer between Kafka brokers and consumers, facilitating communication and maintaining metadata. Kafka relies on Zookeeper for several critical tasks:
- Storing metadata: Kafka stores important metadata such as topic configuration, broker details, and (in older Kafka versions) consumer offsets in Zookeeper; newer versions keep consumer offsets in an internal Kafka topic instead.
- Failure recovery: Zookeeper replicates its data across the ensemble. If a Kafka broker or a single Zookeeper node fails, the rest of the system keeps running, and the failed component recovers its state when it rejoins, minimizing downtime.
- Leader election: Zookeeper handles leader election in Kafka, ensuring that if a leader broker fails, a new leader is chosen seamlessly.
Since Kafka relies on Zookeeper for metadata and synchronization, it ensures high availability and resilience even during failures.
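The leader-election idea can be sketched as follows: among a partition's replicas, the first replica that is still alive becomes the new leader. This is a deliberately simplified model of the outcome, not the actual Zookeeper-coordinated protocol, and the function name is illustrative.

```python
# Simplified sketch of leader election for one partition.
# Mirrors the idea (not the exact protocol) of ZooKeeper-coordinated failover.

def elect_leader(replicas, live_brokers):
    """Return the first live replica (preferred leader first),
    or None if every replica is down."""
    for broker_id in replicas:
        if broker_id in live_brokers:
            return broker_id
    return None

replicas = [1, 2, 3]                       # preferred leader listed first
print(elect_leader(replicas, {1, 2, 3}))   # → 1 (preferred leader is alive)
print(elect_leader(replicas, {2, 3}))      # → 2 (broker 1 failed; failover)
print(elect_leader(replicas, set()))       # → None (partition unavailable)
```

The middle case is the one the bullet above describes: when the leader broker fails, a surviving replica is promoted so producers and consumers can continue with minimal interruption.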
To learn more about Zookeeper, please refer to the official Zookeeper documentation.