Apache Kafka – Cluster Setup
Setting up a Kafka Cluster involves running multiple Kafka broker instances on different machines (or on different ports on the same machine). Kafka brokers in a cluster work together to distribute partitions of topics, ensuring high availability, load balancing, and fault tolerance.
This guide provides an example of how to set up a Kafka cluster with 3 brokers running on separate machines (or on separate ports in a local machine for simplicity).
Prerequisites:
- Java: Make sure Java 8 or later is installed on your machines.
- Kafka: Download Apache Kafka from the official Apache Kafka downloads page.
- Zookeeper: Kafka versions before 2.8.0 require Zookeeper to manage the cluster. From 2.8.0 onward, KRaft mode is available as an alternative, which removes the Zookeeper dependency; this guide uses Zookeeper.
Step 1: Download and Extract Apache Kafka
- Download Kafka: Go to the Apache Kafka download page and download the latest version.
- Extract Kafka: Extract the Kafka archive to your desired directory. For example:
tar -xvf kafka_2.13-2.8.0.tgz
cd kafka_2.13-2.8.0
Step 2: Configure Zookeeper
Kafka relies on Zookeeper for cluster management (leader election, partition management, etc.). In a multi-broker Kafka cluster, Zookeeper coordinates the brokers.
If you’re running Kafka with Zookeeper (rather than KRaft mode), you’ll need to start a Zookeeper instance first.
- Start Zookeeper: Kafka comes with a default Zookeeper configuration file (config/zookeeper.properties), which you can use to start a Zookeeper server. To start Zookeeper:
bin/zookeeper-server-start.sh config/zookeeper.properties
By default, Zookeeper will run on localhost:2181.
Step 3: Configure Kafka Brokers
Now, let’s configure the Kafka brokers. We will set up 3 Kafka brokers in this example.
Broker 1 Configuration (Server 1)
Copy the server.properties file to create separate configurations for each broker.
cp config/server.properties config/server1.properties
- Edit config/server1.properties for Broker 1 configuration:
- Set broker.id to 1 (this is the unique ID for the broker).
- Configure listeners to listen on port 9092.
- Set log.dirs to a directory where logs will be stored (e.g., /tmp/kafka-logs).
- Set zookeeper.connect to the address of the Zookeeper server.
Example of config/server1.properties:
# Broker 1 Configuration
broker.id=1
listeners=PLAINTEXT://localhost:9092
log.dirs=/tmp/kafka-logs
zookeeper.connect=localhost:2181
num.partitions=3
Broker 2 Configuration (Server 2)
Copy the server.properties file to create a configuration for Broker 2:
cp config/server.properties config/server2.properties
- Edit config/server2.properties for Broker 2 configuration:
- Set broker.id to 2.
- Configure listeners to listen on port 9093.
- Set log.dirs to a different directory (e.g., /tmp/kafka-logs2).
- Set zookeeper.connect to the address of the Zookeeper server (same as Broker 1).
Example of config/server2.properties:
# Broker 2 Configuration
broker.id=2
listeners=PLAINTEXT://localhost:9093
log.dirs=/tmp/kafka-logs2
zookeeper.connect=localhost:2181
num.partitions=3
Broker 3 Configuration (Server 3)
Copy the server.properties file to create a configuration for Broker 3:
cp config/server.properties config/server3.properties
- Edit config/server3.properties for Broker 3 configuration:
- Set broker.id to 3.
- Configure listeners to listen on port 9094.
- Set log.dirs to a different directory (e.g., /tmp/kafka-logs3).
- Set zookeeper.connect to the address of the Zookeeper server (same as Broker 1).
Example of config/server3.properties:
# Broker 3 Configuration
broker.id=3
listeners=PLAINTEXT://localhost:9094
log.dirs=/tmp/kafka-logs3
zookeeper.connect=localhost:2181
num.partitions=3
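The three per-broker property files differ only in broker.id, listener port, and log directory, so for a local test cluster they can be generated programmatically rather than edited by hand. A minimal Python sketch, assuming the same port scheme and paths as the examples above:

```python
def broker_config(broker_id):
    """Render the properties file for one broker as a string.

    Mirrors the examples above: broker 1 -> port 9092 and /tmp/kafka-logs,
    broker 2 -> 9093 and /tmp/kafka-logs2, broker 3 -> 9094 and /tmp/kafka-logs3.
    """
    port = 9091 + broker_id
    log_suffix = "" if broker_id == 1 else str(broker_id)
    return "\n".join([
        f"# Broker {broker_id} Configuration",
        f"broker.id={broker_id}",
        f"listeners=PLAINTEXT://localhost:{port}",
        f"log.dirs=/tmp/kafka-logs{log_suffix}",
        "zookeeper.connect=localhost:2181",
        "num.partitions=3",
    ]) + "\n"

# Example usage (writes config/server1.properties .. server3.properties):
# for i in (1, 2, 3):
#     with open(f"config/server{i}.properties", "w") as fh:
#         fh.write(broker_config(i))
```

This is just a convenience for local experiments; on real multi-machine deployments each broker's listeners and log.dirs will reflect that machine's hostname and disks.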
Step 4: Start Kafka Brokers
Now that you have configured 3 Kafka brokers, you need to start each broker instance.
- Start Broker 1:
bin/kafka-server-start.sh config/server1.properties
- Start Broker 2: Open a new terminal and run:
bin/kafka-server-start.sh config/server2.properties
- Start Broker 3: Open another terminal and run:
bin/kafka-server-start.sh config/server3.properties
At this point, you have 3 Kafka brokers running on ports 9092, 9093, and 9094. They are all connected to the same Zookeeper instance.
Step 5: Verify Kafka Cluster
Once the brokers are started, you can verify that your Kafka cluster is running correctly by listing the topics.
- List Topics: Run the following command to list the topics across the entire cluster:
bin/kafka-topics.sh --list --bootstrap-server localhost:9092
- Check Broker Status: You can check the status of your brokers by running:
bin/kafka-broker-api-versions.sh --bootstrap-server localhost:9092
This will show you the API versions supported by the brokers in the cluster.
Step 6: Create a Topic with Multiple Partitions
In a Kafka cluster, topics are divided into partitions. Each partition is managed by a broker, and the data is distributed among the brokers in the cluster.
To create a topic with 3 partitions:
bin/kafka-topics.sh --create --topic insurance_policies --bootstrap-server localhost:9092 --partitions 3 --replication-factor 2
- insurance_policies : The name of the topic.
- --partitions 3 : The number of partitions for the topic.
- --replication-factor 2 : The number of replicas of each partition (to ensure fault tolerance).
Kafka will automatically replicate the topic’s data across brokers based on the replication factor you specify.
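To see how replicas end up spread across brokers, consider a simplified round-robin placement: partition p's leader goes on broker p mod n, and its followers on the next brokers in ring order. Kafka's actual assignment also randomizes the starting broker and can be rack-aware, so this sketch is illustrative only:

```python
def assign_replicas(num_partitions, replication_factor, brokers):
    """Simplified round-robin replica placement.

    Partition p's replicas are replication_factor consecutive brokers
    in ring order, starting at brokers[p % n]. The first replica in
    each list plays the role of the initial leader.
    """
    n = len(brokers)
    assert replication_factor <= n, "replication factor cannot exceed broker count"
    return {
        p: [brokers[(p + r) % n] for r in range(replication_factor)]
        for p in range(num_partitions)
    }

# 3 partitions, replication factor 2, brokers 1..3 (matching the setup above):
assignment = assign_replicas(3, 2, [1, 2, 3])
# -> {0: [1, 2], 1: [2, 3], 2: [3, 1]}
```

Note that with replication factor 2, losing any single broker still leaves one live replica of every partition, which is why the cluster stays available through a broker failure.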
Step 7: Produce and Consume Messages
- Produce Messages: To produce messages to the topic insurance_policies, run:
bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic insurance_policies
Type messages like:
{"policyId": "P12345", "customer": "John Doe", "coverage": "Health", "amount": 50000}
Press Enter after each message.
- Consume Messages: To consume messages from the topic insurance_policies, run:
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic insurance_policies --from-beginning
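Behind the scenes, the producer decides which partition each message lands on: for keyed messages, Kafka's default partitioner hashes the key (with murmur2) modulo the partition count, so all messages with the same key go to the same partition and keep their relative order. A dependency-free sketch of the same principle, using md5 as a stand-in hash (the resulting partition numbers will differ from Kafka's):

```python
import hashlib

def pick_partition(key, num_partitions):
    """Map a message key to a partition deterministically.

    Kafka's default partitioner uses murmur2; md5 is used here only
    to keep the sketch self-contained. Same key -> same partition.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# All events for policy P12345 land on one partition,
# so per-policy ordering is preserved:
p1 = pick_partition("P12345", 3)
p2 = pick_partition("P12345", 3)
assert p1 == p2
```

Messages produced without a key are instead spread across partitions (sticky/round-robin batching in recent Kafka versions), trading per-key ordering for even load.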
Step 8: Scaling the Kafka Cluster
To scale Kafka, simply add more brokers. Follow these steps:
- Add More Brokers: Add additional Kafka brokers by replicating the configuration of existing brokers and assigning new broker.id and port numbers.
- Rebalance Partitions: As you add more brokers, you may need to rebalance partitions across brokers to ensure even data distribution. Kafka provides tools for rebalancing, like the kafka-reassign-partitions.sh script.
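The kafka-reassign-partitions.sh tool takes a JSON file (via --reassignment-json-file) describing the desired replica placement. A small sketch that builds such a file for the insurance_policies topic after adding a hypothetical fourth broker; the exact placement shown is illustrative:

```python
import json

def reassignment_json(topic, assignment):
    """Build the JSON body expected by kafka-reassign-partitions.sh.

    `assignment` maps partition number -> ordered list of broker ids.
    """
    return json.dumps({
        "version": 1,
        "partitions": [
            {"topic": topic, "partition": p, "replicas": replicas}
            for p, replicas in sorted(assignment.items())
        ],
    }, indent=2)

# Spread 3 partitions (replication factor 2) over brokers 1-4:
plan = reassignment_json("insurance_policies", {0: [1, 2], 1: [3, 4], 2: [2, 4]})
# Write `plan` to a file, then run:
#   bin/kafka-reassign-partitions.sh --bootstrap-server localhost:9092 \
#     --reassignment-json-file plan.json --execute
```

Reassignment copies partition data between brokers, so on a busy cluster it is worth throttling the move (the tool supports a throttle option) and verifying completion before treating the rebalance as done.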
Conclusion
Congratulations! You’ve successfully set up a Kafka Cluster with 3 brokers. This cluster is now capable of handling large-scale data streaming workloads. You can add more brokers, topics, partitions, and consumers as needed to meet the demands of your application.