Kafka
Use Case - Logistics revisited
Pentaho Data Integration
kafka-docker-compose is a tool that allows you to easily configure and set up Apache Kafka along with its components, such as Kafka brokers, ZooKeeper, Kafka Connect, and more, in a Docker environment. Using docker-compose, you can define and run multi-container Docker applications where each service (like a Kafka broker or ZooKeeper) is defined in a docker-compose.yml file.
This approach simplifies the complexities of network configurations between these services and ensures that you have a reproducible and isolated environment for development, testing, and potentially production scenarios. It allows for easy scaling of Kafka brokers and other services within your cluster.
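The generated file will be more elaborate, but as a hand-written sketch, a single-broker ZooKeeper-mode stack in docker-compose.yml can look like this (the image tags, ports, and listener settings below are illustrative assumptions, not the tool's actual output):

```yaml
version: "3"
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:7.5.0   # assumed image/tag
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
  kafka:
    image: confluentinc/cp-kafka:7.5.0       # assumed image/tag
    depends_on:
      - zookeeper
    ports:
      - "9092:9092"
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
```

Each service runs as its own container; docker-compose wires them into a shared network so the broker can reach ZooKeeper by its service name.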
The kafka-docker-compose tool requires Python 3 and Jinja2 to be installed.
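kafka-docker-compose uses Jinja2 to render docker-compose.yml from templates. Purely to illustrate the templating idea with the standard library alone, here is a sketch using string.Template; Jinja2 adds loops, conditionals, and filters on top of this kind of substitution, and the template fragment and variable names below are hypothetical, not the tool's real templates:

```python
from string import Template

# Hypothetical fragment of a compose template; the real
# kafka-docker-compose templates are Jinja2 and far more complete.
compose_template = Template("""\
services:
  kafka:
    image: $image
    ports:
      - "$host_port:9092"
""")

rendered = compose_template.substitute(
    image="confluentinc/cp-kafka:7.5.0",
    host_port="9092",
)
print(rendered)
```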
Create a Kafka directory.
cd
mkdir -p ~/Kafka
Follow the instructions in the link below to create the cluster.
Let's start with a simple cluster that consists of 1 broker and 1 controller (ZooKeeper mode).
Execute the following command (adds JMX agents for Prometheus & Grafana).
Change the following values in the generated docker-compose.yml file:
Execute the generated docker-compose.yml file (e.g. with docker compose up -d).

KaDeck
KaDeck is a specialized tool designed for working with Apache Kafka, offering a user-friendly interface for Kafka monitoring, management, and data exploration. It serves as a comprehensive client that allows developers, data engineers, and operations teams to interact with their Kafka clusters more efficiently.
The tool provides real-time visibility into Kafka topics, consumer groups, and messages, allowing users to browse and search through data streams with advanced filtering capabilities. This makes troubleshooting and debugging significantly easier compared to command-line alternatives. KaDeck also offers features for monitoring cluster performance, analyzing consumer lag, and visualizing message flow throughout the system.
One of KaDeck's key strengths is its ability to decode various message formats automatically (including Avro, JSON, and Protocol Buffers), presenting the data in a structured, readable format. The tool supports both cloud-based and on-premises Kafka deployments, making it versatile for different enterprise environments. For teams working extensively with event streaming platforms, KaDeck helps bridge the gap between technical Kafka operations and business-relevant data insights.
Try KaDeck for free!
Run the following command, changing the ports to prevent conflicts.
Open the Lenses HQ at: http://localhost:8070

Log in with:
Username: admin
Password: admin
We'll generate IoT sensor data using PDI.
Start Pentaho Data Integration:
Kafka Producer
The Kafka Producer step allows you to publish messages in near real time to a Kafka broker. Within a transformation, the Kafka Producer step publishes a stream of records to a Kafka topic.
Open the following transformation:
~/Workshop--Data-Integration/Labs/Module 3 - Data Sources/Streaming Data/04 Kafka/tr_kafka_producer.ktr
Just for one vehicle_id (111), every 5 seconds
Timestamp added
Remove some fields
JavaScript to generate the sensor data
Dummy step to collect the data streams
Concatenate the fields into a 'message' payload
Kafka Producer: connect to the broker and publish the message/payload
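The steps above can be sketched in plain Python. The field names, value ranges, and payload format here are illustrative assumptions, not necessarily what the .ktr actually emits:

```python
import json
import random
import time

def generate_sensor_reading(vehicle_id: int = 111) -> str:
    """Mimic the transformation: one vehicle, a timestamp, generated
    sensor values, all concatenated into a single message payload."""
    reading = {
        "vehicle_id": vehicle_id,                              # fixed vehicle 111
        "timestamp": int(time.time() * 1000),                  # timestamp added
        "speed_kmh": round(random.uniform(0.0, 120.0), 1),     # assumed field
        "engine_temp_c": round(random.uniform(70.0, 110.0), 1) # assumed field
    }
    # Concatenate the fields into the 'message' payload.
    return json.dumps(reading)

payload = generate_sensor_reading()
```

In the .ktr, the loop and the 5-second cadence come from the generator step; here a scheduler or a simple time.sleep(5) loop would play that role.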
Double-click the Kafka Producer step and configure it with the following settings.
Setup

Connection
Select a connection type:
Direct: Specify the Bootstrap servers from which you want to receive the Kafka streaming data.
Cluster: Specify the Hadoop cluster configuration from which you want to retrieve the Kafka streaming data. In a Hadoop cluster configuration, you can specify information like host names and ports for HDFS, Job Tracker, security, and other big data cluster components. Multiple servers can be specified if these are part of the same cluster.
Client ID
The unique Client identifier, used to identify and set up a durable connection path to the server to make requests and to distinguish between different clients.
Topic
The category to which records are published.
Key Field
In Kafka, all messages can be keyed, allowing for messages to be distributed to partitions based on their keys in a default routing scheme. If no key is present, messages are randomly distributed to partitions.
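The effect of keying can be sketched as follows. Kafka's default partitioner actually hashes the key with murmur2; the MD5-based hash below is a simplified stand-in, so the point is only "same key, same partition", not the exact partition numbers Kafka would pick:

```python
import hashlib

def partition_for(key: bytes, num_partitions: int) -> int:
    # Simplified stand-in for Kafka's default partitioner
    # (which uses murmur2, not MD5).
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# Every message keyed by the same vehicle lands in the same partition,
# which preserves per-vehicle ordering.
p1 = partition_for(b"vehicle-111", 6)
p2 = partition_for(b"vehicle-111", 6)
```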
Message Field
The individual record contained in a topic.
Options
The Options tab enables you to secure the connection to the broker.

Kafka Consumer
The Kafka Consumer step pulls streaming data from Kafka into a transformation. Within the Kafka Consumer step, you enter the path of a child transformation, which executes according to message batch size or duration in near real time. The child transformation must start with the Get records from stream step.
Additionally, from the Kafka Consumer step, you can select a step in the child transformation to stream records back to the parent transformation. This allows records processed by a Kafka Consumer step in a parent transformation to be passed downstream to any other steps included within the same parent transformation.
Open the following transformation:
~/Workshop--Data-Integration/Labs/Module 3 - Data Sources/Streaming Data/04 Kafka/tr_kafka_consumer.ktr
Get Records
The Get records from stream step returns the records that were previously generated, in this case, by the Kafka Consumer step.
Open the following transformation:
~/Workshop--Data-Integration/Labs/Module 3 - Data Sources/Streaming Data/04 Kafka/tr_process_sensor_data.ktr