# Kafka

{% hint style="info" %}

#### Apache Kafka

Apache Kafka is a distributed event streaming platform for building real-time data pipelines and applications. It lets you publish, subscribe to, store, and process streams of records in a fault-tolerant way. Kafka is designed for high-volume publish-subscribe messaging and streaming, and is built to be durable, fast, and scalable. In essence, Kafka lets you model your data as a continuous stream of events that can be consumed in real time or stored for later processing. Common use cases include messaging, website activity tracking, metrics collection and monitoring, log aggregation, stream processing, and event sourcing.
{% endhint %}
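The idea of a stream that can be consumed in real time or replayed later can be illustrated with a toy, in-memory model. This is only a sketch in plain Python: `TopicLog`, `Consumer`, and the sample event strings are hypothetical names chosen for illustration and are not part of Kafka's API or of the Pentaho Kafka plugin.

```python
from dataclasses import dataclass, field


@dataclass
class TopicLog:
    """Toy single-partition, append-only log (illustrative, not Kafka's API)."""
    records: list = field(default_factory=list)

    def publish(self, value):
        """Append a record and return the offset it was stored at."""
        self.records.append(value)
        return len(self.records) - 1

    def read(self, offset):
        """Return every record stored at or after the given offset."""
        return self.records[offset:]


@dataclass
class Consumer:
    """Hypothetical consumer that tracks its own position in the log."""
    log: TopicLog
    offset: int = 0

    def poll(self):
        """Fetch records not yet seen by this consumer and advance its offset."""
        batch = self.log.read(self.offset)
        self.offset += len(batch)
        return batch


log = TopicLog()
live = Consumer(log)

log.publish("page_view:/home")
log.publish("page_view:/pricing")
first_batch = live.poll()    # consumed shortly after being published

log.publish("page_view:/docs")
second_batch = live.poll()   # only the record published since the last poll

late = Consumer(log)         # joins later and replays the stored events
replayed = late.poll()
```

Because records stay in the log after being read, the late consumer sees the full history while the live consumer only sees what is new. Real Kafka adds partitioning, replication, and retention policies on top of this basic model.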

<figure><img src="/files/MNKdi0ZYkYMy2QpNSHur" alt=""><figcaption><p>Kafka Cluster</p></figcaption></figure>

{% hint style="info" %}

#### **Kafka with KRaft**

Kafka is deprecating ZooKeeper and replacing it with KRaft (Kafka Raft). If you haven't come across KRaft before, here is a short overview.

The core idea behind the new quorum controller setup is that Kafka is itself log-based. Changes to the metadata can be represented as messages stored in a log, which can then be streamed to subscribers.

In KRaft mode, we can designate multiple Kafka instances as controllers. A single node can work solely as a broker, solely as a controller, or perform both roles at once (very handy in smaller clusters). This is different from the legacy setup, where we had only one controller. Even though we can have multiple controllers, only one is active at any given moment, and all the others are on standby. If the active controller fails, one of the standby controllers takes over its tasks.

Only the active controller can make changes to the cluster’s metadata. It persists the updates in a special internal topic (with just one partition) called *\_\_cluster\_metadata*. Messages from that topic are then replicated by all the other controllers, so each of them keeps a nearly up-to-date copy of the data in its local replica. This is a big deal: a new controller no longer has to fetch all the data from ZooKeeper. It already has the data in its local log and, at most, needs to catch up on a few missed messages.
{% endhint %}
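The active/standby behaviour described above can be sketched with a small in-memory model. This is a deliberately simplified illustration, not Kafka code: the `Controller` and `Quorum` classes and the sample metadata records are hypothetical, and the failover step simply picks the most caught-up standby, whereas real KRaft elects a leader through its Raft-based quorum protocol.

```python
class Controller:
    """Hypothetical controller node holding a local replica of the metadata log."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.metadata_log = []   # stands in for the single-partition metadata topic
        self.alive = True


class Quorum:
    """Toy controller quorum: one active controller, the rest on standby."""

    def __init__(self, node_ids):
        self.controllers = [Controller(n) for n in node_ids]
        self.active = self.controllers[0]

    def write(self, record):
        """Only the active controller appends metadata changes."""
        self.active.metadata_log.append(record)

    def replicate(self):
        """Standbys copy only the records they have not seen yet."""
        for c in self.controllers:
            if c.alive and c is not self.active:
                missing = self.active.metadata_log[len(c.metadata_log):]
                c.metadata_log.extend(missing)

    def fail_active(self):
        """Simulate a crash of the active controller and promote a standby.

        Assumption: the most caught-up standby is promoted. This replaces
        the real election, which is not modelled here.
        """
        self.active.alive = False
        standbys = [c for c in self.controllers if c.alive]
        self.active = max(standbys, key=lambda c: len(c.metadata_log))


quorum = Quorum([1, 2, 3])
quorum.write("create topic orders")
quorum.write("add partition orders-1")
quorum.replicate()

quorum.fail_active()                    # controller 1 goes down
new_active = quorum.active              # a standby takes over with its local log
log_at_takeover = list(new_active.metadata_log)

quorum.write("create topic payments")   # the new active controller keeps writing
quorum.replicate()
```

The promoted controller already holds the metadata in its local log at takeover, which is the point the text makes about not having to reload everything from an external store.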

{% embed url="<https://kafka.apache.org/documentation/#kraft>" %}

To see the Kafka EE plugin in action, see: [Kafka Use Case](https://academy.pentaho.com/pentaho-data-integration/use-cases/streaming-data/kafka)

