So I Heard You Want to Learn Kafka
If you are a security professional working within the Kubernetes ecosystem, there’s a high chance that, sooner or later, you’ll face Apache Kafka.
With names like Netflix, LinkedIn, Microsoft, Goldman Sachs, and many others listed as corporate users, it is a technology heavily used by high-performance applications. At the same time, the security community often perceives Kafka as an obscure system.
This post, part of the “Kubernetes Primer for Security Professionals” series, aims to help security professionals approach Kafka by walking through the journey I undertook: first getting the basics down, and then focusing on its security aspects.
Hopefully this will help a little in your own journey to understand Kafka.
What is Kafka
To start, I really like Confluent’s definition of Kafka:
Apache Kafka is a community distributed event streaming platform capable of handling trillions of events a day. Initially conceived as a messaging queue, Kafka is based on an abstraction of a distributed commit log. Since being created and open sourced by LinkedIn in 2011, Kafka has quickly evolved from messaging queue to a full-fledged event streaming platform.
In a few lines, it concisely summarises what Kafka is, and how it has evolved since its inception. But what exactly does “distributed streaming platform” mean?
The “Introduction” page of the official Kafka website does a decent job in explaining that a streaming platform has three key capabilities:
Capability | Description
---|---
Publish and Subscribe | Publish (write) and subscribe to (read) streams of records, similar to a message queue or enterprise messaging system.
Store | Store streams of records in a fault-tolerant, durable way.
Process | Process streams of records as they occur.
In addition, it is important to be aware that Kafka exposes five core APIs (a quick way to try the first two in practice follows the list):
- The Producer API: allows an application to publish a stream of records to one or more Kafka topics.
- The Consumer API: allows an application to subscribe to one or more topics and process the stream of records produced to them.
- The Streams API: allows an application to act as a stream processor, consuming an input stream from one or more topics and producing an output stream to one or more output topics, effectively transforming the input streams into output streams.
- The Connector API: allows building and running reusable producers or consumers that connect Kafka topics to existing applications or data systems.
- The Admin API: allows managing and inspecting topics, brokers, and other Kafka objects.
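To make the first two APIs a bit more tangible, here is a minimal sketch using the console clients that ship with the Kafka distribution (the topic name and broker address are placeholders, and older Kafka versions use --broker-list instead of --bootstrap-server for the producer):
# Publish a record to the "test-topic" topic (assumes a broker at localhost:9092)
echo "hello kafka" | kafka-console-producer.sh \
  --bootstrap-server localhost:9092 --topic test-topic

# Subscribe to the same topic and replay every record from the beginning
kafka-console-consumer.sh \
  --bootstrap-server localhost:9092 --topic test-topic --from-beginning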

Once you’ve grasped these foundational concepts, you’ll have to familiarise yourself with the core abstractions Kafka provides for a stream of records: topics and partitions. Once again, the “Introduction” page of the official Kafka website does a decent job in explaining these concepts.
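As a minimal illustration of these abstractions, the snippet below creates a topic split into three partitions with the stock kafka-topics.sh tool, and then describes it to show the partition layout (topic name and broker address are placeholders):
# Create a topic with 3 partitions and a replication factor of 1
kafka-topics.sh --bootstrap-server localhost:9092 \
  --create --topic payments --partitions 3 --replication-factor 1

# Describe it: each partition has a leader broker and a set of replicas
kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic payments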


At the same time, I’ve also found “Thorough Introduction to Apache Kafka” particularly useful to understand how Kafka works under the hood and how it manages data distribution and replication.
This should be enough to get you started, but I highly recommend exploring the inner workings of Kafka with resources like “Kafka: The Definitive Guide” (available for free download).
A Special Mention to Zookeeper
If you want to deal with Kafka, you should also familiarise yourself, at least at a high level, with Apache Zookeeper: an open source Apache project that provides a centralized service for configuration information, naming, synchronization, and group services in large distributed systems.
Zookeeper is important for Kafka, as Kafka uses it to store its metadata: data such as the location of partitions and the configuration of topics are stored outside of Kafka itself, in a separate Zookeeper cluster.

It should be noted, though, that this is meant to change: in 2019 a plan was outlined to break Kafka’s dependency on Zookeeper and bring metadata management into Kafka itself.
The dependency hasn’t been removed yet, so expect to see Zookeeper around for a while (especially in legacy clusters).
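If you want to see this dependency for yourself, the zookeeper-shell utility bundled with Kafka lets you browse the znodes where Kafka keeps its metadata (the Zookeeper address below is a placeholder):
# List the IDs of the brokers currently registered in Zookeeper
zookeeper-shell.sh localhost:2181 ls /brokers/ids

# Dump the metadata registered by a single broker (listeners, endpoints, rack, ...)
zookeeper-shell.sh localhost:2181 get /brokers/ids/0

# Per-topic configuration overrides also live in Zookeeper
zookeeper-shell.sh localhost:2181 ls /config/topics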
Getting Some Hands-On Experience
For some hands-on experience, there are plenty of tutorials covering how to deploy Kafka in a (more or less) straightforward way.
I personally couldn’t find anything as straightforward as I wanted (especially for a testing lab, rather than a production environment), so I created k8s-lab-plz, a modular Kubernetes lab which provides an easy and streamlined way to deploy a test cluster with support for different components. It currently supports Vault, ELK (Elasticsearch, Kibana, Filebeat), Metrics (Prometheus, Grafana, Alertmanager), and, most importantly, Kafka (Kafka, Zookeeper, KafkaExporter, Entity Operator).
Each component (like Kafka) can be deployed in a repeatable way with one single command:
❯ plz run //components/kafka:deploy
❯ kubectl -n kafka get pods
NAME                                            READY   STATUS    RESTARTS   AGE
kafka-cluster-entity-operator-77b9f56dd-625m7   3/3     Running   0          22s
kafka-cluster-kafka-0                           2/2     Running   0          47s
kafka-cluster-kafka-exporter-795f5ccb5b-2hlff   1/1     Running   0          74s
kafka-cluster-zookeeper-0                       1/1     Running   0          2m59s
strimzi-cluster-operator-54565f8c56-nf24m       1/1     Running   0          5m46s
❯ plz run //components/kafka:list-topics
[+] Starting container...
my-topic
test-topic
test.topic
pod "kafka-interact-list" deleted
I’ll leave you to the “Kafka Setup” documentation of k8s-lab-plz if you want to explore it further.
What About Security?
Security of Kafka deployments mainly revolves around three components:
- Transport Layer Encryption (TLS): allows data to be encrypted in flight between clients (producers and consumers) and brokers.
- Authentication (mTLS/SASL): allows the identity of producers and consumers to be verified, by authenticating them against the Kafka cluster using mutual TLS or SASL.
- Authorization (ACLs): once clients are authenticated, access control lists (ACLs) can be leveraged to determine whether a particular client is authorised to read from or write to a given topic.

Transport Layer Encryption
As pointed out by the Banzai Cloud team in the “Transport Layer Encryption” section of their “Kafka security on Kubernetes, automated” post, messages routed towards, within, or out of a Kafka cluster are unencrypted by default. By enabling TLS, data can be transmitted securely over the network, preventing man-in-the-middle attacks.
If you are looking at rolling out TLS onto a Kafka cluster, I’ve found the “Encryption with SSL” section of the Confluent documentation to be a good starting point, as it provides a detailed walkthrough tutorial.
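To give a rough idea of what this looks like from the client side, the sketch below writes a minimal client-ssl.properties (paths and passwords are placeholders) and uses it with the console consumer; the broker side additionally needs a TLS listener and its own keystore, which the Confluent walkthrough covers in detail:
# Minimal TLS configuration for a Kafka client.
# The truststore must contain the CA that signed the brokers' certificates.
cat <<'EOF' > client-ssl.properties
security.protocol=SSL
ssl.truststore.location=/etc/kafka/secrets/client.truststore.jks
ssl.truststore.password=changeit
EOF

# Consume over the brokers' TLS listener (9093 is a common convention)
kafka-console-consumer.sh --bootstrap-server kafka.example.com:9093 \
  --consumer.config client-ssl.properties \
  --topic test-topic --from-beginning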
Authentication
There are two common ways of providing authentication:
- TLS: TLS-based authentication basically means mutual authentication (mTLS), which ensures that traffic is both secure and trusted in both directions between two parties. In practice, this involves the provisioning of certificates to clients, so that the Kafka brokers will be able to verify their identity.
- SASL: with SASL-based authentication, the authentication mechanism itself is separated from the Kafka protocol. SASL supports many different mechanisms, as nicely described by “Introduction to Apache Kafka Security”. Below you can see a summary:
Method | Description
---|---
SASL/PLAIN | Simple username/password authentication. Credentials travel as clear text within the SASL exchange, so it should only be used together with TLS encryption.
SASL/SCRAM (SCRAM_SSL) | Salted Challenge Response Authentication Mechanism (SCRAM-SHA-256/512). Salted, hashed credentials are stored in Zookeeper, and passwords are never sent in the clear.
SASL/GSSAPI (Kerberos) | Kerberos-based authentication, typically backed by an existing Kerberos infrastructure such as Active Directory.
SASL/OAUTHBEARER | Authentication based on OAuth 2.0 bearer tokens.
For practical tips on how to roll out authentication to a Kafka cluster, the Confluent documentation has got you covered for both TLS (“Encryption and Authentication with SSL”) and SASL (“Authentication with SASL”).
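As a taster, the sketch below shows roughly what rolling out SASL/SCRAM looks like: credentials are registered with kafka-configs.sh, and clients then present them through a JAAS configuration (usernames, passwords, paths, and addresses are placeholders, and the brokers need a SASL_SSL listener enabled):
# Register SCRAM-SHA-512 credentials for the user "alice"
# (stored salted and hashed in Zookeeper by the default implementation)
kafka-configs.sh --zookeeper localhost:2181 --alter \
  --add-config 'SCRAM-SHA-512=[password=alice-secret]' \
  --entity-type users --entity-name alice

# Client-side configuration: authenticate with SCRAM over TLS
cat <<'EOF' > client-sasl.properties
security.protocol=SASL_SSL
sasl.mechanism=SCRAM-SHA-512
sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required \
  username="alice" \
  password="alice-secret";
ssl.truststore.location=/etc/kafka/secrets/client.truststore.jks
ssl.truststore.password=changeit
EOF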
Authorization
There are two common ways of providing authorization:
- Access Control Lists (ACLs): Kafka, by default, provides an Authorizer implementation that uses Zookeeper to store all the ACLs.
- Open Policy Agent (OPA): OPA has a plugin for Kafka.
Authorization via ACLs
As briefly mentioned above, Kafka ships with an authorizer, a server plugin used by Kafka to authorize operations based on the principal and the resource being accessed.
To learn about the different kinds of principals (based on the authentication provider used), resources, and working with ACLs (adding, removing, etc.), “Authorization using ACLs” is quite an exhaustive resource.
To simplify the management of ACLs, tools like kafka-security-manager can also be used: it manages Kafka ACLs by leveraging an external source as the source of truth, with Zookeeper just holding a copy of the ACLs instead of being the source itself.
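To give a concrete flavour, the commands below grant a single principal read access to one topic and then list the resulting ACLs, assuming the default Zookeeper-backed authorizer (principal, topic, and addresses are illustrative):
# Allow "User:alice" to consume from the "credit-scores" topic with any consumer group
kafka-acls.sh --authorizer-properties zookeeper.connect=localhost:2181 \
  --add --allow-principal User:alice \
  --operation Read --topic credit-scores --group '*'

# List the ACLs currently attached to the topic
kafka-acls.sh --authorizer-properties zookeeper.connect=localhost:2181 \
  --list --topic credit-scores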
Authorization via OPA
The Open Policy Agent (OPA) is an open source, general-purpose policy engine that unifies policy enforcement. OPA provides a high-level declarative language (Rego) that lets you specify policy as code, and simple APIs to offload policy decision-making from your software. The really nice thing about OPA is that it can be used to enforce policies in, among others, microservices, Kubernetes, CI/CD pipelines, API gateways, and Kafka.
Explaining how OPA works is out of scope for this blog post, but I highly recommend checking it out. The OPA documentation is a good place to start to get more information about it.
The interesting part is that OPA has a plugin for providing authorization to Kafka. For example, the policy below restricts consumer access to topics containing Personally Identifiable Information (PII):
#-----------------------------------------------------------------------------
# High level policy for controlling access to Kafka.
#
# * Deny operations by default.
# * Allow operations if no explicit denial.
#
# The kafka-authorizer-opa plugin will query OPA for decisions at
# /kafka/authz/allow. If the policy decision is _true_ the request is allowed.
# If the policy decision is _false_ the request is denied.
#-----------------------------------------------------------------------------
package kafka.authz

default allow = false

allow {
    not deny
}

deny {
    is_read_operation
    topic_contains_pii
    not consumer_is_whitelisted_for_pii
}

#-----------------------------------------------------------------------------
# Data structures for controlling access to topics. In real-world deployments,
# these data structures could be loaded into OPA as raw JSON data. The JSON
# data could be pulled from external sources like AD, Git, etc.
#-----------------------------------------------------------------------------

consumer_whitelist = {"pii": {"pii_consumer"}}

topic_metadata = {"credit-scores": {"tags": ["pii"]}}

#-----------------------------------
# Helpers for checking topic access.
#-----------------------------------

topic_contains_pii {
    topic_metadata[topic_name].tags[_] == "pii"
}

consumer_is_whitelisted_for_pii {
    consumer_whitelist.pii[_] == principal.name
}

#-----------------------------------------------------------------------------
# Helpers for processing Kafka operation input. This logic could be split out
# into a separate file and shared. For conciseness, we have kept it all in one
# place.
#-----------------------------------------------------------------------------

is_write_operation {
    input.operation.name == "Write"
}

is_read_operation {
    input.operation.name == "Read"
}

is_topic_resource {
    input.resource.resourceType.name == "Topic"
}

topic_name = input.resource.name {
    is_topic_resource
}

principal = {"fqn": parsed.CN, "name": cn_parts[0]} {
    parsed := parse_user(urlquery.decode(input.session.sanitizedUser))
    cn_parts := split(parsed.CN, ".")
}

parse_user(user) = {key: value |
    parts := split(user, ",")
    [key, value] := split(parts[_], "=")
}
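If you want to test a policy like this locally before wiring it into a cluster, opa eval can evaluate it against a hand-crafted authorization request. The input below mimics the shape the policy expects (the principal and topic are illustrative values matching the whitelist above):
# Save the policy above as policy.rego, then craft a sample authorization request.
# "CN%3Dpii_consumer.example.com" is the URL-encoded DN of the consumer's certificate.
cat <<'EOF' > input.json
{
  "operation": {"name": "Read"},
  "resource": {"resourceType": {"name": "Topic"}, "name": "credit-scores"},
  "session": {"sanitizedUser": "CN%3Dpii_consumer.example.com"}
}
EOF

# Evaluate the decision the broker plugin would receive: true means the request is allowed
opa eval --data policy.rego --input input.json 'data.kafka.authz.allow'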
Conclusion
In this post, part of the “Kubernetes Primer for Security Professionals” series, I tried to collate different resources that can be used to start approaching Kafka and its security model.
Hopefully this will help a little in your own journey to understand Kafka.
If something is unclear, or if I’ve overlooked some aspects, please do let me know on Twitter @lancinimarco.