If you are a security professional working within the Kubernetes ecosystem, there’s a high chance that, sooner or later, you’ll face Apache Kafka.

With names like Netflix, Linkedin, Microsoft, Goldman Sachs, and many others listed as corporate users, it is a technology heavily used by high-performance applications. At the same time, the usual perception from the security community is that Kafka is often seen as an obscure system.

This post, part of the “Kubernetes Primer for Security Professionals” series, is going to try to help security professionals approach Kafka, by walking through the journey I undertook to get the basics first, and later to focus on the security aspects of it.

Hopefully this will help a little in your own journey to understand Kafka.


What is Kafka

To start, I really like Confluent’s definition of Kafka:

Apache Kafka is a community distributed event streaming platform capable of handling trillions of events a day. Initially conceived as a messaging queue, Kafka is based on an abstraction of a distributed commit log. Since being created and open sourced by LinkedIn in 2011, Kafka has quickly evolved from messaging queue to a full-fledged event streaming platform.

In a few lines, it concisely summarises what Kafka is, and how it has evolved since its inception. But what exactly does “distributed streaming platform” mean?

The “Introduction” page of the official Kafka website does a decent job in explaining that a streaming platform has three key capabilities:

Capability Description
Publish and Subscribe
  • Publish and subscribe to streams of records, similar to a message queue or enterprise messaging system
  • This functionality is backed by an immutable commit log
Store
  • Kafka can store streams of records in a fault-tolerant durable way
  • As such, it can act as a source of truth for distributing data across multiple nodes
Process
  • Kafka can also process streams of records as they occur
  • The Streams API allows for on-the-fly processing of data

In addition, it is imperative to be aware that Kafka is composed by 5 core APIs:

  1. The Producer API = allows an application to publish a stream of records to one or more Kafka topics.
  2. The Consumer API = allows an application to subscribe to one or more topics and process the stream of records produced to them.
  3. The Streams API = allows an application to act as a stream processor, consuming an input stream from one or more topics and producing an output stream to one or more output topics, effectively transforming the input streams to output streams.
  4. The Connector API = allows building and running reusable producers or consumers that connect Kafka topics to existing applications or data systems.
  5. The Admin API = allows managing and inspecting topics, brokers and other Kafka objects.
High Level Overview of Kafka - Courtesy of Apache Kafka
High Level Overview of Kafka - Courtesy of Apache Kafka

Once grasped these foundational topics, you’ll have to familiarise yourself with the core abstraction Kafka provides for a stream of records, namely topics and partitions. Once again, the “Introduction” page of the official Kafka website does a decent job in explaining these concepts.

At the same time, I’ve also found “Thorough Introduction to Apache Kafka” particularly useful to understand how Kafka works under the hood and how it manages data distribution and replication.

This should be enough to get you started, but I highly recommend to explore the inner workings of Kafka with resources like “Kafka: The Definitive Guide” (available for free download).

A Special Mention to Zookeeper

If you want to deal with Kafka, you should also familiarise yourself, at least at a high-level, with Apache Zookeeper: an open source Apache project that provides a centralized service for providing configuration information, naming, synchronization and group services over large clusters in distributed systems.

Zookeeper is important for Kafka, as Kafka uses it to store its metadata: data such as the location of partitions and the configuration of topics are stored outside of Kafka itself, in a separate Zookeeper cluster.

Migration Plan - Courtesy of Confluent
Migration Plan - Courtesy of Confluent

It should be noted, though, this is meant to change, as in 2019 a plan was outlined to break Kafka’s dependency from Zookeeper and bring metadata management into Kafka itself.

The dependency hasn’t been removed as of yet, so expect to see Zookeeper lying around for a while (especially for legacy clusters).

Update April 2021: Confluent announced that the early access of the KIP-500 code has been committed to trunk and is expected to be included in the upcoming 2.8 release. For the first time, you can run Kafka without ZooKeeper.

Getting Some Hands-On Experience

For some hands-on experience, there are many different ways/tutorials on how to deploy Kafka in a (more or less) straightforward way.

I personally found there wasn’t something as straightforward as I wanted (especially if we are talking about a testing lab, rather than production environments), so I created k8s-lab-plz, a modular Kubernetes Lab which provides an easy and streamlined way to deploy a test cluster with support for different components. It currently supports Vault, ELK (Elasticsearch, Kibana, Filebeats), Metrics (Prometheus, Grafana, Alertmanager), and, most importantly, Kafka (Kafka, Zookeeper, KafkaExporter, Entity Operator).

Each component (like Kafka) can be deployed in a repeatable way with one single command:

❯ plz run //components/kafka:deploy
❯ kubectl -n kafka get pods
NAME                                            READY   STATUS    RESTARTS   AGE
kafka-cluster-entity-operator-77b9f56dd-625m7   3/3     Running   0          22s
kafka-cluster-kafka-0                           2/2     Running   0          47s
kafka-cluster-kafka-exporter-795f5ccb5b-2hlff   1/1     Running   0          74s
kafka-cluster-zookeeper-0                       1/1     Running   0          2m59s
strimzi-cluster-operator-54565f8c56-nf24m       1/1     Running   0          5m46s

❯ plz run //components/kafka:list-topics
[+] Starting container...
my-topic
test-topic
test.topic
pod "kafka-interact-list" deleted

I’ll leave you to the “Kafka Setup” documentation of k8s-lab-plz if you want to explore it further.


What About Security?

Security of Kafka deployments mainly revolves around three components:

  1. Transport Layer Encryption (TLS): allows for data to be encrypted in flight between consumers and producers.
  2. Authentication (mTLS/SASL): allows to verify the identity of consumers and producers, by authenticating them against the Kafka cluster using TLS or SASL.
  3. Authorization (ACLs): once clients are authenticated, access control lists (ACL) can be leveraged to determine whether or not a particular client would be authorised to read from or write to some topic.
High Level Overview of Kafka Security Controls - Courtesy of Confluent
High Level Overview of Kafka Security Controls - Courtesy of Confluent

Transport Layer Encryption

As pointed out by the Banzai Cloud team in the “Transport Layer Encryption” section of their “Kafka security on Kubernetes, automated” post, messages routed towards, within, or out of a Kafka cluster are unencrypted by default. By enabling TLS support, data can securely be transmitted over the network while preventing man-in-the-middle attacks.

If you are looking at rolling out TLS onto a Kafka cluster, I’ve found the “Encryption with SSL” section of the Confluent documentation to be a good starting point, as it provides a detailed walkthrough tutorial.

Authentication

There are two common ways of providing authentication:

  1. TLS: TLS-based authentication basically means mutual authentication (mTLS), which ensures that traffic is both secure and trusted in both directions between two parties. In practice, this involves the provisioning of certificates to clients, so that the Kafka brokers will be able to verify their identity.
  2. SASL: with SASL-based authentication, the authentication mechanism itself is separated from the Kafka protocol. SASL supports many different mechanisms, as nicely described by “Introduction to Apache Kafka Security”. Below you can see a summary:
Method Description
SASL/PLAIN
  • Basic username/password combination, not recommended as it does not provide a sufficient level of protection (especially since, if TLS is not enabled, credentials will be transmitted in plaintext)
  • The credentials have to be stored on the Kafka brokers (i.e., each change will require a restart of the brokers)
SASL/SCRAM (SCRAM_SSL)
  • Improvement over the previous method, which consists of a username/password combination alongside a challenge (salt)
  • TLS still need to be enabled to avoid transmitting credentials in plaintext
  • Username and password hashes are stored in Zookeeper
SASL/GSSAPI (Kerberos)
  • For organizations already using Kerberos (e.g., Active Directory), this method is probably the most obvious choice given the benefits in term of security over the previous 2 methods
SASL/OAUTHBEARER
  • Allows to leverage OAuth2 for authentication
  • Since the default OAUTHBEARER implementation in Kafka creates and validates Unsecured JSON Web Tokens, its implementation for production environments must be overridden using custom login and SASL Server callback handlers

For practical tips on how to roll-out authentication to a Kafka cluster, the Confluent documentation has got you covered for both TLS (“Encryption and Authentication with SSL”) and SASL (“Authentication with SASL”).

Authorization

There are two common ways of providing authorization:

  1. Access Control Lists (ACLs): Kafka, by default, provides an Authorizer implementation that uses Zookeeper to store all the ACLs.
  2. Open Policy Agent (OPA): OPA has a plugin for Kafka.

Authorization via ACLs

As briefly mentioned above, Kafka ships with an authorizer, a server plugin used by Kafka to authorize operations based on the principal and the resource being accessed.

To learn about the different kinds of principals (based on the authentication provider used), resources, and working with ACLs (adding, removing, etc.), “Authorization using ACLs” is a quite exhaustive resource.

To simplify management of ACLs, tools like kafka-security-manager can also be used. kafka-security-manager, infact, manages Kafka ACLs by leveraging an external source as the source of truth. Zookeeper just contains a copy of the ACLs instead of being the source.

Authorization via OPA

The Open Policy Agent (OPA) is an open source, general-purpose policy engine that unifies policy enforcement. OPA provides a high-level declarative language (Rego) that allows to specify policy as code and simple APIs to offload policy decision-making from your software. The really nice thing of OPA is that can be used to enforce policies in, among others, microservices, Kubernetes, CI/CD pipelines, API gateways, Kafka, and more.

Explaining how OPA work is out of scope for this blog post, but I highly recommend to check it out. The OPA documentation is a good place to start to get more information about it.

The interesting part is that OPA has a plugin for providing authorization to Kafka. For example, the policy below restricts consumer access to topics containing Personally Identifiable Information (PII):

#-----------------------------------------------------------------------------
# High level policy for controlling access to Kafka.
#
# * Deny operations by default.
# * Allow operations if no explicit denial.
#
# The kafka-authorizer-opa plugin will query OPA for decisions at
# /kafka/authz/allow. If the policy decision is _true_ the request is allowed.
# If the policy decision is _false_ the request is denied.
#-----------------------------------------------------------------------------
package kafka.authz

default allow = false

allow {
    not deny
}

deny {
	is_read_operation
	topic_contains_pii
	not consumer_is_whitelisted_for_pii
}

#-----------------------------------------------------------------------------
# Data structures for controlling access to topics. In real-world deployments,
# these data structures could be loaded into OPA as raw JSON data. The JSON
# data could be pulled from external sources like AD, Git, etc.
#-----------------------------------------------------------------------------

consumer_whitelist = {"pii": {"pii_consumer"}}

topic_metadata = {"credit-scores": {"tags": ["pii"]}}

#-----------------------------------
# Helpers for checking topic access.
#-----------------------------------

topic_contains_pii {
	topic_metadata[topic_name].tags[_] == "pii"
}

consumer_is_whitelisted_for_pii {
	consumer_whitelist.pii[_] == principal.name
}

#-----------------------------------------------------------------------------
# Helpers for processing Kafka operation input. This logic could be split out
# into a separate file and shared. For conciseness, we have kept it all in one
# place.
#-----------------------------------------------------------------------------

is_write_operation {
    input.operation.name == "Write"
}

is_read_operation {
	input.operation.name == "Read"
}

is_topic_resource {
	input.resource.resourceType.name == "Topic"
}

topic_name = input.resource.name {
	is_topic_resource
}

principal = {"fqn": parsed.CN, "name": cn_parts[0]} {
	parsed := parse_user(urlquery.decode(input.session.sanitizedUser))
	cn_parts := split(parsed.CN, ".")
}

parse_user(user) = {key: value |
	parts := split(user, ",")
	[key, value] := split(parts[_], "=")
}

Conclusion

In this post, part of the “Kubernetes Primer for Security Professionals” series, I tried to collate different resources that can be used to start approaching Kafka and its security model.

Hopefully this will help a little in your own journey to understand Kafka.

If something is unclear, or if I’ve overlooked some aspects, please do let me know on Twitter @lancinimarco.