In today’s fast-moving world, organizations face the challenge of delivering software that is not only innovative and feature-rich, but also fast, reliable, and resilient. To stay competitive, businesses must rapidly respond to customer demands, scale efficiently, and continuously evolve their systems. As microservices.io highlights,
“For a business to thrive in today’s volatile, uncertain, complex, and ambiguous world, IT must deliver software rapidly, frequently, and reliably.”
From an architectural perspective, this is where microservices come into play. The microservice architecture has gained a lot of attention in recent years for application development. At its core, the architecture is about breaking down applications into smaller, self-contained units that are independently deployable and loosely coupled.
(Reference: https://microservices.io/)
This provides greater flexibility and resilience to software systems. Typically, each microservice handles a specific business function, and it can evolve independently, scale as needed, and adapt quickly to customer needs. This allows businesses to stay competitive at a time when speed and reliability are crucial, not only to enhance the user experience but to keep customers from becoming dissatisfied and turning to alternative products or solutions.
An important aspect to address when implementing microservices is how these independent services communicate with each other. A key question that arises here is: should the communication be synchronous or asynchronous?
Two popular communication models for microservices are Request/Response (e.g., REST APIs) and the Event-Driven Architecture (EDA).
This blog explores how Kafka enables building event-driven microservices by facilitating asynchronous, decoupled communication.
To understand more about the Event-Driven Architecture and how Kafka fits into this approach, let’s take a look at how EDA differs from traditional Request/Response models.
Request/Response Architecture vs Event-Driven Architecture
How Kafka Fits Into Event-Driven Microservices
On the topic of event-driven microservices, Apache Kafka is a widely adopted technology for facilitating asynchronous communication across microservices with resilience and high availability. Kafka acts as a centralized event hub, where events produced by services (producers) are stored and consumed by other services (consumers). To dive deeper, let's look at some of the features Kafka provides to ensure that your production application can handle the asynchronous needs of the system.
Decoupling of Services
One of the biggest benefits of introducing Kafka is that services don't need to communicate with each other directly. They produce and consume events from Kafka topics. In other words, Kafka decouples the services, allowing them to evolve independently. If a consumer service is temporarily unavailable, it can still process the events once it comes back online, ensuring fault tolerance.
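As a minimal sketch of what this decoupling looks like in code, the example below uses the plain kafka-clients Java API with a placeholder broker address and a hypothetical order-events topic: the producing service and the consuming service only know about the topic, never about each other.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;

public class DecoupledServices {

    // Service A: publishes an event and moves on; it knows nothing about its consumers.
    static void publishEvent(String eventJson) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");              // placeholder broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("order-events", eventJson)); // hypothetical topic
        }
    }

    // Service B: reads from the same topic whenever it is up; the producer is unaware of it.
    static void consumeEvents() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "service-b");                            // consumer groups are explained below
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("order-events"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, String> record : records) {
                System.out.println("Processing event: " + record.value());
            }
        }
    }
}
```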
Fundamental concepts of Kafka
To effectively use Kafka in your microservices, it’s important to understand the key building blocks:
- Cluster: A Kafka deployment typically consists of multiple servers (nodes), forming a cluster to ensure high availability and scalability.
- Broker: a broker is a single server that handles data storage and processing within a Kafka cluster.
- Topic: Think of a topic as a category or feed name to which records are sent. Some services write messages to Kafka topics (produce) while others consume events from Kafka topics.
- Partition: Topics are split into partitions for scalability and parallelism. Each partition is an ordered, immutable sequence of records. It’s important to note that Kafka only guarantees message ordering within a single partition—not across multiple partitions of the same topic. To maintain order for related messages, you can use a message key (discussed below) which Kafka uses to consistently route messages with the same key to the same partition. This ensures ordering for use cases where sequence matters.
- Message (or Record): A single unit of data. This is the event that producers send and consumers read. A message primarily consists of the following (see the producer sketch after this list):
- Key - Acts as an identifier. Kafka guarantees ordering within the same partition using the message key (messages with the same key will end up in the same partition). As an example, if your system processes customer operations identified by customerId, using customerId as the Kafka message key ensures all events for that customer are stored in the same partition and processed in order.
- Message payload - The payload is the actual data content of the message. Kafka is agnostic to the payload format and supports various serialization formats such as JSON, Avro, and Protobuf.
- Headers - Optional key-value pairs that accompany a Kafka message, which are great for metadata. They allow consuming applications to inspect message attributes, such as the event type, before deserializing the payload. This can help filter or route messages efficiently, improving performance and flexibility in event processing.
- Producer: A service or application that writes messages to Kafka topics.
- Consumer: A service or application that reads messages from Kafka topics.
- Consumer Group: A group of consumers that work together to consume data from a topic. Kafka ensures that each message is consumed by one and only one member within a consumer group. This is an important concept, particularly when deploying applications to a Kubernetes cluster: you may set a pod's replica count to more than 1, but if the replicas are not assigned the same groupId, the result is duplicate processing. On the other hand, there are use cases where multiple interested parties consume from the same Kafka topic; in such scenarios, you want to ensure they are in different consumer groups.
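To make the message anatomy concrete, here is a small, hypothetical producer sketch: the customerId is used as the key, the payload is a JSON string, and an event-type header is attached so consumers can inspect it before deserializing. The topic name, broker address, and header name are all placeholders.

```java
import java.nio.charset.StandardCharsets;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class CustomerEventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        String customerId = "customer-42";                   // key: keeps this customer's events in one partition
        String payload = "{\"customerId\":\"customer-42\",\"action\":\"ADDRESS_UPDATED\"}";

        ProducerRecord<String, String> record =
                new ProducerRecord<>("customer-events", customerId, payload); // hypothetical topic

        // Header lets consumers filter or route on event type without deserializing the payload.
        record.headers().add("event-type", "ADDRESS_UPDATED".getBytes(StandardCharsets.UTF_8));

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(record);
        }
    }
}
```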
Simplified analogy with consumer groups
Imagine a fast-food restaurant that receives orders through a mobile app. When a customer places an order, an OrderRequested event is generated.
In the kitchen, there might be two chefs responsible for preparing orders. To avoid duplication of effort, both chefs should belong to the same consumer group for the OrderRequested topic. Kafka will ensure that only one chef receives and processes each order event, so the meal is prepared just once.
Once an order is prepared, an OrderPrepared event is published. This event needs to be consumed by different teams:
- The rider team, responsible for delivering the order, should be in one consumer group. This ensures only a single rider picks up each delivery task.
- The customer notification service, which informs customers that their order is ready, should be in a different consumer group.
This allows both the rider and the customer notification services to independently consume the same OrderPrepared event without interfering with each other.
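A sketch of how the analogy maps to configuration, with hypothetical topic and group names: instances that should share the work use the same group.id, while services that each need their own copy of every event use different group.ids.

```java
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class OrderPreparedConsumers {

    static KafkaConsumer<String, String> consumerFor(String groupId) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");              // placeholder broker address
        props.put("group.id", groupId);
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(List.of("order-prepared"));                 // hypothetical topic
        return consumer;
    }

    public static void main(String[] args) {
        // Each OrderPrepared event is delivered once per consumer group:
        // the two rider instances share the work, while the notification service gets its own copy.
        KafkaConsumer<String, String> riderA   = consumerFor("rider-service");
        KafkaConsumer<String, String> riderB   = consumerFor("rider-service");        // same group
        KafkaConsumer<String, String> notifier = consumerFor("notification-service"); // separate group
        // In a real deployment, each instance would run its own poll loop.
    }
}
```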
Considerations when using Kafka for production applications
Event storage and retention
Kafka also offers event storage backed by a configurable retention period. This not only provides durable storage for events for a specified time, but also serves as an audit log that can be used to track events in the system before they are eventually moved from Kafka to a more permanent store such as a database.
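Retention is configured per topic. As an illustrative sketch (the topic name, partition/replica counts, and seven-day retention are assumptions), a topic can be created with a retention policy using the Java AdminClient:

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateAuditTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder broker address

        try (AdminClient admin = AdminClient.create(props)) {
            // Hypothetical topic with 3 partitions, replication factor 3, and 7-day retention.
            NewTopic topic = new NewTopic("order-events", 3, (short) 3)
                    .configs(Map.of("retention.ms", String.valueOf(7L * 24 * 60 * 60 * 1000)));
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}
```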
Integration with Other Services
Kafka integrates well with other systems, and connectors (via Kafka Connect) provide a robust solution for integrating with external systems.
- Source connectors ingest data from external systems into Kafka topics.
- Sink connectors export data from Kafka topics to external systems.
For example, if you want to maintain an audit table with all events produced to a Kafka topic, you can set up a sink connector to push data from the topic into your database or data store, ensuring that all events are persisted externally.
Similarly, if you want your consuming applications to act upon changes to a database table, you can use a source connector that captures database changes and publishes them as events into Kafka topics, which the consumer applications can then process.
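For the audit-table scenario, a sink connector is typically registered by submitting a JSON configuration to the Kafka Connect REST API. The sketch below assumes Confluent's JDBC sink connector and uses placeholder topic, database, and credential values; the exact properties depend on the connector you choose.

```json
{
  "name": "order-events-audit-sink",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "topics": "order-events",
    "connection.url": "jdbc:postgresql://db-host:5432/audit",
    "connection.user": "audit_writer",
    "connection.password": "********",
    "auto.create": "true",
    "insert.mode": "insert"
  }
}
```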
Monitoring & traceability
Both monitoring and traceability are critical for production-grade applications that use Kafka. Below are some essential considerations:
Health checks
When your applications depend not only on Kafka but other external systems, it is essential to implement health checks. In Kubernetes environments, these are typically wired into liveness and readiness probes. Health checks help detect broker unavailability or application issues early, preventing cascading failures that could disrupt application functionality.
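A minimal sketch of a Kafka connectivity check that a readiness or liveness endpoint could call, assuming the plain Java AdminClient and a placeholder timeout:

```java
import java.util.Properties;
import java.util.concurrent.TimeUnit;
import org.apache.kafka.clients.admin.AdminClient;

public class KafkaHealthCheck {

    private final AdminClient admin;

    public KafkaHealthCheck(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        this.admin = AdminClient.create(props);
    }

    /** Returns true if the cluster responds with at least one broker within the timeout. */
    public boolean isHealthy() {
        try {
            return !admin.describeCluster().nodes().get(5, TimeUnit.SECONDS).isEmpty();
        } catch (Exception e) {
            return false;   // broker unreachable or request timed out
        }
    }
}
```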
Prometheus & Grafana
Similar to health checks, you should also expose Prometheus metrics that are scraped and visualized via Grafana dashboards. Key metrics to monitor include event production and consumption success/failure counts and request latencies.
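A small sketch of the kind of counters worth exposing, using Micrometer as an assumed metrics library (the metric and tag names are illustrative); these would then be scraped by Prometheus and charted in Grafana.

```java
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;

public class OrderEventMetrics {

    private final Counter consumedSuccess;
    private final Counter consumedFailure;

    public OrderEventMetrics(MeterRegistry registry) {
        // Same metric name, distinguished by an outcome tag, so dashboards can plot both series.
        this.consumedSuccess = Counter.builder("order.events.consumed")
                .tag("outcome", "success")
                .register(registry);
        this.consumedFailure = Counter.builder("order.events.consumed")
                .tag("outcome", "failure")
                .register(registry);
    }

    public void recordSuccess() { consumedSuccess.increment(); }
    public void recordFailure() { consumedFailure.increment(); }
}
```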
Usage of Dead Letter Queues
DLQs are essential for traceability and error handling. They capture messages that failed processing, ensuring that problematic data is not lost and can be inspected or reprocessed later.
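A minimal sketch of a hand-rolled dead letter queue, assuming a hypothetical ".DLT" topic naming convention and placeholder processing logic; some frameworks, such as Spring Kafka, provide this pattern out of the box.

```java
import java.nio.charset.StandardCharsets;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class DeadLetterHandler {

    private final KafkaProducer<String, String> producer;

    public DeadLetterHandler(KafkaProducer<String, String> producer) {
        this.producer = producer;
    }

    public void handle(ConsumerRecord<String, String> record) {
        try {
            process(record);                                  // your business logic
        } catch (Exception e) {
            // Send the failed message to a dead letter topic, keeping the original key
            // and value so it can be inspected or reprocessed later.
            ProducerRecord<String, String> dlqRecord =
                    new ProducerRecord<>(record.topic() + ".DLT", record.key(), record.value());
            dlqRecord.headers().add("error-message",
                    String.valueOf(e.getMessage()).getBytes(StandardCharsets.UTF_8));
            producer.send(dlqRecord);
        }
    }

    private void process(ConsumerRecord<String, String> record) {
        // placeholder for real processing
    }
}
```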
Transactions
Without transactions, producing and consuming messages across multiple topics or partitions can lead to duplicates, out-of-order events, or partial updates in downstream systems in edge-case scenarios. Kafka transactions ensure atomicity: either all messages in a transaction are successfully committed or none are, preventing inconsistent states.
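A sketch of the consume-process-produce pattern with Kafka transactions, assuming kafka-clients 2.5+ and placeholder topic names: the consumed offsets are committed in the same transaction as the output messages, so both succeed or neither does.

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.TopicPartition;

public class TransactionalProcessor {

    // Assumes the producer is configured with "transactional.id", and the consumer with
    // "enable.auto.commit=false" and "isolation.level=read_committed".
    void run(KafkaConsumer<String, String> consumer, KafkaProducer<String, String> producer) {
        producer.initTransactions();
        consumer.subscribe(List.of("order-requested"));                   // hypothetical input topic

        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
            if (records.isEmpty()) continue;

            producer.beginTransaction();
            try {
                Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
                for (ConsumerRecord<String, String> record : records) {
                    producer.send(new ProducerRecord<>("order-prepared", record.key(), record.value()));
                    offsets.put(new TopicPartition(record.topic(), record.partition()),
                                new OffsetAndMetadata(record.offset() + 1));
                }
                // Commit the consumed offsets as part of the same transaction as the output.
                producer.sendOffsetsToTransaction(offsets, consumer.groupMetadata());
                producer.commitTransaction();
            } catch (Exception e) {
                producer.abortTransaction();                              // nothing becomes visible downstream
            }
        }
    }
}
```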
Managed vs. Self-Managed Kafka
When evaluating Kafka, one of the early decisions you'll need to make is whether to use a managed Kafka service or to go with a self-managed deployment. Each option comes with trade-offs in terms of cost, control, operational effort, and flexibility.
A simple way to understand the difference is to compare it to hosting a dinner:
- Managed Kafka is like ordering food from a trusted catering service. You're relying on experts who specialize in delivering reliable results. It typically costs more, and you may be tied to the vendor's platform or feature set, but you avoid the hassle of provisioning infrastructure, managing upgrades, monitoring, and ensuring high availability.
- Self-Managed Kafka is like cooking the entire meal yourself. You have full control over every detail, from infrastructure choices to performance tuning. However, it requires deep Kafka expertise, operational excellence, and constant vigilance to manage updates, broker failures, scaling, backups, and disaster recovery.
Enabling Cross-Functional Collaboration with Kafka
In a microservices architecture, it's common for multiple cross-functional teams to produce to and consume from Kafka topics, often across different domains. While Kafka is built for decoupling, this flexibility can lead to unintended coupling or conflicts if governance isn’t enforced.
To avoid issues such as overlapping consumer groups and unintended message consumption, it's important to define clear collaboration boundaries and access policies.
Key Practices:
- Namespace or Domain Ownership: Assign ownership of Kafka topics and consumer groups at the domain or team level. This helps in clearly defining responsibility and avoiding accidental interference.
- Consumer Group Isolation: Ensure that each team uses distinct consumer group IDs. Sharing a consumer group unintentionally can lead to competition for messages and unexpected behavior in consumer applications.
- Topic Naming Conventions: Establish clear and consistent naming conventions for topics that reflect their domain ownership. This improves discoverability and enforces boundaries.
- Access Control: Use Kafka ACLs to restrict which teams can produce to or consume from specific topics (see the sketch below). This minimizes the risk of unauthorized access and promotes a contract-driven integration model.
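A sketch of granting a team's service write access to its own domain topic using the AdminClient's ACL API; the principal, topic name, and broker address are assumptions, and the same can be achieved with the kafka-acls command-line tool.

```java
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.common.acl.AccessControlEntry;
import org.apache.kafka.common.acl.AclBinding;
import org.apache.kafka.common.acl.AclOperation;
import org.apache.kafka.common.acl.AclPermissionType;
import org.apache.kafka.common.resource.PatternType;
import org.apache.kafka.common.resource.ResourcePattern;
import org.apache.kafka.common.resource.ResourceType;

public class GrantOrdersTeamAccess {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder; a secured listener in practice

        try (AdminClient admin = AdminClient.create(props)) {
            // Allow the orders service principal to produce to the orders domain topic.
            AclBinding allowWrite = new AclBinding(
                    new ResourcePattern(ResourceType.TOPIC, "orders.order-requested", PatternType.LITERAL),
                    new AccessControlEntry("User:orders-service", "*",
                            AclOperation.WRITE, AclPermissionType.ALLOW));

            admin.createAcls(List.of(allowWrite)).all().get();
        }
    }
}
```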
Final Thoughts
Kafka is a powerful technology that enables effective asynchronous communication across microservices (among the many other use cases it supports). However, like any tool, it comes with trade-offs, such as operational complexity and architectural implications. It's not a silver bullet.
The key is not to adopt Kafka (or any technology for that matter) just because it's popular, but to understand its strengths and limitations and apply it thoughtfully to solve the right problems, based on your business needs.