
Kafka – About topicName, partitionNumber, and groupId as Keys in __consumer_offsets Topic

This article looks into the three components of the key used in the __consumer_offsets Topic: topicName, partitionNumber, and groupId.

Background

In the following article, I explained points to consider when operating Debezium:

https://gonkunblog.com/operation-of-debezium/2236/

The concept of Kafka Topics came up during that discussion.

Kafka has an internal Topic called __consumer_offsets that records how far each Consumer Group has read in each Partition.

Entries in __consumer_offsets appear to be identified by the key [topicName, partitionNumber, groupId], so Consumers that share the same three values are treated as the same reader.

Now, a question arises:
What exactly are these key components [topicName, partitionNumber, groupId]?

I’ll investigate this and create some diagrams to help visualize the concept.

Target Audience

  • Anyone who wants to slightly improve their understanding of the Topic concept

Visual Understanding

My understanding of Kafka Topics before this investigation was as follows:

I only understood that "Topics hold Messages sent from Producers."

topicName and partitionNumber

Once we understand Topics a little better, the meaning of partitionNumber becomes clear.

Looking inside a single Topic, the picture is roughly this:

The explanation in the following article was very clear:
https://qiita.com/sigmalist/items/5a26ab519cbdf1e07af3

  • A Topic consists of one or more Partitions
  • Each Kafka server runs as a Broker, and each Partition consists of Replicas that are spread across Brokers
      • This assumes Kafka runs as a Cluster to improve fault tolerance

In other words, Messages sent from Producers (like Debezium) are stored in one of the Partitions of the corresponding Kafka Topic.
More specifically, a Message is first written to the Broker that hosts the Leader replica of its Partition, and is then replicated to the Brokers that host the Follower replicas.
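
To make the structure above concrete, here is a minimal sketch in Python. It is only a toy model of my understanding, not how Kafka is implemented: the names Partition and make_topic and the broker ids 0 to 2 are made up for illustration, and the followers are copied immediately, whereas real replication is more involved.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Partition:
    """One Partition: a Leader replica plus Follower replicas, one log per Broker."""
    leader: int                 # broker id hosting the Leader replica (hypothetical)
    followers: List[int]        # broker ids hosting the Follower replicas
    logs: Dict[int, List[str]] = field(default_factory=dict)

    def __post_init__(self):
        # Every replica starts with an empty log.
        for broker in [self.leader, *self.followers]:
            self.logs.setdefault(broker, [])

    def append(self, message: str) -> int:
        """Write to the Leader first, then copy to the Followers.

        Returns the offset (position in the log) of the new message.
        """
        self.logs[self.leader].append(message)
        offset = len(self.logs[self.leader]) - 1
        for broker in self.followers:
            # Simplification: copy right away instead of modeling real replication.
            self.logs[broker].append(message)
        return offset


def make_topic(num_partitions: int, brokers: List[int]) -> List[Partition]:
    """Build a Topic as a list of Partitions, rotating the Leader across Brokers."""
    topic = []
    for p in range(num_partitions):
        leader = brokers[p % len(brokers)]
        followers = [b for b in brokers if b != leader]
        topic.append(Partition(leader=leader, followers=followers))
    return topic


# A Topic with 3 Partitions on a 3-Broker Cluster.
topic = make_topic(num_partitions=3, brokers=[0, 1, 2])
offset = topic[1].append("row changed")  # partitionNumber 1
```

The index into the topic list plays the role of partitionNumber, and the value returned by append is the position that a Consumer would later record as "read up to here".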

Each Consumer is assigned one or more Partitions and fetches new Messages from them.

At this point, you should have a better understanding of topicName and partitionNumber as keys in __consumer_offsets.

groupId

Now let’s look at what groupId is.

This is the ID of a Consumer Group.
On the Consumer side, you can configure a Consumer Group consisting of one or more Consumers.

The idea is that spreading reads across Partitions is more efficient and faster than having a single Consumer read everything.
With several Consumers in a group, Messages from multiple Partitions can be read in parallel.

Furthermore, within a single Consumer Group each Partition is assigned to only one Consumer at a time, so the same Message is generally not read twice by the group.

For example, here’s what it looks like when a Topic has 3 Partitions and a Consumer Group has 2 Consumers:

  • Assuming the Topic's Partitions are handed out to the Consumers in round-robin order

Messages in the Topic are split across the Consumers and processed in parallel, so they are consumed more efficiently.

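  • As far as I understand, the unit being distributed is the Partition, not the individual Message: Partitions are assigned to the Consumers in the group, and the assignment strategy is configurable, so round-robin is only one possibility. I apologize if my understanding is incorrect…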
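
The 3 Partitions / 2 Consumers example can be sketched as follows. This is a simplified illustration under the assumption that Partitions are dealt out to Consumers one by one; the function assign_round_robin and the names consumer-a and consumer-b are hypothetical, and a real Kafka client chooses its strategy through the partition.assignment.strategy setting rather than code like this.

```python
def assign_round_robin(partitions, consumers):
    """Deal Partitions out to Consumers one at a time, like dealing cards."""
    assignment = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(sorted(partitions)):
        owner = consumers[i % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# 3 Partitions, 2 Consumers in the same Consumer Group.
assignment = assign_round_robin([0, 1, 2], ["consumer-a", "consumer-b"])
print(assignment)
# {'consumer-a': [0, 2], 'consumer-b': [1]}
```

Each Partition ends up with exactly one owner inside the group, which is why two Consumers in the same group do not normally read the same Message.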

The following article was very helpful, including specific verification details:
https://pppurple.hatenablog.com/entry/2018/11/20/213651

This one was also quite useful:
https://qiita.com/sigmalist/items/3b512e2ab49b07271665#consumer-group%E3%81%AB%E3%81%A4%E3%81%84%E3%81%A6

By now, you should have a better understanding of each key in __consumer_offsets.

Keys in __consumer_offsets

Now we have a slightly better understanding of the following question:

What exactly are the keys [topicName, partitionNumber, groupId]?

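The key identifies which Consumer Group (groupId) is reading which Partition (partitionNumber) of which Topic (topicName). The value stored under that key is the offset the group has committed for that Partition.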
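
As a rough mental model, __consumer_offsets behaves like a dictionary keyed by these three values. The sketch below is only that mental model in Python, not the actual storage format: OffsetKey, commit, and fetch are names I made up, as are the group and topic names "billing", "audit", and "orders".

```python
from typing import Dict, NamedTuple, Optional


class OffsetKey(NamedTuple):
    """The three components that identify one entry."""
    group_id: str
    topic_name: str
    partition_number: int


# Latest committed offset per key; a newer commit replaces the older one.
committed: Dict[OffsetKey, int] = {}


def commit(group_id: str, topic_name: str, partition_number: int, offset: int) -> None:
    """Record how far a Consumer Group has read in one Partition."""
    committed[OffsetKey(group_id, topic_name, partition_number)] = offset


def fetch(group_id: str, topic_name: str, partition_number: int) -> Optional[int]:
    """Return the committed offset, or None if the group never committed here."""
    return committed.get(OffsetKey(group_id, topic_name, partition_number))


commit("billing", "orders", 0, 42)
commit("billing", "orders", 0, 43)  # same key: replaces the previous value
commit("audit", "orders", 0, 7)     # different groupId: a separate entry

print(fetch("billing", "orders", 0))  # 43
print(fetch("audit", "orders", 0))    # 7
print(fetch("billing", "orders", 1))  # None
```

Two groups reading the same Partition keep independent positions because their keys differ only in groupId, while any Consumer that takes over a Partition within the same group picks up from the same entry.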

Summary

In this article, I organized my thoughts to improve my understanding of Topics and created diagrams to visualize the concepts.

This article only organizes concepts logically, based on information from various sources.
Its reliability is therefore not very high (please take it with a grain of salt).

Next time, I’d like to verify these concepts through practical testing.

Thank you for reading this far.

References

In writing this article, I referenced various documents and articles (honestly, they have more detailed information than my article…).