Topics Fundamentals#
A topic is a collection of related messages or events, stored as a log (an ordered sequence of events). In theory, you can have an unlimited number of topics. A topic consists of one or more partitions, which are allocated to different brokers in the cluster. Each partition is a log: strictly ordered, with each message receiving an incremental ID called the offset. Ordering is guaranteed only within a partition, not across the whole topic. Within a partition, the data is stored as individual segments on disk. Kafka distributes a topic's partitions across the brokers, but the user needs to set resource limits and requests.
Topic Design#
The two main considerations when creating topics are the number of partitions and the replication factor. These are influenced by:
The number of brokers, which limits the number of replicas (the replication factor cannot exceed the number of brokers)
Fault tolerance: a higher replication factor implies greater fault tolerance
Regarding consumers: how many consumers are needed per consumer group? A topic needs at least as many partitions as there are consumers in a single group, since each partition is read by at most one consumer in the group.
Regarding memory: how much memory is available on each broker?
Required memory can be adjusted with the configuration replica.fetch.max.bytes, which defaults to 1 MB for each partition on a broker.
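The two sizing questions above can be turned into rough arithmetic. The sketch below is illustrative only: the function names fetch_memory_bytes and idle_consumers are made up for this example, and the memory figure is a simple estimate (partitions on a broker multiplied by replica.fetch.max.bytes), not an exact measurement of broker memory use.

```shell
# Rough sizing arithmetic for topic design (illustrative helpers, not Kafka tools).

# Estimated bytes a broker may buffer for replica fetches:
# one replica.fetch.max.bytes buffer per partition hosted on the broker.
fetch_memory_bytes() {
  partitions_on_broker=$1
  replica_fetch_max_bytes=${2:-1048576}   # 1 MB default
  echo $(( partitions_on_broker * replica_fetch_max_bytes ))
}

# Number of consumers in one group that would sit idle because the
# topic has fewer partitions than the group has consumers.
idle_consumers() {
  partitions=$1
  consumers=$2
  if [ "$consumers" -gt "$partitions" ]; then
    echo $(( consumers - partitions ))
  else
    echo 0
  fi
}

fetch_memory_bytes 200   # 200 partitions at the default: prints 209715200
idle_consumers 6 8       # 6 partitions, 8 consumers: prints 2
```

With 200 partitions on a broker and the default setting, the estimate is about 200 MB; a group of 8 consumers on a 6-partition topic leaves 2 consumers without work.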
Topic Tools#
Topics are created as follows, with a given --replication-factor and number of --partitions:
kafka-topics --bootstrap-server kafka:9092 --create --topic <topic-name> --partitions $PART --replication-factor $REP
To create a topic only when it does not already exist, and to avoid an error when it does, pass the --if-not-exists flag.
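As a minimal sketch, idempotent topic creation could be wrapped in a small helper. The function name create_topic_if_missing and the KAFKA_TOPICS and BOOTSTRAP variables are assumptions made for this example; the flags themselves are the ones described above.

```shell
# Hypothetical helper: create a topic only if it does not exist yet.
# KAFKA_TOPICS (path to the tool) and BOOTSTRAP (broker address) are
# illustrative overrides; defaults match the commands in this section.
create_topic_if_missing() {
  topic=$1
  partitions=$2
  replication=$3
  "${KAFKA_TOPICS:-kafka-topics}" \
    --bootstrap-server "${BOOTSTRAP:-kafka:9092}" \
    --create --if-not-exists \
    --topic "$topic" \
    --partitions "$partitions" \
    --replication-factor "$replication"
}

# Example (needs a reachable cluster):
# create_topic_if_missing orders 3 2
```

Because of --if-not-exists, running the helper twice with the same topic name should not fail on the second call, which makes it convenient in provisioning scripts.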
The number of partitions can be altered using the --alter flag:
kafka-topics --bootstrap-server kafka:9092 --alter --topic <topic-name> --partitions $NEW_NUMBER_OF_PARTITIONS
where <topic-name> is an existing topic. The number of partitions can only be increased, never decreased.
Other topic options, used together with --describe, include:
--topics-with-overrides
--under-replicated-partitions
--unavailable-partitions
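These filters are typically combined with --describe. The following sketch loops over two of them to give a quick health overview; the function name describe_problem_partitions and the KAFKA_TOPICS and BOOTSTRAP variables are assumptions for illustration.

```shell
# Hypothetical helper: list partitions that are under-replicated or unavailable.
# Each filter flag is passed together with --describe.
describe_problem_partitions() {
  for flag in --under-replicated-partitions --unavailable-partitions; do
    echo "== $flag =="
    "${KAFKA_TOPICS:-kafka-topics}" \
      --bootstrap-server "${BOOTSTRAP:-kafka:9092}" \
      --describe "$flag"
  done
}

# Example (needs a reachable cluster):
# describe_problem_partitions
```

On a healthy cluster, each section is expected to be empty.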
Topic Configuration#
The configuration for a particular entity type (here topics) can be listed by passing --describe and the --entity-type to kafka-configs:
kafka-configs --bootstrap-server kafka:9092 --describe --entity-type topics
Configuration is added by using --alter with the --add-config KEY=VALUE flag; the --entity-name must also be specified.
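A hedged sketch of adding a topic-level override follows. The helper name set_topic_config and the KAFKA_CONFIGS and BOOTSTRAP variables are made up for this example; retention.ms is used only as a sample topic configuration key.

```shell
# Hypothetical helper: add or update one configuration override on a topic.
set_topic_config() {
  topic=$1
  key_value=$2   # e.g. retention.ms=86400000 (one day in milliseconds)
  "${KAFKA_CONFIGS:-kafka-configs}" \
    --bootstrap-server "${BOOTSTRAP:-kafka:9092}" \
    --alter \
    --entity-type topics \
    --entity-name "$topic" \
    --add-config "$key_value"
}

# Example (needs a reachable cluster):
# set_topic_config orders retention.ms=86400000
```

Afterwards, describing the same entity with --describe --entity-type topics --entity-name orders should show the override.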
Consumer groups that read from topics can be inspected and managed through kafka-consumer-groups.sh.
Deleting a group with --delete --group <group-name> can only be performed for consumer groups with no active members (--members, used with --describe, lists them). To reset the offsets of an existing consumer group to the end of the log, use --reset-offsets with --to-latest. When listing topics, the internal topic __consumer_offsets will be listed; it stores the committed offsets of consumer groups. It is created automatically when consumer groups are first used.
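An offset reset could look like the sketch below. The helper name reset_group_to_latest and the KAFKA_CONSUMER_GROUPS and BOOTSTRAP variables are assumptions for illustration. The helper defaults to --dry-run, which only prints the planned offsets; passing --execute applies them. The group must have no active members for the reset to succeed.

```shell
# Hypothetical helper: move a consumer group's offsets for one topic to the
# end of the log. Defaults to a dry run; pass --execute as the third argument
# to actually apply the change.
reset_group_to_latest() {
  group=$1
  topic=$2
  mode=${3:---dry-run}
  "${KAFKA_CONSUMER_GROUPS:-kafka-consumer-groups.sh}" \
    --bootstrap-server "${BOOTSTRAP:-kafka:9092}" \
    --group "$group" \
    --topic "$topic" \
    --reset-offsets --to-latest \
    "$mode"
}

# Examples (need a reachable cluster and an inactive group):
# reset_group_to_latest billing orders            # preview only
# reset_group_to_latest billing orders --execute  # apply
```

Defaulting to the dry run is a deliberate choice here, since a reset to the latest offset skips every message the group has not yet consumed.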
Message behaviour#
By default, the Kafka console producer reads messages one per line. This can be controlled by a class extending the kafka.common.MessageReader class. Given such a class MyMessageReader, it can be supplied to the Kafka console producer through the argument --line-reader org.example.MyMessageReader.