Advanced Topics#
Topic Design#
What do we need to consider when designing topics?
Data accuracy
Making sure that events that must be ordered end up in the same partition (and the same topic)
For keyed records that need to be ordered, the partition setup is essential
In contrast, search events may not need to be ordered
Popularity of events
E.g. searches for tickets far outnumber any other request
Time consumers spend filtering out the mass of events they do not need
Amount of data to process
Will we need multiple consumers to process this data?
Easy solution: Increase the number of partitions, but this will require more resources
Hint: Start with small partitions
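The ordering point above can be illustrated with a minimal sketch of key-based partitioning. It assumes a stand-in hash (CRC32) where Kafka's default partitioner uses murmur2, and the names `partition_for` and `route` and the booking events are hypothetical. The point is only that records sharing a key land in one partition and keep their relative order there.

```python
import zlib
from collections import defaultdict


def partition_for(key: str, num_partitions: int) -> int:
    # Stand-in hash: Kafka's default partitioner uses murmur2, not CRC32.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(events, num_partitions):
    # Append each (key, payload) to the partition chosen by its key,
    # preserving arrival order within every partition.
    partitions = defaultdict(list)
    for key, payload in events:
        partitions[partition_for(key, num_partitions)].append((key, payload))
    return partitions


# Hypothetical events: two bookings whose state changes must stay ordered.
events = [
    ("booking-17", "created"),
    ("booking-42", "created"),
    ("booking-17", "paid"),
    ("booking-42", "cancelled"),
    ("booking-17", "ticket-issued"),
]

partitions = route(events, num_partitions=3)
for partition, records in sorted(partitions.items()):
    print(partition, records)
```

Events without an ordering requirement, such as searches, could be sent without a key and spread across partitions instead.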
Topic Options#
Topic options are additional parameters passed to the kafka-topics command. The only required parameters are --bootstrap-server and an action, e.g.
--list
--create
--describe
To avoid creating topics by accident, set auto.create.topics.enable=false (broker configuration).
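As a rough sketch of the invocation shape described above (a --bootstrap-server plus one action), the helper below only assembles command lines and executes nothing. The helper name `kafka_topics_cmd`, the address `localhost:9092`, and the topic name `ticket-searches` are assumptions for illustration.

```python
import shlex


def kafka_topics_cmd(bootstrap_server, action, *extra):
    # Hypothetical helper: builds the command line as a string, nothing is run.
    return shlex.join(
        ["kafka-topics", "--bootstrap-server", bootstrap_server, action, *extra]
    )


# Assumed broker address and topic name, for illustration only.
list_cmd = kafka_topics_cmd("localhost:9092", "--list")
create_cmd = kafka_topics_cmd(
    "localhost:9092",
    "--create",
    "--topic", "ticket-searches",
    "--partitions", "3",
    "--replication-factor", "1",
)

print(list_cmd)
print(create_cmd)
```

Depending on the distribution, the command may be installed as kafka-topics.sh rather than kafka-topics.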
Topics can be altered with the --alter flag, for example to increase the number of partitions. This is only recommended for topics whose records have no key, or when the topic holds no data. If the records are keyed, changing the partition count changes which partition a key maps to, so the ordering of messages for that key will be affected.
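To see why altering the partition count affects keyed topics, the sketch below counts how many keys would be routed to a different partition after growing a topic from 3 to 4 partitions. It assumes a stand-in hash (CRC32) where Kafka's default partitioner uses murmur2, and the `user-<n>` keys are hypothetical, so the exact count is illustrative only.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    # Stand-in hash: Kafka's default partitioner uses murmur2, not CRC32.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Hypothetical keys; any reasonably varied key set shows the same effect.
keys = [f"user-{i}" for i in range(1000)]

# Keys whose target partition changes when the topic grows from 3 to 4 partitions.
moved = [k for k in keys if partition_for(k, 3) != partition_for(k, 4)]

print(f"{len(moved)} of {len(keys)} keys map to a different partition")
```

Records written for a moved key before the change stay in the old partition, while new records go to the new one, so a consumer no longer sees that key's history in a single ordered log.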
Configuration can be added and removed with the --add-config <key>=<value> and --delete-config <key> options of the kafka-configs command (together with --alter, --entity-type topics and --entity-name <topic>).
Topic Alterations#
Topic (log) compaction is different from retention: the goal is to make sure that at least the latest value for each key is kept. It is enabled with the --config cleanup.policy=compact option. When a topic is marked for compaction, the messages in a single log can be in either a clean or a dirty state.
Clean state: Messages that have been through compaction before (no duplicate keys exist)
Dirty state: Messages that have not been through compaction yet (duplicate keys may exist)
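The effect of compaction on a dirty log can be sketched as keeping only the most recent record per key while preserving offsets. This is a simplified in-memory model, not how the broker's log cleaner is implemented, and the function name `compact` and the sample records are hypothetical.

```python
def compact(records):
    # records: list of (key, value) in log order; the list index acts as the offset.
    latest = {}
    for offset, (key, value) in enumerate(records):
        latest[key] = (offset, value)  # a later record for a key replaces the earlier one
    # Surviving records keep their original offsets and relative order.
    return sorted((offset, key, value) for key, (offset, value) in latest.items())


# Hypothetical dirty log: user-1 appears twice.
dirty_log = [
    ("user-1", "a@example.com"),
    ("user-2", "b@example.com"),
    ("user-1", "c@example.com"),
]

clean_log = compact(dirty_log)
print(clean_log)
```

After compaction only offsets 1 and 2 remain: the older value for `user-1` at offset 0 is gone, while the latest value for every key is still readable.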
A partition is compacted when either of the following holds:
The ratio between dirty and total records is larger than min.cleanable.dirty.ratio AND min.compaction.lag.ms is reached
OR max.compaction.lag.ms is reached
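The compaction condition above can be written as a small predicate. This is a sketch of the rule as stated here, not the broker's actual cleaner logic: the function name, the idea of a single `dirty_age_ms` (age of the oldest dirty record), and the threshold values in the usage lines are assumptions chosen for illustration.

```python
def eligible_for_compaction(
    dirty_records,
    total_records,
    dirty_age_ms,
    *,
    min_cleanable_dirty_ratio,
    min_compaction_lag_ms,
    max_compaction_lag_ms,
):
    # Simplified model: dirty_age_ms is the age of the oldest dirty record.
    if total_records == 0:
        return False
    dirty_ratio = dirty_records / total_records
    # Rule 1: enough of the log is dirty AND the minimum lag has passed.
    ratio_rule = (
        dirty_ratio > min_cleanable_dirty_ratio
        and dirty_age_ms >= min_compaction_lag_ms
    )
    # Rule 2: the maximum lag has passed, regardless of the dirty ratio.
    max_lag_rule = dirty_age_ms >= max_compaction_lag_ms
    return ratio_rule or max_lag_rule


# Illustrative settings: 50% dirty ratio, 1 minute minimum lag, 1 hour maximum lag.
settings = dict(
    min_cleanable_dirty_ratio=0.5,
    min_compaction_lag_ms=60_000,
    max_compaction_lag_ms=3_600_000,
)

print(eligible_for_compaction(600, 1000, 120_000, **settings))    # ratio and min lag met
print(eligible_for_compaction(600, 1000, 10_000, **settings))     # min lag not reached
print(eligible_for_compaction(100, 1000, 4_000_000, **settings))  # max lag reached
```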
When data is sparse, it is recommended to adjust the frequency of log rolling, because the active segment is never compacted and sparse data can otherwise stay in it for a long time. Setting log.roll.hours=24 (a broker configuration) causes segments to roll about once a day.