Xiaomi Galaxy Talos Book

Talos Data Consumption Model Diagram


Data Consumption

Talos's high-level Consumer solves many data consumption problems for the user. One of these is "remembering" the consumption position, which guarantees that when users start the Consumer, they can resume consuming from where the previous consumption left off.


  • Commit Offset: While TalosConsumer is running, it periodically commits the offsets of data the user has already consumed. We call this a "commit offset": the consumer sends the offset of already-consumed messages to the server, which records it. Note that offsets are not committed for each message individually; they are committed per batch. When a commit happens is governed by two user-configurable factors: elapsed time and the number of consumed messages.
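The commit timing described above can be sketched as a small trigger object. This is a minimal illustration, assuming a commit fires as soon as either threshold is reached; the class name `CommitTrigger` and the parameters `max_messages` and `max_interval_s` are hypothetical and are not TalosConsumer's real configuration keys.

```python
import time


class CommitTrigger:
    """Hypothetical sketch: decide when to commit, based on elapsed time and
    the number of messages consumed since the last commit."""

    def __init__(self, max_messages: int, max_interval_s: float, clock=time.monotonic):
        self.max_messages = max_messages      # hypothetical count threshold
        self.max_interval_s = max_interval_s  # hypothetical time threshold (seconds)
        self.clock = clock
        self.pending = 0                      # messages consumed since the last commit
        self.last_commit = clock()            # time of the last commit

    def record_batch(self, batch_size: int) -> bool:
        """Record a consumed batch; return True if a commit is due now."""
        self.pending += batch_size
        elapsed = self.clock() - self.last_commit
        return self.pending >= self.max_messages or elapsed >= self.max_interval_s

    def mark_committed(self) -> None:
        """Reset both counters after the batch's offset has been committed."""
        self.pending = 0
        self.last_commit = self.clock()


# Drive the trigger with a fake clock so the example is deterministic.
now = [0.0]
trigger = CommitTrigger(max_messages=500, max_interval_s=10.0, clock=lambda: now[0])
print(trigger.record_batch(200))  # False: 200 messages, 0 s elapsed
print(trigger.record_batch(300))  # True: 500 messages reached
```

Checking both conditions on every batch means a slow trickle of messages is still committed within the time limit, while a burst is committed before the count grows too large.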

Looking at the diagram, take Partition 7, File 3 as an example. Suppose the user launches TalosConsumer for the first time, runs it for a while, and the last offset committed before the program stops (whether it exits or hangs) is 700. When the user restarts the program, TalosConsumer queries the server for the previous consumption position and learns that the last session consumed up to 700, so it begins reading data from 701. Suppose the user then reads a batch of 200 messages; TalosConsumer finds the largest offset in that batch. If the last of the 200 messages has an offset of 900, TalosConsumer commits 900 to the server. This is how it "remembers" the consumption position.
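The walk-through above reduces to two rules: resume at the message after the last committed offset, and commit the largest offset of each consumed batch. The sketch below replays the example's numbers; `fetch_batch` is a hypothetical stand-in for reading messages from Partition 7, not a Talos API.

```python
from typing import Iterable, List


def fetch_batch(resume_from: int, batch_size: int) -> List[int]:
    """Hypothetical stand-in for a fetch: returns the offsets of
    `batch_size` consecutive messages starting at `resume_from`."""
    return list(range(resume_from, resume_from + batch_size))


def offset_to_commit(batch_offsets: Iterable[int]) -> int:
    """A batch is committed as a unit: commit the largest offset in it."""
    return max(batch_offsets)


last_commit = 700               # committed before the previous session ended
resume_from = last_commit + 1   # the restart resumes at the next message
batch = fetch_batch(resume_from, 200)
new_commit = offset_to_commit(batch)
print(resume_from, len(batch), new_commit)  # 701 200 900
```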

Note: As shown in the diagram, each consumer group keeps its own committed offset for each partition it consumes. Different consumer groups can consume the same partition, and each group's offset record is maintained independently. In the diagram, Consumer Group 1's committed offset for Partition 7 is 900, while Consumer Group 2's is 100.
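One way to picture these independent records is a store keyed by the (consumer group, partition) pair. This is a minimal in-memory sketch under that assumption; the `OffsetStore` class and the group names are hypothetical and say nothing about how the Talos server actually persists offsets.

```python
from typing import Dict, Optional, Tuple


class OffsetStore:
    """Hypothetical in-memory model of the server-side offset records:
    one committed offset per (consumer group, partition) pair."""

    def __init__(self) -> None:
        self._offsets: Dict[Tuple[str, int], int] = {}

    def commit(self, group: str, partition: int, offset: int) -> None:
        # Committing for one group never touches another group's record.
        self._offsets[(group, partition)] = offset

    def last_commit(self, group: str, partition: int) -> Optional[int]:
        # None means this group has never committed for this partition.
        return self._offsets.get((group, partition))


store = OffsetStore()
store.commit("consumer-group-1", 7, 900)
store.commit("consumer-group-2", 7, 100)
print(store.last_commit("consumer-group-1", 7))  # 900
print(store.last_commit("consumer-group-2", 7))  # 100
```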

Reset Offset

From the sections above, we know that TalosConsumer can resume reading data from the previous point of consumption after a restart. However, some users have different needs: what if you want to start reading from the beginning or from the end after restarting the client program? Talos provides configurations that let users begin consumption from different points when restarting the program.


Before explaining how to reset the offset, let us first introduce some concepts:

  • Start Offset: The offset of the first valid message in the current partition. As shown in the diagram, Partition 7's valid data initially begins with the first message in File 1, whose offset is 0, so Partition 7's Start Offset is 0. If the topic's data is retained for one day, File 1 expires after one day; Partition 7's valid files then begin with File 2, and its Start Offset becomes 380.

  • Last Commit Offset: The offset most recently committed to the server, as described above.

  • End Offset: The offset of the last valid message in the current partition. As shown in the diagram, Partition 7's final message has an offset of 1000, so its End Offset is 1000.
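The Start Offset and End Offset definitions can be expressed over a list of a partition's files. In this sketch, the `PartitionFile` type and its `expired` flag are hypothetical. The values File 1 starting at 0, File 2 starting at 380, and a final offset of 1000 come from the text, but the boundary between File 2 and File 3 (649/650) is made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PartitionFile:
    """Hypothetical model of one data file in a partition."""
    first_offset: int      # offset of the first message in the file
    last_offset: int       # offset of the last message in the file
    expired: bool = False  # True once the file is past the topic's retention period


def start_offset(files: List[PartitionFile]) -> int:
    """Offset of the first message in the oldest file that has not expired."""
    for f in files:  # files are ordered oldest first
        if not f.expired:
            return f.first_offset
    raise ValueError("partition has no valid files")


def end_offset(files: List[PartitionFile]) -> int:
    """Offset of the last message in the newest file."""
    return files[-1].last_offset


partition_7 = [
    PartitionFile(0, 379),     # File 1
    PartitionFile(380, 649),   # File 2 (end offset is hypothetical)
    PartitionFile(650, 1000),  # File 3 (start offset is hypothetical)
]
print(start_offset(partition_7), end_offset(partition_7))  # 0 1000

partition_7[0].expired = True  # one day later, File 1 has expired
print(start_offset(partition_7), end_offset(partition_7))  # 380 1000
```

Expiration only moves the Start Offset forward; the End Offset changes only when new messages are written.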

Whether TalosConsumer is being launched for the first time or restarted, the user can change the configuration at each startup to reset the offset from which reading begins. For details on how to perform a reset, see the Reset Offset notes in the TalosConsumerAPI Reset Instructions. If no reset is configured, TalosConsumer reads messages according to the following defaults:

  • On first launch, reading begins by default at the Start Offset (this may be 0, or a later offset if files have already expired).

  • On a subsequent restart, reading resumes by default from the Last Commit Offset, continuing with the message after it. If the messages at that position have expired, the Last Commit Offset becomes invalid; in that case, please refer to Scenario 4 in the Configuration Instructions.
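The two defaults can be combined into a single decision. In this minimal sketch, `default_read_offset` and `ExpiredOffsetError` are hypothetical names. It treats the Last Commit Offset as invalid when the message it would resume from lies before the current Start Offset, and it only signals that case; the actual handling is whatever Configuration Instructions Scenario 4 describes, which is not modeled here.

```python
from typing import Optional


class ExpiredOffsetError(Exception):
    """Hypothetical: the resume position has expired. Real handling is
    described in Configuration Instructions Scenario 4, not modeled here."""


def default_read_offset(last_commit: Optional[int], start_offset: int) -> int:
    """Default position to start reading from when no reset is configured."""
    if last_commit is None:
        # First launch: nothing committed yet, so start at the Start Offset.
        return start_offset
    resume = last_commit + 1  # restart: continue after the last committed message
    if resume < start_offset:
        # The message we would resume from has already expired.
        raise ExpiredOffsetError(f"resume offset {resume} is before start offset {start_offset}")
    return resume


print(default_read_offset(None, 0))    # 0: first launch
print(default_read_offset(None, 380))  # 380: first launch after File 1 expired
print(default_read_offset(700, 380))   # 701: restart after committing 700
try:
    default_read_offset(100, 380)      # committed 100, but File 1 has expired
except ExpiredOffsetError as err:
    print("invalid Last Commit Offset:", err)
```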