v22.2 New Feature
Add Custom Deduplication Semantic Setting in MergeTree/ReplicatedMergeTree
Add a setting that allows a user to provide their own deduplication semantic in MergeTree/ReplicatedMergeTree. If provided, it is used instead of a data digest to generate the block ID. So, for example, by providing a unique value for the setting in each INSERT statement, the user can prevent identical inserted data from being deduplicated. This closes: #7461. #32304 (Igor Nikonov).
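The block-ID rule described above can be illustrated with a toy model. The sketch below is a hypothetical in-memory Python model, not ClickHouse code. ToyDedupTable and its methods are invented for illustration, SHA-256 over the rows' repr stands in for ClickHouse's own data hash, each INSERT is assumed to produce exactly one block, and seen IDs are kept forever, whereas ClickHouse only remembers a window of recent block IDs.

```python
import hashlib
from typing import List, Optional, Set, Tuple


class ToyDedupTable:
    """Hypothetical model of block-level insert deduplication (not ClickHouse code)."""

    def __init__(self) -> None:
        self.rows: List[Tuple] = []
        self.seen_block_ids: Set[str] = set()

    @staticmethod
    def block_id(block: List[Tuple], token: Optional[str]) -> str:
        # With a token, the ID depends only on the token; otherwise it is a
        # digest of the data. The prefixes keep the two ID spaces separate.
        source = f"token:{token}" if token is not None else f"data:{block!r}"
        return hashlib.sha256(source.encode("utf-8")).hexdigest()

    def insert(self, block: List[Tuple], token: Optional[str] = None) -> bool:
        """Insert one block; return False if it is dropped as a duplicate."""
        bid = self.block_id(block, token)
        if bid in self.seen_block_ids:
            return False
        self.seen_block_ids.add(bid)
        self.rows.extend(block)
        return True


table = ToyDedupTable()
results = [
    table.insert([(1, "a")]),                 # kept: first time this data is seen
    table.insert([(1, "a")]),                 # dropped: same data digest
    table.insert([(1, "a")], token="run-1"),  # kept: the token replaces the digest
    table.insert([(1, "a")], token="run-2"),  # kept: a new token gives a new block ID
    table.insert([(2, "b")], token="run-2"),  # dropped: reused token, despite new data
]
print(results)          # [True, False, True, True, False]
print(len(table.rows))  # 3
```

The last call shows the flip side of the design: once a token replaces the digest, the data itself no longer matters, so reusing a token drops the block even when its contents differ.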
Why it matters
This feature enables users to control deduplication behavior more precisely during data insertion. By specifying a unique value for the custom setting in each INSERT statement, users can prevent identical data blocks from being deduplicated, which is useful when intentional duplicate inserts need to be preserved.
How to use it
Set the new setting, insert_deduplication_token, to a value that uniquely identifies each inserted block. For example, pass a unique identifier with each INSERT query: ClickHouse then derives the block ID from that value instead of from the data, so identical data inserted twice is treated as two distinct blocks rather than being discarded as a duplicate.
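A minimal sketch of generating one token per statement is shown below. It only builds SQL text, placing the SETTINGS clause before VALUES as in ClickHouse's INSERT syntax. The helper insert_with_unique_token and the table name events are hypothetical, and sending the statement is left to whatever client you use. Note that for a plain (non-replicated) MergeTree table, insert deduplication only takes effect when the table setting non_replicated_deduplication_window is non-zero.

```python
import uuid


def insert_with_unique_token(table: str, values_sql: str) -> str:
    """Hypothetical helper: build an INSERT carrying its own deduplication token.

    `table` and `values_sql` are interpolated verbatim, so they must come from
    trusted code, never from user input.
    """
    token = uuid.uuid4().hex  # fresh random token for every statement
    return (
        f"INSERT INTO {table} "
        f"SETTINGS insert_deduplication_token = '{token}' "
        f"VALUES {values_sql}"
    )


q1 = insert_with_unique_token("events", "(1, 'a')")
q2 = insert_with_unique_token("events", "(1, 'a')")
print(q1)
print(q1 != q2)  # identical data, but each statement gets its own token
```

A random UUID guarantees that every statement is kept. Deriving the token from a stable batch identifier instead would make a retried INSERT of the same batch reuse its token and be deduplicated, which is the other common reason to set it.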