v.18.12 New Feature

Added min_merge_bytes_to_use_direct_io Option for MergeTree Engines

Added the min_merge_bytes_to_use_direct_io setting for MergeTree engines. It sets a threshold on the total size of a merge; when a merge is above the threshold, its data part files are handled using O_DIRECT, which enables direct I/O for large merges. #3117

Why it matters

This setting lets you specify a size threshold, in bytes, for merges in MergeTree tables. When the total size of a merge exceeds the threshold, the merge handles data part files with O_DIRECT, which bypasses the operating system's page cache. For large merges this can reduce cache pollution, because data touched once during a merge is less likely to evict frequently queried data, and it may improve disk I/O efficiency.

How to use it

Set min_merge_bytes_to_use_direct_io in the SETTINGS clause of the MergeTree table definition, giving the threshold in bytes. For example:

CREATE TABLE example (
...
) ENGINE = MergeTree()
SETTINGS min_merge_bytes_to_use_direct_io = 1000000000;


This enables direct I/O for merges larger than 1 GB (1,000,000,000 bytes).
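
To help choose a threshold and confirm what is in effect, the queries below are a minimal sketch rather than a tested recipe. They assume the system.merge_tree_settings and system.merges tables are available on your server. The ALTER TABLE ... MODIFY SETTING statement assumes a ClickHouse version newer than 18.12 that supports it. The table name example is reused from the snippet above for illustration.

```sql
-- Show the server-wide value of the setting (assumes system.merge_tree_settings exists).
SELECT name, value
FROM system.merge_tree_settings
WHERE name = 'min_merge_bytes_to_use_direct_io';

-- Look at the size of merges that are currently running, as a rough guide
-- to how large typical merges are (sizes here are compressed bytes).
SELECT database, table, num_parts, total_size_bytes_compressed
FROM system.merges;

-- Change the threshold on an existing table.
-- Assumption: the server supports MODIFY SETTING, which was added after 18.12.
ALTER TABLE example
    MODIFY SETTING min_merge_bytes_to_use_direct_io = 1000000000;
```

A threshold set too low sends small merges through direct I/O as well, where bypassing the page cache is less likely to help, so it is worth comparing the value against the merge sizes observed on your own workload.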