v.24.12 New Feature

Added Cache for Primary Index in MergeTree Tables to Optimize Memory Usage

Added a cache for the primary index of MergeTree tables (can be enabled by the table setting use_primary_key_cache). If lazy load and cache are enabled for the primary index, it is loaded into the cache on demand (similar to the mark cache) instead of being kept in memory forever. Added prewarming of the primary index on inserts/merges/fetches of data parts and on table restarts (can be enabled by the setting prewarm_primary_key_cache). This allows lower memory usage for huge tables on shared storage; it was tested on tables with over one quadrillion records. #72102 (Anton Popov). #72750 (Alexander Gololobov).
Added a cache for the primary index of MergeTree tables, which can be enabled via the use_primary_key_cache table setting. With lazy loading enabled, the index is loaded into the cache on demand. The prewarm_primary_key_cache setting additionally prewarms the cache on data inserts, merges, fetches, and table restarts.

Why it matters

The feature reduces memory usage for huge MergeTree tables on shared storage by loading the primary index into the cache only when it is needed instead of keeping it in memory permanently. The approach was tested on tables with over one quadrillion records.

How to use it

Enable primary index caching by setting use_primary_key_cache = 1 on a MergeTree table. Optionally, enable automatic prewarming of the primary key cache on data inserts, merges, fetches, and table restarts by setting prewarm_primary_key_cache = 1.
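
As a minimal sketch of the steps above, the two settings can be passed in the SETTINGS clause when the table is created. The table name events and its columns are hypothetical and serve only as an illustration. The ALTER statement assumes these settings can also be changed on an existing table, which is worth verifying against your ClickHouse version.

```sql
-- Hypothetical table; the name and columns are illustrative only.
CREATE TABLE events
(
    event_time DateTime,
    user_id UInt64,
    payload String
)
ENGINE = MergeTree
ORDER BY (user_id, event_time)
SETTINGS
    use_primary_key_cache = 1,      -- load the primary index into the cache on demand
    prewarm_primary_key_cache = 1;  -- optional: prewarm on inserts, merges, fetches, restarts

-- Assumption: the same settings can be applied to an existing MergeTree table.
ALTER TABLE events
    MODIFY SETTING use_primary_key_cache = 1, prewarm_primary_key_cache = 1;
```

Note that the entry above ties on-demand loading to lazy load also being enabled for the primary index. The name of that setting is not given here, so it is left out of the sketch.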