v.24.2Experimental Feature

Support for LEFT JOIN, ALL INNER JOIN, and Subqueries in Parallel Replicas

Support LEFT JOIN, ALL INNER JOIN, and simple subqueries for parallel replicas (only with analyzer). New setting parallel_replicas_prefer_local_join chooses local JOIN execution (by default) vs GLOBAL JOIN. All tables should exist on every replica from cluster_for_parallel_replicas. New settings min_external_table_block_size_rows and min_external_table_block_size_bytes are used to squash small blocks that are sent for temporary tables (only with analyzer). #58916 (Nikolai Kochetov).
ClickHouse now supports LEFT JOIN, ALL INNER JOIN, and simple subqueries for parallel replicas when using the analyzer. It introduces a new setting parallel_replicas_prefer_local_join to control whether to execute JOIN operations locally or globally. Additional settings min_external_table_block_size_rows and min_external_table_block_size_bytes optimize block size for temporary external tables during parallel processing.

Why it matters

This feature enables more flexible and efficient distributed query execution across parallel replicas by supporting common join types and subqueries. It solves the problem of choosing between local and global JOIN execution strategies, improving query performance and resource utilization. Squashing small blocks for temporary tables reduces overhead during data transfer between replicas.

How to use it

To use these features, ensure all tables involved exist on every replica in the cluster_for_parallel_replicas. Enable or configure the join strategy with the parallel_replicas_prefer_local_join setting to prefer local or global JOIN execution. Adjust min_external_table_block_size_rows and min_external_table_block_size_bytes to control the block sizes for temporary tables sent between replicas when using the analyzer for parallel replica queries.