v.25.12 Improvement

Add a setting insert_select_deduplicate

Adds the setting insert_select_deduplicate, which makes it clearer how insert deduplication is handled for INSERT SELECT. In general, deduplication is not possible for such queries, but if the source table has not changed and the result is sorted, deduplication on retry becomes possible. ClickHouse cannot track whether the source is the same, but it can check that the result of the SELECT query is sorted. In the general case that check turned out to be very hard; the simple case with ORDER BY ALL is easy. The existing logic is effectively broken: it attempts deduplication, but in most cases it does not see duplicates among the blocks because the SELECT returns different data. #91830 (Sema Checherinda).
Introduces a new setting insert_select_deduplicate to clarify and control deduplication behavior during INSERT SELECT operations in ClickHouse.

Why it matters

The setting addresses the difficulty of deduplicating INSERT SELECT queries. Deduplication is usually not feasible for them because the source data may change between attempts or the SELECT may return its result in a different order. However, if the source table is unchanged and the SELECT result is sorted (the easily verified case is ORDER BY ALL), a retried insert produces the same blocks and deduplication on retry becomes possible. This replaces the previous, flawed logic, which attempted deduplication but in most cases found no duplicates among the blocks because the SELECT returned different data.

How to use it

Users can enable deduplication for INSERT SELECT by setting insert_select_deduplicate to true in the session or query settings. It works reliably when the SELECT query uses ORDER BY ALL to ensure sorted results, enabling ClickHouse to detect duplicates properly during retries.
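
The usage described above can be sketched as follows. This is a minimal illustration rather than a reference: the tables src and dst and their columns are hypothetical and assumed to already exist, and the value 1 is assumed to be how the setting is enabled ("true" above); check the settings reference for the values your version accepts.

```sql
-- Hypothetical tables: src (source) and dst (target) are assumed to exist
-- and to have compatible columns id and value.

-- Enable the setting for the current session.
-- Assumption: 1 corresponds to "true" as described above.
SET insert_select_deduplicate = 1;

-- ORDER BY ALL sorts the result by every selected column, so a retry
-- over an unchanged src yields the same rows in the same order.
INSERT INTO dst
SELECT id, value
FROM src
ORDER BY ALL;
```

If the same statement is retried after a failure and src has not changed in the meantime, the sorted result is what lets the retried insert be recognized as a duplicate. Without a stable order, the retry may produce different blocks and nothing is deduplicated.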