v.21.1
New Features
Extended OPTIMIZE DEDUPLICATE Syntax for Column-Specific Duplicate Checks
Extended OPTIMIZE ... DEDUPLICATE syntax to allow explicit (or implicit with asterisk/column transformers) list of columns to check for duplicates on. ... #17846 (Vasily Nemkov).
Why it matters
Without a BY clause, OPTIMIZE ... DEDUPLICATE removes only rows that are identical in every column. This feature lets users choose which columns are compared, so rows that match on those columns are treated as duplicates even when other columns differ, for example a per-insert timestamp on otherwise identical rows. This gives finer control over what counts as a duplicate.
How to use it
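Users can invoke OPTIMIZE ... DEDUPLICATE BY with an explicit list of columns to compare, or specify the columns implicitly with an asterisk, optionally combined with column transformers such as EXCEPT, or with the COLUMNS('regex') matcher. The resulting column list must include every column of the table's sorting key and partition key. For example:
OPTIMIZE TABLE table_name DEDUPLICATE BY column1, column2;
or
OPTIMIZE TABLE table_name DEDUPLICATE BY *;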
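The following is a minimal end-to-end sketch. The table `example`, its columns and the sample rows are hypothetical and used only for illustration. The two inserted rows differ only in `updated_at`, so a plain DEDUPLICATE (all columns) would keep both, while deduplicating by the remaining columns should collapse them into one. The three OPTIMIZE statements are equivalent ways of naming the same columns: an explicit list, `*` with the EXCEPT transformer, and a COLUMNS regex. Each includes the sorting key (`id`) and the column used in the partition key (`event_date`). Which of the two duplicate rows survives is not something this example relies on.

```sql
-- Hypothetical table used only for this illustration.
DROP TABLE IF EXISTS example;

CREATE TABLE example
(
    id UInt32,
    event_date Date,
    value String,
    updated_at DateTime
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(event_date)
ORDER BY id;

-- Two rows that are identical except for updated_at (two inserts, two parts).
INSERT INTO example VALUES (1, '2021-01-10', 'a', '2021-01-10 10:00:00');
INSERT INTO example VALUES (1, '2021-01-10', 'a', '2021-01-10 11:00:00');

-- Explicit column list: ignore updated_at when comparing rows.
-- FINAL forces a merge of all parts in the partition.
OPTIMIZE TABLE example FINAL DEDUPLICATE BY id, event_date, value;

-- Equivalent implicit forms using a column transformer or matcher.
OPTIMIZE TABLE example FINAL DEDUPLICATE BY * EXCEPT updated_at;
OPTIMIZE TABLE example FINAL DEDUPLICATE BY COLUMNS('^(id|event_date|value)$');

-- Count the rows remaining after deduplication.
SELECT count() FROM example;
```

Listing the compared columns explicitly makes the deduplication rule visible in the statement itself, while the `* EXCEPT ...` form keeps working unchanged if new columns that should also be compared are added to the table later.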