v.21.1 New Features

Extended OPTIMIZE DEDUPLICATE Syntax for Column-Specific Duplicate Checks

Extended the OPTIMIZE ... DEDUPLICATE syntax to accept an explicit list of columns to check for duplicates on, or an implicit one written with an asterisk or column transformers. ... #17846 (Vasily Nemkov).

Why it matters

This feature lets users choose which columns count toward duplicate detection during an OPTIMIZE operation, instead of requiring rows to be identical in every column. Rows that differ only in columns that do not matter for identity, such as a load timestamp, can then be treated as duplicates and collapsed.

How to use it

Run OPTIMIZE ... DEDUPLICATE with a BY clause that either lists the columns to compare explicitly or selects them implicitly with an asterisk or column transformers. For example:

OPTIMIZE TABLE table_name DEDUPLICATE BY column1, column2;

or

OPTIMIZE TABLE table_name DEDUPLICATE BY *;
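
The statements above can be tried end to end with a sketch like the following. The table events and its columns are hypothetical, chosen only for illustration, and FINAL is added to force a merge so that deduplication actually runs. As far as the ClickHouse documentation describes it, the column list (explicit or expanded) is expected to cover every column used in the sorting and partitioning keys, which is why user_id and event_date appear in every variant.

```sql
-- Hypothetical table for illustration; names are not taken from the changelog.
CREATE TABLE events
(
    user_id    UInt64,
    event_date Date,
    payload    String,
    loaded_at  DateTime
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(event_date)
ORDER BY (user_id, event_date);

-- Explicit list: rows that agree on these three columns count as duplicates.
OPTIMIZE TABLE events FINAL DEDUPLICATE BY user_id, event_date, payload;

-- Implicit list via a column transformer: every column except loaded_at.
OPTIMIZE TABLE events FINAL DEDUPLICATE BY * EXCEPT loaded_at;

-- Implicit list via COLUMNS: columns whose names match the regular expression.
OPTIMIZE TABLE events FINAL DEDUPLICATE BY COLUMNS('^(user_id|event_date|payload)$');
```

All three variants select the same set of columns here; which form reads best depends on whether it is easier to name the columns to compare or the ones to ignore.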