v21.11 New Features

Add Unicode normalization functions: normalizeUTF8NFC, normalizeUTF8NFD, normalizeUTF8NFKC, normalizeUTF8NFKD

Add functions for Unicode normalization: normalizeUTF8NFC, normalizeUTF8NFD, normalizeUTF8NFKC, normalizeUTF8NFKD. #28633 (darkkeks).
Introduces four new functions for Unicode normalization in UTF-8 strings: normalizeUTF8NFC, normalizeUTF8NFD, normalizeUTF8NFKC, and normalizeUTF8NFKD.

Why it matters

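These functions convert strings to the standard Unicode normalization forms so that text has a consistent representation in ClickHouse. The same visible text can be stored as different sequences of code points. For example, "é" can be the single precomposed character U+00E9 or "e" followed by the combining acute accent U+0301. Such strings look identical but compare as unequal. Normalizing both sides to the same form makes equality checks, sorting, searching, and deduplication behave consistently.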
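The comparison problem can be reproduced outside ClickHouse. The minimal sketch below uses Python's standard `unicodedata` module as an illustration, not the ClickHouse implementation. It shows two spellings of "café" that differ as strings but match once both are normalized to the same form. `same_text` is a hypothetical helper introduced only for this example.

```python
import unicodedata

# Two spellings of "café": one uses the precomposed character U+00E9,
# the other uses "e" followed by the combining acute accent U+0301.
precomposed = "caf\u00e9"
decomposed = "cafe\u0301"


def same_text(a: str, b: str, form: str = "NFC") -> bool:
    """Hypothetical helper: compare two strings after normalizing both to `form`."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


if __name__ == "__main__":
    print(precomposed == decomposed)           # raw comparison: different code points
    print(same_text(precomposed, decomposed))  # equal once both are NFC
    # The UTF-8 byte lengths also differ before normalization.
    print(len(precomposed.encode("utf-8")), len(decomposed.encode("utf-8")))
```

Either canonical form (NFC or NFD) works for comparison, as long as both sides use the same one.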

How to use it

Use the functions normalizeUTF8NFC, normalizeUTF8NFD, normalizeUTF8NFKC, or normalizeUTF8NFKD in your SQL queries to normalize UTF-8 strings to the respective Unicode normalization form. For example:

SELECT normalizeUTF8NFC(your_column) FROM your_table;
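The four functions correspond to the four normalization forms defined in Unicode Standard Annex #15. NFC and NFD apply canonical composition and decomposition. NFKC and NFKD additionally fold compatibility characters such as ligatures. As a sketch of how the forms differ, the example below uses Python's `unicodedata` module rather than ClickHouse. It assumes the ClickHouse functions follow the same standard forms. `show_forms` is a hypothetical helper.

```python
import unicodedata
from typing import Dict, List

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def show_forms(text: str) -> Dict[str, List[str]]:
    """Hypothetical helper: list the code points of `text` under each form."""
    return {
        form: [f"U+{ord(ch):04X}" for ch in unicodedata.normalize(form, text)]
        for form in FORMS
    }


if __name__ == "__main__":
    # Input: precomposed "é" (U+00E9) followed by the "ﬁ" ligature (U+FB01).
    for form, points in show_forms("\u00e9\ufb01").items():
        print(form, points)
```

For this input, only the compatibility forms (NFKC, NFKD) expand the ligature into "f" + "i". The "D" forms split "é" into "e" plus the combining accent.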