v.25.5 New Feature

Add stringBytesUniq and stringBytesEntropy

Add stringBytesUniq and stringBytesEntropy functions to search for possibly random or encrypted data. #79350 (Sachin Kumar Singh).
stringBytesUniq returns the number of distinct byte values in a string, and stringBytesEntropy returns the entropy of the string's byte distribution. Together they help detect random or encrypted data.

Why it matters

These functions help identify potentially random or encrypted data in string columns. Such data tends to use many distinct byte values in a near-uniform distribution, so a high unique-byte count and high entropy are common indicators of it. This supports data analysis tasks such as data quality checks and anomaly detection.

How to use it

Use stringBytesUniq to count the unique bytes in a string and stringBytesEntropy to calculate the entropy of its bytes. Both can be called directly in SQL queries on string columns, for example:

SELECT stringBytesUniq(column_name), stringBytesEntropy(column_name) FROM table_name;
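
To make the two measures concrete, here is a minimal Python sketch of what they conceptually compute. It is an illustration, not ClickHouse's implementation: the helper names bytes_uniq and bytes_entropy are hypothetical, and the sketch assumes Shannon entropy measured in bits (log base 2), which may differ from the unit ClickHouse reports.

```python
import math
import os
from collections import Counter


def bytes_uniq(data: bytes) -> int:
    """Number of distinct byte values in data (between 0 and 256)."""
    return len(set(data))


def bytes_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte.

    Assumption: log base 2, so the result ranges from 0.0 to 8.0.
    """
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    # Sum of p * log2(1/p) over every byte value that occurs.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


if __name__ == "__main__":
    samples = {
        "repetitive": b"aaaaaaaaaaaaaaaa",
        "english": b"The quick brown fox jumps over the lazy dog",
        "random": os.urandom(4096),  # stand-in for encrypted data
    }
    for name, data in samples.items():
        print(f"{name}: uniq={bytes_uniq(data)}, entropy={bytes_entropy(data):.3f}")
```

Under these assumptions, repetitive input scores near 0, natural-language text lands in the middle of the range, and random bytes approach the 8-bit maximum. That gap is what makes a threshold on either measure usable as a rough filter for suspicious values.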