v.22.1 New Feature

Add Aggregate Functions for Categorical Dependency Measurement

Add aggregate functions cramersV, cramersVBiasCorrected, theilsU and contingency. These functions calculate the dependency (measure of association) between categorical values. All of them are implemented using a cross-tab (a histogram on pairs of values). You can think of the result as a correlation coefficient for arbitrary discrete values (not necessarily numbers). #33366 (alexey-milovidov). Initial implementation by Vanyok-All-is-OK and antikvist.
ClickHouse introduces the new aggregate functions cramersV, cramersVBiasCorrected, theilsU, and contingency, which measure the dependency and association between categorical values using a cross-tab (a histogram of value pairs).
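
To make the cross-tab idea concrete, here is a minimal Python sketch of the textbook Cramér's V statistic, computed from pair counts. It is an illustration only, not ClickHouse's implementation: the function name cramers_v and the choice to return NaN for degenerate input are assumptions made for this example.

```python
from collections import Counter
from math import nan, sqrt


def cramers_v(xs, ys):
    """Textbook Cramér's V for two equal-length lists of categorical values.

    Illustrative sketch only; the name and the NaN convention are assumptions.
    """
    n = len(xs)
    pairs = Counter(zip(xs, ys))  # cross-tab: count of each (x, y) pair
    rows = Counter(xs)            # marginal counts of the first variable
    cols = Counter(ys)            # marginal counts of the second variable

    k = min(len(rows), len(cols)) - 1
    if n == 0 or k <= 0:
        # The statistic is undefined when a variable has fewer than two categories.
        return nan

    # Pearson chi-squared: compare observed pair counts with the counts
    # expected if the two variables were independent.
    chi2 = 0.0
    for x, row_total in rows.items():
        for y, col_total in cols.items():
            expected = row_total * col_total / n
            observed = pairs.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected

    return sqrt(chi2 / (n * k))
```

With this sketch, two columns whose values determine each other (for example ["a", "b", "c"] * 10 against ["x", "y", "z"] * 10) give a value of 1, while columns whose pairs are spread evenly across all combinations give 0. The values do not need to be numbers, only hashable labels.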

Why it matters

These functions provide a way to quantify the strength of association between discrete categorical variables, similar to how correlation coefficients work for numerical values. This helps users analyze relationships in categorical data more effectively within ClickHouse.

How to use it

Call the new aggregate functions in SELECT queries, passing two categorical columns as arguments. For example, cramersV(column1, column2) returns the measure of association between column1 and column2. No special setup is required.