v22.1 New Feature
Add Aggregate Functions for Categorical Dependency Measurement
Add aggregate functions cramersV, cramersVBiasCorrected, theilsU and contingency. These functions calculate the dependency (measure of association) between categorical values. All of them are implemented using a cross-tab (a histogram over pairs of values). You can think of them as a correlation coefficient for arbitrary discrete values (not necessarily numbers). #33366 (alexey-milovidov). Initial implementation by Vanyok-All-is-OK and antikvist.
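The cross-tab idea can be sketched in a few lines of Python. This is a minimal illustration of the standard textbook formulas, not ClickHouse's C++ implementation: Pearson's chi-squared statistic drives Cramér's V and the contingency coefficient, the bias-corrected variant assumes the common Bergsma (2013) correction, and Theil's U is computed from entropies. The function names are hypothetical, and the exact correction and argument order ClickHouse uses are assumptions to check against its documentation.

```python
import math
from collections import Counter


def crosstab(xs, ys):
    """Histogram on pairs: count each (x, y) pair plus both marginals.

    Assumes xs and ys have the same length (one pair per row)."""
    return Counter(zip(xs, ys)), Counter(xs), Counter(ys)


def chi_squared(xs, ys):
    """Pearson's chi-squared statistic of the x/y contingency table."""
    pairs, rows, cols = crosstab(xs, ys)
    n = len(xs)
    chi2 = 0.0
    for x, rx in rows.items():
        for y, cy in cols.items():
            expected = rx * cy / n  # count expected if x and y were independent
            chi2 += (pairs[(x, y)] - expected) ** 2 / expected  # missing pair -> 0
    return chi2, n, len(rows), len(cols)


def cramers_v(xs, ys):
    """Cramér's V: sqrt(chi2 / n / min(r - 1, k - 1)), in [0, 1]."""
    chi2, n, r, k = chi_squared(xs, ys)
    denom = min(r - 1, k - 1)
    return math.sqrt(chi2 / n / denom) if denom > 0 else 0.0


def cramers_v_bias_corrected(xs, ys):
    """Cramér's V with the Bergsma small-sample bias correction (assumed)."""
    chi2, n, r, k = chi_squared(xs, ys)
    if n < 2:
        return 0.0
    phi2 = max(0.0, chi2 / n - (r - 1) * (k - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    return math.sqrt(phi2 / denom) if denom > 0 else 0.0


def contingency_coefficient(xs, ys):
    """Pearson's contingency coefficient: sqrt(chi2 / (chi2 + n))."""
    chi2, n, _, _ = chi_squared(xs, ys)
    return math.sqrt(chi2 / (chi2 + n)) if n else 0.0


def entropy(counts, n):
    """Shannon entropy (in nats) of a histogram holding n observations."""
    return -sum(c / n * math.log(c / n) for c in counts.values())


def theils_u(xs, ys):
    """Uncertainty coefficient U(x | y): share of x's entropy explained by y."""
    pairs, rows, cols = crosstab(xs, ys)
    n = len(xs)
    h_x = entropy(rows, n)
    if h_x == 0:
        return 0.0  # x is constant, so there is nothing to explain
    # Mutual information I(x; y) = H(x) + H(y) - H(x, y).
    mutual_info = h_x + entropy(cols, n) - entropy(pairs, n)
    return mutual_info / h_x


if __name__ == "__main__":
    colors = ["red", "red", "blue", "blue"]
    sizes = ["S", "S", "L", "L"]
    print(cramers_v(colors, sizes))  # 1.0: size is fully determined by color
```

Note that the inputs are strings: nothing in the sketch requires numbers, only that values can be counted, which is the property that makes a cross-tab approach work for any discrete type.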
Why it matters
These functions quantify the strength of association between discrete categorical variables, much as correlation coefficients do for numerical values. This makes it easier to analyze relationships in categorical data within ClickHouse.
How to use it
Use the new aggregate functions in your SELECT queries on categorical columns. For example, cramersV(column1, column2) calculates the measure of association between two categorical columns. No special setup is required beyond calling these functions in your queries.