v.20.1 New Feature

Add categoricalInformationValue aggregate function for discrete feature analysis

Add aggregate function categoricalInformationValue which calculates the information value of a discrete feature. #8117 (hcz)

Why it matters

This function computes the Information Value (IV) metric for categorical features directly within ClickHouse. IV is widely used in feature selection, especially in credit scoring and risk modeling, to measure how well a discrete variable separates the two classes of a binary target. Computing it inside the database lets users evaluate categorical features without exporting data to external tools.
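To make the metric concrete, here is a minimal Python sketch of the textbook per-category Information Value calculation. It illustrates the arithmetic only and is not ClickHouse's implementation. The helper name `information_value` and the sample data are hypothetical, and the sketch assumes a 0/1 target with no smoothing for empty cells.

```python
import math
from collections import Counter


def information_value(categories, targets):
    """Return (per_category, total) IV for a categorical feature and a 0/1 target.

    Hypothetical helper for illustration; any target value other than 1 is
    treated as the negative class.
    """
    pos = Counter()  # rows with target == 1, counted per category
    neg = Counter()  # rows with target == 0, counted per category
    for cat, y in zip(categories, targets):
        if y == 1:
            pos[cat] += 1
        else:
            neg[cat] += 1

    total_pos = sum(pos.values())
    total_neg = sum(neg.values())
    if total_pos == 0 or total_neg == 0:
        raise ValueError("both target classes must be present")

    per_category = {}
    for cat in sorted(set(pos) | set(neg)):
        p1 = pos[cat] / total_pos  # share of all positives in this category
        p0 = neg[cat] / total_neg  # share of all negatives in this category
        if p1 == 0 or p0 == 0:
            # The logarithm diverges; the term tends to +infinity.
            per_category[cat] = math.inf
        else:
            per_category[cat] = (p1 - p0) * (math.log(p1) - math.log(p0))
    return per_category, sum(per_category.values())


# Made-up sample: category "a" is mostly positive, "b" mostly negative.
feature = ["a", "a", "a", "a", "b", "b", "b", "b"]
target = [1, 1, 1, 0, 1, 0, 0, 0]
per_category, total = information_value(feature, target)
print(per_category, total)  # each category contributes 0.5 * ln(3)
```

A category whose positive and negative shares are equal contributes zero. The larger the gap between the two shares, the larger the contribution.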

How to use it

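Call the new aggregate function categoricalInformationValue in a SELECT query to calculate the information value of a discrete feature with respect to a binary target. As documented in the ClickHouse aggregate function reference, the signature is categoricalInformationValue(category1, category2, ..., tag): each category argument is a 0/1 indicator of whether the row belongs to that category, tag is the binary target, and the result is an array with one value per category. Do not group by the feature itself, because the function needs the rows of every category in the same aggregation to compare each category's positive and negative counts against the totals.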