v.23.9 New Feature

Added GCD Codec for Data Compression in ClickHouse

Added GCD (greatest common divisor) as a new data compression codec. The codec computes the GCD of all column values and then divides each value by that GCD. GCD is a data preparation codec (similar to Delta and DoubleDelta) and cannot be used stand-alone. It works with integer, decimal, and date/time data types. A viable use case is a column whose values change (increase or decrease) in multiples of the GCD, e.g. 24 - 28 - 16 - 24 - 8 - 24 (GCD = 4). #53149 (Alexander Nam).
In short, ClickHouse gains a GCD (greatest common divisor) codec that computes the greatest common divisor of all column values and stores each value divided by it, so that the next codec in the chain works on smaller numbers.
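The transformation described above can be illustrated with a minimal Python sketch. This is not ClickHouse's implementation; the helper names `gcd_encode` and `gcd_decode` are hypothetical, and the sketch assumes non-negative integers only. It reuses the example values from the changelog entry.

```python
from functools import reduce
from math import gcd


def gcd_encode(values):
    """Return (divisor, quotients) for a list of non-negative integers."""
    # gcd(0, x) == x, so 0 is a neutral starting value for the reduction.
    divisor = reduce(gcd, values, 0)
    if divisor == 0:  # empty or all-zero input: nothing to divide by
        return 0, list(values)
    return divisor, [v // divisor for v in values]


def gcd_decode(divisor, quotients):
    """Invert gcd_encode by multiplying each quotient by the divisor."""
    if divisor == 0:
        return list(quotients)
    return [q * divisor for q in quotients]


values = [24, 28, 16, 24, 8, 24]
divisor, quotients = gcd_encode(values)
print(divisor, quotients)  # 4 [6, 7, 4, 6, 2, 6]
assert gcd_decode(divisor, quotients) == values  # the transform is lossless
```

The division is exact because every value is a multiple of the divisor, which is why the original values can be restored by multiplication.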

Why it matters

The GCD codec improves compression for columns whose values are all multiples of a common factor. Dividing by the GCD leaves smaller values, which the next codec in the chain (for example LZ4) can typically compress better, reducing disk usage for integer, decimal, and date/time columns.
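To give a rough sense of the effect, the sketch below compares how many bits the largest value needs before and after division. The timestamps are hypothetical sample data (millisecond timestamps recorded at one-minute intervals), and bit width is only a proxy: the actual savings in ClickHouse depend on the data and on the codec that follows GCD in the chain.

```python
from functools import reduce
from math import gcd

# Hypothetical millisecond timestamps, one minute apart (illustration only).
timestamps_ms = [1_700_000_000_000, 1_700_000_060_000, 1_700_000_120_000]

# Greatest common divisor of all values in the sample.
divisor = reduce(gcd, timestamps_ms)
quotients = [t // divisor for t in timestamps_ms]

# Number of bits needed for the largest value before and after division.
bits_before = max(t.bit_length() for t in timestamps_ms)
bits_after = max(q.bit_length() for q in quotients)

print(f"divisor={divisor}, bits before={bits_before}, bits after={bits_after}")
```

Data without a common factor larger than 1 gains nothing from this step, which matches the use case named in the changelog entry.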

How to use it

The GCD codec is a data preparation codec, like Delta and DoubleDelta, and cannot be used alone. To enable it, list GCD in the column's codec chain in the table definition, followed by a general-purpose codec such as LZ4. For example:

CREATE TABLE example (
    column_name Int32 CODEC(GCD, LZ4)
) ENGINE = ...;