v.20.3 New Feature

Added groupArraySample function with reservoir sampling algorithm

Added groupArraySample function (similar to groupArray) with reservoir sampling algorithm. #8286 (Amos Bird)
Added the groupArraySample aggregation function that collects a sample of elements using the reservoir sampling algorithm, similar to groupArray.

Why it matters

groupArraySample is useful when collecting every value of a group with groupArray would be too large or simply unnecessary. It gathers a random subset of each group's values in a single pass, so memory use per group is bounded by the requested sample size rather than by the size of the group.

How to use it

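groupArraySample is a parametric aggregate function: the maximum sample size (and, optionally, a seed) goes in a first pair of parentheses and the column in a second, as in groupArraySample(max_size[, seed])(x). For example, to keep at most 3 values per key:

SELECT key, groupArraySample(3)(value) AS sample_values
FROM table
GROUP BY key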

This returns one array per group containing a random sample of that group's value elements, selected by reservoir sampling.
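
For intuition about what happens inside each group, here is a minimal Python sketch of classic reservoir sampling (often called Algorithm R). It is an illustration only, not ClickHouse's implementation; the function name reservoir_sample and its parameters are hypothetical and chosen to mirror the max_size and seed parameters of groupArraySample.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(
    items: Iterable[T], max_size: int, seed: Optional[int] = None
) -> List[T]:
    """Return a uniform random sample of at most max_size items in one pass.

    Hypothetical helper for illustration; not part of ClickHouse.
    """
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible
    reservoir: List[T] = []
    for i, item in enumerate(items):
        if i < max_size:
            # Fill the reservoir with the first max_size items.
            reservoir.append(item)
        else:
            # Pick a slot in [0, i]; the new item replaces an existing one
            # only if the slot falls inside the reservoir, which happens
            # with probability max_size / (i + 1).
            j = rng.randint(0, i)
            if j < max_size:
                reservoir[j] = item
    return reservoir
```

Calling reservoir_sample(range(1000), 3, seed=42) keeps three of the thousand values while holding only three items in memory at any time. A group with fewer than max_size values is returned whole.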