v.24.8 New Feature

Add rows_before_aggregation_at_least Statistic to Query Response

Adds the rows_before_aggregation_at_least statistic to the query response when the new setting rows_before_aggregation is enabled. The statistic reports the number of rows read before aggregation during query execution. In a distributed query that uses a GROUP BY clause or the max aggregate function without a LIMIT, rows_before_aggregation_at_least can reflect the number of rows the query hit. #66084 (morning-color).
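To make "rows read before aggregation" concrete, here is a toy Python model of a GROUP BY with max(). It is only an illustration of the concept, not ClickHouse's implementation: it counts every input row that reaches the aggregator and compares that count with the number of result rows. The table data and the group_by_max helper are hypothetical.

```python
def group_by_max(rows, key, value):
    """Toy model of GROUP BY key with max(value).

    Illustrative only (hypothetical helper, not ClickHouse code): returns the
    aggregated groups plus the number of rows fed into the aggregation.
    """
    result = {}
    rows_before_aggregation = 0
    for row in rows:
        rows_before_aggregation += 1  # every input row reaches the aggregator
        k, v = row[key], row[value]
        if k not in result or v > result[k]:
            result[k] = v  # keep the running maximum per group
    return result, rows_before_aggregation


# Hypothetical input: five rows collapse into three groups.
rows = [
    {"site": "a", "latency": 12},
    {"site": "b", "latency": 7},
    {"site": "a", "latency": 30},
    {"site": "b", "latency": 9},
    {"site": "c", "latency": 4},
]
groups, seen = group_by_max(rows, "site", "latency")
print(groups)             # {'a': 30, 'b': 9, 'c': 4}
print(len(groups), seen)  # 3 5
```

The result has only three rows, while the statistic this feature adds would describe the five rows that went into the aggregation.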

Why it matters

This feature helps users understand how much data is processed before aggregation, especially in distributed queries that use a GROUP BY clause or the max aggregate function without a LIMIT. By reporting the minimum number of rows scanned prior to aggregation, it gives insight into query performance and resource usage.

How to use it

Enable the feature by setting rows_before_aggregation = 1, either per query (SETTINGS rows_before_aggregation = 1) or for the session (SET rows_before_aggregation = 1). Once enabled, the query response includes the rows_before_aggregation_at_least statistic, indicating the number of rows read before aggregation.
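A minimal client-side sketch in Python, under stated assumptions: it appends a SETTINGS clause to enable the statistic for a single query and requests FORMAT JSON, the output format in which ClickHouse reports response statistics such as rows_before_limit_at_least. The setting name comes from this release note; the assumption that the statistic appears as a top-level JSON key named rows_before_aggregation_at_least is not confirmed here, so check the actual output of your server. The helpers, the requests table, and the sample response body are hypothetical, and no network call is made.

```python
import json

# Setting name from the release note.
SETTING = "rows_before_aggregation"
# ASSUMPTION: name of the JSON field carrying the statistic; verify it
# against the real response of your ClickHouse version.
STAT_KEY = "rows_before_aggregation_at_least"


def with_stat_setting(sql: str) -> str:
    """Enable the statistic for one query (hypothetical helper).

    Assumes the query has no SETTINGS or FORMAT clause of its own.
    """
    sql = sql.rstrip().rstrip(";")
    return f"{sql} SETTINGS {SETTING} = 1 FORMAT JSON"


def read_stat(response_body: str):
    """Return the statistic from a FORMAT JSON body, or None if it is absent."""
    payload = json.loads(response_body)
    value = payload.get(STAT_KEY)
    return int(value) if value is not None else None


print(with_stat_setting("SELECT site, max(latency) FROM requests GROUP BY site;"))
# SELECT site, max(latency) FROM requests GROUP BY site SETTINGS rows_before_aggregation = 1 FORMAT JSON

# Hypothetical response body, shaped only for illustration.
sample = '{"data": [{"c": 3}], "rows": 1, "rows_before_aggregation_at_least": 1000}'
print(read_stat(sample))  # 1000
```

Returning None instead of raising keeps the helper usable against servers where the setting is off or the field is named differently.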