v.23.3 Improvement
Fix exact_rows_before_limit for distributed processing and sorting in queries
The exact_rows_before_limit setting is designed to make rows_before_limit_at_least report the exact number of rows there would have been without the LIMIT, rather than a lower bound. This pull request fixes cases where that did not work as intended: queries processed in a distributed fashion across multiple shards, and queries that include sorting. #47874 (Amos Bird).
Why it matters
This fix addresses inaccurate row counts reported by rows_before_limit_at_least when a query runs across multiple shards or includes sorting. With the fix, the value reflects the rows actually processed before the limit was applied, so the count can be relied on in distributed environments.
How to use it
Enable the feature by setting exact_rows_before_limit to 1 (true) in your query settings or configuration. This activates precise counting for rows_before_limit_at_least, including for distributed and sorted queries.
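
As a rough illustration of the step above, the following Python sketch passes the setting as a URL parameter to the ClickHouse HTTP interface and reads rows_before_limit_at_least from a FORMAT JSON response. The server address, the table name events, and the helper names are assumptions made for this example and are not part of the changelog entry.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_settings_url(base_url: str) -> str:
    """Return the HTTP endpoint URL with exact_rows_before_limit enabled."""
    # Settings can be passed to the HTTP interface as URL parameters.
    return base_url.rstrip("/") + "/?" + urlencode({"exact_rows_before_limit": 1})


def rows_before_limit(response_body: str):
    """Extract rows_before_limit_at_least from a FORMAT JSON response body.

    Returns None when the field is absent (for example, a query without LIMIT).
    """
    return json.loads(response_body).get("rows_before_limit_at_least")


def run_query(base_url: str, sql: str):
    """Send `sql` in a POST body and return the reported row count."""
    request = Request(build_settings_url(base_url), data=sql.encode("utf-8"))
    with urlopen(request) as response:
        return rows_before_limit(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical server address and table name; adjust to your deployment.
    sql = "SELECT * FROM events ORDER BY ts LIMIT 10 FORMAT JSON"
    print(run_query("http://localhost:8123", sql))
```

The same effect can be had inside the query text by appending SETTINGS exact_rows_before_limit = 1 to the SELECT statement (before the FORMAT clause), which avoids depending on how the client passes settings.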