v.23.3 Improvement

Fix exact_rows_before_limit for distributed processing and sorting in queries

The exact_rows_before_limit setting is designed to make rows_before_limit_at_least accurately reflect the number of rows before the limit is applied. This pull request fixes the cases where the query involves distributed processing across multiple shards or sorting operations; prior to this update, the setting did not work as intended in these scenarios. #47874 (Amos Bird).
In short, the exact_rows_before_limit setting now gives an accurate rows_before_limit_at_least for queries that involve distributed processing or sorting.

Why it matters

This fix addresses inaccurate row counts reported by rows_before_limit_at_least when a query runs across multiple shards or includes sorting. With the setting enabled, the reported number of rows before the limit now matches the data actually processed, which makes query results in distributed environments more reliable.

How to use it

Enable the feature by setting exact_rows_before_limit to 1 (true), either per query in a SETTINGS clause or in your configuration. This activates precise counting for rows_before_limit_at_least, including for distributed and sorted queries.
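
As a rough illustration of the per-query form, the Python sketch below appends the setting to a SELECT statement and reads rows_before_limit_at_least from a FORMAT JSON response. The helper names, the table name events_distributed, and the sample response are hypothetical and written by hand for illustration. Only the setting name, the SETTINGS clause, and the rows_before_limit_at_least field come from ClickHouse itself. The sketch assumes the query does not already carry its own SETTINGS or FORMAT clause.

```python
import json


def with_exact_rows_before_limit(query: str) -> str:
    """Append the setting and JSON output format to a SELECT query.

    Hypothetical helper: assumes the query has no SETTINGS or FORMAT
    clause of its own.
    """
    base = query.strip().rstrip(";").rstrip()
    return f"{base} SETTINGS exact_rows_before_limit = 1 FORMAT JSON"


def rows_before_limit(response_text: str):
    """Return rows_before_limit_at_least from a FORMAT JSON response.

    Returns None when the field is absent (for example, no LIMIT in the query).
    """
    payload = json.loads(response_text)
    return payload.get("rows_before_limit_at_least")


# Hypothetical query against an assumed distributed table.
query = with_exact_rows_before_limit(
    "SELECT name FROM events_distributed ORDER BY name LIMIT 2;"
)

# Hand-written sample shaped like a FORMAT JSON response; the values are
# illustrative only, not real server output.
sample_response = json.dumps(
    {
        "meta": [{"name": "name", "type": "String"}],
        "data": [{"name": "a"}, {"name": "b"}],
        "rows": 2,
        "rows_before_limit_at_least": 1000,
    }
)

exact_count = rows_before_limit(sample_response)
```

In a real setup, the query string would be sent through whichever client or interface you already use, and the response text passed to rows_before_limit. Note that exact counting requires the data before the limit to be read in full, so it can make such queries slower.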