v.22.8 Performance Improvement
Distinct Optimization and Memory Efficiency in ClickHouse with Sorted Input Handling
DISTINCT in order with ORDER BY: Deduce the way to sort from the input stream's sort description, and skip sorting if the input stream is already sorted. #38719 (Igor Nikonov).

Significantly improve memory usage and query execution time: use DistinctSortedChunkTransform for the final DISTINCT when the DISTINCT columns match the ORDER BY columns (renamed to DistinctSortedStreamTransform in EXPLAIN PIPELINE), and remove unnecessary allocations in the hot loop of DistinctSortedChunkTransform. #39432 (Igor Nikonov).

Use DistinctSortedTransform only when the sort description is applicable to the DISTINCT columns; otherwise fall back to the ordinary DISTINCT implementation. This also allows fewer checks during DistinctSortedTransform execution. #39528 (Igor Nikonov).

Fix: DistinctSortedTransform didn't take advantage of sorting. It never cleared its HashSet because clearing_columns were detected incorrectly (always empty), so it effectively worked as ordinary DISTINCT (DistinctTransform). The fix reduces memory usage significantly. #39538 (Igor Nikonov).
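The memory effect described in these entries can be illustrated with a minimal Python sketch. It is not ClickHouse's C++ implementation: the function names `distinct_fully_sorted` and `distinct_sorted_prefix`, the `stats` dictionary, and the tuple-based row format are all hypothetical, chosen only to show why a sorted input lets DISTINCT keep far less state than a plain hash set.

```python
from typing import Dict, Iterable, Iterator, Optional, Set, Tuple

Row = Tuple[int, ...]


def distinct_fully_sorted(rows: Iterable[Row]) -> Iterator[Row]:
    """Input is sorted by all DISTINCT columns (hypothetical helper).

    Duplicates are adjacent, so comparing with the previous row is enough
    and no hash set is needed at all.
    """
    previous: Optional[Row] = None
    for row in rows:
        if row != previous:
            yield row
            previous = row


def distinct_sorted_prefix(
    rows: Iterable[Row], prefix_len: int, stats: Dict[str, int]
) -> Iterator[Row]:
    """Input is sorted only by the first `prefix_len` columns (hypothetical helper).

    The hash set is cleared whenever the sorted prefix changes, because rows
    with an earlier prefix can never appear again. `stats["peak"]` records
    the largest number of rows held in the set at once.
    """
    seen: Set[Row] = set()
    current_prefix: Optional[Row] = None
    for row in rows:
        prefix = row[:prefix_len]
        if prefix != current_prefix:
            seen.clear()  # safe: the input never returns to an earlier prefix
            current_prefix = prefix
        if row not in seen:
            seen.add(row)
            stats["peak"] = max(stats.get("peak", 0), len(seen))
            yield row


# Rows sorted by the first column only.
rows = [(1, 5), (1, 5), (1, 3), (2, 3), (2, 3), (2, 7)]

with_clearing: Dict[str, int] = {}
result = list(distinct_sorted_prefix(rows, 1, with_clearing))

# An empty prefix never changes, so the set is never cleared mid-stream.
without_clearing: Dict[str, int] = {}
list(distinct_sorted_prefix(rows, 0, without_clearing))
```

In this sketch, passing `prefix_len=0` is loosely analogous to the bug fixed in #39538: with an empty set of clearing columns the set only grows, so the function behaves like an ordinary hash-based DISTINCT and its peak size equals the number of distinct rows rather than the largest group of rows sharing a prefix.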