v.23.8 Improvement

Performance Comparison of Reading Small Files from HDFS with Gluten vs. Spark

When reading small files from HDFS through Gluten, we found that queries took longer than the same queries run directly in Spark. This change addresses that slowdown. #50063 (KevinyhZou).
Improved performance when reading small files from HDFS through Gluten with ClickHouse as its native backend.

Why it matters

This change addresses an issue where querying small HDFS files through Gluten with the ClickHouse backend was slower than querying the same files directly with Spark. It optimizes the read path to reduce query times for users whose workloads read small HDFS files.

How to use it

Use Gluten with the ClickHouse backend to read files from HDFS as usual. The improvement is applied automatically when querying small files.