v.22.1 New Feature
Added hdfsCluster Function for Parallel HDFS File Processing in Clusters
Added table function hdfsCluster which allows processing files from HDFS in parallel from many nodes in a specified cluster, similarly to s3Cluster. #32400 (Zhichang Yu).
Why it matters
This feature addresses the need for efficient distributed processing of large datasets stored in HDFS by leveraging cluster-wide parallelism in ClickHouse. It improves performance and scalability when querying HDFS data in a multi-node environment.
How to use it
Users can call the hdfsCluster table function in their SQL queries to process HDFS files in parallel across cluster nodes. It is used similarly to s3Cluster: the first argument names the ClickHouse cluster whose nodes will read the files, followed by the HDFS URI, the file format, and the table structure.
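As a minimal sketch of such a query, the example below counts rows across a set of CSV files. The cluster name my_cluster, the NameNode address namenode:9000, the file path, and the column list are all hypothetical placeholders, not values from the release notes; the cluster name is assumed to match an entry defined in the server's remote_servers configuration.

```sql
-- Hypothetical values throughout: replace the cluster name, HDFS URI,
-- format, and structure with ones that match your environment.
SELECT count(*)
FROM hdfsCluster(
    'my_cluster',                               -- ClickHouse cluster whose nodes read the files
    'hdfs://namenode:9000/data/events_*.csv',   -- HDFS URI; the glob matches multiple files
    'CSV',                                      -- input file format
    'id UInt64, event String, ts DateTime'      -- column names and types
);
```

Files matched by the glob are distributed among the nodes of the named cluster, so the query is likely to benefit most when the URI matches many files rather than a single large one.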