v.21.6 New Feature

Added s3Cluster Function for Parallel File Processing on S3 in ClickHouse

Added a new table function, s3Cluster, which processes files from S3 in parallel on every node of a specified ClickHouse cluster. #22012 (Nikita Mikhaylov).

Why it matters

Reading a large dataset from S3 through a single node makes that node the bottleneck. s3Cluster distributes the files across the nodes of the cluster, so read throughput and processing capacity scale with the number of nodes in a distributed ClickHouse setup.

How to use it

Call the s3Cluster table function in the FROM clause of a query, passing the name of the target cluster and the S3 file path. ClickHouse then parallelizes the reading of the matching S3 files across every node of the cluster during query execution.
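
As a minimal sketch of such a query: the cluster name my_cluster, the bucket URL, and the column structure below are hypothetical placeholders, not values from the changelog. It assumes the cluster is already defined in the server's remote_servers configuration and that the bucket is readable without credentials. For a private bucket, an access key ID and secret key can be passed as two extra arguments after the path.

```sql
-- Hypothetical cluster name, bucket URL, and schema; replace with your own.
SELECT
    count() AS row_count,
    sum(value) AS total_value
FROM s3Cluster(
    'my_cluster',                                      -- cluster from remote_servers (assumed name)
    'https://my-bucket.s3.amazonaws.com/data/*.csv',   -- glob matching the files to read (assumed path)
    'CSV',                                             -- format of the files
    'name String, value UInt32'                        -- structure of the data (assumed columns)
);
```

The node that receives the query acts as the initiator. It hands the matching files out to the cluster nodes and merges their partial results into the final answer.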