v.24.8 New Feature

Interpretation of Hive-Style Partitioning for Various Data Engines

Interpret Hive-style partitioning for the File, URL, S3, AzureBlobStorage, and HDFS engines. Hive-style partitioning organizes data into sub-directories named key=value, which makes large datasets easier to query and manage. For now, the feature only creates virtual columns that carry each partition key's name and value. A follow-up PR will add data filtering on those columns to speed up queries. #65997 (Yarik Briukhovetskyi).
Interpret Hive-style partitioning for various storage engines (File, URL, S3, AzureBlobStorage, HDFS) by creating virtual columns that represent partition keys.

Why it matters

Hive-style partitioning organizes data into partitioned sub-directories, which allows large datasets to be queried and managed efficiently. With this feature, ClickHouse interprets that directory layout and exposes the partition keys as virtual columns, so partition information can be used directly in queries. It also lays the foundation for partition pruning, which follow-up updates are expected to use to improve performance.

How to use it

When reading through the File, URL, S3, AzureBlobStorage, or HDFS engines, ClickHouse recognizes key=value segments in the path and exposes each partition key as a virtual column with a matching name and data type. These virtual columns can be referenced in SQL statements like any other column. Detection is controlled by the use_hive_partitioning setting, so enable it if the partition columns do not appear. In this release the columns only supply values; filtering that skips non-matching partitions will be enabled in a subsequent release.
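
To make the path-to-column mapping concrete, the following is a minimal Python sketch of how key=value directory segments can be turned into per-file column values. It is an illustration only, not ClickHouse's implementation: the function name hive_partition_columns and the sample paths are hypothetical, values are kept as strings rather than given inferred types, and the exact names of the virtual columns that ClickHouse exposes may differ.

```python
def hive_partition_columns(path: str) -> dict[str, str]:
    """Map each key=value directory segment of a path to a column value.

    Hypothetical helper for illustration; values stay as strings.
    """
    columns: dict[str, str] = {}
    # Drop the final segment: it is the file name, not a partition directory.
    directories = path.split("/")[:-1]
    for segment in directories:
        key, sep, value = segment.partition("=")
        # Keep only segments that actually look like key=value.
        if sep and key:
            columns[key] = value
    return columns


# Hypothetical file layout with two partition keys.
paths = [
    "data/date=2024-08-01/country=NL/part-0.parquet",
    "data/date=2024-08-02/country=DE/part-0.parquet",
]

# Every row read from a file would carry that file's partition values.
partitions = {p: hive_partition_columns(p) for p in paths}

# Selecting on a partition value here still lists every file; pruning
# would skip non-matching files before they are opened.
dutch_files = [p for p, cols in partitions.items() if cols.get("country") == "NL"]
```

In this sketch, dutch_files ends up containing only the first path. The same idea underlies the virtual columns: the values come from the directory names rather than from the file contents.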