v.22.1 New Feature

Implement Data Schema Inference for Various Input Formats in ClickHouse

Implement data schema inference for input formats. Allow skipping the structure (or writing just auto) in the table functions file, url, s3, hdfs and in the parameters of clickhouse-local. Allow skipping the structure in the create query for the table engines File, HDFS, S3, URL, Merge, Buffer, Distributed and ReplicatedMergeTree (if we add new replicas). #32455 (Kruglov Pavel).
In short, ClickHouse can now infer the data schema for input formats in these table functions, table engines, and clickhouse-local parameters, so users can either omit the explicit structure definition or write auto in its place.

Why it matters

This feature removes the need to specify the data schema by hand when working with external data sources or formats. ClickHouse infers the structure from the input data itself, which simplifies data import and table creation. This reduces user effort, lowers the chance of mistakes in hand-written column definitions, and improves usability when working with file-based and external storage engines or when running local queries.

How to use it

Users can enable schema inference by either omitting the structure definition or specifying auto for the structure parameter in the following contexts:

- Table functions: file, url, s3, hdfs
- Parameters of clickhouse-local
- CREATE TABLE queries for engines: File, HDFS, S3, URL, Merge, Buffer, Distributed, and ReplicatedMergeTree (for ReplicatedMergeTree, when adding new replicas)

This allows ClickHouse to detect the data schema automatically without requiring explicit column definitions.
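
The contexts listed above can be sketched in SQL as follows. This is a minimal illustration rather than a definitive recipe: the file name events.jsonl, the table name events_remote, and the URL are hypothetical placeholders, and the inferred column names and types depend on the actual data and format.

```sql
-- Hypothetical file name; on a server the path is resolved
-- relative to the configured user files directory.

-- Show the structure ClickHouse infers when no column list is given.
DESCRIBE TABLE file('events.jsonl', 'JSONEachRow');

-- Query the file with the structure argument omitted.
SELECT * FROM file('events.jsonl', 'JSONEachRow') LIMIT 5;

-- Same query, writing 'auto' in place of the structure.
SELECT * FROM file('events.jsonl', 'JSONEachRow', 'auto') LIMIT 5;

-- Table engine created without a column list (hypothetical URL);
-- the columns are inferred from the data behind the URL.
CREATE TABLE events_remote
ENGINE = URL('https://example.com/events.jsonl', 'JSONEachRow');
```

Running DESCRIBE first is a convenient way to check what was inferred before relying on it; if the result is not what you expect, the structure can still be given explicitly as before.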