v.20.3 Improvement

Add _path and _file Virtual Columns to HDFS and File Engines

Add _path and _file virtual columns to HDFS and File engines and hdfs and file table functions #8489 (Olga Khvostikova)
Added _path and _file virtual columns to HDFS and File table engines and the hdfs and file table functions in ClickHouse.

Why it matters

When a query reads data from HDFS or from local files, _path exposes the full path of the file each row came from, and _file exposes just that file's name. Because the origin of every row is then explicit in the query, you can filter rows by source file, count rows per file, or trace a malformed record back to the file that contains it. This is especially useful when a single query reads many files through a glob pattern.
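As a sketch of the filtering and debugging use case, the queries below read a set of CSV files through the file table function with a glob pattern. The directory data/, the file name part-001.csv, and the two-column structure are hypothetical. The path is resolved relative to the server's user_files_path, and the structure argument must match your actual files.

```sql
-- Hypothetical input: CSV files matching data/*.csv under user_files_path,
-- each holding two columns (id, name).

-- Count rows per source file, e.g. to spot an empty or unexpectedly large input.
SELECT _file, count() AS row_count
FROM file('data/*.csv', 'CSV', 'id UInt32, name String')
GROUP BY _file
ORDER BY _file;

-- Keep only the rows that came from one specific (hypothetical) file.
SELECT _path, id, name
FROM file('data/*.csv', 'CSV', 'id UInt32, name String')
WHERE _file = 'part-001.csv';
```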

How to use it

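When querying a table created with the HDFS or File engine, or when reading through the hdfs or file table function, select the virtual columns _path and _file to get, for each row, the full path and the name of the file it was read from. Virtual columns are not returned by SELECT *, so they must be named explicitly. In this release the hdfs table function also requires a third argument describing the table structure; the column list below is a placeholder. For example:

SELECT _path, _file, * FROM hdfs('hdfs://namenode:8020/data/file.csv', 'CSV', 'id UInt32, name String')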


No additional configuration is required: the engines and table functions provide these columns automatically.
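The same columns are available on engine-backed tables. The sketch below is illustrative only: the table names, the schema, and the HDFS glob are assumptions, and the namenode address is reused from the hdfs example above.

```sql
-- Hypothetical local table: the File engine stores its rows in a data file
-- inside the table's directory on the server.
CREATE TABLE file_demo (id UInt32, name String) ENGINE = File(CSV);
INSERT INTO file_demo VALUES (1, 'a'), (2, 'b');

-- Virtual columns are not part of SELECT *, so list them explicitly.
SELECT _path, _file, id, name FROM file_demo;

-- Hypothetical HDFS table over every CSV file in /data
-- (a table whose path contains a glob can only be read, not written).
CREATE TABLE hdfs_demo (id UInt32, name String)
    ENGINE = HDFS('hdfs://namenode:8020/data/*.csv', 'CSV');

SELECT _file, count() AS row_count FROM hdfs_demo GROUP BY _file;
```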