v.20.3 Improvement
Add _path and _file Virtual Columns to HDFS and File Engines
Add _path and _file virtual columns to HDFS and File engines and hdfs and file table functions #8489 (Olga Khvostikova)
Why it matters
This feature provides users with metadata about the source of each row by exposing the file path and file name as virtual columns when reading data from HDFS or local files. It enhances data inspection, filtering, and debugging capabilities by making the origin of data explicit within queries.
How to use it
When querying tables created with the HDFS or File engines, or when using the hdfs or file table functions, users can simply select the virtual columns _path and _file to retrieve the file path and file name, respectively, for each row. For example:
SELECT _path, _file, * FROM hdfs('hdfs://namenode:8020/data/file.csv', 'CSV')
This requires no additional configuration, as the columns are provided automatically.
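The virtual columns are most useful when one query reads several files at once. The following is a minimal sketch, not a definitive recipe. It assumes a hypothetical directory of CSV files under hdfs://namenode:8020/data/ and a hypothetical two-column schema (id, name), and it passes the column structure as a third argument, which releases of this era generally expect.

```sql
-- Assumptions: the namenode address, the /data/ directory, and the
-- 'id UInt32, name String' structure are placeholders for illustration.

-- Count rows per source file when a glob matches several files.
SELECT
    _file,
    count() AS row_count
FROM hdfs('hdfs://namenode:8020/data/*.csv', 'CSV', 'id UInt32, name String')
GROUP BY _file
ORDER BY _file;

-- Keep only the rows that came from one particular file,
-- and show the full path each row was read from.
SELECT
    _path,
    id,
    name
FROM hdfs('hdfs://namenode:8020/data/*.csv', 'CSV', 'id UInt32, name String')
WHERE _file = 'file.csv';
```

The same pattern should apply to the file table function and to tables backed by the HDFS or File engines, since the entry states that _path and _file are available there as well.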