v21.7 New Feature

Support Structs and Maps in Arrow/Parquet/ORC and Dictionaries in Arrow Formats

Adds support for structs and maps in the Arrow, Parquet, and ORC formats, and for dictionaries in the Arrow input/output formats. #24341 (Kruglov Pavel).
In all three formats, struct columns correspond to the ClickHouse Tuple type and map columns to Map. Arrow dictionary columns correspond to LowCardinality. A new setting, output_format_arrow_low_cardinality_as_dictionary, controls whether LowCardinality columns are written as Arrow dictionaries.
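The documented type correspondence (struct to Tuple, map to Map, Arrow dictionary to LowCardinality, and list to Array) composes recursively for nested columns. The sketch below illustrates that with a hypothetical helper, `to_clickhouse`, which describes Arrow types as plain Python tuples. Both the helper and that tuple encoding are assumptions made for this illustration. This is not how ClickHouse implements the mapping, and it covers only a small subset of primitive types.

```python
# Hypothetical illustration of the Arrow -> ClickHouse type correspondence.
# NOT ClickHouse's implementation; the tuple-based type encoding is an assumption.

# Small subset of primitive Arrow type names and their ClickHouse equivalents.
PRIMITIVES = {
    "int32": "Int32",
    "int64": "Int64",
    "double": "Float64",
    "string": "String",
}


def to_clickhouse(arrow_type):
    """Translate a nested Arrow type description into a ClickHouse type name.

    Encoding used by this sketch:
      "int32"                                primitive type
      ("list", item)                         Arrow LIST       -> Array
      ("struct", [(name, field_type), ...])  Arrow STRUCT     -> Tuple
      ("map", key_type, value_type)          Arrow MAP        -> Map
      ("dictionary", value_type)             Arrow DICTIONARY -> LowCardinality
    """
    if isinstance(arrow_type, str):
        return PRIMITIVES[arrow_type]
    kind = arrow_type[0]
    if kind == "list":
        return f"Array({to_clickhouse(arrow_type[1])})"
    if kind == "struct":
        fields = ", ".join(f"{name} {to_clickhouse(t)}" for name, t in arrow_type[1])
        return f"Tuple({fields})"
    if kind == "map":
        return f"Map({to_clickhouse(arrow_type[1])}, {to_clickhouse(arrow_type[2])})"
    if kind == "dictionary":
        return f"LowCardinality({to_clickhouse(arrow_type[1])})"
    raise ValueError(f"unsupported Arrow type: {arrow_type!r}")


# A struct column that contains a map field.
example = ("struct", [("id", "int64"), ("tags", ("map", "string", "int32"))])
print(to_clickhouse(example))  # Tuple(id Int64, tags Map(String, Int32))
```

The recursion is the point of the sketch: a map nested inside a struct becomes a Map nested inside a Tuple. No separate flattening step is involved.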

Why it matters

This feature extends ClickHouse's compatibility with nested data when importing and exporting the popular columnar formats Arrow, Parquet, and ORC. Struct and map columns can be read and written directly rather than being flattened or converted first, so data keeps its original shape when it moves between ClickHouse and other systems. The new setting lets users choose how LowCardinality columns are serialized in Arrow output. They can be written as dictionary-encoded arrays, which store each distinct value once and refer to it by an integer index and can therefore be more compact for columns with few distinct values. Alternatively, they can be written as plain arrays of values.
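Dictionary encoding is the idea shared by Arrow dictionary arrays and ClickHouse LowCardinality columns. The minimal sketch below shows it under that framing. The function names are hypothetical, and the code illustrates the general technique only. It does not reproduce either system's actual storage layout. The column is split into a list of distinct values and a list of integer indices, and decoding reverses the split.

```python
# Minimal sketch of dictionary encoding (general technique, not ClickHouse or Arrow code).

def dictionary_encode(values):
    """Split a column into (dictionary of distinct values, integer indices)."""
    dictionary = []   # distinct values in order of first appearance
    positions = {}    # value -> its index in `dictionary`
    indices = []      # one index per original row
    for value in values:
        if value not in positions:
            positions[value] = len(dictionary)
            dictionary.append(value)
        indices.append(positions[value])
    return dictionary, indices


def dictionary_decode(dictionary, indices):
    """Rebuild the original column from its dictionary-encoded form."""
    return [dictionary[i] for i in indices]


column = ["GET", "POST", "GET", "GET", "DELETE", "POST"]
dictionary, indices = dictionary_encode(column)
print(dictionary)  # ['GET', 'POST', 'DELETE']
print(indices)     # [0, 1, 0, 0, 2, 1]
```

Six strings are reduced to three distinct strings plus six small integers. The fewer distinct values a column has relative to its length, the larger the saving.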

How to use it

To control how LowCardinality columns are written in Arrow output, set output_format_arrow_low_cardinality_as_dictionary to 1 to write them as Arrow dictionaries, or to 0 to write plain values. The setting is disabled (0) by default. Columns of struct and map types can now be read and written directly in the Arrow, Parquet, and ORC formats without additional transformation.
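ClickHouse's HTTP interface accepts settings as URL query parameters, so one way to apply the setting is to pass it alongside the query. The sketch below only builds such a URL and does not send a request. It assumes a server at the default HTTP address `http://localhost:8123/` and a table named `events`. Both are hypothetical placeholders, and so is the helper name `build_arrow_export_url`.

```python
from urllib.parse import quote, urlencode


def build_arrow_export_url(query, as_dictionary, base_url="http://localhost:8123/"):
    """Build a ClickHouse HTTP URL that runs `query` with the Arrow dictionary setting.

    `base_url` is an assumed default; adjust it for your server.
    """
    params = {
        "query": query,
        # 1: write LowCardinality columns as Arrow dictionaries; 0: write plain values.
        "output_format_arrow_low_cardinality_as_dictionary": int(as_dictionary),
    }
    # quote_via=quote encodes spaces as %20 rather than '+'.
    return base_url + "?" + urlencode(params, quote_via=quote)


# `events` is a hypothetical table name.
url = build_arrow_export_url("SELECT * FROM events FORMAT Arrow", as_dictionary=True)
print(url)
```

To retrieve the Arrow bytes, pass the resulting URL to `urllib.request.urlopen`. In an interactive client session, `SET output_format_arrow_low_cardinality_as_dictionary = 1` has the same effect for subsequent queries.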