v.24.3upgrade-notes

ClickHouse String Data Type Handling and Default Compression Settings Update

ClickHouse allows arbitrary binary data in the String data type, which is typically UTF-8. Parquet/ORC/Arrow Strings only support UTF-8. That's why you can choose which Arrow's data type to use for the ClickHouse String data type - String or Binary. This is controlled by the settings, output_format_parquet_string_as_string, output_format_orc_string_as_string, output_format_arrow_string_as_string. While Binary would be more correct and compatible, using String by default will correspond to user expectations in most cases. Parquet/ORC/Arrow supports many compression methods, including lz4 and zstd. ClickHouse supports each and every compression method. Some inferior tools lack support for the faster lz4 compression method, that's why we set zstd by default. This is controlled by the settings output_format_parquet_compression_method, output_format_orc_compression_method, and output_format_arrow_compression_method. We changed the default to zstd for Parquet and ORC, but not Arrow (it is emphasized for low-level usages). #61817 (Alexey Milovidov).