v.24.3 upgrade-notes
ClickHouse String Data Type Handling and Default Compression Settings Update
ClickHouse allows arbitrary binary data in the String data type, although in practice it usually holds UTF-8. Parquet/ORC/Arrow Strings support only UTF-8. That is why you can choose which Arrow data type to use for the ClickHouse String data type: String or Binary. This is controlled by the settings output_format_parquet_string_as_string, output_format_orc_string_as_string, and output_format_arrow_string_as_string. While Binary would be more correct and compatible, using String by default matches user expectations in most cases.

Parquet/ORC/Arrow support many compression methods, including lz4 and zstd. ClickHouse supports each and every compression method. Some inferior tools lack support for the faster lz4 compression method, which is why we set zstd by default. This is controlled by the settings output_format_parquet_compression_method, output_format_orc_compression_method, and output_format_arrow_compression_method. We changed the default to zstd for Parquet and ORC, but not for Arrow (it is intended for low-level usage). #61817 (Alexey Milovidov).
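
To pin these settings explicitly rather than rely on the defaults, the setting names can be generated per format. The following is a minimal sketch, not part of ClickHouse: the helper names export_settings and to_set_statements are hypothetical, and the only things taken from this note are the setting names and the lz4/zstd method names.

```python
# Hypothetical helpers (not a ClickHouse API): they only build SET statements
# from the setting names listed in this upgrade note.
FORMATS = ("parquet", "orc", "arrow")


def export_settings(fmt, string_as_string=True, compression="zstd"):
    """Return the two output settings for one format as a dict."""
    fmt = fmt.lower()
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    return {
        # 1 writes ClickHouse String as String, 0 writes it as Binary
        f"output_format_{fmt}_string_as_string": 1 if string_as_string else 0,
        # compression method name, e.g. 'lz4' or 'zstd'
        f"output_format_{fmt}_compression_method": compression,
    }


def to_set_statements(settings):
    """Render a settings dict as ClickHouse SET statements."""
    statements = []
    for name, value in settings.items():
        # String values are quoted, numeric values are emitted as-is
        rendered = f"'{value}'" if isinstance(value, str) else str(value)
        statements.append(f"SET {name} = {rendered};")
    return statements


if __name__ == "__main__":
    # Example: write Parquet with Binary columns and lz4 compression
    for line in to_set_statements(export_settings("parquet", False, "lz4")):
        print(line)
```

Running the resulting statements in a session before an export applies the chosen behavior regardless of the server default. Whether a downstream reader accepts Binary columns or lz4 depends on that tool and should be checked separately.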