v.23.1 New Feature
Refactor and Enhance Streaming Engines Kafka/RabbitMQ/NATS with Improved Format Support
Refactor and improve the Kafka/RabbitMQ/NATS streaming engines, add support for all formats, and refactor the formats slightly:
- Fix producing messages in row-based formats with suffixes/prefixes. Every message is now formatted completely, with all delimiters, and can be parsed back using the input format.
- Support block-based formats such as Native, Parquet, and ORC. Every block is formatted as a separate message. The number of rows in one message depends on the block size, so you can control it via the setting max_block_size.
- Add new engine settings kafka_max_rows_per_message/rabbitmq_max_rows_per_message/nats_max_rows_per_message. They control the number of rows formatted in one message in row-based formats. Default value: 1.
- Fix high memory consumption in the NATS table engine.
- Support arbitrary binary data in the NATS producer (previously it worked only with strings that contained \0 at the end).
- Add missing Kafka/RabbitMQ/NATS engine settings to the documentation.
- Refactor producing and consuming in Kafka/RabbitMQ/NATS, separating it from WriteBuffers/ReadBuffers semantics.
- Refactor output formats: remove the per-row callbacks used in Kafka/RabbitMQ/NATS (callbacks are no longer used there), allow using IRowOutputFormat directly, clarify the row-end and between-rows delimiters, and make it possible to reset an output format to start formatting again.
- Add a proper implementation of the formatRow function (a bonus after the formats refactoring).
#42777 (Kruglov Pavel).
Why it matters
This change addresses incomplete message formatting, unreliable parsing of produced messages, high memory consumption in NATS, and limited format support in the ClickHouse streaming engines. Every message is now formatted completely with all delimiters, so it can be parsed back using the same input format. Block-based formats are supported, and the number of rows per message is configurable. The refactoring also fixes high memory consumption in the NATS table engine and documents engine settings that were previously missing.
How to use it
Set kafka_max_rows_per_message, rabbitmq_max_rows_per_message, or nats_max_rows_per_message in the engine definition to control how many rows go into each message for row-based formats (default: 1). For block-based formats such as Native or Parquet, specify the format in the engine definition; every block is sent as a separate message, and max_block_size controls how many rows a message holds. The NATS memory fix and binary data support need no configuration. Refer to the updated engine documentation for the full list of settings.
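The settings described above could be applied as in the following minimal sketch. The broker address, topic names, consumer group, table names, and columns are all placeholders chosen for illustration. Only kafka_max_rows_per_message and max_block_size come from the changelog entry itself. Placing max_block_size on the producing INSERT ... SELECT is an assumption about where the setting takes effect, so verify it against your own pipeline.

```sql
-- Row-based format: batch up to 100 rows into each Kafka message.
-- Broker, topic, group, table, and column names are placeholders.
CREATE TABLE kafka_rows_out
(
    id UInt64,
    message String
)
ENGINE = Kafka
SETTINGS
    kafka_broker_list = 'localhost:9092',
    kafka_topic_list = 'events_json',
    kafka_group_name = 'example_group',
    kafka_format = 'JSONEachRow',
    kafka_max_rows_per_message = 100;  -- default is 1

-- Block-based format: each block is formatted as one message.
CREATE TABLE kafka_blocks_out
(
    id UInt64,
    message String
)
ENGINE = Kafka
SETTINGS
    kafka_broker_list = 'localhost:9092',
    kafka_topic_list = 'events_native',
    kafka_group_name = 'example_group',
    kafka_format = 'Native';

-- Per the changelog entry, rows per message follow the block size.
-- Setting max_block_size on the producing query is an assumed placement.
INSERT INTO kafka_blocks_out
SELECT number AS id, toString(number) AS message
FROM numbers(10000)
SETTINGS max_block_size = 1000;
```

The RabbitMQ and NATS engines take the analogous rabbitmq_max_rows_per_message and nats_max_rows_per_message settings in their own SETTINGS clauses.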