v.23.1 · New Feature

Refactor and Enhance Streaming Engines Kafka/RabbitMQ/NATS with Improved Format Support

Refactor and improve the streaming engines Kafka/RabbitMQ/NATS, add support for all formats, and refactor formats a bit:

- Fix producing messages in row-based formats with suffixes/prefixes. Every message is now formatted completely, with all delimiters, and can be parsed back using the corresponding input format.
- Support block-based formats such as Native, Parquet, ORC, etc. Every block is formatted as a separate message. The number of rows in one message depends on the block size, so you can control it via the setting max_block_size.
- Add new engine settings kafka_max_rows_per_message/rabbitmq_max_rows_per_message/nats_max_rows_per_message. They control the number of rows formatted in one message in row-based formats. Default value: 1.
- Fix high memory consumption in the NATS table engine.
- Support arbitrary binary data in the NATS producer (previously it worked only with strings containing \0 at the end).
- Add missing Kafka/RabbitMQ/NATS engine settings to the documentation.
- Refactor producing and consuming in Kafka/RabbitMQ/NATS, separating it from the WriteBuffers/ReadBuffers semantics.
- Refactor output formats: remove the per-row callbacks used in Kafka/RabbitMQ/NATS (they are no longer used there), allow using IRowOutputFormat directly, clarify the row-end and row-between delimiters, and make it possible to reset an output format to start formatting again.
- Add a proper implementation of the formatRow function (a bonus of the formats refactoring).

#42777 (Kruglov Pavel).
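To illustrate the first bullet and the new *_max_rows_per_message settings, here is a minimal Python sketch of how rows could be grouped into self-contained messages for a row-based format that has a prefix. It uses CSV with a header row as a simplified stand-in for ClickHouse's CSVWithNames format; format_messages and parse_message are hypothetical helpers, not ClickHouse code. The point it shows is that the prefix (the header) is repeated in every message, so each message can be parsed back on its own.

```python
import csv
import io
from typing import Iterable, List, Sequence


def format_messages(
    columns: Sequence[str],
    rows: Iterable[Sequence[object]],
    max_rows_per_message: int = 1,
) -> List[str]:
    """Split rows into messages, each one a complete CSV-with-header document."""
    if max_rows_per_message < 1:
        raise ValueError("max_rows_per_message must be >= 1")
    messages: List[str] = []
    batch: List[Sequence[object]] = []

    def flush() -> None:
        # Every message gets the prefix (header) and its own row delimiters,
        # so a consumer can parse it without seeing any other message.
        buf = io.StringIO()
        writer = csv.writer(buf, lineterminator="\n")
        writer.writerow(columns)
        writer.writerows(batch)
        messages.append(buf.getvalue())
        batch.clear()

    for row in rows:
        batch.append(row)
        if len(batch) == max_rows_per_message:
            flush()
    if batch:  # emit the last, possibly shorter, message
        flush()
    return messages


def parse_message(message: str) -> List[dict]:
    """Parse one message back using the header it carries."""
    return list(csv.DictReader(io.StringIO(message)))
```

For example, format_messages(["id", "name"], [(1, "a"), (2, "b"), (3, "c")], max_rows_per_message=2) returns two messages, "id,name\n1,a\n2,b\n" and "id,name\n3,c\n", and parse_message on the second one yields [{'id': '3', 'name': 'c'}]. With the default of 1, each row becomes its own message.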
Refactor and enhance the Kafka, RabbitMQ, and NATS streaming engines with support for all formats and lower memory usage in the NATS engine. This update adds support for all formats, including block-based ones such as Native, Parquet, and ORC; ensures every produced message is fully formatted and can be parsed back; and introduces new settings to control the number of rows per message.

Why it matters

This feature addresses several problems in the ClickHouse streaming engines: messages in row-based formats with prefixes or suffixes were not formatted completely, block-based formats were not supported, the NATS engine consumed too much memory, and the NATS producer could not send arbitrary binary data. Because every message now carries its full set of delimiters, each one can be parsed back independently with the matching input format. Block-based formats are now supported, and the number of rows per message is configurable. Engine settings that were previously missing from the documentation are now documented.

How to use it

To control how many rows are packed into each message in row-based formats, set kafka_max_rows_per_message, rabbitmq_max_rows_per_message, or nats_max_rows_per_message in the engine settings (default: 1). Block-based formats such as Native or Parquet are specified in the engine definition like any other format; each block becomes one message, so the number of rows per message is controlled by max_block_size instead. The NATS memory fix and binary-data support require no configuration. Refer to the updated engine documentation for the full list of settings.
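As a configuration sketch, the Python helper below renders a CREATE TABLE statement for one of the streaming engines and appends the matching *_max_rows_per_message setting. The helper names (create_table_sql, quote), the table name queue, and the broker, topic, and group values are hypothetical; kafka_broker_list, kafka_topic_list, kafka_group_name, and kafka_format are standard Kafka engine settings. Other required engine settings (for RabbitMQ or NATS, for example) must be supplied by the caller, so check the engine documentation for their names.

```python
from typing import Dict

# Engine name -> prefix used by its engine-level settings.
SETTING_PREFIX = {"Kafka": "kafka", "RabbitMQ": "rabbitmq", "NATS": "nats"}


def quote(value: object) -> str:
    """Render a setting value: integers as-is, everything else as a quoted string."""
    if isinstance(value, int) and not isinstance(value, bool):
        return str(value)
    # Backslash-escape backslashes and single quotes inside the string literal.
    escaped = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def create_table_sql(
    table: str,
    columns: Dict[str, str],
    engine: str,
    settings: Dict[str, object],
    max_rows_per_message: int = 1,
) -> str:
    """Build a CREATE TABLE statement for a Kafka/RabbitMQ/NATS engine table."""
    if max_rows_per_message < 1:
        raise ValueError("max_rows_per_message must be >= 1")
    prefix = SETTING_PREFIX[engine]  # raises KeyError for unsupported engines
    all_settings = dict(settings)
    # Only affects row-based formats; block-based formats emit one block per message.
    all_settings[f"{prefix}_max_rows_per_message"] = max_rows_per_message
    cols = ",\n    ".join(f"{name} {ch_type}" for name, ch_type in columns.items())
    opts = ",\n    ".join(f"{key} = {quote(val)}" for key, val in all_settings.items())
    return (
        f"CREATE TABLE {table}\n(\n    {cols}\n)\n"
        f"ENGINE = {engine}\nSETTINGS\n    {opts}"
    )


if __name__ == "__main__":
    # Kafka table producing up to 100 CSVWithNames rows per message.
    print(
        create_table_sql(
            "queue",
            {"id": "UInt64", "name": "String"},
            "Kafka",
            {
                "kafka_broker_list": "localhost:9092",
                "kafka_topic_list": "events",
                "kafka_group_name": "clickhouse_group",
                "kafka_format": "CSVWithNames",
            },
            max_rows_per_message=100,
        )
    )
```

The printed statement ends with a SETTINGS clause containing kafka_max_rows_per_message = 100 and can be pasted into clickhouse-client. Building the setting name from the engine prefix keeps the same helper usable for all three engines.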