v.23.9 New Feature

Added IO Scheduling Support for Remote Disks in ClickHouse

Added IO scheduling support for remote disks. Storage configuration for disk types s3, s3_plain, hdfs and azure_blob_storage can now contain read_resource and write_resource elements holding resource names. Scheduling policies for these resources can be configured in a separate server configuration section, resources. Queries can be marked using the workload setting and classified using the server configuration section workload_classifiers to achieve diverse resource scheduling goals. More details in the docs. #47009 (Sergei Trifonov). Added the "bandwidth_limit" IO scheduling node type. It allows you to specify max_speed and max_burst constraints on traffic passing through this node. #54618 (Sergei Trifonov).
In short: ClickHouse can now schedule IO for remote disks. Disks of type s3, s3_plain, hdfs, and azure_blob_storage can name the resources that govern their reads and writes; scheduling policies for those resources and workload classifiers for queries are defined in the server configuration. A new "bandwidth_limit" IO scheduling node type constrains the traffic passing through it with max_speed and max_burst.

Why it matters

This feature addresses the need for fine-grained IO resource management and traffic control on remote storage disks, helping to avoid resource contention and ensure stable performance under concurrent workloads. By associating disk operations with configurable resources and workload classifications, users can prioritize and throttle IO bandwidth according to workload demands and server capacity.

How to use it

To use this feature, specify read_resource and write_resource names in the storage configuration of a remote disk of type s3, s3_plain, hdfs, or azure_blob_storage. Define the scheduling policy for each named resource in the resources section of the server configuration. Mark queries with the workload setting, and map each workload to a place in the scheduling policy in the workload_classifiers section. To cap bandwidth, add a node of type "bandwidth_limit" to a resource's scheduling policy and set its max_speed and max_burst parameters.
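The steps above might fit together as in the following configuration sketch. Treat it as an illustration under stated assumptions rather than a reference configuration: the disk name `s3_disk`, the endpoint, the resource names `network_read` and `network_write`, the workload names `production` and `development`, and all numeric values are placeholders. The `node path` layout and the `fair` and `fifo` node types follow the shape used in the ClickHouse workload scheduling docs as best understood here, and the units of max_speed and max_burst are assumed to be bytes per second and bytes. Check all of this against the docs for your server version.

```xml
<clickhouse>
    <!-- Step 1: attach resource names to a remote disk (placeholder names). -->
    <storage_configuration>
        <disks>
            <s3_disk>
                <type>s3</type>
                <!-- Placeholder endpoint; credentials omitted. -->
                <endpoint>https://example-bucket.s3.amazonaws.com/data/</endpoint>
                <read_resource>network_read</read_resource>
                <write_resource>network_write</write_resource>
            </s3_disk>
        </disks>
    </storage_configuration>

    <!-- Step 2: define a scheduling policy for each named resource. -->
    <resources>
        <network_read>
            <!-- Root node caps total read traffic for this resource.
                 Values are illustrative; units assumed to be bytes/second and bytes. -->
            <node path="/">
                <type>bandwidth_limit</type>
                <max_speed>10000000</max_speed>
                <max_burst>20000000</max_burst>
            </node>
            <!-- Below the cap, share traffic between two queues (assumed layout). -->
            <node path="/fair">
                <type>fair</type>
            </node>
            <node path="/fair/prod">
                <type>fifo</type>
                <weight>3</weight>
            </node>
            <node path="/fair/dev">
                <type>fifo</type>
            </node>
        </network_read>
        <network_write>
            <node path="/">
                <type>bandwidth_limit</type>
                <max_speed>5000000</max_speed>
                <max_burst>5000000</max_burst>
            </node>
            <node path="/fair">
                <type>fair</type>
            </node>
            <node path="/fair/prod">
                <type>fifo</type>
                <weight>3</weight>
            </node>
            <node path="/fair/dev">
                <type>fifo</type>
            </node>
        </network_write>
    </resources>

    <!-- Step 3: map each workload name to a node in every resource's policy. -->
    <workload_classifiers>
        <production>
            <network_read>/fair/prod</network_read>
            <network_write>/fair/prod</network_write>
        </production>
        <development>
            <network_read>/fair/dev</network_read>
            <network_write>/fair/dev</network_write>
        </development>
    </workload_classifiers>
</clickhouse>
```

With a configuration along these lines, a query would opt into a workload through the workload setting, for example by appending `SETTINGS workload = 'production'` to a SELECT statement. Placing the bandwidth_limit node at the root means the cap applies to all traffic for that resource, while the nodes beneath it decide how the capped bandwidth is divided between workloads.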