v.25.4 Improvement

Enabled backoff logic

Enabled backoff logic for all types of replicated tasks. This reduces CPU usage, memory usage, and log file sizes. Added the new settings max_postpone_time_for_failed_replicated_fetches_ms, max_postpone_time_for_failed_replicated_merges_ms, and max_postpone_time_for_failed_replicated_tasks_ms, which are similar to max_postpone_time_for_failed_mutations_ms. #74576 (MikhailBurdukov).
In short, ClickHouse now applies backoff to all types of replicated tasks: a task that fails is postponed automatically instead of being retried immediately, which lowers resource usage.

Why it matters

Without backoff, a failed replicated task is retried immediately, and repeated failures consume CPU and memory and inflate log files. Backoff delays space those retries out, which reduces system load and helps keep the cluster stable.
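
To make the effect concrete, the sketch below counts how often a task that always fails would be attempted within one minute, with and without a cap-limited backoff. It is a toy model, not ClickHouse's implementation: the doubling rule, the 100 ms starting interval, and the function name attempts_in_window are assumptions chosen for illustration.

```python
def attempts_in_window(window_ms: int, retry_interval_ms: int, max_postpone_ms: int) -> int:
    """Count attempts of a permanently failing task within window_ms.

    Assumed policy (hypothetical, not taken from ClickHouse source):
    - without backoff (max_postpone_ms == 0) the task is retried every
      retry_interval_ms;
    - with backoff the pause doubles after each failure and is capped
      at max_postpone_ms (expected to be >= retry_interval_ms).
    """
    attempts = 0
    now_ms = 0
    delay_ms = retry_interval_ms
    while now_ms < window_ms:
        attempts += 1          # the task runs and fails
        now_ms += delay_ms     # wait before the next attempt
        if max_postpone_ms > 0:
            delay_ms = min(delay_ms * 2, max_postpone_ms)
    return attempts


# One-minute window, first retry after 100 ms.
without_backoff = attempts_in_window(60_000, 100, 0)
with_backoff = attempts_in_window(60_000, 100, 10_000)
print(without_backoff, with_backoff)
```

Under these assumed numbers the capped backoff cuts the attempt count from hundreds to about a dozen, and each avoided attempt is also one fewer failure message written to the log.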

How to use it

Configure the backoff with the new settings max_postpone_time_for_failed_replicated_fetches_ms, max_postpone_time_for_failed_replicated_merges_ms, and max_postpone_time_for_failed_replicated_tasks_ms. They set the maximum postponement time, in milliseconds, for failed replicated fetches, failed replicated merges, and other failed replicated tasks, respectively. They work like the existing max_postpone_time_for_failed_mutations_ms setting, which plays the same role for mutations.
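
As a rough illustration, the helper below renders a statement that sets all three limits on one table. It assumes the settings are table-level MergeTree settings that can be changed with ALTER TABLE ... MODIFY SETTING, as is the case for max_postpone_time_for_failed_mutations_ms; verify this against the settings reference for your version. The table name db.events, the helper name modify_setting_sql, and all values are hypothetical placeholders, not recommended or default values.

```python
# Setting names come from the release note; the values are placeholders.
POSTPONE_LIMITS_MS = {
    "max_postpone_time_for_failed_replicated_fetches_ms": 60_000,
    "max_postpone_time_for_failed_replicated_merges_ms": 60_000,
    "max_postpone_time_for_failed_replicated_tasks_ms": 300_000,
}


def modify_setting_sql(table: str, settings: dict[str, int]) -> str:
    """Render an ALTER TABLE ... MODIFY SETTING statement (assumed to apply here)."""
    for name, value in settings.items():
        if value < 0:
            raise ValueError(f"{name} must be a non-negative number of milliseconds")
    assignments = ", ".join(f"{name} = {value}" for name, value in settings.items())
    return f"ALTER TABLE {table} MODIFY SETTING {assignments}"


# 'db.events' is a hypothetical replicated table.
print(modify_setting_sql("db.events", POSTPONE_LIMITS_MS))
```

Keeping the three limits in one mapping makes it easy to review them together and to apply the same values to several tables.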