v.24.11 Improvement

Better error handling and cancellation for ON CLUSTER backups and restores

Better error handling and cancellation of ON CLUSTER backups and restores:
- If a backup or restore fails on one host, it is automatically cancelled on the other hosts.
- Spurious errors are no longer produced when some hosts fail while other hosts continue their work.
- If a backup or restore is cancelled on one host, it is automatically cancelled on the other hosts.
- Fixed issues with test_disallow_concurrency; disabling concurrent backups and restores now works more reliably.
- Backups and restores are now much more resistant to ZooKeeper disconnects.
#70027 (Vitaly Baranov).
Improved error handling and automatic cancellation for ON CLUSTER backups and restores, ensuring consistent and reliable operations across multiple hosts.

Why it matters

This feature addresses problems with partial failures during ON CLUSTER backup and restore operations. If a backup or restore fails or is cancelled on one host, it will now be automatically cancelled on all other hosts, preventing inconsistent states and eliminating confusing error messages. It also enhances stability by making these operations more resilient to ZooKeeper disconnects, and improves concurrency control during backups and restores.

How to use it

Run BACKUP and RESTORE commands with the ON CLUSTER clause as usual. The improved error handling, automatic cancellation, and ZooKeeper fault tolerance are built in and require no additional configuration.
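For illustration, a minimal sketch of an ON CLUSTER backup and restore might look like the following. The cluster name my_cluster, the table db.events, the backup disk backups, and the file name events_backup.zip are hypothetical placeholders, not names from the changelog. The sketch also assumes that a disk named backups is already configured as an allowed backup destination on every host.

```sql
-- Hypothetical names: cluster 'my_cluster', table db.events, backup disk 'backups'.
-- Assumes the 'backups' disk is configured as an allowed backup destination.

-- Start a backup on every host of the cluster; ASYNC returns immediately.
BACKUP TABLE db.events ON CLUSTER 'my_cluster'
    TO Disk('backups', 'events_backup.zip')
    ASYNC;

-- Inspect the status and any error message, queried here on the initiating host.
SELECT id, name, status, error, start_time, end_time
FROM system.backups
ORDER BY start_time DESC
LIMIT 5;

-- Restore the same table across the cluster from that backup.
-- Assumes the table is absent or empty on the target hosts.
RESTORE TABLE db.events ON CLUSTER 'my_cluster'
    FROM Disk('backups', 'events_backup.zip');
```

With this change, if the operation fails or is cancelled on any host, it should end in a failed or cancelled state on the other hosts as well, rather than leaving them running.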