v.24.5 · Experimental Feature

Automatically Mark Replica as Lost and Start Recovery After DDL Failures

Automatically mark a replica of a Replicated database as lost and start recovery if some DDL task fails more than max_retries_before_automatic_recovery (100 by default) times in a row with the same error. Also fixed a bug that could cause DDL entries to be skipped when an exception is thrown during an early stage of entry execution. #63549 (Alexander Tokmakov).
In short: a replica of a Replicated database is automatically marked as lost, and recovery is started, when a DDL task fails on it more than max_retries_before_automatic_recovery times in a row (default 100) with the same error. A separate bug, in which DDL entries could be skipped if an exception was thrown early in their execution, has also been fixed.
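The triggering rule can be illustrated with a small model. This is a hypothetical sketch, not ClickHouse source code: the class `ReplicaRetryModel` and method `record_attempt` are invented for illustration, and the model assumes that a successful attempt or a different error resets the streak. It follows the wording "more than N times in a row with the same error", so the (N + 1)-th identical failure marks the replica as lost.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReplicaRetryModel:
    """Toy model of per-replica DDL retry accounting (hypothetical, not ClickHouse code)."""

    # Threshold named after the documented setting; default mirrors the documented 100.
    max_retries_before_automatic_recovery: int = 100
    consecutive_failures: int = 0
    last_error: Optional[str] = None
    marked_lost: bool = False

    def record_attempt(self, error: Optional[str]) -> bool:
        """Record one attempt at the current DDL entry.

        `error` is None on success, otherwise a string identifying the error.
        Returns True only on the attempt that causes the replica to be marked as lost.
        """
        if self.marked_lost:
            return False  # in this model the replica has already been handed to recovery
        if error is None:
            # Assumption: a successful execution ends any failure streak.
            self.consecutive_failures = 0
            self.last_error = None
            return False
        if error == self.last_error:
            self.consecutive_failures += 1
        else:
            # Assumption: a different error starts a new streak.
            self.last_error = error
            self.consecutive_failures = 1
        if self.consecutive_failures > self.max_retries_before_automatic_recovery:
            # "More than N times in a row": the (N + 1)-th identical failure trips it.
            self.marked_lost = True
            return True
        return False


if __name__ == "__main__":
    replica = ReplicaRetryModel(max_retries_before_automatic_recovery=3)
    for attempt in range(1, 5):
        if replica.record_attempt("E1"):
            print(f"marked as lost after attempt {attempt}")  # prints "marked as lost after attempt 4"
```

With a threshold of 3, three identical failures are tolerated and the fourth triggers recovery; alternating between two different errors never builds a long enough streak in this model.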

Why it matters

This feature improves the reliability of Replicated databases by detecting replicas that are stuck on a persistently failing DDL entry. Instead of retrying the same entry indefinitely, the replica is marked as lost and recovered, bringing it back in line with the database metadata shared by the other replicas without manual intervention. The bug fix ensures that DDL entries are no longer skipped when an exception is thrown early in their execution, which could otherwise leave a replica's schema out of sync with the rest of the cluster.

How to use it

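The threshold is controlled by the max_retries_before_automatic_recovery setting (default 100): once a DDL entry fails on a replica more than that many times in a row with the same error, the replica is marked as lost and recovery starts. Because the setting has a default value, this behavior is active without any configuration; change the value only if recovery should trigger sooner or later than after 100 identical consecutive failures. No manual steps are needed once the threshold is reached.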
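As an illustration of where the threshold might be set, the helper below renders a `CREATE DATABASE` statement for a Replicated database. It assumes that max_retries_before_automatic_recovery is passed as a Replicated database engine setting in the `SETTINGS` clause; check the documentation for your ClickHouse version before relying on this. The helper `replicated_database_ddl` is hypothetical, and the database name, Keeper path, shard and replica names are placeholders.

```python
def replicated_database_ddl(
    name: str,
    zoo_path: str,
    shard: str,
    replica: str,
    max_retries_before_automatic_recovery: int = 100,
) -> str:
    """Render a CREATE DATABASE statement for a Replicated database (hypothetical helper).

    Assumption: the retry threshold is accepted as an engine setting in SETTINGS.
    """
    if max_retries_before_automatic_recovery < 0:
        # The threshold is a count of consecutive failures, so negative values make no sense.
        raise ValueError("max_retries_before_automatic_recovery must be non-negative")
    return (
        f"CREATE DATABASE {name} "
        f"ENGINE = Replicated('{zoo_path}', '{shard}', '{replica}') "
        f"SETTINGS max_retries_before_automatic_recovery = {max_retries_before_automatic_recovery}"
    )


if __name__ == "__main__":
    # Lower threshold: recover after more than 20 identical consecutive failures.
    print(replicated_database_ddl("analytics", "/clickhouse/databases/analytics", "shard1", "replica1", 20))
```

Running it prints `CREATE DATABASE analytics ENGINE = Replicated('/clickhouse/databases/analytics', 'shard1', 'replica1') SETTINGS max_retries_before_automatic_recovery = 20`. A lower value recovers stuck replicas faster, at the cost of triggering recovery for errors that might have cleared on their own after a few more retries.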