v.25.7 Improvement

When distributed_ddl_output_mode='*_only_active', don't wait

When distributed_ddl_output_mode='*_only_active', don't wait for new or recovered replicas whose replication lag is greater than max_replication_lag_to_enqueue. This should help avoid the "DDL task is not finished on some hosts" error when a new replica becomes active after finishing initialization or recovery but has accumulated a huge replication log while initializing. Also, implement the SYSTEM SYNC DATABASE REPLICA STRICT query, which waits for the replication log to fall below max_replication_lag_to_enqueue. #83302 (Alexander Tokmakov).
When distributed_ddl_output_mode is set to a value ending in '_only_active', distributed DDL queries no longer wait for new or recovered replicas whose replication lag exceeds the max_replication_lag_to_enqueue threshold. The new SYSTEM SYNC DATABASE REPLICA STRICT query explicitly waits until the replication lag is below this threshold.

Why it matters

This change addresses cases where a DDL task does not finish on some hosts because a replica that has just completed initialization or recovery still has a large replication log to work through. With an '_only_active' output mode, distributed DDL no longer waits for such replicas, so they do not hold up DDL execution. The strict sync query lets users explicitly wait until the replication lag falls below the threshold, which gives more control over replication lag during DDL operations.

How to use it

Set the distributed_ddl_output_mode setting to a value ending in _only_active, and make sure max_replication_lag_to_enqueue is set to the replication lag threshold you consider acceptable. To explicitly wait until the replication lag falls below that threshold, use the following query:

SYSTEM SYNC DATABASE REPLICA STRICT <database_name>
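
The steps above can be combined in one session, as in the minimal sketch below. It makes several assumptions that the entry does not state: 'throw_only_active' is used as one example of a value ending in _only_active, and replicated_db and events are hypothetical names for a database that uses the Replicated engine and a table inside it. The STRICT query is written exactly as shown above. How and where max_replication_lag_to_enqueue is configured is not covered here.

```sql
-- Assumption: 'throw_only_active' is one of the distributed_ddl_output_mode
-- values that end in _only_active.
SET distributed_ddl_output_mode = 'throw_only_active';

-- Hypothetical DDL against a hypothetical Replicated database. With the mode
-- above, the query should not wait for new or recovered replicas whose
-- replication lag exceeds max_replication_lag_to_enqueue.
CREATE TABLE replicated_db.events (id UInt64) ENGINE = MergeTree ORDER BY id;

-- On a replica that has just been added or recovered: wait until its
-- replication log is below max_replication_lag_to_enqueue.
SYSTEM SYNC DATABASE REPLICA STRICT replicated_db;
```

Running the STRICT query on a freshly recovered replica before relying on it is one way to confirm that it has caught up far enough for later DDL queries to wait for it again.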