v.25.9 Improvement

Add a new startup_scripts_failure_reason dimensional metric

This metric is needed to distinguish between the different error types that cause startup scripts to fail. In particular, for alerting purposes, we need to distinguish between transient errors (e.g., MEMORY_LIMIT_EXCEEDED or KEEPER_EXCEPTION) and non-transient errors. #86202 (Miсhael Stetsyuk).
Added a new dimensional metric startup_scripts_failure_reason to ClickHouse for identifying different error types causing startup script failures.

Why it matters

The metric makes it possible to tell transient startup script errors (such as MEMORY_LIMIT_EXCEEDED or KEEPER_EXCEPTION) apart from non-transient ones, which makes alerting more accurate and failures easier to diagnose.

How to use it

Use the startup_scripts_failure_reason metric in monitoring and alerting rules. Filter on its failure-reason dimension to respond differently to specific types of startup script failure, for example by alerting only on non-transient errors.
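The transient versus non-transient split can be sketched on the alerting side as follows. This is a minimal illustration under stated assumptions, not ClickHouse's own logic: the (reason, count) sample shape, the helper names classify_failures and should_page, and the choice of which error names count as transient are all hypothetical. Only MEMORY_LIMIT_EXCEEDED and KEEPER_EXCEPTION are named as transient in the changelog entry.

```python
from typing import Iterable

# Assumption: the set of error names treated as transient. The changelog
# names only these two as examples; extend the set for your deployment.
TRANSIENT_REASONS = frozenset({"MEMORY_LIMIT_EXCEEDED", "KEEPER_EXCEPTION"})


def classify_failures(samples: Iterable[tuple[str, int]]) -> dict[str, int]:
    """Sum failure counts into 'transient' and 'non_transient' buckets.

    Each sample is a (reason, count) pair, where `reason` is the value of the
    metric's failure-reason dimension. This shape is hypothetical and is not
    ClickHouse's export format.
    """
    totals = {"transient": 0, "non_transient": 0}
    for reason, count in samples:
        bucket = "transient" if reason in TRANSIENT_REASONS else "non_transient"
        totals[bucket] += count
    return totals


def should_page(totals: dict[str, int]) -> bool:
    """Page only when at least one non-transient failure was recorded."""
    return totals["non_transient"] > 0
```

With samples already read from the monitoring system, should_page(classify_failures(samples)) stays False while only transient reasons are present and becomes True once any other reason appears.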