v.25.8 Performance Improvement

Remove zero byte

Published: November 25, 2025

Remove zero byte. Closes #85062. A few minor bugs were also fixed:

Functions structureToProtobufSchema and structureToCapnProtoSchema did not correctly write a zero-terminating byte and used a newline in its place. This led to a missing newline in the output and could lead to buffer overflows in other functions that depend on the zero byte (such as logTrace, demangle, extractURLParameter, toStringCutToZero, and encrypt/decrypt).

The regexp_tree dictionary layout did not support processing strings containing zero bytes.

The formatRowNoNewline function, when called with the Values format or with any other format that does not end rows with a newline, erroneously cut the last character of the output.

Function stem contained an exception-safety error that could lead to a memory leak in a very rare scenario.

The initcap function worked incorrectly for FixedString arguments: it did not recognize the start of a word at the start of the string if the previous string in the block ended with a word character.

Fixed a security vulnerability in the Apache ORC format that could lead to the exposure of uninitialized memory.

Changed the behavior of the function replaceRegexpAll and its alias REGEXP_REPLACE: it can now do an empty match at the end of the string even if the previous match processed the whole string, as in the case of ^a*|a*$ or ^|.*. This corresponds to the semantics of JavaScript, Perl, Python, PHP, and Ruby, but differs from the semantics of PostgreSQL.

The implementation of many functions has been simplified and optimized. Documentation for several functions was wrong and has been fixed.

Keep in mind that the output of byteSize for String columns, and for complex types consisting of String columns, has changed (from 9 bytes per empty string to 8 bytes per empty string); this is expected.

#85063 (Alexey Milovidov).
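To make the zero-terminator problem in structureToProtobufSchema and structureToCapnProtoSchema more concrete, here is a minimal sketch. It assumes, hypothetically, that values sit back to back in one byte buffer and that a consumer finds the end of a value by scanning for the first zero byte, in the spirit of toStringCutToZero. The helper `cut_to_zero` and the sample buffers are illustrative only, not ClickHouse code. When a value ends with a newline instead of a zero byte, the scan runs into the neighboring value; in C++ this kind of over-read is what the entry refers to as a buffer overflow.

```python
def cut_to_zero(buf: bytes, start: int) -> bytes:
    """Hypothetical helper: return bytes from `start` up to (not including) the first zero byte.

    Mirrors a C-style strlen scan. If no zero byte follows `start`, the scan
    walks off the end of the buffer, which Python reports as IndexError.
    """
    end = start
    while buf[end] != 0:  # raises IndexError if no terminator is found
        end += 1
    return buf[start:end]


# Illustrative data: two values stored contiguously, each followed by a zero byte.
good = b"schema A\x00schema B\x00"
# Same values, but the first one ends with a newline instead of a zero byte.
bad = b"schema A\nschema B\x00"

print(cut_to_zero(good, 0))  # b'schema A'
print(cut_to_zero(bad, 0))   # b'schema A\nschema B' (reads into the next value)
```

If the unterminated value happens to be the last one in the buffer, the scan has nothing to stop it and runs past the end (an IndexError in this sketch).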
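One plausible way to end up with the formatRowNoNewline bug described above is to drop the last byte unconditionally, on the assumption that every row ends with a newline. The sketch below contrasts that with stripping only a trailing newline. Both helper names and the sample rows are hypothetical and do not reproduce ClickHouse's implementation.

```python
def strip_last_char_buggy(row: str) -> str:
    """Hypothetical buggy variant: assumes the row always ends with a newline."""
    return row[:-1]


def strip_trailing_newline(row: str) -> str:
    """Hypothetical fixed variant: remove the final character only if it is a newline."""
    return row[:-1] if row.endswith("\n") else row


tsv_row = "1\thello\n"      # a format that terminates rows with a newline
values_row = "(1,'hello')"  # Values-style output with no trailing newline

print(strip_last_char_buggy(values_row))      # (1,'hello'   <- last character lost
print(strip_trailing_newline(values_row))     # (1,'hello')
print(repr(strip_trailing_newline(tsv_row)))  # '1\thello'
```

For newline-terminated formats both variants agree, which is how such a bug can go unnoticed until a format like Values is used.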
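The initcap issue with FixedString arguments can be reproduced in a small model. The sketch assumes, hypothetically, that FixedString values are zero-padded to a fixed width and stored contiguously, and that a word starts at an alphanumeric byte whose predecessor is not alphanumeric (ASCII only here). If the "previous byte was a word character" state is carried across row boundaries, a row that directly follows a full-width row ending in a letter loses its capital. The function `initcap_fixed` and its `reset_per_row` flag are illustrative names, not ClickHouse's code.

```python
def initcap_fixed(rows: list[bytes], width: int, reset_per_row: bool) -> list[bytes]:
    """Hypothetical ASCII-only initcap over a FixedString-like contiguous buffer.

    Each row is zero-padded to `width` bytes (rows are assumed to be at most
    `width` bytes long) and the rows are concatenated. With reset_per_row=False
    the word state leaks across row boundaries, reproducing the described bug.
    """
    buf = b"".join(r.ljust(width, b"\x00") for r in rows)
    out = bytearray()
    prev_is_word = False
    for i in range(len(buf)):
        if reset_per_row and i % width == 0:
            prev_is_word = False  # each row is a separate string
        c = buf[i:i + 1]
        is_word = c.isalnum()  # bytes.isalnum() only considers ASCII
        if is_word:
            out += c.lower() if prev_is_word else c.upper()
        else:
            out += c  # zero padding and punctuation are copied unchanged
        prev_is_word = is_word
    return [bytes(out[i:i + width]) for i in range(0, len(out), width)]


rows = [b"abc", b"def"]  # the first row fills its width and ends in a word character
print(initcap_fixed(rows, 3, reset_per_row=False))  # [b'Abc', b'def']  (bug)
print(initcap_fixed(rows, 3, reset_per_row=True))   # [b'Abc', b'Def']
```

A shorter row is padded with zero bytes, which are not word characters, so in this model the bug only shows up after a row that fills its whole width, matching the condition stated in the entry.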
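Because the entry says the new replaceRegexpAll semantics correspond to Python's, Python's `re.sub` (3.7 or later) can illustrate the empty match at the end of the string for the pattern ^a*|a*$. For comparison, `replace_all_stop_at_end` is a hypothetical emulation of the earlier rule as the entry describes it: once a match has consumed the rest of the string, no further empty match is attempted. It is written only for illustration and is not ClickHouse's implementation; the exact output of replaceRegexpAll is not asserted here.

```python
import re


def replace_all_stop_at_end(pattern: str, repl: str, s: str) -> str:
    """Hypothetical emulation of the earlier rule: after a match that reaches
    the end of the string, do not look for a further (empty) match there."""
    rx = re.compile(pattern)
    out, pos = [], 0
    while True:
        m = rx.search(s, pos)
        if m is None:
            out.append(s[pos:])  # copy the unmatched tail
            return "".join(out)
        out.append(s[pos:m.start()])
        out.append(repl)
        if m.end() == len(s):
            return "".join(out)  # the match consumed the rest of the string: stop
        if m.end() == m.start():
            out.append(s[m.end()])  # empty match: copy one char to make progress
            pos = m.end() + 1
        else:
            pos = m.end()


s = "aaa"
print(re.sub(r"^a*|a*$", "X", s))                   # XX (whole string, then an empty match at the end)
print(replace_all_stop_at_end(r"^a*|a*$", "X", s))  # X
```

With ^a*|a*$, the first alternative consumes the whole string. Python then still tries the position at the end, where a*$ matches the empty string, so two replacements happen. The emulated earlier rule stops after the first one.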
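The byteSize change can be checked with simple arithmetic. This relies on an assumption that the entry does not state: a String column stores one 8-byte offset per row plus its character data, and before this change each value also carried one trailing zero byte (consistent with the title "Remove zero byte"). The hypothetical helper `string_column_byte_size` reproduces the quoted figures of 9 and 8 bytes per empty string. It is a size model, not ClickHouse's byteSize implementation.

```python
OFFSET_BYTES = 8  # assumed: one 64-bit offset per row


def string_column_byte_size(values: list[bytes], zero_terminated: bool) -> int:
    """Hypothetical size model: character bytes (+1 terminator per value if enabled) + offsets."""
    terminator = 1 if zero_terminated else 0
    chars = sum(len(v) + terminator for v in values)
    return chars + OFFSET_BYTES * len(values)


empties = [b""] * 1000
print(string_column_byte_size(empties, zero_terminated=True) / len(empties))   # 9.0
print(string_column_byte_size(empties, zero_terminated=False) / len(empties))  # 8.0
```

Under this model, non-empty strings also shrink by exactly one byte per row.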