v.22.5 Improvement

Nullables Detection in Protobuf and ClickHouse Integration Proposal

Nullables detection in protobuf. In proto3, default values are not sent on the wire. This makes it non-trivial to distinguish between null and default values for Nullable columns. A standard way to deal with this problem is to use Google wrappers to nest the target value within an inner message (see https://github.com/protocolbuffers/protobuf/blob/master/src/google/protobuf/wrappers.proto). In this case, a missing field is interpreted as a null value, a field with a missing value is interpreted as the default value, and a field with a regular value is interpreted as that regular value. However, ClickHouse interprets Google wrappers as nested columns. We propose to introduce special behaviour to detect Google wrappers and interpret them as described above. For example, to serialize values for a Nullable column test, we would use google.protobuf.StringValue test in our .proto schema. Note that these types are so-called "well-known types" in Protobuf, implemented in the library itself. #35149 (Jakub Kuklis).
Adds support for detecting Google protobuf wrapper types to correctly interpret Nullable columns when using proto3, distinguishing between null and default values.

Why it matters

In proto3, default values are omitted when serialized, making it difficult to differentiate between a null value and a default value for Nullable columns in ClickHouse. This feature addresses that by recognizing Google's wrapper types (well-known protobuf types) that explicitly encapsulate nullability. It allows ClickHouse to properly interpret missing fields as null, fields with missing inner values as default, and regular fields as actual values, improving data correctness and usability when importing protobuf data.
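The three cases can be made concrete at the wire level. The following is a minimal sketch, not ClickHouse's implementation: it hand-decodes a hypothetical message Row { google.protobuf.StringValue test = 1; } and assumes single-byte lengths (under 128 bytes) and no other fields. The function name read_test_column is invented for illustration.

```python
from typing import Optional

# Tag byte for field number 1 with wire type 2 (length-delimited): (1 << 3) | 2.
# Both the wrapper field in the hypothetical Row message and
# StringValue.value use field number 1, so the tag is the same at both levels.
FIELD_1_LEN_DELIMITED = 0x0A


def read_test_column(row: bytes) -> Optional[str]:
    """Decode the `test` column from one serialized Row message.

    Assumption: lengths fit in one byte and the message holds no other fields.
    """
    if not row:
        # The wrapper field is absent from the message: NULL.
        return None
    if row[0] != FIELD_1_LEN_DELIMITED:
        raise ValueError("unexpected outer tag")
    inner = row[2:2 + row[1]]
    if not inner:
        # The wrapper is present but its value was omitted: default value.
        return ""
    if inner[0] != FIELD_1_LEN_DELIMITED:
        raise ValueError("unexpected inner tag")
    # The wrapper carries an explicit value.
    return inner[2:2 + inner[1]].decode("utf-8")


null_case = read_test_column(b"")
default_case = read_test_column(b"\x0a\x00")
value_case = read_test_column(b"\x0a\x07\x0a\x05hello")
```

Without the wrapper, the first two inputs would be indistinguishable, because a plain proto3 string field set to "" is simply not written.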

How to use it

Define Nullable columns in your ClickHouse table schema as usual. In your .proto schema, import google/protobuf/wrappers.proto and declare each such column with the corresponding Google wrapper type (e.g., google.protobuf.StringValue test for a Nullable String column named test). ClickHouse then detects these wrappers and handles them as described above when processing protobuf data.
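As a rough illustration of what a producer would send for such a column, the sketch below hand-encodes three rows (null, default, and a regular value). The schema in the comment, the message name Row, and the helper names are hypothetical. It assumes payloads shorter than 128 bytes and field numbers below 16, and it frames each message with a length prefix, as ClickHouse's Protobuf format expects length-delimited messages. In real code, a generated protobuf class would do this work.

```python
from typing import Optional

# Hypothetical schema, shown here only as a comment:
#
#   syntax = "proto3";
#   import "google/protobuf/wrappers.proto";
#   message Row {
#     google.protobuf.StringValue test = 1;   // Nullable(String) column `test`
#   }


def length_delimited(field_number: int, payload: bytes) -> bytes:
    """Encode one length-delimited field (wire type 2).

    Assumption: field_number < 16 and len(payload) < 128, so the tag and
    the length each fit in a single byte.
    """
    return bytes([(field_number << 3) | 2, len(payload)]) + payload


def encode_row(test: Optional[str]) -> bytes:
    """Serialize one Row message for the hypothetical schema above."""
    if test is None:
        return b""  # omit the wrapper entirely: NULL
    # proto3 omits default values, so an empty string yields an empty wrapper.
    inner = length_delimited(1, test.encode("utf-8")) if test else b""
    return length_delimited(1, inner)


def frame(row: bytes) -> bytes:
    """Prefix a message with its length (single-byte varint in this sketch)."""
    return bytes([len(row)]) + row


payload = b"".join(frame(encode_row(v)) for v in (None, "", "hello"))
```

Depending on the ClickHouse version, wrapper handling may be controlled by format settings such as input_format_protobuf_flatten_google_wrappers, so it is worth checking the settings reference for your release before relying on the behaviour.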