v.23.4 New Feature

Add extractKeyValuePairs function for noisy string parsing

Add the extractKeyValuePairs function to extract key-value pairs from strings. Input strings may contain noise (for example, log lines) and do not need to be fully formatted as key-value pairs; the algorithm looks for key-value pairs that match the arguments passed to the function. The function currently accepts the following arguments: data_column (mandatory), key_value_pair_delimiter (defaults to :), pair_delimiters (defaults to space, comma, and semicolon), and quoting_character (defaults to the double quote). #43606 (Arthur Passos).
Introduces the extractKeyValuePairs function to extract key-value pairs from strings, even in noisy or partially formatted text such as log files.

Why it matters

This feature addresses the need to parse and extract structured key-value pairs from unstructured or semi-structured string data, helping users analyze logs or similar data where perfect formatting cannot be guaranteed. It enhances data extraction flexibility by allowing custom delimiters and quoting characters.

How to use it

Call extractKeyValuePairs with the string column as data_column. Optionally, pass key_value_pair_delimiter (default :), pair_delimiters (default: space, comma, and semicolon), and quoting_character (default: double quote) to match your data format. Example usage, with the pair delimiters narrowed to comma and semicolon:
```sql
SELECT extractKeyValuePairs(log_column, ':', ',;', '"') FROM table
```
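To make the role of the three optional arguments more concrete, the following Python sketch approximates the kind of scan the description implies. It is not ClickHouse's implementation: the function name `extract_key_value_pairs_sketch` is hypothetical, and the way it treats noise, empty values, and quotes is an assumption made for illustration only.

```python
def extract_key_value_pairs_sketch(
    text,
    key_value_pair_delimiter=":",
    pair_delimiters=" ,;",
    quoting_character='"',
):
    """Illustrative approximation only; NOT the ClickHouse algorithm."""
    # Step 1: split the input on any pair delimiter, but keep delimiters
    # that appear inside a quoted section.
    chunks, current, in_quotes = [], [], False
    for ch in text:
        if ch == quoting_character:
            in_quotes = not in_quotes
            current.append(ch)
        elif ch in pair_delimiters and not in_quotes:
            chunks.append("".join(current))
            current = []
        else:
            current.append(ch)
    chunks.append("".join(current))

    # Step 2: keep only the chunks that look like key<delimiter>value.
    result = {}
    for chunk in chunks:
        key, sep, value = chunk.partition(key_value_pair_delimiter)
        # Assumption: a chunk with no delimiter or an empty key is noise.
        if not sep or not key:
            continue
        # Assumption: surrounding quote characters are dropped from the output.
        result[key.strip(quoting_character)] = value.strip(quoting_character)
    return result


# Hypothetical log line: the two leading tokens carry no delimiter and are skipped.
line = 'ERROR 2023 user:alice msg:"disk full" code:507'
pairs = extract_key_value_pairs_sketch(line)
```

In this sketch, tokens without a key-value delimiter are silently dropped, which mirrors the idea that the input does not have to be fully formatted. Real inputs with delimiter characters inside unquoted values (for example, timestamps such as 12:30:01) would need more careful handling than this sketch provides.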