v.23.9Improvement

LIKE and Regular Expressions Now Support Non-UTF-8 Patterns with Binary Matching

LIKE, match, and other regular expressions matching functions now allow matching with patterns containing non-UTF-8 substrings by falling back to binary matching. Example: you can use string LIKE '\xFE\xFF%' to detect BOM. This closes #54486. #54942 (Alexey Milovidov).
Enhanced pattern matching for non-UTF-8 substrings in LIKE, match, and other regex functions.

Why it matters

This feature addresses the limitation of matching patterns containing non-UTF-8 binary substrings by enabling fallback to binary matching. It allows users to perform pattern matching on binary data (e.g., detecting BOM) using standard SQL pattern matching functions, improving usability and functionality with non-textual data.

How to use it

Use standard pattern matching functions such as LIKE or match with patterns containing binary sequences. For example, to detect a BOM, you can run:

SELECT * FROM table WHERE string LIKE '\xFE\xFF%'


The system automatically falls back to binary matching when non-UTF-8 substrings are detected in the pattern.