v.19.8 New Features

Added toValidUTF8 Function, Which Replaces All Invalid UTF-8 Characters with the Replacement Character � (U+FFFD)

Added the toValidUTF8 function, which replaces all invalid UTF-8 characters with the replacement character � (U+FFFD). #5322 (Danila Kutenin)

Why it matters

Input data often contains byte sequences that are not valid UTF-8. toValidUTF8 sanitizes such strings so that the result is always valid UTF-8, which helps preserve data integrity and avoids encoding-related errors during processing and storage.

How to use it

Call toValidUTF8 in a query on any string expression that may contain invalid UTF-8 sequences. For example:

SELECT toValidUTF8(column_name) FROM table_name;
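
To see the effect on a concrete value, a string literal containing deliberately invalid bytes can be passed to the function directly. The sketch below assumes \xHH byte escapes in string literals; the alias name sanitized is only illustrative. The ClickHouse documentation shows this input returning a�b, with the run of invalid bytes collapsed into a single replacement character. The entry above does not state the collapsing behavior, so it is assumed here to apply to 19.8 as well.

```sql
-- 'a', then an invalid 4-byte sequence (F0 80 80 80), then 'b'.
-- The invalid bytes are replaced by U+FFFD; valid characters are kept as-is.
SELECT toValidUTF8('\x61\xF0\x80\x80\x80b') AS sanitized;
```

Running a literal like this first is a cheap way to confirm how your server version handles invalid bytes before applying the function to a whole column.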