v.23.9New Feature
Improve JSON Schema Inference and Type Handling in ClickHouse
Improve schema inference from JSON formats: 1) Now it's possible to infer named Tuples from JSON objects without experimental JSON type under a settinginput_format_json_try_infer_named_tuples_from_objectsin JSON formats. Previously without experimental type JSON we could only infer JSON objects as Strings or Maps, now we can infer named Tuple. Resulting Tuple type will conain all keys of objects that were read in data sample during schema inference. It can be useful for reading structured JSON data without sparse objects. The setting is enabled by default. 2) Allow parsing JSON array into a column with type String under settinginput_format_json_read_arrays_as_strings. It can help reading arrays with values with different types. 3) Allow to use type String for JSON keys with unkown types (null/[]/{}) in sample data under settinginput_format_json_infer_incomplete_types_as_strings. Now in JSON formats we can read any value into String column and we can avoid getting errorCannot determine type for column 'column_name' by first 25000 rows of data, most likely this column contains only Nulls or empty Arrays/Mapsduring schema inference by using type String for unknown types, so the data will be read successfully. #54427 (Kruglov Pavel).
Why it matters
This feature improves the schema inference process for JSON formats by enabling more accurate and flexible type detection. It allows named Tuples to be inferred from JSON objects without relying on experimental JSON types, helps with reading JSON arrays as strings to accommodate heterogeneous array values, and permits treating unknown or incomplete JSON types as strings. These enhancements reduce schema inference errors and improve compatibility with diverse JSON data structures.How to use it
The feature is controlled via three new settings in JSON input formats:- Enable named Tuple inference from JSON objects with
input_format_json_try_infer_named_tuples_from_objects (enabled by default).- Read JSON arrays as strings by enabling
input_format_json_read_arrays_as_strings.- Treat incomplete or unknown JSON types as strings using
input_format_json_infer_incomplete_types_as_strings.Users can configure these settings when reading JSON data to improve schema inference and avoid errors related to null or heterogeneous JSON values.