v.24.4 Improvement

Inconsistent Input Field Resizing When Reading Hive Text Files

When reading data from a Hive text file, ClickHouse used the first line of the file to determine the number of input fields. The field count of the first line does not always match the Hive table definition. For example, if the table is defined with 3 columns, such as test_tbl(a Int32, b Int32, c Int32), but the first line of the text file has only 2 fields, the input fields were resized to 2. If the next line then has 3 fields, the third field could not be read and was set to the default value 0 instead, which is incorrect. #62086 (KevinyhZou).
Fixes incorrect field count detection when reading Hive text files by ensuring the number of input fields is consistent with the Hive table definition rather than relying on the first line of the file.

Why it matters

Previously, when reading Hive text files, the number of input fields was determined by the first line of the file. If the first line had fewer fields than the Hive table definition, later lines with more fields were truncated to that count, and the unread columns were filled with default values, so the data returned was inaccurate. This fix aligns the input field count with the table schema, preventing data loss or misinterpretation.
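
The difference between the two behaviors can be illustrated with a small Python model. This is only a sketch of the logic described above, not ClickHouse source code: the function `read_rows`, its parameters, and the comma delimiter are all hypothetical and chosen for readability.

```python
# Hypothetical model of the behavior described above; not ClickHouse code.
DEFAULT = 0  # default value used for an Int32 column that was not read


def read_rows(lines, schema_width, use_first_line_width, delimiter=","):
    """Parse delimited text lines into rows of `schema_width` integers.

    use_first_line_width=True  models the old behavior (field count taken
                               from the first line of the file).
    use_first_line_width=False models the fixed behavior (field count taken
                               from the table definition).
    """
    rows = []
    width = None
    for line in lines:
        fields = [int(f) for f in line.split(delimiter)]
        if width is None:
            # Decide once how many input fields will be read per line.
            width = len(fields) if use_first_line_width else schema_width
        # Fields beyond `width` are never read.
        fields = fields[:width]
        # Columns that were not read get the default value.
        fields += [DEFAULT] * (schema_width - len(fields))
        rows.append(tuple(fields))
    return rows


# test_tbl(a Int32, b Int32, c Int32): first line has 2 fields, second has 3.
lines = ["1,2", "3,4,5"]
old = read_rows(lines, schema_width=3, use_first_line_width=True)
new = read_rows(lines, schema_width=3, use_first_line_width=False)
print(old)  # [(1, 2, 0), (3, 4, 0)]
print(new)  # [(1, 2, 0), (3, 4, 5)]
```

In the old model the value 5 on the second line is lost because the width was fixed at 2 by the first line. With the width taken from the schema, only the genuinely missing field on the first line falls back to the default.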

How to use it

This fix is applied automatically when reading Hive text files using standard Hive integration in ClickHouse. Users do not need to enable any special settings; the number of input fields will now consistently match the Hive table definition.