v.25.5 New Feature

New functions sparseGrams, sparseGramsHashes, sparseGramsHashesUTF8, sparseGramsUTF8

New functions sparseGrams, sparseGramsHashes, sparseGramsHashesUTF8, and sparseGramsUTF8 for calculating "sparse n-grams", a robust algorithm for extracting substrings for indexing and search. #79517 (scanhex12).

Why it matters

Sparse n-grams are a robust way to extract substrings from text for indexing and search. The extracted substrings, or their hashes, can serve as keys when building a search index over text data, which can make text searches faster and more relevant.

How to use it

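Call sparseGrams, sparseGramsHashes, sparseGramsUTF8, or sparseGramsHashesUTF8 in a SQL query to generate sparse n-grams, or their hashes, from a string or a text column. sparseGrams and sparseGramsUTF8 return the extracted substrings; the Hashes variants return hashes of those substrings instead. The UTF8 variants are intended for UTF-8 encoded text. The functions can be used directly in SELECT statements wherever substring extraction or text indexing is needed.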
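The queries below are a minimal sketch of how these calls might look, not an authoritative reference. The single-argument form, the literal inputs, and the table `docs` with its columns `id` and `body` are assumptions made for illustration; check the function reference for the exact signatures and optional parameters. No expected output is shown because the selected substrings depend on the hashing used internally.

```sql
-- Sparse n-grams of a string literal, returned as substrings.
SELECT sparseGrams('alice') AS grams;

-- Hashes of the sparse n-grams instead of the substrings themselves.
SELECT sparseGramsHashes('alice') AS gram_hashes;

-- UTF-8 variants, assumed here to be the right choice for multi-byte text.
SELECT sparseGramsUTF8('алиса') AS grams_utf8;
SELECT sparseGramsHashesUTF8('алиса') AS gram_hashes_utf8;

-- Applied to a text column; `docs`, `id`, and `body` are hypothetical names.
SELECT id, sparseGramsHashes(body) AS gram_hashes
FROM docs
LIMIT 10;
```

The hash variants can be the more convenient choice when the result only feeds an index, since hashes have a fixed size and the substrings themselves are not needed.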