v.21.9 New Feature

Added NLP Functions for Tokenization and Synonyms Search

Added natural language processing (NLP) functions to ClickHouse for tokenization, stemming, lemmatization, and lookup in synonym extensions. #24997 (Nikolay Degterinsky).

Why it matters

This feature brings text normalization into ClickHouse itself. Tokenization splits text into individual terms, while stemming and lemmatization reduce those terms to a common base form, so that variants of the same word can be matched and aggregated together. Synonym extensions let a query look up words related to a search term, which broadens what a text search can match. Together, these functions reduce the need to preprocess text in an external NLP pipeline before loading or querying it.

How to use it

The new tokenization, stemming, and lemmatization functions can be called directly in SQL queries, and synonym-based searches go through the lookup provided by a synonyms extension. For specific usage examples, see the updated ClickHouse documentation and pull request #24997.
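
As a rough illustration of how these calls might look in a query, here is a minimal sketch. It assumes the function names from the ClickHouse documentation for the experimental NLP functions (`stem`, `lemmatize`, `synonyms`) and uses `splitByWhitespace` as the tokenizer. The entry above names no functions, so treat all of these as assumptions and check the documentation for your server version. The extension name `'list'` is a placeholder for whatever synonyms extension is defined in the server configuration.

```sql
-- Assumption: the NLP functions are experimental and must be enabled per session.
SET allow_experimental_nlp_functions = 1;

-- Tokenize a phrase on whitespace, then stem each token with the English stemmer.
SELECT arrayMap(token -> stem('en', token),
                splitByWhitespace('blessing in disguise')) AS stems;

-- Lemmatize a single word.
-- Assumption: an English lemmatizer dictionary is configured on the server.
SELECT lemmatize('en', 'wolves') AS lemma;

-- Look up synonyms of a word.
-- Assumption: 'list' is a placeholder name of a configured synonyms extension.
SELECT synonyms('list', 'important') AS related_words;
```

Stemming works without extra server configuration, whereas the lemmatizer and synonym lookups depend on dictionaries and extensions that an administrator has to set up first, so the last two queries will fail on a default installation.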