Added natural language processing (NLP) functions to ClickHouse for tokenization, stemming, lemmatization, and search in synonyms extensions.
Why it matters
These functions bring text processing into ClickHouse itself. Tokenization splits text into individual words. Stemming and lemmatization reduce inflected forms of a word to a common base, so that variants such as "park", "parked" and "parking" can be matched or aggregated together. Synonym lookup broadens a search to related terms, which makes searches more flexible. As a result, less textual data has to be preprocessed with external NLP tools, and full-text search within ClickHouse becomes more capable.
How to use it
The new functions are called directly in SQL queries, like any other ClickHouse function. The ClickHouse documentation lists them as experimental functions (for example stem, lemmatize and synonyms) that must be enabled with the allow_experimental_nlp_functions setting. Lemmatization and synonym search also rely on lemmatizer dictionaries and synonyms extensions configured on the server. Usage examples are given in the updated ClickHouse documentation and in the pull request.
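The following toy Python sketch is not ClickHouse code and does not reproduce its algorithms. It only illustrates, under simplifying assumptions, why normalizing text before matching makes search more forgiving. The suffix-stripping naive_stem function and the SYNONYM_GROUPS table are hypothetical stand-ins for a real stemmer and a configured synonyms extension.

```python
import re

# Hypothetical synonym groups; a stand-in for a synonyms extension configured on a server.
# Each group holds base (already stemmed) forms.
SYNONYM_GROUPS = [
    {"car", "automobile", "vehicle"},
    {"fast", "quick", "rapid"},
]


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it on every run of non-alphanumeric characters."""
    return [tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok]


def naive_stem(token: str) -> str:
    """Strip one common English suffix (a crude stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "s"):
        # Only strip when at least three characters remain, to avoid mangling short words.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def synonyms(term: str) -> set[str]:
    """Return the term together with every other member of its synonym group."""
    for group in SYNONYM_GROUPS:
        if term in group:
            return set(group)
    return {term}


def matches(query: str, document: str) -> bool:
    """True if every stemmed query word, or one of its synonyms, occurs among the document's stems."""
    doc_terms = {naive_stem(tok) for tok in tokenize(document)}
    for word in tokenize(query):
        if not synonyms(naive_stem(word)) & doc_terms:
            return False
    return True


if __name__ == "__main__":
    doc = "Quick automobiles, parked outside."
    print(matches("fast cars", doc))   # True: "fast" ~ "quick", "cars" -> "car" ~ "automobile"
    print(matches("slow cars", doc))   # False: nothing matches "slow"
    print(matches("parking", doc))     # True: "parking" and "parked" both reduce to "park"
```

Plain substring search would find none of these query words in the document. Lemmatization would replace naive_stem with a dictionary lookup that maps each word to its base form (for example "wolves" to "wolf"). This is more accurate than suffix stripping, but it needs a language-specific dictionary, which is why lemmatization in ClickHouse depends on server-side configuration.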