v.21.3 New Feature

Add extractTextFromHTML Function

Add function extractTextFromHTML #19600 (zlx19950903), (alexey-milovidov).
Added a new function extractTextFromHTML to ClickHouse for extracting plain text content from HTML strings.

Why it matters

This feature lets users extract readable text from HTML-formatted data stored in ClickHouse. Instead of parsing HTML outside the database, users can call a native function that strips the markup and returns only the textual content. That text can then be queried and analyzed like any other string.

How to use it

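Call extractTextFromHTML with an HTML string, or with a String column that holds HTML, as its single argument. It returns the extracted plain text as a String. For example, SELECT extractTextFromHTML(html_column) FROM table returns the text content of each html_column value. Note that the function does not decode HTML entities such as &amp;, so they remain in the output as written.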
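The query below is a minimal sketch of how the function might be used end to end. The table pages and its columns url and html are hypothetical and exist only for illustration; the sample rows are made up. The exact whitespace in the returned text depends on the function's own rules, so no output is shown here.

```sql
-- Hypothetical table used only for this illustration.
CREATE TABLE pages
(
    url  String,
    html String
)
ENGINE = MergeTree
ORDER BY url;

-- Made-up sample rows containing HTML markup.
INSERT INTO pages VALUES
    ('https://example.com/a', '<html><body><h1>Title</h1><p>Some <b>bold</b> text.</p></body></html>'),
    ('https://example.com/b', '<p>Hello <i>world</i></p>');

-- Return the plain text of every page, with the markup stripped.
SELECT
    url,
    extractTextFromHTML(html) AS text
FROM pages;

-- Search the extracted text rather than the raw HTML.
SELECT url
FROM pages
WHERE extractTextFromHTML(html) ILIKE '%bold%';
```

Filtering on the extracted text avoids false matches on tag names or attribute values that happen to contain the search term.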