v.21.7Improvement

Bugfixes and Improvements for ClickHouse Copier: Schema Compatibility, TTL Support, Alias Handling, and Progress Tracking

Bugfixes and improvements of clickhouse-copier. Allow to copy tables with different (but compatible schemas). Closes #9159. Added test to copy ReplacingMergeTree. Closes #22711. Support TTL on columns and Data Skipping Indices. It simply removes it to create internal Distributed table (underlying table will have TTL and skipping indices). Closes #19384. Allow to copy MATERIALIZED and ALIAS columns. There are some cases in which it could be helpful (e.g. if this column is in PRIMARY KEY). Now it could be allowed by setting allow_to_copy_alias_and_materialized_columns property to true in task configuration. Closes #9177. Closes [#11007] (https://github.com/ClickHouse/ClickHouse/issues/11007). Closes #9514. Added a property allow_to_drop_target_partitions in task configuration to drop partition in original table before moving helping tables. Closes #20957. Get rid of OPTIMIZE DEDUPLICATE query. This hack was needed, because ALTER TABLE MOVE PARTITION was retried many times and plain MergeTree tables don't have deduplication. Closes #17966. Write progress to ZooKeeper node on path task_path + /status in JSON format. Closes #20955. Support for ReplicatedTables without arguments. Closes #24834 .#23518 (Nikita Mikhaylov).
clickhouse-copier has been enhanced to support copying tables with different but compatible schemas, including handling MATERIALIZED and ALIAS columns, TTL columns, and Data Skipping Indices. Additionally, new configuration properties improve partition management and progress tracking during copy operations.

Why it matters

This feature addresses limitations in clickhouse-copier by enabling it to copy tables even when schemas differ slightly but remain compatible. It simplifies copying of tables with TTL and skipping indices by removing them in the internal Distributed table. Supporting MATERIALIZED and ALIAS columns allows copying tables where such columns are part of keys or important expressions. The addition of configuration options such as allow_to_copy_alias_and_materialized_columns and allow_to_drop_target_partitions provides users greater control over the copy process. Progress tracking via ZooKeeper improves monitoring and reliability, while eliminating unnecessary queries like OPTIMIZE DEDUPLICATE enhances performance.

How to use it

To leverage these new capabilities, users can set the following parameters in their clickhouse-copier task configuration:

- Enable copying of MATERIALIZED and ALIAS columns by setting allow_to_copy_alias_and_materialized_columns to true.
- Allow dropping target partitions before moving partitions by setting allow_to_drop_target_partitions to true.

The copier will automatically handle tables with different compatible schemas, skip TTL and Data Skipping Indices when creating the internal Distributed table, and write progress status in JSON format to ZooKeeper under task_path + "/status".

These options provide flexible and safer copying of complex table structures without manual schema synchronization.