v.21.7 Improvement
Bugfixes and Improvements for ClickHouse Copier: Schema Compatibility, TTL Support, Alias Handling, and Progress Tracking
Bugfixes and improvements of clickhouse-copier. Allow copying tables with different (but compatible) schemas. Closes #9159. Added a test for copying ReplacingMergeTree. Closes #22711. Support TTL on columns and Data Skipping Indices: the copier simply removes them when creating the internal Distributed table (the underlying table will still have the TTL and skipping indices). Closes #19384. Allow copying MATERIALIZED and ALIAS columns. There are some cases in which this can be helpful (e.g. if such a column is in the PRIMARY KEY). It can now be enabled by setting the allow_to_copy_alias_and_materialized_columns property to true in the task configuration. Closes #9177. Closes [#11007](https://github.com/ClickHouse/ClickHouse/issues/11007). Closes #9514. Added the allow_to_drop_target_partitions property to the task configuration to drop the partition in the original table before moving helping tables. Closes #20957. Got rid of the OPTIMIZE DEDUPLICATE query. This hack was needed because ALTER TABLE MOVE PARTITION was retried many times and plain MergeTree tables don't have deduplication. Closes #17966. Write progress to the ZooKeeper node at path task_path + /status in JSON format. Closes #20955. Support for ReplicatedTables without arguments. Closes #24834. #23518 (Nikita Mikhaylov).
Why it matters
These changes address several limitations in clickhouse-copier. It can now copy tables whose schemas differ but remain compatible. Tables with column TTL and Data Skipping Indices can be copied because the copier removes those clauses only from the internal Distributed table, while the underlying table keeps them. Support for MATERIALIZED and ALIAS columns allows copying tables where such columns are part of a key (for example, the PRIMARY KEY). The new configuration options allow_to_copy_alias_and_materialized_columns and allow_to_drop_target_partitions give users more control over the copy process. Progress written to ZooKeeper makes a running copy easier to monitor, and removing the OPTIMIZE DEDUPLICATE workaround eliminates an extra query from the copy process.
How to use it
To leverage these new capabilities, users can set the following parameters in their clickhouse-copier task configuration:
- Enable copying of MATERIALIZED and ALIAS columns by setting allow_to_copy_alias_and_materialized_columns to true.
- Allow dropping target partitions before moving partitions by setting allow_to_drop_target_partitions to true.
The copier will automatically handle tables with different compatible schemas, skip TTL and Data Skipping Indices when creating the internal Distributed table, and write progress status in JSON format to ZooKeeper under task_path + "/status".
These options provide flexible and safer copying of complex table structures without manual schema synchronization.
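As a rough sketch, the two properties might appear in a task file as shown below. The changelog only says they belong "in task configuration"; this example assumes they are per-table settings inside a table section and that the file uses the `<yandex>` root element common in configurations of that era, so check both against the documentation for your version. The section name `table_events`, the cluster, database, and table names, the engine clause, and the sharding key are all hypothetical placeholders.

```xml
<yandex>
    <!-- Cluster definitions (remote_servers) and other top-level task
         settings are omitted from this sketch. -->
    <tables>
        <!-- "table_events" is a hypothetical section name. -->
        <table_events>
            <!-- Hypothetical source and destination; replace with your own. -->
            <cluster_pull>source_cluster</cluster_pull>
            <database_pull>db</database_pull>
            <table_pull>events</table_pull>

            <cluster_push>destination_cluster</cluster_push>
            <database_push>db</database_push>
            <table_push>events</table_push>

            <!-- Assumed placement: also copy MATERIALIZED and ALIAS columns. -->
            <allow_to_copy_alias_and_materialized_columns>true</allow_to_copy_alias_and_materialized_columns>

            <!-- Assumed placement: allow dropping partitions in the target table. -->
            <allow_to_drop_target_partitions>true</allow_to_drop_target_partitions>

            <!-- Hypothetical engine clause for the destination table. -->
            <engine>ENGINE = MergeTree() PARTITION BY toYYYYMM(event_date) ORDER BY (event_date, id)</engine>
            <sharding_key>rand()</sharding_key>
        </table_events>
    </tables>
</yandex>
```

Enable allow_to_drop_target_partitions only when it is acceptable to lose data already present in the target partitions.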