v.20.3 New Feature

Allow Specifying --limit Exceeding Source Data Size in ClickHouse Obfuscator

Allow specifying a --limit larger than the source data size in clickhouse-obfuscator. The data will repeat itself with a different random seed. #9155 (alexey-milovidov)
Enhanced clickhouse-obfuscator to allow specifying --limit values greater than the source data size, enabling data repetition with varied random seeds.

Why it matters

This feature removes the cap that the source data size placed on the output of clickhouse-obfuscator. Users can now generate datasets larger than the original by repeating the source data with different random seeds, which is valuable for testing, benchmarking, or anonymization scenarios that need more data volume than the source provides.

How to use it

When running clickhouse-obfuscator, set --limit (the number of rows to generate) to a value larger than the number of rows in the source data. The tool automatically repeats the source data, applying a different random seed on each repetition so that the obfuscated output varies. Example:

clickhouse-obfuscator --limit=1000000 <other_parameters>
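
For a fuller picture, the following is a minimal sketch of one way to request more rows than the source holds. The file names, the three sample rows, and the two-column structure (Domain, Hits) are hypothetical placeholders, not taken from the changelog. The --seed, --structure, --input-format, and --output-format options stand in for <other_parameters> and should be adjusted to your data. Input is redirected from a regular file here on the assumption that the tool reads its input more than once, which a pipe would not allow.

```shell
#!/bin/sh
# Sketch only: sample data, file names, and column structure are hypothetical.
workdir=$(mktemp -d)
printf 'example.com\t10\nexample.org\t20\nexample.net\t30\n' > "$workdir/source.tsv"

# Count the source rows, then ask for three times as many.
source_rows=$(wc -l < "$workdir/source.tsv" | tr -d ' ')
limit=$((source_rows * 3))

if command -v clickhouse-obfuscator >/dev/null 2>&1; then
    # --limit exceeds the source size, so the source data is repeated.
    clickhouse-obfuscator \
        --seed "$(head -c16 /dev/urandom | base64)" \
        --input-format TSV --output-format TSV \
        --structure 'Domain String, Hits UInt32' \
        --limit "$limit" \
        < "$workdir/source.tsv" > "$workdir/obfuscated.tsv"
    # Report how many rows were actually written.
    wc -l < "$workdir/obfuscated.tsv"
else
    echo "clickhouse-obfuscator not found; would request $limit rows from $source_rows source rows"
fi
```

Deriving the limit from the row count, rather than hard-coding it, makes it obvious how many times the source is expected to repeat.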