v.25.9Performance Improvement

Radix sort: help the compiler use SIMD

Radix sort: help the compiler use SIMD and do better prefetching. Uses dynamic dispatch to use software prefetching with Intel CPUs only. Continues the work by @taiyang-li in https://github.com/ClickHouse/ClickHouse/pull/77029. #86378 (Raúl Marín).