v.21.11 Performance Improvements

Remove Branchy Code in Filter Operation for Improved Performance

Replace the branchy code in the filter operation with an implementation based on popcnt/ctz, which performs better. #29881 (Jun Jin).
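
To illustrate the general technique, here is a minimal sketch. It is not the code from #29881. It assumes a byte-per-row filter and GCC/Clang builtins (`__builtin_popcountll`, `__builtin_ctzll`). The names `bytesToMask64` and `filterWithBitScan` are hypothetical. The sketch packs 64 filter bytes into one bitmask, uses popcnt to size the output once per block, and uses ctz to jump directly to each selected row instead of testing every row with an `if`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: pack 64 filter bytes into a 64-bit mask.
// Bit i is set when bytes[i] is non-zero. A production implementation
// would likely build this mask with SIMD; a scalar loop keeps the sketch portable.
inline std::uint64_t bytesToMask64(const std::uint8_t * bytes)
{
    std::uint64_t mask = 0;
    for (std::size_t i = 0; i < 64; ++i)
        mask |= static_cast<std::uint64_t>(bytes[i] != 0) << i;
    return mask;
}

// Hypothetical sketch: copy data[i] to the result for every i with filter[i] != 0.
template <typename T>
std::vector<T> filterWithBitScan(const std::vector<T> & data, const std::vector<std::uint8_t> & filter)
{
    assert(data.size() == filter.size());

    std::vector<T> result;
    const std::size_t size = data.size();
    std::size_t pos = 0;

    // Main loop: handle 64 rows per iteration.
    for (; pos + 64 <= size; pos += 64)
    {
        std::uint64_t mask = bytesToMask64(filter.data() + pos);

        // popcnt gives the number of selected rows, so the output grows once per block.
        const std::size_t selected = static_cast<std::size_t>(__builtin_popcountll(mask));
        const std::size_t old_size = result.size();
        result.resize(old_size + selected);
        T * out = result.data() + old_size;

        // ctz finds the next selected row; the loop runs once per set bit,
        // not once per row, so there is no per-row data-dependent branch.
        while (mask)
        {
            const int index = __builtin_ctzll(mask);
            *out++ = data[pos + index];
            mask &= mask - 1; // clear the lowest set bit
        }
    }

    // Tail: fewer than 64 rows remain, so fall back to the simple per-row check.
    for (; pos < size; ++pos)
        if (filter[pos])
            result.push_back(data[pos]);

    return result;
}
```

Calling `filterWithBitScan(data, filter)` with equally sized vectors returns the selected rows in their original order. The benefit of this shape is largest when the filter is sparse or unpredictable, because the inner loop's trip count depends on the number of set bits rather than on a hard-to-predict branch for each row.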