v.20.5 Performance Improvement

Add Runtime CPU Detection and Multi-Target Code Generation

Add runtime CPU detection to select and dispatch to the best available function implementation. Add support for code generation for multiple targets. This closes #1017. #10058 (DimasKovas).
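The general technique can be illustrated with a minimal sketch. This is not the code from the change itself: the names `sumDefault`, `sumAVX2`, `selectSum`, `sum` and the `SKETCH_X86_DISPATCH` macro are hypothetical, and the example assumes GCC or Clang, whose `target` function attribute and `__builtin_cpu_supports` builtin are used for per-function code generation and CPU detection on x86. On other compilers or architectures the sketch falls back to the baseline implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: per-function targets and CPU builtins are only used on x86 with GCC/Clang.
#if (defined(__x86_64__) || defined(__i386__)) && (defined(__GNUC__) || defined(__clang__))
#define SKETCH_X86_DISPATCH 1
#else
#define SKETCH_X86_DISPATCH 0
#endif

// Baseline implementation, compiled for the default target.
static std::uint64_t sumDefault(const std::uint32_t * data, std::size_t size)
{
    std::uint64_t result = 0;
    for (std::size_t i = 0; i < size; ++i)
        result += data[i];
    return result;
}

#if SKETCH_X86_DISPATCH
// Same source code, but the compiler is allowed to emit AVX2 instructions
// for this one function (multi-target code generation).
__attribute__((target("avx2")))
static std::uint64_t sumAVX2(const std::uint32_t * data, std::size_t size)
{
    std::uint64_t result = 0;
    for (std::size_t i = 0; i < size; ++i)
        result += data[i];
    return result;
}
#endif

using SumFunc = std::uint64_t (*)(const std::uint32_t *, std::size_t);

// Runtime CPU detection: pick the best implementation the current CPU supports.
static SumFunc selectSum()
{
#if SKETCH_X86_DISPATCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return sumAVX2;
#endif
    return sumDefault;
}

// Public entry point: the selection runs once, later calls go through the cached pointer.
std::uint64_t sum(const std::vector<std::uint32_t> & values)
{
    static const SumFunc impl = selectSum();
    return impl(values.data(), values.size());
}
```

Callers only use `sum(values)`; which variant runs depends on the machine executing the binary, so a single build can serve both older and newer CPUs.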