This requires GLIBC 2.23+, plus either gcc 6+ or clang 14+.

- Provides build-time feature detection
- Use with (un)premultiply for a ~10% performance gain on AVX CPUs
- Slightly increases binary size, so it is best used sparingly
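The notes above don't name the underlying mechanism. Assuming they describe GCC/Clang `target_clones` function multiversioning (that attribute first appeared in gcc 6 and clang 14, and on ELF targets it relies on glibc's ifunc support), a minimal sketch of applying it to premultiply/unpremultiply loops might look like this. The macro `MULTIVERSION` and the functions `premultiply_rgba` and `unpremultiply_rgba` are hypothetical names chosen for illustration, not part of any real API, and the interleaved RGBA float layout is likewise an assumption.

```c
#include <assert.h>  /* libc header: on glibc it also defines __GLIBC__ */
#include <stddef.h>

/* Build-time detection (hypothetical macro MULTIVERSION): request an AVX
 * clone plus a baseline clone only on x86-64 glibc targets whose compiler
 * understands target_clones; otherwise expand to nothing. */
#if defined(__x86_64__) && defined(__GLIBC__) && defined(__has_attribute)
#  if __has_attribute(target_clones)
#    define MULTIVERSION __attribute__((target_clones("avx", "default")))
#  endif
#endif
#ifndef MULTIVERSION
#  define MULTIVERSION
#endif

/* Multiply R, G and B by alpha in an interleaved RGBA float buffer of
 * n pixels. With target_clones the compiler emits an AVX version and a
 * default version of this loop, and an ifunc resolver selects one when
 * the symbol is bound at load time. */
MULTIVERSION
void premultiply_rgba(float *pixels, size_t n)
{
    assert(pixels != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        float *p = pixels + 4 * i;
        float a = p[3];
        p[0] *= a;
        p[1] *= a;
        p[2] *= a;
    }
}

/* Inverse operation. Pixels with zero alpha are left unchanged to avoid
 * dividing by zero. */
MULTIVERSION
void unpremultiply_rgba(float *pixels, size_t n)
{
    assert(pixels != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        float *p = pixels + 4 * i;
        float a = p[3];
        if (a != 0.0f) {
            p[0] /= a;
            p[1] /= a;
            p[2] /= a;
        }
    }
}
```

The `#if` guard is the build-time part of this sketch: on compilers or C libraries without support, the attribute drops out and the functions compile as ordinary code. Each clone is a full extra copy of the function body, so applying the attribute broadly grows the binary, which is consistent with the advice to use it sparingly.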