I look forward to seeing some benchmarks with .NET - Microsoft needs to support a pretty wide variety of platforms. It will be interesting to see if their implementation is better!
As of now, Vector<T> automatically targets AVX2, SSE4.2, and AdvSimd (NEON), depending on what the hardware supports. Vector256<T> targets AVX2 only (ARM has no 256-bit counterpart), while Vector128<T> targets SSE on x86 and AdvSimd on ARM.
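To make the "automatically targets" part concrete, here is a minimal sketch of a sum written against Vector<T>. The class and method names (VectorSumSketch, Sum, Describe) are made up for illustration, and it assumes a runtime where the Vector<T>(ReadOnlySpan<T>) constructor is available (.NET Core 2.1 or later). The same code should run with 8 float lanes where AVX2 is used and 4 lanes on SSE or AdvSimd, without any platform-specific branches.

```csharp
using System;
using System.Numerics;

// Hypothetical helper for illustration; not part of any library.
public static class VectorSumSketch
{
    // Reports what the JIT picked for the current machine.
    public static string Describe() =>
        $"accelerated={Vector.IsHardwareAccelerated}, float lanes={Vector<float>.Count}";

    // Sums the input using Vector<float>; the lane count is chosen at run time.
    public static float Sum(ReadOnlySpan<float> values)
    {
        int width = Vector<float>.Count;
        Vector<float> acc = Vector<float>.Zero;

        int i = 0;
        // Process full vectors while at least 'width' elements remain.
        for (; i <= values.Length - width; i += width)
        {
            acc += new Vector<float>(values.Slice(i, width));
        }

        // Horizontal add of the accumulator lanes (dot product with all ones).
        float total = Vector.Dot(acc, Vector<float>.One);

        // Scalar tail for the leftover elements.
        for (; i < values.Length; i++)
        {
            total += values[i];
        }

        return total;
    }
}
```

Calling VectorSumSketch.Describe() on a few machines is a quick way to see which width was selected. For code that needs a fixed width, Vector128<T> or Vector256<T> would be the alternative, at the cost of checking IsSupported / IsHardwareAccelerated yourself.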