I'd rather see new languages focus on better explicit SIMD abstractions à la Intel's ISPC than on yet another magic vectorizer that only works in trivial cases.
Then it's just a codegen problem.
But yes, ultimately, the user needs to be aware of how the language works, what is parallelizable and what isn't, and of the cost of the operations that they ask their computer to execute.
https://learn.microsoft.com/en-us/dotnet/api/system.runtime....
Examples of usage:
- https://github.com/U8String/U8String/blob/main/Sources/U8Str...
- https://github.com/nietras/1brc.cs/blob/main/src/Brc/BrcAccu...
- https://github.com/dotnet/runtime/blob/main/src/libraries/Sy...
(and many more if you search GitHub for uses of Vector128/256<byte> and the like!)
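
To make "explicit SIMD" a bit more concrete, here's a rough sketch in C using the GCC/Clang vector extensions. It is not ISPC and not the .NET API linked above, just the same idea in miniature: the vector width is spelled out in the type, so nothing depends on an auto-vectorizer noticing the loop. The names `u8x16` and `count_byte` are made up for illustration, and `vector_size` is a compiler extension rather than standard C.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 16 lanes of uint8_t. GCC/Clang extension, not standard C. */
typedef uint8_t u8x16 __attribute__((vector_size(16)));

/* Counts how many bytes in buf[0..len) are equal to needle. */
size_t count_byte(const uint8_t *buf, size_t len, uint8_t needle)
{
    size_t total = 0;
    size_t i = 0;

    /* Broadcast the needle into every lane. */
    u8x16 pattern;
    memset(&pattern, needle, sizeof pattern);

    /* Main loop: one explicit 16-byte vector per iteration. */
    for (; i + 16 <= len; i += 16) {
        u8x16 chunk;
        memcpy(&chunk, buf + i, sizeof chunk); /* unaligned load */

        /* Lane-wise compare: a lane is 0xFF on match, 0x00 otherwise. */
        u8x16 eq = (u8x16)(chunk == pattern);

        /* Horizontal reduction, kept scalar here for clarity. */
        for (int lane = 0; lane < 16; lane++)
            total += eq[lane] & 1;
    }

    /* Scalar tail for the last len % 16 bytes. */
    for (; i < len; i++)
        total += (buf[i] == needle);

    return total;
}
```

The compare and the broadcast are the parts you'd otherwise hope the vectorizer figures out; here they're written down, and what's left for the compiler is picking instructions for the target. The per-lane reduction is the naive bit, and a real implementation would likely accumulate in vector registers and reduce once at the end.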