Unless, of course, we're talking about the 80-bit format.
If that's not the case, I'd be interested to know where they differ.
Unfortunately, for the transcendental functions the required accuracy still hasn't been pinned down, in part because finding the worst cases is still an ongoing research problem.
There have been some great strides in figuring out the worst cases for binary floating point up to doubles, so hopefully an upcoming standard will stipulate 0.5 ULP (i.e. correct rounding) for transcendentals. Decimal floating point, though, still has a long way to go.
Every 754 architecture (including SSE) I've worked on has a correctly rounded sqrt(), as IEEE 754 requires.
I'm assuming you're talking about builds with "fast math" enabled? In which case all bets are off anyway!
Now, there are also often approximate rsqrt and approximate reciprocal instructions, with varying degrees of accuracy, and those can be "fun."
Or maybe the library you use...
FMAs were difficult. The Visual Studio compiler in particular didn't support deliberate FMAs in SSE code, so you had to rely on the compiler recognising multiply-adds and contracting them itself. Generally I want FMAs because they're more accurate, but I want to control where they go.