You need to attest at least the kernel, firmware, graphics/input drivers, window management system etc because otherwise actions you think are being taken by the user might be issued by malware. You have to know that the app's onPayClicked() event handler is running because the human owner genuinely clicked it (or an app they authorized to automate for them like an a11y app). To get that assurance requires the OS to enforce app communication and isolation via secure boundaries.
Imagine if this was done for desktop computers before we had smartphones. That's just crazy.
Relying on hardware-bound keys is fine, but then the scope of the hardware and software stack that needs to be locked down should be severely limited to dedicated, external hardware tokens. Having to lock down the whole OS and service stack is just bad design, plain and simple, since it prioritizes control over freedom.