Similar, I think, to what you're calling 'rlhf-ed', though I think that's useful for code: it definitely seems to scratchpad itself, stubbing out how it intends to solve a problem before filling in the implementation. Where this becomes really useful, though, is when you ask for a small change: it doesn't (it seems) recompute the whole thing, but just 'knows' to change one function from what it already has.
They also seem to have it somehow set up to 'test' itself, and occasionally it just says 'error' and tries again. I don't really understand how that works.
Perplexity's great for finding information with citations, but (I've only used the free version) IME it's 'just' a better search engine (for difficult-to-find information, that is; obviously it's slower). It suffers a lot more from the 'the information needs to be already written somewhere, it's not new knowledge' dismissal.