I think that illustrates it will be a big uphill battle for any new entrant, no matter how well funded or resourced.
Wrong. Claude 2 beats GPT-4 in some benchmarks (e.g. HumanEval Python coding, math, analytical writing). It's close enough. It doesn't matter who holds the crown this week; Anthropic definitely has the ingredients to make a GPT-4-class model.
This is like comparing similar cars from BMW and Toyota, finding a few specific parameters where BMW has a higher score and saying "You see? Toyota engineering is nowhere close".
This actually shows Sam Altman's true contribution: the free version of ChatGPT is undeniably worse than Bing Chat, and yet ChatGPT is a bigger brand.
(And it might be a deliberate choice to save money for Claude 3 instead of making Claude 2 absolutely SotA.)
@dang, any plans to do anything here?
I mean not like you have to but yeah I can think of some stuff that could make this better probably (or at minimum experiments that could be run)
Also not on this post but in general I mean
I mean not on this post in particular but as an HN issue, if we agree it’s kind of degrading the experience and there are indeed likely fixes.