I think it's always difficult to gauge how meaningful these tests actually are. If the S&P 500 went up 12% over that period, mainly due to tech stocks, picking a handful of tech stocks is always going to put you ahead of the S&P. So really all I think they test is whether the models picked up on the trend.
I'm more surprised that Gemini managed to lose 10%. I wish they had actually mentioned what the models invested in and why.