>>dr_dsh+(OP)
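Bigger model = better because a lot of the performance on this task comes down to memorization, or to the “lottery ticket hypothesis” (a bigger network has more chances of containing a subnetwork that happens to train well).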
An impressive advance would be a small model that can work from an external memory rather than memorizing the data itself.