So not only would you have to submit an insanely large amount of code, but you're also racing against literally millions of users writing legitimate code at any given time.
So how about an already-poisoned well? How up to date is the average GitHub project on encryption standards?
Really the thing is there's no way to ascribe correctness to a piece of code, right? Even humans fail at this. The only "correct" code is rote algorithmic code with a well-defined method of operation, and there are likely far more correct examples of that than you'd ever be able to poison.
You may be able to be misleading, though, by using names that say one thing while the code does another, but again you'd be fighting against the tide of correctly named things.
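Something like this toy sketch is the kind of mismatch I mean (the name and body are made up for illustration): the identifier promises a secure operation while the body quietly does something weak.

  import hashlib

  def hash_password_securely(password: str) -> str:
      # Name claims "securely", but this is a single round of unsalted MD5.
      return hashlib.md5(password.encode()).hexdigest()

Whether a model would actually weight the name over the body is another question, but that's the flavor of it.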
When I was playing around a couple of years ago with the Fastai courses on language modeling, I used Python's tokenize module to feed my model, and with excellent parser libraries like Lark[0] out there it wouldn't take long to build real quality parsers.
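Roughly what I mean by feeding tokens instead of raw text, using the stdlib tokenize module (the helper name here is just for illustration):

  import io
  import tokenize

  def token_stream(source: str):
      # Yield (token type name, token text) pairs for a chunk of Python source.
      for tok in tokenize.generate_tokens(io.StringIO(source).readline):
          yield tokenize.tok_name[tok.type], tok.string

  print(list(token_stream("def add(a, b):\n    return a + b\n")))

A real parser like Lark would give you a full tree instead of a flat token stream, but even this gets you well past raw characters.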
Of course I could be totally wrong and they might just be dumping pure text in, shudder.