zlacker

1. gwern+(OP) 2020-05-13 22:34:13
One fun part - we used the inline metadata trick to train a single GPT-2-1.5b to do all the different subreddits. It allows mutual transfer learning and saves an enormous amount of space & complexity compared to training separate models, and it's easy to add in any new subreddits one might want (just define a new keyword prefix and train some more). Not sure that trick is meaningful for Markov chains at all!
replies(1): >>jesseh+fG
2. jesseh+fG 2020-05-14 04:17:53
>>gwern+(OP)
What is the inline metadata trick?
replies(1): >>gwern+lL2
3. gwern+lL2 2020-05-14 18:14:01
>>jesseh+fG
It's an old trick in generative models; I've been using it since 2015: https://www.gwern.net/RNN-metadata

When you have categorical or other metadata, instead of trying to find some way to hardwire it into the NN with a special one-hot vector or something, you simply inline it into the dataset itself as a text prefix, and then let the model figure it out. If the model is at all good, like a char-RNN, it'll learn what the metadata is and how to use it. So you get a very easy, generic approach to encoding any metadata, which lets you extend it indefinitely without retraining from scratch (even reusing models not trained with it in the first place, like OA's GPT-2-1.5b), while still controlling generation. With GPT-2 in particular, you see this used in (among others) Grover and CTRL, in addition to my own poetry/music/SubSim models.
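For concreteness, here is a minimal Python sketch of that kind of formatting, with a made-up prefix layout and subreddit names purely for illustration (the actual SubSim training format may differ):

    # Inline-metadata trick: the category is just text prepended to each
    # training example, so a single model learns every subreddit at once.
    # The prefix layout and subreddit names below are illustrative assumptions.

    def format_example(subreddit: str, title: str, body: str) -> str:
        # The metadata prefix is ordinary text; the model learns to condition on it.
        return f"SUBREDDIT: r/{subreddit}\nTITLE: {title}\nBODY: {body}\n<|endoftext|>\n"

    # Training corpus: just concatenate formatted examples from all subreddits.
    corpus = "".join(
        format_example(sub, title, body)
        for sub, title, body in [
            ("askscience", "Why is the sky blue?", "Rayleigh scattering ..."),
            ("writingprompts", "[WP] The last library on Earth", "It rained the day ..."),
        ]
    )

    # Adding a new subreddit = appending newly-prefixed text and fine-tuning a bit more;
    # nothing about the model architecture changes.

    # At sampling time, you control generation by prompting with the same prefix:
    prompt = "SUBREDDIT: r/askscience\nTITLE:"
    # e.g. with Hugging Face transformers (optional):
    # from transformers import pipeline
    # generator = pipeline("text-generation", model="gpt2")
    # print(generator(prompt, max_length=100)[0]["generated_text"])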