>>Josely+(OP)
Haha, of course this news comes just after I wrote a parser for my ChatGPT dump and generated offline embeddings for it with Phi 2 to help generate conversation metadata.
>>singul+Yx
I'll share the core bit, since the right format took a while to figure out. My main script is a hot mess that uses embeddings with SentenceTransformer, so I won't share that yet. Similar timing story: last night I opened a PR for llama-cpp-python that shows how Phi might be used with JSON only, only for the author to write almost exactly the same code at pretty much the same time. https://github.com/abetlen/llama-cpp-python/pull/1184
But you can see how that might work.
Here is the core parser code:
https://gist.github.com/lukestanley/eb1037478b1129a5ca0560ee...
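For anyone who can't open the gist, here is a rough sketch of the general idea. It is not the gist's code. It assumes the export's `conversations.json` is a list of conversations, each with a `mapping` of node id to node (with `message` and `parent` keys) and a `current_node` id pointing at the last message of the active branch; that layout is an assumption and may differ between export versions. `extract_messages` and `load_conversations` are names made up for this example.

```python
import json


def extract_messages(conversation):
    """Return [(role, text), ...] for the active branch, oldest first.

    Assumed layout (not guaranteed): conversation["mapping"] maps node ids
    to nodes, and conversation["current_node"] is the id of the last node.
    """
    mapping = conversation.get("mapping", {})
    node_id = conversation.get("current_node")
    messages = []
    # Walk backwards from the last node to the root via "parent" links.
    while node_id is not None:
        node = mapping.get(node_id)
        if node is None:
            break
        message = node.get("message")
        if message:
            content = message.get("content") or {}
            parts = content.get("parts") or []
            # Keep only plain-text parts; skip images and other payloads.
            text = "\n".join(p for p in parts if isinstance(p, str)).strip()
            if text:
                role = (message.get("author") or {}).get("role", "unknown")
                messages.append((role, text))
        node_id = node.get("parent")
    messages.reverse()
    return messages


def load_conversations(path):
    """Load the exported JSON file (assumed to be a list of conversations)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Something like `for c in load_conversations("conversations.json"): print(c.get("title"), extract_messages(c))` would then give you text you can feed to whatever embedding model you like. Walking back from `current_node` means regenerated or edited branches get dropped, which is probably what you want for metadata.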