zlacker

1. mark_l (OP) 2024-02-13 19:08:41
I have thought of implementing something like you are describing using local LLMs. Chunk the text of all conversations, store the chunks in an embeddings database for search, and for each new conversation compute an embedding of the new prompt and prepend the most similar chunks from previous conversations as context. This would be maybe 100 lines of Python, if that. Really, it's a RAG application that stores previous conversations as chunks.
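A minimal sketch of that flow. All names here are hypothetical, and the `embed()` function is a toy word-count vector standing in for a real embedding model (a real system would call a local embedding model and use a proper vector store); the retrieval logic is the same shape either way.

```python
import math
import re
from collections import Counter

def chunk(text, size=200):
    """Split conversation text into fixed-size character chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(text):
    """Toy embedding: a word-count vector. Stand-in for a real model."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class ConversationStore:
    """Keeps (chunk, embedding) pairs from past conversations."""

    def __init__(self):
        self.chunks = []

    def add_conversation(self, text):
        for c in chunk(text):
            self.chunks.append((c, embed(c)))

    def context_for(self, prompt, k=2):
        """Return the k stored chunks most similar to the new prompt."""
        q = embed(prompt)
        ranked = sorted(self.chunks, key=lambda p: cosine(q, p[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

store = ConversationStore()
store.add_conversation("We discussed Python packaging and virtual environments.")
store.add_conversation("The recipe calls for two eggs and a cup of flour.")
context = store.context_for("How do I set up a virtual environment in Python?")
# Prepend the retrieved context to the new prompt before sending to the LLM.
augmented_prompt = "\n".join(context) + "\n\nHow do I set up a virtual environment?"
```

Swapping the toy `embed()` for a real model and the in-memory list for an embeddings database is the bulk of the remaining work.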