In fact I don’t think we would need processors anymore if we were centrally storing all of the operations ever done in our processors.
Now fast retrieval is another problem for another thread.
Community-scale caching? That's basically what pre-compiled software distributions are. And one idea for addressing the programming language design balk "that would be a nice feature, but it's not known how to compile it efficiently, so you can't have it", is highly-parallel cloud compilation, paired with a community-scale compiler cache. You might not mind if something takes say a day to resolve, if the community only needs it run once per release.
Using an LLM and caching e.g. FAQs can save a lot of token credits.
AI is basically solving a search problem and the models are just approximations of the data - like linear regression or Fourier transforms.
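For what it's worth, a rough sketch of what FAQ caching could look like. `ask_model` is a hypothetical stand-in for whatever call actually costs tokens, and the normalization rule is just an assumption about which questions should count as "the same":

```python
class FaqCache:
    """Memoize answers so repeated questions don't hit the model again."""

    def __init__(self, ask_model):
        self.ask_model = ask_model  # hypothetical callable: question -> answer
        self.answers = {}
        self.model_calls = 0

    @staticmethod
    def normalize(question: str) -> str:
        # Assumed rule: ignore case, extra whitespace and trailing punctuation.
        return " ".join(question.lower().split()).rstrip("?!. ")

    def ask(self, question: str) -> str:
        key = self.normalize(question)
        if key not in self.answers:
            self.model_calls += 1
            self.answers[key] = self.ask_model(question)
        return self.answers[key]


def fake_model(question: str) -> str:
    # Placeholder for a real (billed) LLM call.
    return f"answer to: {question}"


cache = FaqCache(fake_model)
first = cache.ask("How do I reset my password?")
second = cache.ask("  how do I RESET my password ")
print(cache.model_calls, first == second)
```

Exact-match keys only catch near-identical wording; anything fuzzier needs some similarity measure, which this doesn't attempt.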
The training is basically your precalculation. The key is that it precalculates a model with billions of parameters, not overfitting with an exact random set of answers hehe
There are other similar lightweight encoding schemes like RLE and delta and frame of reference encoding which all are good for different data distributions.
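A minimal sketch of those three schemes on plain integer lists, just to show which kind of data each one likes (function names are mine, and `for_encode` assumes a non-empty list):

```python
from itertools import groupby


def rle_encode(values):
    """Run-length encoding: collapse runs into (value, count) pairs."""
    return [(v, len(list(run))) for v, run in groupby(values)]


def rle_decode(pairs):
    return [v for v, count in pairs for _ in range(count)]


def delta_encode(values):
    """Delta encoding: keep the first value, then successive differences."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    out, total = [], 0
    for d in deltas:
        total += d
        out.append(total)
    return out


def for_encode(values):
    """Frame of reference: store the minimum once, plus small offsets from it."""
    base = min(values)
    return base, [v - base for v in values]


def for_decode(base, offsets):
    return [base + o for o in offsets]


print(rle_encode([7, 7, 7, 2, 2, 9]))        # long runs of equal values
print(delta_encode([100, 101, 103, 106]))    # slowly increasing values
print(for_encode([1000, 1003, 1001]))        # values clustered in a narrow range
```

The win in each case comes from the encoded numbers being small or few, so they fit in fewer bits than the originals.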
Actually I don't have any intuition for why that's wrong, except that if we catenate the rows into one long row then the picture can be considered as a number 307200 digits long in base 256, and then I see that it could represent 256^307200 possible different values. Which is a lot: https://www.wolframalpha.com/input?i=256%5E307200
On my way to memoize your search history.
The number of possible pictures is indeed 256^307200, which is an unfathomably larger number than 78 million. (256 possible values for the first pixel * 256 possible values for the second pixel * 256 possi...).
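To make the gap concrete, a quick Python check. I'm assuming the 78 million figure came from multiplying 640 * 480 * 256 rather than exponentiating; the thread doesn't actually say so.

```python
import math

WIDTH, HEIGHT, LEVELS = 640, 480, 256  # one byte per pixel, as in the base-256 argument

pixels = WIDTH * HEIGHT          # number of base-256 "digits" in the picture
multiplied = pixels * LEVELS     # assumed origin of the "78 million" figure

# 256**pixels is far too long to print, so count its decimal digits via logarithms.
decimal_digits = math.floor(pixels * math.log10(LEVELS)) + 1

print(f"pixels:                {pixels:,}")
print(f"pixels * 256:          {multiplied:,}")
print(f"digits in 256**pixels: {decimal_digits:,}")
```

So the multiplied figure is an 8-digit number, while the real count of possible pictures has hundreds of thousands of decimal digits.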
It's basically how deduplication works in ZFS. And that's why it only makes sense when you store a lot of repetitive data, e.g. VM images.
https://images.lsnglobal.com/ZFSJiK61WTql9okXV1N5XyGtCEc=/fi...
if there were only 78 million possible pictures, how could that portrait be so recognizably one specific person? wouldn't that mean that your entire picture space wouldn't even be able to fit a single portrait of everyone in Germany?
But as I said, slow.
Do LLM providers use caches for FAQs, without changing the number of tokens billed to customer?
> "if there were only 78 million possible pictures, how could that portrait be so recongizably one specific person? wouldnt that mean that your entire picture space wouldnt even be able to fit a single portrait of everyone in Germany?"
It's not intuitive that "a 640x480 computer picture must be able to fit a single portrait of everyone in Germany". A human couldn't check it: nobody could remember 78 million distinct pictures, look through them, and confirm that they all look sufficiently distinct and that at no point is one picture standing in for 50k people. Human attention and memory aren't enough for that.
Sure they are. You could generate every possible input, compute hash & compare with a given one.
Ok it might take infinite amount of compute (time/energy). But that's just a technicality, right?
Depends entirely on what you mean by reversible. For every hash value, there are an infinite number of inputs that give that value. So while it is certainly possible to find some input that hashes to a given value, you cannot know which input I originally hashed to get that value.
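Both points show up in a toy version. This sketch uses a deliberately tiny 8-bit "hash" (the first byte of SHA-256, my choice so that brute force finishes instantly) and enumerates every input of up to two bytes:

```python
import hashlib
from itertools import product


def tiny_hash(data: bytes) -> int:
    """A deliberately weak 8-bit 'hash': the first byte of SHA-256."""
    return hashlib.sha256(data).digest()[0]


def preimages(target: int, max_len: int = 2):
    """Yield every input of up to max_len bytes whose tiny_hash equals target."""
    for length in range(max_len + 1):
        for combo in product(range(256), repeat=length):
            candidate = bytes(combo)
            if tiny_hash(candidate) == target:
                yield candidate


secret = b"hi"
target = tiny_hash(secret)
matches = list(preimages(target))

print(f"{len(matches)} inputs of <= 2 bytes hash to {target}")
print("original among them:", secret in matches)
```

The search does find the original input, but only as one of many candidates with the same hash, and nothing in the hash value says which one was meant. With a real 256-bit hash the enumeration itself is the part that never finishes in practice.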
Makes sense. Say you have a pattern (surrounded by empty space) that 'flickers': A-B-A-B-A... etc. Then as long as nothing intrudes, the nth generation is the same pattern as the (n+1,000,000)th generation. Similar for patterns that do a 3-cycle, 4-cycle etc.
All you'd need is a) a way to detect repeating patterns, and b) do some kind of collision detection between areas/patterns (there's a thing called 'lightspeed' in Life, that helps).
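Part (a) is easy to sketch for a single isolated pattern. This is a naive version under my own assumptions: live cells as a set of (x, y) pairs, and a repeat only counts if the pattern returns to exactly the same cells, so gliders and other moving patterns never match. It does nothing about (b).

```python
from collections import Counter


def step(cells):
    """Advance a set of live (x, y) cells by one Game of Life generation."""
    counts = Counter(
        (x + dx, y + dy)
        for x, y in cells
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    return frozenset(
        cell for cell, n in counts.items()
        if n == 3 or (n == 2 and cell in cells)
    )


def state_at(cells, n):
    """Return generation n (n >= 0), skipping ahead once the pattern repeats."""
    state = frozenset(cells)
    history = []      # history[g] is the state at generation g
    first_seen = {}   # state -> generation at which it first appeared
    for gen in range(n + 1):
        if state in first_seen:
            start = first_seen[state]
            period = gen - start
            return history[start + (n - start) % period]
        if gen == n:
            return state
        first_seen[state] = gen
        history.append(state)
        state = step(state)


blinker = {(0, 1), (1, 1), (2, 1)}
print(sorted(state_at(blinker, 1_000_000)))
print(sorted(state_at(blinker, 1_000_001)))
```

For the blinker only three generations are actually simulated; everything after that is a lookup into the recorded cycle. Hashlife is the well-known algorithm that takes this memoization idea much further.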
I imagine if you have a good idea of the data incoming you could probably do a similar encoding scheme where you use 7 bits to point to a ~512 bit blob and the 8th bit means the next 512 couldn't be compressed.
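One way to read that scheme, as a sketch only: a shared table of up to 128 known 64-byte (512-bit) blobs, one tag byte per chunk, low 7 bits as the table index, top bit as the "not compressed, raw chunk follows" flag. The table contents and the framing are my assumptions, not something specified above.

```python
BLOB = 64       # bytes per blob, i.e. 512 bits
ESCAPE = 0x80   # top bit of the tag byte: raw chunk follows


def encode(data: bytes, table: list) -> bytes:
    """Replace known 64-byte chunks with a 1-byte index; escape everything else."""
    assert len(table) <= 128 and all(len(blob) == BLOB for blob in table)
    index = {blob: i for i, blob in enumerate(table)}
    out = bytearray()
    for pos in range(0, len(data), BLOB):
        chunk = data[pos:pos + BLOB]
        if chunk in index:
            out.append(index[chunk])   # top bit clear: 7-bit table index
        else:
            out.append(ESCAPE)         # top bit set: chunk stored as-is
            out += chunk
    return bytes(out)


def decode(encoded: bytes, table: list) -> bytes:
    out = bytearray()
    pos = 0
    while pos < len(encoded):
        tag = encoded[pos]
        pos += 1
        if tag & ESCAPE:
            out += encoded[pos:pos + BLOB]
            pos += BLOB
        else:
            out += table[tag]
    return bytes(out)


table = [bytes(BLOB), b"\xff" * BLOB]            # two "common" blobs
data = table[0] + table[1] + bytes(range(64))    # two known chunks, one unknown
packed = encode(data, table)
print(len(data), "->", len(packed), decode(packed, table) == data)
```

Known chunks shrink from 64 bytes to 1, unknown ones grow by 1, so it only pays off if the table really does match the incoming data.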
I think you'd have to compare the data value before purging, and you can only do the deduplication (purge) if the block is actually the same, otherwise you have to keep the block (you can't replace it with the hash because the hash link in the pool points to different data)
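A toy version of that compare-before-dedup rule. The class and its key layout are made up for illustration; it is not how ZFS or any real store lays things out.

```python
import hashlib


class DedupStore:
    """Toy content-addressed block store that verifies bytes on a hash match."""

    def __init__(self, hash_fn=None):
        # hash_fn is pluggable so a weak hash can be used to force collisions.
        self.hash_fn = hash_fn or (lambda block: hashlib.sha256(block).digest())
        self.blocks = {}        # (digest, slot) -> block bytes
        self.stored_bytes = 0

    def put(self, block: bytes):
        """Store a block and return its key; identical blocks share one copy."""
        digest = self.hash_fn(block)
        slot = 0
        while (digest, slot) in self.blocks:
            if self.blocks[(digest, slot)] == block:
                return (digest, slot)   # same hash AND same bytes: deduplicate
            slot += 1                   # same hash, different bytes: keep both
        self.blocks[(digest, slot)] = block
        self.stored_bytes += len(block)
        return (digest, slot)

    def get(self, key) -> bytes:
        return self.blocks[key]


# A deliberately terrible hash (first byte only) so that collisions actually happen.
store = DedupStore(hash_fn=lambda block: block[:1])
k1 = store.put(b"aaaa")
k2 = store.put(b"aaaa")   # true duplicate
k3 = store.put(b"abcd")   # collides with k1 but holds different data
print(k1 == k2, k1 == k3, store.stored_bytes)
```

The byte comparison is what keeps a colliding block from silently being replaced by someone else's data; the cost is an extra read of the existing block on every hash hit.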
Alternately, have you considered 8 byte blocks?
If your block pointers are 8-byte addresses, you don't need to count on block sparsity, in fact, you don't even need to have the actual blocks.
A pointer type, that implements self-read and writes, with null allocations and deletes, is easy to implement incredibly efficiently in any decent type system. A true zero-cost abstraction, if I have ever seen one!
(On a more serious note, a memory heap and CPU that cooperated to interpret pointers with the top bit set, as a 63-bit linear-access/write self-storage "pointer", is an interesting thought.)
What I really want to know is about caching the large prefixes for prompts. Do they let you manage this somehow? What about llama and deepseek?
With MD5 (128-bit), if AWS S3 applied this technique it would only get a handful of collisions. Using a 256-bit hash would drive that down to a level where any collision is very unlikely.
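If anyone wants to plug in their own numbers, the usual birthday approximation is a one-liner. The block counts below are invented for illustration (I don't know how many blocks S3 actually holds), and the formula assumes uniformly random hashes, so it says nothing about deliberately crafted MD5 collisions.

```python
def expected_collisions(num_blocks: int, hash_bits: int) -> float:
    """Birthday approximation: expected colliding pairs among random hash values."""
    pairs = num_blocks * (num_blocks - 1) // 2
    return pairs / 2 ** hash_bits


# Hypothetical block counts, not real S3 figures.
for num_blocks in (10 ** 12, 10 ** 15, 2 ** 64):
    for bits in (128, 256):
        estimate = expected_collisions(num_blocks, bits)
        print(f"{num_blocks:.2e} blocks, {bits}-bit hash: {estimate:.3e}")
```

By this estimate a 128-bit hash only reaches about one expected accidental collision when the block count gets near 2^64.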
This would be worth it if a 4 KB block is, on average, duplicated with a chance of at least 6.25% (not considering the overhead of data structures etc.).
I’m fond of derangements and their relationship with permutations, which contain a factor of e.
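For anyone who hasn't seen it: the number of derangements D(n) (permutations with no element left in place) satisfies D(n) = (n-1)(D(n-1) + D(n-2)), and n!/D(n) tends to e. A small sketch that shows the ratio converging:

```python
import math


def derangements(n: int) -> int:
    """Count permutations of n items with no fixed point."""
    if n == 0:
        return 1
    if n == 1:
        return 0
    prev2, prev1 = 1, 0   # D(0), D(1)
    for k in range(2, n + 1):
        prev2, prev1 = prev1, (k - 1) * (prev1 + prev2)
    return prev1


for n in (2, 4, 6, 8, 10):
    d = derangements(n)
    print(n, d, math.factorial(n) / d)
print("e =", math.e)
```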