In time O(n) you can use O(n) cells on a tape, but there are O(2^n) possible configurations of symbols on a tape of length n (for an alphabet with 2 symbols), so you can do so much more with n space than with n time.
Something we all intuitively expected but someone finally figured out an obscure way to prove it.
--------
This is a really roundabout article that takes a meandering path to a total bombshell in the field of complexity theory. Sorry for spoiling but uhhh, you'd expect an article about P == PSPACE would get to the point faster....
More precisely, I think it is intuitive that the class of problems that can be solved in any time given O(n) space is far larger than the class of problems that can be solved in any space given O(n) time.
If your program uses O(n) memory, it must run at least in O(n) time (lower bound on time).
If intermediate results are entirely uncorrelated, with no overlap in the work at all, that would not hold - space will not help you. Edit: This kind of problem is very rare. Think of a cache with 0 percent hit rate - almost never happens.
And you can't really do it the other way around (at least not in current computing terms / concepts): you cannot use a single unit of time as a standin / proxy for hundreds of cells, since we don't quite have infinitely-broad SIMD architectures.
In fact I don’t think we would need processors anymore if we were centrally storing all of the operations ever done in our processors.
Now fast retrieval is another problem for another thread.
I forget the name of the O(1) access model. Not UMA, something else.
If you expect N ones at the output, how can this machine be simulated in the space smaller than N?
This machine must decrement the digit N at the beginning of the tape, and move to the end of the tape to write "1", so it runs in time O(N^2), not O(N)? (as it takes N "trips" to the end of the tape, and each "trip" takes 1, 2, 3 .. N steps)
Since turing machines can not jump to any place on a tape in constant time (like computers can), does it have any impact on real computers?
(Quite aside from the practical "we build on the surface of the earth" consideration, heat dissipation considerations limit you to a 2 dimensional circuit in 3-space.)
But to answer your question: "space" here refers to working space, excluding the input and output.
Or if you're saying heat dissipation scales with surface area and is 2D, I don't know. Would think that water cooling makes it more about volume, but I'm not an expert on that.
And paper: https://people.csail.mit.edu/rrw/time-vs-space.pdf
(Even more aside to your practical heat dissipation constraint)
Community-scale caching? That's basically what pre-compiled software distributions are. And one idea for addressing the programming language design balk "that would be a nice feature, but it's not known how to compile it efficiently, so you can't have it", is highly-parallel cloud compilation, paired with a community-scale compiler cache. You might not mind if something takes say a day to resolve, if the community only needs it run once per release.
Using an LLM and caching eg FAQs can save a lot of token credits
AI is basically solving a search problem and the models are just approximations of the data - like linear regression or fourier transforms.
The training is basically your precalculation. The key is that it precalculates a model with billions of parameters, not overfitting with an exact random set of answers hehe
There are other similar lightweight encoding schemes like RLE and delta and frame of reference encoding which all are good for different data distributions.
Actually I don't have any intuition for why that's wrong, except that if we catenate the rows into one long row then the picture can be considered as a number 307200 digits long in base 256, and then I see that it could represent 256^307200 possible different values. Which is a lot: https://www.wolframalpha.com/input?i=256%5E307200
On my way to memoize your search history.
The number of possible pictures is indeed 256^307200, which is an unfathomably larger number than 78 million. (256 possible values for the first pixel * 256 possible values for the second pixel * 256 possi...).
It's basically how deduplication works in ZFS. And that's why it only makes sense when you store a lot of repetitive data, e.g. VM images.
What a strange, sad way to think about this. Academia is perverse.
Either way, the week is not yet over, at least since the quanta article, so maybe no kicking will ensue.
https://images.lsnglobal.com/ZFSJiK61WTql9okXV1N5XyGtCEc=/fi...
if there were only 78 million possible pictures, how could that portrait be so recongizably one specific person? wouldnt that mean that your entire picture space wouldnt even be able to fit a single portrait of everyone in Germany?
But as I said, slow.
“If you’re running out of memory, you can buy more. But if you’re running out of time, you’re screwed.”
"I think it is very intuitive that more space beats the pants off of more time." (poster is absolutely right) The The article say "Until now, the only known algorithms for accomplishing certain tasks required an amount of space roughly proportional to their runtime, and researchers had long assumed there’s no way to do better.", which is interpreted as that there's a proportional relation between time and space. However, a quick look at the complexity hierarchy would never suggest such a thing. Reading more carefully, it says "known algorithms" for "certain tasks", but then where do you get a general intuition from such a particular statement?
EDIT: This makes sense because if you look at all problems with N outputs then that is just the same as "gluing together" N different decision problems (+ some epsilon of overhead)
Do LLM providers use caches for FAQs, without changing the number of tokens billed to customer?
I did a lot of raster graphics programming, in my career, and graphics work makes heavy use of lookup tables.
Yesterday, I posted a rather simple tool I wrote[0]: a server that “frontloads” a set of polygons into a database, and then uses them, at query time. It’s fairly fast (but I’m sure it could be a lot faster). I wrote it in a few hours, and got pretty good performance, right out of the starting gate.
Pretty basic stuff. I doubt the pattern is unique, but it’s one that I’ve used for ages. It’s where I try to do as much “upfront” work as possible, and store “half-baked” results into memory.
Like I said, I always “just worked that way,” and never really thought about it. There’s been a lot of “rule of thumb” stuff in my work. Didn’t have an MIT professor to teach it, but it’s sort of gratifying to see that it wasn’t just “old wives” stuff.
There’s probably a ton of stuff that we do, every day, that we don’t think about. Some of it almost certainly originated from really smart folks like him, finding the best way (like the “winding number” algorithm, in that server[1]), and some of it also probably comes from “grug-brained programmers,” simply doing what makes sense.
[0] >>44046227
[1] https://github.com/LittleGreenViper/LGV_TZ_Lookup/blob/e247f...
More at eleven.
Ok, but space is cheap, and we usually want to trade processing time for space.
I.e., the opposite.
He's not trying to please programmers.
> "if there were only 78 million possible pictures, how could that portrait be so recongizably one specific person? wouldnt that mean that your entire picture space wouldnt even be able to fit a single portrait of everyone in Germany?"
It's not intuitive that "a 640x480 computer picture must be able to fit a single portrait of everyone in Germany"; A human couldn't check it, a human couldn't remember 78 million distinct pictures, look through them, and see that they all look sufficiently distinct and at no point is it representing 50k people with one picture; human attention and memory isn't enough for that.
What I mean is that traditionally I think peoples' ideas of lookup tables are things like statically baked arrays setup at compile time, or even first thing at runtime, and they never change. But if you loosen your adherence to that last idea a bit, where a lookup table can change slightly over time, you can get a ton of mileage out of a comparatively small amount of memory compared to wasting cycles every frame.
As for the right tool for the job, I've read tons of dev logs and research papers over the years about moving work to the GPU, but this last few months stint of ripping my game inside out has really made me see the light. It's not just lookup tables built at compile or early runtime, but lookup tables modified slightly over time, and sent to the GPU as textures and used there.
Follow this train of thought long enough, and now we're just calling memory writes and reads "lookup tables" when they aren't really that anymore, but whos to say where the barrier really lies?
Sure they are. You could generate every possible input, compute hash & compare with a given one.
Ok it might take infinite amount of compute (time/energy). But that's just a technicality, right?
Depends entirely on what you mean by reversible. For every hash value, there are an infinite number of inputs that give that value. So while it is certainly possible to find some input that hashes to a given value, you cannot know which input I originally hashed to get that that value.
This "reverse" relationship works with other parameters. Such as in sorting algorithms, where besides time and space requirements you have stability requirement. If you constrain the sorting algorithms by stability, you won't have any advantage (will likely to have some disadvantage) in performance over non constrained algorithms. And wee see that the known stable sorting algorithms are either slower or require more memory. Nobody has yet invented a stable sorting algorithm that has comparable performance (both time and space) to non stable sorting algorithms. That's it.
> If your program runs in O(n) time, it cannot use more than O(n) memory (upper bound on memory usage.[sic]
This is clearly refuted by all software running today. Programs (especially games) clearly use more memory than there are instructions in the program.
> If your program uses O(n) memory, it must run at least in O(n) time (lower bound on time).
Memory bombs use an incredible amount of memory and do it incredibly quickly.
Makes sense. Say you have a pattern (surrounded by empty space) that 'flickers': A-B-A-B-A... etc. Then as long as nothing intrudes, nth generation is the same pattern as in n+1000,000th generation. Similar for patterns that do a 3-cycle, 4-cycle etc.
All you'd need is a) a way to detect repeating patterns, and b) do some kind of collision detection between areas/patterns (there's a thing called 'lightspeed' in Life, that helps).
For example, each block of pixels might have some calculated characteristics that were accessed by a hash into a LUT, but the characteristics would change, as we went through the image.
We'd do a "triage" run, where we'd build the LUT, then a "detailed" run, where we'd apply the LUT to the pixels.
It could get pretty hairy.
The CPU is like an engine and memory is your gas tank. Idling the engine is bad, but leaving gas in the tank doesn't hurt, but it doesn't help either. I'm not gonna get to my destination faster because I have a full tank.
Expensive calculation, cheap storage → caching results helps.
Limited bandwidth / 'expensive' storage, simple calculation (see: today's hyper-fast CPU+L1 cache combo's) → better to re-compute some things on the fly as needed.
I suspect there's a lot of existing software (components) out there designed along the "save CPU cycles, burn storage" path, where in modern reality a "save storage, CPU cycles are cheap" would be more effective. CPU speeds have grown way way faster than main memory bandwidth (or even size?) over the last decades.
For a datacenter, supercomputer, embedded system, PC or some end-user's phone, the metrics will be different. But same principle applies.
But the speed of light is the maximum space in the smallest time, which computationally would correspond to filling the largest amount of memory in the shortest time :facepalm: (and thanks for the links!)
I imagine if you have a good idea of the data incoming you could probably do a similar encoding scheme where you use 7 bits to point to a ~512 bit blob and the 8th bit means the next 512 couldn't be compressed.
I think you'd have to compare the data value before purging, and you can only do the deduplication (purge) if the block is actually the same, otherwise you have to keep the block (you can't replace it with the hash because the hash link in the pool points to different data)
Alternately, have you considered 8 byte blocks?
If your block pointers are 8-byte addresses, you don't need to count on block sparsity, in fact, you don't even need to have the actual blocks.
A pointer type, that implements self-read and writes, with null allocations and deletes, is easy to implement incredibly efficiently in any decent type system. A true zero-cost abstraction, if I have ever seen one!
(On a more serious note, a memory heap and CPU that cooperated to interpret pointers with the top bit set, as a 63-bit linear-access/write self-storage "pointer", is an interesting thought.
Give operating system development organizations a lot more memory and they will find a way to worsen the response time.
What I really want to know is about caching the large prefixes for prompts. Do they let you manage this somehow? What about llama and deepseek?
All this accomplishes is discrediting science. Trading personal gains for eroding the very thing that they make their money off of. This is a major part of why Americans (and people) have such high distrust for science. News outlets, and in particular science focused news outlets, constantly spew inaccurate information. It really should be no wonder that so many people are confused about so many scientific topics, as unless they actually take the years it takes to become an expert in a field, they are going to have a difficult time distinguishing fact from fiction. And why shouldn't the average person expect to trust a source like Quanta? They're "full of experts", right? smh
[0] This is the earliest archive I see with the note. Press back one day and it should not be there. Article was published on Nov 30 2022, along with a youtube video https://web.archive.org/web/20230329191417/https://www.quant...
I don't think Quanta should be afraid of showing math to people. That's really their whole purpose. Even if I think they've made some egregious mistakes that make them untrustable...[2]
[0] https://en.wikipedia.org/wiki/PSPACE#/media/File:Complexity_...
[1] https://www.quantamagazine.org/june-huh-high-school-dropout-...
[2] >>44067043
Edit: Aaronson even mentions the n^100 problem in the section about P!
If you take the example of a game, drawing sprites say. Drawing a single preloaded sprite of reasonable size is always cheap, so a slow frame must have an excessive number. It's very hard to construct a practical scenario of a large number of truly distinct sprites though. A level has a finite tile palette, a finite cast of characters, abilities, etc. It's hard to logistically get them all into a scene together, and even then it won't be that many. So the only scenario left where sprite drawing will be slow is drawing the same handful of sprites over and over again. By contrast that's super common: just spam a persistent projectile, tap the analog stick to generate dust particles, etc.
The point is that a single line[0] and a minimal graphic could substantially improve the reader's comprehension while simultaneously providing them the necessary nomenclature to find relevant material to further increase their understanding.
Look at this line:
| One of the most important classes goes by the humble name “P.”
It tells us almost nothing, except of its importance. Only to be followed by | Roughly speaking, it encompasses all problems that can be solved in a reasonable amount of time. An analogous complexity class for space is dubbed “PSPACE.”
This tells us nothing... My first thought would by "why not PTIME and PSPACE" if I didn't already know what was going on.The whole work is about bridging these two concepts! How can we understand that if we don't know what we're building a bridge between? It's like reporting on a bridge being built connecting England and France but just calling it a bridge. Is it important? Sounds like it by the way they talk, but how can you even know the impact of such a thing when not given such critical context? You get tremendous amounts of additional context with the addition of so few words.
When using MD5 (128bit) then when AWS S3 would apply this technique, it would only get a handful of collisions. Using 256bit would drive that down to a level where any collision is very unlikely.
This would be worth it if a 4kb block is, on average, duplicated with a chance of at least 6.25%. (not considering overhead of data-structures etc.)
How can you access a piece of memory without issuing an instruction to the CPU? Also, "clearly" is not an argument.
>Memory bombs use an incredible amount of memory and do it incredibly quickly.
How can you access a piece of memory without issuing an instruction to the CPU? Also "incredibly quickly" is not an argument. Also also, O(n) is incredibly quick.
As in your assertion is literally self-evidently false. It is on you to provide a burden of proof here; especially since there are instructions that can load more than a single bit of memory.
> How can you access a piece of memory without issuing an instruction to the CPU?
Let me rather ask you this: where do the instructions exist that are running? That is right: in memory. However, just because instructions exist in memory doesn’t mean they’re accessed. There is not a relationship between the number of instructions and the amount of memory accessed/used.
My article was quite explicit in multiple places that the universal/comprehensive character of the result was that counterintuitive part:
- In the first paragraph: "memory was more powerful than computer scientists believed: A small amount would be as helpful as a lot of time in all conceivable computations."
- Further down in the introduction, in the passage you quoted: "Until now, the only known algorithms for accomplishing certain tasks required an amount of space roughly proportional to their runtime, and researchers had long assumed there’s no way to do better. Williams’ proof established a mathematical procedure for transforming any algorithm — no matter what it does — into a form that uses much less space.
- In the third section, I explicitly state that researchers do believe space is more powerful than time in the specific sense that you're criticizing my article for misrepresenting: "But complexity theorists suspect that PSPACE is a much larger class, containing many problems that aren’t in P. In other words, they believe that space is a far more powerful computational resource than time. This belief stems from the fact that algorithms can use the same small chunk of memory over and over, while time isn’t as forgiving — once it passes, you can’t get it back."
- In the fourth section, I explain why researchers didn't think the HPV75 result could be improved further, despite their intuition that space is more powerful than time in the above sense: "While many problems can be solved with much less space than time, some intuitively seemed like they’d need nearly as much space as time."
TCS (and complexity theory specifically) are complicated subjects. I spend a lot of time interviewing researchers and thinking about how to distill the results of my reporting into a form that is accessible to readers with widely varying levels of familiarity with the subject matter. You are of course well within your rights to critique my stylistic choices, the narrative aspects of the story, and the order in which I presented information, but I will push back against the claim that my article is spreading misinformation about complexity theory. You're referring to a misconception that arises, by your own admission, when you don't read carefully. If it's the headline you object to, you could lodge a similar complaint against the complexity theorist Lance Fortnow: https://blog.computationalcomplexity.org/2025/02/you-need-mu....
That is not what it means. Again, if you are not familiar with the notation then all you are doing is slapping your personal ideas about computing to some symbols
I’m fond of derangements and their relationship with permutations, which contain a factor of e.
I can imagine it's tough to put that much effort into communicating something complex to a wide audience, only to have a bunch of very smart people here attempt to tear it apart.
Here on HN people often complain about the level of detail, which is fair! I think they are often falling prey to a common fallacy about conditional probabilities. P("X reads Quanta"|"X has some technical background (college STEM major or more)") is likely larger than it is for most popular science magazine. But P("Y has some technical background"|"Y reads Quanta") is much lower than many people realize. There is a limit on how much technical stuff I can put in an article and still have it be accessible to many of our readers, and I care a lot about making things accessible.