In practice this isn't true at all. Vertical scaling is typically a sublinear cost increase (up to a point, but that point is a ridiculous beast of a machine), since you're usually upgrading just the CPU, just the RAM, or just the storage, not all of them at once.
There are instances where you can get nearly 10x the machine for 2x the cost.
(and fwiw, wikipedia agrees with this definition: https://en.wikipedia.org/wiki/Scalability#Horizontal_(scale_... )
> Vertical scaling — a bigger, exponentially more expensive server
> Horizontal scaling — distribute the load over more servers
The fact that AMD chooses to package the "nodes" together on one die vs multiple doesn't change that.
> typically involving the addition of CPUs, memory or storage to a single computer.
The distinction between horizontal and vertical scaling becomes nonsense if we accept your definitions, because literally nobody does that sort of vertical scaling. What people actually do is:
* Replace the CPU with a faster one, but with the same number of cores. Or simply run the same one at a higher clock rate.
* Add memory, or use faster memory.
* Add storage, or use faster storage.
These are all forms of vertical scaling because they reduce the time it takes to process a single transaction, either by reducing waits or by increasing computation speed.
> It's also contradicted by both the article we're discussing and the wikipedia article
The article agrees with this definition. Transaction latency decreases if and only if you scale vertically; transaction throughput increases with either form of scaling. Without this interpretation, the analogy to conveyor belts makes no sense.
Also, it's worth pointing out that increasing the number of processing units often _does_ decrease latency. In Factorio you need 3 advanced circuits for the chemical science pack. If your science lab can produce 1 science pack every 24 seconds but your pipeline takes 16 seconds to produce one advanced circuit, your whole pipeline is going to have a latency of 48 seconds from start to finish, because it's bottlenecked by the advanced circuit pipeline. Doubling the number of processing units in each step of the circuit pipeline will double your throughput and bring your latency down to 24 seconds, as it should be. And if you have room for those extra processing units, you can do that without adding more belts.
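Roughly, that bottleneck arithmetic works out like this (a quick Python sketch using the hypothetical 24s/16s times above, not actual recipe times):

    # Hypothetical crafting times from the example above, not real Factorio numbers.
    pack_craft_time = 24.0      # seconds per chemical science pack
    circuit_craft_time = 16.0   # seconds per advanced circuit
    circuits_per_pack = 3

    def packs_per_second(circuit_assemblers):
        # The lab can finish at most 1/24 packs/s; the circuit line feeds it
        # at (assemblers / 16) circuits/s, i.e. a third of that in packs' worth.
        lab_rate = 1.0 / pack_craft_time
        circuit_rate = circuit_assemblers / circuit_craft_time
        return min(lab_rate, circuit_rate / circuits_per_pack)

    print(1 / packs_per_second(1))  # 48.0 s between packs: the circuit line is the bottleneck
    print(1 / packs_per_second(2))  # 24.0 s between packs: the lab is now the limit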
The idea that serial speed is equivalent to latency breaks down when you consider what your computer's hardware is really doing behind the scenes, too. Your CPU is constantly doing all manner of things in parallel: prefetching data from memory, reordering instructions and running them in parallel, speculatively executing branches, et cetera. None of these things decrease the fundamental latency of reading a single byte from memory with a cold cache, but it doesn't really matter, because at the end of the day we're measuring some application-specific metric like transaction latency.
This isn't latency in the same sense I was using the word. This is reciprocal throughput. Latency, as I was using the word, is the time it takes for an object to completely pass through a system; more generally, it's the delay between a cause and its effects. For example, you could measure how long it takes for an iron ore to be integrated into a final product at the end of the pipeline. This measure could be relevant in certain circumstances. If you needed to control throughput by restricting inputs, the latency would tell you how much lag there is between the time when you throttle the input supply and the time when the output rate starts to decrease.
>The idea that serial speed is equivalent to latency breaks down when you consider what your computer's hardware is really doing behind the scenes, too. Your CPU is constantly doing all manner of things in parallel: prefetching data from memory, reordering instructions and running them in parallel, speculatively executing branches, et cetera. None of these things decrease the fundamental latency of reading a single byte from memory with a cold cache, but it doesn't really matter, because at the end of the day we're measuring some application-specific metric like transaction latency.
Yes, a CPU core is able to break instructions down into micro-operations and parallelize and reorder those micro-operations, so that work is executed out of program order. Which is why you don't measure latency at the instruction level. You take a unit of work that's both atomic (it's either complete or incomplete) and serial (a thread can't do anything else until it's completed it), and take a timestamp when it begins processing and another when it finishes. The difference between the two is the latency of the system.
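Concretely, that measurement is just two timestamps around the unit of work. A minimal Python sketch, where handle_transaction is only a stand-in for whatever the atomic, serial unit of work actually is:

    import time

    def handle_transaction(payload):
        # Stand-in for the real atomic, serial unit of work.
        return sum(payload)

    start = time.perf_counter()                    # timestamp when processing begins
    result = handle_transaction(range(1_000_000))
    latency = time.perf_counter() - start          # difference between the two timestamps
    print(f"transaction latency: {latency * 1000:.2f} ms")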
But it is. If you add more factories producing advanced circuits, you can produce a chemical science pack from start to finish in 24 seconds (assuming producing an advanced circuit takes 16 seconds). Otherwise it takes 48 seconds, because you're waiting sequentially for 3 advanced circuits to be completed. It doesn't matter that the latency of producing an advanced circuit didn't decrease. The relevant metric is the latency to produce a chemical science pack, which _did_ decrease, by fanning out the production of a sub-component.
Edit: actually, my numbers are measuring reciprocal throughput, but the statement still holds true when talking about latency. You can expect to complete a science pack in 72 seconds (24+16*3) with no parallelism, and 40 seconds (24+16) with.
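Spelled out (same hypothetical 24s/16s crafting times, and assuming you have enough assemblers to craft all 3 circuits at once in the parallel case):

    pack_craft_time = 24
    circuit_craft_time = 16
    circuits_per_pack = 3

    # No parallelism: the 3 circuits are crafted one after another, then the pack.
    latency_serial = circuits_per_pack * circuit_craft_time + pack_craft_time   # 72 s

    # Enough assemblers to craft all 3 circuits simultaneously, then the pack.
    latency_parallel = circuit_craft_time + pack_craft_time                     # 40 s

    print(latency_serial, latency_parallel)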
> Yes, a CPU core is able to break instructions down into micro-operations and parallelize and reorder those micro-operations, so that work is executed out of program order. Which is why you don't measure latency at the instruction level.
That’s what I’m saying about Factorio, though. You can measure latency for individual components, and you can measure latency for a whole pipeline. Adding parallelism can decrease latency for a pipeline, even though it didn’t decrease latency for a single component. That’s why the idea that serial performance = latency breaks down.