In practice this isn't true at all. Vertical scaling is typically a sublinear cost increase (up to a point, but that point is a ridiculous beast of a machine), since you're usually upgrading just the CPU, just the RAM, or just the storage, not all of them at once.
There are instances where you can get nearly 10x the machine for 2x the cost.
(and fwiw, wikipedia agrees with this definition: https://en.wikipedia.org/wiki/Scalability#Horizontal_(scale_... )
> Vertical scaling — a bigger, exponentially more expensive server
> Horizontal scaling — distribute the load over more servers
The fact that AMD chooses to package the "nodes" together on one die vs multiple doesn't change that.
> typically involving the addition of CPUs, memory or storage to a single computer.
The distinction between horizontal and vertical scaling becomes nonsense if we accept your definitions, because literally nobody does that sort of vertical scaling. What people actually do is:
* Replace the CPU with a faster one, but with the same number of cores. Or simply run the same one at a higher clock rate.
* Add memory, or use faster memory.
* Add storage, or use faster storage.
These are all forms of vertical scaling because they reduce the time it takes to process a single transaction, either by reducing waits or by increasing computation speed.
> It's also contradicted by both the article we're discussing and the wikipedia article
The article agrees with this definition. Transaction latency decreases if and only if you scale vertically; transaction throughput increases with either form of scaling. Without this interpretation, the analogy to conveyor belts makes no sense.
Also, it's worth pointing out that increasing the number of processing units often _does_ decrease latency. In Factorio you need 3 advanced circuits for the chemical science pack. If your science lab can produce 1 science pack every 24 seconds but your pipeline takes 16 seconds to produce one advanced circuit, your whole pipeline is going to have a latency of 48 seconds from start to finish, because it's bottlenecked by the advanced circuit pipeline. Doubling the number of processing units in each step of the circuit pipeline will double your throughput and bring your latency down to 24 seconds, as it should be. And if you have room for those extra processing units, you can do that without adding more belts.
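Roughly, that bottleneck arithmetic works out like this (a quick Python sketch using the hypothetical 24s/16s times above, not actual recipe times):

    # Hypothetical crafting times from the example above, not real Factorio numbers.
    pack_craft_time = 24.0      # seconds per chemical science pack
    circuit_craft_time = 16.0   # seconds per advanced circuit
    circuits_per_pack = 3

    def packs_per_second(circuit_assemblers):
        # The lab can finish at most 1/24 packs/s; the circuit line feeds it
        # at (assemblers / 16) circuits/s, i.e. a third of that in packs' worth.
        lab_rate = 1.0 / pack_craft_time
        circuit_rate = circuit_assemblers / circuit_craft_time
        return min(lab_rate, circuit_rate / circuits_per_pack)

    print(1 / packs_per_second(1))  # 48.0 s between packs: the circuit line is the bottleneck
    print(1 / packs_per_second(2))  # 24.0 s between packs: the lab is now the limit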
The idea that serial speed is equivalent to latency breaks down when you consider what your computer's hardware is really doing behind the scenes, too. Your CPU is constantly doing all manner of things in parallel: prefetching data from memory, reordering instructions and running them in parallel, speculatively executing branches, et cetera. None of these things decrease the fundamental latency of reading a single byte from memory with a cold cache, but it doesn't really matter, because at the end of the day we're measuring some application-specific metric like transaction latency.
This isn't latency in the same sense I was using the word. This is reciprocal throughput. Latency, as I was using the word, is the time it takes for an object to completely pass through a system; more generally, it's the delay between a cause and its effects. For example, you could measure how long it takes for an iron ore to be integrated into a final product at the end of the pipeline. This measure could be relevant in certain circumstances. If you needed to control throughput by restricting inputs, the latency would tell you how much lag there is between the time when you throttle the input supply and the time when the output rate starts to decrease.
>The idea that serial speed is equivalent to latency breaks down when you consider what your computer's hardware is really doing behind the scenes, too. Your CPU is constantly doing all manner of things in parallel: prefetching data from memory, reordering instructions and running them in parallel, speculatively executing branches, et cetera. None of these things decrease the fundamental latency of reading a single byte from memory with a cold cache, but it doesn't really matter, because at the end of the day we're measuring some application-specific metric like transaction latency.
Yes, a CPU core is able to break instructions down into micro-operations and parallelize and reorder those micro-operations, so that work is executed out of program order. Which is why you don't measure latency at the instruction level. You take a unit of work that's both atomic (it's either complete or incomplete) and serial (a thread can't do anything else until it's completed it), and take a timestamp when it begins processing and another when it finishes. The difference between the two is the latency of the system.
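Concretely, that measurement is just two timestamps around the unit of work. A minimal Python sketch, where handle_transaction is only a stand-in for whatever the atomic, serial unit of work actually is:

    import time

    def handle_transaction(payload):
        # Stand-in for the real atomic, serial unit of work.
        return sum(payload)

    start = time.perf_counter()                    # timestamp when processing begins
    result = handle_transaction(range(1_000_000))
    latency = time.perf_counter() - start          # difference between the two timestamps
    print(f"transaction latency: {latency * 1000:.2f} ms")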
But it is. If you add more factories producing advanced circuits, you can produce a chemical science pack from start to finish in 24 seconds (assuming producing an advanced circuit takes 16 seconds). Otherwise it takes 48 seconds, because you're waiting sequentially for 3 advanced circuits to be completed. It doesn't matter that the latency of producing an advanced circuit didn't decrease. The relevant metric is the latency to produce a chemical science pack, which _did_ decrease, by fanning out the production of a sub-component.
Edit: actually, my numbers are measuring reciprocal throughput, but the statement still holds true when talking about latency. You can expect to complete a science pack in 72 seconds (24+16*3) with no parallelism, and 40 seconds (24+16) with.
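Spelled out (same hypothetical 24s/16s crafting times, and assuming you have enough assemblers to craft all 3 circuits at once in the parallel case):

    pack_craft_time = 24
    circuit_craft_time = 16
    circuits_per_pack = 3

    # No parallelism: the 3 circuits are crafted one after another, then the pack.
    latency_serial = circuits_per_pack * circuit_craft_time + pack_craft_time   # 72 s

    # Enough assemblers to craft all 3 circuits simultaneously, then the pack.
    latency_parallel = circuit_craft_time + pack_craft_time                     # 40 s

    print(latency_serial, latency_parallel)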
> Yes, a CPU core is able to break instructions down into micro-operations and parallelize and reorder those micro-operations, so that work is executed out of program order. Which is why you don't measure latency at the instruction level.
That’s what I’m saying about Factorio, though. You can measure latency for individual components, and you can measure latency for a whole pipeline. Adding parallelism can decrease latency for a pipeline, even though it didn’t decrease latency for a single component. That’s why the idea that serial performance = latency breaks down.