zlacker

[parent] [thread] 38 comments
1. margin+(OP)[view] [source] 2023-07-13 16:44:10
> Vertical scaling — a bigger, exponentially more expensive server

In practice this is not true at all. Vertical scaling typically comes with a sublinear cost increase (up to a point, but that point is a ridiculous beast of a machine), since you're usually upgrading just the CPU, just the RAM, or just the storage, not all of them at once.

There are instances where you can get nearly 10x the machine for 2x the cost.

replies(5): >>geodel+r2 >>moreli+07 >>teawre+V7 >>dekhn+zc >>KRAKRI+GJ
2. geodel+r2[view] [source] 2023-07-13 16:53:55
>>margin+(OP)
The idea is: don't let logic get in the way of promoting "web scale" software.
3. moreli+07[view] [source] 2023-07-13 17:11:06
>>margin+(OP)
The kind of server you'd run Kafka on tends to already be pretty far up the curve. I don't think I can get 10x our default broker for 20x the cost. Maybe 100x the cost. (I could probably get 2x it for 2x the cost but once you value HA the practical inflection point starts below the actual cost intersection.)
4. teawre+V7[view] [source] 2023-07-13 17:14:48
>>margin+(OP)
For small consumer products, sure, but we're talking about the extreme end of performance and physical capabilities. Sure, you can get a 2GHz CPU for ~2x the price of a 200MHz CPU, but how much are you going to pay for a 6.0GHz CPU vs a 5.0GHz one? 6.1GHz vs 6.0GHz?
replies(2): >>Sohcah+Qb >>margin+hi
5. Sohcah+Qb[view] [source] [discussion] 2023-07-13 17:30:04
>>teawre+V7
Think cores instead of clock speeds.

In the case of cloud instances, doubling cores is frequently less than 100% more expensive.

replies(2): >>The_Co+xd >>moreli+Pe
6. dekhn+zc[view] [source] 2023-07-13 17:33:11
>>margin+(OP)
Disagree. Vertical scaling is typically lumpy, and even worse, CPU and RAM upgrades are typically not linear in price, because you're limited by the number of slots/sockets and manufacturers intentionally charge exponentially higher prices for the largest RAM and fastest CPUs.
replies(3): >>moreli+lg >>vegabo+Bn >>defend+j11
7. The_Co+xd[view] [source] [discussion] 2023-07-13 17:36:21
>>Sohcah+Qb
Increasing core count is not really vertical scaling. It's a hybrid between vertical and horizontal scaling, having some characteristics of both. It also tops out quite early (especially its cost-effectiveness for many use cases, but there's an absolute upper limit as well).
8. moreli+Pe[view] [source] [discussion] 2023-07-13 17:40:28
>>Sohcah+Qb
https://aws.amazon.com/msk/pricing/ prices scale linearly with CPU beginning with m5.large, and I wouldn't really want to run a production Kafka on anything less than m5.xlarge. (They do at least keep scaling linearly all the way up.) Speculating wildly, I could probably have run some of our real clusters on the equivalent of an 8xlarge, but of course 32-core systems were not widely available at that time. As for the cluster I run today, even a hypothetical 48xlarge would struggle.

YMMV for non-managed stuff, but really, you can only bump cores like 3 times realistically, 4 if you started really shitty, before you start getting into special pricing brackets.

9. moreli+lg[view] [source] [discussion] 2023-07-13 17:45:36
>>dekhn+zc
Kafka is also a system that can make pretty good general use of more CPUs and more storage, but doesn't have much need for RAM. Tying the CPU and RAM together whether by CPU model or cloud vendor offerings is annoying if you're trying to scale only vertically.
replies(1): >>defend+101
10. margin+hi[view] [source] [discussion] 2023-07-13 17:52:43
>>teawre+V7
You can go from an 8C/16T Epyc 7xxx-series CPU to a 32C/64T one and not even double the cost.
replies(2): >>fluori+9n >>teawre+0S
11. fluori+9n[view] [source] [discussion] 2023-07-13 18:08:32
>>margin+hi
That's more like horizontal scaling, though. You get more throughput (transactions per second) but not lower latency (seconds per transaction). Though it may be more cost-effective to have a single 32-core machine than two 16-core machines.
replies(1): >>margin+rr
12. vegabo+Bn[view] [source] [discussion] 2023-07-13 18:10:27
>>dekhn+zc
If they charge more for the biggest parts, it's precisely because they're trying to capture some of the embarrassingly better value you get from vertical scaling. It's a testament to vertical scaling's effectiveness that they _can_ do so.
replies(1): >>foota+XI
13. margin+rr[view] [source] [discussion] 2023-07-13 18:25:55
>>fluori+9n
I disagree with this definition of horizontal scaling. If you're moving to a bigger computer rather than more computers, then you're scaling vertically and not horizontally.

(and fwiw, wikipedia agrees with this definition: https://en.wikipedia.org/wiki/Scalability#Horizontal_(scale_... )

replies(1): >>fluori+3v
14. fluori+3v[view] [source] [discussion] 2023-07-13 18:40:49
>>margin+rr
Then it sounds like your disagreement with TFA is one of terminology, since the article is using the terms the way I am. Vertical scaling means increasing the serial performance of the system, and horizontal scaling means increasing the parallel performance of the system. In this sense, vertical scaling past a certain point does indeed get exponentially more expensive, while horizontal scaling almost always scales linearly in cost, or better.
replies(2): >>margin+pE >>dekhn+vH
15. margin+pE[view] [source] [discussion] 2023-07-13 19:20:24
>>fluori+3v
What I'm commenting on is this phrasing from the article

> Vertical scaling — a bigger, exponentially more expensive server

> Horizontal scaling — distribute the load over more servers

replies(1): >>teawre+bU
16. dekhn+vH[view] [source] [discussion] 2023-07-13 19:35:49
>>fluori+3v
The terms are used loosely and it doesn't make a lot of sense to argue about the definitions.

I think it's fair to say that vertical scaling is normally done by increasing the RAM and CPU of a single machine with a single address space and switch/bus, while horizontal scaling is normally adding more machines (additional address spaces and switches/buses). Historically this is because RAM-to-CPU performance (throughput and latency) in a single address space and bus greatly exceeds the performance of any NIC connecting machines with distinct address spaces and buses. And it mostly ignores effects like the performance cost of swapping/paging when you don't have enough RAM.

I haven't really seen many systems where horizontal scaling is truly linear, unless the problem is embarrassingly parallel, like serving static content.

replies(1): >>fluori+u41
17. foota+XI[view] [source] [discussion] 2023-07-13 19:43:08
>>vegabo+Bn
Sure, but by doing so they consume the effectiveness?
replies(1): >>dekhn+8K
18. KRAKRI+GJ[view] [source] 2023-07-13 19:46:36
>>margin+(OP)
Also beyond a certain point, it makes sense to go straight to dedicated bare metal. The AWS tax is not worth paying if your workload is mostly fixed, somewhat fault tolerant (i.e. failed hardware on the weekends can be replaced on Monday without major interruption to business operations), and CPU bound. Get a high end machine on Hetzner and put everything behind a VPN or API auth and you will save more than 50% in spending.
replies(1): >>Rhodes+M41
19. dekhn+8K[view] [source] [discussion] 2023-07-13 19:48:41
>>foota+XI
No, because you pay a fixed cost to get higher performance and then benefit through the whole lifetime of the product (I'm assuming you are purchasing rationally and keep your machines loaded at 75% or better, and your software is not egregiously wasteful).
20. teawre+0S[view] [source] [discussion] 2023-07-13 20:27:01
>>margin+hi
The article defines vertical scaling as using faster conveyor belts (serial performance) and horizontal scaling as using more conveyor belts (parallel performance).

So your example of adding more CPU cores would be horizontal scaling, while using a faster core would be vertical. Vertical scaling has diminishing returns.

21. teawre+bU[view] [source] [discussion] 2023-07-13 20:36:29
>>margin+pE
OK, I see where a layperson would get confused by this. In the context of this article, every core is what Wikipedia calls a "node". There is no difference between a single 32C CPU and 4x 8C CPUs except for their ability to share memory faster. Both count as horizontal scaling in the context of this article. You're not going to finish a single workload any faster, but you're going to increase the throughput of finishing multiple workloads in parallel.

The fact that AMD chooses to package the "nodes" together on one die vs multiple doesn't change that.

replies(2): >>margin+DW >>surema+eo1
22. margin+DW[view] [source] [discussion] 2023-07-13 20:49:35
>>teawre+bU
The Wikipedia article qualifies what it means by vertical scaling:

> typically involving the addition of CPUs, memory or storage to a single computer.

replies(1): >>teawre+Sg1
23. defend+101[view] [source] [discussion] 2023-07-13 21:07:22
>>moreli+lg
Kafka can keep a decent bit of data in RAM using the file system page cache. Oftentimes you end up wasting CPUs on Kafka nodes, not memory, I think.

https://docs.confluent.io/platform/current/kafka/deployment....

replies(1): >>moreli+A71
24. defend+j11[view] [source] [discussion] 2023-07-13 21:14:06
>>dekhn+zc
With clouds this is not true anymore. They are exactly linear. If you ask for a smaller node, they are simply provisioning you a chunk of a larger machine anyway.

There is a point where the exponential pricing starts, but that point is much further out than most people expect. Probably ~100 CPUs, ~1TB RAM, >50Gbps network, etc.

replies(1): >>dekhn+pn1
25. fluori+u41[view] [source] [discussion] 2023-07-13 21:30:43
>>dekhn+vH
Note that I was referring to scaling of cost, not of performance. If your application parallelizes ideally, then in the worst case your cost scales linearly: you just add more machines, and your power consumption increases by a factor of new_machine_count/previous_machine_count. It's possible that adding more processors in the same address space increases the cost by a factor below new_core_count/previous_core_count, in which case the cost scales better than linearly.
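
To make that concrete, here's a toy model in Python (the superlinear exponent for vertical cost is completely made up, just to illustrate the shape):

    # Horizontal: cost is linear in machine count by construction.
    def horizontal_cost(machines, unit_price=1.0):
        return machines * unit_price

    # Vertical: cost grows superlinearly with the performance factor
    # past some point (the 1.6 exponent is a made-up stand-in).
    def vertical_cost(perf_factor, unit_price=1.0, exponent=1.6):
        return unit_price * perf_factor ** exponent

    for f in (2, 4, 8):
        print(f, horizontal_cost(f), round(vertical_cost(f), 2))
    # 2 2.0 3.03
    # 4 4.0 9.19
    # 8 8.0 27.86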
26. Rhodes+M41[view] [source] [discussion] 2023-07-13 21:32:49
>>KRAKRI+GJ
I haven't found this to be true generally unless your workloads are truly completely static, which I've never actually experienced.

Given what engineers at this level cost, their costs per hour dealing with all of the nonsense clouds handle for you (networking, storage, elastic scaling, instant replacement of faulty servers, load balancing, yadda yadda) end up being higher than whatever tax you're paying for using the cloud.

Economies of scale are real.

27. moreli+A71[view] [source] [discussion] 2023-07-13 21:44:40
>>defend+101
I find that if you have lots of consumers seeking around large topics, no amount of RAM is really sufficient, and if you are mostly sticking to the tail of the log like a regular Kafka user, even 64GB is usually way more than enough.

CPU isn't usually a problem until you start using very large compactions, and then suddenly it can be a massive bottleneck. (Actually I would love to abuse more RAM here but log.cleaner.dedupe.buffer.size has a tiny maximum value!)

Kafka Streams (specifically) is also configured to transact by default, even though most applications aren't written to be able to actually benefit from that. If you run lots of different consumer services, this results in burning a lot of CPU on transactions in a "flat profile"-y way that's hard to illustrate to application developers, since each consumer, individually, is relatively small - there are just thousands of them.

28. teawre+Sg1[view] [source] [discussion] 2023-07-13 22:31:47
>>margin+DW
This is one of those times when I feel like you just didn't read anything I typed. So... I'm just gonna let you be confidently incorrect.
replies(2): >>Dylan1+DG1 >>margin+VO2
29. dekhn+pn1[view] [source] [discussion] 2023-07-13 23:23:59
>>defend+j11
They're linear... because they're charging you rates based on the cost of the large server, divided down into whatever server you provisioned.

Amusingly, for $94K (probably more like $85K after negotiation) you can buy a white-box server: dual Epyc 9000, 96-core/192-thread, 3.5GHz, with 3TB RAM, 240TB of very fast SSD, and a 10G NIC. The minimum config, dual Epyc 9124, 16-core/32-thread, 64GB RAM, and only 4TB of storage, is $9K (more like $8K after negotiation). That's "only" a factor of 10 in price for 6X the CPUs, 48X the RAM, and 60X the storage.
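
Back-of-the-envelope in Python, using the list prices and specs quoted above:

    big   = dict(price=94_000, cores=96, ram_gb=3_072, ssd_tb=240)
    small = dict(price=9_000,  cores=16, ram_gb=64,    ssd_tb=4)

    for k in ("price", "cores", "ram_gb", "ssd_tb"):
        print(k, round(big[k] / small[k], 1))
    # price 10.4, cores 6.0, ram_gb 48.0, ssd_tb 60.0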

replies(1): >>Dylan1+lG1
30. surema+eo1[view] [source] [discussion] 2023-07-13 23:29:51
>>teawre+bU
The ability to “share memory faster” is a bigger distinction than you make it out to be. Distributed applications look quite different from merely multithreaded or multiprocess shared-memory applications due to the unreliability of the network, the increased latency, and other things which some refer to as the fallacies of distributed computing. To me, this is usually what people mean when they talk about “horizontal” vs. “vertical” scaling. With modern language-level support for concurrency, it hurts much more to go from a shared memory architecture to a distributed one than to go from a single-thread architecture to a multithreaded one.
31. Dylan1+lG1[view] [source] [discussion] 2023-07-14 02:09:19
>>dekhn+pn1
> They're linear... because they're charging you rates based on the cost of the large server, divided down into whatever server you provisioned.

And the reason they do it that way is because it's cheaper. Because the scaling is sublinear up to a good size.

32. Dylan1+DG1[view] [source] [discussion] 2023-07-14 02:13:00
>>teawre+Sg1
Or other people can disagree with your interpretation, especially because the analogy is somewhat strained and highly oversimplified.
33. margin+VO2[view] [source] [discussion] 2023-07-14 12:35:23
>>teawre+Sg1
I'm reading what you're typing, but I just don't agree with it. It's also contradicted by both the article we're discussing and the Wikipedia article; further, it's an interpretation of vertical scaling that effectively doesn't occur in practice.

The distinction between horizontal and vertical scaling becomes nonsense if we accept your definitions, because literally nobody does that sort of vertical scaling.

replies(1): >>fluori+oi3
34. fluori+oi3[view] [source] [discussion] 2023-07-14 14:51:18
>>margin+VO2
Wrong. If you do any of these you're scaling vertically, even by that definition:

* Replace the CPU with a faster one, but with the same number of cores. Or simply run the same one at a higher clock rate.

* Add memory, or use faster memory.

* Add storage, or use faster storage.

These are all forms of vertical scaling because they reduce the time it takes to process a single transaction, either by reducing waits or by increasing computation speed.

> It's also contradicted by both the article we're discussing and the wikipedia article

The article agrees with this definition. Transaction latency decreases iff vertical scale increases. Transaction throughput increases with either form of scaling. Without this interpretation, the analogy to conveyor belts makes no sense.

replies(1): >>surema+dN3
35. surema+dN3[view] [source] [discussion] 2023-07-14 17:05:59
>>fluori+oi3
Think of it this way, instead. Building a multi-belt system is a pain in the ass that complicates the design of your factory. Conveyor belt highways, multiplexers, tunnels, and a bunch of stuff related to the physical routing of your belts suddenly becomes relevant. But you can still increase throughput while keeping a single belt, if your bottleneck is not belt speed but processing speed (in the industrial sense). I can have several factories sharing the same belt, which increases throughput but not latency.

Also, it's worth pointing out that increasing the number of processing units often _does_ decrease latency. In Factorio you need 3 advanced circuits for the chemical science pack. If your science lab can produce 1 science pack every 24 seconds but your pipeline takes 16 seconds to produce one advanced circuit, your whole pipeline is going to have a latency of 48 seconds from start to finish due to being bottlenecked by the advanced circuit pipeline. Doubling the number of processing units in each step of the circuit pipeline will double your throughput and bring your latency down to 24 seconds, as it should be. And if you have room for those extra processing units, you can do that without adding more belts.

The idea that serial speed is equivalent to latency breaks down when you consider what your computer's hardware is really doing behind the scenes, too. Your CPU is constantly doing all manner of things in parallel: prefetching data from memory, reordering instructions and running them in parallel, speculatively executing branches, et cetera. None of these things decrease the fundamental latency of reading a single byte from memory with a cold cache, but it doesn't really matter, because at the end of the day we're measuring some application-specific metric like transaction latency.

replies(1): >>fluori+N94
36. fluori+N94[view] [source] [discussion] 2023-07-14 18:39:58
>>surema+dN3
>Also, it's worth pointing out that increasing the number of processing units often _does_ decrease latency. [...]

This isn't latency in the same sense I was using the word. This is reciprocal throughput. Latency, as I was using the word, is the time it takes for an object to completely pass through a system; more generally, it's the delay between a cause and its effect/s. For example, you could measure how long it takes for an iron ore to be integrated into a final product at the end of the pipeline. This measure could be relevant in certain circumstances. If you needed to control throughput by restricting inputs, the latency would tell you how much lag there is between the time when you throttle the input supply and the time when the output rate starts to decrease.

>The idea that serial speed is equivalent to latency breaks down when you consider what your computer's hardware is really doing under the scenes, too. Your cpu is constantly doing all manner of things in parallel: prefetching data from memory, reordering instructions and running them in parallel, speculatively executing branches, ...et cetera. None of these things decrease the fundamental latency of reading a single byte from memory with a cold cache, but it doesn't really matter because at the end of the day we're measuring some application-specific metric like transaction latency.

Yes, a CPU core is able to break instructions down into micro-operations and parallelize and reorder those micro-operations, such that instructions are retired in a non-linear manner. Which is why you don't measure latency at the instruction level. You take a unit of work that's both atomic (it's either complete or incomplete) and serial (a thread can't do anything else until it's completed it), and take a timestamp when it's begun processing and another when it's finished. The difference between the two is the latency of the system.
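
A minimal sketch in Python of the measurement I'm describing (work() stands in for whatever atomic, serial unit of work your system processes):

    import time

    def latency(work):
        # Timestamp before and after one atomic, serial unit of work;
        # the delta is the system's latency for that unit.
        start = time.perf_counter()
        work()
        return time.perf_counter() - start

    def throughput(work, window=10.0):
        # Completions per second over a window. With parallel workers this
        # can improve even when latency() for a single unit does not.
        done, t0 = 0, time.perf_counter()
        while time.perf_counter() - t0 < window:
            work()
            done += 1
        return done / (time.perf_counter() - t0)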

replies(1): >>surema+1i4
37. surema+1i4[view] [source] [discussion] 2023-07-14 19:17:28
>>fluori+N94
> This isn't latency in the same sense I was using the word.

But it is. If you add more factories for producing advanced circuits, you can produce a chemical science pack from start to finish in 24 seconds (assuming producing an advanced circuit takes 16 seconds). Otherwise it takes 48 seconds, because you're waiting sequentially for 3 advanced circuits to be completed. It doesn't matter that the latency of producing an advanced circuit didn't decrease. The relevant metric is the latency to produce a chemical science pack, which _did_ decrease, by fanning out the production of a sub-component.

Edit: actually, my numbers are measuring reciprocal throughput, but the statement still holds true when talking about latency. You can expect to complete a science pack in 72 seconds (24+16*3) with no parallelism, and 40 seconds (24+16) with.
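
A rough cold-start model in Python, using the same numbers:

    import math

    CIRCUIT_TIME = 16    # seconds per advanced circuit
    PACK_TIME = 24       # seconds to assemble the science pack
    CIRCUITS_NEEDED = 3

    def pack_latency(circuit_factories):
        # Cold start: circuits are produced in waves of size
        # `circuit_factories`, then the pack is assembled.
        waves = math.ceil(CIRCUITS_NEEDED / circuit_factories)
        return waves * CIRCUIT_TIME + PACK_TIME

    print(pack_latency(1))  # 3*16 + 24 = 72
    print(pack_latency(3))  # 1*16 + 24 = 40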

> Yes, a CPU core is able to break instructions down into micro-operations and parallelize and reorder those micro-operations, such that instructions are retired in a non-linear manner. Which is why you don't measure latency at the instruction level.

That’s what I’m saying about Factorio, though. You can measure latency for individual components, and you can measure latency for a whole pipeline. Adding parallelism can decrease latency for a pipeline, even though it didn’t decrease latency for a single component. That’s why the idea that serial performance = latency breaks down.

replies(1): >>fluori+Ws4
38. fluori+Ws4[view] [source] [discussion] 2023-07-14 20:14:43
>>surema+1i4
>actually, my numbers are measuring reciprocal throughput, but the statement still holds true when talking about latency. You can expect to complete a science pack in 72 seconds (24+16*3) with no parallelism, and 40 seconds (24+16) with.

That's still reciprocal throughput: 1/(science packs/second). You're measuring the time delta between the production of two consecutive science packs, but this measurement implicitly hides all the work the rest of the factory did in parallel. If the factory is completely inactive, how soon can it produce a single science pack? That time is the latency.

>You can measure latency for individual components, and you can measure latency for a whole pipeline. Adding parallelism can decrease latency for a pipeline, even though it didn’t decrease latency for a single component. That’s why the idea that serial performance = latency breaks down.

Suppose instead of producing science packs, your factory produces colored cars. A customer can come along, press a button to request a car of a given color, and the factory gives it to them after a certain time. You want to answer customer requests as quickly as possible, so you always have black, white, and red cars ready, which cover 99% of the requests, and your factory continuously produces cars in a red-green-blue pattern, at a rate of 1 car per hour. Unfortunately your manufacturing process is such that the color must be set very early in the pipeline and this changes the entire rest of the production sequence. If a customer comes along and presses a button, how long do they need to wait until they can get their car? That measurement is the latency of the system.

The best case is when there's a car already ready, so the minimum latency is 0 seconds. If two customers request the same color one after the other, the second one may need to wait up to three hours for the pipeline to complete a three-color cycle. But what if a customer wants a blue car? I've only been talking about throughput. Nothing of what I've said so far tells you how deep the pipeline is. It's entirely possible that even though your factory produces a red car every three hours, producing a blue car takes three months. If you add an exact copy of the factory you can produce two red cars every three hours, but producing a single blue car still takes three months.

Adding parallelism can only affect the optimistic paths through a system, but it has no effect on the maximum latency. The only way to reduce maximum latency is to move more quickly through the pipeline (faster processor) or to shorten the pipeline (algorithmic optimization). You can't have a baby in one month by impregnating nine women.

replies(1): >>surema+Uv4
39. surema+Uv4[view] [source] [discussion] 2023-07-14 20:31:00
>>fluori+Ws4
> If the factory is completely inactive, how soon can it produce a single science pack? That time is the latency.

That is what I am trying to explain, now for the second time.

Let's say you have a magic factory that, for simplicity's sake, turns rocks into advanced circuits after 16 seconds. You need 3 advanced circuits for one chemical science pack. If you only have one circuit factory, you need to wait 3 * 16 seconds to produce three circuits. If you have three circuit factories that can all grab from the conveyor belt simultaneously, they can start work at the same time. Then the amount of time it takes to produce 3 advanced circuits, starting with all three factories completely inactive, is 16 seconds, assuming you have 3 rocks ready for consumption.

The time it takes to produce a chemical pack, in turn, is the time it takes to produce 3 circuits, plus 24 seconds to turn the finished circuits into a science pack. It stands to reason that if you can produce 3 circuits faster in parallel than sequentially, you can also produce chemical science packs faster overall.

> Suppose instead of producing science packs, your factory produces colored cars. A customer can come along, press a button to request a car of a given color, and the factory gives it to them after a certain time. You want to answer customer requests as quickly as possible, so you always have black, white, and red cars ready, which cover 99% of the requests, and your factory continuously produces cars in a red-green-blue pattern, at a rate of 1 car per hour. Unfortunately your manufacturing process is such that the color must be set very early in the pipeline and this changes the entire rest of the production sequence. If a customer comes along and presses a button, how long do they need to wait until they can get their car? That measurement is the latency of the system.

Again, I understand this, so let me phrase it in a way that fits your analogy. Creating a car is a complex operation that requires many different pieces to be created. I'm not a car mechanic, so I'm just guessing, but at a minimum you have the chassis, engine, tires, and panels.

If you can manufacture the chassis, engine, tires, and panels simultaneously, it will decrease the total latency of producing one unit (a car). I'm not talking about producing different cars in parallel; of course that won't decrease the latency to produce a single car. I'm saying you parallelize the components of the car. The time it takes to produce the car, assuming every component can be manufactured independently, is the maximum amount of time it takes across the components, plus the time it takes to assemble them once they've been completed. So if the engine takes the longest, you can produce a car in the amount of time it takes to produce an engine, plus some constant.

Before, the amount of time is chassis + engine + tires + panels + assembly. Now, the time is engine + assembly, because the chassis, tires, and panels are already done by the time the engine is ready.
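
Same idea as a toy Python calculation (the per-component hours are invented):

    # Hypothetical per-component build times, in hours.
    chassis, engine, tires, panels, assembly = 5, 20, 2, 4, 3

    sequential = chassis + engine + tires + panels + assembly   # 34
    parallel = max(chassis, engine, tires, panels) + assembly   # 23
    print(sequential, parallel)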
