Ideally OTel would be more than observability, imo. Traces would be event sources, things that beget more computing. The system that observes computing should in turn be the system that reacts and responds to it, begetting more computing elsewhere. That's the distributed system I want to see: local agents reporting their actions, other agents seeing that and responding. OTel would fit the need well, if only we could expand beyond thinking of observability as an operator affordance and start thinking of it as the system to track computation.
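To make the idea concrete, here's a minimal sketch of that reactive pattern. It's modeled loosely on the `on_end()` hook of an OTel SDK SpanProcessor, but everything here (`Span`, `ReactiveSpanProcessor`, the attribute names) is hypothetical plain Python, not the real SDK:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Span:
    """Toy stand-in for an OTel span: a name plus attributes."""
    name: str
    attributes: Dict[str, str] = field(default_factory=dict)

class ReactiveSpanProcessor:
    """Dispatches ended spans to subscribed handlers ("agents")."""

    def __init__(self) -> None:
        self._handlers: List[Callable[[Span], None]] = []

    def subscribe(self, handler: Callable[[Span], None]) -> None:
        self._handlers.append(handler)

    def on_end(self, span: Span) -> None:
        # In a real OTel SDK this hook feeds an exporter; here the same
        # signal also begets more computing: each handler reacts to it.
        for handler in self._handlers:
            handler(span)

# One agent reports a deploy; another agent reacts to it.
processor = ReactiveSpanProcessor()
triggered: List[str] = []
processor.subscribe(
    lambda s: triggered.append(s.name) if s.attributes.get("event") == "deploy" else None
)
processor.on_end(Span("svc-a.release", {"event": "deploy"}))
print(triggered)  # → ['svc-a.release']
```

The point of the sketch is just that the same stream operators watch on a dashboard could fan out to subscribers that do work.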
Querying unfortunately has lots of room for innovation, and it's really hard to nail down in a spec especially when the vendors all want to compete.
Being able to bring the whole application up locally should be an absolute non-negotiable.
This usually doesn't work that well for larger systems with services split between multiple teams. And it's not typically the RAM/CPU limitations that are the problem, but the amount of configuration that needs to be customized (and, in some cases, data).
Sooner or later, you just start testing with the other teams' production/staging environments rather than deal with local incompatibilities.
That's probably about the time when your development pace goes downhill.
I think it's an interesting idea to consider: If some team interfaces with something outside of its control, they need to have a mock of it. That policy increases the development effort by at least a factor of two (you always have to create the mock alongside the thing), but it's just a linear increase.
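A sketch of what that policy looks like in practice, assuming a made-up external dependency (`RateService` and `MockRateService` are illustrative names, not from the thread): the team that owns the interface ships an in-memory mock alongside it, and other teams depend only on the interface.

```python
from typing import Dict, Protocol

class RateService(Protocol):
    """The interface a team exposes to the outside world."""
    def convert(self, amount: float, currency: str) -> float: ...

class MockRateService:
    """The mock shipped alongside the interface, runnable locally."""

    def __init__(self, rates: Dict[str, float]) -> None:
        self._rates = rates

    def convert(self, amount: float, currency: str) -> float:
        return amount * self._rates[currency]

def checkout_total(amount: float, currency: str, rates: RateService) -> float:
    # Business logic depends only on the Protocol, so local dev wires
    # in the mock while production wires in the real client.
    return round(rates.convert(amount, currency), 2)

print(checkout_total(10.0, "EUR", MockRateService({"EUR": 1.1})))  # → 11.0
```

The "factor of two" cost is visible here: every change to `RateService` means touching both the real implementation and `MockRateService`, but the cost stays linear in the size of the interface.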
Either way, once the local version exists, then the job becomes maintaining all the infrastructure that lets you bring up the pieces, populate them with reasonable state and wire them into whatever the bits are that are being actively hacked-on.
Oh, absolutely. But at this point, your team is probably around several dozen people and you have a product with paying customers. This naturally slows the development speed, however you organize the development process.
> I think it's an interesting idea to consider: If some team interfaces with something outside of its control, they need to have a mock of it. That policy increases the development effort by at least a factor of two (you always have to create the mock alongside the thing), but it's just a linear increase.
The problem is, you can't really recapture the actual behavior of a service in a mock.
To give you an example: AWS ships DynamoDB Local, an in-memory mock database for testing and development. It has nearly the same functionality as the real service, but stores all the data in RAM. So its simulated global secondary indexes (something like table views in classic SQL databases) are updated instantly, whereas on the real database they're eventually consistent and can take a fraction of a second to update.
So when you put your service into production, it can start breaking under load.
Perhaps we need better mocks that also simulate the behavior of the real services: delays, eventual consistency, retries, and so on.
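A minimal sketch of what such a mock could look like, assuming made-up names (`EventuallyConsistentTable`, `query_index`): writes land in the base table immediately, but the simulated secondary index only catches up after a configurable lag, mimicking the real behavior instead of the instantly-consistent local mock.

```python
import threading
import time
from typing import Dict, Optional

class EventuallyConsistentTable:
    """Toy mock whose simulated GSI lags behind the base table."""

    def __init__(self, index_lag_s: float) -> None:
        self._lag = index_lag_s
        self._base: Dict[str, dict] = {}
        self._index: Dict[str, dict] = {}

    def put(self, key: str, item: dict) -> None:
        self._base[key] = item  # base table: immediately consistent
        # Index copy becomes visible only after the configured lag.
        threading.Timer(self._lag, self._index.__setitem__, (key, item)).start()

    def get(self, key: str) -> Optional[dict]:
        return self._base.get(key)

    def query_index(self, key: str) -> Optional[dict]:
        return self._index.get(key)

table = EventuallyConsistentTable(index_lag_s=0.05)
table.put("order-1", {"status": "paid"})
print(table.query_index("order-1"))  # almost certainly None: index hasn't caught up
time.sleep(0.1)
print(table.query_index("order-1"))  # now {'status': 'paid'}
```

A test suite running against this mock is forced to handle the read-after-write gap, which is exactly the class of bug the instantly-consistent mock hides until production.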
At BigCo we have migrated a number of internal things to OTel, but I don't think it has been worth the effort.
So many projects come with Prometheus metrics, dashboards, and alerts out of the box that it becomes hard to use anything else. When I pick some random Helm chart to install, you can almost guarantee that it comes with Prometheus integrations.
With Grafana Mimir you can now scale easily to a few billion metric streams, so a lot of the issues with the old Prometheus model have been fixed.
Like you said, I don't think there is much to innovate on in this area, which is a good thing.
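Part of why the ecosystem gravitates there is how simple the contract is: a `/metrics` endpoint just serves the Prometheus text exposition format. A stdlib-only sketch (the metric name and labels are made up for illustration):

```python
from typing import Dict

def render_counter(name: str, help_text: str, value: int, labels: Dict[str, str]) -> str:
    """Render one counter in the Prometheus text exposition format."""
    label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    return (
        f"# HELP {name} {help_text}\n"
        f"# TYPE {name} counter\n"
        f"{name}{{{label_str}}} {value}\n"
    )

print(render_counter(
    "http_requests_total",
    "Total HTTP requests served.",
    1027,
    {"method": "get", "code": "200"},
))
```

In practice you'd use an official client library (e.g. `prometheus_client`) rather than hand-rolling this, but the plain-text contract is what lets every random Helm chart ship scrape-ready metrics.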