Transcending Posix: The End of an Era?

>>jsnell+(OP)
Hadn't heard of "Dennard scaling" before:

> These services cannot expect to run faster from year to year with increasing CPU clock frequencies because the end of Dennard scaling circa 2004 implies that CPU clock frequencies are no longer increasing at the rate that was prevalent during the commoditization of Unix.

Definition:

> Dennard scaling, also known as MOSFET scaling, is a scaling law which states roughly that, as transistors get smaller, their power density stays constant, so that the power use stays in proportion with area; both voltage and current scale (downward) with length.[1][2] The law, originally formulated for MOSFETs, is based on a 1974 paper co-authored by Robert H. Dennard, after whom it is named.[3]

* https://en.wikipedia.org/wiki/Dennard_scaling
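
To make the definition concrete, here is a small arithmetic sketch. It assumes the usual dynamic-power model P ~ C * V^2 * f (that model, the function name `power_density` and the `voltage_scales` flag are illustrative assumptions, not something stated in the article) and normalizes all baseline values to 1. It shows why power density stays flat while voltage scales down with feature size, and why it climbs once voltage stops scaling but the clock keeps rising.

```python
def power_density(kappa, voltage_scales=True):
    """Relative power density after shrinking linear dimensions by 1/kappa.

    Assumes dynamic power P ~ C * V^2 * f, with every baseline value set to 1.
    """
    capacitance = 1.0 / kappa                          # C shrinks with linear size
    voltage = 1.0 / kappa if voltage_scales else 1.0   # ideal scaling lowers V too
    frequency = kappa                                  # shorter gate delay, faster clock
    area = 1.0 / kappa ** 2                            # area shrinks quadratically
    power = capacitance * voltage ** 2 * frequency
    return power / area


if __name__ == "__main__":
    for kappa in (1, 2, 4):
        print(kappa, power_density(kappa), power_density(kappa, voltage_scales=False))
```

Under these assumptions the density stays at 1 with ideal scaling and grows as kappa squared when voltage is held fixed, which is roughly the wall the quoted passage alludes to.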

The article then mentions Moore's Law.

>>323+vb
Could be this one? https://youtu.be/tCMs6XqY-rc

>>pid-1+cc
I'm guessing it's Tim Roscoe's keynote on how most fundamental questions about what the hardware is actually doing are invisible to traditional OS abstractions:

https://www.youtube.com/watch?v=36myc8wQhLo

Roscoe's talk is fairly long (the video is > 1h), but the basic thesis was taken up in a segment of Bryan Cantrill's 20 min OSFF talk:

https://www.youtube.com/watch?v=XbBzSSvT_P0

Both talks are very good; I recommend watching both, in either order.

>>jsnell+(OP)
> However, contemporary applications rarely run on a single machine. They increasingly use remote procedure calls (RPC), HTTP and REST APIs, distributed key-value stores, and databases,

I'm seeing an increasing trend of pushback against this norm. An early example was David Crawshaw's one-process programming notes [1]. Running the database in the same process as the application server, using SQLite, is getting more popular with the rise of Litestream [2]. Earlier this year, I found the post "One machine can go pretty far if you build things properly" [3] quite refreshing.

Most of us can ignore FAANG-scale problems and keep right on using POSIX on a handful of machines.

[1]: https://crawshaw.io/blog/one-process-programming-notes

[2]: https://litestream.io/

[3]: https://rachelbythebay.com/w/2022/01/27/scale/
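
For readers who haven't tried the in-process approach, a minimal sketch using Python's standard `sqlite3` module might look like the following. The `notes` table and the helper names are made up for illustration; the point is only that the "database server" is a library call inside the application process, with no RPC hop.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open (or create) the database inside this process; no server involved."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS notes ("
        "id INTEGER PRIMARY KEY, body TEXT NOT NULL)"
    )
    return conn


def add_note(conn, body):
    """Insert one row in a transaction and return its row id."""
    with conn:  # commits on success, rolls back on exception
        cur = conn.execute("INSERT INTO notes (body) VALUES (?)", (body,))
    return cur.lastrowid


def list_notes(conn):
    """Return all note bodies in insertion order."""
    return [row[0] for row in conn.execute("SELECT body FROM notes ORDER BY id")]


if __name__ == "__main__":
    db = open_db()
    add_note(db, "one machine can go pretty far")
    print(list_notes(db))
```

Passing a file path instead of `":memory:"` gives a durable database file, which is the kind of file a tool like Litestream would then back up.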

>>garaet+9A
Do so at your own peril; they aren't part of the public API.

"NDK C library"

https://developer.android.com/ndk/guides/stable_apis#c_libra...

"Improving Stability with Private C/C++ Symbol Restrictions in Android N"

https://android-developers.googleblog.com/2016/06/improving-...

"Android changes for NDK developers"

https://android-developers.googleblog.com/2016/06/android-ch...

Termux developers also thought they could do whatever they felt like on Android. Guess what: they can't.

>>mwcamp+0v
If you have an application server then you still have RPCs coming from your user interface, even if you run the whole DB in process. And indeed POSIX has nothing to say about this. Instead people tend to abuse HTTP as a pseudo-RPC mechanism because that's what the browser understands, it tends to be unblocked by firewalls etc.

One trend in OS research (what little exists) is the idea of the database OS. Taking that as an inspiration I think there's a better way to structure things to get that same simplicity and in fact even more, but without many of the downsides. I'm planning to write about it more at some point on my company blog (https://hydraulic.software/blog.html) but here's a quick summary. See what you think.

---

In a traditional 3-tier CRUD web app you have the RDBMS, then stateless web servers, then JavaScript and HTML in the browser running a pseudo-stateless app. Because browsers don't understand load balancing you probably also have an LB in there so you can scale and upgrade the web server layer without user-visible downtime. The JS/HTML speaks an app-specific ad-hoc RPC protocol that represents RPCs as document fetches, and your web server (mostly) translates back and forth between this protocol and whatever protocol your RDBMS speaks, layering access control on top (because the RDBMS doesn't know who is logged in).

This approach is standard and lets people use web browsers which have some advantages, but creates numerous problems. It's complex, expensive, limiting for the end user, every app requires large amounts of boilerplate glue code, and it's extremely error prone. XSS, XSRF and SQL injection are all bugs that are created by this choice of architecture.

These problems can be fixed by using "two tier architecture". In two tier architecture you have your RDBMS cluster directly exposed to end users, and users log in directly to their RDBMS account using an app. The app ships the full database driver and uses it to obtain RPC services. Ordinary CRUD/ACL logic can be done with common SQL features like views, stored procedures and row level security [1][2][3]. Any server-side code that isn't neatly expressible with SQL is implemented as RDBMS server plugins.

At a stroke this architecture solves the following problems:

1. SQL injection bugs disappear by design because the RDBMS enforces security, not a highly privileged web app. By implication you can happily give power users like business analysts direct SQL query access to do obscure/one-off things that might otherwise turn into abandoned backlog items.

2. XSS, XSRF and all the other escaping bugs go away, because you're not writing a web app anymore - data is pulled straight from the database's binary protocol into your UI toolkit's data structures. Buffer lengths are signalled OOB across the entire stack.

3. You don't need a hardware/DNS load balancer anymore because good DB drivers can do client-side load balancing.

4. You don't need to design ad-hoc JSON/REST protocols that e.g. frequently suck at pagination, because you can just invoke server-side procedures directly. The DB takes care of serialization, result streaming, type safety, access control, error reporting and more.

5. The protocol gives you batching for free, so if you have some server logic written in e.g. JavaScript, Python, Kotlin, Java etc then it can easily use query results as input or output and you can control latency costs. With some databases like PostgreSQL you get server push/notifications.

6. You can use whatever libraries and programming languages you want.

This architecture lacks popularity today because to make it viable you need a few things that weren't available until very recently (and a few useful things still aren't yet). At minimum:

1. You need a way to distribute and update GUI desktop apps that isn't incredibly painful, ideally one that works well with JVM apps because JDBC drivers tend to have lots of features. Enter my new company, stage left (yes! that's right! this whole comment is a giant ad for our product). Hydraulic Conveyor was launched in July and makes distributing and updating desktop apps as easy as with a web app [4].

2. You're more dependent on having a good RDBMS. PostgreSQL only got RLS recently and needs extra software to scale client connections well. MS SQL Server is better but some devs would feel "weird" buying a database (it's not that expensive though). Hosted DBs usually don't let you install arbitrary extensions.

3. You need solid UI toolkits with modern themes. JetBrains has ported the new Android UI toolkit to the desktop [5] allowing lots of code sharing. It's reactive and thus has a Kotlin language dependency. JavaFX is a more traditional OOP toolkit with CSS support, good business widgets and is accessible from more languages for those who prefer that; it also now has a modern GitHub-inspired SASS based style pack that looks great [6] (grab the sampler app here [7]). For Lispers there's a reactive layer over the top [8].

4. There are some smaller tools that would be useful, e.g. for letting you log into your DB with OAuth, or for ensuring DB traffic can get through proxies.

Downsides?

1. Migrating between DB vendors is maybe harder. Though, the moment you have >1 web server you have the problem of doing a 'live' migration anyway, so the issues aren't fundamentally different, it'd just take longer.

2. Users have to install your app. That's not hard, and in a managed IT environment the apps can be pushed out centrally. Developers often get hung up on this point, but the success of the installed app model on mobile, the popularity of Electron and the whole video game industry show users don't actually care much, as long as they plan to use the app regularly.

3. To do mobile/tablet you'd want to ship the DB driver as part of your app. There might be oddities involved, though in theory JDBC drivers could run on Android and be compiled to native for iOS using GraalVM.

4. Skills, hiring, etc. You'd want more senior devs to trailblaze this first before asking juniors to learn it.

[1] https://www.postgresql.org/docs/current/ddl-rowsecurity.html

[2] https://docs.microsoft.com/en-us/sql/relational-databases/se...

[3] https://docs.oracle.com/database/121/TDPSG/GUID-72D524FF-5A8...

[4] https://hydraulic.software/

[5] https://www.jetbrains.com/lp/compose-mpp/

[6] https://github.com/mkpaz/atlantafx

[7] https://downloads.hydraulic.dev/atlantafx/sampler/download.h...

[8] https://github.com/cljfx/cljfx

>>mike_h+LI
The thought is somewhat inchoate still. I'm working with a pure functional language (Joy https://joypy.osdn.io/ ) and when it came time to add filesystem support I balked. Instead, I'm trying out immutable 3-tuples of (hash, offset, length) to identify sequences of bytes (for now the "backing store" is just a git repo.) Like I said, it's early days but so far it's very interesting and useful.
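
A rough sketch of what such (hash, offset, length) references could look like follows. Everything here is an assumption for illustration: the names `ByteRef` and `BlobStore`, the choice of SHA-256, and the in-memory dict standing in for the git-backed store mentioned above.

```python
import hashlib
from typing import NamedTuple


class ByteRef(NamedTuple):
    """Immutable reference to a run of bytes inside a content-addressed blob."""
    digest: str   # hex digest of the whole blob
    offset: int   # start of the referenced run within the blob
    length: int   # number of bytes referenced


class BlobStore:
    """Toy content-addressed store; a dict stands in for the real backing store."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> ByteRef:
        """Store a blob (idempotently) and return a reference to all of it."""
        digest = hashlib.sha256(data).hexdigest()
        self._blobs.setdefault(digest, bytes(data))
        return ByteRef(digest, 0, len(data))

    def read(self, ref: ByteRef) -> bytes:
        """Resolve a reference to its bytes, rejecting out-of-range references."""
        blob = self._blobs[ref.digest]
        if ref.offset < 0 or ref.length < 0 or ref.offset + ref.length > len(blob):
            raise ValueError("reference falls outside the blob")
        return blob[ref.offset:ref.offset + ref.length]


if __name__ == "__main__":
    store = BlobStore()
    whole = store.put(b"hello world")
    word = ByteRef(whole.digest, 6, 5)   # a sub-range needs no new storage
    print(store.read(word))
```

Because a blob never changes once stored, a reference always resolves to the same bytes, and "editing" means writing a new blob and handing out new references.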

I get what you're saying about modern filesystems, and I agree. I guess from that POV I'm saying we could stand to remove some of the layers of abstraction?

>>carapa+6Q
Well, Git still uses mutable state stored in files. You can't avoid it - the world is mutable. The question is how to expose and manage the mutations.

At any rate you might be interested in a few different projects:

1. BlueStore: https://ceph.io/en/news/blog/2017/new-luminous-bluestore/

2. The DAT or IPFS protocols, which are based on the idea of immutable logs storing file data, identified by hashes, with public keys and signatures to handle mutability.

>>mwcamp+0v
> Running the database in the same process as the application server, using SQLite, is getting more popular with the rise of Litestream.

As someone who uses SQLite a lot, I'm suspicious of this claim. Litestream is strictly a backup tool, or, as its author puts it, disaster recovery tool. It gives you a bit more peace of mind than good old periodic snapshots, but it does not give you actual usable replication,* so I doubt it meaningfully increased SQLite adoption in the RDBMS space (compared to the application data format space where it has always done well).

* There was a live read replica beta which has since been dropped. Author did mention a separate tool they're working on which will include live replication. https://github.com/benbjohnson/litestream/issues/8#issuecomm...

>>mike_h+821
> devs wanted to use UNIX and Perl instead of VB/Delphi

What do you think drove this? Presumably plenty of people in the dark mass of 9-to-5 devs were happy with VB/Delphi. Jonathan Edwards has written [1] that VB came from "a more civilized age. Before the dark times… before the web." Did nerdy devs like me, with our adolescent anti-Microsoft attitude (speaking for myself anyway; I was born in 1980), ruin everything?

[1]: https://alarmingdevelopment.org/?p=865

>>jeff-d+jJ
> Now the posix APIs feels like the worst of both worlds.

I think, like anything, it depends on what you're doing, and how you're doing it.

> The "everything is a text file" interface is not great anymore, either.

Text doesn't seem markedly different from a JSON API? Perhaps "worse is better" more often when you are composing several programs/apps to make one system? Even complex distributed systems.

> But it feels like some of the core posix APIs and interfaces just aren't a great fit

And I think these APIs can thrive on top of (the good parts of) POSIX. I'm not so certain we need to fundamentally rethink this layer, because this layer seems to work pretty well at what it does well. It should be pruned occasionally, but I'm not sure we need a whole new abstraction.

FWIW, I think this is what the article is saying. Let's create some new interfaces where the old interfaces don't model the hardware well[0].

[0]: https://queue.acm.org/detail.cfm?id=3212479

>>thetea+Kg1
Because fork() was very simple and conceptually "easy" to do when it was first introduced, and is now massively complex and has huge implications for every part of the system. It's not compositional, isn't thread safe, is insecure by default (inherits env/fds), and is also slow with all the state it must copy. And at a conceptual level it doesn't work in environments where "process" and "address space" aren't synonymous. For instance, if your application uses a hardware accelerator (NIC, GPU, whatever), fork() isn't ever viable or sensible, since the hardware resources can't be duplicated safely. And it'll never work in a design like WebAssembly (just an example of this idea, but WASM isn't the only one), where again "process" and "virtual memory address space" are not the same. Consider that posix_spawn can make reasonable sense in WebAssembly at a first guess ("launch this wasm module"), but fork() in contrast is much more difficult when it implies COW semantics.

The reality is fork() is pretty much exclusively used to launch new processes these days, outside a few specific cases. Today, it's a poor fit for that problem. And the answer is what Windows has been doing (and POSIX has now had) for a long time: explicitly launching processes by giving a handle/pathname to an executable like posix_spawn. That's the first solution, anyway; a better one would be more capability-oriented design where you have to supply a new address space with all its resources yourself.
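
As a small illustration of the "launch by pathname" style, here is a sketch using Python's `os.posix_spawn` wrapper (available on POSIX systems since Python 3.8). The helper name `run_child` is made up for this example, and the sketch only shows the basic call, not file actions or other spawn attributes.

```python
import os
import sys


def run_child(argv, env=None):
    """Launch argv[0] via posix_spawn and return the child's exit code.

    No copy of the parent's address space is made: the child starts directly
    from the named executable, with the environment passed in explicitly.
    """
    child_env = os.environ if env is None else env
    pid = os.posix_spawn(argv[0], argv, child_env)
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)  # negative value: killed by that signal


if __name__ == "__main__":
    code = run_child([sys.executable, "-c", "import sys; sys.exit(3)"])
    print("child exited with", code)
```

The caller states exactly what to run and which environment it gets, instead of cloning itself and then undoing whatever it didn't want the child to inherit.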

This HotOS paper is a pretty good detailed coverage of the argument; I find it very convincing. If fork() went away, I honestly wouldn't miss it, I think. https://www.microsoft.com/en-us/research/uploads/prod/2019/0...

>>jeff-d+jJ
> Most applications use a database for a zillion reasons

Is it time to look at database-first operating systems again? There have been a few. Tandem's Guardian was very well regarded in its day. Unfortunately, Tandem was acquired by DEC/Compaq, which tried to move it to the Alpha around 1997. Then, after HP acquired what was left of Compaq, HP tried to move it to Itanium. (There was a MIPS version in there somewhere.) After all those bad decisions, in 2014 it was finally ported to x86, by which time few cared. There's still an HP offering.[1]

In Guardian, the database system owned the raw disks. There were no "files". If you wanted something that looked like a file, it was just a blob in a database entry. Everything was transaction-oriented, like SQL. That was the whole point. Everything was a redundant, recoverable transaction.

The OS was cluster oriented. Adding or replacing servers was normally done without shutdown. Databases were replicated, and you could add new disks, wait for them to come into sync, and remove the old ones.

All this in the 1970s. They were way ahead of their time.

[1] https://www.hpe.com/us/en/servers/nonstop.html

>>oefrha+cV
For context, the new tool being discussed in the thread the parent links to is litefs [0]. You can also look at rqlite [1] and dqlite [2]; all three offer different trade-offs (e.g. rqlite is 'more strongly consistent' than litefs).

[0]: https://github.com/superfly/litefs

[1]: https://github.com/rqlite/rqlite

[2]: https://github.com/canonical/dqlite

>>comex+v71
I'm giving a (slightly updated) version of my talk at the Storage Networking Industry Association's Storage Developer Conference (2022) in Fremont, CA next Thursday:

https://storagedeveloper.org/events/sdc-2022/agenda/2022-09-...

"Symbolic links Considered Harmful"

Might be relevant to readers :-).

>>nevera+LJ1
Yes! I ran Coherent 4.0 on a 386SX laptop when I was in high school (before moving to Linux.) Coherent had incredible documentation, something that is very rare today. I still remember that book with the shell on it, and learned a ton about systems administration and POSIX programming from it.

Here it is: https://archive.org/details/CoherentMan

>>agumon+w92
Could you elaborate? What do "smooth async" and "reactive subtrees" mean in the context of UX? That sounds more like developer experience than user experience.

Generally if you can do it on mobile you can do it elsewhere, right? If you want something like ReactJS and coroutines/async/await, look at Jetpack Compose. It's inspired by ReactJS but for Android/Desktop: https://developer.android.com/jetpack/compose

You don't need any particular UI toolkit though. Many years ago I did a tutorial on "functional programming in Kotlin":

https://www.youtube.com/watch?v=AhA-Q7MOre0

It uses JavaFX with a library called ReactFX that adds functional utilities on top of the UI framework. It shows how to do async fuzzy matching of user input against a large set of ngrams. I guess that's in the region of what you mean too.

>>pjmlp+X52
Coherent was relatively cheap if you wanted a PC unix clone. $100 in 1992: https://techmonitor.ai/technology/coherent_unixalike_for_int...

>>agumon+RB3
I think if you compare like with like it probably wasn't so bad. Web stuff was a lot cruder in 2010 too. JavaFX has two way data binding into the scene graph (a.k.a. DOM) and did from the start:

https://openjfx.io/javadoc/18/javafx.base/javafx/beans/bindi...

>>icedch+qZ2
That is the price for the software + the hardware to actually run it at an acceptable speed.

And all things being equal, you could still get OS/2 for as low as $49:

> The suggested introductory price of OS/2 2.0 is $139. However, the cost falls to $99 for users upgrading from DOS, which includes just about anyone, and to $49 for users who are willing to turn in their Microsoft version of Windows.

https://www.nytimes.com/1992/04/21/science/personal-computer...
