Transcending Posix: The End of an Era?

>>jsnell+(OP)
> However, contemporary applications rarely run on a single machine. They increasingly use remote procedure calls (RPC), HTTP and REST APIs, distributed key-value stores, and databases,

I'm seeing an increasing trend of pushback against this norm. An early example was David Crawshaw's one-process programming notes [1]. Running the database in the same process as the application server, using SQLite, is getting more popular with the rise of Litestream [2]. Earlier this year, I found the post "One machine can go pretty far if you build things properly" [3] quite refreshing.

Most of us can ignore FAANG-scale problems and keep right on using POSIX on a handful of machines.

[1]: https://crawshaw.io/blog/one-process-programming-notes

[2]: https://litestream.io/

[3]: https://rachelbythebay.com/w/2022/01/27/scale/

>>mwcamp+0v
If you have an application server then you still have RPCs coming from your user interface, even if you run the whole DB in process. And indeed POSIX has nothing to say about this. Instead people tend to abuse HTTP as a pseudo-RPC mechanism because that's what the browser understands, it tends to be unblocked by firewalls etc.

One trend in OS research (what little exists) is the idea of the database OS. Taking that as an inspiration I think there's a better way to structure things to get that same simplicity and in fact even more, but without many of the downsides. I'm planning to write about it more at some point on my company blog (https://hydraulic.software/blog.html) but here's a quick summary. See what you think.

---

In a traditional 3-tier CRUD web app you have the RDBMS, then stateless web servers, then JavaScript and HTML in the browser running a pseudo-stateless app. Because browsers don't understand load balancing you probably also have an LB in there so you can scale and upgrade the web server layer without user-visible downtime. The JS/HTML speaks an app-specific ad-hoc RPC protocol that represents RPCs as document fetches, and your web server (mostly) translates back and forth between this protocol and whatever protocol your RDBMS speaks, layering access control on top (because the RDBMS doesn't know who is logged in).

This approach is standard and lets people use web browsers which have some advantages, but creates numerous problems. It's complex, expensive, limiting for the end user, every app requires large amounts of boilerplate glue code, and it's extremely error prone. XSS, XSRF and SQL injection are all bugs that are created by this choice of architecture.

These problems can be fixed by using "two tier architecture". In two tier architecture you have your RDBMS cluster directly exposed to end users, and users log in directly to their RDBMS account using an app. The app ships the full database driver and uses it to obtain RPC services. Ordinary CRUD/ACL logic can be done with common SQL features like views, stored procedures and row level security [1][2][3]. Any server-side code that isn't neatly expressible with SQL is implemented as RDBMS server plugins.

At a stroke this architecture solves the following problems:

1. SQL injection bugs disappear by design because the RDBMS enforces security, not a highly privileged web app. By implication you can happily give power users like business analysts direct SQL query access to do obscure/one-off things that might otherwise turn into abandoned backlog items.

2. XSS, XSRF and all the other escaping bugs go away, because you're not writing a web app anymore - data is pulled straight from the database's binary protocol into your UI toolkit's data structures. Buffer lengths are signalled OOB across the entire stack.

3. You don't need a hardware/DNS load balancer anymore because good DB drivers can do client-side load balancing.

4. You don't need to design ad-hoc JSON/REST protocols that e.g. frequently suck at pagination, because you can just invoke server-side procedures directly. The DB takes care of serialization, result streaming, type safety, access control, error reporting and more.

5. The protocol gives you batching for free, so if you have some server logic written in e.g. JavaScript, Python, Kotlin, Java etc then it can easily use query results as input or output and you can control latency costs. With some databases like PostgreSQL you get server push/notifications.

6. You can use whatever libraries and programming languages you want.
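To make point 3 a bit more concrete, here's a rough sketch of what client-side load balancing boils down to: shuffle the list of nodes, try each in turn, keep the first connection that works. This is an illustration only, not how any particular driver implements it. The host names and `fake_connect` are made up, and a real app would pass its driver's own connect call instead.

```python
import random
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def connect_balanced(hosts: Sequence[str], connect: Callable[[str], T]) -> T:
    """Try hosts in random order and return the first successful connection."""
    candidates = list(hosts)
    random.shuffle(candidates)  # spreads clients across the cluster
    last_error = None
    for host in candidates:
        try:
            return connect(host)
        except ConnectionError as err:
            last_error = err  # node unreachable, fall through to the next one
    raise ConnectionError(f"no database host reachable: {last_error}")


# Hypothetical stand-in for a real driver's connect call.
def fake_connect(host: str) -> str:
    if host == "db2.example.internal":
        raise ConnectionError("db2 is down")
    return f"connected to {host}"


HOSTS = ["db1.example.internal", "db2.example.internal", "db3.example.internal"]
print(connect_balanced(HOSTS, fake_connect))
```

Because every client shuffles independently, load spreads out without any box sitting in the middle, and a dead node just costs one failed attempt.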

This architecture lacks popularity today because to make it viable you need a few things that weren't available until very recently (and a few useful things still aren't yet). At minimum:

1. You need a way to distribute and update GUI desktop apps that isn't incredibly painful, ideally one that works well with JVM apps because JDBC drivers tend to have lots of features. Enter my new company, stage left (yes! that's right! this whole comment is a giant ad for our product). Hydraulic Conveyor was launched in July and makes distributing and updating desktop apps as easy as with a web app [4].

2. You're more dependent on having a good RDBMS. PostgreSQL only got RLS recently and needs extra software to scale client connections well. MS SQL Server is better but some devs would feel "weird" buying a database (it's not that expensive though). Hosted DBs usually don't let you install arbitrary extensions.

3. You need solid UI toolkits with modern themes. JetBrains has ported the new Android UI toolkit to the desktop [5] allowing lots of code sharing. It's reactive and thus has a Kotlin language dependency. JavaFX is a more traditional OOP toolkit with CSS support, good business widgets and is accessible from more languages for those who prefer that; it also now has a modern GitHub-inspired SASS based style pack that looks great [6] (grab the sampler app here [7]). For Lispers there's a reactive layer over the top [8].

4. There are some smaller tools that would be useful, e.g. for letting you log into your DB with OAuth or for ensuring DB traffic can get through proxies.

Downsides?

1. Migrating between DB vendors is maybe harder. Though, the moment you have >1 web server you have the problem of doing a 'live' migration anyway, so the issues aren't fundamentally different, it'd just take longer.

2. Users have to install your app. That's not hard and in a managed IT environment the apps can be pushed out centrally. Developers often get hung up on this point but the success of the installed app model on mobile, the popularity of Electron and the whole video game industry show users don't actually care much, as long as they plan to use the app regularly.

3. To do mobile/tablet you'd want to ship the DB driver as part of your app. There might be oddities involved, though in theory JDBC drivers could run on Android and be compiled to native for iOS using GraalVM.

4. Skills, hiring, etc. You'd want more senior devs to trailblaze this first before asking juniors to learn it.

[1] https://www.postgresql.org/docs/current/ddl-rowsecurity.html

[2] https://docs.microsoft.com/en-us/sql/relational-databases/se...

[3] https://docs.oracle.com/database/121/TDPSG/GUID-72D524FF-5A8...

[4] https://hydraulic.software/

[5] https://www.jetbrains.com/lp/compose-mpp/

[6] https://github.com/mkpaz/atlantafx

[7] https://downloads.hydraulic.dev/atlantafx/sampler/download.h...

[8] https://github.com/cljfx/cljfx

>>mike_h+dF
Since I've seen a similar thing in the 90s, I have a practical point to make.

If a two-tier app sends out emails, PL/SQL or a DB plugin does it. Now every ops task for sending emails involves the DB and, by extension, puts your data at stake. Launching a new parallel process, rolling out a new version, spreading to a different location, killing a frozen process, measuring how much RAM a new feature has eaten: these are all DB tasks, even though the work was just for a send-email feature.

Anything happening server-side (i.e. not on a user's device) needs to pass DBA middlepersons.

To put it back on its feet, the architecture might be: the DB is one of the services. A frontend can talk to a database, and the two can work out the protocol, the authn/authz, the load balancing. They don't need any CRUD "backend" that is not really a "back" "end" but just a glorified boilerplate SQL-to-JSON converter.

The tradeoff is that you lose a lot of implicit trust. An email service cannot trust the frontend with the business rules. If a user is only allowed to send to a set of recipients, it's the email service that needs to query that set from the DB.

>>kubanc+Zk2
Yes, you can go for a mixed approach. As you observe, it might not change that much because most of the issues aren't dependent on how many tiers you have. If you have middlemen between you and prod they're probably there anyway, regardless of architecture. And something will have to query the DB to find out who the user can email. Whether that's a DB plugin written in Python, a web server or whether it's an email microservice that connects to the DB over the network, it's going to boil down to how much you care about service isolation vs distributed systems complexity.

If you wanted isolation of an email service in this design, you'd use the DB as a task queue. The app triggers a procedure (written in SQL, Python, Java or whatever) which verifies the business logic and then does an insert to a tasks table. The email microservice wakes up and processes the queued tasks. That's a pretty common design already.
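A minimal sketch of that queue design, using Python's built-in sqlite3 so it runs anywhere. The table names, `queue_email` (standing in for the stored procedure) and `process_queue` (standing in for the email microservice) are all invented for illustration; a real deployment would run the procedure inside the RDBMS and the worker as a separate process.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE allowed_recipients (sender TEXT, recipient TEXT);
    CREATE TABLE email_tasks (
        id INTEGER PRIMARY KEY,
        sender TEXT NOT NULL,
        recipient TEXT NOT NULL,
        body TEXT NOT NULL,
        status TEXT NOT NULL DEFAULT 'queued'
    );
    INSERT INTO allowed_recipients VALUES ('alice', 'bob@example.com');
""")


def queue_email(sender, recipient, body):
    """Stand-in for the stored procedure: check the rule, then enqueue."""
    allowed = conn.execute(
        "SELECT 1 FROM allowed_recipients WHERE sender = ? AND recipient = ?",
        (sender, recipient),
    ).fetchone()
    if allowed is None:
        raise PermissionError(f"{sender} may not email {recipient}")
    with conn:  # commits the insert, or rolls back on error
        conn.execute(
            "INSERT INTO email_tasks (sender, recipient, body) VALUES (?, ?, ?)",
            (sender, recipient, body),
        )


def process_queue(send):
    """Stand-in for the email microservice: drain queued tasks in order."""
    rows = conn.execute(
        "SELECT id, recipient, body FROM email_tasks "
        "WHERE status = 'queued' ORDER BY id"
    ).fetchall()
    for task_id, recipient, body in rows:
        send(recipient, body)
        with conn:
            conn.execute(
                "UPDATE email_tasks SET status = 'sent' WHERE id = ?", (task_id,)
            )
    return len(rows)
```

The business rule lives next to the data, and the email service only ever sees rows that already passed it. With several workers on PostgreSQL you'd likely claim rows with `SELECT ... FOR UPDATE SKIP LOCKED` rather than a plain select, so two workers never pick up the same task.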

>>mike_h+3q2
Ah, so you are saying to just bundle the DB with business logic and let it call other components (if any exist).

I thought about the whole idea over the weekend a bit and I'd say it is worth a try.

If you say that you have the distribution problem figured out, that makes it viable; it was the biggest obstacle in the 90s. What I'd expect it to mean is that to roll out a significant DB change, the frontend can self-update without losing an hour's worth of users' unsaved work.

Also I think when selling this, you don't need to avoid the Delphi nostalgia that much. Everyone old who sees "remove the middle tier" will instantly go into mental mode of "uh-oh, those who do not learn from history are bound to repeat it". You are seeing a lot of it around this subthread. If you acknowledge upfront that you know you're building on that past experience, it adds credibility.

>>kubanc+wX4
Yes, exactly. Glad to hear that! Thanks for the words of advice, it's helpful.

People always have different thresholds for what "solved" means. Today Conveyor gives you a Chrome-style update experience on Windows where the app updates in the background even if it's being used. The user isn't disrupted. Ditto on macOS if the user states they want silent background updates at first update (we'll probably tweak this so the user isn't asked and must opt-out explicitly). The user won't lose any unsaved work or be surprised by sudden changes.

So to make a DB change, you need to do it in a backwards compatible way until the clients have had a chance to fully update. Probably that means being compatible for a few days, if you have users who aren't always online. This is probably solved enough for the general case.
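As a small illustration of what "backwards compatible" means here, this sketch makes an additive change (a new nullable column) while a not-yet-updated client keeps working. It uses sqlite3 only so it's runnable; the `customers` table and the old/new client functions are hypothetical. The trick only works if clients name their columns explicitly rather than relying on `SELECT *` or positional inserts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")


def old_client_insert(name):
    # Old app version: names its columns and knows nothing about 'email'.
    with conn:
        conn.execute("INSERT INTO customers (name) VALUES (?)", (name,))


def old_client_read():
    # Explicit column list, so extra columns don't change the result shape.
    return conn.execute("SELECT id, name FROM customers ORDER BY id").fetchall()


def new_client_insert(name, email):
    # Updated app version: fills in the new column as well.
    with conn:
        conn.execute(
            "INSERT INTO customers (name, email) VALUES (?, ?)", (name, email)
        )


def migrate():
    # Additive change only: a new nullable column, nothing renamed or dropped.
    with conn:
        conn.execute("ALTER TABLE customers ADD COLUMN email TEXT")
```

Calling `old_client_insert` before and after `migrate()` works the same either way, which is what buys you the few days until every client has updated. Dropping or renaming a column would be the second, later step once the old clients are gone.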

The developer experience in that mode is exactly the same as compiling Markdown to HTML using Jekyll or Hugo.

The next step is to go further, so code changes can take effect much faster than a background update cycle. It requires the client to have some notion of a page, screen, activity etc - some point between interactions where you can make changes without disrupting the user. And it requires the client to be informed by the server if code has changed, even whilst the user is running the app. This takes more work, but it's on the roadmap. Mostly we think the async model is OK. You have to change your schemas in backwards compatible ways even in the web case to avoid site outages or different web servers getting confused during a rolling upgrade.
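Roughly, the faster-update idea could look like the toy below: the server tells the client new code exists, the client just remembers that, and the swap happens at the next screen change. Everything here (`Client`, `on_server_notice`, `navigate`) is a made-up illustration of the concept, not Conveyor's API or design.

```python
class Client:
    """Toy client that applies code updates only between screens."""

    def __init__(self, code_version):
        self.code_version = code_version
        self.pending_version = None
        self.screen = "home"

    def on_server_notice(self, new_version):
        # Called when the server announces new code; never interrupts the user.
        if new_version != self.code_version:
            self.pending_version = new_version

    def navigate(self, screen):
        # A screen change is a safe point: no interaction is half finished.
        if self.pending_version is not None:
            self.code_version = self.pending_version
            self.pending_version = None
        self.screen = screen


client = Client("v1")
client.on_server_notice("v2")   # user keeps working on the current screen
client.navigate("orders")       # update is applied here, between interactions
```

The notice and the swap are deliberately separate steps, so a push from the server can arrive at any moment without touching what the user is doing.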
