Want to store data, query it in arbitrary, hard-to-foresee ways, and also easily tune performance for those queries? Relational datastore it is.
Want to have ACID? Well, relational datastore it is.
Kafka is not the right solution for these problems.
I mean, who in their right mind would want to:
- have a snapshot of data
- query data, including ad-hoc querying
- query related data
- have transactional updates to data
When all you really need is an unbounded stream of data that you have to traverse in order to do any of these things.
Being able to see a snapshot is good, and I would hope to see a higher-level abstraction that can offer that on top of something Kafka-like. But making the current state the primary thing is a huge step backwards, especially when you don't get a history at all by default.
> - query data, including ad-hoc querying
OK, fair, ad-hoc queries are one thing that relational databases are legitimately good at. Something that can maintain secondary indices and do query planning based on them is definitely useful. But you're asking for trouble if you use them in your live dataflow or allow ad-hoc queries to write to your datastore.
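For what it's worth, here's a minimal sketch of what "secondary indices and query planning" buys you, using Python's built-in sqlite3 (table and column names invented for illustration):

    import sqlite3

    # In-memory database; any relational store behaves similarly.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
    conn.executemany("INSERT INTO orders (customer_id, total) VALUES (?, ?)",
                     [(i % 1000, i * 0.5) for i in range(10_000)])

    # A secondary index lets the planner answer ad-hoc per-customer
    # queries without scanning the whole table.
    conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")

    plan = conn.execute(
        "EXPLAIN QUERY PLAN SELECT SUM(total) FROM orders WHERE customer_id = ?",
        (42,)).fetchall()
    print(plan)  # SEARCH using idx_orders_customer, not a full table SCAN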
> - have transactional updates to data
I do think this one is genuinely a mistake. What do you do when a transaction fails? All of the answers I've heard imply that you didn't actually need transactions in the first place.
Bank -> debit card purchase -> perform all required database work in a transaction -> transaction fails -> decline debit card purchase
Without transactions, in this scenario, maybe the debit card transaction fails but money is still taken out of your account? Doesn’t sound very pleasant.
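Concretely, that flow is just a transaction whose rollback path becomes the decline. A minimal sketch with Python's built-in sqlite3 (schema and amounts invented for illustration):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.execute("CREATE TABLE ledger (account_id INTEGER, amount INTEGER, reason TEXT)")
    conn.execute("INSERT INTO accounts VALUES (1, 100)")
    conn.commit()

    def debit_card_purchase(account_id, amount):
        """Debit the account and record the ledger entry atomically.
        Returns False (decline) if any step fails."""
        try:
            with conn:  # opens a transaction; commits on success, rolls back on exception
                cur = conn.execute(
                    "UPDATE accounts SET balance = balance - ? "
                    "WHERE id = ? AND balance >= ?",
                    (amount, account_id, amount))
                if cur.rowcount == 0:
                    raise ValueError("insufficient funds")
                conn.execute("INSERT INTO ledger VALUES (?, ?, ?)",
                             (account_id, -amount, "debit card purchase"))
            return True   # approve
        except Exception:
            return False  # decline: rollback left no money missing and no ledger entry

    print(debit_card_purchase(1, 60))  # True  (approved)
    print(debit_card_purchase(1, 60))  # False (declined; balance untouched)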
Why?
When is "I need to query all of my log to get the current view of data" is a step forward? All businesses operate on the current view of data.
> OK, fair, ad-hoc queries are one thing that relational databases are legitimately good at.
Not just ad-hoc queries. Any queries.
> But you're asking for trouble if you use them in your live dataflow or allow ad-hoc queries to write to your datastore.
In our "live datafows" etc. we use a pre-determined set of queries that are guaranteed to run multiple orders of magnitude faster in a relational database on the current view of data than having to reconstruct all the data from an unbounded stream of raw events.
> What do you do when a transaction fails?
I roll back the transaction. As simple as that.
All businesses operate in response to events. Most of the things you do are because x happened rather than because the current state of the world is y.
> In our "live datafows" etc. we use a pre-determined set of queries that are guaranteed to run multiple orders of magnitude faster in a relational database on the current view of data than having to reconstruct all the data from an unbounded stream of raw events.
If you have a pre-determined set of queries, you can put together a corresponding set of stream transformations that will compute the results you need much faster than querying a relational database.
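As a hedged sketch of the idea (plain Python standing in for a real stream processor, with an invented event shape): the pre-determined "query" is maintained incrementally as events arrive, so reading it is a lookup rather than a replay.

    from collections import defaultdict

    # The pre-determined "query": current balance per account,
    # maintained incrementally instead of computed on demand.
    balances = defaultdict(int)

    def apply(event):
        if event["type"] == "deposit":
            balances[event["account"]] += event["amount"]
        elif event["type"] == "withdrawal":
            balances[event["account"]] -= event["amount"]

    stream = [
        {"type": "deposit", "account": "alice", "amount": 100},
        {"type": "withdrawal", "account": "alice", "amount": 30},
    ]
    for event in stream:      # in production this loop is a Kafka consumer
        apply(event)

    print(balances["alice"])  # 70, with no query and no replay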
> I roll back the transaction. As simple as that.
And then what, completely discard the attempt without even a record that it happened?
Yes, but once an event happens, the business needs access to the current state of the data.
> If you have a pre-determined set of queries, you can put together a corresponding set of stream transformations that will compute the results you need much faster than querying a relational database.
No, it won't. Because you won't be able to run "a corresponding set of transformations" on, say, a million clients.
You can, however, easily query this measly set on a laptop with an "overengineered" relational database.
> completely discard the attempt without even a record that it happened?
Somehow in your world audit logging doesn't exist.
Of course you can. It's a subset of the same computation, you're just doing it in a different place.
> Somehow in your world audit logging doesn't exist.
If you have to use a separate "audit logging" datastore to augment your relational database then I think you've proven my point.
> the attempt to charge is recorded in a ledger
Hint: how do you think this attempt is recorded and fulfilled? Or do you think "it's just appended" and the bank recalculates your balance from scratch every time you spend $1 on a can of Coke?
The only bank I've heard of that's not using a traditional relational database for its ledger is Monzo [1] - but they still use Cassandra's transactions.
[1] https://www.scaleyourapp.com/an-insight-into-the-backend-inf...
That's how the bank I worked with did it. Of course there was caching in place so we didn't actually recompute everything every time, but the implementation of that was a lot closer to "commit a kafka offset" than an RDBMS-style transaction. (E.g. we didn't overwrite the "current balance" in-place, we appended a new "current balance as of time x").
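As a sketch of that pattern (invented numbers, not the bank's actual code): the log stays append-only, the snapshots are appended too, and a read replays only the tail past the latest snapshot.

    # Event-sourced balance with append-only snapshots: never overwrite state,
    # just record "balance as of offset N" and replay the tail from there.
    events = [100, -30, -20, 50]           # signed amounts, append-only log
    snapshots = []                         # list of (offset, balance) pairs

    def snapshot(upto):
        balance = sum(events[:upto])
        snapshots.append((upto, balance))  # append, like committing a Kafka offset

    def current_balance():
        offset, balance = snapshots[-1] if snapshots else (0, 0)
        return balance + sum(events[offset:])  # replay only the uncached tail

    snapshot(2)               # cache "balance as of offset 2" = 70
    print(current_balance())  # 70 + (-20 + 50) = 100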
Yeah, who could need to know exactly how many items of a particular product they have in stock currently, or how much money a customer has in her account at the particular moment she wants to make a withdrawal? It's really hard to come up with any useful real-world examples where this could be the case.
> What do you do when a transaction fails?
It depends on why the transaction fails and in which way. But sometimes it is really useful to make sure that when one account is debited, another one is credited at the same time.
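i.e. the classic both-or-neither transfer. A minimal sqlite3 sketch of what "at the same time" means here (schema invented):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 0)])
    conn.commit()

    def transfer(src, dst, amount):
        # Both updates commit together or neither does; an error between
        # them can never leave money debited but not credited.
        with conn:
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                         (amount, src))
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                         (amount, dst))

    transfer(1, 2, 40)
    print(conn.execute("SELECT id, balance FROM accounts").fetchall())  # [(1, 60), (2, 40)]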
> Of course you can.
Of course you can't, because you can't run a million transformations. Whereas querying specific data for any of the one million clients? It's trivial on a relational database.
Moreover, if you need new queries over the data, that's again trivial, because you have the current view of your data and don't need to recalculate everything from the beginning of time just because your requirements changed ever so slightly.
> If you have to use a separate "audit logging" datastore to augment your relational database then I think you've proven my point.
No, I haven't.
It's funny, however, that you think that businesses don't require a current view of data and need to re-calc everything from scratch.
I think you've proved our point.
There are just too many scenarios where not having transactions is dog slow or really really unwieldy.
One of those things might be “store it in a relational model”, another “write a sum to a key-value store”, or something else entirely.
This ability comes for free with Kafka, but is very much not free when using a relational model.
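For example (a sketch assuming the kafka-python client; the topic name and event shape are invented), adding a new materialization is just another consumer group on the same topic:

    import json
    from kafka import KafkaConsumer  # pip install kafka-python

    consumer = KafkaConsumer(
        "events",
        group_id="running-sum",      # a second process could use "relational-writer"
        bootstrap_servers="localhost:9092",
        value_deserializer=lambda b: json.loads(b),
    )

    total = 0
    for record in consumer:          # each group sees the full stream independently
        total += record.value["amount"]
        print("sum so far:", total)  # a sibling consumer could INSERT into an RDBMS instead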
It's not missing that because it doesn't even address that. I'm answering a specific point.
Transactions were kept by humans, literally for a few centuries, before the algorithm was adapted for computers.
> Relational databases seem to be a crazily overengineered solution in search of a problem
Why would an answer to that need to mention Kafka consumers?
This is the part I was responding to.
Having access to the current state of the world is useful, having a log of what happened / how it got that way is essential. You've got to get the foundations right before you build a monumental edifice on top.