zlacker

[parent] [thread] 19 comments
1. chaps+(OP)[view] [source] 2022-10-20 01:45:53

  "In Python, most of the bottleneck comes from having to fetch and deserialize Redis data."
This isn't a fair comparison. Of freaking course postgres would be faster if it's not reaching out to another service.
replies(3): >>montan+02 >>redhal+12 >>ta2234+Da
2. montan+02[view] [source] 2022-10-20 02:09:07
>>chaps+(OP)
As a contributor, I think it's interesting when comments focus on the language (Python vs Rust) vs the architecture (local vs remote). Inference is embarrassingly parallelizable, with Python Flask or Postgres replicas. I think the interesting thing is that data retrieval costs tend to dominate other costs, and yet are often ignored.

ML algorithms get a lot of focus and hype. Data retrieval, not so much.
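
The point about retrieval costs dominating can be sketched with a stdlib-only micro-benchmark; the 1 ms sleep standing in for a Redis round-trip, and the dot product standing in for a cheap model, are both assumptions for illustration:

```python
# Sketch: why data retrieval can dominate inference cost.
# The "remote" path pays serialization plus a simulated network
# round-trip; the "local" path reads the same features in-process.
import pickle
import time

features = [float(i) for i in range(1000)]  # one row of model inputs

def infer(row):
    # stand-in for a cheap model: a dot product of the row with itself
    return sum(x * x for x in row)

def local_path(row):
    return infer(row)

def remote_path(row):
    blob = pickle.dumps(row)          # what a cache would store
    time.sleep(0.001)                 # simulated network round-trip
    return infer(pickle.loads(blob))  # deserialize, then infer

start = time.perf_counter()
local_path(features)
local_t = time.perf_counter() - start

start = time.perf_counter()
remote_path(features)
remote_t = time.perf_counter() - start

print(f"local: {local_t*1e6:.0f}us  remote: {remote_t*1e6:.0f}us")
```

Both paths compute the identical result; only the retrieval architecture differs.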

replies(2): >>chaps+Af >>deepst+6z
3. redhal+12[view] [source] 2022-10-20 02:09:12
>>chaps+(OP)
Yes, that's essentially the point being made here. It's a fair comparison if your intent is to run this kind of job as quickly as possible.
replies(2): >>chaps+E2 >>pushed+T4
4. chaps+E2[view] [source] [discussion] 2022-10-20 02:16:17
>>redhal+12
No it's not. It tells me exactly nothing useful about the postgresml performance because it's impossible for me to rule out redis and the http server when factoring in performance. It's two hops, with a guaranteed delay that the postgres setup won't have.

If they wanted it to be a fair comparison they should have used FDWs to connect to the same redis and http server that the python benchmarks tested against.
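
The third configuration being asked for would look something like this hypothetical setup, where Postgres reads the same Redis instance through a foreign data wrapper (assumes the redis_fdw extension is installed; all names are illustrative):

```sql
-- Sketch: Postgres consuming Redis through an FDW, so the benchmark
-- would include the same network hop the Python path pays for.
CREATE EXTENSION redis_fdw;

CREATE SERVER redis_server
    FOREIGN DATA WRAPPER redis_fdw
    OPTIONS (address '127.0.0.1', port '6379');

CREATE USER MAPPING FOR PUBLIC SERVER redis_server;

CREATE FOREIGN TABLE redis_features (key text, value text)
    SERVER redis_server
    OPTIONS (database '0');
```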

replies(2): >>vasco+k3 >>darksa+Ba
5. vasco+k3[view] [source] [discussion] 2022-10-20 02:22:32
>>chaps+E2
The point is you don't need those hops if you use postgresML.

It's like if I told you to move to a place where you can walk 5mins to work, and you tell me it's not a fair comparison because right now you have to drive to the station and then get on a train and you're interested in a comparison where you walk to the train instead. You don't need the train because you're already there!

You don't need the network hops exactly because the data is already there in the right way.

replies(2): >>chaps+14 >>FreakL+km
6. chaps+14[view] [source] [discussion] 2022-10-20 02:29:55
>>vasco+k3
I get the point of the post, but I still don't see how it's remotely useful for understanding postgresml's performance, as someone who's interested in using it for my own tooling. Maybe I don't spend enough time in the ML space to know how often HTTP/Redis show up in these workflows. Most of my stuff is just data on-disk, where adding two additional services would be embarrassingly overkill.

Don't you think it would be incredibly useful as a baseline if they included a third test with FDWs against redis and the http server?

replies(1): >>theamk+v9
7. pushed+T4[view] [source] [discussion] 2022-10-20 02:40:00
>>redhal+12
I also don’t think it’s a fair comparison. There’s nothing stopping me from loading the model into the memory of each Flask process (or some shmem), and getting the same performance or possibly better than the Postgres implementation, considering coroutines are being used in the Python case.

Calling this Postgres vs Flask is misleading at best. It’s more like “1 tier architecture vs 2 tier architecture”
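
The "load the model into the memory of each process" pattern can be sketched like this; DummyModel stands in for e.g. a pickled XGBoost booster, and the Flask wiring is omitted so the sketch stays self-contained:

```python
# Sketch of the 1-tier Python alternative: load the model once per
# process at startup and reuse it for every request, instead of
# fetching state over the network on each call.
import functools

class DummyModel:
    def predict(self, row):
        return sum(row)

@functools.lru_cache(maxsize=1)
def get_model():
    # in a real service this might be xgboost.Booster(model_file=...)
    return DummyModel()

def handle_request(row):
    # every call reuses the same in-memory model -- no Redis hop
    return get_model().predict(row)

print(handle_request([1.0, 2.0, 3.0]))
```

With one model per worker process (or in shared memory), the per-request cost is just the inference itself.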

replies(1): >>montan+gb
8. theamk+v9[view] [source] [discussion] 2022-10-20 03:31:05
>>chaps+14
Are there any other FDWs that do ML inference?

Remember, this is not plain file serving -- this is actually invoking XGBoost library which does complex mathematical operations. The user does not get data from disk, they get inference results.

Unless you know of any other solution which can invoke XGBoost (or some other inference library), I don't see anything "embarrassingly overkill" there.

replies(1): >>chaps+5c
9. darksa+Ba[view] [source] [discussion] 2022-10-20 03:38:59
>>chaps+E2
The article very clearly was pointing out that the performance advantage of postgresml comes from the architecture. Hell, they are even using the same exact algorithm. What benefit is to be had from benchmarking the same algorithm on the same architecture? Do we also need to make sure Teslas have internal combustion engines when we compare their performance to ICE cars?
10. ta2234+Da[view] [source] 2022-10-20 03:39:30
>>chaps+(OP)
Further, in their methodology they wrap a microservice around Python and Redis (which does unmarshalling from Redis and marshalling to JSON), but they're not doing that with Postgres.

In fact, as far as I can tell, postgres is not running as a microservice here. The data still has to be marshalled into some output other services can use.

11. montan+gb[view] [source] [discussion] 2022-10-20 03:47:32
>>pushed+T4
You get it. 1 tier is better than 2 tier. Python can't be 1 tier unless it loads the full dataset, which is generally not feasible for production online inference. PostgresML is 1 tier, and supports the traditional Python use cases.
replies(1): >>xapata+0e
12. chaps+5c[view] [source] [discussion] 2022-10-20 03:58:14
>>theamk+v9
My issue isn't with the inference step or even the reading step, it's the fetching step.
replies(1): >>montan+4d
13. montan+4d[view] [source] [discussion] 2022-10-20 04:12:47
>>chaps+5c
How are you doing online ML inference, without fetching data?
14. xapata+0e[view] [source] [discussion] 2022-10-20 04:25:21
>>montan+gb
Why can't Python be 1 tier? It's a general-purpose, extensible language. It can do anything that PostgreSQL can do.
15. chaps+Af[view] [source] [discussion] 2022-10-20 04:43:28
>>montan+02
For anyone who skips the intro and just goes to the results, this is what they see: https://imgur.com/tEK73e8

A suggestion: clean up the blog post's charts and headers to make it much, much more clear that what's being compared isn't python vs postgresml.

replies(1): >>montan+qh
16. montan+qh[view] [source] [discussion] 2022-10-20 05:04:03
>>chaps+Af
Another suggestion: Don't build your identity around a language or platform. They come and go. Except SQL. It's been around for longer than either of us.
replies(1): >>chaps+Ph
17. chaps+Ph[view] [source] [discussion] 2022-10-20 05:09:32
>>montan+qh
Agreed, which is why I use postgres for most of my work unless I can't avoid using something else.
18. FreakL+km[view] [source] [discussion] 2022-10-20 05:56:12
>>vasco+k3
You don't need those hops if you use Python either. Python runs inside Postgres.

https://www.postgresql.org/docs/current/plpython.html

Naturally Rust or C functions will still be faster.
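
A minimal sketch of that, assuming the plpython3u extension is installed; the function body runs inside the Postgres backend, so there is no network hop:

```sql
CREATE EXTENSION plpython3u;

CREATE FUNCTION predict_sum(features float8[])
RETURNS float8 AS $$
    return sum(features)
$$ LANGUAGE plpython3u;

-- SELECT predict_sum(ARRAY[1.0, 2.0, 3.0]);  -- runs in-process
```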

replies(1): >>levkk+fQ1
19. deepst+6z[view] [source] [discussion] 2022-10-20 08:29:48
>>montan+02
That is the reason many older developers tend to put everything, business logic included, in DB stored procedures/functions/views. The cost of getting the data is native, no connection pooling is needed, and with V8/Python integration in PG, the choice of language matters less. If you are dealing with a large amount of data in a db, why not just do everything there? SQL databases have cursors and MERGE, which make manipulating large sets of data much easier than moving it into another language environment.
20. levkk+fQ1[view] [source] [discussion] 2022-10-20 16:23:52
>>FreakL+km
PostgresML v1.0 was doing exactly that. When we rewrote in Rust for v2.0, we improved 35x: https://postgresml.org/blog/postgresml-is-moving-to-rust-for...