zlacker

PostgresML is 8-40x faster than Python HTTP microservices

submitted by redbel+(OP) on 2022-10-20 00:45:32 | 116 points 63 comments
replies(14): >>brecke+x4 >>chaps+b5 >>habibu+Y5 >>davewb+u7 >>koyani+c8 >>a-dub+D8 >>anonu+89 >>localh+d9 >>learnd+Mk >>urcyan+co >>johndo+4q >>8bitwh+cs >>atoav+uv >>rcarmo+8x
1. brecke+x4[view] [source] 2022-10-20 01:38:35
>>redbel+(OP)
Nice to see! Would have liked to see a more useful application of XGBoost, but I hope this helps alleviate the Python monopoly on ML.
replies(1): >>learnd+el
2. chaps+b5[view] [source] 2022-10-20 01:45:53
>>redbel+(OP)

  "In Python, most of the bottleneck comes from having to fetch and deserialize Redis data."
This isn't a fair comparison. Of freaking course postgres would be faster if it's not reaching out to another service.
replies(3): >>montan+b7 >>redhal+c7 >>ta2234+Of
3. habibu+Y5[view] [source] 2022-10-20 01:53:54
>>redbel+(OP)
Python is slow for ML. People will take time to realize it. The claim that most of the work is done on the GPU covers only a small fraction of cases.

For example, in NLP a huge amount of pre- and post-processing of data is needed outside of the GPU.

replies(6): >>__mhar+l6 >>minhaz+6f >>riku_i+5i >>est+Vp >>wdroz+bC >>raverb+MF
4. __mhar+l6[view] [source] [discussion] 2022-10-20 01:58:24
>>habibu+Y5
spaCy is much faster on the GPU. Many folks don't know that cuDF (a pandas implementation for GPUs) parallelizes string operations (these are notoriously slow in pandas)... shrug...
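Rough sketch of the kind of thing I mean (assumes a CUDA GPU with cudf installed; the data is made up):

    import cudf

    # same .str API as pandas, but the string ops run in parallel on the GPU
    s = cudf.Series(["Some TEXT to clean", "MORE text"] * 1_000_000)
    cleaned = s.str.lower().str.replace(r"[^a-z ]", "", regex=True)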
replies(1): >>westur+vg
5. montan+b7[view] [source] [discussion] 2022-10-20 02:09:07
>>chaps+b5
As a contributor, I think it's interesting when comments focus on the language (Python vs Rust) vs the architecture (local vs remote). Inference is embarrassingly parallelizable, with Python Flask or Postgres replicas. I think the interesting thing is that data retrieval costs tend to dominate other costs, and yet are often ignored.

ML algorithms get a lot of focus and hype. Data retrieval, not as much.

replies(2): >>chaps+Lk >>deepst+hE
6. redhal+c7[view] [source] [discussion] 2022-10-20 02:09:12
>>chaps+b5
Yes, that's essentially the point being made here. It's a fair comparison if your intent is to run this kind of job as quickly as possible.
replies(2): >>chaps+P7 >>pushed+4a
7. davewb+u7[view] [source] 2022-10-20 02:12:31
>>redbel+(OP)
Is it easy to install on aws aurora postgresql?
replies(1): >>grzm+S7
8. chaps+P7[view] [source] [discussion] 2022-10-20 02:16:17
>>redhal+c7
No it's not. It tells me exactly nothing useful about the postgresml performance because it's impossible for me to rule out redis and the http server when factoring in performance. It's two hops, with a guaranteed delay that the postgres setup won't have.

If they wanted it to be a fair comparison they should have used FDWs to connect to the same redis and http server that the python benchmarks tested against.

replies(2): >>vasco+v8 >>darksa+Mf
9. grzm+S7[view] [source] [discussion] 2022-10-20 02:16:29
>>davewb+u7
It looks like it's a PostgreSQL extension, and probably not one supported by AWS RDS for PostgreSQL or Aurora PostgreSQL. AWS generally only supports extensions that ship with PostgreSQL (and maybe some limited third-party extensions?). The lists of supported extensions are here:

* https://docs.aws.amazon.com/AmazonRDS/latest/PostgreSQLRelea...

* https://docs.aws.amazon.com/AmazonRDS/latest/AuroraPostgreSQ...

replies(1): >>deepst+rE
10. koyani+c8[view] [source] 2022-10-20 02:19:36
>>redbel+(OP)
Very interesting architecture. The article mentions XGBoost, but what if I want to run another kind of algorithm? How does PostgresML support that use case?
replies(2): >>sanxiy+mc >>levkk+li
11. vasco+v8[view] [source] [discussion] 2022-10-20 02:22:32
>>chaps+P7
The point is you don't need those hops if you use postgresML.

It's like if I told you to move to a place where you can walk 5 minutes to work, and you told me it's not a fair comparison because right now you have to drive to the station and then take a train, and you're interested in a comparison where you walk to the train instead. You don't need the train because you're already there!

You don't need the network hops exactly because the data is already there in the right way.

replies(2): >>chaps+c9 >>FreakL+vr
12. a-dub+D8[view] [source] 2022-10-20 02:23:27
>>redbel+(OP)
i always thought that elasticsearch would make a good host for an ML-enabled datastore. building indices and searching them are similar computational paradigms to training and inference, and the scaling framework would lend itself well to both computation-heavy training and query/inference.

although i dunno if it has good support for lots of floats. and i guess all the ml code would have to be java.

13. anonu+89[view] [source] 2022-10-20 02:29:05
>>redbel+(OP)
Kinda duh if your data never leaves the process space. Should be faster IMO...
replies(1): >>Jweb_G+hh
14. chaps+c9[view] [source] [discussion] 2022-10-20 02:29:55
>>vasco+v8
I get the point of the post, but I still don't see how it's remotely useful for understanding postgresml's performance, as someone who's interested in using it for my own tooling. Maybe I don't spend enough time in the ML space to know how often HTTP/redis shows up in these workflows. Most of my stuff is just data on-disk, where adding two additional services would be embarrassingly overkill.

Don't you think it would be incredibly useful as a baseline if they included a third test with FDWs against redis and the http server?

replies(1): >>theamk+Ge
15. localh+d9[view] [source] 2022-10-20 02:30:08
>>redbel+(OP)
It seems like Postgres isn't really doing anything here in the benchmark besides acting as a host for XGBoost? It needs to load the model parameters from the database whereas the Python microservice is reading the model parameters from a model.bin file in the filesystem. Both are one-time costs presumably (I'm guessing the SQL that loads the model keeps the model around in memory which seems reasonable given its performance gap to Python).

So it seems like what is needed is a better host for XGBoost models instead of having to install, maintain and launch an entire database? Or am I missing something here?

replies(1): >>montan+Aa
16. pushed+4a[view] [source] [discussion] 2022-10-20 02:40:00
>>redhal+c7
I also don’t think it’s a fair comparison. There’s nothing stopping me from loading the model into the memory of each Flask process (or some shmem) and getting the same or possibly better performance than the Postgres implementation, considering coroutines are being used in the Python case.
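Roughly like this (just a sketch; the model path and payload shape are made up):

    import xgboost as xgb
    from flask import Flask, request, jsonify

    app = Flask(__name__)

    # loaded once per worker and kept in process memory, same as Postgres would do
    model = xgb.Booster()
    model.load_model("model.bin")

    @app.route("/predict", methods=["POST"])
    def predict():
        features = request.json["features"]
        pred = model.predict(xgb.DMatrix([features]))
        return jsonify(prediction=float(pred[0]))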

Calling this Postgres vs Flask is misleading at best. It’s more like “1 tier architecture vs 2 tier architecture”

replies(1): >>montan+rg
17. montan+Aa[view] [source] [discussion] 2022-10-20 02:48:26
>>localh+d9
I think what you're missing is that XGBoost is worthless without data to use for inference. That data can come from in process, or over the wire. One is fast, one is not.
replies(1): >>theamk+Rf
18. sanxiy+mc[view] [source] [discussion] 2022-10-20 03:09:09
>>koyani+c8
You would need to add support to PostgresML. It doesn't seem to have an extension mechanism; maybe in the future. It seems easy enough, and you can use any Python library through Rust's PyO3 binding. PostgresML includes an example using HuggingFace transformers.
replies(1): >>koyani+ah
19. theamk+Ge[view] [source] [discussion] 2022-10-20 03:31:05
>>chaps+c9
Are there any other FDWs that do ML inference?

Remember, this is not plain file serving -- this is actually invoking the XGBoost library, which does complex mathematical operations. The user does not get data from disk, they get inference results.

Unless you know of any other solution which can invoke XGBoost (or some other inference library), I don't see anything "embarrassingly overkill" there.

replies(1): >>chaps+gh
20. minhaz+6f[view] [source] [discussion] 2022-10-20 03:33:41
>>habibu+Y5
> Python is slow for ML

The Python runtime is slow in general. But anyone using it for ML is not actually using the Python runtime to do any of the heavy lifting. All of the popular ML/AI libraries for Python like TensorFlow, PyTorch, NumPy, etc. are just thin Python wrappers on top of tens of thousands of lines of C/C++ code. People just use Python because it's easy and there's a really good ecosystem of tools and libraries.
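For example, in the snippet below the arithmetic never runs in the interpreter loop; it's dispatched to NumPy's compiled kernels:

    import numpy as np

    x = np.random.rand(10_000_000)
    y = (x * 2.0 + 1.0).sum()  # vectorized: the per-element work happens in C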

replies(1): >>madduc+cS
21. darksa+Mf[view] [source] [discussion] 2022-10-20 03:38:59
>>chaps+P7
The article very clearly points out that the performance advantage of postgresml comes from the architecture. Hell, they are even using the exact same algorithm. What benefit is there in benchmarking the same algorithm on the same architecture? Do we also need to make sure Teslas have internal combustion engines when we compare their performance to ICE cars?
22. ta2234+Of[view] [source] [discussion] 2022-10-20 03:39:30
>>chaps+b5
Further, in their methodology they wrap a microservice around Python and Redis (which involves unmarshalling from Redis and marshalling to JSON), but they're not doing that with Postgres.

In fact, as far as I can tell, Postgres is not running as a microservice here. The data still has to be marshalled into some output other services can use.

23. theamk+Rf[view] [source] [discussion] 2022-10-20 03:40:27
>>montan+Aa
Well, imagine an nginx plugin that runs XGBoost. Or even a standalone Rust/C++ microservice which provides XGBoost via a standard HTTP interface. The data might come from the filesystem, or be loaded from a network location on startup/reload and kept in memory.

Basically, postgresql is a stateful service, and stateful services are always a major pain to manage -- you need to back them up, migrate them, think about scaling... Sometimes they are inevitable, but that does not seem to be the case here.

If you have CI/CD set up and do frequent deploys, it will be much easier and more reproducible to include models in the build artifact and have them loaded from the filesystem along with the rest of the code.

replies(1): >>montan+Xh
24. montan+rg[view] [source] [discussion] 2022-10-20 03:47:32
>>pushed+4a
You get it. 1 tier is better than 2 tier. Python can't be 1 tier, unless it loads the full dataset which is not generally feasible for production online inference cases. PostgresML is 1 tier, and supports the traditional Python use cases.
replies(1): >>xapata+bj
25. westur+vg[view] [source] [discussion] 2022-10-20 03:48:06
>>__mhar+l6
Apache Ballista and Polars do Apache Arrow and SIMD.

The Polars homepage links to the "Database-like ops benchmark" of {Polars, data.table, DataFrames.jl, ClickHouse, cuDF*, spark, (py)datatable, dplyr, pandas, dask, Arrow, DuckDB, Modin,} but not yet PostgresML? https://h2oai.github.io/db-benchmark/

26. koyani+ah[view] [source] [discussion] 2022-10-20 03:57:14
>>sanxiy+mc
That makes sense. Thanks.
27. chaps+gh[view] [source] [discussion] 2022-10-20 03:58:14
>>theamk+Ge
My issue isn't with the inference step or even the reading step, it's the fetching step.
replies(1): >>montan+fi
28. Jweb_G+hh[view] [source] [discussion] 2022-10-20 03:58:34
>>anonu+89
Right, so why not leave it there? I believe that is the point here, there's no trickery going on.
29. montan+Xh[view] [source] [discussion] 2022-10-20 04:08:39
>>theamk+Rf
Stateful services are indeed more painful to manage than stateless ones. Ignoring state (data fetch time) for ML, as if the model artifact is the only important component, is... not a winning strategy.
30. riku_i+5i[view] [source] [discussion] 2022-10-20 04:11:09
>>habibu+Y5
> For example, in NLP a huge amount of pre and post processing of data is needed outside of the GPU.

It depends on your task. If you have a large language model, the bottleneck will likely be in the ML part. It could be pre/post-processing if the model is shallow.

31. montan+fi[view] [source] [discussion] 2022-10-20 04:12:47
>>chaps+gh
How are you doing online ML inference, without fetching data?
32. levkk+li[view] [source] [discussion] 2022-10-20 04:14:25
>>koyani+c8
We have LightGBM also. The entire Scikit library is available (although via a Python wrapper) and a couple of algorithms from Linfa (a Rust Scikit rewrite). We will be adding more algorithms and moving to Rust entirely for v3.0.
replies(1): >>koyani+Ki1
33. xapata+bj[view] [source] [discussion] 2022-10-20 04:25:21
>>montan+rg
Why can't Python be 1 tier? It's a general-purpose, extensible language. It can do anything that PostgreSQL can do.
34. chaps+Lk[view] [source] [discussion] 2022-10-20 04:43:28
>>montan+b7
For anyone who skips the intro and just goes to the results, this is what they see: https://imgur.com/tEK73e8

A suggestion: clean up the blog post's charts and headers to make it much, much more clear that what's being compared isn't python vs postgresml.

replies(1): >>montan+Bm
35. learnd+Mk[view] [source] 2022-10-20 04:43:36
>>redbel+(OP)
Python can be fast if you don't intentionally cripple it. Doing the following will most likely be a lot faster than postgresml (a rough sketch of a couple of these follows the list):

- replace json (storing data as strings? really?) with a binary format like protobuf, or better yet parquet

- replace redis with duckdb for zero-copy reads

- replace pandas with polars for faster transformations

- use an asynchronous, modern web framework for the microservices, like FastAPI

- Tune xgboost CPU resource usage with semaphores
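For the storage/transform points above, a rough sketch (file and column names are made up):

    import polars as pl

    # columnar, typed storage instead of JSON strings in Redis: no per-request parsing
    features = pl.read_parquet("features.parquet")

    # fast transformations without Pandas
    row = (
        features
        .filter(pl.col("user_id") == 123)
        .drop("user_id")
        .row(0)
    )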

replies(5): >>montan+jl >>jb_ger+Cl >>akx+nD >>lmeyer+1G >>levkk+WX1
36. learnd+el[view] [source] [discussion] 2022-10-20 04:49:18
>>brecke+x4
"ML" was never monopolized by Python. Boosted decision trees, as the post demonstrates, are commonly done in Matlab, R, or Julia. Deep learning however is 99% Python interface.
37. montan+jl[view] [source] [discussion] 2022-10-20 04:50:32
>>learnd+Mk
Low effort comment that didn't read the post.

- Multiple formats were compared

- Duckdb is not a production ready service

- Pandas isn't used

You seem to be trolling.

replies(1): >>learnd+Ol
38. jb_ger+Cl[view] [source] [discussion] 2022-10-20 04:53:55
>>learnd+Mk
+1 and Gunicorn with an ASGI server/Uvicorn
39. learnd+Ol[view] [source] [discussion] 2022-10-20 04:56:55
>>montan+jl
How would I be able to respond to the post in detail if I didn't read it? What a bizarre, defensive response. To address your points:

> - Multiple formats were compared

Yes, but not a zero-copy or efficient format, like flatbuffer. It was mentioned as one of the highlights of postgresML:

> PostgresML does one in-memory copy of features from Postgres

> - Duckdb is not a production ready service

What issues did you have with duckdb? Could use some other in-memory store like Plasma if you don't like duckdb.

> - Pandas isn't used

that was responding to the point in the post:

> Since Python often uses Pandas to load and preprocess data, it is notably more memory hungry. Before even passing the data into XGBoost, we were already at 8GB RSS (resident set size); during actual fitting, memory utilization went to almost 12GB.

> You seem to be trolling.

By criticizing the blog post?

40. montan+Bm[view] [source] [discussion] 2022-10-20 05:04:03
>>chaps+Lk
Another suggestion: Don't build your identity around a language or platform. They come and go. Except SQL. It's been around for longer than either of us.
replies(1): >>chaps+0n
41. chaps+0n[view] [source] [discussion] 2022-10-20 05:09:32
>>montan+Bm
Agreed, which is why I use postgres for most of my work unless I simply can't.
42. urcyan+co[view] [source] 2022-10-20 05:23:34
>>redbel+(OP)
If you are trying to compare the performance as an ML service, maybe you should try to compare it with other ML model serving frameworks like https://github.com/mosecorg/mosec or https://github.com/bentoml/BentoML. Flask/FastAPI are not built for ML services.
43. est+Vp[view] [source] [discussion] 2022-10-20 05:42:09
>>habibu+Y5
That depends on whether you count numpy as CPython or not.

Or did you really write layers of for loops in Python?

44. johndo+4q[view] [source] 2022-10-20 05:43:15
>>redbel+(OP)
This benchmark is not very useful. To get any real insights, you'd have to benchmark every single line of the prediction function (called "api") to see where the slowdown is actually coming from: https://github.com/postgresml/postgresml/blob/15c8488ade86b0...

Everything else is just speculation.
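Something like this (a sketch with hypothetical names, not the repo's actual code) would settle it:

    import json
    import time

    def timed(label, fn, *args):
        t0 = time.perf_counter()
        out = fn(*args)
        print(f"{label}: {(time.perf_counter() - t0) * 1e6:.0f} us")
        return out

    # redis_client, cache_key, model and to_dmatrix stand in for the repo's objects
    raw      = timed("redis fetch", redis_client.get, cache_key)
    features = timed("deserialize", json.loads, raw)
    pred     = timed("xgboost predict", model.predict, to_dmatrix(features))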

45. FreakL+vr[view] [source] [discussion] 2022-10-20 05:56:12
>>vasco+v8
You don't need those hops if you use Python either. Python runs inside Postgres.

https://www.postgresql.org/docs/current/plpython.html

Naturally Rust or C functions will still be faster.
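A minimal sketch of that route (assumes the plpython3u extension is installed and an XGBoost model file readable by the server; the path and function name are made up):

    CREATE EXTENSION IF NOT EXISTS plpython3u;

    CREATE FUNCTION predict_price(features float8[]) RETURNS float8 AS $$
        import xgboost as xgb
        # SD is PL/Python's per-session dict, so the model is loaded once per backend
        if "model" not in SD:
            booster = xgb.Booster()
            booster.load_model("/var/lib/postgresql/model.bin")
            SD["model"] = booster
        return float(SD["model"].predict(xgb.DMatrix([features]))[0])
    $$ LANGUAGE plpython3u;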

replies(1): >>levkk+qV1
46. 8bitwh+cs[view] [source] 2022-10-20 06:03:42
>>redbel+(OP)
Anything over 20x approaches the theoretical limit. And a service's speed will only affect overall latency by the fraction of time it's used.
47. atoav+uv[view] [source] 2022-10-20 06:46:27
>>redbel+(OP)
There are many reasons to rely on python; speed is not one of them. You may still get speed when using python if you do it correctly, but this is more or less despite using python, not because of it.

One reason for python in my eyes is maintainability: well-written python code can be easily understood and nearly as easily modified. Well-written python code comes close to what pseudocode would look like.

This is the reason python's weird jungle of dependency management tools is so out of place for the language: it is a maintenance nightmare. I would describe myself as someone who is very able to deal with those problems, yet they are such an utter waste of time and energy.

48. rcarmo+8x[view] [source] 2022-10-20 07:03:50
>>redbel+(OP)
This is disingenuous since we’re not even looking at the same kind of TCP connection and request handling (using persistent connections in Python typically speeds things up, going async reduces concurrency overheads, etc.). But of course iteration through any kind of dataset to get a reply would take longer in Python as well when compared to C.
replies(1): >>levkk+TZ1
49. wdroz+bC[view] [source] [discussion] 2022-10-20 08:05:04
>>habibu+Y5
In 2022, most people doing NLP use transformers from huggingface. The tokenizers are written in Rust and used transparently from Python.
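For example (a quick sketch; needs the transformers package, and the model name is just an example):

    from transformers import AutoTokenizer

    # loads the Rust-backed "fast" tokenizer when one is available
    tok = AutoTokenizer.from_pretrained("bert-base-uncased")
    print(tok.is_fast)                                 # True -> Rust implementation
    print(tok("Python is slow for ML")["input_ids"])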
50. akx+nD[view] [source] [discussion] 2022-10-20 08:18:47
>>learnd+Mk
A good start would be to not do silly things like

    body = request.json     # parse the JSON request body into a dict...
    key = json.dumps(body)  # ...then immediately re-serialize it to build the cache key
in the prediction code to begin with: https://github.com/postgresml/postgresml/blob/15c8488ade86b0...
replies(2): >>polski+ZN >>levkk+dY1
51. deepst+hE[view] [source] [discussion] 2022-10-20 08:29:48
>>montan+b7
That is the reason many older developers tend to do everything -- business logic etc. -- in DB stored procedures/functions/views. The cost of getting the data is native, no connection pooling is needed, and with V8/Python integration in PG, it hardly matters what language you use. If you are dealing with a large amount of data in a db, why not just do everything there? SQL databases have cursors, MERGE, etc. that make manipulating large sets of data much easier than moving it to another language environment.
52. deepst+rE[view] [source] [discussion] 2022-10-20 08:31:52
>>grzm+S7
Please just run postgresql on EC2; there are many presentations on why that is preferred over RDS.
53. raverb+MF[view] [source] [discussion] 2022-10-20 08:44:05
>>habibu+Y5
Premature optimization is an issue.

If python is fast enough for your case, then fair enough. And yes, it is fast enough for a lot of cases out there. Especially, for example, if you batch requests.

54. lmeyer+1G[view] [source] [discussion] 2022-10-20 08:46:45
>>learnd+Mk
agreed, reading this article was confusing, the python baseline is far from our reality

for reference, we're aiming for 1-100 GB / second, per server, in our python etl+ml+viz pipelines

interestingly, duckdb+polars are nice for small non-etl/ml perf, but once it's analytical processing, we use cudf / dask_cudf for much more perf per watt / $. I'd love the low overhead & typing benefits of polars, but as soon as you start looking at GB+/s and occasional bigger-than-memory, the core sw+hw needs to change a bit, end-to-end

(and if folks are into graph-based investigations, we're hiring backend/infra :) )

55. polski+ZN[view] [source] [discussion] 2022-10-20 10:25:00
>>akx+nD
asking as a person that does not use Python every day - what would be a better solution here?
replies(1): >>manfre+oT
56. madduc+cS[view] [source] [discussion] 2022-10-20 11:12:29
>>minhaz+6f
You forgot that there's also overhead converting data between C++ and Python.

Sure, Python lets you start fast with any ML project, but when you have to deal with heavy-duty tasks, a switch to pure C++/Rust/any-compiled-language implementations might be a good investment in terms of performance and cost savings, especially if those heavy tasks run on a cloud platform.

57. manfre+oT[view] [source] [discussion] 2022-10-20 11:23:19
>>polski+ZN
request.json is converting the request payload from a str to a dict. json.dumps converts it back to a str.
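A less silly version (just a sketch of the idea, not the repo's code, and assuming clients send the payload in a canonical form) would be to key on the raw body and parse only when the dict is needed:

    # Flask: use the raw request body as the cache key, no parse/re-serialize round trip
    key = request.get_data(as_text=True)
    body = request.get_json()  # parse once, only if the parsed dict is actually needed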
58. koyani+Ki1[view] [source] [discussion] 2022-10-20 13:46:04
>>levkk+li
Very cool.
59. levkk+qV1[view] [source] [discussion] 2022-10-20 16:23:52
>>FreakL+vr
PostgresML v1.0 was doing exactly that. When we rewrote in Rust for v2.0, we improved 35x: https://postgresml.org/blog/postgresml-is-moving-to-rust-for...
60. levkk+WX1[view] [source] [discussion] 2022-10-20 16:35:55
>>learnd+Mk
- We compared MessagePack as well; that's your typical binary format. It ended up being slower, which is what I've seen before when storing small floats (a typical ML feature). It's in the article with a whole section dedicated to why optimizing serializers won't help.

- I don't think doing one less `memcpy` will make Redis faster over the network.

- We didn't use Pandas during inference, only a Python list. You'd have to get pretty creative to do less work than that.

- That will use less CPU certainly, but I don't think it'll be faster because we still have to wait on a network resource to serve a prediction or on the GIL to deserialize the response.

- Tuning XGBoost is fun, but I don't think that's where the bottleneck is.

61. levkk+dY1[view] [source] [discussion] 2022-10-20 16:37:13
>>akx+nD
If I turn that into a single line and that improves performance 40x... I will probably not do engineering for a while after that.
replies(1): >>learnd+803
62. levkk+TZ1[view] [source] [discussion] 2022-10-20 16:43:42
>>rcarmo+8x
We're using persistent connections for Python/Gunicorn as well. See the Methodology section for more details.
63. learnd+803[view] [source] [discussion] 2022-10-20 21:38:27
>>levkk+dY1
The parent comment said it would be "a good start". It's like adding sleep(1000) to a benchmark to purposely make it look worse than your own product.