zlacker

[parent] [thread] 10 comments
1. learnd+(OP)[view] [source] 2022-10-20 04:43:36
Python can be fast if you don't intentionally cripple it. Doing the following will most likely be a lot faster than PostgresML:

- replace JSON (storing data as strings? really?) with a binary format like Protobuf, or better yet Parquet

- replace Redis with DuckDB for zero-copy reads

- replace pandas with Polars for faster transformations

- use a modern, asynchronous web framework like FastAPI for the microservices

- tune XGBoost CPU resource usage with semaphores (see the sketch below)
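
On the last point, a minimal sketch of semaphore-capped predictions, assuming a loaded Booster (the slot count and function name here are illustrative):

    import threading

    import numpy as np
    import xgboost as xgb

    # Cap how many predictions run at once so request threads don't
    # oversubscribe the CPU; 4 slots is a placeholder, not a recommendation.
    _predict_slots = threading.BoundedSemaphore(4)

    def predict(booster: xgb.Booster, features: np.ndarray):
        # inplace_predict skips building a DMatrix for each request
        with _predict_slots:
            return booster.inplace_predict(features)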

replies(5): >>montan+x >>jb_ger+Q >>akx+Bi >>lmeyer+fl >>levkk+aD1
2. montan+x[view] [source] 2022-10-20 04:50:32
>>learnd+(OP)
Low-effort comment from someone who didn't read the post.

- Multiple formats were compared

- DuckDB is not a production-ready service

- Pandas isn't used

You seem to be trolling.

replies(1): >>learnd+21
3. jb_ger+Q[view] [source] 2022-10-20 04:53:55
>>learnd+(OP)
+1, and Gunicorn with an ASGI worker such as Uvicorn
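
A minimal sketch of that setup, assuming FastAPI (the endpoint and module names are illustrative):

    from fastapi import FastAPI

    app = FastAPI()

    @app.post("/predict")
    async def predict(features: list[float]) -> dict:
        # model lookup and scoring would go here
        return {"prediction": 0.0}

    # run under Gunicorn with the Uvicorn worker class, e.g.:
    #   gunicorn -k uvicorn.workers.UvicornWorker app:app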
4. learnd+21[view] [source] [discussion] 2022-10-20 04:56:55
>>montan+x
How would I be able to respond to the post in detail if I didn't read it? What a bizarre, defensive response. To address your points:

- Multiple formats were compared

Yes, but not a zero-copy or otherwise efficient format like FlatBuffers. Avoiding copies was even mentioned as one of the highlights of PostgresML:

> PostgresML does one in-memory copy of features from Postgres

> - DuckDB is not a production-ready service

What issues did you have with DuckDB? You could use some other in-memory store, like Plasma, if you don't like DuckDB.
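
For instance, a minimal sketch of reading query results out of DuckDB as Arrow memory (the table name is illustrative):

    import duckdb

    con = duckdb.connect()  # in-memory database
    # fetch the result as an Arrow table; Arrow buffers can be handed
    # to downstream code without re-serializing rows through JSON
    features = con.execute("SELECT * FROM features").arrow()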

> - Pandas isn't used

That was responding to this point in the post:

> Since Python often uses Pandas to load and preprocess data, it is notably more memory hungry. Before even passing the data into XGBoost, we were already at 8GB RSS (resident set size); during actual fitting, memory utilization went to almost 12GB.

> You seem to be trolling.

By criticizing the blog post?

5. akx+Bi[view] [source] 2022-10-20 08:18:47
>>learnd+(OP)
A good start would be to not do silly things like

    body = request.json
    key = json.dumps(body)
in the prediction code to begin with: https://github.com/postgresml/postgresml/blob/15c8488ade86b0...
replies(2): >>polski+dt >>levkk+rD1
6. lmeyer+fl[view] [source] 2022-10-20 08:46:45
>>learnd+(OP)
Agreed, reading this article was confusing; the Python baseline is far from our reality.

For reference, we're aiming for 1-100 GB/second per server in our Python ETL+ML+viz pipelines.

Interestingly, DuckDB + Polars are nice for smaller non-ETL/ML workloads, but once it's heavy analytical processing we use cuDF / dask_cudf for much more perf per watt and per dollar. I'd love the low overhead & typing benefits of Polars, but as soon as you start looking at GB+/s and occasionally bigger-than-memory data, the core software and hardware need to change a bit, end-to-end.
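
To make the cuDF point concrete, a minimal sketch (file and column names are illustrative; cuDF mirrors the pandas API on the GPU):

    import cudf  # requires a CUDA-capable GPU

    df = cudf.read_parquet("events.parquet")
    # same groupby/agg shape as pandas, executed on the GPU
    daily_totals = df.groupby("day").agg({"amount": "sum"})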

(and if folks are into graph-based investigations, we're hiring backend/infra :) )

7. polski+dt[view] [source] [discussion] 2022-10-20 10:25:00
>>akx+Bi
Asking as a person who doesn't use Python every day: what would be a better solution here?
replies(1): >>manfre+Cy
8. manfre+Cy[view] [source] [discussion] 2022-10-20 11:23:19
>>polski+dt
request.json converts the request payload from a str to a dict; json.dumps then converts it right back to a str.
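
A minimal sketch of skipping that round trip, assuming a Flask-style request object (the helper name is illustrative):

    from flask import request

    def cache_key() -> bytes:
        # key the cache on the raw request bytes instead of parsing
        # the JSON and immediately re-serializing it to a string
        return request.get_data()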
9. levkk+aD1[view] [source] 2022-10-20 16:35:55
>>learnd+(OP)
- We compared MessagePack as well; that's your typical binary format. It ended up being slower, which is what I've seen before when storing small floats (a typical ML feature). It's in the article, with a whole section dedicated to why optimizing serializers won't help.

- I don't think doing one less `memcpy` will make Redis faster over the network.

- We didn't use Pandas during inference, only a Python list. You'd have to get pretty creative to do less work than that.

- That will certainly use less CPU, but I don't think it'll be faster, because we still have to wait on a network resource to serve a prediction, or on the GIL to deserialize the response.

- Tuning XGBoost is fun, but I don't think that's where the bottleneck is.

10. levkk+rD1[view] [source] [discussion] 2022-10-20 16:37:13
>>akx+Bi
If I turn that into a single line and that improves performance 40x... I will probably not do engineering for a while after that.
replies(1): >>learnd+mF2
11. learnd+mF2[view] [source] [discussion] 2022-10-20 21:38:27
>>levkk+rD1
The parent comment said it would be "a good start". It's like adding sleep(1000) to a benchmark to purposely make it look worse than your own product.