PostgresML is 8-40x faster than Python HTTP microservices
1. learnd+Mk 2022-10-20 04:43:36
>>redbel+(OP)
Python can be fast if you don't intentionally cripple it. Doing the following will most likely be a lot faster than PostgresML:

- replace JSON (storing data as strings? really?) with a binary format like Protobuf or, better yet, Parquet

- replace Redis with DuckDB for zero-copy reads

- replace pandas with Polars for faster transformations

- use a modern, asynchronous web framework like FastAPI for the microservices

- tune XGBoost's CPU resource usage with semaphores

2. lmeyer+1G 2022-10-20 08:46:45
>>learnd+Mk
Agreed, reading this article was confusing: the Python baseline is far from our reality.

For reference, we're aiming for 1-100 GB/s per server in our Python ETL+ML+viz pipelines.

Interestingly, DuckDB + Polars are nice for small-scale, non-ETL/ML performance needs, but once it's analytical processing, we use cuDF / dask_cudf for much better performance per watt and per dollar. I'd love the low overhead and typing benefits of Polars, but as soon as you start looking at GB+/s and occasional bigger-than-memory data, the core software and hardware need to change a bit, end to end.
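For the bigger-than-memory case, the core idea behind partitioned frameworks like Dask is to process data one partition at a time and merge small partial results. The sketch below shows that partition, partially aggregate, then combine pattern for a group-by sum in plain CPU Python; it is not the cuDF or dask_cudf API, and all function names and the `partition_size` default are hypothetical choices for illustration.

```python
from collections import defaultdict
from collections.abc import Iterable, Iterator


def read_in_partitions(
    rows: Iterable[tuple[str, float]], partition_size: int
) -> Iterator[list[tuple[str, float]]]:
    """Yield fixed-size partitions so only one partition is held in memory at a time."""
    partition = []
    for row in rows:
        partition.append(row)
        if len(partition) == partition_size:
            yield partition
            partition = []
    if partition:
        yield partition


def partial_sum(partition: list[tuple[str, float]]) -> dict[str, float]:
    """Per-partition group-by sum (the 'map' step); output size depends on key count, not row count."""
    acc: dict[str, float] = defaultdict(float)
    for key, value in partition:
        acc[key] += value
    return acc


def combine(partials: Iterable[dict[str, float]]) -> dict[str, float]:
    """Merge partial aggregates into final totals (the 'reduce' step)."""
    total: dict[str, float] = defaultdict(float)
    for acc in partials:
        for key, value in acc.items():
            total[key] += value
    return dict(total)


def groupby_sum_out_of_core(rows: Iterable[tuple[str, float]], partition_size: int = 100_000) -> dict[str, float]:
    # A generator expression keeps this lazy: partitions are read, reduced, and dropped one by one.
    return combine(partial_sum(p) for p in read_in_partitions(rows, partition_size))


if __name__ == "__main__":
    rows = ((f"user{i % 3}", 1.0) for i in range(10))  # lazily generated stand-in for a large file
    print(groupby_sum_out_of_core(rows, partition_size=4))  # {'user0': 4.0, 'user1': 3.0, 'user2': 3.0}
```

Because only one partition plus the running totals live in memory, the input can be far larger than RAM; GPU dataframe stacks apply the same pattern with each partition processed on the device.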

(and if folks are into graph-based investigations, we're hiring backend/infra :) )
