For example, in NLP a huge amount of pre- and post-processing of data is needed outside of the GPU.
The Python runtime is slow in general. But anyone using it for ML is not actually using the Python runtime to do any of the heavy lifting. All of the popular ML/AI libraries for Python like TensorFlow, PyTorch, NumPy, etc. are just thin Python wrappers on top of tens of thousands of lines of C/C++ code. People just use Python because it's easy and there's a really good ecosystem of tools and libraries.
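To make the "heavy lifting happens in C" point concrete, here is a rough sketch using only the standard library. It assumes CPython, where `sum`, `map`, and `operator.mul` are implemented in C; they stand in here for a C-backed library call and are not how any particular ML library works internally. The function names are made up for illustration.

```python
import operator
import timeit


def sum_squares_loop(values):
    """Sum of squares with an explicit loop: every iteration runs Python bytecode."""
    total = 0
    for v in values:
        total += v * v
    return total


def sum_squares_c_driven(values):
    """Same result, but the iteration is driven by C code in CPython.

    map() and sum() pull items in C, and operator.mul is a C function,
    so no Python-level loop body is executed per element.
    """
    return sum(map(operator.mul, values, values))


if __name__ == "__main__":
    data = list(range(200_000))
    # Both versions must agree before timing means anything.
    assert sum_squares_loop(data) == sum_squares_c_driven(data)
    for fn in (sum_squares_loop, sum_squares_c_driven):
        seconds = timeit.timeit(lambda: fn(data), number=10)
        print(f"{fn.__name__}: {seconds:.3f}s for 10 runs")
```

The exact timings depend on the machine and interpreter version, so measure on your own workload rather than trusting a number from a comment thread.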
The Polars homepage links to the "Database-like ops benchmark", which covers {Polars, data.table, DataFrames.jl, ClickHouse, cuDF*, spark, (py)datatable, dplyr, pandas, dask, Arrow, DuckDB, Modin} but not yet PostgresML? https://h2oai.github.io/db-benchmark/
It depends on your task. If you have a large language model, the bottleneck will likely be in the ML part. It could be pre/post-processing if the model is shallow.
Or did you really write layers of for loops in Python?
If Python is fast enough for your case, then fair enough. And yes, it is fast enough for a lot of cases out there. Especially, for example, if you batch requests.
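A minimal sketch of what batching requests could look like. `CountingModel` is a hypothetical stand-in for a compiled model (its `predict` just returns string lengths as placeholder output), and `batched` and `serve` are made-up helper names. The idea is that the Python-side overhead is paid once per batch instead of once per request.

```python
from typing import Iterator, List, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


class CountingModel:
    """Hypothetical model wrapper that counts calls crossing into native code."""

    def __init__(self) -> None:
        self.calls = 0

    def predict(self, texts: List[str]) -> List[int]:
        self.calls += 1
        # Placeholder for real inference: one output per input.
        return [len(t) for t in texts]


def serve(model: CountingModel, requests: Sequence[str], batch_size: int) -> List[int]:
    """Run all requests through the model, one call per batch."""
    results: List[int] = []
    for chunk in batched(requests, batch_size):
        results.extend(model.predict(chunk))
    return results


if __name__ == "__main__":
    requests = [f"request {i}" for i in range(10)]
    model = CountingModel()
    outputs = serve(model, requests, batch_size=4)
    print(f"{len(outputs)} results from {model.calls} model calls")
```

With 10 requests and a batch size of 4, the model is called 3 times instead of 10. Whether that is enough depends on how much of the latency sits in the per-call overhead versus the model itself.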
Sure, Python can get an ML project started quickly, but when you have to deal with heavy-duty tasks, switching to a pure C++/Rust/any-compiled-language implementation might be a good investment in terms of performance and cost savings, especially if those heavy tasks run on a cloud platform.