Why not write simple SQL queries and use another language to do the transformations?
Are SQL engines really more efficient at filtering/matching/aggregating data when doing complex queries? Doesn't working without reusable blocks / tests / logs make development harder?
Syntax is one thing, but actual performance (and safety/maintainability) is another matter, isn't it?
My users fell into (for the purposes of this discussion) three categories:
1. Analysts who prefer sheets
2. Data scientists who prefer pandas
3. Engineers who prefer C++/Java/JavaScript/Python
I'm fairly sure SQL isn't the first choice for any of them, but in all three cases a modern vectorized SQL engine will be the fastest option for expressing and executing many analysis and ETL tasks, especially when the datasets don't fit on a single machine. It's also easier to provide a shared pool of compute for running SQL than for running arbitrary code, especially at low latency.
Even as a query engine developer, I would prefer using a SQL engine. Reimplementing just the basic optimizations a modern engine applies (columnar execution, predicate pushdown, pre-aggregation for shuffles, etc.) would be at least a week of work for me, a bit less if I built up a large library to assist.
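To make that concrete, here is a minimal sketch of what "let the engine do it" looks like, using DuckDB purely as an example (the engine choice, file name, and columns are mine for illustration, not anything from the original discussion):

    # Hypothetical file and columns, chosen only to illustrate the point.
    import duckdb

    # The engine reads just the referenced columns (columnar execution),
    # applies the WHERE clause while scanning the file (predicate pushdown),
    # and parallelizes the aggregation. None of that is hand-written here.
    top_customers = duckdb.sql("""
        SELECT customer_id, SUM(amount) AS total_spend
        FROM 'orders.parquet'
        WHERE order_date >= DATE '2023-01-01'
        GROUP BY customer_id
        ORDER BY total_spend DESC
        LIMIT 10
    """).df()
    print(top_customers)

Getting an equivalent plan by hand in pandas or C++ is exactly the week of work I'm describing.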
My opinion is hardly uncommon. If you read over https://www.reddit.com/r/datascience/comments/c3lr9n/am_i_th... you will find many in agreement. Among those who "like" pandas, the praise is often only relative to something worse.
The problems with the pandas API were neither intrinsic nor unavoidable. They were poor design choices, probably caused by short-term thinking or a lack of experience.
Polars is a tremendous improvement.
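To show what I mean, here is the same toy filter-and-aggregate in both libraries (a sketch with made-up column names and data, not a benchmark):

    import pandas as pd
    import polars as pl

    pdf = pd.DataFrame({
        "customer": ["a", "b", "a", "c"],
        "amount": [10.0, 20.0, 30.0, 40.0],
        "year": [2022, 2023, 2023, 2023],
    })

    # pandas: boolean indexing plus groupby; whether you get a Series or a
    # DataFrame back depends on as_index and how the column is selected.
    out_pd = (
        pdf[pdf["year"] >= 2023]
        .groupby("customer", as_index=False)["amount"]
        .mean()
    )

    # Polars: every step is an expression, and with the lazy API the engine
    # can push the filter down and parallelize before anything executes.
    out_pl = (
        pl.from_pandas(pdf)
        .lazy()
        .filter(pl.col("year") >= 2023)
        .group_by("customer")
        .agg(pl.col("amount").mean())
        .collect()
    )
    print(out_pd)
    print(out_pl)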