https://static.simonwillison.net/static/2024/Pipe-Syntax-In-...
...okay, if I dial back my feelings of resignation to mediocrity, then I'll admit that Google probably does have enough clout to make this go somewhere - but they'd need to add this to all their database offerings (BigQuery, Spanner, Firebase's SQL mode) and contribute patches to Postgres and MySQL/MariaDB. Maybe after Microsoft relents a decade later and adds it to MSSQL, we'll start to see Oracle's people refer to it vaguely as a nice-to-have they'll implement only after they start losing more blue-chip customers[1].
Also, it's giving me M (Excel PowerQuery) vibes too.
-------
[1] For context, Oracle's DB lacked a `bit`/`bool` column type for the past 40 years until last year. People had to use `char(1)` columns with CHECK constraints to store '0'/'1' - or worse: 'T'/'F' or 'Y'/'N' (see https://stackoverflow.com/a/3726846/159145 )
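For anyone who hasn't seen that workaround, here is a rough sketch of the `char(1)` + CHECK pattern. It runs against SQLite through Python's `sqlite3` module purely because that is easy to try anywhere; it is not Oracle, and the `account` table and `is_active` column are made-up names for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Emulate a boolean with a one-character column restricted to '0'/'1'.
# (Hypothetical table and column names, SQLite rather than Oracle.)
conn.execute(
    """
    CREATE TABLE account (
        id INTEGER PRIMARY KEY,
        is_active CHAR(1) NOT NULL CHECK (is_active IN ('0', '1'))
    )
    """
)
conn.execute("INSERT INTO account (id, is_active) VALUES (1, '1')")

# A 'Y' flag violates the CHECK constraint and is rejected.
try:
    conn.execute("INSERT INTO account (id, is_active) VALUES (2, 'Y')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

rows = conn.execute("SELECT id, is_active FROM account").fetchall()
print(rejected, rows)
```

The constraint keeps bad values out, but every table has to repeat it, and nothing stops different schemas from picking different encodings.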
I don't love the multiple WHEREs.
For example, I recently wrote an article about taking random samples using SQL. Even though I was writing it for my blog, which is HTML, I proofread the article by rendering it as a PDF doc, printing it out, and reviewing it with a blue pen in hand.
What surprised me is that I also found it easier to review the article on the screen when it was in PDF format. TeX just does a way better job of putting words on a page than does a web browser.
Actually, if you want to do the comparison yourself, I'll put both versions online:
HTML: https://blog.moertel.com/posts/2024-08-23-sampling-with-sql....
PDF: https://blog.moertel.com/images/public_html/blog/pix-2024060...
I don't think either version is hard to read, but if I had my choice, I'd read the PDF version. But maybe that's just me.
Let me know which you prefer.
https://github.com/google/zetasql/blob/2024.08.2/docs/pipe-s...
I disagree that the paper not mentioning ‘OVER’ implies that the paper authors secretly think pipe syntax is a bad idea. They probably just wanted to keep the paper concise, or forgot about that one less-used bit of syntax.
Do you think the ‘OVER’ keyword implies something fundamentally wrong about pipe syntax? If so, how?
It definitely makes things easier to follow, but only for linear, i.e. single-table, transformations. The moment joins of multiple tables come into the picture, things get hairy quickly, and then you actually start to appreciate plain old SQL, which accounts for exactly this and allows you to use column aliases throughout the entire CTE. With this piping you lose the scope of the table aliases, and then you have to resort to weird hacks like mangling the column names of the joined-in table, as in Polars.
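To make the alias point concrete, here is a small sketch of what standard SQL gives you: table aliases stay in scope for the whole query, so same-named columns from different tables can be told apart without renaming. It uses SQLite via Python's `sqlite3`, with made-up `customers`, `products`, and `orders` tables. It only shows the standard-SQL side and does not try to reproduce pipe syntax or Polars behavior.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two tables that both have a `name` column (hypothetical schema).
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE products (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE orders (customer_id INTEGER, product_id INTEGER)")
conn.execute("INSERT INTO customers VALUES (1, 'Ada')")
conn.execute("INSERT INTO products VALUES (10, 'Widget')")
conn.execute("INSERT INTO orders VALUES (1, 10)")

# The aliases c and p remain usable in the SELECT list, so both
# `name` columns can be referenced without mangling either one.
query = """
    SELECT c.name AS customer_name, p.name AS product_name
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id
    JOIN products AS p ON p.id = o.product_id
"""
joined = conn.execute(query).fetchall()
print(joined)
```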
For single-table processing the pipes are nice though. Especially eliminating the need for several different filter keywords depending on the order of execution (WHERE, HAVING, QUALIFY, plus the pre-join filter, which is missing).
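A quick sketch of the keyword split in standard SQL, run on SQLite through Python's `sqlite3` (the `sales` table and its data are invented for the example). WHERE filters rows before aggregation and HAVING filters groups after it. The trailing comment shows how I'd expect the same query to read in pipe syntax, where both filters are spelled WHERE; that part is my reading of the pipe syntax docs and is not executed here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 5), ("east", 50), ("west", 70), ("west", 80), ("north", 20)],
)

# Standard SQL: two different keywords for the two filter positions.
query = """
    SELECT region, SUM(amount) AS total
    FROM sales
    WHERE amount >= 10          -- row filter, applied before grouping
    GROUP BY region
    HAVING SUM(amount) >= 100   -- group filter, applied after grouping
    ORDER BY region
"""
result = conn.execute(query).fetchall()
print(result)

# Pipe-syntax form of the same query (assumed, not run by SQLite):
#   FROM sales
#   |> WHERE amount >= 10
#   |> AGGREGATE SUM(amount) AS total GROUP BY region
#   |> WHERE total >= 100
#   |> ORDER BY region
```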
A missed opportunity here is the redundant [AGGREGATE sum(x) GROUP BY y]. Unless you need to specify rollups, [AGGREGATE y, sum(x)] is sufficient syntax for group-bys, and the DuckDB folks got it right in the relational API.
Yes, it's an extension (available by default), which means you can freely mix with regular SQL and use pipes for just parts of your query.
https://github.com/google/zetasql/blob/2024.08.2/docs/pipe-s...
> Pipe syntax can be mixed with standard syntax in the same query. For example, subqueries can use different syntax from the parent query.
My opinion is hardly uncommon. If you read over https://www.reddit.com/r/datascience/comments/c3lr9n/am_i_th... you will find many in agreement. Of those who "like" Pandas, it is often only a relative comparison to something worse.
The problems of the Pandas API were neither intrinsic nor unavoidable. They were poor design choices, probably caused by short-term thinking or a lack of experience.
Polars is a tremendous improvement.