Pipe Syntax in SQL

>>legran+(OP)
reminds me PRQL https://prql-lang.org

and SPL from Splunk

>>legran+(OP)
To see how well it works, I uploaded the PDF to Google AI Studio and ran the prompt "Convert this document to neatly styled semantic HTML" against the gemini-1.5-pro-exp-0801 model - the result is actually pretty good!

https://static.simonwillison.net/static/2024/Pipe-Syntax-In-...

>>legran+(OP)
The first-page of the paper has 13 co-authors listed - but all with the same affiliation ("Google, Inc") - so this is ultimately a single-vendor making a unilateral proposal to break with the past - which means I'm confident this proposal won't be gracing the pages of the ISO/IEC 9075 (ISO SQL) standards in my lifetime - no matter how badly we all need QoL improvements to SQL.

...okay, if I dial-back my feelings of resignation to mediocrity, then I'll admit that Google probably does have enough clout to make this go somewhere - but they'd need to add this to all their database offerings (BigQuery, Spanner, Firebase's SQL mode) and contribute patches to Postgres and MySQL/Maria - maybe after Microsoft relents a decade later to add it to MSSQL we'll maybe start to see Oracle's people refer to it vaguely as a nice-to-have they'll implement only after they start losing more blue-chip customers[1].

Also, it's giving me M (Excel PowerQuery) vibes too.

-------

[1]For context, Oracle's DB lacked a `bit`/`bool` column type for the past 40 years until last year. People had to use `char(1)` columns with CHECK constraints to store '0'/'1' - or worse: 'T'/'F' or 'Y'/'N' (see https://stackoverflow.com/a/3726846/159145 )

>>Cianti+06
IMO having SELECT before FROM is one of SQL's biggest mistakes. I would gladly welcome a new syntax that rectifies this. (Also https://duckdb.org/2022/05/04/friendlier-sql.html)

I don't love the multiple WHEREs.

>>0cf861+T9
Not to be outdone: https://thedailywtf.com/articles/what_is_truth_0x3f_

>>simonw+K5
Just my personal take, but when I have to read something carefully, I find it easier to do on paper.

For example, I recently wrote an article about taking random samples using SQL. Even though I was writing it for my blog, which is HTML, I proofread the article by rendering it as a PDF doc, printing it out, and reviewing it with a blue pen in hand.

What surprised me is that I also found it easier to review the article on the screen when it was in PDF format. TeX just does a way better job of putting words on a page than does a web browser.

Actually, if you want to do the comparison yourself, I'll put both versions online:

HTML: https://blog.moertel.com/posts/2024-08-23-sampling-with-sql....

PDF: https://blog.moertel.com/images/public_html/blog/pix-2024060...

I don't think either version is hard to read, but if I had my choice, I'd read the PDF version. But maybe that's just me.

Let me know which you prefer.

>>LeonB+Bm
unfortunately KQL doesn't seem to have INSERT, UPDATE etc. support, it seems to be a pure query language for querying. Unless this strange different .insert syntax is what they intended for their language from the start? I don't know: https://learn.microsoft.com/en-us/kusto/management/data-inge...

>>legran+(OP)
Reminds me of Elixir’s Ecto library.

https://hexdocs.pm/ecto/crud.html#pipe-based-queries

>>richbe+O7
duckDB is what sql should be in 2024

https://duckdbsnippets.com/

>>jml7c5+ze
this is what you are looking for i believe (web archive format) https://en.wikipedia.org/wiki/WARC_(file_format)?oldformat=t...

>>simonw+P5
(This was originally a reply to >>41339138 but I detached it so the useful link could float higher in the thread - thanks!)

>>legran+(OP)
I've just update the EBNF for rarilroad diagram here https://github.com/google/zetasql/issues/134#issuecomment-14... with the full grammar.

>>datadr+go
Not just fast, it's web-scale

https://www.youtube.com/watch?v=b2F-DItXtZs

>>simonw+P5
I wrote up a few extra notes about how I did this here: https://simonwillison.net/2024/Aug/24/pipe-syntax-in-sql/

>>legran+(OP)
Google's open source ZetaSQL project added documentation covering pipe syntax last week, which I'm finding easier to follow than this paper:

https://github.com/google/zetasql/blob/2024.08.2/docs/pipe-s...

>>richbe+O7
Duckdb also supports prql with an extension https://github.com/ywelsch/duckdb-prql

>>typede+1Y
ZetaSQL’s docs for the ‘OVER’ keyword you mention: https://github.com/google/zetasql/blob/master/docs/window-fu...

I disagree that the paper not mentioning ‘OVER’ implies that the paper authors secretly think pipe syntax is a bad idea. They probably just wanted to keep the paper concise, or forgot about that one less-used bit of syntax.

Do you think that ‘OVER’ keyword implies something fundamentally wrong about pipe syntax? If so, how?

>>legran+(OP)
This style is familiar to those writing dataframe logic in df libraries with sql semantics - spark, polars or duckdb relational (https://duckdb.org/docs/api/python/relational_api.html).

It definitely makes things easier to follow, but only for linear, ie. single table, transformations. The moment joins of multiple tables come into the picture things become hairy quick and then you actually start to appreciate the plain old sql which accounts for exactly this and allows you to specify column aliases in the entire cte clause. With this piping you lose scope of the table aliases and then you have to use weird hacks like mangling names of the joined in table in polars.

For single table processing the pipes are nice though. Especially eliminating the need for multiple different keywords for filter based on the order of execution (where, having, qualify (and pre-join filter which is missing)).

A missed opportunity here is the redundant [AGGREGATE sum(x) GROUP BY y]. Unless you need to specify rollups, [AGGREGATE y, sum(x)] is a sufficient syntax for group bys and duckdb folks got it right in the relational api.

>>sagarm+ZF
Re #2: I prefer https://pola.rs over Pandas

>>saltcu+xG
> this pipeline syntax to be mixed with regular compositional forms like CTEs and subqueries?

Yes, it's an extension (available by default), which means you can freely mix with regular SQL and use pipes for just parts of your query.

https://github.com/google/zetasql/blob/2024.08.2/docs/pipe-s...

> Pipe syntax can be mixed with standard syntax in the same query. For example, subqueries can use different syntax from the parent query.

>>typede+1Y
https://en.m.wikipedia.org/wiki/Window_function_(SQL)

They didn't add anything.

>>sagarm+sx1
In the early days, even as I appreciated what Pandas could do, I never found its API sane. Pandas has too many special cases and foot-guns. It is a notorious case of poor design.

My opinion is hardly uncommon. If you read over https://www.reddit.com/r/datascience/comments/c3lr9n/am_i_th... you will find many in agreement. Of those who "like" Pandas, it is often only a relative comparison to something worse.

The problems of the Pandas API were not intrinsic nor unavoidable. They were poor design choices probably caused by short-term thinking or a lack of experience.

Polars is a tremendous improvement.

>>almost+Gp
Something like the DOT language used in GraphViz

https://graphviz.org/doc/info/lang.html

>>legran+(OP)
I built a compiler based on the ideas from this paper: https://github.com/yazaldefilimone/spipe.

zlacker

Pipe Syntax in SQL