Pipe Syntax in SQL

>>legran+(OP)
I'm glad to see pipe syntax moving forward! In my time at Google, I wrote a lot of SQL queries that required a long sequence of intermixed filtering, grouping, aggregation, and analytic window operations. But the ordering of those operations rarely matched SQL's inside-out syntax, so I commonly had to chop up my logic and glue it into multiple chained CTEs. This busywork made my code's logic harder to follow and was frustrating, especially since I was familiar with alternative coding models, such as R's dplyr, that would have allowed me to express my logic as a simple linear pipeline.

I hope SQL pipes become commonplace so that I can use them anywhere I have to write SQL queries.

>>tmoert+69
What’s wrong with CTEs though? I have never thought of them as busywork and start with them if I know my query is going to be a multi-step process. I already thought them as “pipes” (in the abstract, not related to this paper). If the query flow is linear so is the sequence of CTEs.

The only slightly annoying thing with CTEs is that you always have to name them. You might consider this an overhead if the flow is strictly linear. But when the flow is not linear (eg when doing two different aggregations over the same data, to join them later) you need a way to refer to the intended CTE, so always requiring a name does add some consistency and doesn’t seem a big deal overall.

>>geertj+wH
> What’s wrong with CTEs though?

Depends on DB engines I suppose. I've come across that certain operations were not allowed in CTEs, and they can be an optimization barrier.

However if your query is dynamically modified at runtime, then CTEs can be a no-go. For example, we have a grid component which first does a count and then only selects the visible rows. This is great if you have expensive subselects as columns in a large table. However to do the counting it turns the main query into a sub-query, and it doesn't handle CTEs.

>>magica+yI
Understood. I should have asked my question a bit more specifically: what's wrong with CTEs that wouldn't be an issue with this new pipe syntax. I briefly scanned the paper and it appears there aren't any specific benefits to the pipe syntax that would make optimization easier. So we can expect that if a SQL engine doesn't optimize CTEs well it would likely have the same limitations for the pipe syntax.

Section 2.1.4 the paper lists the benefits of the pipe syntax over CTEs, and they are all based on ergonomics. As someone who has never had issues with the ergonomics of CTEs I must say I am not convinced that proposed syntax is better. It may be that I've been doing SQL for so long that I don't see its warts. Overall SQL feels like a very well designed and consistent language to me. The new pipe syntax appears to bolt on an imperative construct to an otherwise purely functional language.

zlacker