zlacker

[parent] [thread] 10 comments
1. singro+(OP)[view] [source] 2024-08-24 16:34:25
In the query plan, filtering before or after an aggregation is the same, so it's a strange quirk that SQL requires a different word.
replies(2): >>0cf861+12 >>andy80+s2
2. 0cf861+12[view] [source] 2024-08-24 16:50:46
>>singro+(OP)
Indeed. Just as I think git’s N different ways to refer to the same operation was a blunder.
replies(1): >>andy80+K2
3. andy80+s2[view] [source] 2024-08-24 16:55:38
>>singro+(OP)
I was not there at the original design decisions of the language, but I imagine it was there specifically to help the person writing/editing the query easily recognize and interpret filtering before or after an aggregation. The explicitness makes debugging a query much easier and ensures it fails earlier. I don't see much reason to stop distinguishing one use case from the other, I'm not sure how that helps anything.
replies(3): >>0cf861+Y3 >>yen223+QP >>singro+bl1
◧◩
4. andy80+K2[view] [source] [discussion] 2024-08-24 16:57:29
>>0cf861+12
But pre- and post- aggregation filtering is not really "the same" operation.
replies(1): >>0cf861+73
◧◩◪
5. 0cf861+73[view] [source] [discussion] 2024-08-24 17:01:29
>>andy80+K2
If I use a CTE and filter the aggregate, feels the same to me.
replies(1): >>andy80+ed
◧◩
6. 0cf861+Y3[view] [source] [discussion] 2024-08-24 17:10:04
>>andy80+s2
I think this stems from the non-linear approach to reading a SQL statement. If it were top-to-bottom linear, like PRQL, then the distinction does not seem merited. It would then always be filtering from what you have collected up to this line.
◧◩◪◨
7. andy80+ed[view] [source] [discussion] 2024-08-24 18:20:20
>>0cf861+73
If you perform an aggregation query in a CTE, then filter on that in a subsequent query, that is different, because you have also added another SELECT and FROM. You would use WHERE in that case whether using a CTE or just an outer query on an inner subquery. HAVING is different from WHERE because it filters after the aggregation, without requiring a separate query with an extra SELECT.
replies(1): >>RaftPe+Vz
◧◩◪◨⬒
8. RaftPe+Vz[view] [source] [discussion] 2024-08-24 21:21:58
>>andy80+ed
> HAVING is different from WHERE because it filters after the aggregation, without requiring a separate query with an extra SELECT.

Personally I rarely use HAVING and instead use WHERE with subqueries for the following reasons:

1-I don't like repeating/duplicating a bunch of complex calcs, easier to just do WHERE in outer query on result

2-I typically have outer queries anyway for multiple reasons: break logic into reasonable chunks for humans, also for join+performance reasons (to give the optimizer a better chance at not getting confused)

replies(1): >>sgarla+242
◧◩
9. yen223+QP[view] [source] [discussion] 2024-08-24 23:09:37
>>andy80+s2
I think the original sin here is not making aggregation an explicitly separate thing, even though it should be. Adding a count(*) fundamentally changes what the query does, and what it returns, and what restrictions apply.
◧◩
10. singro+bl1[view] [source] [discussion] 2024-08-25 05:06:31
>>andy80+s2
I also wasn't there, but I think this actually wasn't to help authors and instead was a workaround for the warts of SQL. It's a pain to write

    SELECT * FROM (SELECT * FROM ... GROUP BY ...) t WHERE ...
and they decided this was common enough that they would introduce a HAVING clause for this case

    SELECT * FROM ... GROUP BY ... HAVING ...
But the real issue is that in order to make operations in certain orders, SQL requires you to use subselects, which require restating a projection for no reason and a lot of syntactical ceremony. E.g. you must give the FROM item a name (t), but it's not required for disambiguation.

Another common case is projecting before the filter. E.g. you want to reuse a complicated expression in the SELECT and WHERE clauses. Standard SQL requires you to repeat it or use a subselect since the WHERE clause is evaluated first.

◧◩◪◨⬒⬓
11. sgarla+242[view] [source] [discussion] 2024-08-25 13:54:17
>>RaftPe+Vz
The main (only?) task I routinely use HAVING for is finding duplicates.
[go to top]