zlacker

1. minkle+(OP)[view] [source] 2024-08-29 07:30:39
That is basically R with tidyverse.

  flights |>
    filter(
      carrier == "UA",
      dest %in% c("IAH", "HOU"),
      sched_dep_time > 0900,
      sched_arr_time < 2000
      ) |>
    group_by(flight) |>
    summarize(
      delay = mean(arr_delay, na.rm = TRUE),
      cancelled = sum(is.na(arr_delay)),
      n = n()
      ) |>
    filter(n > 10)
If you haven't used R, it has some serious data manipulation legs built into it.
replies(2): >>dan-ro+O3 >>countr+Ec
2. dan-ro+O3[view] [source] 2024-08-29 08:12:27
>>minkle+(OP)
An interesting thing to me about all these dplyr-style syntaxes is that Wickham thinks the group_by operator was a design mistake. In modern dplyr you can often specify a .by on an operation instead. I found switching to this style a pretty easy adjustment, and I think it’s a bit better. Example:

  d |> filter(id == max(id), .by = orderId)
I think the PRQL people were also looking for ways to avoid a standalone group_by operation. What they have is a kind of ‘scoped’ or ‘higher-order’ group_by: it takes your grouping keys and a pipeline, and outputs a pipeline step that applies the inner pipeline to each group.
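For comparison, here is a minimal sketch of the two dplyr styles side by side. It assumes dplyr 1.1.0 or later (the release that added .by); the data frame d and its values are made up for illustration, with column names borrowed from the snippet above.

```r
library(dplyr)

# Hypothetical toy data: two orders, several line ids each.
d <- tibble(
  orderId = c(1, 1, 2, 2, 2),
  id      = c(10, 11, 20, 21, 22)
)

# Persistent grouping: the grouping sticks to the data frame,
# so it has to be dropped explicitly afterwards.
a <- d |>
  group_by(orderId) |>
  filter(id == max(id)) |>
  ungroup()

# Per-operation grouping: .by applies to this filter() only,
# and the result comes back ungrouped.
b <- d |> filter(id == max(id), .by = orderId)
```

Both a and b should keep the row with the largest id per orderId. The practical difference is that the .by version never leaves a grouped data frame lying around for a later step to trip over.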
replies(1): >>_Winte+q9
3. _Winte+q9[view] [source] [discussion] 2024-08-29 09:12:03
>>dan-ro+O3
Given 10 more years, dplyr syntax might resemble data.table's.
4. countr+Ec[view] [source] 2024-08-29 09:50:35
>>minkle+(OP)
My thoughts exactly. It even uses the same pipe syntax, though I do prefer `%>%`. I've been avoiding SQL for a while now, as it feels so clunky next to the tidyverse.