zlacker

1. rscho+(OP) 2020-11-30 11:22:44
As a lone data wrangler, I am dreaming of an "APL like R", i.e. geared specifically towards data manipulation and stats with an integrated columnar store (I am spoilt by J).

People always say that such a thing will never fly in teams due to syntactic issues, but APL really is a productivity secret weapon for loners and small teams!

replies(1): >>chrisp+U4
2. chrisp+U4 2020-11-30 12:14:21
>>rscho+(OP)
Is it a matter of writing a comprehensive stats library?
replies(1): >>rscho+M6
3. rscho+M6 2020-11-30 12:36:01
>>chrisp+U4
A comprehensive stats library for J would be close (there is an RServe lib), but the most important thing is the integrated data store. For big companies, it makes perfect sense to want a separate data store. But for small shops or loners, tight language integration is light-years better! With J, I get a real database (no effing .csv) where I can use the same (terse) language as for the analysis. This is the killer feature. This is where you see that a language like Julia, for example, is made for serious industrial coding with teams of tens of people and not for lone guys.

Following the same principle, R's dplyr lets you interact seamlessly with the DB through a translation layer. However, every time I open R I find myself writing tens if not hundreds of LOC to shape the data, where I'd do that in a few lines of J. For single researchers, it's actually much easier to read your one page of J code 6 months later than your 500-LOC R script (again IMO).

That said, I imagine it would be possible to make a specialized APL geared towards data analysis as a strict DSL (APL itself is not a DSL). That would mean, for example, making it more static and therefore statically compilable, at the expense of losing things such as first-class environments (namespaces) or the "execute" primitive. One could also specialize the notation further towards statistics. There really is a whole realm of possibilities here!

In a word, there is a market for lone scientists. It would be nice to have tools for that market ;-)

replies(1): >>chrisp+n8
4. chrisp+n8 2020-11-30 12:48:27
>>rscho+M6
You didn't mention it, so for avoidance of doubt: have you heard of k?

Commercial k variants come with a columnar data store (see Kx's q/kdb+, Shakti's k9).

replies(1): >>rscho+ub
5. rscho+ub 2020-11-30 13:20:16
>>chrisp+n8
Yes, I've heard of K, which makes some things a little more convenient and others a little less by adopting the "list-of-vectors" model instead of true multidimensional arrays. K is still very much in the line of traditional APLs, though. I don't have extensive experience with K, but J with the Jd DB seems very closely related.

I was thinking of more radical changes that would certainly disappoint APL purists by exchanging some well-known and highly-regarded APL capabilities for a more specialized language. Maybe I'll put my thoughts in code someday, if life doesn't get in the way.

The strength of APLs lies in having a fixed set of versatile primitives. From there, the terseness of the language allows one to write small but expressive and useful programs. This strongly limits the need for external libs, and this is where the real value lies. Whereas in Python you'd reach for multiple libs made in another language and often by other people, in APL you learn the primitives and you're off to the races! Therefore, the fixed fraction of the programs you write (primitives vs functions) is much smaller (no libs with poor documentation) and you don't risk the rug being pulled out from under you by changing lib APIs.

BTW J is stellar for data wrangling, and I encourage everyone to endure the multiple weeks of effort required to learn the basics of the language. Spend the time, it will be really rewarding!

replies(1): >>avmich+At1
6. avmich+At1 2020-11-30 20:14:29
>>rscho+ub
> endure the multiple weeks of effort required to learn the basics

"J for C Programmers" is a good book - https://www.jsoftware.com/docs/help807/jforc/contents.htm - and it could take significantly less than many weeks to "get" some important ideas. Specifically, make sure you understand ranks in chapters 5 and 6.

After you understand how +/ with different ranks can sum along different axes, you're well on the way.

I mean, here is a cube of numbers:

       i. 2 3 4
     0  1  2  3
     4  5  6  7
     8  9 10 11

    12 13 14 15
    16 17 18 19
    20 21 22 23
Plain +/ sums along the leading axis -

       +/ i. 2 3 4
    12 14 16 18
    20 22 24 26
    28 30 32 34
That's because rank of +/ is infinity, so / inserts pluses between highest-ranked items, of which there are two - a square

     0  1  2  3
     4  5  6  7
     8  9 10 11
and square

    12 13 14 15
    16 17 18 19
    20 21 22 23
(rank is a sort of dimension). So +/ just adds, element by element, these two squares together, giving the resulting square.
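
For readers who don't know J, here is a rough Python analogue of what plain +/ does to this cube. It is only a sketch with plain nested lists; the helper names add_cells and sum_leading_axis are made up for illustration and are not part of J or any library.

```python
# Build the same 2x3x4 cube as J's `i. 2 3 4`, as nested lists.
cube = [[[p * 12 + r * 4 + c for c in range(4)] for r in range(3)]
        for p in range(2)]


def add_cells(a, b):
    """Add two equally shaped nested lists element by element."""
    if isinstance(a, list):
        return [add_cells(x, y) for x, y in zip(a, b)]
    return a + b


def sum_leading_axis(array):
    """Insert + between the items along the leading axis (like plain +/)."""
    total = array[0]
    for item in array[1:]:
        total = add_cells(total, item)
    return total


print(sum_leading_axis(cube))
# [[12, 14, 16, 18], [20, 22, 24, 26], [28, 30, 32, 34]]
```

The two items along the leading axis are the two squares, so the loop runs once and adds them together element by element.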

If you specify +/"0 - this sets the rank of the verb (function) to 0 - then +/ will be applied to each number separately and results will be combined. Adding a single number (not with itself - just as it is, without the other argument for summation) makes the same number, so +/"0 doesn't change the result - it's the same cube as in i. 2 3 4

Trying with +/"1 gives

       +/"1 i. 2 3 4
     6 22 38
    54 70 86
That's because +/"1 now is a verb of rank 1, so it works with cells (subarrays) of rank 1. In the cube i. 2 3 4 there are 6 subarrays of rank 1: 3 of them are in the first "plane" and 3 of them are in the second "plane". +/"1 takes each such subarray of rank 1 separately and sums the elements in it (inserts + between the elements of that array), and J then assembles the results into an array.

Finally,

       +/"2 i. 2 3 4
    12 15 18 21
    48 51 54 57
sums within 2-dimensional arrays. Elements of such arrays are 1-dimensional arrays, so those arrays are summed, element by element. There are two planes, so the result has two elements (two arrays of rank 1), and each element is an array of rank 1, obtained from summing 3 arrays of rank 1.
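
If J is unfamiliar, the rank idea can be roughly mimicked in Python. This is a sketch only, assuming regular, non-empty nested lists; depth, add_cells, insert_plus and sum_at_rank are hypothetical helper names, not J or library functions, and real J rank handling covers more cases than this.

```python
# Same 2x3x4 cube as J's `i. 2 3 4`, as nested lists.
cube = [[[p * 12 + r * 4 + c for c in range(4)] for r in range(3)]
        for p in range(2)]


def depth(array):
    """Nesting depth of a regular nested list (a stand-in for J's rank)."""
    return 1 + depth(array[0]) if isinstance(array, list) else 0


def add_cells(a, b):
    """Add two equally shaped nested lists element by element."""
    if isinstance(a, list):
        return [add_cells(x, y) for x, y in zip(a, b)]
    return a + b


def insert_plus(array):
    """Insert + between the items of array; a bare number is left as it is."""
    if not isinstance(array, list):
        return array
    total = array[0]
    for item in array[1:]:
        total = add_cells(total, item)
    return total


def sum_at_rank(array, rank):
    """Apply insert_plus to every cell of the given rank, then reassemble."""
    if depth(array) <= rank:
        return insert_plus(array)
    return [sum_at_rank(item, rank) for item in array]


print(sum_at_rank(cube, 1))  # [[6, 22, 38], [54, 70, 86]]
print(sum_at_rank(cube, 2))  # [[12, 15, 18, 21], [48, 51, 54, 57]]
```

With rank 0 the function hands back the cube unchanged, and with rank 3 (or anything larger) it behaves like plain +/ on the whole cube.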

The book tells it better, of course.
