Notation as a Tool of Thought

>>mafaa+(OP)
As a lone data wrangler, I am dreaming of an "APL like R", i.e. geared specifically towards data manipulation and stats with an integrated columnar store (I am spoilt by J).

People always say that such a thing will never fly in teams due to syntactic issues, but APL really is a productivity secret weapon for loners and small teams!

>>rscho+sO
Is it a matter of writing a comprehensive stats library?

>>chrisp+mT
A comprehensive stats library for J would come close (there is an RServe lib), but the most important thing is the integrated data store. For big companies, it makes perfect sense to want a separate data store. But for small shops or loners, tight language integration is light-years better! With J, I get a real database (no effing .csv) that I can query in the same (terse) language I use for the analysis. This is the killer feature. This is where you see that Julia, for example, is made for serious industrial coding in teams of tens of people and not for lone guys.

Following the same principle, R's dplyr lets you interact seamlessly with the DB through a translation layer. However, every time I open R I find myself writing tens if not hundreds of LOC to shape the data, where I'd do the same in a few lines of J. For single researchers, it's actually much easier to read one page of your own J code 6 months later than a 500-LOC R script (again IMO).

That said, I imagine it would be possible to make a specialized APL geared towards data analysis as a strict DSL (APL itself is not a DSL). That would mean, for example, making it more static and therefore statically compilable, at the expense of losing things such as first-class environments (namespaces) or the "execute" primitive. One could also specialize the notation further towards statistics. There really is a whole realm of possibilities here!

In a word, there is a market for lone scientists. It would be nice to have tools for that market ;-)

>>rscho+eV
You didn't mention it, so for avoidance of doubt: have you heard of k?

Commercial k variants come with a columnar data store (see Kx's q/kdb+, Shakti's k9).

>>chrisp+PW
Yes, I've heard of K, which makes some things a little more convenient and others a little less by adopting the "list-of-vectors" model instead of true multidimensional arrays. K is still very much in the line of traditional APLs, though. I don't have extensive experience with K, but J with the Jd DB seems very closely related.
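To make the "list-of-vectors" versus multidimensional-array distinction concrete, here is a minimal sketch in plain Python. It is only an analogy, not how K or J are implemented: the names `rect`, `rect_get`, `nested`, `ragged`, `is_rectangular` and `table` are all made up for illustration.

```python
# Rectangular model (APL/J flavour): one shape describes the whole array,
# and the data is a single flat, row-major vector.
rect = {"shape": (2, 3), "data": [1, 2, 3, 4, 5, 6]}

def rect_get(arr, i, j):
    """Index a rank-2 rectangular array stored as shape + flat data."""
    rows, cols = arr["shape"]
    return arr["data"][i * cols + j]

# List-of-vectors model (K flavour): a list whose items are themselves vectors.
nested = [[1, 2, 3], [4, 5, 6]]
ragged = [[1, 2, 3], [4, 5]]  # fine as a nested list, has no single shape

def is_rectangular(rows):
    """True when every inner vector has the same length."""
    return len({len(r) for r in rows}) <= 1

# A columnar table maps naturally onto list-of-vectors: one vector per column.
table = {"name": ["a", "b", "c"], "score": [1.5, 2.0, 3.5]}
mean_score = sum(table["score"]) / len(table["score"])
```

The trade-off hinted at above shows up here: the nested form handles ragged data and columnar tables without ceremony, while the rectangular form carries an explicit shape that rank-aware operations can rely on.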

I was thinking of more radical changes that would certainly disappoint APL purists by exchanging some well-known and highly-regarded APL capabilities for a more specialized language. Maybe I'll put my thoughts in code someday, if life doesn't get in the way.

The strength of APLs lies in having a fixed set of versatile primitives. From there, the terseness of the language allows one to write small but expressive and useful programs. This greatly reduces the need for external libs, and this is where the real value lies. Whereas in Python you'd reach for multiple libs written in another language and often by other people, in APL you learn the primitives and you're off to the races! Therefore, the fixed fraction of the programs you write (primitives vs functions) is much smaller (no libs with poor documentation) and you don't risk having the rug pulled out from under you by changing lib APIs.
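As a rough illustration of how far a handful of primitives can go, the sketch below imitates three APL operators using only the Python standard library. The helper names `fold`, `scan` and `outer` are invented for this example, and the mapping to APL's `f/`, `f\` and `∘.f` is approximate (APL's reduce folds right to left, which only matters for non-associative functions).

```python
from functools import reduce
from itertools import accumulate
import operator

def fold(f, xs):
    """Roughly APL's f/ (reduce); left fold, so equal only for associative f."""
    return reduce(f, xs)

def scan(f, xs):
    """Roughly APL's f\\ (scan): all the running partial results."""
    return list(accumulate(xs, f))

def outer(f, xs, ys):
    """Roughly APL's outer product: f applied to every pair (x, y)."""
    return [[f(x, y) for y in ys] for x in xs]

xs = [3, 1, 4, 1, 5]
total = fold(operator.add, xs)            # sum of the vector
running_max = scan(max, xs)               # running maximum
mean = fold(operator.add, xs) / len(xs)   # mean built from the same primitive
times_table = outer(operator.mul, [1, 2, 3], [1, 2, 3])
```

Sum, mean, running maximum and a multiplication table all come from the same three building blocks, which is the kind of reuse the comment attributes to APL's primitives.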

BTW J is stellar for data wrangling, and I encourage everyone to endure the multiple weeks of effort required to learn the basics of the language. Spend the time, it will be really rewarding!
