zlacker

[parent] [thread] 30 comments
1. newhou+(OP)[view] [source] 2023-05-19 22:28:31
I'm excited about this for probably different reasons than most: I think Typescript could be a more ergonomic way to develop ML models than Python because you can automatically infer and check tensor dimensions while you are writing code! Compare this to the mess of comments you usually see in PyTorch code telling you that x is of shape [x, y, z].

  // An empty 3x4 matrix
  const tensorA = tensor([3, 4])
  
  // An empty 4x5 matrix
  const tensorB = tensor([4, 5])

  const good = multiplyMatrix(tensorA, tensorB);
        ^
        Inferred type is Tensor<readonly [3, 5]>
  
  const bad = multiplyMatrix(tensorB, tensorA);
                             ^^^^^^^
                             Argument of type 'Tensor<readonly [4, 5]>' is not 
                             assignable to parameter of type '[never, "Differing 
                             types", 3 | 5]'.(2345)
I prototyped this for PotatoGPT [1] and some kind stranger on the internet wrote up a more extensive take [2]. You can play with an early version on the Typescript playground here [3] (uses a twitter shortlink for brevity)

[1] https://github.com/newhouseb/potatogpt

[2] https://sebinsua.com/type-safe-tensors

[3] https://t.co/gUzzTl4AAN

replies(12): >>whimsi+O3 >>teruak+84 >>a1371+e5 >>nicoco+Y6 >>tzheng+3a >>saiojd+0e >>modele+1h >>tehsau+Eo >>mhh__+jr >>6gvONx+3z >>polyga+P21 >>rd1123+JS1
2. whimsi+O3[view] [source] 2023-05-19 22:53:13
>>newhou+(OP)
That work looks really interesting! I am also excited about type safety when it comes to tensors. My understanding was that this type safe approach to tensor shape had encountered issues because it was difficult/impossible (maybe?) to reason about the shape of some common operators at compile time. But perhaps those operators are not really necessary. [0]

Some sort of typed 'named tensor' that could be combined with einsum notation at runtime would be awesome, i.e. (I don't really know TS/JS well, so pseudocode):

  import * as t from 'pytorch'
  import { nn } from 'pytorch'

  const tensorA: Tensor[Batch, Seq, Emb] = t.randn([10,10,10]) // initialize tensor
  const transformLayer = nn.Einsum((Batch, Seq, Emb),(Emb)->(Batch, Seq))

  const tensorB: Tensor[Emb2] = t.randn([20])

  const transformedOutput = transformLayer(tensorA, tensorB) // type error: Emb2 does not match Emb

[0]: https://github.com/pytorch/pytorch/issues/26889
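Roughly, the label mismatch in the pseudocode above could already be caught in today's TS with branded dimension types. A rough sketch (Dim and NamedTensor are made-up names for illustration, not an existing library):

  // Branded dimension labels
  type Dim<Name extends string> = number & { dim: Name }
  type Batch = Dim<'Batch'>
  type Seq = Dim<'Seq'>
  type Emb = Dim<'Emb'>
  type Emb2 = Dim<'Emb2'>

  type NamedTensor<D extends readonly number[]> = { dims: D; data: Float32Array }

  // A contraction fixed to (Batch, Seq, Emb), (Emb) -> (Batch, Seq)
  declare function transformLayer(
    a: NamedTensor<readonly [Batch, Seq, Emb]>,
    b: NamedTensor<readonly [Emb]>
  ): NamedTensor<readonly [Batch, Seq]>

  declare const tensorA: NamedTensor<readonly [Batch, Seq, Emb]>
  declare const tensorB: NamedTensor<readonly [Emb2]>

  const transformedOutput = transformLayer(tensorA, tensorB)
  //                                                ^^^^^^^
  //                                                type error: Emb2 does not match Emb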
replies(1): >>newhou+E6
3. teruak+84[view] [source] 2023-05-19 22:55:51
>>newhou+(OP)
I think you are absolutely right. It's easy to think you are supposed to use a [x y z] tensor when it expects a [z y x] and you don't find out until runtime.

It would be even better if tensor dims from loaded models could be inferred ahead of time in the editor.

4. a1371+e5[view] [source] 2023-05-19 23:05:23
>>newhou+(OP)
I really hope that takes off because you are correct. Python, though, has such fluid syntax that I'm not sure TS can match it. For example, when you want to sum two NumPy arrays, you just need the + operator, while that sort of thing is notoriously unpredictable in JS.
replies(2): >>srouss+W8 >>saiojd+xe
5. newhou+E6[view] [source] [discussion] 2023-05-19 23:18:16
>>whimsi+O3
This is a great thread, thanks! Somehow I missed it when looking for prior art.

When I initially started implementing this I was hung up on similar concerns. For example, in GPT2/PotatoGPT the MLP layer is 4x the width of the residual stream. I went down a rabbit hole of addition and multiplication in Typescript types (the type system is Turing complete, so it's technically possible!) and after crashing my TS language server a bunch I switched tactics.

Where I ended up was to use symbolic equivalence, which turned out to be more ergonomic anyway, i.e.

  type Multiply<A extends number, B extends number> = 
    number & { label: `${A} * ${B}` }
  const Multiply = <A extends number, B extends number>(a: A, b: B) => 
    a * b as Multiply<A, B>;
such that

  tensor([
    params.EmbeddingDimensions, // This is a literal with known size
    Multiply(4, params.EmbeddingDimensions)] as const)
is inferred as

  Tensor<readonly [768, Multiply<4, 768>]>
Notably, switching to a more symbolic approach makes it easier to type check dimensions that can change at runtime, so something like:

  tensor([Var(tokens.length, 'Sequence Length'), 
          Multiply(4, Var(tokens.length, 'Sequence Length'))])
infers as

  Tensor<readonly [
     Var<'Sequence Length'>, 
     Multiply<4, Var<'Sequence Length'>>]> 
And you'll get all the same correctness constraints that you would if these were known dimensions.
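
(For reference, Var is the same branding trick as Multiply; roughly the following, though the exact definition in the repo may differ:)

  type Var<Label extends string> = number & { label: Label }
  const Var = <Label extends string>(value: number, label: Label) =>
    value as Var<Label>;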

The downside to this approach is that Typescript won't know that Multiply<4, Var<'A'>> is equivalent to Multiply<Var<'A'>, 4>, but in practice I haven't found this to be a problem.

Finally, on more complicated operators/functions that compose dimensions from different variables, Typescript is also very capable, albeit not the most ergonomic. You can check my code for matrix multiplication and Seb's writeup for another example (a zip function).

replies(1): >>t-vi+A42
6. nicoco+Y6[view] [source] 2023-05-19 23:21:25
>>newhou+(OP)
I believe there is WIP to get Python type annotations for array/tensor shapes, but it's not a thing yet, indeed.
7. srouss+W8[view] [source] [discussion] 2023-05-19 23:43:09
>>a1371+e5
I wonder if you couldn't do some operator overloading on the TS side via some rewriting, to get things like tensor addition on tensor types.

Heck, if you are doing that, maybe convert to webgpu automatically as well.

Someone very enterprising might do this in bun using zig.

8. tzheng+3a[view] [source] 2023-05-19 23:54:54
>>newhou+(OP)
Just a little pushback here: I think you strike on the right theme, where a programming language could fill this gap. However, I wonder if new domain-specific languages will eventually be the more elegant solution. Think Modular's Mojo [1] or Meta's KNYFE [2], mentioned earlier this week.

[1] - https://www.modular.com/mojo
[2] - https://ai.facebook.com/blog/meta-training-inference-acceler...

replies(1): >>newhou+Ea
9. newhou+Ea[view] [source] [discussion] 2023-05-20 00:01:32
>>tzheng+3a
It's a great question. I don't really have a horse in this race as long as whatever wins is maximally ergonomic. I think as long as the DSL's type system is Turing complete, such that you can "compute" on tensor shapes, then we win. That said, it's very easy to build a type system that isn't so flexible (see most other languages), so it'd likely have to be a focus of the DSL from the get-go.
10. saiojd+0e[view] [source] 2023-05-20 00:43:59
>>newhou+(OP)
Another thing that TS does nicely is object handling in general: dot access for object attributes, object destructuring, typed objects for function options. In most ML projects I see a bunch of functions that look like:

    def my_fn(x, **kwargs):
       ...
       return y_1, y_2, y_3
Which is a pain because kwargs could really be anything, plus every call site now has to expect exactly 3 return values and know their order; there's no way to add an extra return value without changing every caller. In Typescript the same function could look like:

    function myFn(x, options = { someOption: 1 }) {
       ...
       return { y_1, y_2, y_3 };
    }
Which is so much nicer because everything is typed with all types inferred automatically! And you don't burden the call sites with values they don't need:

    const { y_1 } = myFn(x, { someOption: 1 });
In Python, everyone mostly passes unbundled arguments through every function, and changing anything involves threading these untyped arguments through a bunch of untyped call sites. It's not the end of the world, but we can do better...
replies(2): >>praecl+si >>int_19+zab
11. saiojd+xe[view] [source] [discussion] 2023-05-20 00:48:20
>>a1371+e5
Three.js works just fine with functions like `.add`, though it sure is ugly. It kind of blows the mind that JavaScript has had so many syntactic additions over the years but still has no operator overloading.
12. modele+1h[view] [source] 2023-05-20 01:18:16
>>newhou+(OP)
Without multidimensional array slicing or operator overloading it seems like Typescript could never be anywhere near as ergonomic as Python for ML, despite its other advantages.
replies(2): >>praecl+yi >>phailh+Lk
13. praecl+si[view] [source] [discussion] 2023-05-20 01:37:16
>>saiojd+0e
I’m of the same opinion. While I think I will keep the standard parameter order from torch, I will include the options overload to give all the benefits you describe.
replies(1): >>saiojd+Yl
14. praecl+yi[view] [source] [discussion] 2023-05-20 01:39:23
>>modele+1h
Those are niceties and can be implemented with some small hacks. Most big nets do very little slicing. Lots of dimension permutations (transpose, reshape, and friends) but less slicing. I personally use a lot of slicing, so I will do my best to support a clean syntax.
replies(2): >>tysam_+NC >>whimsi+gG1
15. phailh+Lk[view] [source] [discussion] 2023-05-20 02:09:24
>>modele+1h
What's the advantage of those "ergonomics" if you have to memorize all the quirks? With a language like Typescript, all those operations become explicit instead of implicit, letting you take full advantage of your IDE with autocomplete, documentation, and compile-time warnings. Python sacrifices all of those just to save a few keystrokes.
replies(1): >>int_19+Lab
16. saiojd+Yl[view] [source] [discussion] 2023-05-20 02:25:47
>>praecl+si
Awesome :D Really nice project by the way
17. tehsau+Eo[view] [source] 2023-05-20 03:09:29
>>newhou+(OP)
If you want to do this today you can also use the Torch C++ API! It's what PyTorch binds to under the hood.
replies(1): >>whimsi+pG1
18. mhh__+jr[view] [source] 2023-05-20 03:43:10
>>newhou+(OP)
Dependent types or it's a toy.
19. 6gvONx+3z[view] [source] 2023-05-20 05:50:58
>>newhou+(OP)
That’s a good point, but I think Python will be much more feasible because of operator overloading:

(x+y)*z/3

vs

x.add(y).mul(z).div(3)

And that’s just a really simple example.

I’m also hopeful that Python's new variadic generic types make progress here.

20. tysam_+NC[view] [source] [discussion] 2023-05-20 06:48:48
>>praecl+yi
I've come to believe over the last few years that slicing is one of the most critical parts of a good ML array framework for a number of things, and I've used it heavily. PyTorch, if I understand correctly, still doesn't have it right in terms of some forms of slice assignment and the handling of slice objects (please correct me if I'm wrong), though it is leagues better than TensorFlow was.

I've written a lot of dataloader and such code over the last number of years, and the slicing was probably the most important (and most hair-pulling) part for me. I've really debated writing my own wrapper at some point (if it is indeed worth the effort) just to keep my sanity, even if it is at the expense of some speed.

21. polyga+P21[view] [source] 2023-05-20 12:51:31
>>newhou+(OP)
I don't know if you knew but this is how TensorFlow 1 worked. Unfortunately, that was a widely unpopular design choice because it was hard to overload the same function for tensors of different dimensions, among other things.
replies(1): >>newhou+i91
22. newhou+i91[view] [source] [discussion] 2023-05-20 13:48:27
>>polyga+P21
Interesting, do you have any references or examples? Some brief googling around hasn't found anything like this. The fact that overloading was an issue makes me think that TF1 was doing something different because Typescript generic type parameters allow you to do "overloading" galore (by only specifying constraints rather than enumerating every possible call format).
23. whimsi+gG1[view] [source] [discussion] 2023-05-20 17:44:41
>>praecl+yi
I disagree with this; slice notation is powerful and I use it quite a bit in DL.

Even just the [:, None] trick replacing unsqueeze is super useful for me.

24. whimsi+pG1[view] [source] [discussion] 2023-05-20 17:45:48
>>tehsau+Eo
? I don't think torch C++ supports this.
25. rd1123+JS1[view] [source] 2023-05-20 19:16:47
>>newhou+(OP)
It seems that many agree with this. At the risk of getting downvoted I want to share an opposing opinion:

This way of thinking is not just unhelpful but even harmful. If one would often benefit from these checks while coding, then they should not be relying on a type checker. They should be thinking more, and writing comments is a great way to do that.

This is especially true because many operations on ndarrays / tensors can yield perfectly valid shapes with completely unintended consequences. When comments are written reasonably well they help avoid these difficult-to-debug, correct-output-shape-but-unintended-result mistakes. Not to mention the additional clear benefit of helping one quickly re-understand the tensor manipulations when coming back to the code weeks or months later.

And more generally, if one can get in the habit of writing these comments before the code, it can help push them away from the write-quickly-now-debug-later mentality. I have seen this bite folks many times, both while teaching ugrad + grad courses and while working at large tech companies.

replies(1): >>newhou+xe2
26. t-vi+A42[view] [source] [discussion] 2023-05-20 20:48:07
>>newhou+E6
Out of curiosity, how do you handle things where the output shape is input dependent (as opposed to only dependent on input shapes)? This ranges from `torch.sum(tensor, dim)` where dim might be nonconstant, to `torch.nonzero(x)`, and of course advanced indexing.
27. newhou+xe2[view] [source] [discussion] 2023-05-20 22:11:19
>>rd1123+JS1
Where do you draw the line? Is type checking in any domain harmful because it acts as a crutch for your mental model of how your code works? One could similarly extrapolate this to any static analysis in any language.
28. int_19+zab[view] [source] [discussion] 2023-05-23 20:29:05
>>saiojd+0e
Python also has pattern matching on dicts and typed kwargs these days. It seems that the only thing missing is syntactic sugar for unconditional destructuring.
replies(1): >>saiojd+kvj
29. int_19+Lab[view] [source] [discussion] 2023-05-23 20:30:17
>>phailh+Lk
What is implicit about either feature, and what difference do they make from the IDE perspective assuming equivalent type annotations in both languages?
replies(1): >>phailh+7Ob
30. phailh+7Ob[view] [source] [discussion] 2023-05-24 01:22:01
>>int_19+Lab
"Assuming equivalent type annotations" is the problem. Can't do it with Python, full stop. If we could, we wouldn't be having this conversation at all! It can't catch any mistakes because its type system is simply not expressive enough. You have to hold the type information in your head and make sure you slice and multiply correctly.
31. saiojd+kvj[view] [source] [discussion] 2023-05-26 10:53:02
>>int_19+zab
Yes! It's getting close, but we are still far from things being convenient and widely adopted