// An empty 3x4 matrix
const tensorA = tensor([3, 4])
// An empty 4x5 matrix
const tensorB = tensor([4, 5])
const good = multiplyMatrix(tensorA, tensorB);
^
Inferred type is Tensor<readonly [3, 5]>
const bad = multiplyMatrix(tensorB, tensorA);
^^^^^^^
Argument of type 'Tensor<readonly [4, 5]>' is not
assignable to parameter of type '[never, "Differing
types", 3 | 5]'.(2345)
I prototyped this for PotatoGPT [1] and some kind stranger on the internet wrote up a more extensive take [2]. You can play with an early version on the TypeScript playground here [3] (uses a Twitter shortlink for brevity).

[1] https://github.com/newhouseb/potatogpt
Some sort of typed 'named tensor' that could be combined with einsum notation at runtime would be awesome, i.e. (I don't really know TS/JS well, so pseudocode):
import * as t from 'pytorch'   // hypothetical TS bindings for PyTorch
import { nn } from 'pytorch'

const tensorA: Tensor<[Batch, Seq, Emb]> = t.randn([10, 10, 10]) // initialize tensor
const transformLayer = nn.Einsum([Batch, Seq, Emb], [Emb], [Batch, Seq])
const tensorB: Tensor<[Emb2]> = t.randn([20])
const transformedOutput = transformLayer(tensorA, tensorB) // type error: Emb2 does not match Emb
[0]: https://github.com/pytorch/pytorch/issues/26889

It would be even better if tensor dims from loaded models could be inferred ahead of time in the editor.
When I initially started implementing this I was hung up on similar concerns. For example, in GPT2/PotatoGPT the MLP layer is 4x the width of the residual stream. I went down a rabbit hole of addition and multiplication in TypeScript types (the type system is Turing complete, so it's technically possible!) and after crashing my TS language server a bunch I switched tactics.
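For the curious, the classic trick encodes a number N as a tuple of length N. A minimal sketch of my own (not the PotatoGPT code) that shows both how it works and why it melts down for dimensions like 768:

type BuildTuple<N extends number, T extends unknown[] = []> =
  T['length'] extends N ? T : BuildTuple<N, [...T, unknown]>;

// Addition: concatenate the two tuples and read off the length
type Add<A extends number, B extends number> =
  [...BuildTuple<A>, ...BuildTuple<B>]['length'];

type Seven = Add<3, 4>; // resolves to the literal type 7

Every number costs a tuple with that many elements, so realistic embedding dimensions run straight into the compiler's recursion limits.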
Where I ended up was to use symbolic equivalence, which turned out to be more ergonomic anyway, i.e.
type Multiply<A extends number, B extends number> =
number & { label: `${A} * ${B}` }
const Multiply = <A extends number, B extends number>(a: A, b: B) =>
a * b as Multiply<A, B>;
such that

tensor([
  params.EmbeddingDimensions, // This is a literal with known size
  Multiply(4, params.EmbeddingDimensions)
] as const)

is inferred as Tensor<readonly [768, Multiply<4, 768>]>
Notably, switching to a more symbolic approach makes it easier to type check dimensions that can change at runtime, so something like:

tensor([
  Var(tokens.length, 'Sequence Length'),
  Multiply(4, Var(tokens.length, 'Sequence Length'))
] as const)

infers as

Tensor<readonly [
  Var<'Sequence Length'>,
  Multiply<4, Var<'Sequence Length'>>]>
And you'll get all the same correctness constraints that you would if these were known dimensions.

The downside to this approach is that TypeScript won't know that Multiply<4, Var<'A'>> is equivalent to Multiply<Var<'A'>, 4>, but in practice I haven't found this to be a problem.
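Var isn't defined above; by analogy with Multiply, a hedged guess at its shape (the PotatoGPT source may differ):

type Var<Label extends string> = number & { label: Label };
const Var = <Label extends string>(value: number, label: Label) =>
  value as Var<Label>;

The runtime value is just the plain number; the label exists only at the type level, which is why two Vars with the same label unify during type checking.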
Finally, on more complicated operators/functions that compose dimensions from different variables, TypeScript is also very capable, albeit not the most ergonomic. You can check my code for matrix multiplication, and Seb's writeup for another example of a zip function.
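For a flavor of what those signatures can look like, here's a simplified, self-contained sketch of a shape-checked matrix multiply. This is not the PotatoGPT implementation (the real one also produces the custom "Differing types" error tuple shown at the top); the core trick is just sharing the inner dimension J between both parameters:

type Tensor<Shape extends readonly number[]> = { shape: Shape; data: Float32Array };

declare function multiplyMatrix<
  I extends number,
  J extends number,
  K extends number
>(a: Tensor<readonly [I, J]>, b: Tensor<readonly [J, K]>): Tensor<readonly [I, K]>;

Even this bare version rejects multiplyMatrix(tensorB, tensorA) from the first example, just with a plainer inference error instead of the custom message.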
Heck, if you are doing that, maybe convert to WebGPU automatically as well.
Someone very enterprising might do this in Bun using Zig.
def my_fn(x, **kwargs):
...
return y_1, y_2, y_3
Which is a pain because kwargs could be anything, really, and now every call site has to expect exactly 3 return values while knowing their order; there's no way of adding an extra return value without changing everyone. In TypeScript the same function could look like:

function myFn(x, options = { someOption: 1 }) {
...
return { y_1, y_2, y_3 };
}
Which is so much nicer because everything is typed, with all types inferred automatically! And you don't burden the call sites with values they don't need:

const { y_1 } = myFn(x, { someOption: 1 });
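To make the "all inferred" claim concrete, a self-contained sketch with an invented body (hover in an editor to see the types):

function myFn(x: number, options = { someOption: 1 }) {
  const y_1 = x + options.someOption;
  const y_2 = x * 2;
  const y_3 = `${x}`;
  return { y_1, y_2, y_3 };
}
// Inferred: (x: number, options?: { someOption: number })
//   => { y_1: number; y_2: number; y_3: string }

Nothing needs annotating except the input, and adding a y_4 later breaks no existing call site.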
In Python, everyone mostly passes unbundled arguments through every function, and changing anything involves threading these untyped arguments through a bunch of untyped call sites. It's not the end of the world, but we can do better...

(x + y) * z / 3
vs
x.add(y).mul(z).div(3)
And that’s just a really simple example.
I’m also hopeful that Python’s new variadic generic types will make progress here.
I've written a lot of dataloader and similar code over the years, and the slicing was probably the most important (and most hair-pulling) part for me. I've really debated writing my own wrapper at some point (if it is indeed worth the effort) just to keep my sanity, even if it is at the expense of some speed.
Even just the [:, None] trick replacing unsqueeze is super useful for me.
This way of thinking is not just unhelpful but even harmful. If one would often benefit from these checks while coding, then they should not be relying on a type checker. They should be thinking more, and writing comments is a great way to do that.
This is especially true because many operations on ndarrays / tensors can yield perfectly valid shapes with completely unintended consequences. When comments are written reasonably well they help avoid these difficult-to-debug, correct-output-shape-but-unintended-result mistakes. Not to mention the additional clear benefit of helping one quickly re-understand the tensor manipulations when coming back to the code weeks or months later.
And more generally, if one can get in the habit of writing these comments before the code, it can help push them away from the write-quickly-now-debug-later mentality. I have seen this bite folks many times, both while teaching ugrad + grad courses and while working at large tech companies.