Some sort of typed 'named tensor' that could be combined with einsum notation at runtime would be awesome, ie. (don't really know TS/JS well but pseudocode)
import * as t from 'pytorch'
import * as nn from 'pytorch/nn'
const tensorA: Tensor[Batch, Seq, Emb] = t.randn([10,10,10]) // initialize tensor
const transformLayer = nn.Einsum('(Batch, Seq, Emb), (Emb) -> (Batch, Seq)')
const tensorB: Tensor[Emb2] = t.randn([20])
const transformedOutput = transformLayer(tensorA, tensorB) // type error: Emb2 does not match Emb
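For what it's worth, a good chunk of that type error is already expressible in today's TypeScript with branded dimension types. A rough, self-contained sketch (`dim`, `randn`, and `contract` are made-up names for illustration, not a real PyTorch binding):

```typescript
// Hypothetical sketch: dimension names live as phantom types on the shape tuple.
type Dim<Name extends string> = number & { readonly dimName: Name };
type Tensor<Shape extends readonly Dim<string>[]> = { shape: Shape };

// 'name' exists only so the literal type can be captured; the runtime value is just the size.
const dim = <Name extends string>(name: Name, size: number) => size as Dim<Name>;

const randn = <S extends readonly Dim<string>[]>(shape: S): Tensor<S> => ({ shape });

// Contraction over the last axis: the contracted dimension E must be the same
// named type in both operands.
const contract = <B extends Dim<string>, S extends Dim<string>, E extends Dim<string>>(
  a: Tensor<readonly [B, S, E]>,
  b: Tensor<readonly [E]>
): Tensor<readonly [B, S]> => ({ shape: [a.shape[0], a.shape[1]] as const });

const tensorA = randn([dim('Batch', 10), dim('Seq', 10), dim('Emb', 10)] as const);
const tensorB = randn([dim('Emb', 10)] as const);
const out = contract(tensorA, tensorB); // Tensor<readonly [Dim<'Batch'>, Dim<'Seq'>]>
// contract(tensorA, randn([dim('Emb2', 20)] as const)); // type error: 'Emb2' ≠ 'Emb'
```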
[0]: https://github.com/pytorch/pytorch/issues/26889

When I initially started implementing this I was hung up on similar concerns. For example, in GPT2/PotatoGPT the MLP layer is 4x the width of the residual stream. I went down a rabbit hole of addition and multiplication in TypeScript types (the type system is Turing complete, so it's technically possible!) and, after crashing my TS language server a bunch, I switched tactics.
Where I ended up was to use symbolic equivalence, which turned out to be more ergonomic anyway, i.e.
type Multiply<A extends number, B extends number> =
  number & { label: `${A} * ${B}` }

const Multiply = <A extends number, B extends number>(a: A, b: B) =>
  (a * b) as Multiply<A, B>;
such that

tensor([
  params.EmbeddingDimensions, // This is a literal with known size
  Multiply(4, params.EmbeddingDimensions)
] as const)

is inferred as Tensor<readonly [768, Multiply<4, 768>]>
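The `tensor` helper itself just needs to capture the shape tuple's literal type. A minimal, self-contained version of how that could look (this is a sketch, not PotatoGPT's actual signature; `Multiply` is restated so the snippet compiles on its own):

```typescript
// Restated from above so this snippet stands alone.
type Multiply<A extends number, B extends number> =
  number & { label: `${A} * ${B}` };
const Multiply = <A extends number, B extends number>(a: A, b: B) =>
  (a * b) as Multiply<A, B>;

type Shape = readonly number[];
type Tensor<S extends Shape> = { shape: S; data: Float64Array };

// Capturing S as a generic parameter is all it takes for the literal
// tuple type (including the Multiply brand) to flow through.
const tensor = <S extends Shape>(shape: S): Tensor<S> => ({
  shape,
  data: new Float64Array(shape.reduce((a, b) => a * b, 1)),
});

const t = tensor([768, Multiply(4, 768)] as const);
// t is inferred as Tensor<readonly [768, Multiply<4, 768>]>
```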
Notably, switching to a more symbolic approach makes it easier to type-check dimensions that can change at runtime, so something like:

tensor([
  Var(tokens.length, 'Sequence Length'),
  Multiply(4, Var(tokens.length, 'Sequence Length'))
] as const)

infers as

Tensor<readonly [
  Var<'Sequence Length'>,
  Multiply<4, Var<'Sequence Length'>>
]>
And you'll get all the same correctness constraints that you would if these were known dimensions.

The downside to this approach is that TypeScript won't know that Multiply<4, Var<'A'>> is equivalent to Multiply<Var<'A'>, 4>, but in practice I haven't found this to be a problem.
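Var isn't spelled out above; a minimal definition in the same type/value-pair style as Multiply might look like this (a sketch, not the actual PotatoGPT code):

```typescript
// Hypothetical Var: brands a runtime value with a symbolic name.
// The brand records only the name, not the size, so two Vars with the
// same name type-check as the same dimension whatever their runtime value.
type Var<Name extends string> = number & { label: Name };
const Var = <Name extends string>(value: number, name: Name) =>
  value as Var<Name>;

const tokens = ['a', 'b', 'c'];
const seqLen = Var(tokens.length, 'Sequence Length'); // Var<'Sequence Length'>
const same: Var<'Sequence Length'> = seqLen;          // OK: names unify
// const other: Var<'Batch'> = seqLen;                // type error: names differ
```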
Finally, on more complicated operators/functions that compose dimensions from different variables, TypeScript is also very capable, albeit not the most ergonomic. You can check my code for matrix multiplication, and Seb's writeup for another example (a zip function).
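To give a flavour of what composing dimensions from two operands looks like, here is a shape-checked matmul signature in the same style (my own sketch, not the code referenced above): the inner dimension K must be the same type in both arguments, and the result's shape is composed from both.

```typescript
type Shape = readonly number[];
type Tensor<S extends Shape> = { shape: S; data: Float64Array };

const tensor = <S extends Shape>(shape: S): Tensor<S> => ({
  shape,
  data: new Float64Array(shape.reduce((a, b) => a * b, 1)),
});

// [M, K] x [K, N] -> [M, N]; mismatched inner dimensions fail at compile time.
const matmul = <M extends number, K extends number, N extends number>(
  a: Tensor<readonly [M, K]>,
  b: Tensor<readonly [K, N]>
): Tensor<readonly [M, N]> => {
  const [m, k] = a.shape;
  const n = b.shape[1];
  const out = tensor([m, n] as const);
  for (let i = 0; i < m; i++) {
    for (let j = 0; j < n; j++) {
      let s = 0;
      for (let p = 0; p < k; p++) s += a.data[i * k + p] * b.data[p * n + j];
      out.data[i * n + j] = s;
    }
  }
  return out;
};

const a = tensor([2, 3] as const);
const b = tensor([3, 4] as const);
const c = matmul(a, b); // Tensor<readonly [2, 4]>
// matmul(tensor([2, 3] as const), tensor([4, 4] as const)); // type error: 3 ≠ 4
```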