zlacker

[return to "PyTorch for WebGPU"]
1. newhou+Zd[view] [source] 2023-05-19 22:28:31
>>mighdo+(OP)
I'm excited about this for probably different reasons than most: I think Typescript could be a more ergonomic way to develop ML models than Python, because you can automatically infer and check tensor dimensions while you are writing code! Compare this to the mess of comments you usually see in pytorch code telling you that x has shape [x, y, z].

  // An empty 3x4 matrix
  const tensorA = tensor([3, 4])
  
  // An empty 4x5 matrix
  const tensorB = tensor([4, 5])

  const good = multiplyMatrix(tensorA, tensorB);
        ^
        Inferred type is Tensor<readonly [3, 5]>
  
  const bad = multiplyMatrix(tensorB, tensorA);
                             ^^^^^^^
                             Argument of type 'Tensor<readonly [4, 5]>' is not 
                             assignable to parameter of type '[never, "Differing 
                             types", 3 | 5]'.(2345)
I prototyped this for PotatoGPT [1], and some kind stranger on the internet wrote up a more extensive take [2]. You can play with an early version on the Typescript playground here [3] (it uses a twitter shortlink for brevity).

[1] https://github.com/newhouseb/potatogpt

[2] https://sebinsua.com/type-safe-tensors

[3] https://t.co/gUzzTl4AAN

2. whimsi+Nh[view] [source] 2023-05-19 22:53:13
>>newhou+Zd
That work looks really interesting! I am also excited about type safety when it comes to tensors. My understanding was that this type-safe approach to tensor shapes had run into trouble because it is difficult (maybe impossible?) to reason about the output shapes of some common operators at compile time. But perhaps those operators are not really necessary. [0]

Some sort of typed 'named tensor' that could be combined with einsum notation at runtime would be awesome, e.g. (I don't really know TS/JS well, so this is pseudocode):

  import * as t from 'pytorch'
  import { nn } from 'pytorch'

  const tensorA: Tensor<[Batch, Seq, Emb]> = t.randn([10, 10, 10]) // initialize tensor
  const transformLayer = nn.Einsum('(Batch, Seq, Emb),(Emb) -> (Batch, Seq)')

  const tensorB: Tensor<[Emb2]> = t.randn([20])

  const transformedOutput = transformLayer(tensorA, tensorB) // type error: Emb2 does not match Emb
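
In (very rough) actual Typescript, branded dimension names might look something like this (hypothetical Dim/Tensor/contract types, not a real pytorch API):

  // Hypothetical sketch, not a real pytorch API: dimension names become
  // branded number types, so a mismatched name fails at compile time.
  type Dim<Name extends string> = number & { dim: Name };
  type Batch = Dim<'Batch'>;
  type Seq = Dim<'Seq'>;
  type Emb = Dim<'Emb'>;
  type Emb2 = Dim<'Emb2'>;

  type Tensor<Shape extends readonly number[]> = { shape: Shape };

  // the signature of '(Batch, Seq, Emb),(Emb) -> (Batch, Seq)'
  declare function contract<B extends number, S extends number, E extends number>(
    a: Tensor<readonly [B, S, E]>,
    b: Tensor<readonly [E]>
  ): Tensor<readonly [B, S]>;

  declare const tensorA: Tensor<readonly [Batch, Seq, Emb]>;
  declare const embOk: Tensor<readonly [Emb]>;
  declare const embBad: Tensor<readonly [Emb2]>;

  const ok = contract(tensorA, embOk);   // Tensor<readonly [Batch, Seq]>
  // @ts-expect-error 'Emb2' does not match 'Emb'
  const bad = contract(tensorA, embBad);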

[0]: https://github.com/pytorch/pytorch/issues/26889
3. newhou+Dk[view] [source] 2023-05-19 23:18:16
>>whimsi+Nh
This is a great thread, thanks! Somehow I missed it when looking for prior art.

When I initially started implementing this I was hung up on similar concerns. For example, in GPT2/PotatoGPT the MLP layer is 4x the width of the residual stream. I went down a rabbit hole of addition and multiplication in Typescript types (the type system is Turing complete, so it's technically possible!) and, after crashing my TS language server a bunch, I switched tactics.

Where I ended up was to use symbolic equivalence, which turned out to be more ergonomic anyway, i.e.

  type Multiply<A extends number, B extends number> = 
    number & { label: `${A} * ${B}` }
  const Multiply = <A extends number, B extends number>(a: A, b: B) => 
    a * b as Multiply<A, B>;
such that

  tensor([
    params.EmbeddingDimensions, // This is a literal with known size
    Multiply(4, params.EmbeddingDimensions)] as const)
is inferred as

  Tensor<readonly [768, Multiply<4, 768>]>
Notably, switching to a more symbolic approach makes it easier to type-check dimensions that can change at runtime, so something like:

  tensor([Var(tokens.length, 'Sequence Length'), 
          Multiply(4, Var(tokens.length, 'Sequence Length'))])
infers as

  Tensor<readonly [
     Var<'Sequence Length'>, 
     Multiply<4, Var<'Sequence Length'>>]> 
And you'll get all the same correctness constraints that you would if these were known dimensions.

The downside to this approach is that Typescript won't know that Multiply<4, Var<'A'>> is equivalent to Multiply<Var<'A'>, 4>, but in practice I haven't found this to be a problem.

Finally, for more complicated operators/functions that compose dimensions from different variables, Typescript is also very capable, albeit not the most ergonomic. You can check my code for matrix multiplication, and Seb's writeup for another example (a zip function).
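
For a flavor, here's a simplified sketch of a matmul signature in this style (not the actual PotatoGPT code, which also produces the friendlier error messages shown upthread):

  // Simplified sketch, not the actual PotatoGPT code: one type parameter
  // pins the shared inner dimension, and the output shape composes the
  // outer dimensions of both arguments.
  type Tensor<S extends readonly number[]> = { shape: S };

  declare function multiplyMatrix<M extends number, K extends number, N extends number>(
    a: Tensor<readonly [M, K]>,
    b: Tensor<readonly [K, N]>
  ): Tensor<readonly [M, N]>;

  declare const a: Tensor<readonly [3, 4]>;
  declare const b: Tensor<readonly [4, 5]>;

  const good = multiplyMatrix(a, b); // Tensor<readonly [3, 5]>
  // @ts-expect-error inner dimensions don't line up
  const bad = multiplyMatrix(b, a);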

4. t-vi+zi2[view] [source] 2023-05-20 20:48:07
>>newhou+Dk
Out of curiosity, how do you handle things where the output shape is input-dependent (as opposed to dependent only on the input shapes)? This ranges from `torch.sum(tensor, dim)`, where `dim` might be nonconstant, to `torch.nonzero(x)`, and of course advanced indexing.
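
For example, a toy nonzero makes the difficulty concrete: the output length depends on the data itself, so at best a type system could assign it an opaque symbolic dimension:

  // Toy illustration, not pytorch code: the result length depends on the
  // values in x, so no compile-time shape can describe it exactly.
  const nonzero = (x: readonly number[]): number[] =>
    x.flatMap((v, i) => (v !== 0 ? [i] : []));

  nonzero([0, 3, 0, 7]); // [1, 3]       -> length 2
  nonzero([1, 1, 1, 1]); // [0, 1, 2, 3] -> length 4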