I'm seeing a number of test failures on Chromium 113.0.5672.63 (ungoogled-chromium), macOS Ventura 13.3.1: https://pastebin.com/eM6ZA3j2
I'll open a ticket if it helps.
// An empty 3x4 matrix
const tensorA = tensor([3, 4]);
// An empty 4x5 matrix
const tensorB = tensor([4, 5]);

const good = multiplyMatrix(tensorA, tensorB);
//    ^ Inferred type is Tensor<readonly [3, 5]>

const bad = multiplyMatrix(tensorB, tensorA);
//                         ^^^^^^^ Argument of type 'Tensor<readonly [4, 5]>' is not
//                         assignable to parameter of type
//                         '[never, "Differing types", 3 | 5]'. (2345)
I prototyped this for PotatoGPT [1] and some kind stranger on the internet wrote up a more extensive take [2]. You can play with an early version on the TypeScript playground here [3] (uses a Twitter shortlink for brevity).

[1] https://github.com/newhouseb/potatogpt
Some sort of typed 'named tensor' that could be combined with einsum notation at runtime would be awesome, i.e. (I don't really know TS/JS well, so pseudocode):

import * as t from 'pytorch';
import { nn } from 'pytorch';

// Initialize a tensor with named dimensions
const tensorA: Tensor<[Batch, Seq, Emb]> = t.randn([10, 10, 10]);
// An einsum layer typed over named dimensions
const transformLayer = nn.Einsum('(Batch, Seq, Emb), (Emb) -> (Batch, Seq)');
const tensorB: Tensor<[Emb2]> = t.randn([20]);
// Type error: Emb2 does not match Emb
const transformedOutput = transformLayer(tensorA, tensorB);
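Something close to the dimension-name checking (though not the runtime einsum machinery) can already be sketched in today's TypeScript by using string literal types as dimension names. A minimal sketch; NamedTensor and contractLast are hypothetical names, and this covers only a single fixed contraction pattern rather than general einsum:

type Dim = string;

interface NamedTensor<Dims extends readonly Dim[]> {
  dims: Dims;
  data: Float32Array;
}

// Contract the last dimension of `a` against the single dimension of `b`,
// i.e. roughly einsum('...k,k->...', a, b)
declare function contractLast<Rest extends readonly Dim[], K extends Dim>(
  a: NamedTensor<readonly [...Rest, K]>,
  b: NamedTensor<readonly [K]>
): NamedTensor<Rest>;

declare const tensorA: NamedTensor<readonly ['Batch', 'Seq', 'Emb']>;
declare const tensorB: NamedTensor<readonly ['Emb2']>;

// Fails to type-check: 'Emb2' is not 'Emb' (the exact message depends
// on how the compiler resolves the conflicting inferences for K)
const transformedOutput = contractLast(tensorA, tensorB);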
[0]: https://github.com/pytorch/pytorch/issues/26889

This is a dumb question but... are GPUs really that much faster than CPUs specifically at the math functions tested on this page?
xlogy trunc tan/tanh sub square sqrt sin/sinc/silu/sinh sign sigmoid sqrt/rsqrt round relu reciprocal rad2deg pow positive neg mul logaddexp/logaddexp2 log/log1p/log10/log2 ldexp hypot frac floor expm1 exp2 exp div deg2rad cos/cosh copysign ceil atan/atan2 asinh/asin add acosh/acos abs
Those are the types of math GPUs are good at? I thought they were better at a different kind of math, like matrices or something?
https://www.w3.org/TR/WGSL/#floating-point-evaluation
It’s not such a problem for real nets since you avoid those values like the plague. But the tests catch them, and I need to make the tests more tolerant. Thanks for the results!
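For reference, a tolerance-aware comparison along these lines might look like the following (an illustrative sketch, not webgpu-torch's actual test helper):

function allClose(
  actual: Float32Array,
  expected: Float32Array,
  rtol = 1e-3,
  atol = 1e-5
): boolean {
  if (actual.length !== expected.length) return false;
  for (let i = 0; i < actual.length; i++) {
    const a = actual[i];
    const e = expected[i];
    // Treat matching NaNs as equal: WGSL leaves behavior at extreme
    // inputs loosely specified, so bit-exact agreement is too strict
    if (Number.isNaN(a) && Number.isNaN(e)) continue;
    // Infinities must match exactly, including sign
    if (!Number.isFinite(a) || !Number.isFinite(e)) {
      if (a === e) continue;
      return false;
    }
    // Combined absolute + relative tolerance, numpy-style
    if (Math.abs(a - e) > atol + rtol * Math.abs(e)) return false;
  }
  return true;
}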
[1] https://anansi.pages.dev/
[2] https://github.com/infrawhispers/anansi/tree/main/embedds/li...
Privacy-focused semantic search / ML at the edge is looking brighter every day.
This would open up arbitrary math notation in a more flexible way: a custom DSL that is type-safe but still expressive.
Imagine writing stuff like
const result = math`${a} + ${b} / ${c}`
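A minimal sketch of such a `math` tag, assuming plain number operands; the `math` function here is hypothetical, not an existing API. It supports + - * / and parentheses with standard precedence (negative interpolated values are not handled):

function math(strings: TemplateStringsArray, ...values: number[]): number {
  // Splice the interpolated values back into the expression text
  const expr = strings.raw.reduce((acc, s, i) => acc + String(values[i - 1]) + s);
  // Tokenize into numbers and operators, skipping whitespace
  const tokens = expr.match(/\d+(?:\.\d+)?|[+\-*/()]/g) ?? [];
  let pos = 0;

  const primary = (): number => {
    if (tokens[pos] === '(') {
      pos++; // consume '('
      const v = addSub();
      pos++; // consume ')'
      return v;
    }
    return Number(tokens[pos++]);
  };
  const mulDiv = (): number => {
    let v = primary();
    while (tokens[pos] === '*' || tokens[pos] === '/') {
      const op = tokens[pos++];
      v = op === '*' ? v * primary() : v / primary();
    }
    return v;
  };
  const addSub = (): number => {
    let v = mulDiv();
    while (tokens[pos] === '+' || tokens[pos] === '-') {
      const op = tokens[pos++];
      v = op === '+' ? v + mulDiv() : v - mulDiv();
    }
    return v;
  };
  return addSub();
}

const a = 1, b = 6, c = 3;
const result = math`${a} + ${b} / ${c}`; // 3, since '/' binds tighter than '+'

For type safety over tensors rather than plain numbers, the tag's rest parameter could be narrowed to the library's Tensor type, so mixing in unsupported operands fails at compile time.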
How does the performance of webgpu-torch compare to compiling PyTorch to WASM with emscripten and WebGPU?
tfjs benchmarks: Environment > backend > {WASM, WebGL, CPU, WebGPU, tflite} https://tensorflow.github.io/tfjs/e2e/benchmarks/local-bench... src: https://github.com/tensorflow/tfjs/tree/master/e2e/benchmark...
tensorflow/tfjs https://github.com/tensorflow/tfjs
tfjs-backend-wasm https://github.com/tensorflow/tfjs/tree/master/tfjs-backend-...
tfjs-backend-webgpu https://github.com/tensorflow/tfjs/tree/master/tfjs-backend-...
([...], tflite-support, tflite-micro)
From facebookresearch/shumai (a JS tensor library) https://github.com/facebookresearch/shumai/issues/122 :
> It doesn't make sense to support anything besides WebGPU at this point. WASM + SIMD is around 15-20x slower on my machine[1]. Although WebGL is more widely supported today, it doesn't have the compute features needed for efficient modern ML (transformers etc) and will likely be a deprecated backend for other frameworks when WebGPU comes online.
TensorFlow Rust has a Tensor struct: https://tensorflow.github.io/rust/tensorflow/struct.Tensor.h...
"ONNX Runtime merges WebGPU backend" https://github.com/microsoft/onnxruntime https://news.ycombinator.com/item?id=35696031 ... TIL about wonnx: https://github.com/webonnx/wonnx#in-the-browser-using-webgpu...
Apache Arrow has language-portable Tensors for C++: https://arrow.apache.org/docs/cpp/api/tensor.html and Rust: https://docs.rs/arrow/latest/arrow/tensor/struct.Tensor.html and Python: https://arrow.apache.org/docs/python/api/tables.html#tensors https://arrow.apache.org/docs/python/generated/pyarrow.Tenso...
FWIW it looks like the llama.cpp Tensor is from ggml, for which there are CUDA and OpenCL implementations (but not yet ROCm, or a WebGPU shim for use with emscripten transpilation to WASM): https://github.com/ggerganov/llama.cpp/blob/master/ggml.h
Are there recommended ways to cast e.g. Arrow Tensors to PyTorch/TensorFlow?
FWIU, Rust has a better compilation-to-WASM story; and that's probably faster than TensorFlow already compiled to JS/ES plus WebGPU.
What's a fair benchmark? (A minimal timing-harness sketch follows the list below.)
- /? pytorch tensorflow benchmarks webgpu 2023 site:github.com https://www.google.com/search?q=pytorch+tensorflow+benchmark...
- [tfjs benchmarks]
- huggingface/transformers:src/transformers/benchmark https://github.com/huggingface/transformers/tree/main/src/tr...
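As one concrete shape such a comparison could take, here is a minimal cross-backend timing harness using the tf.js backends linked above. Illustrative only; the real benchmark suites control for much more (warm-up policy, transfer/readback costs, kernel coverage, a range of tensor sizes):

import * as tf from '@tensorflow/tfjs';
import '@tensorflow/tfjs-backend-wasm';
import '@tensorflow/tfjs-backend-webgpu';

async function timeMatMul(backend: string, n = 1024, iters = 50): Promise<number> {
  await tf.setBackend(backend);
  await tf.ready();
  const a = tf.randomNormal([n, n]);
  const b = tf.randomNormal([n, n]);
  // Warm up once so shader/module compilation isn't counted
  const warm = tf.matMul(a, b);
  await warm.data();
  warm.dispose();
  const start = performance.now();
  for (let i = 0; i < iters; i++) {
    const c = tf.matMul(a, b);
    await c.data(); // force the async backend to finish (includes readback)
    c.dispose();
  }
  const ms = (performance.now() - start) / iters;
  a.dispose();
  b.dispose();
  return ms;
}

async function main() {
  for (const backend of ['wasm', 'webgpu']) {
    const ms = await timeMatMul(backend);
    console.log(`${backend}: ${ms.toFixed(1)} ms per 1024x1024 matmul`);
  }
}
main();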
The absolute golden benchmarks are https://github.com/pytorch/benchmark. They are a diverse set of userland code taken from GitHub as-is and made into benchmarks.