Role: GPU Performance Software Development Engineer
Location: San Jose, CA (see posting for details)
We’re building Wave, a high-performance GPU programming language and compiler for machine learning workloads. Wave combines a Python-embedded DSL with an MLIR-based compiler stack to give engineers explicit control over GPU kernel performance.
We’re looking for a low-level GPU performance engineer to own end-to-end performance of Wave’s kernels.
You’ll work on
• Hand-tuned GPU kernels (GEMM, Attention, MoE, decoding)
• Instruction-level optimization: memory, registers, scheduling, wave/warp behavior
• HIP/CUDA, GPU intrinsics (MFMA/MMA), and inline assembly • MLIR dialects, lowering pipelines, and performance-critical compiler passes
• Profiling via disassembly, counters, rocprof/Nsight
Wave codebase: https://github.com/iree-org/wave
Job posting: https://careers.amd.com/careers-home/jobs/77039?lang=en-us