Posted on: 13/02/2026
About the Role :
Build the world's fastest CPU-based AI inference engines. You'll architect C++ libraries that power production-grade LLMs and vision models, squeezing every cycle from modern processors using AVX-512, operator fusion, and NUMA-aware memory systems.
Key Responsibilities :
- Design & implement high-performance AI model architectures using SIMD intrinsics (AVX2/AVX-512) and processor-specific optimizations
- Build reusable components for the AI Model Library - GEMM kernels, operator fusion, cache-optimized inference pipelines
- Profile & optimize cache-hierarchy usage, NUMA-aware memory allocation, and CPU-based inference for sub-millisecond latency
- Write production-grade Modern C++ (C++17/20) with OpenMP parallelization and HPC best practices
- Conduct rigorous code reviews to ensure leak-free, thread-safe implementations
- Debug complex performance issues across multi-socket NUMA systems
- Collaborate with research teams to productionize novel inference techniques
Must-Have Technical Skills :
HIGHLY PRIORITIZED (Screening Criteria) :
- AVX intrinsics & SIMD vectorization (AVX2 minimum, AVX-512 preferred)
- Cache hierarchy optimization (L1/L2/L3 prefetching)
- NUMA-aware memory allocation & topology-aware scheduling
- GEMM / blocked matrix multiplication kernels
- CPU inference engines (no GPU dependency)
- Operator fusion & kernel fusion techniques
- High-performance computing (HPC) patterns
CORE REQUIREMENTS :
- Modern C++17/20 (smart pointers, coroutines, concepts)
- OpenMP 5.0+ for multi-threaded parallelism
- Linux performance tools (perf, VTune, flamegraphs)
- Memory profiling (Valgrind, sanitizers, jemalloc)
Nice-to-Haves (Interview Boosters) :
- oneDNN or Apache TVM inference engine experience
- Intel ISPC or LLVM vectorization expertise
- PyTorch/TensorFlow C++ inference backends
- ARM SVE or RISC-V vector experience
- Real-time systems or embedded AI deployment
Qualifications :
- B.E./B.Tech/M.Tech in CS/EE with 3-7 years of C++ experience
- Proven track record of shipping production C++ AI inference code
- Deep understanding of Linear Algebra and numerical stability
- Linux power user - fluent with perf, gdb, and strace
- Strong English communication for cross-team collaboration