hirist

Senior C++ Developer - AI Inference Engine

K S A INC.
Bangalore
7-20 Years

Posted on: 13/02/2026

Job Description

About the Role:

Build the world's fastest CPU-based AI inference engines. You'll architect C++ libraries that power production-grade LLMs and vision models, squeezing every cycle from modern processors using AVX-512, operator fusion, and NUMA-aware memory systems.

Key Responsibilities:

- Design & implement high-performance implementations of AI model architectures using SIMD intrinsics (AVX2/AVX-512) and processor-specific optimizations

- Build reusable components for the AI Model Library: GEMM kernels, operator fusion, and cache-optimized inference pipelines

- Profile & optimize cache-hierarchy behavior, NUMA-aware memory allocation, and CPU-based inference for sub-millisecond latency

- Write production-grade Modern C++ (C++17/20) with OpenMP parallelization and HPC best practices

- Conduct rigorous code reviews to ensure leak-free, thread-safe implementations

- Debug complex performance issues across multi-socket NUMA systems

- Collaborate with research teams to productionize novel inference techniques

Must-Have Technical Skills:

HIGHLY PRIORITIZED (Screening Criteria):

- AVX intrinsics & SIMD vectorization (AVX2 minimum, AVX-512 preferred)

- Cache hierarchy optimization (L1/L2/L3 prefetching)

- NUMA-aware memory allocation & topology-aware scheduling

- GEMM / blocked matrix multiplication kernels

- CPU inference engines (no GPU dependency)

- Operator fusion & kernel fusion techniques

- High-performance computing (HPC) patterns

CORE REQUIREMENTS:

- Modern C++17/20 (smart pointers, coroutines, concepts)

- OpenMP 5.0+ for multi-threaded parallelism

- Linux performance tools (perf, VTune, flame graphs)

- Memory profiling (Valgrind, sanitizers, jemalloc)

Nice-to-Haves (Interview Boosters):

- oneDNN or Apache TVM inference engine experience

- Intel ISPC or LLVM vectorization expertise

- PyTorch/TensorFlow C++ inference backends

- ARM SVE or RISC-V Vector extension experience

- Real-time systems or embedded AI deployment

Qualifications:

- B.E./B.Tech/M.Tech in CS/EE with 3-7 years of C++ experience

- Proven track record of shipping production C++ AI inference code

- Deep understanding of linear algebra and numerical stability

- Linux power user: fluent with perf, gdb, and strace

- Strong English communication for cross-team collaboration