
CynLr - Software Research Engineer - GPU

Posted on: 05/02/2026

Job Description

Title : SW Research Engineer - GPU performance

Location : Bengaluru, Karnataka, India

About CynLr & Role :


Just like a baby's brain, the CynLr Visual Intelligence stack enables robots to instinctively see & pick any object under any ambience, without any training. (a demo video link).


Today, a robot that can fit a screw into a nut without slipping a thread doesn't exist.

Imagine what it would take for a robot to assemble a smartphone or a car by putting together 1000s of parts with varied shapes and weights, all in random orientations. That is why factories become complex, needing heavy customization of their environment.

CynLr-enabled visual robots intuitively learn to handle even unknown objects, on the fly - eliminating the need for customization and offering a universal alternative to custom machines, simplifying factory lines into modular LEGO blocks of micro-factories.

CynLr's Vision & ML stacks are largely neuroscience-inspired, and the company builds its HW, sensors & ML blocks from scratch.

As a GPU developer, you will be responsible for translating the visual pathway of the brain into performance-optimized GPU code blocks, assisting in building ML & RL learning models, and building mathematical models that are better represented on the GPU.

Key Responsibilities :


- Translate the neuroscience models of the brain, identified & designed by the Algo team, into bio-mimicking GPU code.

- Design and optimize foundational neural networks and model neurons from the ground up (essentially, optimize the mathematical models that involve time-weighted kernels); a sketch follows this list.

- Optimize time-continuous kernels - not just the high-level kernel optimizations that ship with CUDA.

- Design the framework for pipelined image processing & identify the performance bottlenecks in the image-processing pipeline.

- Perform CUDA core-level optimization to achieve maximum performance for pipelined processing between multiple blocks of functions executing simultaneously.

- Dynamically load-balance between kernels and functions.

- Interleave processing between CPU and GPU, and modify the GPU processing control flow from the CPU at runtime.

- Build with NVIDIA GPUDirect to access memory directly from peripheral devices (PCIe), display and USB, bypassing the CPU when acquiring images from camera hardware.

- Construct direct visualization of GPU memory for debugging, without CPU transfer.
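
To make the time-weighted-kernel and pipelining bullets above concrete, here is a minimal, hypothetical CUDA sketch (not CynLr's actual code; every name, constant and buffer layout is an assumption for illustration only). It shows a leaky-integrator neuron update - one simple instance of a time-weighted kernel - launched across CUDA streams so that copies and compute for successive camera frames overlap rather than serialize.

// Illustrative only. A leaky-integrator update is one example of a
// time-weighted kernel: each pixel's state decays with time constant tau
// and integrates the incoming frame sample.
#include <cuda_runtime.h>

__global__ void leakyIntegrate(const float* frame, float* state,
                               int numPixels, float dt, float tau)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= numPixels) return;

    float decay = expf(-dt / tau);               // exponential time weighting
    state[i] = decay * state[i] + (1.0f - decay) * frame[i];
}

// Pipelined execution: frames are round-robined over CUDA streams so that
// host-to-device copies, kernel work and device-to-host copies of successive
// frames overlap. Host buffers must be pinned (cudaMallocHost) for the
// asynchronous copies to actually overlap with compute.
void processFrames(float* const hostFrames[], float* const hostOut[],
                   float* devFrame[], float* devState[],
                   size_t frameBytes, int numPixels, int numFrames)
{
    const int kStreams = 3;
    cudaStream_t streams[kStreams];
    for (int s = 0; s < kStreams; ++s) cudaStreamCreate(&streams[s]);

    dim3 block(256);
    dim3 grid((numPixels + block.x - 1) / block.x);

    for (int f = 0; f < numFrames; ++f) {
        int s = f % kStreams;  // for illustration, each stream keeps its own state buffer
        cudaMemcpyAsync(devFrame[s], hostFrames[f], frameBytes,
                        cudaMemcpyHostToDevice, streams[s]);
        leakyIntegrate<<<grid, block, 0, streams[s]>>>(
            devFrame[s], devState[s], numPixels, 1.0f / 60.0f, 0.1f);
        cudaMemcpyAsync(hostOut[f], devState[s], frameBytes,
                        cudaMemcpyDeviceToHost, streams[s]);
    }
    cudaDeviceSynchronize();
    for (int s = 0; s < kStreams; ++s) cudaStreamDestroy(streams[s]);
}

A production design would treat the temporal state more carefully than the independent per-stream buffers used here; the sketch is only meant to show how time-weighted updates and stream-level pipelining fit together.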

Team Structure :

The engineering team will comprise the Algo Team, GPU Team, SW Dev Team & HW Team. Members will be passive members of every team apart from the one they lead. The Algo Team will research neuroscience and provide the neural models & vision algorithms, the GPU Team will provide the GPU optimizations for those algorithms, the HW Team will provide the HW integration, and the SW Team will translate GPU-optimized algorithms into SW blocks.


Each team will split the implementation among the other teams and guide them through it. Every team member will be a passive member of all other teams.

Requirements in Practice :


- Experienced with the low-level CUDA API.

- Strong fundamentals in C/C++.

- Adept with the Visual Studio developer toolchain.

- Experience in low-level performance analysis and optimization, with a strong understanding of GPU HW architecture and HW-oriented performance optimization.

- This includes proficiency with GPU profiling tools such as NVIDIA Visual Profiler and NVIDIA Nsight Compute, and with graphics developer tools for debugging.

- Exposure to Omniverse is a plus.

Must have an understanding of :

- GPU-based application development and knowledge of CUDA (expert-level mastery is not necessary)

- State machine architecture

- Realtime computing

- Memory architectures and optimizations.

- MIMD, SIMD

Good to have experience and practice with :


- How compilers work, and compiler construction.

- CPU architectures - x86, x64 & ARM

- Hardware-associated driver development.

- OS and layers (Board Support Packages, BIOS, UEFI, BootLoader)

- UI-based deployable application development

What will you do?

Put simply - you will think through every algorithm that the Neuroscience team comes up with in terms of the GPU, for maximum performance. You will break down the entire processing pipeline that imitates the visual pathway into optimized blocks and kernels of GPU processing. You will meticulously discover the mathematical models that give the maximum timing performance for every neural model/algorithm that the Vision and Neuro teams come up with.

You will also be building some of the debugging, profiling and image-visualization tools for the GPU.
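
As a rough, hypothetical sketch of what "direct visualization of GPU memory without CPU transfer" can look like (an assumption about the approach, not CynLr's implementation), the snippet below uses CUDA-OpenGL interop: a CUDA kernel writes a grey-scale debug view straight into an OpenGL pixel buffer object, which can then be uploaded to a texture entirely on the GPU. It assumes an existing OpenGL context, a GL loader such as GLEW, and a pre-created pixel buffer object; all names are illustrative.

#include <GL/glew.h>           // or any other GL loader providing buffer-object entry points
#include <cuda_gl_interop.h>
#include <cuda_runtime.h>

// Convert a float image in GPU memory into RGBA debug pixels, in place on the GPU.
__global__ void visualizeBuffer(const float* devImage, uchar4* pixels, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    unsigned char v = (unsigned char)(255.0f * fminf(fmaxf(devImage[i], 0.0f), 1.0f));
    pixels[i] = make_uchar4(v, v, v, 255);       // grey-scale debug view
}

void drawGpuImage(GLuint pbo, const float* devImage, int width, int height)
{
    // In practice the buffer would be registered once at startup, not every frame.
    cudaGraphicsResource* res = nullptr;
    cudaGraphicsGLRegisterBuffer(&res, pbo, cudaGraphicsRegisterFlagsWriteDiscard);

    cudaGraphicsMapResources(1, &res);
    uchar4* pixels = nullptr;
    size_t bytes = 0;
    cudaGraphicsResourceGetMappedPointer((void**)&pixels, &bytes, res);

    int n = width * height;
    visualizeBuffer<<<(n + 255) / 256, 256>>>(devImage, pixels, n);

    cudaGraphicsUnmapResources(1, &res);
    cudaGraphicsUnregisterResource(res);

    // The PBO now holds the image; hand it to the usual GL path
    // (e.g. glTexSubImage2D sourcing from the bound PBO) - still no CPU copy.
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
    // ... glTexSubImage2D + draw a textured quad ...
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
}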

How will you do it?

Though confined to a focused area, the work is very much expected to be entrepreneurial, with the exact advantages and difficulties of a startup. Since this is a startup and the product will evolve at a rapid pace, the contours of the technology, toolchain & process will have to constantly adapt to change. You will also actively contribute to building the process & toolchain. You will have complete freedom here, but you will be subject to reviews.

For Senior roles :

Part of your design effort involves requirements discovery and developing architectures that are agnostic to requirement changes. The SW part of the product evolves significantly according to your thought process and will carry your signature.


You will also be building a team, as the product evolves, to maintain and develop it further.
