Posted on: 08/09/2025
Job Title : Computer Vision Research Engineer
Location : Noida
Experience : 2-5 Years
Qualification : B.Tech
Job description
Core Responsibilities:
- Architect and develop novel Deep Learning (DL) and Machine Learning (ML) neural network architectures, including those for generative AI.
- Design, develop, and refine Vision-Language Model (VLM) architectures, and integrate them with generative AI techniques for advanced multimodal understanding and generation.
- Spearhead the New Product Development (NPD) process, delivering innovative, differentiated products/solutions that provide distinct consumer advantages and exceed market benchmarks.
- Engineer and implement advanced computer vision and deep learning algorithms to resolve intricate scene understanding challenges and their multimodal interpretations.
- Evaluate experimental visual and contextual data to identify and implement performance improvements.
- Facilitate the transition of algorithms into operational prototypes and/or real-time system demonstrations.
- Build and manage a cohesive team, and ensure seamless integration with related functions.
Job Requirements :
- 2-5 years of experience in computer vision, Generative AI, Vision Language Models (VLM), multimodal systems, and image processing.
- A strong background in designing image understanding, multimodal, and cross-domain learning algorithms.
- Hands-on experience with deployable, real-time Vision Language Models (VLMs).
- Practical experience with acceleration frameworks such as NVIDIA TensorRT, DeepStream, and Intel OpenVINO is preferred.
- Proficiency in programming languages such as Python or C++.
- Experience with Docker and Kubernetes.
- Knowledge of pattern recognition and Edge AI product development.
- Ability to review code, design architectures, and document core components.
- Familiarity with research and development practices and industry standards.
- Preferred : Hands-on experience with Natural Language Processing (NLP) and the integration of visual and textual signals for multimodal systems.