Posted on: 13/08/2025
Responsibilities:
- Design and implement deep learning models for image, video, and audio classification and for object detection.
- Apply and fine-tune CNN-based architectures and Vision Transformers (e.g., ViT, Swin).
- Integrate attention mechanisms (e.g., SE, CBAM, Transformer attention) into model architectures for enhanced feature learning.
- Utilize pretrained models for transfer learning and multi-task learning.
- Work with video data using spatio-temporal modeling techniques (e.g., 3D CNNs, temporal attention).
- Extract and process features from audio using spectrograms, MFCCs, or learned embeddings.
- Evaluate and optimize models for speed, accuracy, and robustness.
- Collaborate across teams to deploy models into production.
Requirements:
- Strong programming skills in Python, with experience in PyTorch or TensorFlow.
- Hands-on experience with CNNs, pretrained networks, and attention modules.
- Solid knowledge of Vision Transformers, including recent architectures (e.g., Swin, DeiT).
- Understanding of attention mechanisms (self-attention, cross-attention, squeeze-and-excitation, etc.).
- Experience implementing and training object detection models (YOLO, SSD, Faster R-CNN, RetinaNet, DETR).
- Experience in video analysis and temporal modeling.
- Strong grasp of audio classification workflows and features.
- Experience handling large-scale datasets and designing data pipelines.