We seek a highly skilled and experienced Lead Machine Learning Engineer with extensive expertise in multimodal generative AI models, cross-modal architectures, and multimodal fusion techniques. The ideal candidate will have a strong technical background spanning text, vision, audio, and video modalities, and the drive to mentor, educate, and advocate for the adoption of new and emerging technologies.

Qualifications:

- 7+ years of experience in machine learning engineering, with at least 2 years focused on generative AI or multimodal systems.

- Proven experience developing and deploying multimodal generative AI systems, with a deep understanding of architectures that bridge multiple modalities (text-to-image, image-to-text, text-to-video, audio-visual models, etc.).

- Strong expertise in vision models and architectures including diffusion models, vision transformers, and multimodal embeddings.

- Experience with Large Language Models and their integration with visual and audio modalities.

- Experience with multimodal retrieval systems and vector databases.

- Professional experience developing Python libraries for machine-learning applications.

- Strong background in PyTorch, Hugging Face Transformers/Diffusers, and related models and libraries (e.g., Stable Diffusion, OpenAI CLIP, timm, torchaudio, torchvision).

- Demonstrated ability to lead and mentor a team.