What You'll Own:

- Build and extend backend services that power AI-driven media search and metadata enrichment

- Develop, integrate, and deploy AI/ML inference pipelines (embeddings, vision/audio models, transcription, background removal, etc.)

- Fine-tune and optimize computer vision and generative models (e.g., U2Net, BiRefNet, CLIP, Whisper, YOLO, diffusion models)

- Work with large datasets (100k to 5M images): preprocessing, augmenting, and structuring for training/inference

- Help build pipelines for tasks like background removal, inpainting/outpainting, banner generation, logo/face detection, and multimodal embeddings

- Integrate with vector databases (e.g., FAISS, Pinecone, Weaviate, Qdrant) for similarity and semantic search

- Collaborate with the engineering team to deploy scalable AI inference endpoints (Docker + GPU/EC2/SageMaker)

Skills & Experience We Expect:

- Core Python (Required): Solid programming and debugging skills in production systems

- AI/ML Libraries: Hands-on experience with PyTorch and/or TensorFlow, NumPy, OpenCV, and Hugging Face Transformers

- Model Training/Fine-Tuning: Experience fine-tuning pre-trained models for vision, audio, or multimodal tasks

- Data Handling: Preprocessing and augmenting image/video datasets for training and evaluation

- Comfortable chaining or orchestrating multimodal inference workflows (e.g., combining image, audio, and OCR signals into a unified embedding)

Bonus Points If You:

- Have worked with generative models (diffusion, inpainting, or outpainting)

- Understand large-scale media workflows (video, design files, time-coded metadata)

- Enjoy experimenting with new models and pushing them into production

- Care about making AI useful in real-world creative pipelines

- Are familiar with vector search using FAISS, Pinecone, or similar tools for embeddings-based search