Posted on: 13/01/2026
Description:
- Design, build, and maintain end-to-end ML pipelines for batch processing of satellite and aerial imagery.
- Deploy and scale ML models in production on AWS infrastructure, leveraging services like SageMaker, Bedrock, or custom-built solutions.
- Implement MLflow for experiment tracking, model versioning, and model registry management.
- Architect batch inference systems optimised for throughput and cost-efficiency.
- Work with geospatial data formats and coordinate reference systems.
- Collaborate with data scientists to transition models from research to production.
- Partner with platform engineering to build scalable compute, GPU clusters, and storage layers.
- Implement comprehensive model monitoring systems to track performance degradation and data drift.
- Design and execute canary deployments and A/B testing frameworks for safe model rollouts.
- Build active learning pipelines to continuously improve model performance with minimal labelling effort.
- Establish model evaluation frameworks and performance benchmarking processes.
- Create alerting and observability systems for production ML workloads.
- Mentor ML engineers and data scientists on best practices for production ML.
- Drive technical decision-making on ML infrastructure and tooling.
- Collaborate with platform and data engineering teams to optimise the ML stack.
- Establish engineering standards and contribute to architectural roadmaps.
Requirements:
- 5+ years of experience in machine learning engineering with 2+ years in a senior or lead capacity.
- Proven track record of deploying and maintaining ML systems in production using AWS services (SageMaker, Lambda, ECS/EKS, S3, etc.).
- Strong hands-on experience with tools such as MLflow, Weights & Biases (W&B), or similar for experiment tracking and model management.
- Deep expertise in image segmentation and computer vision techniques using frameworks like PyTorch or TensorFlow.
- Production experience with ensemble models (XGBoost, LightGBM, random forests) and strong MLOps expertise.
- Experience implementing model monitoring, drift detection, and alerting systems.
- Hands-on experience with canary deployments, A/B testing, and shadow deployments for ML models.
- Knowledge of active learning strategies and human-in-the-loop ML systems.
- Strong understanding of model evaluation metrics, bias detection, and performance analysis.
- Expert-level Python programming with ML libraries (scikit-learn, PyTorch/TensorFlow, NumPy, pandas, etc.).
- Experience with distributed batch processing frameworks (Airflow, Step Functions, Argo Workflows, Spark, Dask, Ray or similar).
- Proficiency with the AWS ML ecosystem and infrastructure-as-code (Terraform, CloudFormation).
- Hands-on experience with dataset versioning tools such as DVC, LakeFS, Delta Lake, Quilt, or Pachyderm.
- Strong software engineering fundamentals: unit/integration testing, CI/CD, version control, observability, and design patterns.
- Experience with containerization (Docker, Kubernetes) for model deployment.
- Good to have: experience with ML orchestration tools such as Kubeflow or Vertex AI.
- Nice to have: experience with GPUs, including scheduling GPU jobs, optimising GPU performance, and memory profiling.