Description:

- Design, build, and maintain end-to-end ML pipelines for batch processing of satellite and aerial imagery.

- Deploy and scale ML models in production on AWS infrastructure, leveraging services like SageMaker, Bedrock, or custom-built solutions.

- Implement MLflow for experiment tracking, model versioning, and model registry management.

- Architect batch inference systems optimised for throughput and cost-efficiency.

- Work with geospatial data formats and coordinate reference systems.

- Collaborate with data scientists to transition models from research to production.

- Partner with platform engineering to build scalable compute, GPU clusters, and storage layers.

- Implement comprehensive model monitoring systems to track performance degradation and data drift.

- Design and execute canary deployments and A/B testing frameworks for safe model rollouts.

- Build active learning pipelines to continuously improve model performance with minimal labelling effort.

- Establish model evaluation frameworks and performance benchmarking processes.

- Create alerting and observability systems for production ML workloads.

- Mentor ML engineers and data scientists on best practices for production ML.

- Drive technical decision-making on ML infrastructure and tooling.

- Collaborate with platform and data engineering teams to optimise the ML stack.

- Establish engineering standards and contribute to architectural roadmaps.

Requirements:

- 5+ years of experience in machine learning engineering, including 2+ years in a senior or lead capacity.

- Proven track record of deploying and maintaining ML systems in production using AWS services (SageMaker, Lambda, ECS/EKS, S3, etc.).

- Strong hands-on experience with tools like MLflow, Weights & Biases, or similar for experiment tracking and model management.

- Deep expertise in image segmentation and computer vision techniques using frameworks like PyTorch or TensorFlow.

- Production experience with ensemble models (XGBoost, LightGBM, random forests) and expertise in ML operations.

- Experience implementing model monitoring, drift detection, and alerting systems.

- Hands-on experience with canary deployments, A/B testing, and shadow deployments for ML models.

- Knowledge of active learning strategies and human-in-the-loop ML systems.

- Strong understanding of model evaluation metrics, bias detection, and performance analysis.

- Expert-level Python programming with ML libraries (scikit-learn, PyTorch/TensorFlow, NumPy, pandas, etc.).

- Experience with distributed batch processing and workflow orchestration frameworks (Airflow, Step Functions, Argo Workflows, Spark, Dask, Ray, or similar).

- Proficiency with AWS ML ecosystem and infrastructure-as-code (Terraform, CloudFormation).

- Hands-on experience with dataset versioning tools such as DVC, LakeFS, Delta Lake, Quilt, or Pachyderm.

- Strong software engineering fundamentals: unit/integration testing, CI/CD, version control, observability, design patterns.

- Experience with containerization (Docker, Kubernetes) for model deployment.

- Good to have: experience with ML orchestration tools such as Kubeflow or Vertex AI.

- Nice to have: experience with GPUs, including scheduling GPU jobs, optimising GPU performance, and memory profiling.