Posted on: 10/08/2025
Job Description
Responsibilities:
- Manage and optimize end-to-end MLOps pipelines for data collection, model training, validation, and monitoring while ensuring team collaboration and effective resource allocation.
- Drive the implementation of model compression, quantization, and distributed training techniques to enhance performance, encouraging innovative solutions from team members.
- Track key metrics and optimize deployed models to ensure ongoing effectiveness, collaborating with team members to identify improvement opportunities.
- Collaborate with cloud architects and DevOps teams to design and maintain scalable ML infrastructure, ensuring effective resource management and deployment.
- Work closely with applied scientists and analysts to transform model requirements into production-ready solutions, facilitating teamwork across departments.
- Establish and maintain monitoring and alerting systems for deployed models, ensuring prompt issue resolution while guiding the team in best practices.
- Create and uphold documentation for ML architecture and best practices to ensure knowledge sharing within the team, promoting continuous improvement.
- Stay current with advancements in ML technologies and lead ongoing enhancement initiatives within the team, encouraging team participation in the ML community.
Requirements:
- Bachelor's, Master's, or PhD in Computer Science or a related field.
- 5 years of experience in machine learning with a strong portfolio of deployed ML models for various use cases, including batch, streaming, and real-time.
- Proficient in Python for model development and data manipulation, with experience in Java or Scala for building production systems.
- Familiarity with messaging queues (e.g., Kafka, SQS) and MLOps tools (e.g., MLflow, Kubeflow, Airflow).
- Experience with cloud platforms (AWS, Google Cloud, Azure) and containerization technologies (Docker, Kubernetes).
- Knowledge of machine learning frameworks (e.g., TensorFlow, PyTorch) and databases (e.g., Elasticsearch, MongoDB, PostgreSQL).
- Understanding of data processing and ETL tools (e.g., Apache Spark, Kafka).
- Experience with monitoring tools like Grafana and Prometheus.
- Strong problem-solving skills and an analytical mindset.