Posted on: 03/11/2025
Description :
- Design and implement CI/CD pipelines for AI and ML model training, evaluation, and RAG system deployment (including LLMs, vector databases, embedding and reranking models, governance and observability systems, and guardrails).
- Provision and manage AI infrastructure across cloud hyperscalers (AWS/GCP) using infrastructure-as-code tools (strong preference for Terraform).
- Maintain containerized environments (Docker, Kubernetes) optimized for GPU workloads and distributed compute.
- Support vector database, feature store, and embedding store deployments (e.g., pgVector, Pinecone, Redis, Featureform, MongoDB Atlas, etc.).
- Monitor and optimize performance, availability, and cost of AI workloads, using observability tools (e.g., Prometheus, Grafana, Datadog, or managed cloud offerings).
- Collaborate with data scientists, AI/ML engineers, and other members of the platform team to ensure smooth transitions from experimentation to production.
- Implement security best practices including secrets management, model access control, data encryption, and audit logging for AI pipelines.
- Support the deployment and orchestration of agentic AI systems (LangChain, LangGraph, CrewAI, Copilot Studio, AgentSpace, etc.).
Must Haves :
- 4+ years of DevOps, MLOps, or infrastructure engineering experience, preferably with 2+ years in AI/ML environments.
- Hands-on experience with cloud-native services (AWS Bedrock/SageMaker, GCP Vertex AI, or Azure ML) and GPU infrastructure management.
- Strong skills in CI/CD tools (GitHub Actions, ArgoCD, Jenkins) and configuration management (Ansible, Helm, etc.).
- Proficiency in scripting languages such as Python and Bash (Go or similar is a plus).
- Experience with monitoring, logging, and alerting systems for AI/ML workloads.
- Deep understanding of Kubernetes and container lifecycle management.
Bonus Attributes :
- Exposure to MLOps tooling such as MLflow, Kubeflow, SageMaker Pipelines, or Vertex Pipelines.
- Familiarity with prompt engineering, model fine-tuning, and inference serving.
- Experience with secure AI deployment and compliance frameworks.
- Knowledge of model versioning, drift detection, and scalable rollback strategies.
Abilities :
- Ability to work with a high level of initiative, accuracy, and attention to detail.
- Ability to prioritize multiple assignments effectively.
- Ability to meet established deadlines.
- Ability to successfully, efficiently, and professionally interact with staff and customers.
- Excellent organization skills.
- Critical thinking ability for problems ranging from moderately to highly complex.
- Flexibility in meeting the business needs of the customer and the company.
- Ability to work creatively and independently with latitude and minimal supervision.
- Ability to utilize experience and judgment in accomplishing assigned goals.
- Experience navigating organizational structures.