Posted on: 17/03/2026
We are looking for a skilled Gen AI Platform Engineer to join our team. The ideal candidate will have experience in managing LLM-based systems, with expertise in infrastructure management, prompt versioning, fine-tuning, and deployment. This role requires a strong understanding of GenAI workloads, performance tuning, scalability, and governance in cloud environments such as AWS, Azure, and Google Cloud.
Key Responsibilities:
- Manage and oversee the infrastructure for LLM-based systems, ensuring seamless operation and scalability.
- Fine-tune, evaluate, and deploy LLMs and their prompts, leveraging industry-standard tools and platforms.
- Ensure the performance, scalability, and governance of GenAI workloads in cloud environments (AWS, Azure, Google Cloud).
- Build and deploy AI use cases and solutions using the respective platforms and tools.
- Collaborate with cross-functional teams to ensure effective deployment and performance optimization.
- Lead the evaluation and enhancement of LLM-based models through iterative testing and fine-tuning.
- Handle deployment pipelines, including CI/CD for LLM models.
- Contribute to setting up automated processes for model fine-tuning and versioning.
- Work on optimizing cloud-based infrastructure to support the growth of GenAI workloads.
Required Skills:
- Strong experience with cloud ML platforms such as AWS SageMaker, Google Vertex AI, or Azure AI.
- Proficiency in operating LLM systems, including prompt engineering, fine-tuning, and versioning.
- Hands-on experience with infrastructure management, model deployment, and optimization.
- Strong understanding of cloud architecture, performance, and scalability for GenAI workloads.
- Proficiency in Python, SQL, and Bash scripting.
- Experience with machine learning frameworks such as Hugging Face, TensorFlow, and PyTorch.
- Familiarity with CI/CD pipelines, Docker, Kubernetes, and MLOps workflows.
- Strong analytical skills and ability to troubleshoot complex infrastructure issues.
Nice-to-Have Skills:
- Familiarity with NLP frameworks and libraries such as Hugging Face, TensorFlow, and PyTorch.
- Experience working with large-scale data processing frameworks such as Apache Spark and Hadoop.
- Knowledge of model explainability and interpretability techniques for LLMs.
- Familiarity with containerization technologies (e.g., Docker, Kubernetes) for model deployment and orchestration.
- Hands-on experience with MLOps pipelines.
Tools & Technical Skills:
- Platforms: AWS SageMaker, Google Vertex AI, Azure AI.
- Tools: Docker, Kubernetes, Terraform, Jenkins (CI/CD), MLflow.
- Languages: Python, SQL, Bash scripting.
- Frameworks: Hugging Face, TensorFlow, PyTorch, Keras.
- Databases: MySQL, PostgreSQL, NoSQL (MongoDB, Cassandra).
- Other: Git, GitHub, Jenkins, CloudFormation.