Posted on: 02/09/2025
Experience : 6-8 Years.
Location : Hyderabad.
Primary Skills :
- Infrastructure as Code (IaC) : Terraform, CloudFormation.
- Automation & Configuration Management : Ansible, CI/CD pipelines, container orchestration tools.
- Programming & Scripting : Python, GoLang, Java, Perl.
- AI/ML & Gen AI Services : AWS Bedrock, SageMaker, NLP, and other AI/ML tools.
- Cloud Infrastructure Design & Management : High availability, scalability, disaster recovery planning.
What You'll Do :
- Collaborate with the development team to understand requirements for application infrastructure.
- Design, deploy, and manage cloud infrastructure on AWS (or other cloud platforms) to ensure high availability, scalability, and performance.
- Use Terraform / CloudFormation to define and maintain Infrastructure as Code (IaC), automating provisioning and deployment processes.
- Develop and maintain automation scripts and playbooks using tools like Ansible to streamline configuration, management, and orchestration of resources.
- Contribute to building Gen AI competency on AWS, which requires strong expertise in AI/ML services such as Bedrock, SageMaker, NLP, and other advanced AI offerings.
- Establish monitoring and alerting systems to proactively identify performance issues.
- Conduct load testing to verify scalability and recovery capabilities.
- Write and maintain high-quality code to define, automate, and manage infrastructure.
- Collaborate closely with engineers, QA analysts, and other stakeholders to address technical issues and optimize infrastructure performance.
Roles & Responsibilities :
Infrastructure Management :
- Design, implement, and configure networking, storage, and security policies.
- Ensure scalability, reliability, and disaster recovery.
- Monitor infrastructure performance through logs, metrics, and alerts; troubleshoot issues quickly.
- Keep infrastructure updated with security patches, and test changes before deployment.
Automation & CI/CD :
- Maintain and improve the entire product development lifecycle.
Application & Platform Health :
- Conduct root-cause analysis and resolve performance or infrastructure issues.
Collaboration & Innovation :
- Stay updated with emerging tools, platforms, and practices to ensure continuous improvement.
Requirements :
- Strong experience with cloud platforms (AWS, Azure, GCP) to deploy, monitor, and manage applications.
- Hands-on expertise with scripting languages/frameworks (Python, GoLang, Java, Perl).
- Deep understanding of CI/CD concepts and ability to design/manage pipelines for safe and efficient deployments.
- Expertise in AI/ML services (AWS Bedrock, SageMaker, NLP, etc.) to support Gen AI initiatives and competency development.
- Knowledge of networking fundamentals (TCP/IP, DNS, HTTP) and ability to configure secure, stable connections.
- Proficiency with infrastructure automation tools (Terraform, CloudFormation, Ansible).
- Strong troubleshooting and analytical skills to investigate logs, error messages, and code flow.
- Experience with caching, compression, and other optimization techniques for web services.
- Ability to define project goals, timelines, and resource allocation, while identifying and mitigating security threats.
- Strong communication and collaboration skills.
Posted in : DevOps / SRE
Functional Area : DevOps / Cloud
Job Code : 1539404