Posted on: 23/01/2026
We are seeking DevOps Engineers to design, build, and operate cloud-native, enterprise-level platforms connected to a large fleet of IoT devices. Our technology stack includes:
- Languages/Frameworks: Python, Java, C#/.NET
- Databases: DynamoDB, MySQL, MS-SQL, PostgreSQL, MongoDB, InfluxDB, TimescaleDB
- Cloud Platform: AWS
- Observability: Datadog, Grafana, Prometheus, OpenSearch, CloudWatch
Responsibilities:
- Design, deploy, automate, and manage AWS-based production systems that serve IoT-connected devices, ensuring availability, performance, scalability, and security
- Build and maintain comprehensive observability solutions including metrics, logs, and distributed tracing to provide full-stack visibility across applications and infrastructure
- Design and implement alerting strategies that minimize noise, reduce alert fatigue, and enable rapid detection of production issues
- Develop runbooks, automated remediation workflows, and self-healing infrastructure to reduce mean time to recovery (MTTR)
- Analyze cloud spend and implement cost optimization strategies including right-sizing, Reserved Instances, Savings Plans, and resource lifecycle management
- Build dashboards and reporting tools to provide visibility into infrastructure costs and enable teams to make data-driven decisions
- Build and maintain self-service platforms through automation to increase developer productivity and assure product/service quality
- Troubleshoot and solve problems across AWS infrastructure and application domains; lead incident response and conduct blameless post-mortems
- Design durable and consistent patterns for distributed systems; recommend architecture and process improvements
- Collaborate across multiple functional and technical teams to deliver projects on time and build enterprise-level platforms per the roadmap
- Analyze and resolve complex infrastructure and application deployment issues
- Evaluate emerging technology trends to enable evolving business and operating models
- Facilitate the evaluation and selection of software products, services, and standards; design standard and custom software configurations
- Assess existing platforms to identify deficiencies and improvements; recommend whether to maintain, refresh, or retire products, services, or systems
- Ensure the security of critical systems using industry-leading cloud security solutions
Requirements:
- 7+ years of overall experience, with 3+ years in enterprise environments
- 3+ years building and managing cloud and IoT platforms supporting large, highly available, enterprise-grade applications
- 4+ years working with AWS technologies (e.g., EC2, EKS, ECS, S3, Redshift, VPC, Glacier, IAM, CloudWatch, SQS, Lambda, CloudTrail, Systems Manager, KMS, Kinesis) with emphasis on the AWS Well-Architected Framework
- Strong experience implementing observability solutions including metrics collection, centralized logging, and distributed tracing (e.g., OpenTelemetry, Jaeger, X-Ray)
- Proven ability to design effective alerting systems with appropriate thresholds, escalation policies, and on-call rotations
- Experience with incident management, root cause analysis, and building automated remediation workflows
- Demonstrated track record of identifying and implementing AWS cost optimization strategies (right-sizing, Reserved Instances, Savings Plans, spot instances, resource scheduling)
- Familiarity with AWS cost management tools (Cost Explorer, Budgets, Cost Allocation Tags, Compute Optimizer)
- Strong Infrastructure-as-Code and automation skills using tools such as Terraform and Ansible, along with Python and shell scripting
- Hands-on experience with containerization and orchestration (e.g., Docker, Kubernetes, AWS EKS, ECS)
- Solid experience in 24x7 production AWS environments, including CI/CD pipelines (e.g., Jenkins, GitLab CI)
- Strong understanding of Site Reliability Engineering principles, SLOs/SLIs/SLAs, error budgets, and chaos engineering
- Experience with Linux and Windows server administration
- Experience with observability and monitoring platforms (e.g., Datadog, Grafana, Prometheus, OpenSearch/Elastic Stack, CloudWatch, PagerDuty)
- Understanding of network topologies and protocols (DNS, HTTP/HTTPS, SSH, SFTP, SMTP)
- Experience with IT compliance and risk management frameworks (e.g., NIST, SOC 2, SOX, FedRAMP)
- Experience collaborating with client IT organizations to define appropriate solutions
Preferred Qualifications:
- AWS Certified Solutions Architect - Professional certification
- Certified Kubernetes Administrator (CKA) certification
Posted in: DevOps / SRE
Functional Area: DevOps / Cloud
Job Code: 1605498