Job Description

We are seeking DevOps Engineers who want to work on designing, building, and operating cloud-native, enterprise-level platforms connected to a large fleet of IoT devices. Our technology stack includes :

- Languages/Frameworks : Python, Java, C#/.NET

- Databases : DynamoDB, MySQL, MS-SQL, PostgreSQL, MongoDB, InfluxDB, TimescaleDB

- Cloud Platform : AWS

- Observability : Datadog, Grafana, Prometheus, OpenSearch, CloudWatch

Responsibilities :

- Design, deploy, automate, and manage AWS cloud-based production systems with IoT-connected devices, ensuring availability, performance, scalability, and security

- Build and maintain comprehensive observability solutions including metrics, logs, and distributed tracing to provide full-stack visibility across applications and infrastructure

- Design and implement alerting strategies that minimize noise, reduce alert fatigue, and enable rapid detection of production issues

- Develop runbooks, automated remediation workflows, and self-healing infrastructure to reduce mean time to recovery (MTTR)

- Analyze cloud spend and implement cost optimization strategies including right-sizing, Reserved Instances, Savings Plans, and resource lifecycle management

- Build dashboards and reporting tools to provide visibility into infrastructure costs and enable teams to make data-driven decisions

- Build and maintain self-service platforms through automation to increase developer productivity and assure product/service quality

- Troubleshoot and solve problems across AWS infrastructure and application domains; lead incident response and conduct blameless post-mortems

- Design durable and consistent patterns for distributed systems; recommend architecture and process improvements

- Collaborate across multiple functional and technical teams to deliver projects on time and build enterprise-level platforms per the roadmap

- Analyze and resolve complex infrastructure and application deployment issues

- Evaluate emerging technology trends to enable evolving business and operating models

- Facilitate the evaluation and selection of software products, services, and standards; design standard and custom software configurations

- Assess existing platforms to identify deficiencies and improvements; recommend whether to maintain, refresh, or retire products, services, or systems

- Ensure critical system security using industry-leading cloud security solutions

Requirements :

- 7+ years of overall experience, with 3+ years in enterprise environments

- 3+ years building and managing cloud and IoT platforms supporting large, highly available, enterprise-grade applications

- 4+ years working with AWS technologies (e.g., EC2, EKS, ECS, S3, Redshift, VPC, Glacier, IAM, CloudWatch, SQS, Lambda, CloudTrail, Systems Manager, KMS, Kinesis) with emphasis on the AWS Well-Architected Framework

- Strong experience implementing observability solutions including metrics collection, centralized logging, and distributed tracing (e.g., OpenTelemetry, Jaeger, X-Ray)

- Proven ability to design effective alerting systems with appropriate thresholds, escalation policies, and on-call rotations

- Experience with incident management, root cause analysis, and building automated remediation workflows

- Demonstrated track record of identifying and implementing AWS cost optimization strategies (right-sizing, Reserved Instances, Savings Plans, Spot Instances, resource scheduling)

- Familiarity with AWS cost management tools (Cost Explorer, Budgets, Cost Allocation Tags, Compute Optimizer)

- Strong Infrastructure-as-Code and automation skills using tools such as Terraform and Ansible, along with Python and shell scripting

- Hands-on experience with containerization and orchestration (e.g., Docker, Kubernetes, AWS EKS, ECS)

- Solid experience in 24x7 production AWS environments, including CI/CD pipelines (Jenkins, GitLab CI, etc.)

- Strong understanding of Site Reliability Engineering principles, SLOs/SLIs/SLAs, error budgets, and chaos engineering

- Linux and Windows server administration

- Experience with observability and monitoring platforms (e.g., Datadog, Grafana, Prometheus, OpenSearch/Elastic Stack, CloudWatch, PagerDuty)

- Understanding of network topologies and protocols (DNS, HTTP/HTTPS, SSH, SFTP, SMTP)

- Experience with IT compliance and risk management frameworks (e.g., NIST, SOC 2, SOX, FedRAMP)

- Experience collaborating with client IT organizations to define appropriate solutions

Preferred Qualifications :

- AWS Certified Solutions Architect - Professional certification

- Certified Kubernetes Administrator (CKA) certification
