Houghton Mifflin Harcourt - Senior DevOps Engineer - AWS Infrastructure

Posted on: 06/01/2026

Job Description

Technical Infrastructure:

- Cloud & Infrastructure: AWS EC2, Terraform Enterprise, Docker, Aurora, Mesos, Kubernetes, ELK (Elasticsearch, Logstash & Kibana).

- Observability & Automation: Grafana, Prometheus, Datadog, Telegraf, Runscope, Apollo, GraphQL.

- Development Stack: Microservices architecture, Spring, Java & Node.js, React, Express.js.

- Data & Storage: Amazon RDS, DynamoDB, Postgres, Oracle, MySQL, InfluxDB, Linux, Jenkins, GitHub.

- AI & Agentic Automation: AWS Bedrock LLMs and AWS Bedrock Engineer for building and integrating scalable, low-latency AI-driven automation capabilities.

About The Role:

You will constantly be asking: what are the most important infrastructure problems we need to solve today to increase the reliability and performance of our applications and infrastructure?

- Identify and solve the most critical infrastructure challenges to improve system reliability, scalability, and performance.

- Design, test, and implement AI-enhanced DevOps workflows, including autonomous agents for monitoring, remediation, and optimization.

- Partner with SRE and development teams to build robust, self-service deployment pipelines and infrastructure tooling.

- Evaluate new technologies to continuously improve system automation, cost efficiency, and security.

- Work with AI-enhanced monitoring and self-healing infrastructure components powered by agentic patterns.

Key Responsibilities:

- Build, maintain, and evolve cloud infrastructure with Infrastructure as Code (Terraform, CloudFormation).

- Manage containerized workloads (Docker, Kubernetes) at scale, with a focus on extending capabilities through AI-driven orchestration.

- Implement and maintain advanced monitoring, observability, and alerting systems enhanced with agent-based analytics.

- Automate workflows to reduce manual intervention and accelerate delivery cycles.

- Collaborate with cross-functional teams to ensure infrastructure meets the needs of high-availability, low-latency applications.

- Regularly review and optimize existing architecture for cost, security, and performance improvements.

Skills & Experience:

- 6 to 10 years of hands-on SRE/DevOps experience in an Agile environment.

- Proven ability to collaborate across engineering and operations, with pragmatic problem-solving.

- Deep experience with AWS and infrastructure design patterns, and in recommending appropriate AWS services, including newer AI-focused tools like Bedrock.

- Strong knowledge of, and hands-on skill with, AI-enhanced DevOps workflows and agentic infrastructure models.

- Able to diagnose and resolve outages quickly, lead incident response, and restore service reliability with urgency.

- Infrastructure as Code expertise (Terraform, CloudFormation).

- Experience with containerization (Docker, Kubernetes).

- Familiarity with CI/CD tools, scripting languages, and observability platforms.

- Strong collaboration skills, with the ability to influence and guide best practices.

Preferred Skills And Interests:

- Solid RDBMS experience (Postgres, MySQL, etc.), with tuning and performance expertise.

- Strong Linux fundamentals.

- Event-driven systems and message queue management.

- Security, including firewalls, load balancing, and secret management.

