Description :

Requirements :

A day in the life :

- Architect and deploy scalable infrastructure and platform services (monitoring, logging, etc.) with a focus on simplicity and automation.

- Own the performance and reliability of backend services, data pipelines, and platform services, and work with developers to ensure monitoring and alerting best practices are adopted.

- Develop platform capabilities, such as release management, monitoring infrastructure, logging centralization, and CI/CD, that enable developers to develop and deploy software with high velocity and quality.

- Implement DevSecOps principles and continuously secure systems by conforming to InfoSec best practices.

- Keep a keen eye on infrastructure costs and capacity, and design systems to be cost-effective.

- Develop a high-level understanding of Fairmatic services and how they relate to one another, enabling you to debug and address critical issues and bottlenecks.

- Exemplify and foster Fairmatic's humble, collaborative, and impact-obsessed culture.

What you will need :

- 6+ years in a Site Reliability role (DevOps/System Administration) maintaining Linux systems in cloud environments (we use AWS)

- Excellent understanding of Linux systems and networking fundamentals

- Deep understanding of monitoring and alerting systems such as Prometheus, Graphite, or equivalent

- Experience with infrastructure automation tools such as Ansible, Puppet, or equivalent (we use Ansible)

- Expertise with general-purpose scripting and programming languages like Python, Ruby, or equivalent, and with shell scripting (Bash)

- Experience managing HTTP APIs, message brokers like Kafka, and relational databases like PostgreSQL

- Knowledge of Hadoop and Big Data processing frameworks like Spark is a plus!

- Comfortable working in a highly agile, intensely iterative software development environment