Posted on: 08/01/2026
Description :
We are seeking an experienced Infrastructure Site Reliability Engineer (SRE) to join our team. This role is critical for ensuring the reliability, scalability, and performance of our infrastructure, particularly in managing and optimizing high-throughput data systems. You will work closely with engineering teams to design, implement, and maintain robust infrastructure solutions that meet our growing needs.
Job Overview :
- As an Infrastructure SRE at DataBahn, you will be at the forefront of managing and optimizing our Kafka and OpenSearch clusters, AWS services, and multi-cloud environments.
- Your expertise will be key to ensuring the smooth operation of our infrastructure, enabling us to deliver high-performance and reliable services.
- This is an exciting opportunity to contribute to a dynamic team that is shaping the future of data observability and orchestration pipelines.
Key Responsibilities :
- Kafka Management : Set up, manage, and scale Kafka clusters, including implementing and optimizing Kafka Streams and Connect for seamless data integration. Fine-tune Kafka brokers and optimize producer/consumer configurations to ensure peak performance.
- OpenSearch Expertise : Configure and manage OpenSearch clusters, optimizing indexing strategies and query performance. Ensure high availability and fault tolerance through effective data replication and sharding. Set up monitoring and alerting systems to track cluster health.
- AWS Services Proficiency : Manage AWS RDS instances, including provisioning, configuration, and scaling. Optimize database performance and ensure robust backup and recovery strategies. Deploy, manage, and scale Kubernetes clusters on AWS EKS, configuring networking and security policies and integrating EKS with CI/CD pipelines for automated deployment.
- Multi-Cloud Environment Management : Design and manage infrastructure across multiple cloud providers, ensuring seamless cloud networking and security. Implement disaster recovery strategies and optimize costs in a multi-cloud setup.
- Linux Administration : Optimize Linux server performance, manage system resources, and automate processes using shell scripting. Apply best practices for security hardening and troubleshoot Linux-related issues effectively.
- CI/CD Automation : Design and manage CI/CD pipelines using tools such as Jenkins, GitLab CI, or CircleCI, along with ArgoCD. Automate deployment processes, integrate with version control systems, and implement advanced deployment strategies such as blue-green deployments, canary releases, and rolling updates. Ensure security and compliance within CI/CD processes.
Qualifications :
- Bachelor's, Master's, or Doctorate degree in Computer Science or a related field.
- Deep knowledge of Kafka, with hands-on experience in cluster setup, management, and performance tuning.
- Expertise in OpenSearch cluster management, indexing, query optimization, and monitoring.
- Proficiency with AWS services, particularly RDS and EKS, including experience in database management, performance tuning, and Kubernetes deployment.
- Experience in managing multi-cloud environments, with a strong understanding of cloud networking, security, and cost optimization strategies.
- Strong background in Linux administration, including system performance tuning, shell scripting, and security hardening.
- Proficiency with CI/CD automation tools and best practices, with a focus on secure and compliant pipeline management.
- Strong analytical and problem-solving skills, essential for troubleshooting complex technical challenges.
Posted in: DevOps / SRE
Functional Area: Site Reliability Engineering
Job Code: 1598338