About the role :
We are seeking a motivated and experienced Software Development Engineer, Backend (SDE-II / SDE-III) to join our team. You will be responsible for maintaining and enhancing our Deep Learning and LLM backend systems that serve high-scale applications. Your work will directly impact our ability to deliver reliable, efficient, and innovative AI solutions to our customers.
Key Outcomes for the First 12 Months :
Achieve High Operational Standards :
- Ensure AI backend services meet stringent Service Level Objectives (SLOs) with a focus on 99.99% uptime, optimal accuracy, swift response times, and a high success ratio for all operations.
Quality of service :
- Advance the observability of model inference and performance metrics to facilitate proactive monitoring and rapid troubleshooting.
- Reduce the lead time for changes in AI backend services by emphasizing code quality, readability, and reliability. Implement release strategies that are quicker, safer, and more efficient.
- Identify and execute initiatives aimed at enhancing performance, operational efficiency, reliability, and cost-effectiveness, ensuring our systems deliver strategic business value.
- Drive customer delight by improving the Net Promoter Score (NPS) through faster response times to queries, effective bug fixes, and the swift execution of feature requests.
Pioneer Strategic Innovations :
- Lead strategic initiatives in Machine Learning (ML) and Large Language Model (LLM) inference to secure a competitive advantage for the business.
Mentorship and Leadership :
- Provide guidance and mentorship to junior team members, fostering an environment of growth and accountability for delivering exceptional outcomes.
Technical competencies :
Languages :
- Strong proficiency in Node.js and Python.
ML inference :
- Hands-on experience with ML inference frameworks such as Triton Inference Server, TensorRT, and ONNX.
Backend systems :
- Deep understanding of REST APIs, distributed systems, and cloud infrastructure (preferably AWS).
Code Quality :
- Write clean, maintainable, and efficient code. Conduct and participate in code reviews to ensure code quality and adherence to standards.
Scalability & Performance :
- Design and implement systems with a focus on performance, scalability, and fault-tolerance.
Production Releases :
- Familiarity with CI/CD pipelines, DevOps practices, and automated testing frameworks.
Monitoring & Observability :
- Proficient in implementing monitoring solutions for system reliability and performance. Utilize tools and frameworks for effective logging, alerting, and visualization of system metrics.
Mentorship :
- Ability to mentor junior team members. Ensure that designs and code align with the overall architecture and best practices.
Behavioral competencies :
Problem solving :
- You can analyze complex business problems and translate them into scalable, efficient technical solutions. You generate new and innovative approaches to solving problems.
Analytical skills :
- You are able to structure and process qualitative and quantitative data and draw insightful conclusions from it. You exhibit a probing mind and achieve penetrating insights.
Hard Work and Persistence :
- You possess a strong willingness to work hard. You demonstrate tenacity and willingness to go the distance to get something done.
Action bias :
- You have a tendency to take initiative and make decisions swiftly. You proactively move forward and tackle challenges, even in uncertain situations, prioritizing action over inaction.
Learn quickly :
- You learn quickly and readily understand and absorb new information. You stay updated with emerging technologies and industry trends to drive continuous improvement.
Enthusiasm :
- You exhibit passion and excitement about your work. You have a can-do attitude.
Efficiency :
- You produce significant output with minimal wasted effort.
Flexibility / Adaptability :
- You adjust quickly to changing priorities and conditions and cope effectively with complexity and change.
Qualifications :
- 2-5 years of total experience working with backend systems at scale, preferably on AI / LLM / Deep Learning systems.