Description :

Role Overview :

We are seeking an experienced and highly skilled Applied Scientist to join our dynamic team. This role uniquely blends the capabilities of a Research Scientist, Data Scientist, and ML Software Engineer. It is designed for someone who can bridge the gap between cutting-edge research, scalable algorithms, and production-grade software engineering. You will not only develop, train, and fine-tune LLMs but also architect and deploy them at enterprise scale. You will be responsible for the complete lifecycle of AI features, from reading academic papers and prototyping algorithms to optimizing distributed systems, writing robust code, and shipping to production.

Unlike traditional data scientists who focus on analysis and dashboards, you will prioritize practical implementation at scale. You'll work across the full stack : model development, infrastructure optimization, deployment automation, and production monitoring. You will collaborate closely with engineering teams to integrate complex RAG systems, modular agent architectures, and advanced AI algorithms into our core platform, ensuring robustness, low latency, and high reliability.

Key Responsibilities :

Machine Learning & AI Development :

- Design and implement advanced Machine Learning models (supervised, unsupervised, reinforcement learning, deep neural networks, Transformers) to solve complex business problems and optimize end-to-end performance.

- Develop and fine-tune domain-specific LLMs, including multi-modal models (text, vision, speech, audio), using cutting-edge techniques : parameter-efficient fine-tuning (PEFT) methods such as LoRA and QLoRA, as well as full fine-tuning strategies.

- Translate GenAI and ML problems into well-defined problem statements, design optimal solution architectures, and evaluate multiple approaches using rigorous experimentation frameworks.

- Read, interpret, and implement complex academic research papers (from NeurIPS, ICML, ICLR, ACL, EMNLP), adapting novel methodologies into scalable, enterprise-ready software systems.

- Design and architect RAG (Retrieval-Augmented Generation) systems, knowledge graphs, vector database integrations, and modular agent architectures for enhanced reasoning and decision-making.

- Develop innovative solutions for prompt engineering, chain-of-thought reasoning, in-context learning, and agentic workflows to improve model reliability and performance.

- Conduct rigorous experiments to validate hypotheses, benchmark models, and optimize hyperparameters across distributed environments.

- Collaborate with academic and research partners to co-develop innovative solutions, publish findings, and integrate cutting-edge methods into production.

Software Engineering & Production Systems :

- Write production-quality, maintainable code in Python, C++, or Java that powers mission-critical AI applications serving thousands of concurrent users.

- Design scalable system architectures for serving ML models, considering trade-offs between latency, throughput, consistency, and cost in multi-cloud or on-premises environments.

- Optimize model inference for production deployment : implement quantization, pruning, knowledge distillation, and kernel optimization (CUDA, Triton, TensorRT) to achieve sub-100 ms latency targets.

- Build and maintain end-to-end ML data pipelines : data collection, validation, cleaning, feature engineering, ETL processes, versioning, and streaming pipelines using Apache Spark, Airflow, Kafka, or Beam.

- Architect and develop API services to expose ML models using FastAPI, gRPC, or REST patterns, ensuring high availability, rate limiting, and graceful degradation.

- Implement robust error handling, logging, and observability throughout the ML stack; troubleshoot and resolve complex production issues with minimal downtime.

- Lead model versioning and experiment tracking using MLflow, Weights & Biases, or custom solutions; ensure reproducibility and auditability of all training runs.

- Develop A/B testing frameworks and monitoring systems to detect model drift, data skew, and performance degradation in production; automate retraining pipelines.

Infrastructure, DevOps & Deployment :

- Architect and deploy containerized ML applications using Docker, Kubernetes, and other container orchestration platforms; design multi-zone/multi-region deployment strategies.

- Build CI/CD pipelines for automated testing, deployment, and rollback of ML models using Jenkins, GitLab CI, GitHub Actions, or similar tools.

- Optimize cloud infrastructure costs across AWS, Google Cloud, or Azure; leverage spot instances, autoscaling, and resource optimization techniques.

- Set up monitoring, alerting, and incident management systems for production ML systems using Prometheus, Grafana, the ELK stack, or Datadog.

- Apply MLOps/LLMOps best practices : model registry, artifact management, feature stores, parameter stores, and end-to-end automation of the ML lifecycle.

- Manage data security and governance, including encryption, access controls, audit logs, and compliance with GDPR, HIPAA, or other regulatory requirements.

Collaboration & Leadership :

- Collaborate with cross-functional teams : Product, Engineering, Design, and Business stakeholders to align AI solutions with business objectives and deliver measurable value.

- Mentor and guide junior data scientists and ML engineers, fostering a culture of engineering excellence, scientific rigor, and continuous learning.

- Participate in architecture reviews and design discussions, providing technical guidance on system design, scalability, and reliability.

- Lead technical documentation and knowledge sharing within the team; present findings and innovations to stakeholders and the broader scientific community.

Required Qualifications :

Education & Foundations :

- Master's or Ph.D. degree in Computer Science, Machine Learning, Statistics, Mathematics, Physics, or a related quantitative field.

  - Exception : Candidates with exceptional industry experience, strong CS fundamentals, and demonstrated expertise in shipping production ML systems will be considered.

Software Engineering Fundamentals (Critical) :

- Deep CS Fundamentals : Strong understanding of data structures (arrays, linked lists, trees, graphs, heaps), algorithms (sorting, searching, dynamic programming, graph algorithms), and computational complexity analysis (Big O notation).

- Object-Oriented Programming (OOP) : Experience designing and implementing classes, inheritance, polymorphism, interfaces, and design patterns (Singleton, Factory, Observer, Strategy, etc.).

- Software Design Principles : Knowledge of SOLID principles, DRY, KISS; ability to write clean, maintainable, well-documented code.

- Version Control & Collaboration : Expert-level proficiency with Git, GitHub, GitLab, or Bitbucket; comfortable with branching strategies, code reviews, and collaborative workflows.

- Testing & Quality Assurance : Experience writing unit tests (pytest, unittest, JUnit), integration tests, and system tests; understanding of mocking, fixtures, and test-driven development (TDD).

- Debugging & Problem-Solving : Strong capability to diagnose and fix complex bugs in distributed systems; experience with debugging tools and profilers.

Programming Languages & Development :

- Primary Language Expertise (at least one of the following) :

1. Python : Advanced proficiency with the library ecosystem (NumPy, Pandas, Scikit-learn, PyTorch, TensorFlow, FastAPI, pydantic)

2. C++ (C++11 or later) : Memory management, performance optimization, concurrency

3. Java : Spring Boot, dependency injection, JVM performance tuning

- Secondary Language Proficiency : Familiarity with 1-2 additional languages (Scala, Rust, TypeScript/Node.js, Kotlin).

- Scripting & Automation : Bash/Shell scripting for automation and DevOps tasks; ability to write scripts for data processing, deployment, and monitoring.

- SQL Expertise : Advanced SQL proficiency, including complex queries, window functions, query optimization, and indexing strategies; experience with PostgreSQL, MySQL, or other RDBMSs.

Soft Skills & Attributes :

- Builder Mindset : Passion for creating end-to-end systems that work in production. Comfortable starting from ambiguous requirements and iterating rapidly.

- Engineering Excellence : Holds themselves and their team to high standards of code quality, testing, documentation, and system reliability.

- Customer Orientation : Ability to translate business needs into practical, scalable solutions that deliver measurable value.

- Startup Agility : Thrives in fast-paced environments. Able to self-manage, prioritize, and drive initiatives independently.

- Scientific Rigor : Strong hypothesis-driven approach; comfort with experimentation, statistical validation, and empirical evaluation.

- AI-Native Curiosity : Openness to leveraging AI tools, agents, and emerging automation techniques to improve productivity and design smarter systems.

- Continuous Learner : Passionate about staying current with the latest research, tools, and best practices in ML, AI, and software engineering.

- Problem Solver : Systematic approach to debugging and troubleshooting; able to break down complex problems into manageable pieces.

- Collaborative Spirit : Excellent communication and interpersonal skills; comfortable working with engineers, researchers, product managers, and other stakeholders.