Posted on: 21/01/2026
Description:
What you'll do: Own AI feature delivery from prototype to production.
- Build RAG pipelines (chunking, embeddings, vector stores), prompt/program orchestration, and guardrails.
- Fine-tune and/or distill models (open/closed source) for classification, generation, and tool-use.
- Implement robust offline & online evals (unit evals, golden sets, regression tests, user-feedback loops).
- Ship reliable services: APIs, workers, model servers, and monitoring/observability (latency, cost, quality).
- Partner with product/design to shape problem statements, success metrics, and experiment plans.
- Champion engineering best practices (reviews, testing, docs, incident learnings).
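To give a flavor of the RAG work described above, here is a minimal retrieval sketch in plain Python. It is illustrative only: the bag-of-words "embedding" stands in for a real embedding model, and the chunk sizes and function names are assumptions, not this team's actual pipeline.

```python
import math
from collections import Counter

def chunk(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into overlapping word windows (one simple chunking strategy)."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real pipeline would call an embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank chunks by similarity to the query and return the top-k."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]
```

In a production system the vector search would be backed by a store such as pgvector, Milvus, or Weaviate rather than a linear scan.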
Requirements:
- Tech you might use here:
- Languages: Python, TypeScript/Node.
- AI/ML: PyTorch, Hugging Face, OpenAI/Anthropic/other LLM APIs, vLLM/TensorRT-LLM, LangChain/LlamaIndex (pragmatically).
- Data & Retrieval: Postgres, Redis, Milvus/pgvector/Weaviate, Kafka.
- Infra: Docker, Kubernetes, CI/CD, Grafana/Prometheus, cloud (AWS/GCP).
- Quality: Prompt/unit tests, offline eval harnesses, canary analysis, A/B testing.
- We're looking for 3 to 7+ years of software engineering experience, with 1–3+ in applied ML/LLM or search/retrieval.
- Strong Python engineering (typing, testing, packaging) and service design (APIs, queues, retries, idempotency).
- Hands-on with at least two of: RAG in prod, fine-tuning (LoRA/QLoRA), embeddings/ANN search (Annoy/HNSW), function/tool calling, or model serving at scale.
- Practical evaluation mindset: create golden datasets, design metrics (accuracy, faithfulness, toxicity, latency, cost).
- Product sense and ownership: you measure impact, not just model scores.
- Clear communication and collaborative habits (PRs, design docs, incident notes).
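The evaluation bullets above can be sketched as a tiny offline harness. A hedged sketch: the class names, the exact-match metric, and the 0.9 threshold are all illustrative assumptions; a real harness would also track faithfulness, latency, and cost.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class GoldenCase:
    """One labeled example in a golden set: input prompt plus expected answer."""
    prompt: str
    expected: str

def exact_match(output: str, expected: str) -> float:
    """Simplest possible metric: case-insensitive exact match, scored 0 or 1."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0

def run_eval(model: Callable[[str], str],
             golden: list[GoldenCase],
             metric: Callable[[str, str], float] = exact_match,
             threshold: float = 0.9) -> dict:
    """Score a model over the golden set; gate releases on a minimum mean score."""
    scores = [metric(model(case.prompt), case.expected) for case in golden]
    mean = sum(scores) / len(scores)
    return {"mean_score": mean, "passed": mean >= threshold}
```

Run as a regression test in CI, a harness like this catches quality drops before a prompt or model change ships.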
Nice to have:
- Experience with multi-tenant architectures, RBAC/ABAC, and data governance.
- Safety & reliability work (red-teaming, jailbreak defenses, PII handling).
- Frontend familiarity (React) to iterate quickly on UX for AI features.
- Prior startup experience or 0→1 product building.
What success looks like (first 90 days):
- Ship a scoped AI feature into customer hands with an eval harness and dashboards.
- Reduce either latency or cost of an existing pipeline by ~20–30% without quality loss.
- Add at least one reusable internal component (chunker, ranker, guardrail, eval set).
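A reusable guardrail component of the kind mentioned above can be as small as a text redactor. A hypothetical sketch: the regex patterns and names are illustrative only, not a production-grade PII detector (real systems use dedicated PII-detection services).

```python
import re

# Illustrative PII patterns; a production guardrail would use a dedicated
# detector with far broader coverage than these two regexes.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace matched PII spans with labeled placeholders before logging
    or forwarding text to a third-party model."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label.upper()}]", text)
    return text
```

Packaged once, a component like this can sit in front of every logging call and LLM request in the stack.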
Interview process:
- Intro chat (30 min): role fit & expectations.
- Technical deep-dive (60 min): systems + ML/LLM problem solving.
- Practical exercise (take-home or pairing, 3–4 hrs): build a small RAG/eval pipeline.
- Final loop (60–90 min): product & culture, past work, offer Q&A.