
Job Description



Customer Interview

No location criteria

Key Responsibilities :


- Analyze tracing logs from LLM inference and training runs to identify performance issues and inefficiencies.

- Develop tools and scripts to parse, visualize, and monitor LLM tracing data.

- Collaborate with ML and infra teams to recommend and implement performance optimizations.

- Create documentation and dashboards to track optimization progress over time.

- Investigate and resolve model latency and throughput issues related to runtime behavior.

- Contribute to best practices for performance tracing, benchmarking, and logging across model deployments.

Required Qualifications :


- Bachelor's or Master's degree in Computer Science, Machine Learning, or a related field.

- Experience working with large-scale ML models, preferably LLMs (e.g., GPT, BERT).

- Proficiency in Python and common ML frameworks (e.g., PyTorch, TensorFlow).

- Familiarity with model tracing tools such as PyTorch Profiler, TensorBoard, DeepSpeed, or similar.

- Strong problem-solving skills and attention to detail in analyzing complex logs and metrics.

Preferred Qualifications :


- Experience with distributed training/inference and GPU performance optimization.

- Knowledge of systems profiling tools (e.g., NVIDIA Nsight, perf, flame graphs).

- Background in MLOps, observability, or AI infrastructure.

