Posted on: 18/08/2025
We're looking for someone who's as comfortable building ML pipelines as they are optimizing infrastructure for scale. If you thrive on solving real-world data challenges, love experimenting, and don't shy away from getting your hands dirty with deployment, this is your jam.
Responsibilities :
- Apply a strong understanding of machine learning principles and algorithms, with a focus on LLMs such as GPT-4, BERT, and similar architectures.
- Leverage deep learning frameworks like TensorFlow, PyTorch, or Keras to train and fine-tune LLMs.
- Utilize deep knowledge of computer architecture, especially GPUs, to maximize utilization and efficiency.
- Work with cloud platforms (AWS, Azure, GCP) to manage and optimize resources for training large-scale deep learning models.
- Use containerization and orchestration tools (Docker, Kubernetes) for scalable and reproducible ML deployments.
- Apply principles of parallel and distributed computing, including distributed training for deep learning models.
- Work with big data and distributed computing technologies (Hadoop, Spark) to handle large-volume datasets.
- Implement MLOps practices and use related tools to manage the complete ML lifecycle.
- Contribute to the infrastructure side of multiple ML projects, particularly those involving Transformer-based models such as BERT.
- Manage resources and optimize performance for large-scale ML workloads, both on-premise and in the cloud.
- Handle challenges in training large models, including memory management, optimizing data loading, and troubleshooting hardware issues.
- Collaborate closely with data scientists and ML engineers to understand infrastructure needs and deliver efficient solutions.
Requirements :
- Strong knowledge of machine learning and deep learning algorithms, especially LLMs.
- Proficiency in Python and deep learning frameworks (TensorFlow, PyTorch, Keras).
- Expertise in GPU architecture and optimization.
- Experience with parallel and distributed computing concepts.
- Hands-on with containerization (Docker) and orchestration (Kubernetes).
Tech Stack and Tools :
- Cloud : AWS, Azure, GCP.
- Big Data : Hadoop, Spark.
- MLOps Tools : MLflow, Kubeflow, or similar.
- Infrastructure Optimization : Resource allocation, distributed training, GPU performance tuning.
Nice-to-Have :
- Prior experience training large-scale deep learning models (e.g., BERT or other Transformer architectures).
- Exposure to high-scale environments and large datasets.
- Ability to troubleshoot hardware bottlenecks and optimize data pipelines.