Posted on: 23/07/2025
We are seeking a highly skilled and experienced Senior Python & ML Engineer with a strong background in PySpark, machine learning, and large language models (LLMs).
The ideal candidate will be instrumental in designing, developing, and deploying scalable data pipelines, machine learning models, and LLM-powered applications.
This role requires a deep understanding of Python's ecosystem, distributed computing with PySpark, and practical experience in building and optimizing AI solutions.
Responsibilities :
Data Engineering & ETL :
- Optimize PySpark jobs for performance, efficiency, and cost-effectiveness on large datasets.
- Implement data quality checks and ensure data integrity throughout the pipeline.
Machine Learning Development :
- Perform feature engineering, model selection, hyperparameter tuning, and model evaluation.
- Integrate ML models into production systems, ensuring scalability and reliability.
Large Language Model (LLM) Integration & Development :
- Develop and implement solutions leveraging LLMs for tasks such as natural language understanding, text generation, summarization, and question answering.
- Fine-tune and adapt pre-trained LLMs for specific business needs and datasets.
- Explore and implement techniques for prompt engineering, RAG (Retrieval Augmented Generation), and LLM evaluation.
Technical Leadership & Collaboration :
- Mentor junior team members and contribute to best practices for code quality, testing, and deployment.
- Participate in code reviews, design discussions, and architectural decisions.
- Stay up to date with the latest advancements in Python, PySpark, ML, and LLMs.
Deployment & Operations :
- Deploy and maintain applications in production environments.
- Troubleshoot and resolve issues related to data pipelines, ML models, and LLM applications.
Required Skills & Qualifications :
Education : Bachelor's or Master's degree in Computer Science, Data Science, Engineering, or a related quantitative field.
Python Expertise :
- In-depth knowledge of Python's scientific computing stack (NumPy, Pandas).
- Experience with testing frameworks (e.g., pytest) and version control (Git).
PySpark Proficiency :
- Solid understanding of Spark architecture, RDDs, DataFrames, and Spark SQL.
- Experience optimizing Spark jobs for performance and resource utilization.
Machine Learning :
- Proficiency with key ML libraries such as scikit-learn, TensorFlow, and/or PyTorch.
- Understanding of common machine learning algorithms and their strengths and limitations.
Large Language Models (LLMs) :
- Familiarity with LLM frameworks (e.g., Hugging Face Transformers, LangChain, LlamaIndex).
- Understanding of concepts like embeddings, tokenization, prompt engineering, and fine-tuning.
Cloud Platforms (Preferred) :
Other Key Skills :
- Excellent communication and teamwork abilities.
- Ability to work independently and as part of a collaborative team.
Nice-to-Have Skills :
- Experience developing interactive web applications using Shiny.
- Experience with streaming data technologies (e.g., Kafka, Spark Streaming).
- Familiarity with containerization technologies (Docker, Kubernetes).
- Knowledge of MLOps tools and practices (e.g., MLflow, Kubeflow).
- Experience with graph databases or other NoSQL databases.
- Contributions to open-source projects.