Posted on: 12/11/2025
Description:
Roles & Responsibilities:
- Develop and maintain Python-based data solutions using libraries such as SDGym, Synthpop, Faker, and Synthea.
- Perform data wrangling, cleaning, and preprocessing for structured and unstructured datasets.
- Conduct experimentation and visualization using Jupyter notebooks, leveraging matplotlib, seaborn, Plotly, and similar tools.
- Integrate structured output tooling (Pydantic, LangChain) with LLM pipelines.
- Collaborate with cross-functional teams to deliver data-driven insights and solutions.
- Build and optimize data models, ensuring accuracy, scalability, and efficiency.
- Communicate findings and insights effectively through dashboards and reports.
- Stay up to date with emerging trends in synthetic data, generative modeling, and AI/ML techniques.
Skills & Qualifications:
Must-Have Skills:
- Strong Python programming skills, with experience in SDGym, Synthpop, Faker, and Synthea.
- Proficiency in pandas, NumPy, scikit-learn, and related data wrangling and machine learning libraries.
- Experience with Jupyter-based experimentation and data visualization tools (matplotlib, seaborn, Plotly).
- Familiarity with structured output tooling (Pydantic, LangChain) and integration with LLM pipelines.
Nice-to-Have Skills:
- Experience with PyTorch or TensorFlow for developing custom generative models.
- Comfort working in notebook-based environments (Jupyter, Databricks) for modeling and experimentation.
- Familiarity with BI and dashboarding tools such as Power BI, Streamlit, and Dash for stakeholder-facing dashboards.
Education:
- Bachelor's or Master's degree in Computer Science, Data Science, Statistics, or a related field.