- Develop and maintain Python-based synthetic data solutions using tools such as SDGym, Synthpop, Faker, and Synthea.

- Perform data wrangling, cleaning, and preprocessing for structured and unstructured datasets.

- Conduct experimentation and visualization using Jupyter notebooks, leveraging matplotlib, seaborn, Plotly, and similar tools.

- Integrate structured output tooling (Pydantic, LangChain) with LLM pipelines.

- Collaborate with cross-functional teams to deliver data-driven insights and solutions.

- Build and optimize data models, ensuring accuracy, scalability, and efficiency.

- Communicate findings and insights effectively through dashboards and reports.

- Stay up to date with emerging trends in synthetic data, generative modeling, and AI/ML techniques.

Skills & Qualifications:

Must-Have Skills:

- Strong Python programming skills and experience with SDGym, Synthpop, Faker, and Synthea.

- Proficiency in pandas, NumPy, scikit-learn, and other data wrangling libraries.

- Experience with Jupyter-based experimentation and data visualization tools (matplotlib, seaborn, Plotly).

- Familiarity with structured output tooling (Pydantic, LangChain) and integration with LLM pipelines.

Nice-to-Have Skills:

- Experience with PyTorch or TensorFlow for developing custom generative models.

- Comfort working in notebook-based environments (Jupyter, Databricks) for modeling and experimentation.

- Familiarity with BI and dashboarding tools such as Power BI, Streamlit, or Dash for stakeholder-facing dashboards.

Education:

- Bachelor's or Master's degree in Computer Science, Data Science, Statistics, or a related field.