hirist

Job Description



Data Engineering:

- Design and maintain robust ETL/ELT pipelines for ML datasets.

- Implement real-time data streaming for inference (Kafka, Flink).

- Implement data lakes and warehouses optimized for AI workloads.

- Implement partitioning, indexing, and caching strategies for high-performance data retrieval.

- Establish data validation and lineage tracking to ensure integrity and compliance.

- Apply best practices for handling sensitive data in AI contexts (GDPR, CCPA).
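The ETL responsibilities above follow the standard extract/transform/validate/load pattern. A minimal sketch in plain Python (record names such as raw_events and clean_events are hypothetical; a production pipeline would orchestrate these stages with Airflow and run transforms in Spark):

```python
# Minimal ETL sketch: extract raw records, validate and normalize them,
# then load the cleaned records into a destination store.

def extract(source):
    """Pull raw records from an upstream source (here: an in-memory list)."""
    return list(source)

def transform(records):
    """Normalize fields and drop rows that fail validation."""
    out = []
    for r in records:
        if "user_id" not in r or r.get("amount") is None:
            continue  # validation: skip incomplete records
        out.append({"user_id": int(r["user_id"]),
                    "amount": round(float(r["amount"]), 2)})
    return out

def load(records, sink):
    """Append cleaned records to the destination store; return count loaded."""
    sink.extend(records)
    return len(records)

raw_events = [{"user_id": "1", "amount": "19.999"},
              {"user_id": "2"},                      # fails validation
              {"user_id": "3", "amount": "5"}]
clean_events = []
loaded = load(transform(extract(raw_events)), clean_events)
```

Keeping each stage a pure function makes the pipeline easy to unit-test and to wrap in orchestrator tasks later.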

Machine Learning Integration:


- Collaborate on feature engineering and dataset preparation.

- Implement automated data preprocessing for ML models (normalization, encoding, augmentation).

- Collaborate with data scientists to create feature stores and reusable feature pipelines.

- Deploy ML models using MLOps practices (CI/CD, model versioning).

- Maintain version control for datasets, models, and pipelines.

- Monitor model performance and automate retraining workflows.
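The "automated data preprocessing" bullet above (normalization, encoding) can be sketched with two small reusable feature functions. Column values here are hypothetical, and a real pipeline would typically use scikit-learn's Pipeline/ColumnTransformer rather than hand-rolled helpers:

```python
# Reusable preprocessing helpers: min-max normalization for numeric
# columns and one-hot encoding for categorical columns.

def min_max_normalize(values):
    """Scale numeric values into [0, 1]."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # guard against constant columns
    return [(v - lo) / span for v in values]

def one_hot_encode(values):
    """Map each categorical value to a 0/1 indicator vector."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]

# Hypothetical feature columns:
ages = [20, 30, 40]
plans = ["free", "pro", "free"]

# Assemble one feature row per record: normalized age + one-hot plan.
features = [[a] + row
            for a, row in zip(min_max_normalize(ages), one_hot_encode(plans))]
```

Because the helpers are stateless over their inputs, the same functions can back both a batch feature pipeline and an online feature store lookup.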

Required Skills:


- 5+ years in data engineering or ML engineering roles.

- Experience building end-to-end ML pipelines.

- Familiarity with vector databases (Pinecone, Weaviate) and embedding techniques.

- Exposure to generative AI and LLM-based applications.

- Programming: Python, SQL, Spark.

- ML Frameworks: TensorFlow, PyTorch, Scikit-learn.

- Data Tools: Airflow, dbt, Kafka.

- MLOps Tools: MLflow, Kubeflow, TensorFlow Serving.

- Cloud Platforms: AWS, Azure, GCP.

Working Conditions:


- This position may require evening and weekend work for time-sensitive project implementations.

