We are looking for a Senior Data Engineer with strong expertise in Azure Databricks, PySpark, and distributed computing to develop and optimize scalable ETL pipelines for manufacturing analytics. The role involves working with high-frequency industrial data to enable real-time and batch data processing.

KRA:

- Build scalable real-time and batch processing workflows using Azure Databricks, PySpark, and Apache Spark.

- Perform data pre-processing, including cleaning, transformation, deduplication, normalization, encoding, and scaling, to ensure high-quality input for downstream analytics.

- Design and maintain cloud-based data architectures, including data lakes, lakehouses, and warehouses, following the Medallion Architecture.

- Deploy and optimize data solutions on Azure (preferred), AWS, or GCP with a focus on performance, security, and scalability.

- Develop and optimize ETL/ELT pipelines for structured and unstructured data from IoT, MES, SCADA, LIMS, and ERP systems.

- Automate data workflows using CI/CD and DevOps best practices, ensuring security and compliance with industry standards.

- Monitor, troubleshoot, and enhance data pipelines for high availability and reliability.

- Utilize Docker and Kubernetes for scalable data processing.

- Collaborate with the automation team, data scientists, and engineers to provide clean, structured data for AI/ML models.