Posted on: 09/01/2026
Key Responsibilities:
- Design, develop, and maintain scalable data ingestion and pipeline solutions
- Build and optimize batch and real-time data processing pipelines
- Work extensively with PySpark and SQL for data transformation and analytics (see the batch pipeline sketch after this list)
- Handle structured and semi-structured data from multiple sources
- Implement real-time data processing using streaming frameworks
- Ensure data quality, reliability, and performance of pipelines
- Collaborate with analytics, data science, and product teams
- Monitor, troubleshoot, and optimize data workflows
- Follow best practices for data engineering, security, and governance
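To make the day-to-day concrete, here is a minimal PySpark sketch of the kind of batch ingestion-and-transform pipeline the responsibilities above describe. The paths, column names, and app name are illustrative assumptions, not details from this posting.

```python
# A minimal, hypothetical batch PySpark pipeline; all paths and column
# names are illustrative assumptions (reading from S3 also assumes the
# hadoop-aws connector is available).
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders_batch_etl").getOrCreate()

# Ingest: semi-structured JSON events from a (hypothetical) landing zone
orders = spark.read.json("s3://landing-zone/orders/")

# Transform: normalize types, derive a date column, aggregate per customer
daily_totals = (
    orders
    .withColumn("order_ts", F.to_timestamp("order_ts"))
    .withColumn("order_date", F.to_date("order_ts"))
    .groupBy("customer_id", "order_date")
    .agg(F.sum("amount").alias("daily_total"))
)

# Load: write partitioned Parquet for downstream analytics teams
daily_totals.write.mode("overwrite").partitionBy("order_date").parquet(
    "s3://warehouse/daily_order_totals/"
)
```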
Required Skills & Experience:
- 3+ years of experience as a Data Engineer
- Strong proficiency in SQL (complex queries, performance tuning)
- Hands-on experience with PySpark / Apache Spark
- Solid experience in building data ingestion and ETL/ELT pipelines
- Experience with real-time / streaming data processing (see the streaming sketch after this list)
- Strong understanding of data warehousing and data modeling concepts
- Experience working on cloud platforms (AWS or GCP)
- Familiarity with version control tools (Git)
- Good problem-solving and analytical skills
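For the streaming requirement, a minimal Spark Structured Streaming sketch along these lines; the Kafka broker, topic, payload field, and checkpoint location are illustrative assumptions.

```python
# A minimal, hypothetical Structured Streaming job; assumes the
# spark-sql-kafka connector is on the classpath, and the broker,
# topic, and JSON field below are illustrative, not from this posting.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("events_stream").getOrCreate()

# Ingest: subscribe to a (hypothetical) Kafka topic of JSON events
raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "events")
    .load()
)

# Transform: Kafka values arrive as bytes; cast to string and parse
events = raw.select(
    F.get_json_object(F.col("value").cast("string"), "$.event_type").alias("event_type")
)

# Load: continuously emit counts per event type (console sink for demo)
query = (
    events.groupBy("event_type").count()
    .writeStream.outputMode("complete")
    .format("console")
    .option("checkpointLocation", "/tmp/checkpoints/events")
    .start()
)
query.awaitTermination()
```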
Good to Have:
- Experience with streaming tools like Kafka, Spark Streaming, or Pub/Sub
- Knowledge of workflow orchestration tools (Airflow, Cloud Composer, etc.; see the DAG sketch after this list)
- Exposure to Big Data ecosystems (Hive, HDFS, BigQuery, Redshift)
- Experience with Docker and CI/CD pipelines
- Knowledge of Python beyond PySpark
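For the orchestration item, a minimal Airflow 2.x DAG sketch; the DAG id, schedule, and commands are illustrative assumptions, not details from this posting.

```python
# A minimal, hypothetical Airflow 2.x DAG: run a Spark batch job, then
# a quality check. DAG id, schedule, and commands are illustrative.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="daily_order_etl",      # hypothetical pipeline name
    start_date=datetime(2026, 1, 1),
    schedule="@daily",             # Airflow 2.4+; older versions use schedule_interval
    catchup=False,
) as dag:
    # Run the (hypothetical) batch job sketched earlier
    run_etl = BashOperator(
        task_id="run_spark_etl",
        bash_command="spark-submit /opt/jobs/orders_batch_etl.py",
    )
    # Simple downstream data-quality gate
    check_quality = BashOperator(
        task_id="check_row_counts",
        bash_command="python /opt/jobs/check_row_counts.py",
    )
    run_etl >> check_quality
```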
Functional Area: Data Engineering
Job Code: 1599319