Posted on: 15/07/2025
Python Data Engineer
Location : Bengaluru / Kochi, India
Experience : 5-8 Years
Employment Type : Full-time
Job Overview :
We are seeking a skilled Python Data Engineer to build and maintain scalable, reliable data pipelines using Python and orchestration tools such as Mage.AI or Airflow. You will be a key contributor to data ingestion, transformation, warehousing, and cloud-based deployment, playing a crucial role in our data initiatives.
Key Responsibilities :
- Design, build, and maintain robust and scalable data pipelines using Python.
- Utilize orchestration and workflow management tools, specifically Mage.AI, Airflow, or similar platforms, to automate and schedule data processes.
- Contribute across the entire data lifecycle, including data ingestion from various sources (APIs, databases, files), data transformation, and data warehousing.
- Connect to and manage data effectively in data warehouses such as PostgreSQL, AWS Redshift, or comparable systems.
- Ensure the reliability and efficiency of data flow and processing.
- Apply strong problem-solving and analytical skills to resolve complex data challenges.
- Collaborate effectively with cross-functional teams, demonstrating excellent communication abilities.
Required Skills & Qualifications :
- Proven experience as a Python Developer with a strong focus on data engineering.
- Proficiency in Mage.AI, Airflow, or any similar data pipeline orchestration and workflow management tool.
- Solid understanding of ETL processes and data warehousing concepts and modeling techniques.
- Experience with connecting to and managing data in data warehouses like PostgreSQL, AWS Redshift, or similar.
- Familiarity with data ingestion from various sources, including APIs, databases, and files.
- Strong problem-solving and analytical skills.
- Excellent communication and collaboration abilities.
- Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience.
Preferred Qualifications :
- Experience with deploying data pipelines in cloud environments, such as AWS EMR or Kubernetes.
- Hands-on experience with PySpark for big data processing, transformation, and analytics, including its machine learning library (Spark MLlib).
- Knowledge of real-time data processing and stream analytics.
- Familiarity with other data engineering tools and technologies.
- Contributions to open-source data projects.
Posted in : Data Engineering
Functional Area : Backend Development
Job Code : 1512893