Posted on: 24/12/2025
Description :
Role : Data Engineer (MongoDB & Databricks)
Location : Hyderabad or Mumbai (Remote)
Experience : 5+ Years
Role Summary :
The Data Engineer will be responsible for the architecture, development, and governance of high-performance data pipelines within a hybrid Big Data environment.
This role focuses on the seamless integration between NoSQL systems (MongoDB) and unified analytics platforms (Databricks).
You will leverage Python and Pandas for sophisticated data manipulation while ensuring that all data assets adhere to rigorous governance standards.
The ideal candidate is a systems thinker who can reconcile complex, disparate datasets into a single source of truth for downstream analytics and business intelligence.
Responsibilities :
- Design and implement scalable data pipelines that ingest, transform, and load data from MongoDB into Databricks for advanced processing and analytics (a minimal sketch of this pattern follows this list).
- Develop complex data transformation logic using Python and Pandas, ensuring optimal performance and memory management for large-scale datasets.
- Lead systems analysis for data governance, ensuring that data lineage, metadata management, and compliance standards are maintained across all data flows.
- Execute deep-dive data reconciliation to identify and resolve inconsistencies between transactional NoSQL databases and analytical data lakes.
- Build and maintain high-performance SQL queries for data extraction, profiling, and validation within Databricks SQL warehouses.
- Optimize MongoDB collections and indexing strategies to ensure efficient data retrieval for real-time and batch-processing pipelines.
- Collaborate with data architects to refine data models that support both the flexibility of document-based storage and the structure of relational analytics.
- Implement automated data quality checks and monitoring alerts to proactively identify pipeline failures or data drift.
- Utilize Databricks features such as Delta Lake to manage ACID transactions and versioning for massive-scale data environments.
- Partner with business stakeholders to translate governance requirements into technical validation rules and automated reconciliation workflows.
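As a minimal sketch of the ingestion pattern referenced in the first responsibility above: the following assumes a Databricks notebook (where `spark` is predefined) with the MongoDB Spark Connector v10+ attached to the cluster; the connection URI, database, collection, and table names are all placeholders.

```python
# Minimal sketch: ingest a MongoDB collection and land it as a Delta table.
# Assumes a Databricks notebook (`spark` predefined) with the MongoDB Spark
# Connector v10+ installed; URI, database, collection, and table names are
# placeholders, not real endpoints.
from pyspark.sql import functions as F

raw = (
    spark.read.format("mongodb")
    .option("connection.uri", "mongodb+srv://<user>:<password>@<cluster>/")
    .option("database", "sales")        # hypothetical database
    .option("collection", "orders")     # hypothetical collection
    .load()
)

# Light transformation: normalize the timestamp column and de-duplicate.
cleaned = (
    raw.withColumn("order_ts", F.to_timestamp("order_ts"))
    .dropDuplicates(["order_id"])
)

# Land the batch as a Delta table for downstream analytics.
cleaned.write.format("delta").mode("append").saveAsTable("bronze.orders")
```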
Technical Requirements :
Languages :
- Advanced proficiency in Python (specifically for data engineering) and SQL.
Data Libraries :
- Deep hands-on experience with Pandas for complex data cleaning, merging, and analysis.
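To illustrate the kind of Pandas work this involves, here is a small sketch of cleaning two extracts and merging them with an indicator so unmatched rows surface immediately; file paths and column names are hypothetical.

```python
# Sketch of a typical Pandas cleaning-and-merging task.
# File paths and column names are hypothetical.
import pandas as pd

orders = pd.read_csv("orders.csv", parse_dates=["order_ts"])
customers = pd.read_csv("customers.csv")

# Clean: strip whitespace from the join key and drop rows missing it.
orders["customer_id"] = orders["customer_id"].str.strip()
orders = orders.dropna(subset=["customer_id"])

# Merge with an indicator column so unmatched rows are easy to inspect.
merged = orders.merge(customers, on="customer_id", how="left", indicator=True)
unmatched = merged[merged["_merge"] == "left_only"]
print(f"{len(unmatched)} orders have no matching customer record")
```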
NoSQL :
- Proven experience in managing and querying MongoDB (aggregations, indexing, and schema design).
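For context, a short PyMongo sketch of the aggregation and indexing work the role calls for; the connection string, collection, and field names are assumptions, and `$dateTrunc` requires MongoDB 5.0+.

```python
# Sketch of MongoDB aggregation and indexing via PyMongo.
# Connection string, collection, and field names are placeholders;
# $dateTrunc requires MongoDB 5.0+.
from pymongo import ASCENDING, MongoClient

client = MongoClient("mongodb://localhost:27017")  # placeholder URI
orders = client["sales"]["orders"]

# Compound index to support queries filtered by customer and date range.
orders.create_index([("customer_id", ASCENDING), ("order_ts", ASCENDING)])

# Aggregation: daily order counts and revenue per customer.
pipeline = [
    {"$match": {"status": "complete"}},
    {"$group": {
        "_id": {
            "customer_id": "$customer_id",
            "day": {"$dateTrunc": {"date": "$order_ts", "unit": "day"}},
        },
        "orders": {"$sum": 1},
        "revenue": {"$sum": "$amount"},
    }},
    {"$sort": {"revenue": -1}},
]
for doc in orders.aggregate(pipeline):
    print(doc)
```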
Big Data Platforms :
- Hands-on experience with Databricks, including Spark SQL, Delta Lake, and notebook-based development.
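A brief sketch of the Delta Lake side, again assuming a Databricks notebook where `spark` is predefined; the table names are placeholders.

```python
# Sketch of a Delta Lake ACID upsert (MERGE) and a versioned read.
# Assumes a Databricks notebook (`spark` predefined); table names are
# placeholders.
from delta.tables import DeltaTable

target = DeltaTable.forName(spark, "silver.orders")
updates = spark.table("bronze.orders")

# Idempotent upsert: update matched rows, insert new ones, atomically.
(
    target.alias("t")
    .merge(updates.alias("s"), "t.order_id = s.order_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)

# Versioning: inspect the table history, then read an earlier snapshot.
spark.sql("DESCRIBE HISTORY silver.orders").show()
previous = spark.sql("SELECT * FROM silver.orders VERSION AS OF 0")
```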
Governance & Analysis :
- Demonstrated ability to perform systems analysis for data governance, focusing on data reconciliation and quality frameworks.
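To make the reconciliation idea concrete, a simple count-based check between a source and a target; the hard-coded numbers are placeholders, and a production framework would add per-table tolerances, checksums, and alerting.

```python
# Sketch of an automated source-vs-target reconciliation check.
# The counts passed in below are placeholders; in practice they would
# come from MongoDB and the Delta table respectively.
def reconcile(source_count: int, target_count: int, tolerance: float = 0.0) -> None:
    """Raise if row counts diverge beyond the allowed relative tolerance."""
    diff = abs(source_count - target_count)
    allowed = tolerance * max(source_count, 1)
    if diff > allowed:
        raise ValueError(
            f"Reconciliation failed: source={source_count}, "
            f"target={target_count}, diff={diff}"
        )

reconcile(source_count=10_000, target_count=10_000)  # passes silently
```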
Environment :
- Familiarity with the Databricks ecosystem or similar cloud-based Big Data platforms (Azure, AWS, or GCP equivalents).
Preferred Skills :
- Experience with PySpark for distributed data processing in Databricks.
- Familiarity with orchestration tools such as Airflow or Azure Data Factory to manage multi-stage pipelines (a minimal sketch follows this list).
- Knowledge of CI/CD practices for data (e.g., using Git, Azure DevOps, or GitHub Actions).
- Certification in Databricks (Data Engineer Associate/Professional) or MongoDB (Developer Associate).
- Understanding of financial or highly regulated data environments where governance is critical.
- Excellent problem-solving skills with the ability to navigate "messy" data and create structured, reliable outputs.
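For the orchestration item above, a minimal Airflow 2.x sketch chaining an extract step into a load step; the DAG id, schedule, and task bodies are illustrative placeholders.

```python
# Minimal Airflow 2.x sketch of a two-stage pipeline (extract -> load).
# DAG id, schedule, and task bodies are illustrative placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract_from_mongo():
    print("extract: pull incremental changes from MongoDB")

def load_to_databricks():
    print("load: write the batch into a Delta table")

with DAG(
    dag_id="mongo_to_databricks",
    start_date=datetime(2025, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract", python_callable=extract_from_mongo)
    load = PythonOperator(task_id="load", python_callable=load_to_databricks)
    extract >> load
```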
Posted by : HR, Sheryl Strategic Solutions Pvt. Ltd.
Last Active : 24 Dec 2025
Posted in : Data Engineering
Functional Area : Data Analysis / Business Analysis
Job Code : 1594209