
Job Description

Description :

Experience : 8-9 Years

Location : Regional Tech Hub / Remote

Industry : Technology Consulting & AI Solutions

Education : B.E. / B.Tech / MCA in Computer Science, Data Science, or a related field.

Role Summary :

We are seeking a high-caliber Data Engineer to join our elite data team at CoffeeBeans. In this role, you will act as a "Data Architect & Pipeline Specialist," responsible for designing, building, and maintaining the scalable data infrastructure that powers our high-end consulting services and AI-based products.


You will leverage modern tools like Apache Spark, Airflow, and Databricks/Snowflake to transform raw data into actionable insights. The ideal candidate is a technical powerhouse who excels at optimizing complex data flows across multi-cloud environments while ensuring the highest standards of data quality and system reliability.

Responsibilities :

- Scalable Pipeline Architecture : Design, develop, and maintain efficient and reliable data pipelines using modern engineering best practices to support high-volume data processing.

- Data Flow Optimization : Build and optimize complex data flows between diverse internal and external sources and destinations, ensuring low latency and high throughput.

- Quality Governance & Monitoring : Implement and maintain rigorous data quality checks, automated monitoring, and alerting systems to ensure the integrity of data-driven decision-making.

- High-Quality Engineering : Write efficient, maintainable, and production-grade code in Python or Java, adhering to strict coding standards and performance benchmarks.

- Orchestration & Workflow Management : Utilize Apache Airflow to orchestrate complex task dependencies and manage end-to-end data workflows seamlessly.

- Infrastructure Troubleshooting : Proactively identify and resolve pipeline bottlenecks, performance issues, and data discrepancies to maintain 24/7 system availability.

- Technical Documentation & Review : Participate in peer code reviews and maintain comprehensive technical documentation to foster a culture of transparency and continuous improvement.

- Large-Scale Processing : Leverage Apache Spark for distributed data processing, transforming massive datasets into structured formats for analytics and AI models.

Technical Requirements :

- Professional Experience : 8-9 years of experience in Data Engineering, specifically within high-end consulting or product-centric environments.

- Query & Modeling Mastery : Strong proficiency in advanced SQL and complex Data Modeling (Star Schema, Snowflake Schema, Data Vault).

- Programming Depth : Hands-on expertise in Python or Java for data manipulation and automation.

- Distributed Systems : Proven track record of using Apache Spark for large-scale data engineering tasks.

- Platform Expertise : Deep knowledge of Databricks and/or Snowflake; relevant certifications are highly desirable.

- Cloud Proficiency : Hands-on experience with at least one major cloud provider (AWS, GCP, or Azure).

Preferred Skills :

- Certifications : Databricks Certified Data Engineer Professional or SnowPro Core Certification.

- AI/ML Integration : Experience in building feature stores or data pipelines specifically for AI-based products.

- Real-time Streaming : Familiarity with Kafka or Spark Streaming for real-time data ingestion.

Core Competencies :

- Analytical Rigor : Ability to translate complex business requirements into elegant technical data solutions.

- Problem Solving : A methodical approach to troubleshooting performance bottlenecks in distributed systems.

- Results-Driven : A focus on delivering impactful data products that drive measurable business outcomes for clients.

- Impactful Collaboration : Ability to work within a high-end consulting framework, aligning technical delivery with client business goals.

