Description:
This role is for a highly skilled and experienced Data Pipeline Architect/Lead Engineer responsible for setting the technical direction and leading the development of robust, scalable, and optimized data solutions. The ideal candidate will possess deep expertise in distributed computing, data warehousing, and modern data architectures, with a focus on Databricks and Spark environments.
Key Responsibilities and Duties:
Data Pipeline Leadership & Design:
- Lead the design, development, and optimization of complex, high-volume data pipelines (ETL/ELT) using technologies such as Databricks, Apache Spark, and advanced SQL; a brief illustrative sketch follows this list.
- Ensure data pipelines are scalable, efficient, fault-tolerant, and meet strict performance requirements for real-time and batch processing.
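For concreteness, here is a minimal sketch of the kind of batch pipeline this section describes, written in PySpark against Delta Lake. All table and column names are hypothetical placeholders, and the daily rollup itself is an assumption chosen purely for illustration:

```python
# A minimal, illustrative batch ELT step on Databricks. The tables
# raw.events and analytics.daily_events, and the columns event_ts,
# user_id, and event_type, are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("daily-events-rollup").getOrCreate()

# Read the raw source table registered in the metastore.
raw = spark.table("raw.events")

# Roll events up to one row per user, event type, and calendar day.
daily = (
    raw.withColumn("event_date", F.to_date("event_ts"))
       .groupBy("event_date", "user_id", "event_type")
       .agg(F.count("*").alias("event_count"))
)

# Persist as a Delta table, partitioned by date for efficient pruning
# by downstream batch and reporting reads.
(
    daily.write.format("delta")
         .mode("overwrite")
         .partitionBy("event_date")
         .saveAsTable("analytics.daily_events")
)
```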
Architectural Strategy & Governance:
- Drive and influence architectural decisions for the data platform, advocating for best-in-class solutions (e.g., Lakehouse architecture).
- Establish and enforce engineering best practices, coding standards, and quality assurance processes for data processing and data modeling.
- Evaluate and implement new technologies and tools to enhance data platform capabilities and performance.
Technical Mentorship & Team Leadership:
- Act as a subject matter expert in distributed data processing, data warehousing, and cloud-native data services.
- Mentor and guide junior and mid-level engineers on Spark tuning, Databricks features, data modeling techniques, and performance optimization.
- Contribute to the continuous improvement of development and deployment processes (CI/CD) for data solutions.
Cross-Functional Collaboration & Delivery:
- Collaborate closely with cross-functional partners (Data Scientists, Analytics Teams, Product Managers) to translate complex business requirements into high-quality, reliable, and secure data solutions.
- Ensure the delivered data solutions accurately support critical analytics, reporting, and machine learning initiatives.
- Own the technical delivery and stability of core data assets, ensuring data quality, governance, and reliability.
Required Skills and Qualifications:
- Experience: 5+ years in data engineering, software engineering, or a related field, including 5+ years focused on distributed data processing leadership/architecture.
- Technical Mastery: Expert-level proficiency in Apache Spark (Scala, Python/PySpark, or Java) and deep practical experience with the Databricks platform (Delta Lake, notebooks, jobs, clusters); see the upsert sketch after this list.
- Data Languages: Exceptional command of advanced SQL for complex data manipulation and performance tuning.
- Architecture: Strong understanding of cloud data warehousing concepts (e.g., Snowflake, Google BigQuery, Amazon Redshift) and experience implementing Lakehouse patterns.
- Cloud Experience: Hands-on experience with at least one major cloud provider (AWS, Azure, or GCP) and their respective data services.
- Soft Skills: Excellent communication, collaboration, and problem-solving skills, with a proven ability to lead technical discussions and mentor team members.
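To make the Spark/Databricks bar above concrete, the following is a hedged sketch of a Delta Lake upsert (MERGE), a representative exercise of that proficiency. It assumes the delta-spark Python package is available, and the table and column names are hypothetical:

```python
# A minimal Delta Lake upsert via the Python API. The tables
# staging.customer_updates and analytics.customers, and the key
# column customer_id, are hypothetical placeholders.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Incoming change set and the Delta target it should be merged into.
updates = spark.table("staging.customer_updates")
target = DeltaTable.forName(spark, "analytics.customers")

# Upsert: update rows whose keys already exist, insert the rest.
(
    target.alias("t")
          .merge(updates.alias("s"), "t.customer_id = s.customer_id")
          .whenMatchedUpdateAll()
          .whenNotMatchedInsertAll()
          .execute()
)
```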