Posted on: 18/03/2026
About NStarX:
NStarX is an AI-first, Cloud-first engineering services provider built and led by practitioners. We specialize in transforming businesses through cutting-edge technology solutions. With years of expertise, we deliver scalable, data-driven systems that empower our clients to make smarter, faster decisions.
Role Summary:
We are seeking a Software Engineer (Data Engineering) who can seamlessly integrate the roles of a Data Engineer and Data Scientist.
The ideal candidate will design robust data pipelines, build AI/ML models, and deliver data-driven insights that address complex business challenges.
This is a client-facing role requiring close collaboration with US-based stakeholders, and the candidate must be flexible to work in alignment with US time zones when needed.
Key Responsibilities:
Data Engineering:
- Design, build, and maintain scalable ETL/ELT pipelines for large-scale data processing.
- Develop and optimize data architectures supporting analytics and ML workflows.
- Ensure data integrity, security, and compliance with organizational and industry standards.
- Collaborate with DevOps teams to deploy and monitor data pipelines in production environments.
Data Science & AI/ML:
- Build predictive and prescriptive models leveraging AI/ML techniques.
- Develop and deploy machine learning and deep learning models using TensorFlow, PyTorch, or scikit-learn.
- Perform feature engineering, statistical analysis, and data pre-processing.
- Continuously monitor and optimize models for accuracy and scalability.
- Integrate AI-driven insights into business processes and strategies.
Client Interaction:
- Serve as the technical liaison between NStarX and client teams, ensuring clear communication and alignment on deliverables.
- Participate in client discussions, requirement gathering, and design reviews.
- Provide status updates, insights, and recommendations directly to client stakeholders.
- Work flexibly in alignment with US time zones to support real-time collaboration and delivery.
Required Qualifications:
- Experience: 4+ years in Data Engineering and AI/ML roles.
- Education: Bachelor's or Master's degree in Computer Science, Data Science, or a related field.
Technical Skills (Required):
- Languages/Libraries: Python, SQL, Bash, PySpark, Spark SQL, boto3, pandas
- Compute: Apache Spark on EMR (driver/executor model, sizing, dynamic allocation)
- Storage: Amazon S3 (Parquet), lifecycle to Glacier
- Catalog: AWS Glue (Catalog & Crawlers)
- Orchestration/Serverless: AWS Step Functions, AWS Lambda, Amazon EventBridge
- Ingestion: CloudWatch Logs/Metrics, Kinesis Data Firehose (or Kafka/MSK)
- Warehouse: Amazon Redshift + Redshift Spectrum
- Security/Access: IAM (least privilege), Secrets Manager / SSM
- Ops/Collab: Git + CI (Jenkins/GitHub/GitLab), CloudWatch logging/metrics
Nice to Have:
- Scala, Docker, Kubernetes (Spark-on-K8s), k9s
- Fast stores (DynamoDB/MongoDB/Redis) for side lookups/indices
- Databricks, Jupyter
- FinOps exposure (cost baselines, dashboards)
Core Skills (Hands-on Responsibilities):
Data Lake to Data Mart Design:
- Design layered data-lake-to-data-mart models (raw → processed → merged → aggregated).
- Implement Hive-style partitioning (year/month/day) with retention and archival strategies (a layout sketch follows this list).
- Define schema contracts, decision logic, and state machine handoffs.
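To make the partitioning convention concrete, here is a minimal Python sketch of a Hive-style prefix builder for a layered lake. The bucket name, layer names, and dataset are illustrative placeholders, not actual NStarX conventions.

# Hypothetical layout for a layered lake with Hive-style date partitions.
from datetime import date

LAYERS = ["raw", "processed", "merged", "aggregated"]

def partition_prefix(layer: str, dataset: str, d: date) -> str:
    """Build a Hive-style S3 prefix: layer/dataset/year=YYYY/month=MM/day=DD/."""
    assert layer in LAYERS, f"unknown layer: {layer}"
    return (f"s3://example-datalake/{layer}/{dataset}/"
            f"year={d.year:04d}/month={d.month:02d}/day={d.day:02d}/")

print(partition_prefix("processed", "clickstream", date(2026, 3, 18)))
# -> s3://example-datalake/processed/clickstream/year=2026/month=03/day=18/

Retention then becomes a matter of expiring or archiving whole day prefixes, which keeps deletes and Glacier transitions partition-aligned.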
Spark ETL Development:
- Author robust PySpark/Scala jobs for parsing, flattening, merging, and aggregation.
- Tune performance via broadcast joins, partition pruning, and shuffle control.
- Implement atomic, overwrite-by-partition writes and idempotent operations (see the PySpark sketch below).
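A minimal PySpark sketch of those three patterns together, assuming illustrative dataset paths and column names: a broadcast join against a small dimension table, partition pruning on read, and dynamic (per-partition) overwrite so reruns stay idempotent.

from pyspark.sql import SparkSession, functions as F

spark = (SparkSession.builder.appName("events-merge")
         # Overwrite only the partitions present in the output, not the whole table.
         .config("spark.sql.sources.partitionOverwriteMode", "dynamic")
         .getOrCreate())

events = spark.read.parquet("s3://example-datalake/processed/events/")
devices = spark.read.parquet("s3://example-datalake/processed/dim_device/")

merged = (events
          .filter(F.col("year") == 2026)                  # partition pruning on read
          .join(F.broadcast(devices), "device_id")        # small side shipped to executors
          .groupBy("year", "month", "day", "device_type")
          .agg(F.count("*").alias("event_count")))

(merged.write
 .mode("overwrite")                # with dynamic mode: overwrite-by-partition
 .partitionBy("year", "month", "day")
 .parquet("s3://example-datalake/aggregated/events_by_device/"))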
Warehouse Synchronization:
- Perform idempotent DELETE+INSERT/MERGE into Redshift using enumerated partition filters, as in the sketch below.
- Maintain audit-friendly SQL (deterministic predicates; counts of deleted/inserted/affected rows).
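A hedged sketch of the DELETE+INSERT half of that pattern over an enumerated partition list, using psycopg2 (Redshift speaks the Postgres wire protocol); host, credentials, and table names are placeholders.

import psycopg2

partitions = [(2026, 3, 17), (2026, 3, 18)]  # partitions recomputed upstream

conn = psycopg2.connect(host="example-cluster.redshift.amazonaws.com",
                        port=5439, dbname="analytics",
                        user="etl_user", password="***")
with conn, conn.cursor() as cur:  # one transaction: delete+insert commit together
    for y, m, d in partitions:
        # Deterministic predicate, identical on the DELETE and the INSERT.
        cur.execute("DELETE FROM agg.events_by_device "
                    "WHERE year = %s AND month = %s AND day = %s", (y, m, d))
        deleted = cur.rowcount
        cur.execute("INSERT INTO agg.events_by_device "
                    "SELECT * FROM stage.events_by_device "
                    "WHERE year = %s AND month = %s AND day = %s", (y, m, d))
        print(f"{y}-{m:02d}-{d:02d}: deleted={deleted}, inserted={cur.rowcount}")
conn.close()

Logging the deleted/inserted counts per partition is what makes the sync audit-friendly, and rerunning the job converges to the same state.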
Data Quality, Reliability & Observability:
- Build repeatable, scalable, automated ETL pipelines with idempotency and cost efficiency.
- Implement schema drift checks, duplicate prevention, and partition reconciliation (a drift-check sketch follows this list).
- Monitor EMR/K8s lifecycle, cluster right-sizing, and cost tracking (FinOps awareness).
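As one way to implement a schema drift check, the sketch below compares an incoming frame's schema against a pinned contract and fails fast before any write; the contract and field names are hypothetical.

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, LongType

# Pinned schema contract for the example dataset.
EXPECTED = StructType([
    StructField("device_id", StringType(), True),
    StructField("event_ts", LongType(), True),
    StructField("event_type", StringType(), True),
])

spark = SparkSession.builder.getOrCreate()
df = spark.read.parquet("s3://example-datalake/raw/events/year=2026/month=03/day=18/")

incoming = {(f.name, f.dataType.simpleString()) for f in df.schema.fields}
expected = {(f.name, f.dataType.simpleString()) for f in EXPECTED.fields}
if incoming != expected:
    raise ValueError(f"Schema drift: added={incoming - expected}, missing={expected - incoming}")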
Ingestion & Storage:
- Build log/event pipelines (CloudWatch/Kinesis/Firehose) into S3 using gzip + date partitions.
- Manage bucket layout, lifecycle rules (hot → Glacier), and data catalog consistency (see the lifecycle sketch below).
- Understand compression types (gzip, Snappy) and Hive-style directory structures.
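A short boto3 sketch of the hot → Glacier lifecycle rule described above; the bucket, prefix, and day thresholds are placeholders to adjust per retention policy.

import boto3

s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="example-datalake",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "raw-events-to-glacier",
            "Filter": {"Prefix": "raw/events/"},
            "Status": "Enabled",
            # Keep ~90 days hot, archive to Glacier, expire after two years.
            "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
            "Expiration": {"Days": 730},
        }]
    },
)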
Orchestration & Automation:
- Implement AWS Step Functions with Choice/Map/Parallel states, retries, and backoff mechanisms (sketched below).
- Automate scheduling via EventBridge and deploy guardrail Lambdas.
- Parameterize pipelines for environments (dev/stage/prod) and selective recomputation.
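A condensed sketch of a Step Functions definition with a Choice state and exponential-backoff retries, registered via boto3; every ARN, name, and threshold here is a placeholder.

import json
import boto3

definition = {
    "StartAt": "CheckPartitions",
    "States": {
        "CheckPartitions": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:check-partitions",
            "Retry": [{
                "ErrorEquals": ["States.TaskFailed"],
                "IntervalSeconds": 30,
                "MaxAttempts": 3,
                "BackoffRate": 2.0,       # exponential backoff between attempts
            }],
            "Next": "AnyNewData"
        },
        "AnyNewData": {
            "Type": "Choice",
            "Choices": [{"Variable": "$.partitionCount",
                         "NumericGreaterThan": 0, "Next": "RunSparkStep"}],
            "Default": "NothingToDo"
        },
        "RunSparkStep": {
            "Type": "Task",
            "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
            "Parameters": {"ClusterId": "j-EXAMPLE",
                           "Step": {"Name": "daily-merge", "ActionOnFailure": "CONTINUE",
                                    "HadoopJarStep": {"Jar": "command-runner.jar",
                                                      "Args": ["spark-submit", "merge.py"]}}},
            "End": True
        },
        "NothingToDo": {"Type": "Succeed"}
    }
}

sfn = boto3.client("stepfunctions")
sfn.create_state_machine(name="daily-events-merge",
                         definition=json.dumps(definition),
                         roleArn="arn:aws:iam::123456789012:role/example-sfn-role")

An EventBridge schedule rule would then target this state machine for the daily run.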
Soft Skills:
- Strong analytical and problem-solving capabilities.
- Excellent communication for client engagement and stakeholder presentations.
- Proven ability to work flexibly with global teams, especially US-based customers.
- Team-oriented, proactive, and adaptable in fast-paced environments.
Preferred Qualifications:
- Experience with MLOps and end-to-end AI/ML deployment pipelines.
- Knowledge of NLP and Computer Vision.
- Certifications in AI/ML, AWS, Azure, or GCP.
Benefits:
- Competitive salary and performance-based incentives.
- Opportunity to work on cutting-edge AI/ML projects.
- Exposure to global clients and international project delivery.
- Continuous learning and professional development opportunities.
Posted in: Data Engineering
Functional Area: Data Engineering
Job Code: 1621548