Analytics Engineer - Python/SQL

Dhruvini Consulting Pvt Ltd
2 - 6 Years
Mumbai

Posted on: 17/03/2026

Job Description

Role Summary :


This role will own the end-to-end data-structuring layer across the organisation. The individual will transform large volumes of raw, unstructured, and semi-structured data (such as SMS, device, bureau, and app data) into clean, standardised, and analysis-ready datasets. These structured datasets will directly power risk analytics, fraud detection, marketing insights, collections strategy, and policy decisioning.


Key Objective of the Role :


Ensure all raw lending data (SMS, Bureau, Device, AA, App logs) is captured, parsed, structured, and stored in a clean analytics-ready format inside databases (PostgreSQL, DynamoDB, AWS stack) so that the Risk and Data Science team can directly use it for feature creation, policy building, and portfolio monitoring.


Core Responsibilities :


1. End-to-End Data Ownership :


- Design, build, and maintain end-to-end data pipelines (batch + streaming) using AWS-native services (Glue, Lambda, Step Functions, Kinesis, S3, Athena, Redshift, EMR/Spark, etc.) : ingestion → parsing → structuring → storage (a minimal ingestion sketch follows this list)

- Work closely with Tech, Product, and Data Science to define what data should be captured

- Maintain data documentation, data dictionaries, and schema governance

- Ensure data quality, consistency, and version control
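
As an illustration of the ingestion end of that flow, a Lambda sketch is below. It assumes a hypothetical vendor webhook delivering JSON through API Gateway; the bucket name, event shape, and key layout are placeholders for illustration, not the actual setup.

```python
# Minimal ingestion sketch: land each raw vendor payload in S3, partitioned
# by source and date, so downstream Glue/Athena jobs can parse and structure it.
# Bucket name, event shape, and key layout are assumed, not real config.
import datetime
import json
import uuid

import boto3

s3 = boto3.client("s3")
RAW_BUCKET = "lending-raw-data"  # hypothetical bucket name


def handler(event, context):
    payload = json.loads(event["body"])                # API Gateway proxy event
    source = payload.get("source", "unknown")          # e.g. "sms_sdk", "bureau"
    today = datetime.date.today().isoformat()
    key = f"raw/{source}/dt={today}/{uuid.uuid4()}.json"

    s3.put_object(
        Bucket=RAW_BUCKET,
        Key=key,
        Body=json.dumps(payload).encode("utf-8"),
        ContentType="application/json",
    )
    return {"statusCode": 200, "body": json.dumps({"stored": key})}
```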


2. Unstructured Data Processing (Highest Priority) :


- Parse raw SMS dumps and categorise into salary, EMI, loan apps, collections, credits, debits, OTP, etc.

- Process device fingerprints, behavioural logs, and vendor data (FinBox, AA, Bureau APIs)

- Convert JSON, logs, and raw API responses into structured feature tables

- Build regex/keyword-based parsers for financial SMS classification
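
To give a flavour of that parsing work, a minimal regex/keyword classifier might look like the sketch below. The patterns, categories, and amount format are illustrative only; a production parser would need far broader pattern coverage and a labelled test set.

```python
# Regex/keyword SMS classifier sketch. Patterns are illustrative, not a
# production rule set. Dict order sets priority: earlier categories win.
import re

CATEGORY_PATTERNS = {
    "otp": re.compile(r"\bOTP\b|\bone[- ]time password\b", re.IGNORECASE),
    "salary": re.compile(r"\bsalary\b", re.IGNORECASE),
    "emi": re.compile(r"\bEMI\b|\binstalment\b|\binstallment\b", re.IGNORECASE),
    "collections": re.compile(r"\boverdue\b|\bpast due\b|\brepay\b", re.IGNORECASE),
    "credit": re.compile(r"\bcredited\b|\bdeposited\b", re.IGNORECASE),
    "debit": re.compile(r"\bdebited\b|\bwithdrawn\b", re.IGNORECASE),
}

# Matches amounts like "INR 45,000.00" or "Rs. 3200"
AMOUNT_RE = re.compile(r"(?:INR|Rs\.?)\s*([\d,]+(?:\.\d{1,2})?)", re.IGNORECASE)


def classify_sms(text: str) -> dict:
    """Return the first matching category plus any amount found in the SMS."""
    category = next(
        (name for name, pat in CATEGORY_PATTERNS.items() if pat.search(text)),
        "other",
    )
    m = AMOUNT_RE.search(text)
    amount = float(m.group(1).replace(",", "")) if m else None
    return {"category": category, "amount": amount}


print(classify_sms("INR 45,000.00 salary credited to A/c XX1234"))
# {'category': 'salary', 'amount': 45000.0}
```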


3. Feature Implementation (From Risk & Data Science Team) :


- Implement feature creation logic provided by Risk/Data Science team

- Translate business and policy logic into SQL/Python pipelines (see the sketch after this list)

- Create reusable feature layers for underwriting, fraud, collections, and monitoring

- Maintain a feature store for consistent model and policy usage
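
A feature spec handed over by the Risk team could translate into a pandas pipeline along these lines. The input table layout and the feature names (salary_credit_count_90d, emi_amount_90d) are hypothetical examples, not an actual spec.

```python
# Sketch: turn a Risk-team feature spec into code, assuming a structured
# SMS table like the one the parsers above would produce. Names are made up.
import pandas as pd


def build_sms_features(sms: pd.DataFrame, as_of: str) -> pd.DataFrame:
    """Per user: salary-credit count and total EMI debits over a trailing
    90-day window ending at `as_of`."""
    as_of_ts = pd.Timestamp(as_of)
    mask = (sms["event_ts"] > as_of_ts - pd.Timedelta(days=90)) & (
        sms["event_ts"] <= as_of_ts
    )
    window = sms[mask]

    salary = (
        window[window["category"] == "salary"]
        .groupby("user_id")
        .size()
        .rename("salary_credit_count_90d")
    )
    emi = (
        window[window["category"] == "emi"]
        .groupby("user_id")["amount"]
        .sum()
        .rename("emi_amount_90d")
    )
    return pd.concat([salary, emi], axis=1).fillna(0).reset_index()


sms = pd.DataFrame(
    {
        "user_id": [1, 1, 2],
        "event_ts": pd.to_datetime(["2026-02-01", "2026-02-15", "2026-03-01"]),
        "category": ["salary", "emi", "salary"],
        "amount": [45000.0, 3200.0, 52000.0],
    }
)
print(build_sms_features(sms, "2026-03-10"))
```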


4. Lending Data Understanding (Domain-Specific Requirement) :


- Work with Bureau data

- Structure SMS-derived financial variables (income, stress, EMI signals)

- Work with Account Aggregator and bank transaction datasets

- Understand fintech alternate data used in underwriting and fraud detection


5. Data Pipelines & Automation :


- Build and maintain ETL/ELT pipelines using Python & SQL

- Create cron jobs for automated data ingestion and feature refresh

- Automate vendor data pulls (Bureau, SMS SDK, AA, device data); a cron-driven sketch follows this list

- Ensure low-latency pipelines for real-time underwriting use cases
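
A cron-driven vendor pull could be as small as the sketch below. The endpoint, response shape, DSN, and table name are placeholders rather than a real vendor API; the crontab line in the comment shows one way to schedule a nightly refresh.

```python
# Cron-driven vendor pull sketch. Scheduled nightly via crontab, e.g.:
#   0 2 * * * /usr/bin/python3 /opt/etl/pull_bureau.py
# Endpoint, response shape, DSN, and table are placeholders, not a real API.
import datetime
import json
import os

import requests
from sqlalchemy import create_engine, text

ENGINE = create_engine(os.environ["ANALYTICS_DSN"])       # e.g. postgresql://...
VENDOR_URL = "https://api.example-bureau.com/v1/reports"  # placeholder endpoint


def pull_and_load() -> None:
    run_date = datetime.date.today().isoformat()
    resp = requests.get(VENDOR_URL, params={"date": run_date}, timeout=30)
    resp.raise_for_status()

    # Land each record as raw JSON; structuring happens in a later step.
    with ENGINE.begin() as conn:
        for record in resp.json()["reports"]:
            conn.execute(
                text(
                    "INSERT INTO raw_bureau_reports (pulled_on, payload) "
                    "VALUES (:d, :p)"
                ),
                {"d": run_date, "p": json.dumps(record)},
            )


if __name__ == "__main__":
    pull_and_load()
```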


6. Database Structuring & Storage Architecture :


- Structure clean datasets in PostgreSQL (analytics layer)

- Manage raw data storage in DynamoDB / S3 data lake

- Design normalised and denormalised tables for risk analytics (see the sketch after this list)

- Optimise database performance for large-scale query workloads
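
The raw-versus-analytics split might look like the following PostgreSQL sketch, run here via psycopg2. Table, column, and index names are illustrative; the composite index reflects the user-and-time window queries that risk analytics tends to run.

```python
# Sketch of the storage split: raw payloads kept as JSONB, a normalised
# analytics table carved out for risk queries. Names are illustrative.
import psycopg2

DDL = """
CREATE TABLE IF NOT EXISTS raw_sms_events (
    id          BIGSERIAL PRIMARY KEY,
    ingested_at TIMESTAMPTZ DEFAULT now(),
    payload     JSONB NOT NULL              -- raw vendor JSON, kept as-is
);

CREATE TABLE IF NOT EXISTS sms_events (     -- normalised analytics layer
    user_id   BIGINT NOT NULL,
    event_ts  TIMESTAMPTZ NOT NULL,
    category  TEXT NOT NULL,                -- salary / emi / otp / ...
    amount    NUMERIC(14, 2)
);

-- supports the trailing-window, per-user queries risk analytics runs most
CREATE INDEX IF NOT EXISTS idx_sms_events_user_ts
    ON sms_events (user_id, event_ts);
"""

with psycopg2.connect("dbname=lending user=analytics host=db-host") as conn:
    with conn.cursor() as cur:
        cur.execute(DDL)
```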


7. Dashboards & Readable Data Layer :


- Create analytics-ready datasets, write the underlying Metabase queries, and turn them into dashboards (Metabase / Power BI)

- Enable self-serve data access for Risk, Business, and Founders

- Support ad-hoc analysis requirements from leadership


8. Cross-Functional Collaboration (Very Important) :


- The role requires close collaboration with the data science, tech, product, and business teams to ensure reliable data pipelines, well-defined schemas, API integrations, logging architecture, and high data quality, enabling faster and more accurate decision-making across lending workflows.


Tech Stack (Current Environment) :


- AWS services (Glue, Lambda, Step Functions, Kinesis, S3, Athena, Redshift, EMR/Spark)

- PostgreSQL (Primary analytics DB)

- DynamoDB (Raw/NoSQL storage)

- Python (Pandas, NumPy, ETL frameworks)

- Advanced SQL

- APIs, JSON, and Log Data Handling


Must-Have Skills :


- 2 to 6 years' experience in Data Engineering / Analytics Engineering / Fintech Data roles

- Strong Python and SQL (production level)

- Experience handling unstructured data (SMS, logs, JSON, APIs)

- Experience building data pipelines, schedulers, and cron jobs

- Strong database design and data modelling skills

- Ability to work in a startup environment with high ownership

- Familiarity with modern platforms like Snowflake, Google BigQuery, or Amazon Redshift


Good to Have (Highly Preferred) :


- Experience in Lending / NBFC / Fintech domain

- Experience working with Bureau, SMS, Device, or Banking data

- Experience with streaming (Kafka/Kinesis) and orchestration (Airflow or Step Functions)

- Experience with feature stores and risk analytics datasets

- Knowledge of regex and NLP basics for SMS parsing

- Experience supporting real-time decision engines / underwriting systems

