Posted on: 17/03/2026
Role Summary:
This role owns the end-to-end data-structuring layer across the organisation. The individual will transform large volumes of raw, unstructured, and semi-structured data (such as SMS, device, bureau, and app data) into clean, standardised, analysis-ready datasets. These structured datasets will directly power risk analytics, fraud detection, marketing insights, collections strategy, and policy decisioning.
Key Objective of the Role:
Ensure all raw lending data (SMS, Bureau, Device, AA, App logs) is captured, parsed, structured, and stored in a clean analytics-ready format inside databases (PostgreSQL, DynamoDB, AWS stack) so that the Risk and Data Science team can directly use it for feature creation, policy building, and portfolio monitoring.
Core Responsibilities:
1. End-to-End Data Ownership:
- Design, build, and maintain end-to-end data pipelines (batch + streaming) using AWS native services (Glue, Lambda, Step Functions, Kinesis, S3, Athena, Redshift, EMR/Spark, etc.): ingestion → parsing → structuring → storage
- Work closely with Tech, Product, and Data Science to define what data should be captured
- Maintain data documentation, data dictionaries, and schema governance
- Ensure data quality, consistency, and version control
2. Unstructured Data Processing (Highest Priority):
- Parse raw SMS dumps and categorise into salary, EMI, loan apps, collections, credits, debits, OTP, etc.
- Process device fingerprint, behavioural logs, and vendor data (FinBox, AA, Bureau APIs)
- Convert JSON, logs, and raw API responses into structured feature tables
- Build regex/keyword-based parsers for financial SMS classification
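For illustration, a regex/keyword-based SMS classifier of the kind described above might start like this. The categories follow the list in this posting; the keyword patterns themselves are illustrative assumptions, not the company's actual rules, and real patterns would be tuned on an actual SMS corpus.

```python
import re

# Illustrative keyword patterns per category; order matters (first match wins).
SMS_PATTERNS = [
    ("otp", re.compile(r"\bOTP\b|one[- ]time password", re.I)),
    ("salary", re.compile(r"\bsalary\b", re.I)),
    ("emi", re.compile(r"\bEMI\b|instal{1,2}ment", re.I)),
    ("credit", re.compile(r"\bcredited\b", re.I)),
    ("debit", re.compile(r"\bdebited\b", re.I)),
]

def classify_sms(text: str) -> str:
    """Return the first matching category for a raw SMS, or 'other'."""
    for category, pattern in SMS_PATTERNS:
        if pattern.search(text):
            return category
    return "other"
```

In practice such parsers grow into larger pattern tables with amount/date extraction on top; the first-match-wins ordering keeps precedence explicit (e.g. OTP messages often also contain "credited").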
3. Feature Implementation (From Risk & Data Science Team):
- Implement feature creation logic provided by Risk/Data Science team
- Translate business and policy logic into SQL/Python pipelines
- Create reusable feature layers for underwriting, fraud, collections, and monitoring
- Maintain a feature store for consistent model and policy usage
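As a sketch of the feature-layer idea above: parsed, categorised transaction rows are pivoted into one feature row per user. All column names, categories, and amounts here are illustrative assumptions, not the team's actual feature definitions.

```python
import pandas as pd

# Hypothetical parsed-SMS table: one row per classified message.
sms = pd.DataFrame({
    "user_id": [1, 1, 1, 2],
    "category": ["salary", "emi", "emi", "salary"],
    "amount": [50000.0, 12000.0, 8000.0, 30000.0],
})

# Pivot into one feature row per user: salary inflow vs EMI outflow.
features = (
    sms.pivot_table(index="user_id", columns="category",
                    values="amount", aggfunc="sum", fill_value=0.0)
       .rename(columns={"salary": "salary_total", "emi": "emi_total"})
)

# A simple stress-style ratio feature derived from the pivoted columns.
features["emi_to_income"] = features["emi_total"] / features["salary_total"]
```

The same shape (entity key + derived columns) is what a feature store would version and serve for both models and policy rules.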
4. Lending Data Understanding (Domain-Specific Requirement):
- Work with Bureau data
- Structure SMS-derived financial variables (income, stress, EMI signals)
- Work with Account Aggregator and bank transaction datasets
- Understand fintech alternate data used in underwriting and fraud detection
5. Data Pipelines & Automation:
- Build and maintain ETL/ELT pipelines using Python & SQL
- Create cron jobs for automated data ingestion and feature refresh
- Automate vendor data pulls (Bureau, SMS SDK, AA, device data)
- Ensure low-latency pipelines for real-time underwriting use cases
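A hedged sketch of the cron-driven ingestion idea above: a date-partitioned, idempotent refresh job that a scheduler can safely re-run. The directory layout and the `fetch` callable are hypothetical; in production `fetch` would wrap a vendor API pull (Bureau, SMS SDK, AA, etc.).

```python
from datetime import date
from pathlib import Path
import json

def refresh_daily_partition(run_date: date, out_dir: Path, fetch) -> Path:
    """Write one date-partitioned file per run; safe to re-run (idempotent).

    `fetch` is a caller-supplied callable returning that day's records.
    """
    partition = out_dir / f"dt={run_date.isoformat()}"
    partition.mkdir(parents=True, exist_ok=True)  # re-runs overwrite in place
    target = partition / "data.json"
    target.write_text(json.dumps(fetch(run_date)))
    return target
```

A crontab entry such as `15 2 * * * python refresh_job.py` (schedule illustrative) could then invoke it daily; the `dt=` partition convention also maps cleanly onto an S3/Athena layout.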
6. Database Structuring & Storage Architecture:
- Structure clean datasets in PostgreSQL (analytics layer)
- Manage raw data storage in DynamoDB / S3 data lake
- Design normalised and denormalised tables for risk analytics
- Optimise database performance for large-scale query workloads
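One illustrative way the normalised-vs-denormalised split above might look: a raw event table plus a per-user summary table for fast risk queries. SQLite stands in for PostgreSQL here only so the sketch is runnable; all table and column names are assumptions.

```python
import sqlite3

# Normalised raw-event table (one row per parsed SMS event) plus a
# denormalised per-user summary for low-latency risk lookups.
ddl = """
CREATE TABLE sms_events (
    event_id   INTEGER PRIMARY KEY,
    user_id    INTEGER NOT NULL,
    category   TEXT NOT NULL,
    amount     REAL,
    event_ts   TEXT NOT NULL
);
CREATE INDEX idx_sms_user ON sms_events (user_id, event_ts);

CREATE TABLE user_risk_summary (
    user_id      INTEGER PRIMARY KEY,
    salary_total REAL,
    emi_total    REAL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(ddl)
```

The index on `(user_id, event_ts)` reflects the typical access pattern (all events for one borrower over a window), which is the kind of query-workload tuning the role calls for.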
7. Dashboards & Readable Data Layer:
- Create analytics-ready datasets, write queries, and build dashboards (Metabase / Power BI)
- Enable self-serve data access for Risk, Business, and Founders
- Support ad-hoc analysis requirements from leadership
8. Cross-Functional Collaboration (Very Important):
- The role requires close collaboration with Data Science, Tech, Product, and Business teams to ensure reliable data pipelines, well-defined schemas, API integrations, logging architecture, and high data quality, enabling faster and more accurate decision-making across lending workflows.
Tech Stack (Current Environment):
- AWS services (Glue, Lambda, Step Functions, Kinesis, S3, Athena, Redshift, EMR)
- PostgreSQL (Primary analytics DB)
- DynamoDB (Raw/NoSQL storage)
- Python (Pandas, NumPy, ETL frameworks)
- Advanced SQL
- APIs, JSON, and Log Data Handling
Must-Have Skills:
- 2 to 6 years' experience in Data Engineering / Analytics Engineering / Fintech Data roles
- Strong Python and SQL (production level)
- Experience handling unstructured data (SMS, logs, JSON, APIs)
- Experience building data pipelines, schedulers, and cron jobs
- Strong database design and data modelling skills
- Ability to work in a startup environment with high ownership
- Familiarity with modern platforms like Snowflake, Google BigQuery, or Amazon Redshift
Good to Have (Highly Preferred):
- Experience in Lending / NBFC / Fintech domain
- Experience working with Bureau, SMS, Device, or Banking data
- Experience with streaming (Kafka/Kinesis) and orchestration (Airflow or Step Functions)
- Experience with feature stores and risk analytics datasets
- Knowledge of regex, NLP basics for SMS parsing
- Experience supporting real-time decision engines / underwriting systems
Posted in: Data Analytics & BI
Functional Area: Data Analysis / Business Analysis
Job Code: 1621353