Posted on: 17/03/2026
Role Summary:
This role owns the end-to-end data-structuring layer across the organisation. The individual will transform large volumes of raw, unstructured, and semi-structured data (such as SMS, device, bureau, and app data) into clean, standardised, analysis-ready datasets. These structured datasets will directly power risk analytics, fraud detection, marketing insights, collections strategy, and policy decisioning.
Key Objective of the Role:
Ensure all raw lending data (SMS, Bureau, Device, AA, App logs) is captured, parsed, structured, and stored in a clean analytics-ready format inside databases (PostgreSQL, DynamoDB, AWS stack) so that the Risk and Data Science team can directly use it for feature creation, policy building, and portfolio monitoring.
Core Responsibilities:
1. End-to-End Data Ownership:
- Design, build, and maintain end-to-end data pipelines (batch + streaming) using AWS native services (Glue, Lambda, Step Functions, Kinesis, S3, Athena, Redshift, EMR/Spark, etc.): ingestion → parsing → structuring → storage
- Work closely with Tech, Product, and Data Science to define what data should be captured
- Maintain data documentation, data dictionaries, and schema governance
- Ensure data quality, consistency, and version control
2. Unstructured Data Processing (Highest Priority):
- Parse raw SMS dumps and categorise into salary, EMI, loan apps, collections, credits, debits, OTP, etc.
- Process device fingerprint, behavioural logs, and vendor data (FinBox, AA, Bureau APIs)
- Convert JSON, logs, and raw API responses into structured feature tables
- Build regex/keyword-based parsers for financial SMS classification
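For illustration, a regex/keyword-based SMS classifier of the kind described above might start like this. The categories follow the list in this posting; the keyword patterns themselves are illustrative assumptions, not the company's actual rules, and real patterns would be tuned on an actual SMS corpus.

```python
import re

# Illustrative keyword patterns per category; order matters (first match wins).
SMS_PATTERNS = [
    ("otp", re.compile(r"\bOTP\b|one[- ]time password", re.I)),
    ("salary", re.compile(r"\bsalary\b", re.I)),
    ("emi", re.compile(r"\bEMI\b|instal{1,2}ment", re.I)),
    ("credit", re.compile(r"\bcredited\b", re.I)),
    ("debit", re.compile(r"\bdebited\b", re.I)),
]

def classify_sms(text: str) -> str:
    """Return the first matching category for a raw SMS, or 'other'."""
    for category, pattern in SMS_PATTERNS:
        if pattern.search(text):
            return category
    return "other"
```

In practice such parsers grow into larger pattern tables with amount/date extraction on top; the first-match-wins ordering keeps precedence explicit (e.g. OTP messages often also contain "credited").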
3. Feature Implementation (From Risk & Data Science Team):
- Implement feature creation logic provided by Risk/Data Science team
- Translate business and policy logic into SQL/Python pipelines
- Create reusable feature layers for underwriting, fraud, collections, and monitoring
- Maintain a feature store for consistent model and policy usage
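As a sketch of the feature-layer idea above: parsed, categorised transaction rows are pivoted into one feature row per user. All column names, categories, and amounts here are illustrative assumptions, not the team's actual feature definitions.

```python
import pandas as pd

# Hypothetical parsed-SMS table: one row per classified message.
sms = pd.DataFrame({
    "user_id": [1, 1, 1, 2],
    "category": ["salary", "emi", "emi", "salary"],
    "amount": [50000.0, 12000.0, 8000.0, 30000.0],
})

# Pivot into one feature row per user: salary inflow vs EMI outflow.
features = (
    sms.pivot_table(index="user_id", columns="category",
                    values="amount", aggfunc="sum", fill_value=0.0)
       .rename(columns={"salary": "salary_total", "emi": "emi_total"})
)

# A simple stress-style ratio feature derived from the pivoted columns.
features["emi_to_income"] = features["emi_total"] / features["salary_total"]
```

The same shape (entity key + derived columns) is what a feature store would version and serve for both models and policy rules.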
4. Lending Data Understanding (Domain-Specific Requirement):
- Work with Bureau data
- Structure SMS-derived financial variables (income, stress, EMI signals)
- Work with Account Aggregator and bank transaction datasets
- Understand fintech alternate data used in underwriting and fraud detection
5. Data Pipelines & Automation:
- Build and maintain ETL/ELT pipelines using Python & SQL
- Create cron jobs for automated data ingestion and feature refresh
- Automate vendor data pulls (Bureau, SMS SDK, AA, device data)
- Ensure low-latency pipelines for real-time underwriting use cases
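A hedged sketch of the cron-driven ingestion idea above: a date-partitioned, idempotent refresh job that a scheduler can safely re-run. The directory layout and the `fetch` callable are hypothetical; in production `fetch` would wrap a vendor API pull (Bureau, SMS SDK, AA, etc.).

```python
from datetime import date
from pathlib import Path
import json

def refresh_daily_partition(run_date: date, out_dir: Path, fetch) -> Path:
    """Write one date-partitioned file per run; safe to re-run (idempotent).

    `fetch` is a caller-supplied callable returning that day's records.
    """
    partition = out_dir / f"dt={run_date.isoformat()}"
    partition.mkdir(parents=True, exist_ok=True)  # re-runs overwrite in place
    target = partition / "data.json"
    target.write_text(json.dumps(fetch(run_date)))
    return target
```

A crontab entry such as `15 2 * * * python refresh_job.py` (schedule illustrative) could then invoke it daily; the `dt=` partition convention also maps cleanly onto an S3/Athena layout.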
6. Database Structuring & Storage Architecture:
- Structure clean datasets in PostgreSQL (analytics layer)
- Manage raw data storage in DynamoDB / S3 data lake
- Design normalised and denormalised tables for risk analytics
- Optimise database performance for large-scale query workloads
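One illustrative way the normalised-vs-denormalised split above might look: a raw event table plus a per-user summary table for fast risk queries. SQLite stands in for PostgreSQL here only so the sketch is runnable; all table and column names are assumptions.

```python
import sqlite3

# Normalised raw-event table (one row per parsed SMS event) plus a
# denormalised per-user summary for low-latency risk lookups.
ddl = """
CREATE TABLE sms_events (
    event_id   INTEGER PRIMARY KEY,
    user_id    INTEGER NOT NULL,
    category   TEXT NOT NULL,
    amount     REAL,
    event_ts   TEXT NOT NULL
);
CREATE INDEX idx_sms_user ON sms_events (user_id, event_ts);

CREATE TABLE user_risk_summary (
    user_id      INTEGER PRIMARY KEY,
    salary_total REAL,
    emi_total    REAL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(ddl)
```

The index on `(user_id, event_ts)` reflects the typical access pattern (all events for one borrower over a window), which is the kind of query-workload tuning the role calls for.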
7. Dashboards & Readable Data Layer:
- Create analytics-ready datasets, write queries, and build dashboards (Metabase / Power BI)
- Enable self-serve data access for Risk, Business, and Founders
- Support ad-hoc analysis requirements from leadership
8. Cross-Functional Collaboration (Very Important):
- The role requires close collaboration with Data Science, Tech, Product, and Business teams to ensure reliable data pipelines, well-defined schemas, API integrations, logging architecture, and high data quality, enabling faster and more accurate decision-making across lending workflows.
Tech Stack (Current Environment):
- AWS services (Glue, Lambda, Step Functions, Kinesis, S3, Athena, Redshift, EMR)
- PostgreSQL (Primary analytics DB)
- DynamoDB (Raw/NoSQL storage)
- Python (Pandas, NumPy, ETL frameworks)
- Advanced SQL
- APIs, JSON, and Log Data Handling
Must-Have Skills:
- 2 to 6 years' experience in Data Engineering / Analytics Engineering / Fintech Data roles
- Strong Python and SQL (production level)
- Experience handling unstructured data (SMS, logs, JSON, APIs)
- Experience building data pipelines, schedulers, and cron jobs
- Strong database design and data modelling skills
- Ability to work in a startup environment with high ownership
- Familiarity with modern platforms like Snowflake, Google BigQuery, or Amazon Redshift
Good to Have (Highly Preferred):
- Experience in Lending / NBFC / Fintech domain
- Experience working with Bureau, SMS, Device, or Banking data
- Experience with streaming (Kafka/Kinesis) and orchestration (Airflow or Step Functions)
- Experience with feature stores and risk analytics datasets
- Knowledge of regex, NLP basics for SMS parsing
- Experience supporting real-time decision engines / underwriting systems
Posted in: Data Analytics & BI
Functional Area: Data Analysis / Business Analysis
Job Code: 1621353