Posted on: 01/04/2026
Description:
Position Overview:
We are looking for an engineer with strong experience in web scraping and data extraction to build systems that collect legal data from various public websites.
The role involves building reliable crawlers that extract court judgments, tribunal orders, and regulatory decisions and store them in structured form.
You will work closely with the leadership to build the core data acquisition infrastructure of the platform.
Key Responsibilities:
- Design and build crawlers to extract data from websites.
- Crawl listing pages and extract case metadata.
- Download judgment PDFs and maintain structured storage.
- Build automated pipelines to monitor websites and detect new judgments.
- Extract structured data such as case title, case number, court/bench, date, and judgment document.
- Store scraped data in formats suitable for further processing and search.
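The listing-page extraction step above can be sketched as follows. This is a minimal, illustrative example using only the standard library; the table layout, column order, and field names are hypothetical, and a real court website would need its own selectors.

```python
# Hypothetical sketch: parse a court listing page's results table into
# structured case records. Column order (title, case number, date) and
# the .pdf link convention are assumptions for illustration only.
from dataclasses import dataclass, asdict
from html.parser import HTMLParser


@dataclass
class CaseRecord:
    title: str
    case_number: str
    date: str
    pdf_url: str


class ListingParser(HTMLParser):
    """Collects one CaseRecord per <tr> of a hypothetical results table."""

    def __init__(self):
        super().__init__()
        self.records = []
        self._cells = []
        self._pdf_url = ""
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._cells, self._pdf_url = [], ""
        elif tag == "td":
            self._in_cell = True
            self._cells.append("")
        elif tag == "a" and self._in_cell:
            # Treat any .pdf href in a row as that row's judgment document.
            href = dict(attrs).get("href", "")
            if href.endswith(".pdf"):
                self._pdf_url = href

    def handle_data(self, data):
        if self._in_cell and self._cells:
            self._cells[-1] += data.strip()

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and len(self._cells) >= 3:
            self.records.append(CaseRecord(
                title=self._cells[0],
                case_number=self._cells[1],
                date=self._cells[2],
                pdf_url=self._pdf_url,
            ))


def parse_listing(html: str) -> list[dict]:
    """Return one metadata dict per case row found in the listing HTML."""
    parser = ListingParser()
    parser.feed(html)
    return [asdict(r) for r in parser.records]
```

In practice the downloaded PDF URLs from each record would feed the storage and monitoring pipelines described above; a production crawler would also handle pagination and malformed rows.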
Skills and Requirements:
- Strong experience with Python.
- Experience with web scraping and crawler development.
- Familiarity with browser automation tools such as Playwright, or crawling frameworks such as Scrapy.
- Experience with PDF data extraction using pdfplumber, PyMuPDF, Apache Tika, or equivalent.
- Strong understanding of HTML parsing, pagination handling, and file downloads.
- Knowledge of anti-bot countermeasures such as rate limiting, session rotation, and proxy management.
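To illustrate the rate-limiting skill listed above, here is a minimal token-bucket limiter that caps request rate while allowing short bursts. The rate and capacity values are illustrative, not a recommendation for any particular site.

```python
# Minimal token-bucket rate limiter sketch. Call acquire() before each
# HTTP request; it blocks until the bucket can supply a token.
import time


class TokenBucket:
    def __init__(self, rate: float, capacity: float):
        self.rate = rate          # tokens replenished per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity    # start with a full bucket
        self.last = time.monotonic()

    def acquire(self) -> None:
        """Block until a token is available, then consume it."""
        while True:
            now = time.monotonic()
            # Refill based on elapsed time, capped at capacity.
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return
            # Sleep just long enough for the next token to accrue.
            time.sleep((1 - self.tokens) / self.rate)
```

A per-host instance of such a limiter, combined with session rotation and proxies, is one common way to keep crawlers polite and avoid bans.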
Preferred Skills:
- Experience with large-scale crawlers.
- Experience working with document datasets.
- Experience with AWS S3 for storing and managing large volumes of raw documents.
- Exposure to search systems such as Elasticsearch.
- Experience with AWS MSK / Kafka for event-driven pipelines.
Experience:
- 3 to 7 years of experience in backend or data engineering roles.
- Prior experience building web crawlers or scraping systems is highly preferred.
Posted in: Data Engineering
Functional Area: Data Mining / Analysis
Job Code: 1625403