Data Scientist - Optical Character Recognition

InfoTrellis India Pvt Ltd
Bangalore
5 - 8 Years

Posted on: 22/03/2025

Job Description : Data Scientist

Company : Mastech Digital

Location : Bangalore Urban, Karnataka, India

Position Type : Full Time

Duration : Permanent

Notice Period : Immediate Joiner / Serving Notice / Less than 30 Days

Experience : 5+ Years

About the Role :

Mastech Digital is seeking a highly skilled and experienced Data Scientist to join our dynamic team. In this role, you will be responsible for developing and deploying advanced AI models, with a focus on OCR, LLMs, and computer vision. You will work within the AWS ecosystem, adhering to best practices for code quality, data security, and model deployment. This position requires a strong understanding of machine learning techniques, cloud technologies, and the ability to collaborate effectively with cross-functional teams.

Responsibilities/Duties :

AI Model Development and Deployment :

- Train and fine-tune OCR models and Large Language Models (LLMs).

- Develop and implement computer vision models for object detection and segmentation.

- Deploy and maintain models in production, collaborating with software engineers.

Cloud Infrastructure and Architecture :

- Utilize AWS services, including SageMaker, Bedrock, Lambda, S3, and API Gateway, for model development and deployment.

- Adhere to the AWS Well-Architected Framework for robust and scalable solutions.

Data Management and Security :

- Perform data cleaning and preprocessing to ensure high-quality training data.

- Ensure data confidentiality and implement HIPAA compliance measures.

Software Development Practices :

- Follow internal best practices for code monitoring, testing, and version control.

- Implement CI/CD pipelines using Jenkins and other relevant tools.

- Conduct thorough QA and application testing.

Model Evaluation and Optimization :

- Perform robust testing of models to ensure accuracy and reliability.

- Compare the feasibility of different models and select the most appropriate solution.

- Fine-tune LLMs (Mistral, Llama, and other open-source models) and perform prompt tuning.

Collaboration and Communication :

- Collaborate with other data scientists to divide work and ensure timely project completion.

- Meet deliverable deadlines for weekly/bi-weekly meetings and provide regular updates.

- Create data visualizations to communicate results to non-technical stakeholders.

- Test and implement NER models.

Hugging Face and Related Technologies :

- Familiarity with Hugging Face packages.

Skills :

Programming and Data Science :

- Proficient in Python.

- Strong SQL skills.

- Experience with data cleaning and big data processing.

- Experience with OCR and NER models.

Cloud Technologies (AWS) :

- Extensive experience with AWS SageMaker, Bedrock, Lambda, S3, and API Gateway.

- Proficiency in using the Amazon Textract API.

Machine Learning and AI :

- Experience with training and fine-tuning LLMs (Mistral, Llama, etc.).

- Proficiency in prompt tuning.

- Experience with computer vision models for object detection and segmentation.

DevOps and CI/CD :

- Experience with CI/CD pipelines and version control systems.

- Proficiency in using Jenkins.

Hugging Face :

- Familiarity with Hugging Face packages.

Qualifications :

- 5+ years of experience as a Data Scientist.

- Bachelor's or Master's degree in Computer Science, Data Science, or a related field.

- Strong understanding of machine learning algorithms and techniques.

- Excellent problem-solving and analytical skills.

- Strong communication and collaboration skills.

- Ability to work independently and as part of a team.

Preferred Qualifications :

- Experience with healthcare data and HIPAA compliance.

- AWS certifications.

- Experience with advanced computer vision techniques.

Posted By

Monika

HR Manager at InfoTrellis India Pvt Ltd

Posted in: AI/ML

Functional Area: Data Science

Job Code: 1453004