Dell Technologies - Senior Consultant - AI/MLOps Platform

Dell Technologies

8 - 12 Years

Bangalore

AIOps, MLOps, OpenShift, Red Hat, HPC, Service Orchestration, GPU, Native Cloud, IT Automation, Python, Monitoring Tools

Posted on: 23/04/2026

Job Description

Role Overview:

As a Consultant within the Dell AI & Data CoE, you will lead the architecture and implementation of large-scale AI platforms for our most strategic global customers. You are not just a builder; you are a technical visionary who helps clients navigate the complexities of the Dell AI Factory with NVIDIA and of open-source software (OSS) stacks.

You will bridge the gap between Dell's world-class hardware (PowerEdge, PowerScale, PowerSwitch) and the advanced software orchestration layers (NVAIE, Kubernetes, Slurm) required to turn raw silicon into business value.

Key Responsibilities:

CLIENT ADVISORY & ARCHITECTURAL DESIGN:

- Lead technical workshops to design Sovereign AI and Private Cloud AI platforms using Dell Validated Designs (DVD).

- Act as a Subject Matter Expert (SME) on integrating NVIDIA AI Enterprise (NVAIE) with Dell PowerEdge XE servers equipped with NVIDIA H100, H200, or B200 GPUs.

- Develop high-level and low-level designs (HLD/LLD) that incorporate the NVIDIA GPU and Network Operators and high-speed InfiniBand or RoCE fabrics.

ADVANCED ORCHESTRATION & HPC INTEGRATION:

- Deploy and optimize Red Hat OpenShift and upstream Kubernetes in air-gapped or hybrid-cloud enterprise environments.

- Implement advanced workload scheduling and fractional GPU slicing using Run:ai or Slurm to maximize clients' return on their hardware investment.

- Guide customers in choosing and implementing the right orchestration layer (e.g., BCM for bare metal vs. Kubernetes for microservices).

MLOPS ECOSYSTEM DELIVERY:

- Architect end-to-end MLOps pipelines utilizing Kubeflow, MLflow, or ClearML to streamline the "data-to-model" lifecycle.

- Enable distributed training and fine-tuning of LLMs and other GenAI models for clients using Ray and PyTorch on Dell infrastructure.

- Integrate Rafay for clients requiring decentralized or multi-cluster AI management across edge and core data centres.

PRACTICE DEVELOPMENT & THOUGHT LEADERSHIP:

- Contribute to the CoE by developing reusable IP, deployment playbooks, and Ansible, Helm, and Terraform automation.

- Mentor junior consultants and lead technical proof-of-concepts (PoCs) that demonstrate the performance of Dell-NVIDIA stacks.

Technical Requirements:

1. Infrastructure: Deep expertise in Dell PowerEdge (XE/R series), PowerScale, and PowerSwitch networking.

2. GPU Orchestration: Mastery of NVIDIA GPU Operator, Network Operator, and NVIDIA Base Command Manager (BCM).

3. Cloud-Native: Expert-level Kubernetes (CKA/CKS) or Red Hat OpenShift skills, including complex security, CNI (Cilium/Multus), and storage (CSI) configurations.

4. Workload Management: Experience with Run:ai, Slurm, or Altair PBS for high-concurrency AI environments.

5. ML Platforms: Hands-on experience with Kubeflow, MLflow, Ray, and ClearML.

6. Automation: Advanced Ansible, Helm, Terraform, and Python skills for "Infrastructure as Code" delivery.

Qualifications

1. Education: Bachelor's or Master's degree in Computer Science, Software Engineering, or a related technical field.

2. Experience: 10+ years in professional services or consulting, with a heavy focus on AI, Big Data, or HPC infrastructure.

3. Communication: Exceptional client-facing communication skills, e.g., the ability to explain complex GPU-to-GPU communication (NVLink/NVSwitch) to C-level stakeholders.

4. Travel: Willingness to travel to client sites as needed to lead deployments.

5. Preferred Certifications: CKA or Red Hat Certified Specialist, NVIDIA Certified Associate/Professional, Dell PowerEdge/PowerScale Proven Professional.