Cluster Networking & Observability Engineer - AI Inference Infrastructure

Nuvia

Software Engineering, Other Engineering, Data Science
Hyderabad, Telangana, India
Posted on Nov 3, 2025

Job Description

Job Posting Date

2025-11-03


Company:

Qualcomm India Private Limited

Job Area:

Engineering Group, Engineering Group > Software Engineering

General Summary:

We are seeking a **Cluster Networking & Observability Engineer** to specialize in high-performance networking and observability for AI inference clusters. This role ensures low-latency communication and robust telemetry systems.

##### **Key Responsibilities**
- Design and maintain RoCE/RDMA-based networking for AI clusters.
- Configure and troubleshoot datacenter network components.
- Implement and maintain telemetry systems using Prometheus and OpenTelemetry.
- Manage **Kubernetes and Slurm cluster networking aspects**.
- Develop automation for network configuration and monitoring.

##### **Required Qualifications**
- Bachelor’s or Master’s in Computer Science, Electrical Engineering, or related field.
- 3–5 years of experience in networking or HPC environments.
- Solid understanding of datacenter networking and RoCE/RDMA.
- Understanding of IPMI, SNMP, and other hardware management protocols.
- Experience with telemetry and observability tools (Prometheus, OpenTelemetry).
- Proficiency in **Python and Shell scripting**.
- Familiarity with Linux networking stack and performance tuning.
- Exposure to cloud platforms (AWS, Azure, GCP) and hybrid deployments.
- **Hands-on experience managing Kubernetes and Slurm clusters**.
- Strong software engineering background.

Minimum Qualifications:

• Bachelor's degree in Engineering, Information Systems, Computer Science, or related field and 2+ years of Software Engineering or related work experience.
OR
Master's degree in Engineering, Information Systems, Computer Science, or related field and 1+ year of Software Engineering or related work experience.
OR
PhD in Engineering, Information Systems, Computer Science, or related field.

• 2+ years of academic or work experience with a programming language such as C, C++, Java, Python, etc.

Bachelor's or Master's degree in Computer Science, Software Engineering, or a related field.

Applicants: Qualcomm is an equal opportunity employer. If you are an individual with a disability and need an accommodation during the application/hiring process, rest assured that Qualcomm is committed to providing an accessible process. You may e-mail disability-accomodations@qualcomm.com or call Qualcomm's toll-free number found here. Upon request, Qualcomm will provide reasonable accommodations to support individuals with disabilities to be able to participate in the hiring process. Qualcomm is also committed to making our workplace accessible for individuals with disabilities. (Keep in mind that this email address is used to provide reasonable accommodations for individuals with disabilities. We will not respond here to requests for updates on applications or resume inquiries.)

Qualcomm expects its employees to abide by all applicable policies and procedures, including but not limited to security and other requirements regarding protection of Company confidential information and other confidential and/or proprietary information, to the extent those requirements are permissible under applicable law.

To all Staffing and Recruiting Agencies: Our Careers Site is only for individuals seeking a job at Qualcomm. Staffing and recruiting agencies and individuals being represented by an agency are not authorized to use this site or to submit profiles, applications or resumes, and any such submissions will be considered unsolicited. Qualcomm does not accept unsolicited resumes or applications from agencies. Please do not forward resumes to our jobs alias, Qualcomm employees or any other company location. Qualcomm is not responsible for any fees related to unsolicited resumes/applications.

If you would like more information about this role, please contact Qualcomm Careers.