Company:
Qualcomm India Private Limited
Job Area:
Engineering Group, Engineering Group > Systems Engineering
General Summary:
You will be part of the System Performance team, which is responsible for profiling and optimizing system performance on Snapdragon data center chipsets. This role requires strong knowledge of AI/ML performance on NPU cores. Knowledge of CPU, DDR, and NoCs is an added advantage.
Skills Required:
• NPU: Understanding of NPU micro-architecture, including its specialized processing elements for operations like matrix multiplications and convolutions.
• A strong grasp of deep learning fundamentals, including neural network architectures such as Convolutional Neural Networks (CNNs), transformers, and GenAI models.
• Understanding of the distinction between the compute-intensive training of a model (often done on GPUs) and running inference (making predictions) on NPUs.
• Knowledge of model optimization and compression techniques (quantization, pruning, distillation, etc.).
• CPU: Understanding of CPU caches (L1, L2, L3), instruction pipelines, branch prediction, the memory hierarchy (registers, caches, and main memory), and multi-core/multi-threaded processing. You need to understand how memory access, instruction dependencies, and contention for shared resources can impact performance.
• DDR: Understanding of JEDEC specifications (LPDDR4, LPDDR5, LPDDR6), including command and timing parameters (e.g., tCL, tRCD, tRP), memory organization (rows, columns, banks), and a basic view of training and initialization sequences. Understanding of how a memory controller works and its specific features, such as command queues, port arbitration, and various control schemes.
• Familiarity with frameworks like TensorFlow Lite, PyTorch Mobile, and ONNX Runtime for preparing models for deployment. Experience with NPU-specific compilers, such as those from Qualcomm (QNN), Arm (Vela), Intel (OpenVINO), to optimize and orchestrate AI workloads.
• Evaluating hardware and software to determine the best fit for specific AI tasks and measuring performance metrics like TOPS/W (trillions of operations per second per watt).
• Using tools like the Windows Performance Toolkit and Qualcomm's Snapdragon Profiler to analyze and optimize NPU and system-level performance.
• A core understanding of how to use parallel processing architectures for efficient AI computations.
• Expertise in how operating systems manage processes, threads, memory, the MMU, and interrupt handling. This knowledge is crucial for understanding kernel scheduler behavior and system-level bottlenecks.
• Good understanding of CPU benchmarks (such as GeekBench, SPECint, and CoreMark) and DDR benchmarks (such as lat_mem, stream, and bw_mem), and how they exercise the underlying CPU/GPU/DDR architecture.
• Experience with a variety of performance monitoring tools such as Intel VTune and Linux perf, and utilities such as top, vmstat, iostat, and netstat, to monitor system resources like CPU, memory, and I/O. Experience with software tools to monitor system hotspots and command bus utilization and to identify memory traffic patterns is critical. This includes validating that the traffic generated by software is as expected.
• Good understanding of memory allocation policies, prefetching, and caching to minimize latency and maximize bandwidth. Understanding how an application accesses memory is vital, as are skills in profiling code to analyze memory access patterns and then optimizing the code for better data locality.
• Analyzing large datasets from performance tests requires strong statistical skills. This can involve creating histograms of transaction latencies and deriving performance metrics to understand a system's behavior.
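As an illustration of the latency analysis described in the last point, here is a minimal Python sketch that buckets transaction latencies into a histogram and derives a few summary metrics. The function names, the 50 ns bucket width, and the sample values are all hypothetical and chosen only for illustration; they do not come from any Qualcomm tool or dataset.

```python
import math
from collections import Counter


def latency_histogram(latencies_ns, bucket_ns):
    """Count samples per fixed-width bucket; keys are bucket lower bounds in ns."""
    counts = Counter((int(lat) // bucket_ns) * bucket_ns for lat in latencies_ns)
    return dict(sorted(counts.items()))


def percentile(latencies_ns, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(latencies_ns)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(latencies_ns, bucket_ns=50):
    """Derive basic metrics from a list of transaction latencies (ns)."""
    return {
        "count": len(latencies_ns),
        "mean_ns": sum(latencies_ns) / len(latencies_ns),
        "p50_ns": percentile(latencies_ns, 50),
        "p99_ns": percentile(latencies_ns, 99),
        "max_ns": max(latencies_ns),
        "histogram": latency_histogram(latencies_ns, bucket_ns),
    }


# Hypothetical latency samples in nanoseconds (illustrative values only).
samples = [92, 95, 101, 104, 110, 118, 123, 131, 240, 410]
report = summarize(samples)
print(report)
```

In a sketch like this, the gap between the median and the tail percentiles is often more informative than the mean, because a small number of slow transactions can dominate perceived system behavior.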
Responsibilities include:
• Drive performance analysis on silicon using various system and core (e.g., NPU, AI/ML, CPU, memory) benchmarks such as Dhrystone, GeekBench, SPECint, and CNN/GenAI ML networks.
• Use performance tools to analyze load patterns across IPs and identify performance bottlenecks in the system.
• Analyze performance KPIs of SoC subsystems like NPU, CPU, and memory, and correlate measured performance with projections.
• Evaluate and characterize performance at various junction temperatures and optimize operation at high ambient temperatures.
• Analyze and optimize the system performance parameters of SoC infrastructure like NoC, LP5 DDR, etc.
• Collaborate with cross-functional global teams to plan and execute performance activities on chipsets as well as make recommendations for next generation chipsets.
Minimum Qualifications:
6+ years of industry experience in the following:
• Experience working on any ARM/x86 based platforms, mobile/automotive/Data center operating systems and/or performance profiling tools.
• Experience in application or driver development on Linux/QNX and the ability to create/customize makefiles with various compiler options is a plus.
• Must be a quick learner and able to adapt to new technologies.
• Must have excellent communication skills.
Preferred Qualifications:
Additional skills in the following areas are preferred:
• Knowledge of computer architecture, LP5 DDR, and bus/NoC profiling is a big plus.
• Fundamentals of any operating system such as Linux/QNX/Hypervisor, and experience working on data center applications.
• Experience in creating professional quality reports and slides using MS Office or any advanced visualization tools.
• Experience in PoC development and competitive analysis. Knowledge of the voltage/power/thermal domain is a plus.
Education Requirements:
• Required: Bachelor's in Computer Engineering and/or Electrical Engineering
• Preferred: Master's in Computer Engineering and/or Electrical Engineering
Keywords:
• Architecture Performance, System Performance, Data Center, Benchmarks, Thermal Performance, Operating systems, Linux, QNX, Hypervisor, Application Development
Minimum Qualifications:
• Bachelor's degree in Engineering, Information Systems, Computer Science, or related field and 4+ years of Systems Engineering or related work experience.
OR
Master's degree in Engineering, Information Systems, Computer Science, or related field and 3+ years of Systems Engineering or related work experience.
OR
PhD in Engineering, Information Systems, Computer Science, or related field and 2+ years of Systems Engineering or related work experience.
Applicants: Qualcomm is an equal opportunity employer. If you are an individual with a disability and need an accommodation during the application/hiring process, rest assured that Qualcomm is committed to providing an accessible process. You may e-mail disability-accomodations@qualcomm.com or call Qualcomm's toll-free number found here. Upon request, Qualcomm will provide reasonable accommodations to support individuals with disabilities to be able to participate in the hiring process. Qualcomm is also committed to making our workplace accessible for individuals with disabilities. (Keep in mind that this email address is used to provide reasonable accommodations for individuals with disabilities. We will not respond here to requests for updates on applications or resume inquiries.)
Qualcomm expects its employees to abide by all applicable policies and procedures, including but not limited to security and other requirements regarding protection of Company confidential information and other confidential and/or proprietary information, to the extent those requirements are permissible under applicable law.
To all Staffing and Recruiting Agencies: Our Careers Site is only for individuals seeking a job at Qualcomm. Staffing and recruiting agencies and individuals being represented by an agency are not authorized to use this site or to submit profiles, applications or resumes, and any such submissions will be considered unsolicited. Qualcomm does not accept unsolicited resumes or applications from agencies. Please do not forward resumes to our jobs alias, Qualcomm employees or any other company location. Qualcomm is not responsible for any fees related to unsolicited resumes/applications.
If you would like more information about this role, please contact Qualcomm Careers.