Senior AI Performance Architect
MSCCN - San Diego, CA
Job Description
Responsibilities:
- Understand trends in ML network design through customer engagements and the latest academic research, and determine how these will affect both SW and HW design
- Work with customers to determine hardware requirements for AI training systems
- Analysis of current accelerator and GPU architectures
- Architect enhancements required for efficient training of AI models
- Design and architecture of:
  - Flexible Computational Blocks
    - Involving a variety of datatypes: floating point, fixed point, microscaling
    - Involving a variety of precisions: 32/16/8/4/2/1
    - Capable of optimally performing dense and sparse GEMM, GEMV
  - Memory Technology and subsystems that are optimized for a range of requirements
    - Capacity
    - Bandwidth
    - Compute in Memory, Compute near memory
  - Scale-Out and Scale-Up Architectures
    - Switches, NoCs, Codesign with Communication Collectives
  - Optimized for Power
- Ability to perform Competitive Analysis
- Codesign HW with SW/GenAI (LLM) requirements
- Define performance models to prove effectiveness of architecture proposals
- Pre-Silicon prediction of performance for various ML training workloads
- Perform analysis of performance/area/power trade-offs for future HW and SW ML algorithms, including impact of SoC components (memory and bus impacts)

Requirements:
- Master's degree in Computer Science, Engineering, Information Systems, or related field
- 3+ years Hardware Engineering experience defining architecture of GPUs or accelerators used for training of AI models
- In-depth knowledge of NVIDIA/AMD GPU capabilities and architectures
- Knowledge of LLM architectures and their HW requirements

Preferred Skills and Experience:
- Knowledge of computer architecture, digital circuits and hardware simulators
- Knowledge of communication protocols used in AI systems
- Knowledge of Network-on-Chip (NoC) designs used in System-on-Chip (SoC) designs
- Understanding of various memory technologies used in AI systems
- Experience in modeling hardware and workloads in order to extract performance and power estimates
- High-level hardware modeling experience preferred
- Knowledge of AI Training systems such as NVIDIA DGX and NVL72
- Experience training and finetuning LLMs using distributed training frameworks such as DeepSpeed and FSDP
- Knowledge of front-end ML frameworks (e.g., TensorFlow, PyTorch) used for training of ML models
- Strong communication skills (written and verbal)
- Detail-oriented with strong problem-solving, analytical and debugging skills
- Demonstrated ability to learn, think and adapt in a fast-changing environment
- Ability to code in C++ and Python
- Knowledge of a variety of classes of ML models (e.g., CNN, RNN, etc.)

Minimum Qualifications:
Bachelor's degree in Computer Science, Engineering, Information Systems, or related field and 2+ years of Hardware Engineering, Software Engineering, Systems Engineering, or related work experience.
OR
Master's degree in Computer Science, Engineering, Information Systems, or related field and 1+ year of Hardware Engineering, Software Engineering, Systems Engineering, or related work experience.
OR
PhD in Computer Science, Engineering, Information Systems, or related field.

Qualcomm is an equal opportunity employer. If you are an individual with a disability and need an accommodation during the application/hiring process, rest assured that Qualcomm is committed to providing an accessible process. You may e-mail[]{target=
Created: 2026-01-23