Principal Software Engineering - Performance
Company: Microsoft
Location: Redmond
Posted on: February 27, 2026
Job Description:
The Artificial Intelligence Cloud Inference team at Microsoft
develops AI software that enables running AI models everywhere,
from the world’s fastest AI supercomputers to servers, desktops,
mobile phones, IoT devices, and internet browsers. We collaborate
with our hardware teams and partners, both internal and external,
and operate at the intersection of AI algorithmic innovation,
purpose-built AI hardware, systems, and software. We are a team of
highly capable and motivated people who pride themselves on a
collaborative and inclusive culture. We own the inference
performance of OpenAI and other state-of-the-art LLMs and work
directly with OpenAI on the models hosted on the Azure OpenAI
service, which serves some of the largest workloads on the planet,
with trillions of inferences per day in major Microsoft products,
including Office, Windows, Bing, SQL Server, and Dynamics.

As a Principal Engineer on the team, you will have the opportunity
to work on multiple levels of the AI software stack, including the
fundamental abstractions, programming models, runtimes, libraries,
and APIs that enable large-scale training and inferencing of
models. You will benchmark OpenAI and other LLMs for performance on
GPUs and Microsoft HW; debug and optimize performance at all levels
of abstraction, including the kernel, model, algorithm, and system
levels; and monitor performance and drive efficiencies that
contribute to achieving Microsoft Azure's capex goals. This is a
hands-on technical role requiring software design and development
skills. We’re looking for someone who has a demonstrated history of
solving technical problems and is motivated to tackle the hardest
problems in building a full end-to-end AI stack. An entrepreneurial
approach and the ability to take initiative and move fast are
essential.

Microsoft’s mission is to empower every person and every
organization on the planet to achieve more. As employees we come
together with a growth mindset, innovate to empower others, and
collaborate to realize our shared goals. Each day we build on our
values of respect, integrity, and accountability to create a
culture of inclusion where everyone can thrive at work and beyond.

SF Bay Area preferred; remote locations considered for very strong
candidates.

Responsibilities:
As a Principal Software Engineer on the team, the common tasks of
the job include, but are not limited to:
- Identify and drive improvements to the end-to-end inference
  performance of OpenAI and other state-of-the-art LLMs.
- Measure and benchmark performance on NVIDIA/AMD GPUs and
  first-party Microsoft silicon.
- Optimize and monitor the performance of LLMs, and build SW tooling
  that enables insights into performance opportunities ranging from
  the model level to the systems and silicon level, to help reduce
  the footprint of the computing fleet and achieve Azure AI capex
  goals.
- Enable fast time to market for LLMs/models and their deployments
  at scale by building SW tools that afford velocity in porting
  models to new NVIDIA and AMD GPUs and Maia silicon.
- Design, implement, and test functions or components for our
  AI/DNN/LLM frameworks and tools.
- Speed up and reduce the complexity of key components/pipelines to
  improve the performance and/or efficiency of our systems.
- Communicate and collaborate with our partners, both internal and
  external.
- Embody Microsoft's culture and values.

Qualifications:

Required Qualifications:
- Bachelor's Degree in Computer Science or related technical field
  AND 6 years of technical engineering experience with coding in
  languages including, but not limited to, C, C++, C#, Java,
  JavaScript, or Python OR equivalent experience.

Other Requirements:
The ability to meet Microsoft, customer, and/or government security
screening requirements is required for this role. These
requirements include, but are not limited to, the following
specialized security screenings:
- Microsoft Cloud Background Check: This position will be required
  to pass the Microsoft Cloud background check upon hire/transfer
  and every two years thereafter.

Preferred Qualifications:
- Bachelor's Degree in Computer Science or related technical field
  AND 8 years of technical engineering experience with coding in
  languages including, but not limited to, C, C++, C#, Java,
  JavaScript, or Python OR equivalent experience.
- 4 years of practical experience working on high-performance
  applications and on performance debugging and optimization on
  CPUs/GPUs.
- Technical background and a solid foundation in software
  engineering principles, computer architecture, GPU architecture,
  and HW neural net acceleration.
- Experience in end-to-end performance analysis and optimization of
  state-of-the-art LLMs and HPC applications, including proficiency
  with GPU profiling tools.
- Experience in DNN/LLM inference, experience in one or more DL
  frameworks such as PyTorch, TensorFlow, or ONNX Runtime, and
  familiarity with CUDA, ROCm, and Triton.

Software Engineering IC5 - The typical base pay range for this role
across the U.S. is USD $139,900 - $274,800 per year. A different
range applies to specific work locations within the San Francisco
Bay Area and New York City metropolitan area; the base pay range
for this role in those locations is USD $188,000 - $304,200 per
year.
Keywords: Microsoft, Bellevue, Principal Software Engineering - Performance, IT / Software / Systems, Redmond, Washington