Senior Software Engineer - AI Infrastructure (Scheduler) - CoreAI jobs in United States

Microsoft · 2 weeks ago

Senior Software Engineer - AI Infrastructure (Scheduler) - CoreAI

Microsoft, a leading technology company, is seeking a Senior Software Engineer - AI Infrastructure (Scheduler) to join its AI Platform organization. The role involves designing and developing core AI infrastructure services that support large-scale AI training and inferencing, with a focus on optimizing GPU and NPU capacity management.

Agentic AI, Application Performance Management, Artificial Intelligence (AI), Business Development, DevOps, Information Services, Information Technology, Management Information Systems, Network Security, Software
Growth Opportunities
H1B Sponsor Likely

Responsibilities

Work on the design and development of the core AI Infrastructure distributed and in-cluster services that support large-scale AI training and inferencing
Develop, test, and maintain control plane services written in C#, hosted on Service Fabric or Kubernetes (AKS) clusters
Enhance systems and applications to ensure high stability, efficiency, and maintainability, along with low latency and tight cloud security
Provide operational support and DRI (on-call) responsibilities for the service
Develop and foster a deep understanding of the machine learning concepts, use cases, and relevant services used by our customers
Collaborate closely with service engineers, product managers, and internal applied research and data science teams within Microsoft to build better solutions together
Provide vision, expertise, and technical leadership to other team members
Help to grow talent in these areas
Embody our culture and values

Qualifications

C#, Distributed systems, Cloud services, AI infrastructure, OOP proficiency, Data structures, Unit testing, Performance engineering, Kubernetes, Technical communication, Team collaboration, Problem-solving

Required

Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C++, C#, Java, Scala, Rust, Go, TypeScript; OR equivalent experience

Preferred

Master's degree in Computer Science or a related technical field
OOP proficiency and practical familiarity with common code design patterns
3+ years of experience with large-scale services in a distributed environment, including concurrency management and stateful resource management
Hands-on experience with public cloud services at the IaaS level
Advanced knowledge of C# and .NET
Proficiency with use of complex data structures and algorithms, preferably in the setting of a resource allocator/scheduler, workflow/execution orchestration engine, database engine, or similar
Experience with managing the evolution of a large, complex codebase
Proficiency and thoroughness in unit testing and testability techniques
Knowledge of AI infrastructure, major use cases, and AI workload management
Demonstrated major design contributions and technical leadership
Excellent technical communication skills: verbal and written; product documentation experience
First-hand experience with building large-scale, multi-tenant global services with high availability
Experience with building and operating 'stateful' and critical control plane services; handling challenges with data size and data partitioning; advanced use of a NoSQL cloud database
Experience with mapping complex object models to relational and non-relational datastores
DevOps experience with microservices architecture in a complex infrastructure and operational environment
Service reliability and fundamentals engineering; instrumentation for KPIs or performance analysis; demonstrated service and code quality mindset
Performance engineering: work on scalability, profiling; CPU, memory and I/O use optimization techniques
Applied cryptography and compliant handling of customer data
Network security: endpoint protection, federated authentication, RBAC
Applied knowledge of Kubernetes: service model, workload packaging and deployment, programmatic extensibility (CRDs, operators); or equivalent knowledge of Service Fabric; experience with any service mesh
Server-side Windows programming and performance engineering
Data analytics skills, in particular with Kusto
Experience working in a geo-distributed team

Company

Microsoft

Microsoft is a software corporation that develops, manufactures, licenses, supports, and sells a range of software products and services.

H1B Sponsorship

Microsoft has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference (data from the US Department of Labor).
Distribution of Different Job Fields Receiving Sponsorship (chart not shown)
Trends of Total Sponsorships
2025 (9192)
2024 (9343)
2023 (7677)
2022 (11403)
2021 (7210)
2020 (7852)

Funding

Current Stage: Public Company
Total Funding: $1M
Key Investors: Technology Venture Investors
2022-12-09 Post-IPO Equity
1986-03-13 IPO
1981-09-01 Series Unknown · $1M

Leadership Team

Satya Nadella
Chairman and CEO
Vukani Mngxati
Chief Executive Officer - Microsoft South Africa
Company data provided by Crunchbase