
Advanced Microdevices Pvt. Ltd. (India) · 13 hours ago

Software Development Engineer, AI Platforms

Advanced Micro Devices, Inc. builds products that enhance computing experiences across various domains. The company is seeking a Principal Software Development Engineer for its AI Platforms team to optimize Generative AI training and inference at scale and to collaborate with specialists on designing and implementing efficient architectures for AI models.

Biotechnology, Industrial, Pharmaceutical, Manufacturing, Biopharma
No H1B

Responsibilities

Propose and apply innovative techniques to support both training and inference, including new communication architectures and parallelism strategies for training on large clusters
Implement novel, efficient architectures for Generative AI models for training and inference, and showcase their benefits on AMD
Work with open-source frameworks and communities (e.g., PyTorch, SGLang, Hugging Face) to integrate AMD-optimized models and libraries and to publish training recipes
Collaborate with software and hardware teams to co-optimize end-to-end (E2E) performance on current and future AMD solutions
Publish and promote your work within AMD and at external venues

Qualifications

Generative AI, Distributed training, Communication middleware, Deep learning frameworks, Performance optimization, Slurm, Kubernetes, Communication skills, Team collaboration, Presentation skills

Required

Deep technical understanding of image/video generation systems
LLM parallelism
Distributed inference frameworks
Hands-on experience with communication middleware, e.g., NCCL/RCCL, MPI, and RoCE v2
Experience training models at scale
Passionate about innovating efficient approaches to enable distributed training and inference at scale on AMD devices
PhD or master's degree with a major in Computer Science Engineering, Electrical Engineering, Electronics Engineering, Mathematics, or a related field

Preferred

Strong technical expertise in communication middleware (e.g., NCCL/RCCL and MPI)
Familiarity with deep learning frameworks (e.g., PyTorch)
Strong technical expertise in benchmarking and performance optimization of distributed training and inference systems
Expertise or publications preferred in one of the following areas: efficient model architectures, optimized training, innovative parallelism strategies, or communication frameworks
Experience using Slurm and Kubernetes to manage training and inference jobs on a cluster
Excellent written, verbal, and presentation skills, and the ability to coordinate internally and externally
Several years of experience in AI, deep learning and related software development

Benefits

AMD benefits at a glance.

Company

Advanced Microdevices Pvt. Ltd. (India)

Advanced Microdevices (mdi) is a leader in innovative membrane technologies.

Funding

Current Stage
Late Stage

Leadership Team

Nalini Kant Gupta
Founder & Managing Director
Company data provided by Crunchbase