Advanced Microdevices Pvt. Ltd. (India) · 13 hours ago
Product Application Engineer - Data Center Deployment
Advanced Micro Devices, Inc. is dedicated to building innovative products that enhance next-generation computing experiences. The Product Application Engineer role focuses on executing data center cluster projects, providing technical guidance, and collaborating with customers on large-scale GPU deployments.
Biopharma · Biotechnology · Industrial · Manufacturing
Responsibilities
Develop a strong understanding of the client’s business to ensure effective bringup and validation of CSP and customer clusters
Provide advisory-level technical guidance and support to customers for server clusters, focused on large-scale GPU deployments
Build out datacenter GPU cluster environments for customer testing and deployment
Assist development teams in identifying and resolving hardware/software technical issues throughout the cluster lifecycle, from initial bring-up to entering service for running workloads
Provide technical guidance to internal teams based on customer feedback
Qualify and assess new cluster automation software functionality to ensure compatibility with customer requirements and datacenters
Resolve technical issues for customers utilizing AMD Instinct™ server products in clusters
Mentor junior members of the technical staff
Follow procedures to communicate, report, and escalate incidents to AMD management
Collaborate with program managers to maintain project schedules, track action items, ensure deliverables are met, and provide project status updates to customers and AMD management
Qualifications
Required
Exceptional skills in AI GPU hardware, software, systems management, and networking, especially with high-speed data fabrics
Strong communication skills, with the ability to tactfully interface with both technical and program management resources at CSP and customer sites
Highly analytical, detail-oriented, and self-motivated, with a positive, results-driven attitude
Ability to work closely with customers in an advisory role to provide guidance during large-scale cluster bringup and validation
Experience as a data center systems engineer, site reliability engineer lead, or platform engineer with experience in large-scale system bringup
Strong customer focus
Self-motivated and capable of working effectively within a team environment
Ability to communicate concisely at all levels within an organization
Bachelor's degree in Computer or Electrical Engineering
Preferred
Hands-on data center customer support or management roles during cluster system bringup
Advisory PM and technical roles in large-scale data center cluster bringup
Data center customer support tooling skills, using automation tools and frameworks such as Ansible, Bash, and Python
Bringup of data center servers and racks, server architecture and functionality, including remote management via BMC, network topologies, and graphics software/hardware subsystems
Linux installation, setup, usage, tuning, and debugging
Virtual environments (e.g., VMware, Citrix, KVM, Microsoft) and virtual machine setup/management
Familiarity with datacenter GPU software stacks such as AMD ROCm™ or NVIDIA CUDA
Familiarity with distributed network libraries (e.g., NCCL/RCCL, MPI) with GPU accelerators in distributed memory systems and high-speed network protocols/topologies
Strong skills in high-performance fabrics for HPC and AI, such as RDMA/RoCE and InfiniBand
Some familiarity with AI and machine learning workloads, frameworks, and models
Strong debugging, problem-solving, and analysis skills
Strong verbal and written communication skills for conveying technical information
Self-starter with attention to detail, organizational skills, and the ability to multitask in a fast-paced environment
Master's degree preferred
Benefits
AMD benefits at a glance.
Company
Advanced Microdevices Pvt. Ltd. (India)
Advanced Microdevices (mdi) is a leader in innovative membrane technologies.
Funding
Current Stage
Late Stage
Leadership Team
Nalini Kant Gupta
Founder & Managing Director
Company data provided by crunchbase