GPU Accelerator Returns Debug Engineer jobs in United States

AMD · 3 hours ago

GPU Accelerator Returns Debug Engineer

AMD is a leading company focused on building innovative products that enhance computing experiences. They are seeking an experienced GPU PCBA Debug and Failure Analysis Engineer to perform board level failure analysis on GPU Accelerators, collaborating with various engineering teams to identify and resolve product failures, thereby improving product quality and customer satisfaction.

AI Infrastructure, Artificial Intelligence (AI), Cloud Computing, Computer, Embedded Systems, GPU, Hardware, Semiconductor
Growth Opportunities
H1B Sponsor Likely
Hiring Manager
Bill S.

Responsibilities

Support internal and external requests to troubleshoot PCBA-level AMD GPU product failures for continuous yield & quality improvements, and customer quality support within expected timelines
Develop and execute DOEs that run targeted tests to reproduce and isolate hard-to-find failures
Develop automation and tools to run tests and analyze results/logs
Perform triage and communicate with the contract manufacturer and/or internal AMD teams (such as Design, BIOS, firmware, memory, I/O, display, diagnostics, Test Engineering, Board operations, etc.) as needed to converge on failure reproduction efforts and root cause identification
Document all findings in the FA database and create a complete failure analysis report for customer consumption as needed
Present findings to key stakeholders, including senior management
Implement continuous improvements to failure analysis processes & techniques and document the procedures to follow
Oversee the set-up of new products and test stations for Failure Analysis operations

Qualification

GPU architecture, PCBA diagnostics, Python, Hardware validation, Failure analysis, System integration, Firmware tuning, High-speed digital design, MS Excel, Communication skills, Leadership skills, Documentation skills, Presentation skills

Required

Experience in GPU PCBA Debug and Failure Analysis
Ability to perform board level (PCBA) failure analysis on customer and factory failures of GPU Accelerators
Experience in reproducing reported failures and isolating the cause of failure
Ability to work closely with cross-functional teams including design, validation, FW and manufacturing
Strong analytical mindset and hands-on approach to technical problem-solving
Ability to excel in both collaborative and independent environments
Demonstrated initiative, adaptability, and a drive to tackle new challenges in fast-paced settings
Experience in system integration and High Performance Computing
Ability to manage multiple tasks with limited supervision
Excellent communication skills for effective teamwork and documentation
Curiosity and persistence to deliver high-quality solutions through thorough failure analysis and repair

Preferred

Deep expertise in GPU architecture, including debug, validation, and stress/functional test development
Skilled in using lab equipment (oscilloscopes, logic analyzers, custom test tools) for hardware validation
Strong background in PCBA diagnostics, failure analysis, and debug techniques, from NPI through production
Proficient in Python, shell scripting, and working across Windows and Linux environments
Solid understanding of firmware, drivers, and hardware interactions
Extensive experience in hardware verification and system integration
Familiarity with PCBA manufacturing processes and IPC-A-610 quality standards
Hands-on experience assembling, installing, and configuring computer systems and servers
Strong leadership, communication, documentation, and presentation skills
Able to read schematics, interpret datasheets, identify components, and perform soldering/rework for debug
Proficient in MS Excel for data analysis and reporting
Knowledge of high-speed digital design, memory interfaces (HBM, GDDR), PCIe, and display outputs (DP, HDMI)
Experience with GPU data center infrastructure and AI/ML technologies

Benefits

AMD benefits at a glance.

Company

Advanced Micro Devices is a semiconductor company that designs and develops graphics units, processors, and media solutions.

H1B Sponsorship

AMD has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is presented below for reference. (Data powered by the US Department of Labor)
[Chart: Distribution of Different Job Fields Receiving Sponsorship]
Trends of Total Sponsorships
2025 (836)
2024 (770)
2023 (551)
2022 (739)
2021 (519)
2020 (547)

Funding

Current Stage
Public Company
Total Funding
unknown
Key Investors
OpenAI, Daniel Loeb
2025-10-06 · Post-IPO Equity
2023-03-02 · Post-IPO Equity
2021-06-29 · Post-IPO Equity

Leadership Team

Lisa Su
Chair & CEO
Mark Papermaster
CTO and EVP
Company data provided by Crunchbase