Performance Benchmarking Engineer - Cluster Networking and AI jobs in the United States

NetSuite · 2 weeks ago

Performance Benchmarking Engineer - Cluster Networking and AI

NetSuite, a world leader in cloud solutions, is seeking a Performance Benchmarking Engineer to join its team. The role focuses on conducting performance studies on GPU clusters, designing benchmarking solutions, and troubleshooting performance issues to improve AI/ML workload performance.

Cloud Computing · Computer · CRM · iOS · SaaS · Software

Responsibilities

Carry out performance studies on GPU clusters with a focus on AI/ML workload performance, network performance, and tuning
Design and code solutions for performance benchmarking
Troubleshoot performance problems on RDMA clusters and perform cluster performance validation, including on novel systems whose behavior is not yet fully understood
Document new tools and procedures to a high standard
Write whitepapers to disseminate findings of performance studies
Participate in architecture design and review, code review, and contribute to roadmap development
Mentor junior engineers
Participate in operational rotations

Qualification

Performance benchmarking · RDMA Networking · HPC/AI workloads · Coding · Scripting · Networking background · Linux skills · Project leadership · Collaboration · Effective communication

Required

BS or MS degree in CS or a related engineering or science field, with 5+ years of relevant experience
Experience with benchmarking and troubleshooting or optimizing performance of a system
Experience with coding, scripting, and automation
Background in Networking
General Linux skills
Demonstrated ability to lead complex projects, independently resolve ambiguity, collaborate with stakeholders across teams, and communicate effectively

Preferred

Experience working on clusters, e.g., running HPC/AI workloads, or maintaining an HPC/AI system
Experience troubleshooting or tuning performance on distributed systems
Familiarity with elements of the AI/HPC software stack such as job schedulers (e.g., Slurm); NCCL, RCCL, or MPI; or ML frameworks
Experience with RDMA networking, e.g., RoCE or InfiniBand
Experience architecting or developing solutions on a public cloud platform

Benefits

Medical, dental, and vision insurance, including expert medical opinion
Short-term and long-term disability
Life insurance and AD&D
Supplemental life insurance (Employee/Spouse/Child)
Health care and dependent care Flexible Spending Accounts
Pre-tax commuter and parking benefits
401(k) Savings and Investment Plan with company match
Paid time off: Flexible Vacation is provided to all eligible employees assigned to a salaried (non-overtime eligible) position. Accrued Vacation is provided to all other employees eligible for vacation benefits. For employees working at least 35 hours per week, the vacation accrual rate is 13 days annually for the first three years of employment and 18 days annually for subsequent years of employment. Vacation accrual is prorated for employees working between 20 and 34 hours per week. Employees working fewer than 20 hours per week are not eligible for vacation.
11 paid holidays
Paid sick leave: 72 hours of paid sick leave upon date of hire, refreshed each calendar year. Unused balance carries over each year up to a maximum cap of 112 hours.
Paid parental leave
Adoption assistance
Employee Stock Purchase Plan
Financial planning and group legal
Voluntary benefits including auto, homeowner and pet insurance

Company

NetSuite

NetSuite is a cloud computing company dedicated to delivering business applications over the internet.

Funding

Current Stage
Public Company
Total Funding
$157.79M
Key Investors
Meritech Capital Partners · Tako Ventures · StarVest Partners
2016-07-28 · Acquired
2007-12-20 · IPO
2007-02-05 · Secondary Market · $17.87M

Leadership Team

Brian Chess
SVP Technology and AI
Eli Johnson
Vice President, Global Sales Productivity
Company data provided by crunchbase