Cisco · 7 hours ago
AI Infrastructure Site Reliability Engineer (remote USA)
Cisco is at the forefront of integrating artificial intelligence into their platforms, transforming collaboration, security, networking, and more. They are seeking an AI Infrastructure Site Reliability Engineer to leverage SRE practices, automate operational capabilities, and ensure the availability and efficiency of AI platforms. The role involves working with top AI experts to contribute to ethical AI products and solutions.
Telecom & Communications, Enterprise Software, Hardware, Software, Communications Infrastructure
Responsibilities
Leverage SRE practices to reduce toil and maintain Service Level Objectives (SLOs) for internal AI platforms
Lead, build, and run fully automated pipelines through CI/CD systems for operational excellence and continuous improvements
Ensure the availability, scalability, low latency, and efficiency of NVIDIA DGX and Cisco UCS infrastructure using fault-tolerant engineering approaches
Drive capacity planning, performance analysis, instrumentation, and other non-functional requirements
Automate operational capabilities using Python, Ansible, Terraform, Go, and related technologies
Deliver automation through CI/CD pipelines and chatbot integrations
Implement metrics-driven processes to maintain high service quality
Qualifications
Required
Bachelor's degree in Computer Science, Information Technology, or a related field; or equivalent years of IT experience
5+ years of experience deploying and administering NVIDIA (DGX) or equivalent high-performance computing (HPC) clusters (e.g., Cray, HPE, IBM)
5+ years of experience coordinating and supporting Linux-based operating systems
5+ years of proficiency in programming languages such as Python, Go, and C/C++; experience with Git and CI/CD systems (e.g., GitLab, GitHub Actions, Jenkins)
5+ years of experience deploying enterprise-grade Kubernetes clusters (Red Hat OpenShift preferred) and/or Google Anthos
Advanced knowledge of Kubernetes, Docker, Terraform, Ansible, Jenkins, GitOps, Git, and Linux
5+ years of experience with the software development lifecycle: design, development, testing, packaging, and deployment (preferably using Python or Go)
Preferred
Master's degree or equivalent experience in a relevant field
Certifications in Linux, networking, cloud, or related technologies
Previous experience as a compute or site/systems reliability engineer
Experience with hybrid cloud, virtualization, and container technologies
Familiarity with Agile and DevOps operating models, including project tracking tools (e.g., Jira, Rally)
Excellent collaboration, leadership, and communication skills
Benefits
Medical, dental and vision insurance
A 401(k) plan with a Cisco matching contribution
Paid parental leave
Short- and long-term disability coverage
Basic life insurance
10 paid holidays per full calendar year, plus 1 floating holiday for non-exempt employees
1 paid day off for employee’s birthday
Paid year-end holiday shutdown
4 paid days off for personal wellness determined by Cisco
16 days of paid vacation time per full calendar year, accrued at a rate of 4.92 hours per pay period for full-time employees
Flexible vacation time off program, which has no defined limit on how much vacation time eligible employees may use
80 hours of sick time off provided on hire date and each January 1st thereafter
Up to 80 hours of unused sick time carried forward from one calendar year to the next
Additional paid time away may be requested to deal with critical or emergency issues for family members
Optional 10 paid days per full calendar year to volunteer
Annual bonuses subject to Cisco’s policies
Company
Cisco
Cisco develops, manufactures, and sells networking hardware, telecommunications equipment, and other technology services and products.
H1B Sponsorship
Cisco has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is presented below for your reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships
2025 (1238)
2024 (1231)
2023 (1273)
2022 (2127)
2021 (1991)
2020 (1173)
Funding
Current Stage: Public Company
Total Funding: unknown
IPO: 1990-02-13
Company data provided by Crunchbase