Cisco · 7 hours ago
Staff Machine Learning Engineer (Remote)
Cisco is revolutionizing how data and infrastructure connect and protect organizations in the AI era. They are seeking a Staff Machine Learning Engineer to lead the architecture for their AI Platform, design high-scale inference services, and provide mentorship to engineers.
Communications Infrastructure · Enterprise Software · Hardware · Software
Responsibilities
Lead the end-to-end architecture for key areas of the AI Platform: multi-tenant LLM serving (vLLM/Ray), routing and orchestration layers, VectorDB/RAG integration, and agentic/SDK surfaces used by product teams
Design and drive implementation of high-scale inference services, including parallelism strategies (TP/PP/EP/MoE), autoscaling policies, and cross-region capacity management for GPU/CPU workloads
Optimize latency, throughput, and cost for large-scale LLM and generative workloads using techniques such as batching, chunked prefills, caching, and mixed precision
Design and tune distributed inference configurations (TP/PP/EP/MoE) across multi-GPU and multi-node clusters and modern GPU architectures
Implement platform capabilities such as telemetry, metering & throttling, guardrails, and rollout/rollback to ensure AI services are safe, observable, and multi-tenant by default
Lead the design of GenAI application services (chat assistants and automation APIs) grounded in robust RAG pipelines, agentic workflows (LangChain/LangGraph or similar), and MCP-based tool ecosystems
Drive operational excellence with runbooks, readiness checklists, CI/CD safeguards, on-call rotations, and post-incident improvements
Provide technical mentorship and leadership for senior and mid-level engineers: review designs, guide trade-offs around quality/latency/COGS, and help grow the next generation of tech leads
Collaborate closely with applied scientists to productionize new models and techniques, ensuring that research prototypes become robust, observable, and cost-efficient services
Qualifications
Required
Bachelor's degree in Computer Science, Engineering, or equivalent practical experience
8+ years of hands-on experience building and operating backend or distributed systems in production, or 5+ years with a Master's degree, or 3+ years with a PhD
Proven track record as a technical lead for complex systems: driving architecture, aligning stakeholders, and delivering high-impact projects end-to-end
Strong proficiency in at least one modern programming language (e.g., Python, Go, or Java) and deep experience with software design, debugging, and performance tuning
Significant experience with cloud-native architectures (containers, Kubernetes, service discovery, configuration management, CI/CD) and building reliable microservices (REST/gRPC)
Demonstrated ownership of production services at scale, including on-call participation, incident response, and post-incident reviews/RCAs that led to concrete improvements
Preferred
Hands-on experience running LLM or deep learning inference at scale using frameworks such as vLLM, TensorRT-LLM, Triton Inference Server, or similar
Deep understanding of GPU and distributed systems performance: latency/throughput trade-offs, pipelining, model parallelism (TP/PP/EP/MoE), mixed precision (BF16/FP8/nvFP4), and profiling tools
Experience designing and operating RAG systems and GenAI application layers: document ingestion, chunking/embedding strategies, metadata design, hybrid retrieval, context ranking, and evaluation of retrieval quality
Practical experience with agentic frameworks (LangChain, LangGraph, LlamaIndex, Semantic Kernel, or similar) and multi-agent coordination, including integration with MCP tools and internal/external APIs
Background building platform or developer-experience capabilities (shared services, SDKs, templates, micro-frontends) that are adopted by multiple product teams
Familiarity with LangSmith or similar evaluation platforms, including experiment design, offline/online evals, hallucination/groundedness metrics, and feedback loops
Strong knowledge of AWS, Azure, or GCP (EC2/VMs, IAM roles/ARNs/principals, VPC networking, security best practices) for AI workloads
Experience defining and monitoring dashboards and alerts for high-availability systems using Prometheus, Grafana, or cloud-native tooling
Excellent communication and collaboration skills: comfortable influencing cross-functional partners and other senior engineers, and explaining trade-offs between quality, latency, and cost to both technical and non-technical audiences
Benefits
Medical, dental and vision insurance
A 401(k) plan with a Cisco matching contribution
Paid parental leave
Short- and long-term disability coverage
Basic life insurance
10 paid holidays per full calendar year, plus 1 floating holiday for non-exempt employees
1 paid day off for employee’s birthday
Paid year-end holiday shutdown
4 paid days off for personal wellness determined by Cisco
16 days of paid vacation time per full calendar year, accrued at a rate of 4.92 hours per pay period for full-time employees
Cisco’s flexible vacation time off program, which has no defined limit on how much vacation time eligible employees may use (subject to availability and some business limitations)
80 hours of sick time off provided on hire date and each January 1st thereafter, and up to 80 hours of unused sick time carried forward from one calendar year to the next
Additional paid time away may be requested to deal with critical or emergency issues for family members
Optional 10 paid days per full calendar year to volunteer
Eligible to earn annual bonuses subject to Cisco’s policies
Eligible to earn performance-based incentive pay on top of base salary
Company
Cisco
Cisco develops, manufactures, and sells networking hardware, telecommunications equipment, and other technology services and products.
H1B Sponsorship
Cisco has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is presented below for your reference. (Data provided by the US Department of Labor)
Trends of Total Sponsorships
2025 (1238)
2024 (1231)
2023 (1273)
2022 (2127)
2021 (1991)
2020 (1173)
Funding
Current Stage: Public Company
Total Funding: unknown
IPO: 1990-02-13
Company data provided by Crunchbase