
Wells Fargo · 2 hours ago

Lead Software Engineer - Gen AI Inferencing Services and Agentic AI

Concord, NC

Full-time

Hybrid

Senior Level, Lead/Staff

5+ years of experience

Wells Fargo is seeking a Lead Software Engineer — LLM Inferencing & Agentic AI within Digital Technology’s AI Capability Engineering organization. In this role, you will design, build, and operate the GenAI Platform’s GPU infrastructure and LLM/SLM serving systems, ensuring highly performant, reliable, and secure model inferencing at scale.

Banking, Financial Services, FinTech, Insurance, Payments

No H1B

Responsibilities

Lead complex technology initiatives including those that are companywide with broad impact

Act as a key participant in developing standards and companywide best practices for engineering complex, large-scale technology solutions across technology engineering disciplines

Design, code, test, debug, and document for projects and programs

Review and analyze complex, large-scale technology solutions for tactical and strategic business objectives, enterprise technological environment, and technical challenges that require in-depth evaluation of multiple factors, including intangibles or unprecedented technical factors

Make decisions in developing standards and companywide best practices for engineering and technology solutions, which requires an understanding of industry best practices and new technologies; influence and lead the technology team to meet deliverables and drive new initiatives

Collaborate and consult with key technical experts, the senior technology team, and external industry groups to resolve complex technical issues and achieve goals

Lead projects and teams, or serve as a peer mentor

Engineer GPU clusters and node pools; configure NVLink/NVSwitch, NVIDIA GPU Operator, MIG profiles, container runtime, and kernel/driver baselines for high‑throughput LLM/SLM workloads

Design and implement OpenAI‑compatible APIs (Responses, Interactions) behind the AI Gateway: define OpenAPI contracts, authN/Z (OAuth2/mTLS), rate limits/quotas, SLAs, versioning/deprecation, and SDK generation

Build and support MCP servers and tool adapters; manage agent/tool identity and capability metadata; integrate with agent registries and execution flows

Develop Agentic AI capabilities (tools/agents/workflows) including disaggregated prefill/decode patterns; contribute to runbooks, guardrails, and safe tool usage

Build UI surfaces (developer/ops consoles) for endpoint onboarding, prompt testing, evaluations, observability dashboards, and incident response workflows

Apply prompt engineering and evaluation best practices; create golden test suites, regression harnesses, and measurable SLO‑aligned criteria for production promotion

Qualifications

GenAI engineering, LLM/SLM operations, GPU infrastructure, OpenAI-compatible APIs, Python, MCP servers, UI development, Prompt engineering, Team leadership, Collaboration, Problem-solving

Required

5+ years of Software Engineering experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education

Preferred

5+ years of experience in Python for backend/services development, packaging, instrumentation, and automation

5+ years of experience building modern web UI for developer/ops workflows, including dashboards, wizards, and prompt/eval tooling, with strong testing and accessibility practices

1+ years of experience building MCP servers, tool adapters, and agent workflows, with an understanding of agent identity, permissions, and governance metadata

2+ years of experience in GenAI engineering, including LLM/SLM operations, fine‑tuning/evaluation, per‑model performance recipes, and prompt engineering and evaluation harnesses

1+ years of experience with LLM API exposure, including AI Gateway (OAuth2/mTLS, rate limits/quotas, OpenAPI/SDKs, SLAs, versioning/deprecation) and OpenAI‑compatible API design for responses and interactions

1+ years of experience serving large language models (LLM/SLM), including vLLM, Triton, TensorRT‑LLM/MII, KV cache strategies, FP8/INT4 quantization (AWQ/GPTQ), and certified disaggregated prefill/decode

1+ years of experience with orchestration tools for GPU workload management, such as Run:AI (Collections/queues, quotas, preemption, fair share), OpenShift AI (RHOAI), and OCP/GKE administration

1+ years of experience with the GPU inference layer, including NVIDIA technologies such as CUDA, cuDNN, NVLink/NVSwitch, MIG, NIXL, GPU profiling, and H100/H200 performance tuning

Company

Wells Fargo

Glassdoor: 3.6

Wells Fargo & Company is a financial services firm that provides banking, insurance, investments, and mortgage services.

Founded in 1852

San Francisco, California, USA

10,001+ employees

http://www.wellsfargo.com

Funding

Current Stage: Public Company

Total Funding: unknown

IPO: 1978-10-06

Leadership Team

Charlie Scharf, CEO

Fernando Rivas, CEO of Corporate & Investment Banking

Recent News

The Real Deal, 2026-01-23: Kolter pays $26M for condemned Miami Beach hotel as redevelopment play

Morningstar.com, 2026-01-22: Wells Fargo Currently Down Seven Consecutive Days, on Pace for Longest Losing Streak Since January 2024 (Data Talk)

New York Post, 2026-01-22: Wells Fargo moves wealth-management unit to Palm Beach, joining Florida rush

Company data provided by Crunchbase