CloudIngest · 12 hours ago
Sr. Director/AVP – Infrastructure & Cloud (IMT/Infra Leader)
CloudIngest is seeking a senior leader to own its infrastructure and cloud strategy, drive AI-first operations, and build next-generation automation and ITSM capabilities across its global delivery organization. The ideal candidate will combine deep hands-on experience in cloud and infrastructure with strong leadership and stakeholder management skills.
Responsibilities
Define and own the infrastructure, cloud, and AIOps strategy aligned to overall business and product goals
Act as primary technology partner to senior client stakeholders, internal business leaders, and product teams; translate business outcomes into a clear technical roadmap
Build, mentor, and scale a high-performing global (onshore/offshore) team across cloud, SRE, infra, and automation disciplines
Drive an “AI-first” and “automation-first” culture in operations and service delivery, setting standards, playbooks, and best practices
Own the architecture, reliability, scalability, and security posture of cloud and on-premises infrastructure, with a strong focus on AWS
Establish best practices for high availability, DR, backup, capacity planning, performance, and observability
Define and enforce SLOs/SLAs, error budgets, and resilience practices in partnership with SRE and product teams
Ensure robust security, compliance, identity/access management, and governance across environments
Define and drive the roadmap for AIOps, Agentic Operations, and automation of infra/operations workflows (incident triage, root cause analysis, remediation, capacity management, change management)
Evaluate, select, and implement AI and AIOps platforms/tools (e.g., observability, log analytics, anomaly detection, predictive alerting, intelligent runbooks)
Lead the design of agentic workflows that use LLMs/AI agents to automate common operational tasks, knowledge retrieval, and ticket handling
Industrialize automation across infra, cloud, and ITSM (self-healing, auto-remediation, ChatOps, runbook automation, infrastructure-as-code)
Own the vision and roadmap for modern ITSM capabilities (ITIL-aligned but automation-driven) across Incident, Problem, Change, Request, CMDB, and Knowledge Management
Integrate ITSM with monitoring, observability, AIOps, and collaboration tools to deliver end-to-end, automated service workflows
Provide architectural guidance and technical leadership to offshore engineering and operations teams building AI-enabled solutions on AWS
Review and approve solution architectures for AI/ML workloads (e.g., LLM integration, data pipelines, MLOps, vector databases, model hosting) on AWS
Establish standards, reference architectures, and reusable components so teams can rapidly build compliant, scalable AI solutions for clients
Coach and upskill team members on cloud-native AI services, infrastructure-as-code, DevOps, and SRE practices
Qualifications
Required
12–18+ years in Infrastructure / Cloud / Operations roles, with at least 5–7 years in a senior leadership (Director/Head/VP) capacity
Strong, hands‑on background in AWS (multi‑account strategy, networking, security, containerization, serverless, observability)
Demonstrated experience leading large distributed teams (including offshore/nearshore) in a software product or IT services/vendor environment
Proven track record of implementing AIOps/observability platforms and driving automation of operations at scale
Experience designing or transforming ITSM organizations and processes, with a focus on automation and AI‑assisted workflows
Strong understanding of SRE principles, reliability engineering, and modern DevOps practices
Deep expertise in AWS cloud architecture, networking, security, and cost optimization
Strong knowledge of observability and AIOps tools (e.g., Datadog, New Relic, Dynatrace, Elastic, Splunk, PagerDuty, ServiceNow, or similar)
Practical experience with automation and IaC tools (Terraform, CloudFormation, Ansible, CI/CD pipelines)
Understanding of ITIL, ITSM platforms, and process design; ability to modernize and automate ITSM
Ability to define a vision and roadmap, then translate them into actionable plans for cross‑functional, global teams
Excellent stakeholder management, communication, negotiation, and executive presentation skills
Strong problem‑solving mindset, with a bias toward simplification, automation, and measurable outcomes
Preferred
Exposure to building or integrating AI/ML solutions in production (LLMs, chatbots, agentic workflows, or predictive analytics)
Familiarity with AI/ML concepts, LLMs, vector stores, and MLOps practices; comfort working with data and AI engineers