Denvr · 1 month ago
AI Platform Engineer
Denvr is a vertically integrated AI Platform Services company headquartered in Calgary, Canada, focused on providing foundational compute infrastructure and services for the AI ecosystem. The AI Platform Engineer will be responsible for designing, implementing, and operating AI compute architectures, collaborating with cross-functional teams to deliver exceptional customer products and experiences.
Artificial Intelligence (AI), Cloud Computing, Cloud Data Services, Cloud Infrastructure, Generative AI, Machine Learning, Natural Language Processing, Private Cloud
Responsibilities
Architect and optimize high-performance AI Platform solutions for AI training and inferencing, leveraging NVIDIA systems (H200/A100/GH200) and distributed training optimizations (NCCL, RDMA/InfiniBand)
Administer RKE Kubernetes clusters, including custom operator development (KOPF), CNIs (Kube-OVN), and KubeVirt, alongside managing traditional virtualization (VMware ESXi/vCenter) and bare-metal provisioning (Metal3, Ironic)
Perform advanced OS management (Ubuntu), including kernel parameter optimization and hardware-level troubleshooting on Supermicro/Dell platforms
Manage high-throughput network fabrics using BGP EVPN, SONiC, and leaf/spine topologies, while maintaining network security via firewalls, VPNs, internet gateways, and granular policy management
Deploy and maintain scalable, high-performance storage fabrics for data-intensive workloads using technologies such as WEKA, Ceph (Rook), Qumulo, and Dell PowerStore
Design and build critical backend APIs and microservices using Python (FastAPI, asyncio, Pydantic) or Golang, including the development of Kubernetes Operators and integration with relational/NoSQL databases
Drive infrastructure consistency and repeatability through Terraform, CloudFormation, and Ansible, integrated within robust CI/CD pipelines
Adhere to change/release management, incident/problem management, and documentation standards, and take part in cross-team architectural reviews, post-sales L3 support, and customer-facing technical engagement for both public cloud and private platform deployments
Work cross-functionally with vendors, engineering, and platform operations to define requirements, document processes, and continuously improve platform reliability and performance
Support business development and customer success teams by providing clear technical guidance, translating complex concepts, and aligning solutions to customer requirements
Meet directly with customers to design and review complex platform integrations, custom architectures, and workload-optimized AI solutions
Collaborate with vendors to evaluate and validate new GPU and ASIC hardware, firmware, and system architectures, providing feedback for integration and improvement
Provide L3 engineering support for advanced troubleshooting, root-cause analysis, and performance evaluation across compute, storage, networking, and AI systems
Stay up to date with industry trends by attending workshops, seminars, and conferences
Pursue relevant certifications and continuous learning in cloud, AI/ML infrastructure, networking, storage, and security domains
Engage in internal knowledge sharing through documentation, demos, tech talks, and mentorship of peers
Qualifications
Required
Post-secondary education in Computer Science, Engineering, Information Technology, or a related technical discipline
3+ years of experience with AI/ML solutions engineering, cloud infrastructure, or a related field (preferred)
Background in software development, system design, or technical consulting is highly valued
Excellent written and verbal communication, with the ability to simplify and explain complex technical concepts
Strong customer empathy and discovery skills to uncover real needs and guide solution direction
Confident presenter who can engage both technical and non-technical audiences
Highly organized, able to manage multiple priorities, and comfortable shifting focus as business needs evolve
Creative problem-solver with a structured approach to diagnosing issues and designing solutions
Strong sense of ownership, accountability, and alignment with company vision and direction
Familiarity with AI industry trends, cloud and data center infrastructure, and secure, reliable operations at scale
Understanding of AI/ML workflows (training, multi-GPU/multi-node scaling, inferencing) and distributed storage fundamentals
General awareness of competitive landscape and emerging technologies in AI infrastructure and cloud services
Effective collaborator across cross-functional teams including Sales, Marketing, Product, and Engineering
Comfortable working in customer-facing technical roles where clarity, empathy, and responsiveness are critical
Strong analytical mindset for evaluating complex systems and diagnosing issues across compute, storage, and networking
Ability to design, articulate, and innovate technical solutions aligned with customer and business requirements
Company
Denvr
Denvr AI Platforms provide foundational AI services for the AI ecosystem and end users of AI. They comprise cloud-enabled services for inferencing, computing, and data processing & storage, along with software toolsets for the accelerated development, operations, adoption, and integration of AI technologies. These services are delivered through the public Denvr AI Cloud and through Denvr AI Platform Services for private, fully dedicated, sovereign, and highly secure AI services, including private platform infrastructure deployments that consist of advanced data centers, compute architectures, and data processing & storage fabrics with integrated platform operations software.
Funding
Current Stage
Growth Stage
Recent News
linkedin.com
2025-09-09
SiliconANGLE
2024-12-03
2024-11-20
Company data provided by crunchbase