Principal Data Engineer - AI (REMOTE) jobs in United States

Upbound · 2 months ago

Principal Data Engineer - AI (REMOTE)

Upbound is redefining how modern infrastructure is built and is seeking an exceptional Principal Data Engineer to serve as the technical leader for the data infrastructure behind its AI initiatives. The role involves architecting and developing sophisticated data platforms that power AI-driven features, and building data pipelines that produce training data for models that enhance control planes.

Cloud Computing · Information Services · Information Technology · Software

Responsibilities

Define and drive the technical vision for data platforms that support AI-powered features in Crossplane and Upbound Spaces
Lead the design of data pipelines that transform infrastructure data into training datasets for ML models
Architect vector search and RAG systems that leverage Crossplane Control Planes & Upbound Marketplace as a knowledge store
Build data infrastructure that processes resources, extensions, and compositions for semantic search
Establish frameworks for collecting, processing, and analyzing infrastructure configuration data
Design data pipelines that handle Crossplane-specific data
Create infrastructure for indexing and searching Upbound Marketplace content, documentation, and community patterns
Develop metrics and monitoring for AI features integrated with Upbound's control plane architecture
Design data systems that power AI agents for infrastructure provisioning & operations, helping users generate and optimize Crossplane compositions
Create feature engineering platforms that extract signals from control plane operations, resource status, and reconciliation patterns
Implement data infrastructure for training models that predict infrastructure failures, optimize resource allocation, and suggest configuration improvements
Drive the development of knowledge graph representations of infrastructure dependencies and relationships

Qualification

Data engineering · Machine learning infrastructure · Vector databases · Semantic search · Kubernetes · Cloud-native architecture · Feature stores · Data pipelines · Infrastructure-as-code · Graph databases · Time-series data processing · Knowledge bases · Technical leadership

Required

10+ years of software/data engineering experience with at least 4 years in technical leadership roles
Proven track record building data platforms that support production systems at scale
Deep expertise in both traditional data engineering (Spark, Airflow, data lakes) and ML-specific infrastructure (feature stores, model serving)
Experience with vector databases (Pinecone, Weaviate, Qdrant, Milvus, pgvector, OpenSearch, Elasticsearch)
Demonstrated experience with LLM applications, including RAG architectures and semantic search implementations
Understanding of Kubernetes, cloud-native architectures, and infrastructure-as-code principles
Strong understanding of data requirements for AI/ML systems: training pipelines, feature stores, and inference infrastructure
Hands-on experience building knowledge bases and semantic search systems for technical documentation and code
Experience with embedding models for code and technical documentation
Knowledge of time-series data processing for infrastructure metrics and events
Understanding of graph databases and their application to infrastructure dependency modeling
Exceptional technical judgment with the ability to navigate both the AI and cloud-native landscapes
A positive attitude and the ability to foster an environment of experimentation and innovation
Strong ability to translate infrastructure management problems into data engineering solutions
Passion for making infrastructure management more intelligent and accessible through AI
Deep empathy for platform engineers and understanding of their operational challenges

Preferred

Direct experience with Crossplane and Upbound products
Experience building AI features for developer tools or infrastructure platforms
Understanding of enterprise compliance requirements for infrastructure platforms
Knowledge of policy engines

Company

Upbound

Upbound is an infrastructure management platform that runs, scales, and optimizes services across multiple cloud environments.

Funding

Current Stage: Growth Stage
Total Funding: $69M
Key Investors: Altimeter Capital, Google Ventures
2021-11-29 · Series B · $60M
2018-05-02 · Series A · $9M

Leadership Team

Bassam Tabbara
Founder and CEO
Sarah Strobhar
Chief Revenue Officer (CRO)
Company data provided by Crunchbase