Platform Engineer jobs in United States

Quantix, Inc. · 3 weeks ago

Platform Engineer

Quantix, Inc. is seeking a Platform Engineer to improve its infrastructure reliability and observability. The role involves upholding a 99.5% uptime SLA for production customers while optimizing costs and maintaining security compliance.

Consulting, Information Technology, Staffing Agency
Senior Management
Hiring Manager
Elias Cobb

Responsibilities

Own infrastructure reliability, observability, and cost optimization to support 9 production customers with a 99.5% uptime SLA
99.5% uptime SLA enforcement across all services
Multi-region deployment for geographic redundancy (Premium tier)
Automated failover: database replicas, load balancer health checks
Disaster recovery: automated backups, point-in-time recovery (7-day window)
Incident response: 15-minute detection SLA, 2-hour resolution for P0 issues
Real-time dashboards: uptime, latency, error rates per customer
Application metrics: API response times, invoice processing speed, validation accuracy
Infrastructure metrics: CPU, memory, database performance, queue health
Cost tracking: AWS spend per customer, AI API costs, storage costs
Alerting: PagerDuty integration, on-call rotation, escalation policies
Log management: SIEM tools (Splunk/Sumo Logic), 90-day retention
Vulnerability scanning: Qualys/Tenable continuous scanning
Penetration testing: annual external security audits
SOC 2 Type II preparation: 6-month observation period (Q2 2026)
DLP implementation: automated PHI/PII scanning in documents
Infrastructure cost per customer: <$8,500/month target
Right-size compute instances based on usage patterns
Implement auto-scaling for variable workloads
Optimize storage: lifecycle policies, data archiving
Monitor AI API costs: model selection, caching strategies
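
For a sense of scale, the 99.5% uptime target above can be turned into a downtime budget. The sketch below is illustrative only: the posting does not say over what window the SLA is measured, so the 30-day and 365-day windows are assumptions, and `downtime_budget` is a hypothetical helper name. Under a 30-day window the budget comes to about 3.6 hours, so a single P0 incident that takes the full 2 hours to resolve would use more than half of it.

```python
from datetime import timedelta


def downtime_budget(sla_percent: float, window: timedelta) -> timedelta:
    """Return the maximum downtime allowed in `window` while meeting the SLA.

    Hypothetical helper; the measurement window is an assumption,
    since the posting states only the 99.5% figure.
    """
    return window * (1 - sla_percent / 100)


# Assumed measurement windows (not stated in the posting).
monthly = downtime_budget(99.5, timedelta(days=30))
yearly = downtime_budget(99.5, timedelta(days=365))

# Share of the assumed 30-day budget used by one P0 incident
# that takes the full 2-hour resolution target.
p0_share = timedelta(hours=2) / monthly

print(f"30-day budget:  {monthly.total_seconds() / 3600:.1f} hours")
print(f"365-day budget: {yearly.total_seconds() / 3600:.1f} hours")
print(f"One 2-hour P0 uses {p0_share:.0%} of the 30-day budget")
```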

Qualifications

AWS architecture, Infrastructure-as-Code, Kubernetes, Observability tools, Security compliance, On-call experience, Cost optimization

Required

Cloud expertise: AWS architecture (ECS, EKS, RDS, Lambda, S3, CloudFront)
Infrastructure-as-Code: Terraform or CloudFormation mastery
Container orchestration: Kubernetes production experience
Observability tools: Datadog, New Relic, Prometheus, Grafana
Security: SIEM, DLP, vulnerability management, SOC 2 compliance
On-call experience: Incident response, postmortem processes
Cost optimization: Cloud financial management, FinOps practices

Company

Quantix, Inc.


Funding

Current Stage: Growth Stage
Total Funding: unknown
2025-05-08: Acquired

Leadership Team

Michael Haase
Co-Owner | CEO
Company data provided by Crunchbase