
Drawbridge Digital · 12 hours ago

Infrastructure & Observability Engineer

Drawbridge Digital is a veteran-owned company seeking an experienced Infrastructure & Observability Engineer to design and implement centralized monitoring, logging, and alerting systems across a hybrid environment. This role involves supporting day-to-day operations in a 24-hour production environment and contributing to strategic planning for future infrastructure initiatives.

Advertising, Asset Management, Content, IT Management

Responsibilities

Architect and deploy centralized monitoring and log aggregation solutions across cloud and on-premises infrastructure
Design and implement alerting systems for critical infrastructure events, ensuring the right people are notified at the right time
Support day-to-day operations of infrastructure serving a 24/7 production environment, including troubleshooting, maintenance, and capacity management
Establish observability standards, dashboards, and runbooks to support operations and incident response
Analyze monitoring data to identify performance bottlenecks, inefficiencies, and opportunities for optimization
Contribute to long-term infrastructure planning, including capacity forecasting, technology roadmaps, and operational improvements
Create and maintain technical documentation, including system architecture diagrams, standard operating procedures, and emergency response playbooks
Partner with infrastructure, operations, and engineering teams to implement improvements based on observability insights
Drive continuous improvement initiatives that enhance system reliability, reduce costs, and improve performance
Evaluate and integrate tooling that fits our hybrid environment needs
Participate in an on-call rotation, including occasional overnight shifts, to respond to critical infrastructure incidents

Qualifications

Monitoring platforms, Linux server administration, Networking fundamentals, Alerting frameworks, Technical writing, Collaboration skills, Communication skills

Required

3+ years of experience in infrastructure, site reliability, or systems engineering roles
Hands-on experience with monitoring and observability platforms (e.g., Prometheus, Grafana, Datadog, ELK stack, Splunk, or similar)
Strong understanding of networking fundamentals and server infrastructure in both cloud and physical datacenter environments
Experience with Linux server administration, Ceph storage clusters, and highly available database clusters
Experience building alerting frameworks that balance signal quality with noise reduction
Demonstrated ability to translate monitoring insights into actionable infrastructure improvements
Strong technical writing skills—you'll be documenting systems, procedures, and emergency protocols
Strong collaboration and communication skills—you'll be working across teams to drive change
Comfortable working independently in a remote environment while collaborating effectively with distributed teams
Must reside in the greater NJ/NYC metropolitan area
Ability to commute to New Jersey 1–2 days per month for team meetings
Able to occasionally travel to customer locations to support on-site projects

Benefits

Health insurance
401(k)
On-call compensation

Company

Drawbridge Digital

Drawbridge Digital builds and manages full-spectrum content services, from complex workflows to digital asset management and archive systems.

Funding

Current Stage
Early Stage

Leadership Team

Jennifer Pottheiser
Founding Partner
Company data provided by Crunchbase