Glydways · 21 hours ago
Distributed Systems Engineer - High-Availability Dispatch
Glydways is reimagining public transit to enhance mobility and accessibility. They are seeking a Senior Distributed Systems Engineer to lead the design and implementation of their Dispatch system, ensuring the reliability and robustness of their autonomous vehicle fleet.
Clean Energy, Electric Vehicle, Manufacturing, Transportation
Responsibilities
Design and implement state sharing and replication between multiple Dispatch instances (tickets, journeys, vehicle state, restrictions)
Build leader election and failover mechanisms (active/standby, hot/warm backup) that guarantee a single authoritative Dispatch at a time and clean handoff on failures
Harden Dispatch behavior for restart-safety and idempotency, ensuring retries, replays, and partial failures do not cause double assignment, inconsistent state, or unsafe conditions
Design and run stress, load, and fault-injection tests (including chaos experiments) to validate Dispatch behavior under high load, network issues, and process crashes
Improve system hardening and recovery flows, defining how Dispatch enters safe modes, recovers from faults, and resumes normal operation in a controlled way
Extend and tune observability for Dispatch (logs, metrics, traces, SLOs) so state divergence, failover events, and backlog issues are visible and diagnosable
Collaborate with autonomy, product, and ops teams to translate algorithmic and operational requirements into concrete guarantees around state, failover, and robustness
Participate in on-call and incident response for Dispatch, lead root-cause analysis for reliability issues, and drive long-term fixes into the application code and architecture
Qualifications
Required
Proven experience designing and shipping stateful distributed services that stay correct under failures
Strong programming background in a systems language (C++ strongly preferred) and comfort working at the application layer (routing, tickets, vehicle state, safety envelopes)
Hands-on experience with leader election / primary–secondary patterns, active/standby or similar, and state replication / recovery (snapshots, event logs, replay, or equivalent)
Deep understanding of idempotent operations and message semantics (retries, duplicates, out-of-order messages) in networked, message-driven systems (TCP/UDP, gRPC, pub/sub, etc.)
Experience designing and running stress, load, soak, and fault-injection/chaos tests for distributed systems, and using their results to drive system hardening
Strong observability and incident-response skills: defining SLOs, instrumenting metrics/traces, debugging complex failure modes, and leading postmortems for stateful services
Preferred
Safety-critical or mission-critical mindset: familiarity with failure-mode analysis and designing for fail-safe / fail-operational behavior is a plus
Experience with cloud platforms is a plus, but this is not a pure DevOps or CI/CD role; candidates must have meaningful ownership of application-level behavior and state
Company
Glydways
Glydways designs and implements personal rapid transit systems using self-driving vehicles.
H1B Sponsorship
Glydways has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships
2025 (10)
2024 (5)
2023 (2)
Funding
Current Stage
Growth Stage
Total Funding
$212.54M
2025-09-29 · Series Unknown · $101.31M
2024-05-14 · Series B · $20M
2023-10-05 · Series B · $56M
Company data provided by Crunchbase