Staff SRE (Site Reliability Engineer) jobs in United States

Gigster · 2 months ago

Staff SRE (Site Reliability Engineer)

Gigster is a dynamic company that connects top-tier IT engineers with exciting projects in software development and cloud services. They are seeking a highly skilled Staff Site Reliability Engineer to ensure the reliability, scalability, and performance of critical systems, while collaborating with teams to drive infrastructure improvements and automation initiatives.

Analytics · Apps · SaaS · Software
Diversity & Inclusion
H1B Sponsor Likely

Responsibilities

Design, build, and maintain scalable and reliable infrastructure
Collaborate with engineering teams to ensure systems are designed with reliability and scalability in mind
Evaluate and integrate new technologies to enhance our infrastructure
Implement and maintain monitoring and alerting systems to detect and respond to issues promptly
Lead incident response efforts, ensuring quick resolution and effective communication
Conduct post-incident reviews and drive improvements based on findings
Architect and build automation projects from scratch (preferably in Python or Go) to reduce day-to-day SRE toil
Create Bash scripts to automate manual activities like upgrades, status checks, and deployment
Develop and maintain infrastructure as code (IaC) using tools such as Terraform, Ansible, or similar
Automate repetitive tasks and processes to improve efficiency and reduce manual intervention
Collaborate with cross-functional teams to deliver high-quality products and services
Mentor and guide junior SREs and other team members
Advocate for best practices in reliability engineering across the organization
Drive initiatives to improve service reliability, capacity, and performance
Participate in capacity planning and disaster recovery exercises
Stay current with industry trends and emerging technologies

Qualifications

System Design · Architecture · Monitoring · Incident Management · Automation · Optimization · Cloud Platforms · Linux/Unix Systems · Programming Languages · Bash Scripting · Container Orchestration · Monitoring Tools · CI/CD Pipelines · Problem-solving · Communication Skills · Collaboration Skills · Leadership Abilities

Required

Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent practical experience)
At least 8 years of industry experience as a Software Engineer, SRE, or Platform Engineer
At least 3 years of experience as a Platform Engineer or SRE
Proven experience in managing large-scale, mission-critical infrastructure
Deep understanding of Linux/Unix systems and networking
Proficiency in at least one programming language (e.g., Python, Go, Java)
Intermediate to expert Bash scripting skills
Experience with cloud platforms (AWS, Azure, GCP), containers (Docker), and container orchestration (Kubernetes)
Strong knowledge of monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack)
Familiarity with CI/CD pipelines and tools (e.g., Jenkins, GitLab CI)
Excellent problem-solving skills and a proactive attitude
Strong communication and collaboration skills
Ability to work independently and as part of a team
Demonstrated leadership and mentoring abilities
Candidates must be able to work 8am to 5pm Pacific Time and be open to an on-call rotation

Company

Gigster is the first team intelligence engine, enabling software development teams to achieve 30% higher efficiency.

H1B Sponsorship

Gigster has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data provided by the US Department of Labor)
[Chart: Distribution of Different Job Fields Receiving Sponsorship]
Trends of Total Sponsorships: 2020 (1)

Funding

Current Stage: Growth Stage
Total Funding: $32.62M
Key Investors: Redpoint, Andreessen Horowitz
2024-03-26 · Acquired
2017-08-29 · Series B · $20M
2015-12-07 · Series A · $10M

Leadership Team

Andy Tryba, CEO
Cory Hymel, Head of Academic Partnerships
Company data provided by Crunchbase