cFocus Software Incorporated
EOP - Site Reliability Engineer - TS/SCI Required
cFocus Software seeks a Site Reliability Engineer to join its program supporting the United States Secret Service (USSS). The role involves monitoring system health, resolving incidents, and automating operational tasks to ensure system resilience and performance.
Chatbot · Government · Information Technology · Software
Responsibilities
Monitor system health, availability, and performance using centralized monitoring and logging tools
Respond to, troubleshoot, and resolve incidents in production environments and provide root cause analysis
Conduct after-action reporting and post-incident reviews to improve system resilience
Automate repetitive operational tasks including deployments, monitoring, and incident response
Administer user accounts, access controls, and authentication mechanisms
Maintain and configure workflow templates, user fields, and application configurations
Maintain test environments that mirror production and support pre-deployment testing
Design and maintain backup, high availability (HA), and disaster recovery (DR) solutions
Develop and maintain incident response and disaster recovery plans for supported applications
Configure and support integrations with complementary enterprise systems
Architect, build, and maintain on-premises and cloud infrastructure supporting applications
Administer production, staging, and development environments
Manage system logs and monitor for security and operational events
Maintain and improve CI/CD pipelines and DevSecOps processes
Apply configuration management disciplines including patching, hardening, and documentation
Create and maintain dashboards, SLIs, SLOs, and service health metrics
Support operational readiness boards and weekly service reviews
Provide on-call support for outages, upgrades, and emergency maintenance as required
Support surge activities, including Presidential Transition-related data analysis, as required
Qualifications
Required
Bachelor's degree in Computer Science, Engineering, or related technical field (or equivalent experience)
Minimum of 2 years of experience in systems engineering, DevOps, or Site Reliability Engineering roles
Strong proficiency with Linux/Unix operating systems
Experience with scripting and automation using Python, Bash, or similar languages
Experience with monitoring and logging tools such as Prometheus, Grafana, ELK Stack, or equivalent
Experience supporting CI/CD tools such as GitLab, Jenkins, or ArgoCD
Experience with containerization and orchestration platforms including Docker and Kubernetes
Understanding of SRE principles including SLIs, SLOs, and error budgets
Strong troubleshooting, problem-solving, and documentation skills
Ability to obtain and maintain a TS/SCI clearance
Company
cFocus Software Incorporated
cFocus Software automates FedRAMP compliance and develops government chatbots for the Azure Government Cloud, Office 365, and SharePoint.