Inside Higher Ed · 1 day ago
InfraOps Reliability Administrator
FSU’s Department of Information Technology Services is seeking an InfraOps Reliability Administrator, in a position listed on Inside Higher Ed. The role involves designing, building, and managing infrastructure and servers to support IT teams and users, with a strong focus on automation, reliability, and security in a hybrid cloud environment.
Digital Media · Education · Higher Education · Journalism · Recruiting
Responsibilities
Design, build, automate, and optimize infrastructure using modern tools and site reliability engineering practices
Manage primarily Windows servers in a hybrid cloud environment, with a focus on reliability, observability, security, and continuous improvement
Collaborate across teams and leverage automation, scripting, data-informed decision-making, and self-directed professional development to deliver secure, scalable, and customer-focused solutions
Use tools such as Terraform, Azure DevOps, Visual Studio Code, and scripting languages like PowerShell and Bash to manage infrastructure as code (IaC) and configuration as code (CaC), ensuring consistency, repeatability, and auditability of systems
Use observability solutions, such as Elastic, to monitor deployments and support data-informed decisions and rapid experiments that drive continuous improvement
Work with CI/CD pipelines to automate deployment, validation, and testing processes, ensuring systems are secure by design, mitigate vulnerabilities, and are compliant with security policies and standards
Follow secure coding practices, adhere to coding standards, and leverage version control, automated testing, and test-driven development to produce high-quality, secure, and maintainable code
Use AI-assisted tools to accelerate development, validation, and troubleshooting
Participate in pair programming sessions as appropriate to write code and resolve deployment issues
Deploy and manage Windows and Linux servers across a hybrid environment that includes Microsoft Azure and over a dozen geographically dispersed on-premises locations
Ensure that all systems are secure by design, follow zero trust principles, and are scalable, observable, and aligned with business needs
Provision infrastructure with reliability, maintainability, and consistency in mind, and implement observability prior to production to support proactive monitoring and data-informed decisions
Collaborate with cross-functional teams and stakeholders throughout the infrastructure lifecycle to ensure solutions align with customer needs; prioritize high-value work, assess feasibility, and conduct security reviews of new systems and applications; deliver exceptional customer service and maintain clear communication to support successful outcomes
Design and implement solutions that make work easier, reduce manual effort, improve system reliability, and streamline operations across provisioning, configuration, monitoring, and remediation
Use AI, scripting, workflow automation, or robotic process automation (RPA) tools to reduce operational overhead and accelerate delivery
Use observability tools to monitor automation performance, ensure reliability, and identify data-informed opportunities for continuous improvement
Collaborate with peers and stakeholders to prioritize high-value automation opportunities and ensure that solutions are effective, secure, and aligned with business needs
Manage and troubleshoot enterprise-grade network infrastructure, including wireless access points, switches, routers, load balancers, and next-generation firewalls
Diagnose and resolve network issues using packet captures, OS command outputs, diagnostic consoles, logs, or other tools
Leverage network observability tools to make data-informed decisions and identify opportunities for improvement
Implement and maintain security measures to protect data, systems, and network availability
Collaborate with network and security teams to validate new systems and configurations, expand observability, reduce exploitable vulnerabilities, implement security controls, and enhance system resilience and usability for customers
Create and maintain clear, concise documentation for knowledge sharing, process repeatability, and operational continuity
Develop system diagrams, deployment guides, and standard operating procedures (SOPs) that support usability, compliance, and reliability
Continuously refine documentation and processes as systems evolve, incorporating feedback and lessons learned
Ensure all procedures align with FSU ITS Security Policies and Standards
Participate in peer reviews to validate documentation for accuracy, clarity, and usability
Respond to system alerts, outages, and support requests in accordance with established incident management procedures, collaborating with peers and stakeholders to ensure rapid resolution
Use observability tools to support rapid diagnosis and resolution, and create new monitoring as needed to improve visibility
Participate in post-incident reviews, highlighting key data points and observability insights to identify root causes and opportunities for system or process improvements
Implement improvements to prevent the recurrence of issues and to enhance system reliability
Participate in an on-call rotation, typically one week per month, which includes after-hours support for deployments, changes, or incidents, including on holidays and weekends
Actively work to reduce the need for after-hours assistance by leveraging automated deployment solutions, improving system reliability, and lowering the risk and complexity of changes
Assist with IT security investigations as needed
Ensure incident response processes align with the expectations of IT management, technical teams, and customers
Complete both assigned and self-directed professional development to stay current with evolving technologies, tools, and practices
Explore technical subjects that interest you, even beyond current projects
Use provided learning platforms, such as LinkedIn Learning
Participate in the ITS Professional Development Bonus Plan by completing manager-approved certifications
Pursue relevant training, certifications, and conferences aligned with team goals, subject to approval
Research and validate emerging tools, including AI, automation, observability, and other innovations, to assess their value for our organization
Apply a mindset of rapid experimentation using data to guide decisions, improvements, and the next experiment
Participation in knowledge-sharing sessions, communities of practice, and collaborative learning opportunities is encouraged
Qualifications
Required
Bachelor's degree in Computer Science, MIS, or other appropriate degree and two years of experience, or a high school diploma or equivalent and six years of experience. (Note: a combination of appropriate post-high-school education and experience equal to six years is also acceptable.)
Preferred
Proven ability to learn new tools and technologies quickly, with a track record of self-directed learning and adaptability in fast-paced environments
Demonstrated commitment to continuous learning and professional development
Proficient in scripting for infrastructure automation using PowerShell, with the ability to write, debug, and maintain scripts independently or with tools like GitHub Copilot; familiarity with Python or Bash is a plus
Experience using infrastructure and configuration as code tools such as Terraform, Ansible, PowerShell, or similar, with version control practices using Git, and integrated development environments like Visual Studio Code
Experience creating and troubleshooting CI/CD pipelines using tools such as Azure DevOps, GitHub Actions, or GitLab to automate infrastructure deployment and configuration
Experience provisioning and managing infrastructure in cloud environments such as Azure, AWS, or Google Cloud, with an understanding of repeatable deployment processes, and troubleshooting network connectivity with next-generation firewalls
Experience deploying containers and familiarity with container orchestration technologies such as Kubernetes
Proficient using observability tools such as Elastic, Dynatrace, Prometheus, Grafana, Splunk, Datadog, or others, to ingest new types of data, build dashboards and alerts, and derive insights for performance tuning and incident response
Experience improving infrastructure design, automation, or troubleshooting by testing ideas, learning from results, and making thoughtful adjustments over time
Experience supporting Windows and Linux systems in an Active Directory domain, including deployment, configuration, and troubleshooting, as well as managing virtual infrastructure using platforms such as Hyper-V or VMware
Experience leveraging AI tools to accelerate task completion and improve operational efficiency
Demonstrated ability to write and troubleshoot firewall rules and quickly diagnose issues across firewalls, switches, and wireless access points from vendors such as Palo Alto, Juniper, Aruba, Arista, Fortinet, Extreme, Brocade, Cisco, or others, with a focus on identifying root causes across network, OS, and application layers
Strong understanding of secure-by-design and zero trust principles, with experience applying secure configurations and patching strategies in operational environments
Demonstrated experience on infrastructure projects, planning and executing technical tasks such as system deployments, launching new remote locations, or automating business processes. This includes prioritizing high-value work, ensuring long-term maintainability through documentation and repeatable processes, leveraging automation where appropriate, and working closely with cross-functional teams to drive project success
Strong written and verbal communication skills, including the ability to document processes, contribute in team discussions, and explain technical concepts to various audiences
Proficient in creating technical diagrams to communicate infrastructure design or operational workflows
Benefits
FSU offers a robust Total Rewards package.
Visit our website to learn more about our Compensation, Benefits, Wellness, Recognition, and Employee Development programs.
Use our interactive tool to calculate Total Compensation options based on potential salary, benefits and retirement contributions, earned leave, and other employment-related perks.
Approved training resources will be paid for by the organization.
Company
Inside Higher Ed
Inside Higher Ed is the online source for news, opinion, and jobs related to higher education.
Funding
Current Stage: Growth Stage
Total Funding: unknown
2022-01-10: Acquired
2006-08-31: Series Unknown
Company data provided by Crunchbase