Jobs via Dice · 8 hours ago
Engineer
Dice is the leading career destination for tech experts at every stage of their careers, and they are seeking a Software Engineer - SRE to monitor applications and respond to incidents. The role involves troubleshooting issues, managing applications through automation, and collaborating with engineers to improve service deployment and operations.
Computer Software
Responsibilities
Troubleshoot and resolve issues in live production environments and implement strategies to eliminate them with minimal support
Manage applications through automation
Support and monitor new and existing services, platforms, and application stacks
Engage in improving the lifecycle of services deployment, operations, and refinement
Provide technical expertise during service impacting events
Collaborate with other engineers on code reviews, internal infrastructure improvements and process enhancements
Use scalability testing to measure, tune and optimize system performance
Participate in periodic 24x7 on-call duties
Be accountable for resolving outages via workaround or permanent fix
Ensure all administration and reports are maintained and up to date, including contact information, technical diagrams, and post-major-incident reviews
Communicate with various stakeholders and ship IT communications
Ensure the effective implementation of the Incident, Change, and Problem Management processes and conduct the respective reporting procedures
Monitor incidents to ensure that the Service Level Agreement is met
Identify, initiate, schedule, and conduct incident reviews
Ensure the closure of all resolved and end-user confirmed Incident records
Establish continuous process improvement cycles in which process performance, activities, roles and responsibilities, policies, procedures, and supporting technology are reviewed and enhanced where applicable
Lead proofs of concept on Splunk implementation, Splunk indexing, and plugins; mentor and guide other team members on understanding Splunk use cases
Knowledge of Splunk Enterprise deployments and of enabling continuous integration as part of configuration management using props.conf, transforms.conf, inputs.conf, outputs.conf, and Deployment.conf
Knowledge of log parsing, complex Splunk searches, including external table lookups, Splunk data flow, components, features, and product capability
Knowledge of setting up alerts and monitoring recipes from machine-generated data
Qualification
Required
Bachelor's Degree or Equivalent
5 years of experience in Site Reliability Engineering
Experience with one or more Cloud Platforms (Azure, AWS, Google Cloud Platform)
Experience with Container technologies: Kubernetes, Docker, PKS
Experience setting up monitoring for applications and databases
Experience with third-party services and third-party vendor management
Experience with ServiceNow
Excellent verbal, written, and interpersonal communication skills
Company
Jobs via Dice
Welcome to Jobs via Dice, the go-to destination for discovering the tech jobs you want.
Funding
Current Stage
Early Stage
Company data provided by Crunchbase