Damcosoft
Senior Site Reliability Engineer (NYC, NY) · Hybrid
Damcosoft is seeking a Senior Site Reliability Engineer to join their team in NYC, NY. The role involves supporting the SRE team in enhancing workflows through automation, troubleshooting technical issues, and ensuring system reliability through effective incident and change management practices.
Responsibilities
Support the SRE team in developing and implementing enhancements to support workflows, focusing on automation and efficiency improvements
Handle technical escalations, troubleshoot complex FIX and API connectivity issues, and actively participate in on-call rotations during non-traditional hours to ensure rapid response and resolution
Adhere to and administer incident and change management policies
Coordinate incident resolution efforts and implement change management protocols to maintain and enhance system reliability
Work closely with the Lithuania office to ensure smooth operation and alignment of SRE practices across time zones
Coordinate incident post-mortems and root cause analyses (RCAs)
Design, implement, and maintain comprehensive monitoring, logging, and tracing solutions (observability stack) to provide deep insights into system performance and user experience
Partner with product and engineering teams to define clear Service Level Indicators (SLIs) and Service Level Objectives (SLOs), managing error budgets to ensure service reliability meets business needs
Qualifications
Required
5+ years in a senior SRE role or a similar position, demonstrating deep knowledge and expertise in site reliability engineering and operations
Knowledge of the FIX protocol and its messages, and the ability to read FIX logs
Familiarity with REST APIs and a strong understanding of API integration
Proficient in Python and scripting for automation and system management, with a proven track record of developing and implementing automation solutions
Expertise in SQL and transactional databases, including querying and troubleshooting
Strong analytical and troubleshooting skills with a proven ability to identify and resolve technical issues through root cause analysis
In-depth knowledge of core networking concepts including TCP/IP, routing, and DNS
Familiarity with maintaining and troubleshooting systems in both cloud (AWS) and co-location (colo) environments
Availability for flexible work hours and willingness to cover US market trading sessions, including L2 on-call coverage
Knowledge of change management processes and risk management
Preferred
Experience in the brokerage or financial industry
Proficient with cloud services, particularly AWS, and knowledgeable about cloud architecture best practices, including IAM, EC2, S3, and DynamoDB
Experience maintaining and supporting containerized systems, and familiarity with orchestration tools
Knowledge of Infrastructure as Code (IaC) practices and tools such as Terraform or CloudFormation
Ability to manage and troubleshoot job scheduling tools like Rundeck or Apache Airflow
Advanced skills in managing containerized environments using Kubernetes and OpenShift
Practical experience with Confluent Cloud or RedPanda for event streaming architectures
Experience with API-based applications and a basic understanding of using the browser developer console for front-end debugging
Company
Damcosoft
Damcosoft is a global consulting and technology solutions & services company, offering industry-specific solutions on Enterprise Applications (ERP, CRM, SCM), EAI, Database & Data warehousing, ECM, BI and Workflow/BPM for domains like BFSI, Telecom, Public Sector and Utilities.
H1B Sponsorship
Damcosoft has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference (data powered by the US Department of Labor).
Trends of Total Sponsorships
2024 (14)
2023 (6)
2022 (9)
2021 (5)
2020 (4)
Funding
Current Stage
Late Stage
Company data provided by Crunchbase