Site Reliability Engineering Lead (Remote) at Trustmark in Remote

Position Overview:
Trustmark is seeking a Site Reliability Engineering Lead to join the centralized Software Engineering organization. This leader will be responsible for establishing the strategy and roadmap for Site Reliability Engineering, service assurance, and delivery capabilities at Trustmark, and for executing on them. The strategy will include building out the target state, establishing the team, and determining measurements of success for site reliability practices and service assurance. The role will collaborate with Engineering, Application Development leaders and teams, Product areas, the Infrastructure organization, and vendor partners, and will influence production stability measures and service reliability up and down the management chain. This individual will need to be a great team player, able to influence teams inside and outside the organization, with strong communication and presentation skills to regularly share the strategy and our progress against critical milestones on the roadmap.
The Software Engineering organization at Trustmark is transforming what it means to work for a benefits company with a 100-year legacy. We need people with diverse expertise and perspectives to meet business needs while keeping pace with an ever-changing technology world. We're looking for individuals who can adapt, learn, innovate, and help us constantly improve our service levels and foster the decades-long relationships we have with our clients.
This position can be entirely remote, and the individual can be based anywhere in the US.
Key Responsibilities:
  • Establish a comprehensive enterprise-level strategy and roadmap to introduce site reliability practices at Trustmark, including continually defining reliability goals and measuring and improving services.
  • Build a team to execute on the roadmap and enforce automation, monitoring, and resiliency.
  • Establish and formalize SLAs and SLOs, and track performance against them in partnership with vendors, application teams, infrastructure teams, and business stakeholders.
  • Evaluate the current service tiers of our applications and our reliability standards and practices, and define steps to continuously improve them.
  • Conduct blameless postmortems on priority incidents affecting top-tier critical applications.
  • Promote best practices to improve our service levels, and present recommendations with strong justification for funding approval.
  • Create dashboards and reports to communicate key metrics.
  • Establish and lead a community of practice to foster continuous improvement of system performance and reliability, and to share knowledge and lessons learned across the IT organization.
  • Own level-one production support by working with and developing a team of engineers in an onsite/offshore model.
  • Create and enforce site reliability standards, and work with the Infrastructure organization and application delivery teams to continuously improve production stability and resilience while reducing our risk profile over time.
  • Collaborate with development teams to promote reliability engineering during all phases of the software development lifecycle, in order to detect and correct performance issues and meet availability goals.
  • Identify, evaluate, and recommend monitoring tools and diagnostic techniques to improve system observability.
  • Analyze previous incidents to understand root causes and better predict and prevent future issues.
Qualifications:
  • Strong intellectual curiosity.
  • 7 years of experience with application development, infrastructure, technology operations, and service delivery.
  • Experience establishing an SRE practice, with credible metrics showing improved resilience and a lower overall risk profile.
  • Prior experience establishing and monitoring SLAs, SLOs, OLAs, and error budgets.
  • Demonstrated experience with DevOps principles and resiliency design.
  • Communication and presentation skills to share data, results, and standards broadly with the entire organization, as well as in targeted communication with IT teams and senior leaders.
  • Demonstrated experience with application monitoring and logging tools, techniques, and concepts.
  • Bachelor's degree in Engineering, Information Technology, or Computer Science, with 5 years of experience in roles with increasing leadership responsibilities.
  • Experience in a highly regulated industry such as benefits, payor, TPA, or insurance is a plus.
  • Agile (Scrum, SAFe) experience preferred.
  • Current knowledge of site reliability engineering methods and trends, such as observability-driven development and chaos engineering.
Salary Range:
$150K -- $200K
Minimum Qualification:
DevOps & Site Reliability
Estimated Salary: $20 to $28 per hour based on qualifications.
