VP / AVP, Site Reliability Engineer, Group Consumer Banking and Big Data Analytics Technology, Technology & Operations
Group Technology and Operations (T&O) enables and empowers the bank with an efficient, nimble and resilient infrastructure through a strategic focus on productivity, quality & control, technology, people capability and innovation. In Group T&O, we manage the majority of the Bank's operational processes and aspire to delight our business partners through our multiple banking delivery channels.

We are looking for highly motivated individuals who are interested in joining our Site Reliability organization as a Site Reliability Engineer for our next generation of Investment and Insurance products. Successful candidates will work closely with our internal users and development teams to ensure the stability of production systems in a highly competitive environment.
Work with our business units, development teams, and other units to help maintain the high quality and service level objectives of our systems.
Optimize the supportability of systems by applying basic SRE principles such as blameless post-mortems, error budgets, and automation.
Provide production support for the application domain when applicable.

Key Accountabilities
Manage, monitor and operate the system to ensure all business functions are running smoothly.
Work across teams to continually review, provide feedback on, and implement best practices that improve the efficiency of the systems and drive future innovation.
Manage ongoing changes while retaining high levels of service availability for our customer base.
Pragmatically identify the root cause of production incidents and lead the implementation of the actions needed to prevent recurrence.
Drive the incident management process and support a blameless post-mortem culture.
Automate system operations to reduce toil and attain a high level of efficiency.

Responsibilities
Participate in platform operations management and capacity management.
Coordinate and implement platform/infrastructure upgrades and releases with technical and business teams.
React to critical issues immediately: troubleshoot, investigate and apply appropriate solutions to normalise system operations.
Provide off-hour/weekend support to ensure production systems stability.
Troubleshoot problems across a wide range of technical areas (development, CI/CD, infrastructure, etc.).
Maintain awareness of relevant technical and product trends through self-learning and job shadowing.
Create and maintain operational documents to reflect system changes and upgrades.
Communicate effectively, professionally and comfortably, both verbally and in writing, across all levels.

Requirements
Bachelor's Degree/Diploma in Computer Science, Computer Engineering, or Computer Application. Equivalent experience may be considered.
3+ years of experience supporting critical applications using API-driven technologies.
2+ years of hands-on experience in Python development (preferably with RESTful APIs).
2+ years of working with a modern stack (AWS, PCF, containers, or Kubernetes).
1+ years of Continuous Integration and Continuous Delivery experience through Jenkins or equivalent.
Experience with modern observability tools such as Grafana, Kibana, or Prometheus preferred.
Experience working in an Agile (SAFe or Kanban) environment preferred.
Knowledge of and/or experience using SQL and Linux shell scripting.
Basic understanding of firewalls, load balancers, and networking concepts.
Communication skills at all levels and team spirit are essential.
Proactive, with good analytical and organizational skills.
Ability to work independently, multi-task, prioritize and deliver in a time-pressured environment.

Apply Now
We offer a competitive salary and benefits package and the professional advantages of a dynamic environment that supports your development and recognises your achievements.