Business Function
Group Technology and Operations (T&O) enables and empowers the bank with an efficient, nimble and resilient infrastructure through a strategic focus on productivity, quality & control, technology, people capability and innovation. In Group T&O, we manage the majority of the Bank's operational processes and aspire to delight our business partners through our multiple banking delivery channels.
Responsibilities:
Implement and maintain highly resilient, highly available application clusters for data engineering, monitoring and analytics. Provide production support for the platform
Set up the server infrastructure as per design. Ensure the implementation meets the bank's and the industry's security standards
Perform continuous improvement for the platform covering areas such as: capacity planning, observability, monitoring, reliability, and resiliency
Design and develop data engineering pipelines
Automate repetitive tasks, optimize processes and perform thorough testing to ensure quality
Create and maintain software documentation for the platform
Perform application maintenance, patching and upgrades
Ensure on-time delivery of tasks and projects
Ensure continuous uptime of applications and services
Ensure no security or audit issues
Comply with bank standards when tracking and following up on assigned projects
Cover all areas in application and infrastructure operations of the platform
Requirements:
You should be a polytechnic or university graduate (computer science or related field) with good experience working with contemporary technologies and scripting languages
Strong communication skills and the ability to explain protocols and processes to the team and management
A passion for learning and using new technologies from the open source community
A passion for coding
Total of 6-8 years of IT work experience
Working knowledge of Grafana, Prometheus and the Elastic Stack (Elasticsearch / Logstash / Kibana / Beats), including data ingestion, management, monitoring & analytics. Able to perform L1/L2 ELK-related tasks
In-depth experience in Unix/Linux/Shell/Python scripting
Knowledgeable and experienced in SRE (Site Reliability Engineering) practices covering monitoring, observability, performance management, automation, and resiliency
Good understanding of network routing, load balancing and networking protocols; a base knowledge of TCP/IP, with an understanding of HTTP and DNS
Ability to contribute to discussions on design and strategy
Adequate knowledge of database systems (RDBMS, MariaDB, SQL, NoSQL), object-oriented programming and web application development
Good problem diagnosis and creative problem-solving skills
Experience in Node.js, Spring Boot and Kafka would be a plus
Experience in automation tools (e.g. Ansible) & DevOps pipelines would be a plus
Self-driven, committed, and reliable team player
We offer a competitive salary and benefits package and the professional advantages of a dynamic environment that supports your development and recognises your achievements.