How do we ensure database performance, availability and security in a hyper-growth environment? What are the most efficient ways and best practices to operate large, critical database systems? At Digibank we treat Database Administration and operations as Software Engineering problems. Our mission is to build platforms that enable provisioning and managing databases in safe, reliable and scalable ways. We consistently challenge the status quo and use new technologies to build platforms and tooling for engineering teams.
In this role you will make significant decisions with a huge impact on building modern banking technology. You will work closely with Infra SREs and Application Developers as part of a team responsible for designing and architecting new solutions and finding creative ways to optimise existing ones.
If you are:
- a strong believer in automating DB Ops & SRE aspects like provisioning, deployment, observability, incident lifecycle, uptime SLA etc.
- bold enough to challenge, open to being challenged, and curious to learn & grow
This is the right place for you!
Duties and Responsibilities
Database Reliability Engineers (DBREs) are responsible for keeping the database systems that support all user-facing services and many of Digibank’s other production systems running smoothly 24/7/365. DBREs are a blend of database engineering and administration gearheads and software developers who apply sound engineering principles, operational discipline, and mature software development and automation practices, specializing in databases (MySQL in particular). In that capacity, DBREs are peers to SREs and bring database expertise to the SRE and Infrastructure teams as well as our engineering teams.
As a DBRE you will:
- Work on database reliability and performance for Digibank, and ship solutions together with the application teams.
- Analyze solutions and implement best practices for supported datastores (primarily MySQL).
- Work on the observability of relevant database metrics and make sure we reach our database objectives.
- Work with peers (SREs, Application Engineers) to roll out changes to our production environment and help mitigate database-related production incidents.
- Provide on-call support in rotation with the team.
- Provide database expertise to engineering teams (for example through reviews of database migrations, queries and performance optimizations).
- Work on automation of database infrastructure and help engineering succeed by providing self-service tools.
- Plan the growth and manage the capacity of Digibank's database infrastructure.
- Design, build and maintain database infrastructure that allows Digibank to scale to serve hundreds of thousands of concurrent users.
- Support and debug database production issues across services and levels of the stack.
- Make monitoring and alerting trigger on symptoms and SLO violations rather than on outages.
- Document every action so your learnings turn into repeatable actions and then into automation.
- Review, analyze and implement solutions regarding database administration (e.g., backups, performance tuning).
- Work with Terraform, Kubernetes and other tools to build mature automation (for example, automatic setup of new replicas, or testing and monitoring of backups).
- Design and develop specifications for future database requirements including enhancements, upgrades, and capacity planning; evaluate alternatives and make appropriate recommendations.
Requirements
- Have at least 5 years of experience running MySQL/PostgreSQL databases in large environments.
- Have at least 1 year of experience with infrastructure automation (Ansible/Terraform).
- Have solid knowledge of SQL and PL/SQL.
- Have solid knowledge of the internals of MySQL/PostgreSQL.
- Have an urge to collaborate and communicate.
- Have an urge to document all the things so you don't need to learn the same thing twice.
- Have a proactive, go-for-it attitude. When you see something broken, you can't help but fix it.
- Know your way around Linux and the Unix Shell.
- Have a passion for stable and secure systems management practices.
- Possess data modeling and data structure design skills.
- Knowledge of distributed databases (Cassandra/Couchbase)
- Knowledge of caching
- Awareness of application orchestration
- Awareness of cloud infrastructure (AWS/GCP/Azure)