SRE - Communication Channels (MSG/IB)
The Communication Channels team builds products used by the Bloomberg community for real-time communication, such as exchanging quotes, trade ideas, news and other financial information. Our email (MSG) and instant message (IB) products deliver more than 2 billion messages across millions of chat rooms per day. We have a broad user base unlike any other in the company, comprising asset managers, brokers, traders, financial analysts and desks across all asset classes. Our users rely on these products because of their real-time performance, massive scale, ironclad security, and tight integration with financial data and applications on the Bloomberg Terminal. Most important of all, they offer unique access to the Bloomberg network of 350,000 financial professionals.
To make sure we satisfy our clients' need for speed as well as stability, we have very high standards for reliability and scalability. And that's where our SREs come in! Our goal is to ensure that IB and MSG are up 24/7. We're involved from design to deployment to ensure our infrastructure is reliable, performant and scalable.
What's in it for you:
Given the criticality of our products in the daily workflow of the financial community, and the scale at which they are used, our SRE team is one of the most visible teams across Bloomberg. Our products are continuously evolving, and usage has more than doubled over the last year, which means we must maintain a very sharp focus on stability and scalability. As a member of the SRE team, you'll build and standardize our performance and capacity planning environment so that we can easily answer questions about the health and capacity of our system as we add more features and users.
We'll trust you to define best practices and standards for testing, monitoring, logging, alarming and provisioning across 90+ developers, and to build tools that automate our release processes. We'll expect you to be passionate about using the right tool for the job, and to research new tools to determine how we can best use them for our systems. You'll have the opportunity to create sophisticated dashboards for our engineers as well as our business partners. A critical part of our mission is fostering a culture of system reliability across MSG/IB Engineering, and you'll be able to make a significant impact on the design choices and decisions that go into developing MSG and IB infrastructure.
As an SRE on our team, you'll have the flexibility to forge your own path and drive the SRE culture forward. Making our infrastructure best-in-class will be your main mission, so there will be many opportunities to create and implement your own improvements. We'll send you to conferences and meetups to keep up with the SRE community outside Bloomberg and apply that knowledge to building and improving our processes here at Bloomberg.
Our projects include:
- Building a comprehensive performance testing framework that all teams in Communication Channels will use to stress-test key pieces of infrastructure and measure their capacity.
- Establishing standards and building dashboards, libraries and tools for metric collection, visualization and alarming.
- Creating black-box health testing frameworks to monitor the health of the team's products.
- Developing a "Chaos Engineering" framework that teams both within and outside our area can use for failure testing.
- Building tools that track the availability of our products and allow developers to quickly identify bottlenecks and dependencies in our systems.
- Establishing procedures around scalability, failover, Service Level Objectives, cluster provisioning, deployment strategies, etc., with the goal of improving the robustness of our infrastructure.
You'll need to have:
- 3+ years of experience with at least one object-oriented language (C++ or Java preferred)
- Proven experience with a scripting language (preferably Python)
- Familiarity with the design and implementation of large-scale distributed systems
- Experience with one or more of: system design, production monitoring, capacity management, deployment and rollback, provisioning, configuration and orchestration
- A BA, BS, MS or PhD in Computer Science, Engineering or a related technology field
We'd love to see:
- Experience creating and implementing new processes and workflows related to SDLC pipelines
- Exposure to monitoring and logging tools such as Graphite, Splunk and Humio
- Exposure to containers and orchestration frameworks
- A track record of open-source contributions
Check out more about how we work and what it means to be an SRE at Bloomberg: https://www.techatbloomberg.com/blog/bloomberg-bets-big-on-sres/
If this sounds like you, apply! If we believe you're a good match, we'll get in touch to get started with a technical phone interview.
Bloomberg is an equal opportunity employer, and we value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.