Spotify’s Vulnerability Management Platform

November 8, 2022 Published by Yukio Mizuta (Sr. Backend Engineer) and Nurit Izrailov (Software Engineer)

We started developing our vulnerability management platform (VMP) at Spotify in Q2 2020. Now that it’s implemented and part of our day-to-day work, we want to share the journey of building a system that helps us reduce security risks efficiently and at scale.

Vulnerabilities

Preventing vulnerabilities within Spotify is the Security Tribe’s priority, and we focus much of our effort on reducing exposure to vulnerabilities. For example, our Golden Path builds security features in out of the box, so Spotify developers can build their apps securely without any additional effort. Still, vulnerabilities inevitably arise, so it’s critical that we regularly evaluate how we resolve them.

Toward automation

A vulnerability is assessed and reported by a vulnerability detection service, which we call a reactive control (RC). Once it’s reported, we search for the owner of the impacted asset, notify them, and explain the details of the vulnerability, including remediation steps and guidance. From then on, the owner is responsible for fixing the vulnerability. There’s much more to it in reality, but that’s the basic pattern of operations, and when you see a pattern, it’s time to consider automation!
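
The report, find-owner, notify pattern can be sketched in a few lines. This is a minimal Python sketch, not Kitsune’s implementation: the Finding and Notification types, the route_finding function, and the owner lookup (a plain dictionary standing in for a service catalog) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Finding:
    """A vulnerability as reported by a reactive control (RC); fields are hypothetical."""
    rc_name: str
    asset_id: str
    title: str
    remediation: str


@dataclass
class Notification:
    """Message addressed to the team that owns the affected asset."""
    owner: str
    subject: str
    body: str


def route_finding(
    finding: Finding,
    owner_lookup: Callable[[str], Optional[str]],
) -> Optional[Notification]:
    """Look up the asset owner and build a notification, or return None if unowned."""
    owner = owner_lookup(finding.asset_id)
    if owner is None:
        # No known owner: leave it for manual triage instead of notifying anyone.
        return None
    return Notification(
        owner=owner,
        subject=f"[{finding.rc_name}] {finding.title} in {finding.asset_id}",
        body=f"Details: {finding.title}\nRemediation: {finding.remediation}",
    )
```

For example, with ownership = {"payments-api": "team-payments"}, calling route_finding(finding, ownership.get) addresses the notification to team-payments. Returning None for unowned assets keeps the routing step honest: a finding with no owner needs a human to triage it rather than a notification sent to nobody.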

Vulnerability management platform (Kitsune)

At first, we used Spotify’s Comet notification system for automation. We built peripheral systems to ingest vulnerability information from reactive controls into Comet. Then Comet would send email notifications to asset owners. When an asset owner fixed a vulnerability, they clicked the “Resolve” button in the email to update us. 

With Comet, we had the basic automation pattern covered, but we wanted to go even further.

We reevaluated our vulnerability lifecycle, starting by defining the states each vulnerability could be in, along with the actions required of the responsible team in each state. It was at this stage that we decided we needed more states than Comet offered.
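
One way to model a lifecycle with more states than a simple open/resolved pair is an explicit state machine that also records the action expected of the owning team in each state. The sketch below assumes hypothetical state names, transitions, and actions; the post does not list Kitsune’s actual states.

```python
from enum import Enum


class State(Enum):
    # Hypothetical states; not Kitsune's actual lifecycle.
    REPORTED = "reported"
    TRIAGED = "triaged"
    IN_PROGRESS = "in_progress"
    RESOLVED = "resolved"
    RISK_ACCEPTED = "risk_accepted"


# Each state maps to the set of states it may move to next.
TRANSITIONS = {
    State.REPORTED: {State.TRIAGED},
    State.TRIAGED: {State.IN_PROGRESS, State.RISK_ACCEPTED},
    State.IN_PROGRESS: {State.RESOLVED, State.TRIAGED},
    State.RESOLVED: {State.IN_PROGRESS},    # reopen if the fix regresses
    State.RISK_ACCEPTED: {State.TRIAGED},   # re-triage if the risk changes
}

# Action expected from the responsible team while a vulnerability is in each state.
REQUIRED_ACTION = {
    State.REPORTED: "acknowledge and assess the report",
    State.TRIAGED: "schedule a fix or request risk acceptance",
    State.IN_PROGRESS: "ship and verify the fix",
    State.RESOLVED: "none",
    State.RISK_ACCEPTED: "revisit at the agreed review date",
}


def transition(current: State, target: State) -> State:
    """Return the new state, rejecting moves the lifecycle does not allow."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target
```

Keeping transitions in data rather than scattered conditionals makes it straightforward to add states later and to render the required action next to each vulnerability in a UI.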

We also wanted to show each team its assigned vulnerabilities on Backstage, so teams could prioritize and act on them from there. The last piece was a relational data structure: it went beyond the event-format data Comet provided and made it easier for us to process the data for further analysis.

Those motivations led us to start building Kitsune, our vulnerability management platform. 

Kitsune started as a backend API service, intended to become the source of truth about vulnerabilities within Spotify, with a Backstage UI plugin for user interactions.

Kitsune now works as our vulnerability lifecycle management system. Teams are automatically notified when vulnerabilities affecting their assets are reported, and when necessary, Kitsune escalates to the appropriate stakeholders. Teams can visit the vulnerability list pages on Backstage at any time to review the vulnerabilities assigned to them, for example before sprint planning, so that vulnerability fixes become part of their day-to-day work.
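
Escalation rules are not described in detail here, so the following is only an illustration of one common approach: escalate when an open vulnerability outlives a remediation deadline tied to its severity. The SLA_DAYS values, the field names, and the grouping of overdue items by owning team are all assumptions.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, List

# Hypothetical remediation deadlines per severity; not Spotify's actual SLAs.
SLA_DAYS: Dict[str, int] = {"critical": 7, "high": 30, "medium": 90, "low": 180}


@dataclass
class OpenVulnerability:
    vuln_id: str
    severity: str
    reported_on: date
    owner_team: str


def needs_escalation(v: OpenVulnerability, today: date) -> bool:
    """True once an open vulnerability is past its severity's deadline."""
    deadline = v.reported_on + timedelta(days=SLA_DAYS[v.severity])
    return today > deadline


def escalations(vulns: List[OpenVulnerability], today: date) -> Dict[str, List[str]]:
    """Group overdue vulnerability IDs by owning team, e.g. to notify stakeholders."""
    overdue: Dict[str, List[str]] = {}
    for v in vulns:
        if needs_escalation(v, today):
            overdue.setdefault(v.owner_team, []).append(v.vuln_id)
    return overdue
```

Passing today in explicitly, rather than reading the clock inside the functions, keeps the rule deterministic and easy to test.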

Funnels of vulnerability sources

Kitsune began with vulnerabilities reported by two reactive controls, one of which is our security bug bounty program on HackerOne. But we aimed to handle vulnerabilities from other RCs, too. To keep the systems decoupled, we deliberately limited Kitsune itself to business logic that applies to vulnerabilities in general. We introduced the concept of a “mediator” that takes care of RC-specific business logic and converts each RC’s data into a format compatible with Kitsune. Since launching the mediator system, we have cleanly integrated the VMP with a few other RCs, and we are working to add more reactive controls to extend coverage. Vulnerabilities are also commonly reported through other sources, including our employees, so we made it possible to upload manually reported vulnerabilities to Kitsune as a CSV file.
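
The mediator idea can be sketched as a small adapter interface: each mediator maps one source’s raw payload onto a single shared record type, so the core platform only ever sees that type. In this hypothetical Python sketch, the VulnerabilityRecord schema, the scanner payload fields, and the CSV column names are invented for illustration; only the CVSS-to-severity buckets follow a real standard (the CVSS v3 qualitative rating scale). The CSV path reuses the same interface, which is one way manual uploads could share the pipeline.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, Mapping, Protocol


@dataclass(frozen=True)
class VulnerabilityRecord:
    """Hypothetical source-agnostic record that the core platform accepts."""
    source: str
    external_id: str
    asset_id: str
    severity: str
    title: str


class Mediator(Protocol):
    """Adapter that owns one source's quirks so the core stays generic."""
    source: str

    def convert(self, raw: Mapping[str, str]) -> VulnerabilityRecord: ...


def cvss_to_severity(score: float) -> str:
    """Map a CVSS v3 base score to its qualitative rating."""
    if score >= 9.0:
        return "critical"
    if score >= 7.0:
        return "high"
    if score >= 4.0:
        return "medium"
    if score > 0.0:
        return "low"
    return "none"


class ScannerMediator:
    """Mediator for a hypothetical scanning RC that reports CVSS scores."""
    source = "scanner"

    def convert(self, raw: Mapping[str, str]) -> VulnerabilityRecord:
        return VulnerabilityRecord(
            source=self.source,
            external_id=raw["finding_id"],
            asset_id=raw["target"],
            severity=cvss_to_severity(float(raw["cvss_score"])),
            title=raw["summary"],
        )


class CsvMediator:
    """Mediator for manually reported vulnerabilities, one CSV row each."""
    source = "manual-csv"

    def convert(self, raw: Mapping[str, str]) -> VulnerabilityRecord:
        # Hand-edited files often carry stray whitespace and mixed case.
        return VulnerabilityRecord(
            source=self.source,
            external_id=raw["id"].strip(),
            asset_id=raw["asset"].strip(),
            severity=raw["severity"].strip().lower(),
            title=raw["title"].strip(),
        )


def ingest_csv(text: str, mediator: Mediator) -> List[VulnerabilityRecord]:
    """Parse CSV text (with a header row) and convert every row via the mediator."""
    return [mediator.convert(row) for row in csv.DictReader(io.StringIO(text))]
```

Adding a new source then means writing one more class with a convert method; nothing downstream of VulnerabilityRecord has to change.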

Metrics

Sample dashboard view. Figures above are for illustrative purposes only.

Providing insights based on metrics was a major goal from the beginning of this project: driving teams to engage in fixing vulnerabilities, keeping Spotify secure, and allowing the Security teams to plan strategies. We designed the system with those goals in mind and have collected the necessary data. From that relational data, we generate metrics that help Spotify employees understand the current risk of each unit within the organization and how each unit has resolved past vulnerabilities. We visualize these metrics on Security Hub, our UI plugin on Backstage. In addition, we send quarterly reports to upper management based on the data we have collected.
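
To illustrate why a relational model helps with metrics, here is a sketch using Python’s built-in sqlite3 module. The single-table schema, the sample rows, and the two metrics (open count as a proxy for current exposure, mean days to resolve as a proxy for track record) are assumptions, not Kitsune’s actual data model or metric definitions.

```python
import sqlite3
from typing import List, Optional, Tuple

SCHEMA = """
CREATE TABLE vulnerabilities (
    id TEXT PRIMARY KEY,
    team TEXT NOT NULL,
    reported_on TEXT NOT NULL,  -- ISO 8601 date
    resolved_on TEXT            -- NULL while the vulnerability is open
)
"""

# Per team: how many vulnerabilities are open now, and how long resolved ones took.
METRICS_SQL = """
SELECT team,
       SUM(resolved_on IS NULL) AS open_count,
       AVG(julianday(resolved_on) - julianday(reported_on)) AS mean_days_to_resolve
FROM vulnerabilities
GROUP BY team
ORDER BY team
"""


def demo_connection() -> sqlite3.Connection:
    """In-memory database with a few hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO vulnerabilities VALUES (?, ?, ?, ?)",
        [
            ("V-1", "team-a", "2022-01-01", "2022-01-11"),
            ("V-2", "team-a", "2022-02-01", "2022-02-21"),
            ("V-3", "team-a", "2022-03-01", None),
            ("V-4", "team-b", "2022-01-05", None),
        ],
    )
    return conn


def team_metrics(conn: sqlite3.Connection) -> List[Tuple[str, int, Optional[float]]]:
    """Open count and mean days to resolve per team; AVG skips open (NULL) rows."""
    return conn.execute(METRICS_SQL).fetchall()
```

With the sample rows, team_metrics(demo_connection()) returns [('team-a', 1, 15.0), ('team-b', 1, None)]: team-b has no resolved vulnerabilities yet, so AVG over only NULL values yields None.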

What’s ahead

We started developing our vulnerability management platform in Q2 2020, and it is still evolving. There is plenty left to improve, but our goal remains the same: to enable Spotify to take risks responsibly by engaging people in an efficient and scalable way.

