Data Platform Explained Part I

April 2, 2024 Published by Anastasia Khlebnikova (Senior Engineer) and Carol Cunha (Product Manager)

As engineers at Spotify, we are often asked to explain our data platform to peers who are considering building one in their own organizations. Despite the many articles, blog posts, and talks available online, it can be hard to digest what the building blocks of a data platform are, how to start building one, and which tradeoffs matter for the business. In this blog post series, we’ll cover what our data platform entails, the role it plays at Spotify, and the key factors that lead organizations to consider building one.

The need for a data platform at Spotify

Spotify has been a data-driven company since the beginning. Today, we rely on insights drawn from 1.4 trillion data points processed daily. This data flows over a reliable data infrastructure made up of several dimensions, components, and products, such as Spotify’s event delivery infrastructure. Each is designed to supply data for crucial use cases across the business, from payments to experimentation.

The need for a data platform at Spotify emerged organically, and the platform evolved over time into what it is today. As use cases matured, we learned and made corrections along the way, supported by the company’s commitment to using high-quality data for decision-making.

Triggers to consider when building a data platform

The decision to build a data platform typically arises when specific needs within an organization require a more organized and streamlined approach to handling data. 

Here are some triggers that may prompt organizations to consider investing in a data platform:

  • Having more searchable and usable data. Democratized data is easier to access and consume for data scientists, engineers, and product and business teams. 
  • Satisfying financial reporting requirements. When financial reporting is involved, organizations seek pipelines that run predictably, often incorporating reporting and alerting mechanisms.
  • Ensuring data quality and predictable results. Organizations realize the importance of guaranteeing data quality and predictable outcomes, especially when dealing with critical datasets.
  • Facilitating an efficient development lifecycle. When experimentation becomes a first-class citizen in the organization, processes, tools, and squads need a structured platform that supports seamless experimentation.
  • Enabling machine learning capabilities with well-classified data. Machine learning thrives on well-classified data, making a structured data platform essential for ML initiatives.
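
To make the data quality and alerting triggers more concrete, here is a minimal sketch of a quality gate that a pipeline could run before publishing a dataset. It is an illustration only, not how Spotify’s platform implements this: the names `QualityReport` and `check_required_fields`, the `max_missing_ratio` threshold, and the sample rows are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class QualityReport:
    """Result of a single quality check (hypothetical structure)."""
    total_rows: int
    missing_counts: dict[str, int]
    passed: bool


def check_required_fields(rows, required_fields, max_missing_ratio=0.01):
    """Count missing values per required field and compare to a threshold."""
    missing_counts = {field: 0 for field in required_fields}
    for row in rows:
        for field in required_fields:
            # A field counts as missing if it is absent or explicitly None.
            if row.get(field) is None:
                missing_counts[field] += 1
    total = len(rows)
    # An empty dataset is treated as a failure: there is nothing to report on.
    passed = total > 0 and all(
        count / total <= max_missing_ratio for count in missing_counts.values()
    )
    return QualityReport(total, missing_counts, passed)


# Made-up sample data for illustration.
rows = [
    {"user_id": "u1", "amount": 9.99},
    {"user_id": "u2", "amount": None},
    {"user_id": None, "amount": 4.99},
    {"user_id": "u4", "amount": 9.99},
]
report = check_required_fields(rows, ["user_id", "amount"], max_missing_ratio=0.1)
if not report.passed:
    # In a real pipeline this would notify the dataset owner instead of printing.
    print(f"ALERT: quality check failed: {report.missing_counts}")
```

In a setup like this, a failed check would block downstream consumers, such as a financial report, from reading the dataset until its owner has investigated.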

The evolution of Spotify’s data platform 

At Spotify, the data platform evolution was part of the company’s growth journey. What began as a single group managing Europe’s largest Hadoop cluster eventually transformed into an entire team encompassing various product areas. The key components of this team include the following:

  • Data collection. Delivers data collected from various clients.
  • Data processing. Manages the scheduling of pipelines for efficient data processing.
  • Data management. Focuses on data attribution and privacy, ensuring the integrity and security of Spotify’s data.

By connecting the components above, each a building block responsible for a specific data dimension such as data processing, we created the reliable data platform used across Spotify. This platform not only makes data easily searchable but also serves as a valuable source of information for running experiments, which improves the development experience.

To learn more about how Spotify runs experiments with Confidence, see our blog post “Confidence: Spotify Experimentation Platform.”

Conclusion

In the ever-evolving landscape of data-driven decision-making, a well-structured data platform emerges as a critical asset. Our journey in building and evolving Spotify’s data platform can serve as a helpful example for organizations considering similar ventures. 

As the volume and complexity of data continue to grow, the role of a robust data platform becomes increasingly critical in unlocking valuable insights and driving innovation.


Stay tuned for posts that will dig deeper into the various aspects of building and maintaining Spotify’s data platform, our endeavors (not always successful), and our future plans.

