Experimenting at Scale, the Spotify Home Way

In the fast-paced world of streaming, personalization plays a vital role in enhancing user experiences. At Spotify, our Home serves as a personalized surface where users retrieve familiar content and discover new content tailored to their preferences. We are constantly trying to optimize the Home experience to provide more value to our users; however, doing so can be tricky.

The sheer scale of testing on Home sets it apart from other surfaces within our business. With more than 250 tests conducted annually, Home serves as a platform where numerous stakeholders and dozens of teams evaluate the effectiveness of their features through online experiments. Unlike standard content tests (for example, does hero image X get more click-throughs than hero image Y?), Home experiments focus on pushing the boundaries of personalization, exploring innovative ways to tailor content and programming strategies.

Running this many tests on Home makes experimenting at scale a distinct challenge, one that requires powerful tooling and meticulous coordination. Our teams rely on both to deliver a personalized and engaging user experience that drives value for our listeners.

How are we able to run all our experiments at scale?

We have two main experimentation focus areas:

Experiment with velocity: To empower teams to launch experiments as quickly as possible, we must make the process fast and simple, reducing the friction associated with experimentation and increasing our agility to make iterative enhancements. To make experimentation fast, we need powerful tooling that lets our teams set up, QA, and launch their experiments easily.
Experiment with quality: It is not enough just to launch as many experiments as possible. We must ensure that we are making quality releases that allow us to reach conclusive results that drive data-driven decisions. It’s OK if a hypothesis is incorrect; it’s not OK if we spend all that time and effort and don’t have clear results. To ensure that we maintain the quality of Spotify Home, we employ meticulous coordination methods to make sure everyone stays on the same page.

Powerful tooling for experimentation

Powerful tooling is a cornerstone of our commitment to experiment with velocity. Running A/B tests is core to how Personalization teams at Spotify ship their work. Some companies overlook homegrown tools; however, on Home we’ve decided to leverage our talented team to build bespoke products to unlock efficiencies, especially in the realm of A/B testing. Our specialized tools automate and expedite the experimentation process, enabling us to rapidly conceive, execute, and analyze tests, ensuring that we stay nimble in the face of evolving user preferences and market trends.

Configuration as a service

Home Config is a critical tool that empowers experimenters to manage configurations and fine-tune personalization strategies on Home. As a configuration-as-a-service tool, Home Config provides a user-friendly interface for experimenters to customize various aspects of Home. It allows them to define parameters related to ranking, content, visual treatments, and more, ensuring a personalized and tailored experience for users. With Home Config, creating variants for a test isn’t gated behind complex source code. Instead, our simplified configs let less-technical users build personalized experiences themselves, so we can set up new personalization strategies as quickly as possible.
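To make the idea of variants-as-configuration concrete, here is a minimal sketch in Python. It is not Home Config's actual schema or API: the class name, the parameter names (ranking_model, max_shelves, card_style), and the resolve helper are all hypothetical and chosen only for illustration. The point it shows is that a variant can be expressed as a small set of overrides on top of defaults, with unknown keys rejected early.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass(frozen=True)
class HomeVariantConfig:
    """Hypothetical shape of one test variant; field names are illustrative."""
    name: str
    overrides: Dict[str, Any] = field(default_factory=dict)


# Hypothetical default parameters for a Home request (assumed, not from the source).
DEFAULTS: Dict[str, Any] = {
    "ranking_model": "baseline",
    "max_shelves": 10,
    "card_style": "standard",
}


def resolve(variant: HomeVariantConfig) -> Dict[str, Any]:
    """Merge a variant's overrides on top of the defaults.

    Unknown keys raise KeyError so that a typo in a config fails loudly
    instead of silently producing a variant identical to control.
    """
    unknown = set(variant.overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"unknown config keys: {sorted(unknown)}")
    return {**DEFAULTS, **variant.overrides}


control = HomeVariantConfig("control")
treatment = HomeVariantConfig("treatment", {"ranking_model": "candidate_v2"})
```

In this sketch the treatment differs from control only in the keys it overrides, which keeps a variant definition short enough to review at a glance.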

Launching and configuring experiments

Spotify’s Experimentation Platform (EP) is a pivotal tool within the Backstage ecosystem. Once our configs have been created in Home Config, EP gives us the power to release them into production. It provides experimenters with a comprehensive interface to design, launch, and monitor experiments. With EP, experimenters can define experiment parameters, set up control and treatment groups, track metrics, and analyze results. EP streamlines the experimentation process, enabling teams to iterate quickly and make data-driven decisions. Having an in-house experimentation tool like EP allows us to increase our experimentation velocity while maintaining our commitment to experiment quality.
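For readers unfamiliar with how control and treatment groups are typically formed, the sketch below shows one common approach: deterministic, hash-based assignment. This is a generic technique and an assumption on our part for illustration; it does not describe how EP assigns users. The function name and the identifiers passed to it are hypothetical.

```python
import hashlib
from typing import Sequence


def assign_variant(user_id: str, experiment_id: str, variants: Sequence[str]) -> str:
    """Deterministically map a user to one variant of an experiment.

    Hashing the experiment id together with the user id means the same user
    always sees the same variant within a test, while assignments in
    different tests are effectively independent of each other.
    """
    if not variants:
        raise ValueError("at least one variant is required")
    digest = hashlib.sha256(f"{experiment_id}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]
```

Calling assign_variant("user-123", "home-ranking-test", ["control", "treatment"]) returns the same group on every request, so no assignment table needs to be stored.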

Debugging and testing

Home QA is a dedicated tool that aids in debugging and testing Home. It’s our homegrown front-end application that allows engineers and product managers to simulate Home requests and view an end user’s Home page. By letting us closely examine experiment variants before a broad release, Home QA supports our second goal, experimenting with quality, and protects the integrity of Home before every launch. It helps us identify issues before they reach production and debug problems if they arise. Home QA enables us to release new features, confident that they meet our rigorous product requirements.

Meticulous coordination

Meticulous coordination is the bedrock of our pursuit to experiment with quality. By fostering a culture of transparency and collaboration, we can closely scrutinize each experimental variant, minimizing errors and ensuring that every release contributes valuable, actionable insights for enhancing our user experience. Our coordination efforts maximize the number of tests we are able to launch, ensure that all tests are configured correctly, and prevent concurrent tests from interacting with one another and invalidating results.

Centralized experiment management

With so many teams hungry to experiment on Home, it’s easy to run out of testing space. Spotify utilizes an Experiment Tracker to manage the slew of experiments that are proposed on the surface. This tracker provides a comprehensive overview of proposed, ongoing, and past experiments. Our objective here is not to block teams from experimenting, but to prioritize the experiments that best serve our most crucial product objectives within our given space constraints. The Experiment Tracker serves as a hub for experiment-related information and fosters transparency across teams, allowing them to select an ideal time frame for their launch.
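One way to picture "testing space" is as a share of Home traffic that is booked for a date range. The sketch below checks whether a proposed experiment fits alongside those already scheduled. This model is an assumption made for illustration: the source does not say how the Experiment Tracker represents space, and the Booking class, its fields, and the fits function are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Iterable


@dataclass(frozen=True)
class Booking:
    """Hypothetical record of one experiment's reserved traffic."""
    name: str
    start: date
    end: date              # inclusive
    traffic_share: float   # fraction of Home users, between 0 and 1


def fits(proposed: Booking, booked: Iterable[Booking], capacity: float = 1.0) -> bool:
    """Return True if the proposed test never pushes total usage over capacity.

    Walks the proposed date range day by day and sums the traffic shares of
    every existing booking that is active on that day.
    """
    existing = list(booked)
    day = proposed.start
    while day <= proposed.end:
        in_use = sum(b.traffic_share for b in existing if b.start <= day <= b.end)
        # Small tolerance guards against floating-point rounding in the sum.
        if in_use + proposed.traffic_share > capacity + 1e-9:
            return False
        day += timedelta(days=1)
    return True
```

A team whose proposal does not fit can try a later start date or a smaller traffic share, which mirrors the idea of selecting an ideal time frame for a launch.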

Ensuring conclusive results

EVA (Experiment Validation Assistant) is a backend service that validates proposed A/B tests on Home. It runs various validations to reduce misconfigured tests, increase result validity, and streamline insights. By automating common checks, EVA decreases overhead and ensures experiments adhere to predefined criteria. The results of EVA’s validations are shared in a designated Slack channel, enabling swift feedback and necessary adjustments. EVA reduces our dependency on manual checks, increasing our speed to launch and our confidence in our releases. With EVA, we can ensure that every test launched will be able to provide insights on what works on Home and what doesn’t.
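The general pattern of automated pre-launch validation can be sketched as a list of small check functions that each return a message on failure. The specific checks below (a control group exists, traffic shares sum to one, a primary metric is set, a minimum run time) are our own hypothetical examples; the source does not list the validations EVA performs, and the ExperimentSpec fields and the MIN_DAYS threshold are likewise assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

MIN_DAYS = 7  # hypothetical minimum run time, not a documented EVA rule


@dataclass
class ExperimentSpec:
    """Hypothetical description of a proposed A/B test."""
    name: str
    variant_shares: Dict[str, float]
    primary_metric: Optional[str]
    duration_days: int


# Each check returns None on success or a human-readable message on failure.
Check = Callable[[ExperimentSpec], Optional[str]]


def has_control(spec: ExperimentSpec) -> Optional[str]:
    return None if "control" in spec.variant_shares else "no control group defined"


def shares_sum_to_one(spec: ExperimentSpec) -> Optional[str]:
    total = sum(spec.variant_shares.values())
    if abs(total - 1.0) < 1e-9:
        return None
    return f"variant shares sum to {total:.2f}, expected 1.00"


def has_primary_metric(spec: ExperimentSpec) -> Optional[str]:
    return None if spec.primary_metric else "no primary metric set"


def runs_long_enough(spec: ExperimentSpec) -> Optional[str]:
    if spec.duration_days >= MIN_DAYS:
        return None
    return f"duration is {spec.duration_days} days, minimum is {MIN_DAYS}"


CHECKS: List[Check] = [has_control, shares_sum_to_one, has_primary_metric, runs_long_enough]


def validate(spec: ExperimentSpec) -> List[str]:
    """Run every check and collect the failure messages (empty list means pass)."""
    failures: List[str] = []
    for check in CHECKS:
        message = check(spec)
        if message is not None:
            failures.append(message)
    return failures
```

Returning all failures at once, rather than stopping at the first, suits a workflow where results are posted to a channel and the experimenter fixes everything in one pass.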

Conclusion

Through the combined power of robust tooling and meticulous coordination, Spotify’s Personalization team continuously increases the speed and scale at which we experiment and improves the quality of each experiment we launch. Experimentation on Home enables continuous optimization of the user experience, driving innovation and delivering personalized content to millions of Spotify users. Through this tooling, Spotify continues to ship unique and exciting new product features such as AI DJ.

We are always looking for new ways to grow. In 2023, we are introducing more advanced concepts to our experimentation toolkit, such as Interleaving, to allow us to preempt A/B testing and get results sooner with less impact on our end users. With our focus on constant improvement, our experimentation platform, infrastructure, and tooling will ensure that we’re moving in the right direction: creating a Home experience that delights our Spotify users. Spotify’s Home, as well as our experimentation tooling, will continue to grow!
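For context on interleaving, the sketch below implements team-draft interleaving, one well-known variant from the information retrieval literature. We do not know which interleaving method is used on Home, so treat this purely as an illustration of the general idea: two rankings are merged into a single list shown to one user, and each item remembers which ranking contributed it. The function and parameter names are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple


def team_draft_interleave(
    ranking_a: Sequence[str],
    ranking_b: Sequence[str],
    length: int,
    rng: random.Random,
) -> Tuple[List[str], Dict[str, str]]:
    """Merge two rankings; return the merged list and each item's team."""
    rankings = {"A": ranking_a, "B": ranking_b}
    picks = {"A": 0, "B": 0}
    interleaved: List[str] = []
    team_of: Dict[str, str] = {}
    while len(interleaved) < length:
        # Items each ranking could still contribute, in its own order.
        remaining = {t: [x for x in r if x not in team_of] for t, r in rankings.items()}
        if not remaining["A"] and not remaining["B"]:
            break
        # The team with fewer picks goes next; ties are decided by a coin flip.
        if picks["A"] != picks["B"]:
            team = "A" if picks["A"] < picks["B"] else "B"
        else:
            team = rng.choice(["A", "B"])
        # If that team has nothing left to offer, the other team picks instead.
        if not remaining[team]:
            team = "B" if team == "A" else "A"
        item = remaining[team][0]
        interleaved.append(item)
        team_of[item] = team
        picks[team] += 1
    return interleaved, team_of
```

Engagement with an item can then be credited to the team that contributed it, so both rankings are compared within a single user's session instead of across separate user groups.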
