Leveraging Mobile Infrastructure with Data-Driven Decisions
TL;DR The pursuit wasn’t always easy, but “putting data first” has helped Spotify dramatically improve the infrastructure team’s decision-making process while improving company productivity. Raul Herbster describes the benefits, especially for mobile DevOps teams.
Unsurprisingly, good infrastructure and tools play an essential role in Spotify’s efforts to help developers improve their productivity. We are constantly on the lookout for ways to make development as frictionless as possible, so that feature developers can focus on what’s important — delivering exceptional products.
Before we dive into the role of data, we have to take a look at agile, its strengths and where tools and infra (DevOps) can help. Agile methodologies advocate adaptive planning, and continuous, early delivery of product value. The needs of the client are the main focus and agile gives plenty of room to change requirements throughout the process.
But agile lacks some qualities that protect developer productivity. For instance, most of the Spotify teams were left alone to deal with system admin tasks.
This is where DevOps comes in. Put simply, it closes the gap between the developer and IT-infrastructure functions within an organization, improving productivity and giving developers more time to focus on quality. For more on this, consider reading: Accelerate: The Science of Lean Software and DevOps: Building and Scaling High Performing Technology Organizations.
The mobile DevOps challenge
With a large monolithic code base, companies have faced challenges (to put it lightly) as they modularize their work. With longer release cycles compared to backend microservices, new versions of mobile apps are rolled out in phases (alpha, beta) and constantly monitored to catch any bugs that might compromise the user experience.
Mobile DevOps teams normally focus on two activities: app quality and developer productivity. The two are intrinsically linked — improve the development experience and feature developers will have more time to focus on delivering a high-quality application.
To improve this approach, we prepare for worst-case scenarios, such as build time degradation, bugs, or performance degradation a user might experience, before they happen.
Data: making the difference
At Spotify, we believe that data can make an enormous difference. We have focused our efforts on putting “data first” and using it to drive business decisions, with thrilling results. When we started, we probably didn’t see the full potential of data adoption across all levels of the organization, or how it could help the infrastructure teams’ decision-making process while improving company productivity.
In the graphic below you can find the steps we’ve taken to shift the role of data in Spotify’s mobile DevOps teams.
Data promotion and education
Back in 2017, Spotify put renewed emphasis on data and its importance to the overall competitiveness of the business. The goal was to use data and insights to drive the business forward and maximize the opportunity for data-driven innovations.
The data infrastructure teams were tasked with making data generation, collection and analysis easy and accessible. Meanwhile, our Tech Learning team launched Data University, a series of training sessions that covered different aspects of data science and engineering, designed to help engineers solve product-related problems. Since its inception four years ago, Data University has welcomed over 500 attendees consisting of engineering managers, product managers, and more.
After many of us attended Data University, we returned to our respective squads and found that, in particular, our Android infrastructure team was facing challenges regarding build times while the local development experience needed improvement.
In an effort to improve the developer experience, we looked at how we could automate the collection of initial figures, make it easier to see historical data, and gain better insights into system behavior. We also wanted to try to reduce the number of components in the build system. Being able to analyze a single piece in isolation was desirable, but not yet possible.
At that point, we realized that having the data infrastructure in place to collect information and support the analysis of build time would be an enormous help. We would have both an overview of the pieces within our systems and the more granular information we needed to help validate the changes we were making.
At Spotify, we have tribes that focus on providing stellar data infrastructure, equipping engineers around the company with building blocks that collect data and feed it into useful visualizations. However, the data generation piece depends on each platform and application.
For example, our Android apps use Gradle (Gradle Enterprise Edition) as a build system and it already provides a seamless way to generate, collect and store all the data we need to understand our software in terms of the local development experience. In this scenario, we had to focus specifically on data pipelines and visualization (dashboards). For iOS, there isn’t an established solution that helps with data generation, collection and storage. So we built tools to help us with those tasks.
For the mobile DevOps teams, we had to actually create the support we needed to help us work with data. Once the system was in place, we evolved it to include the data we generated and stored, along with accompanying dashboards. To avoid deterioration of the data quality, we had to add checks on data consistency and on the data pipelines, ensuring that the components of our data solution continued to work well.
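To illustrate the kind of consistency and pipeline checks described above, here is a minimal Python sketch. It is not Spotify's actual tooling: the record fields (`component`, `duration_seconds`, `finished_at`), the function names, and the 24-hour freshness window are all assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical fields every build record is expected to carry.
REQUIRED_FIELDS = ("component", "duration_seconds", "finished_at")


def find_invalid_records(records):
    """Return (record, reason) pairs for records that fail basic consistency checks."""
    invalid = []
    for record in records:
        missing = [field for field in REQUIRED_FIELDS if field not in record]
        if missing:
            invalid.append((record, "missing fields: " + ", ".join(missing)))
        elif record["duration_seconds"] < 0:
            invalid.append((record, "negative duration"))
    return invalid


def pipeline_is_stale(records, now, max_age=timedelta(hours=24)):
    """Return True when the newest record is older than max_age (or there is none)."""
    timestamps = [r["finished_at"] for r in records if "finished_at" in r]
    if not timestamps:
        return True
    return now - max(timestamps) > max_age


# Example usage with made-up records.
now = datetime(2021, 1, 2, 12, 0, tzinfo=timezone.utc)
records = [
    {"component": "player", "duration_seconds": 95.0,
     "finished_at": datetime(2021, 1, 2, 0, 0, tzinfo=timezone.utc)},
    {"component": "search", "duration_seconds": -1.0,
     "finished_at": datetime(2021, 1, 1, 23, 0, tzinfo=timezone.utc)},
    {"component": "home"},
]
for record, reason in find_invalid_records(records):
    print(record.get("component"), "->", reason)
print("pipeline stale:", pipeline_is_stale(records, now))
```

Running checks like these on a schedule, and alerting when they fail, is one way to notice a broken pipeline before a dashboard silently goes out of date.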
Over time, items related to our data system were prioritized by our internal teams, and we began to own data. Our product teams also saw the importance of a good data pool, with insightful visualizations, where they could validate product discussions and start using data to drive the decision making process for mobile DevOps teams.
Observability and monitoring: The key to DevOps success
One of the first ideas we brought back from Data University was the creation of dashboards for our teams, so they could constantly analyze data trends. In practice, we would give these dashboards a quick glance every now and then, and without regular checks they weren’t always effective.
To address this, we established steps and alerts that provided notifications about regressions or broken pipelines, and prompted timely risk mitigation actions. Each dashboard was also assigned an owner; owners were held accountable for ensuring dashboards were up to date.
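A regression alert of this kind can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the alerting Spotify runs: the function name, the use of medians, the 10% tolerance, and the sample build times are all hypothetical.

```python
from statistics import median


def detect_regression(baseline, recent, tolerance=0.10):
    """Return True when the recent median build time exceeds the
    baseline median by more than `tolerance` (a fraction, e.g. 0.10 = 10%)."""
    if not baseline or not recent:
        raise ValueError("both windows need at least one sample")
    return median(recent) > median(baseline) * (1 + tolerance)


# Example usage with made-up build times, in seconds.
baseline_seconds = [118, 121, 119, 125, 120]
recent_seconds = [139, 142, 137, 140, 145]
if detect_regression(baseline_seconds, recent_seconds):
    print("ALERT: build time regression detected")
```

Comparing medians rather than means keeps a single unusually slow build from triggering a notification, which matters if alerts are meant to prompt action rather than be ignored.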
The dashboards gained traction internally, and were used in meetings with stakeholders who recognized the importance of tracking productivity metrics for their components. To allow stakeholders to access these dashboards and data visualizations regularly, we made them available to the organization so that feature developers could monitor their metrics easily.
Data-driven decision making
Tech was the first to benefit from the data adoption process, enjoying clearer direction on how to tackle requirements.
Let’s consider the adoption of a new language or framework. Product comes to the Tech teams and presents their objectives. In this case, let’s say the ask is to “Deliver a frictionless experience for Swift development at Spotify”. By friction we mean any step in the local development process that might impact productivity, including long build times, unclear error messages and flaky tests.
With the data in place and trends illustrated in dashboards, we can accurately predict impediments and when the systems will become inefficient if no action is taken, allowing us to act accordingly. With this input, we can prioritize items that will improve the systems.
In Figure 01 below, you can see build time trends for components based on the different languages used for iOS development. As part of this planning, Tech and Product can focus their efforts on improving build time for Swift, as the number of Swift components will soon be greater than the number of ObjC components.
Figure 01: Trends on build time, considering languages in our code base (ObjC and Swift)
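One simple way to turn a trend like this into a prediction is to fit a straight line to historical build times and estimate when it crosses an acceptable limit. The Python sketch below assumes a linear trend, weekly samples, and a hypothetical threshold; the function names and numbers are illustrative and are not taken from Spotify's dashboards.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    if len(xs) < 2:
        raise ValueError("need at least two samples to fit a trend")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def weeks_until_threshold(weekly_build_seconds, threshold_seconds):
    """Estimate how many weeks after the last sample the fitted trend
    reaches the threshold. Returns None if the trend is flat or improving."""
    xs = list(range(len(weekly_build_seconds)))
    slope, intercept = fit_line(xs, weekly_build_seconds)
    if slope <= 0:
        return None
    crossing_week = (threshold_seconds - intercept) / slope
    return max(0.0, crossing_week - xs[-1])


# Example usage with made-up weekly build times, in seconds.
swift_build_seconds = [100, 110, 120, 130, 140]
print(weeks_until_threshold(swift_build_seconds, threshold_seconds=180))
```

A straight line is the crudest possible model; real build times may grow in steps as components are added, so an estimate like this is best read as a prompt for prioritization rather than a precise forecast.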
The data we collect can also be very useful for other teams. For instance, when we need to migrate support for development environments or APIs, understanding what our developers are currently using allows us to plan more effectively.
Figure 02: Percentage of Xcode users per version
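A breakdown like the one in Figure 02 boils down to counting reported versions and converting the counts to percentages. The Python sketch below assumes each developer machine reports a single version string; the function name and the version numbers are made up for illustration.

```python
from collections import Counter


def version_share(reported_versions):
    """Map each version to its percentage of all reports, most common first."""
    counts = Counter(reported_versions)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {version: 100.0 * count / total
            for version, count in counts.most_common()}


# Example usage with hypothetical version reports.
reports = ["12.4", "12.4", "12.5", "11.7"]
for version, share in version_share(reports).items():
    print(f"Xcode {version}: {share:.1f}%")
```

Tracked over time, the share of an old version shows when it is safe to drop support for it.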
With a data-driven mindset, we can split problems into two groups: I) those where we have the necessary data to take action and II) those where we need to gather additional data for future decisions. These groups are important during all phases: planning, implementation and evaluation (Figure 03).
Figure 03: Data-driven iteration
Through planning, we can identify scenarios for improvement based on historical data. The data may not describe the current scenario, per se, but it offers a good baseline for improvements. If we already know the build times for our systems under different scenarios, we can expect to either keep the same numbers or improve them, regardless of the growth of the Swift codebase.
During planning, we can also introduce tasks that collect and display the data needed to validate our changes. Planning can surface questions such as “are we collecting enough information to check whether developers have the remote cache on?” or “how many components do they change on average in a single PR?” We may not have the answers to these questions yet, but we can include tasks to collect this data in the next iteration.
In the evaluation phase, we can collect feedback quickly from our data to understand if the changes had an impact on the local development process. It’s usually a very fast cycle — we can typically see improvements within a day and can then trace some projections.
For Product, data has helped to evaluate not just the effectiveness of the solutions, but also adoption and satisfaction. Traditionally, product managers evaluated products based on frequent user research from the very earliest stages. While useful, this doesn’t help much during the conceptualization phase.
Continuing our efforts to make data-driven decisions
Data-driven decision making is a reality in a range of products, from customer-facing features to mobile DevOps. For infrastructure teams, data can guide planning, help prioritize efforts to avoid deterioration of development productivity, and make it possible to work on bottlenecks before they are even observed by engineers in the company. At Spotify, our journey with data started some years ago and we continue to see impactful, positive results in several areas of the organization through the data collected. There are still other challenges, such as using data to help drive changes to the architecture, but we’re excited to see the solutions we’ll create from our data-driven decisions.