Switching Build Systems, Seamlessly

Published October 17, 2023, by Patrick Balestra, Staff Engineer

At Spotify, we have been experimenting with the Bazel build system since 2017. Over the years, the project has matured, and support for more languages and ecosystems has been added, thanks to the open source community and its maintainers at Google. In 2020, it became clear that the future of our client development required a unified build system that would scale well with our polyglot, multiplatform, multimillion-line codebase.

So we focused more of our energy on Bazel, and we transitioned the iOS Spotify app to build completely with Bazel for our 200+ engineers — without missing a single weekly release for millions of our iOS users.

What’s a build system, anyway?

A build system is a collection of software tools that turns source code into build artifacts, such as compiled binaries and packaged apps. Ever since our iOS application was first released, way back in 2008, we had relied on the Xcode build system to build and package our app for both development and release builds. It's the default build system that ships with Xcode, Apple's integrated development environment (IDE) for its platforms, and it's used by millions of developers. As mobile applications grew larger (in lines of code, number of contributors, etc.), companies began to write more modern build systems that weren't tied to any existing tool or IDE (e.g., Blaze at Google and Buck at Meta). Both have since been open-sourced, Blaze under the name Bazel. As our codebase grew at a rate of over 30% year over year, we saw developer happiness and productivity decline significantly.

With the promise of much faster builds with an efficient remote cache and remote execution services, we set out to productionize our usage of Bazel at the beginning of 2020. Our objectives in adopting Bazel were clear: to decrease our continuous integration (CI) and local build times and increase developer productivity by shortening the feedback loop.

The migration

To make the migration as safe as possible without halting development for the more than 120 teams contributing to our weekly releases, we needed a way to run multiple build systems side by side until Bazel was fully validated. Thankfully, past investments in our tooling paid off: we had previously moved our project declaration from a checked-in .pbxproj file to a custom Ruby DSL and YAML format. This allowed developers to add new modules to the project declaratively. As an example, the following snippet declares a new FooKit target with some Swift source files, a resources directory, and one dependency.

FooKit:
  type: spotify_static_library
  sources:
    - Sources/**/*.swift
  resource_dirs:
    - Resources
  deps:
    - BarKit

Thanks to this abstraction layer, we wrote scripts that generated over 2,000 BUILD.bazel files and enabled our project to build with Bazel successfully in a matter of days. Only 30 to 50 BUILD.bazel files needed to be manually written, which meant that most engineers initially weren’t even aware that we were running two build systems side by side. The module described above would result in the following gitignored BUILD.bazel file:

music_swift_library(
    name = "FooKit",
    srcs = glob(["Sources/**/*.swift"]),
    data = music_glob(["Resources/**"]),
    deps = ["//Systems/BarKit"],
)
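The generation step can be illustrated with a minimal Python sketch. It is not Spotify's actual tooling, which was built around a Ruby DSL. The sketch assumes the YAML has already been parsed into a dictionary. The `MODULE_PATHS` and `RULE_FOR_TYPE` tables are hypothetical stand-ins for however the real scripts resolved module names to Bazel labels and module types to macros. `music_swift_library` and `music_glob` are the custom macros that appear in the generated file above; the sketch only emits text that calls them.

```python
# Hypothetical sketch: turn one parsed YAML module declaration into BUILD.bazel text.

# Hypothetical lookup from module name to its Bazel package label.
MODULE_PATHS = {"BarKit": "//Systems/BarKit"}

# Hypothetical mapping from declared module type to the Bazel macro to emit.
RULE_FOR_TYPE = {"spotify_static_library": "music_swift_library"}


def _starlark_list(items):
    """Render a list of strings as a Starlark list literal."""
    return "[" + ", ".join(f'"{item}"' for item in items) + "]"


def generate_build_file(name, spec):
    """Return the contents of a BUILD.bazel file for one declared module."""
    rule = RULE_FOR_TYPE[spec["type"]]
    lines = [f"{rule}(", f'    name = "{name}",']
    if spec.get("sources"):
        # Source globs are passed through unchanged to Bazel's glob().
        lines.append(f"    srcs = glob({_starlark_list(spec['sources'])}),")
    if spec.get("resource_dirs"):
        # Each resource directory becomes a recursive glob pattern.
        patterns = [f"{d}/**" for d in spec["resource_dirs"]]
        lines.append(f"    data = music_glob({_starlark_list(patterns)}),")
    if spec.get("deps"):
        # Module names are resolved to fully qualified Bazel labels.
        labels = [MODULE_PATHS[dep] for dep in spec["deps"]]
        lines.append(f"    deps = {_starlark_list(labels)},")
    lines.append(")")
    return "\n".join(lines) + "\n"
```

Passing the FooKit declaration to `generate_build_file("FooKit", ...)` reproduces the BUILD.bazel file shown above. Because both build systems are derived from the same declarations, a module that developers add once shows up in both.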

The impact

In May 2022, developers had to wait about 80 minutes for feedback on their changes before merging.

Figure 1: Build times decreasing over time. Green represents the 75th percentile build time in minutes, as a seven-day moving average.

In the following months, we moved our CI configurations to Bazel one by one, starting with the ones that took the longest. One in particular, with more than 800 test targets containing close to 3 million lines of code, took more than 45 minutes to build with Xcode. After moving to Bazel, it took less than 10 minutes. We attribute the improvement primarily to an efficient Bazel remote cache and to parallelizing heavy build tasks with remote build execution (RBE). We use BuildBuddy to power our builds and to collect valuable insights into how our engineers use Bazel. This extended telemetry has been key to locating performance issues and resolving bottlenecks to improve our cache hit ratio.
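The remote cache pays off because Bazel identifies each build action by a digest of its command and inputs, so identical work is not repeated. The toy Python model below illustrates that idea and shows where a cache hit ratio comes from. It is a simplification and not Bazel's or BuildBuddy's implementation. Real action keys also cover things such as environment variables and platform properties, and results live in a remote content-addressed store rather than an in-process dictionary.

```python
# Toy model of a content-addressed action cache (a simplification, not Bazel's implementation).
import hashlib
from typing import Callable, Dict, List

Inputs = Dict[str, bytes]  # input path -> file contents


def action_key(command: List[str], inputs: Inputs) -> str:
    """Digest the command line and the content of every input, in a stable order."""
    h = hashlib.sha256()
    h.update(len(command).to_bytes(4, "big"))  # delimit the command section
    for arg in command:
        h.update(arg.encode("utf-8") + b"\0")
    for path in sorted(inputs):
        h.update(path.encode("utf-8") + b"\0")
        h.update(hashlib.sha256(inputs[path]).digest())
    return h.hexdigest()


class ActionCache:
    """In-memory stand-in for a remote action cache that tracks its hit ratio."""

    def __init__(self) -> None:
        self._results: Dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    def run(self, command: List[str], inputs: Inputs,
            execute: Callable[[List[str], Inputs], bytes]) -> bytes:
        """Return the cached output for this action, executing it only on a miss."""
        key = action_key(command, inputs)
        if key in self._results:
            self.hits += 1
        else:
            self.misses += 1
            self._results[key] = execute(command, inputs)
        return self._results[key]

    def hit_ratio(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

Any change to a flag or an input file produces a new key and therefore a miss. This is why telemetry that exposes unexpected misses, for example from nondeterministic inputs, translates directly into faster builds.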

As our migration progressed, related efforts to increase module isolation and improve app architecture allowed us to build and run fewer tests per change. This was possible because we could now deterministically determine which parts of the dependency graph each change affected, using Bazel queries (and bazel-diff in particular). In summary, within just a few months, the CI feedback loop improved by a factor of four, from 80 minutes to 20 minutes.
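bazel-diff works by computing a hash for every target at two revisions and reporting the targets whose hashes changed. Each target's hash includes the hashes of its dependencies. The Python sketch below shows that idea on a hypothetical in-memory graph of the form `{label: (source_bytes, deps)}`. It is a simplification and not bazel-diff's implementation, which derives its hashes from Bazel's own view of the build graph. The labels used with it are illustrative.

```python
# Simplified sketch of hash-based impacted-target detection, in the spirit of bazel-diff.
import hashlib
from typing import Dict, List, Tuple

Graph = Dict[str, Tuple[bytes, List[str]]]  # label -> (source contents, dep labels)


def target_hashes(graph: Graph) -> Dict[str, str]:
    """Hash each target from its own sources plus the hashes of its deps.

    Because a target's hash includes its deps' hashes, a change propagates to
    every target that transitively depends on it. Assumes an acyclic graph,
    as Bazel itself rejects dependency cycles.
    """
    hashes: Dict[str, str] = {}

    def visit(label: str) -> str:
        if label not in hashes:
            sources, deps = graph[label]
            h = hashlib.sha256(sources)
            for dep in sorted(deps):
                h.update(visit(dep).encode("ascii"))
            hashes[label] = h.hexdigest()
        return hashes[label]

    for label in graph:
        visit(label)
    return hashes


def impacted_targets(before: Graph, after: Graph) -> List[str]:
    """Targets that are new or whose hash changed between two revisions."""
    old, new = target_hashes(before), target_hashes(after)
    return sorted(label for label, digest in new.items() if old.get(label) != digest)
```

Only the test targets in the impacted set then need to be built and run. A reverse-dependency query such as `bazel query 'rdeps(//..., //Systems/BarKit)'` answers a similar question once you already know which target changed. The hash comparison works that out directly from the two revisions.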

The cost of a migration

During the first steps of this migration, we prioritized moving pre-merge builds to Bazel, since that is where engineers wait the longest. Our goal was to improve developer happiness for the 200+ engineers contributing to the iOS repository every month.

Maintaining multiple build systems during a migration can be costly. In our case, differences between the two build systems running in parallel occasionally caused compilation failures in one build system but not the other. For example, developers would report builds passing locally with Xcode but failing in CI with Bazel. Even though we had greatly reduced the time developers spent waiting for their changes to be tested in CI, our platform engineers now spent more time helping feature engineers debug build system issues. We invested in a few approaches to unify the two systems as much as we could and to reduce the time feature developers lost debugging issues unrelated to their work. Ultimately, we realized that the only way to eliminate this maintenance cost completely was to move to a single build system as soon as possible.

IDE experience

iOS engineers use Xcode to build, debug, and test Apple applications. We needed a way to bring Bazel into Xcode while supporting all existing workflows. We decided to work closely with the open source community to build a new IDE integration for Xcode users. After months of testing and contributing alongside other companies on a similar path, we adopted rules_xcodeproj. In November 2022, we began A/B testing our new Xcode integration, closely monitoring local build times and every problem reported.

Figure 2: Xcode build system users vs. Bazel users over time. Dark green represents Xcode build system users, and light green represents Bazel users.

Figure 2 above shows our progressive rollout. We started by making Bazel for local development opt-in, which let us gather feedback from engineers willing to test an experimental IDE integration. As we fixed bugs and improved the experience over the following weeks, we expanded the Bazel rollout. By March 2023, almost all iOS engineers were using our new Bazel integration without major issues, and the 75th percentile local build time was around 30 seconds.
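The post doesn't describe how engineers were assigned to the Bazel integration during the A/B test. One common approach is deterministic percentage bucketing with an opt-in override, sketched below in Python purely as an illustration. Each engineer is hashed into one of 100 stable buckets, so raising the rollout percentage only adds engineers and never moves anyone back. The function name and parameters are hypothetical.

```python
# Hypothetical rollout policy: opt-in override plus stable percentage bucketing.
import hashlib


def uses_bazel(engineer_id: str, rollout_percent: int, opted_in: bool = False) -> bool:
    """Return True if this engineer should get the Bazel-based Xcode integration.

    Engineers who opted in always get Bazel. Everyone else is hashed into a
    stable bucket from 0 to 99 and included once rollout_percent exceeds it.
    """
    if opted_in:
        return True
    digest = hashlib.sha256(engineer_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # same ID always lands in the same bucket
    return bucket < rollout_percent
```

Because buckets are stable, anyone included at 10% is still included at 50%. This keeps each engineer's local setup from flipping back and forth as the rollout grows.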

As expected, once engineers were building and debugging locally with Bazel, the same build system used in CI, we saw fewer complaints about differences between local and CI builds. In May 2023, we decided to decommission our legacy build system for local development. It was now time to tackle the last and most critical part of our infrastructure still relying on it: moving our release builds, and millions of users, to a version of the application built with Bazel.

Releasing with Bazel

Releasing a mobile application is complex. We rely on many data points and tests to ensure the highest-quality version is released at any given time. There are numerous integrations with the build system, such as interactions with our experimentation platform, registering metadata for new builds, and storing the build artifacts needed to parse crash reports. We needed to adapt existing tools to work with Bazel where necessary. Over the 15+ years of our mobile app's existence, we've accumulated countless custom integrations and optimizations in our release process. Ownership and knowledge of these steps were spread across teams throughout the company, so we needed a central place to manage risks.

We documented our plan, risks, rollback strategy, and tentative timeline in a single document and shared it with all interested parties. We tracked all known differences between the two builds and split certain risky changes we depended on across multiple releases.

We produced fully optimized release builds with both Bazel and Xcode, side by side, on the same hourly cadence. This was key to being able to submit a hotfix build from our previous build system if something went wrong.

We wrote tools to continuously check that there were no unknown differences between the two builds. Our tooling checked every file included in the package and, depending on file type, performed deeper checks, such as the following:

  • We compared images and other assets to make sure no resources were missing or wrongly included.
  • We compared configuration files such as Info.plist and JSON for equality.
  • We compared the binaries by looking at the symbols they included. This made us more confident that the right classes were being compiled into the Bazel-built binary.
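The checks listed above can be sketched in a few lines of Python. The sketch assumes each package has already been unpacked into a `{relative_path: bytes}` mapping. It also assumes symbol names were extracted from each main binary beforehand, for example with `nm`. The function name and its `binary_path` parameter are hypothetical. Property lists and JSON are compared after parsing, so differences in encoding or key order don't raise false alarms. Everything else is compared byte for byte, which a real tool might need to relax for files that aren't reproducible across build systems.

```python
# Hypothetical sketch of a package comparison between an Xcode build and a Bazel build.
import json
import plistlib
from typing import Dict, List, Optional, Set


def compare_packages(xcode: Dict[str, bytes], bazel: Dict[str, bytes],
                     xcode_symbols: Set[str], bazel_symbols: Set[str],
                     binary_path: Optional[str] = None) -> List[str]:
    """Return human-readable differences between two unpacked app packages."""
    report = []
    # Files present in only one package: missing or wrongly included resources.
    for path in sorted(xcode.keys() - bazel.keys()):
        report.append(f"missing in Bazel build: {path}")
    for path in sorted(bazel.keys() - xcode.keys()):
        report.append(f"only in Bazel build: {path}")
    # Files present in both: compare according to file type.
    for path in sorted(xcode.keys() & bazel.keys()):
        if path == binary_path:
            continue  # the main binary is compared by symbols below, not by bytes
        old, new = xcode[path], bazel[path]
        if path.endswith(".plist"):
            same = plistlib.loads(old) == plistlib.loads(new)  # XML vs. binary plist is irrelevant
        elif path.endswith(".json"):
            same = json.loads(old) == json.loads(new)  # key order and whitespace are irrelevant
        else:
            same = old == new  # images and other assets: byte for byte
        if not same:
            report.append(f"content differs: {path}")
    # Symbols present in only one of the two binaries.
    for symbol in sorted(xcode_symbols - bazel_symbols):
        report.append(f"symbol missing in Bazel binary: {symbol}")
    for symbol in sorted(bazel_symbols - xcode_symbols):
        report.append(f"extra symbol in Bazel binary: {symbol}")
    return report
```

An empty report means no unknown differences were found. Any expected difference would be tracked separately, as with the known differences listed in the rollout document.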

The rollout

We designed a rollout strategy that let us test the Bazel release on employee devices for about two weeks before rolling it out to external users. This gave us the chance to test our release tooling end to end without disrupting the weekly release schedule. Following the employee rollout, we proceeded with our external alpha and beta testers. Because we couldn't A/B test multiple builds, we released only one build to our users. We continued producing builds with both build systems for a few more releases, until we were confident that the rollout had succeeded. Thankfully, our crash and performance metrics looked good, and we received no reports of significant differences or bugs introduced by the build system switch.

Conclusion

This migration required many contributions to the open source Bazel build system and its rules, as well as custom integrations. We're proud to announce that our iOS application is our first major client fully built with Bazel.

We would like to thank everyone who contributed to this work over the last few years. A special shout-out also goes to the Bazel team and all the other open source contributors who dedicate their time every day to this wonderful project and community.