Strategies and Tools for Performing Migrations on Platform
Needing to deliver faster and more reliably while managing a growing number of contributors and a more complex codebase seems like the fate of every hyper-growth tech company. For platform teams, the challenge is not any different. How can we quickly roll out and increase the adoption of new technologies safely with a growing codebase and organization?
Spotify’s mobile codebase has been growing tremendously (by 29% in 2019, 49% in 2020, and 23% in 2021). As our business expands, our codebase evolves with new experiences, such as Spotify Live. At the same time, Spotify continues to take on more engineers, increasing the number of changes in the mobile codebase over time.
A year and a half ago, we kicked off an initiative within the Mobile Engineering Strategy program, working on multiple migrations to allow client features to be developed in an isolated environment, similar to backend microservices.
Since then, about 2,200 Android and iOS components have been associated with a system, and most of our Android and iOS codebases have been migrated to build with Google’s open source build system (Bazel), a massive effort impacting more than 100 squads across the company.
We believe that migrations will become more challenging as our codebase and organization grow in size and complexity. We need to consider such challenging migrations to be the norm going forward, if we want to deliver platform solutions efficiently and at scale for mobile at Spotify.
Below are some challenges that we faced during our journey. We’ve provided the scenarios with symptoms (“When”), what you should avoid when facing the situation (“Don’t”), and what we suggest that you do (“Do”).
Challenge 1: Defining the scope
How can we create change and produce standardized architectural solutions that support most use cases without undermining others?
When:
- The scope seems huge: there’s a feeling that numerous use cases need to be addressed.
- It’s difficult to know where to begin with the amount of tech debt, changes required, and use cases to support.
- Stakeholders reach out several times, not fully understanding what they are supposed to do during the migration.
Don’t:
- Try to identify all possible scenarios before rolling out the solution.
- Start a large migration without an estimated roadmap and definition of success.
- Reach out to stakeholders without a clear definition of what they need to do.
Do:
- Know your goals. Create a product brief that explains what you are doing, why you are doing it, and how you are going to do it.
- Focus on the values. Changing goals is time-consuming, but reframe them when needed to better align with those values.
- Address your audience. Understand their mental models so that you can talk about what’s relevant, connect where their needs are, and find proxies to expand your reach.
- Start small. Create a proof of concept, validate it with stakeholders, and get the migration through alpha, beta, and GA product lifecycles. Add use cases as you discover them, mapping out the most common ones first. This will allow you to get enough feedback and incrementally support the different use cases.
- Collaborate! Collaboration is key in the early phases. Find early adopters who are eager to try your solutions and will become ambassadors in their own tribes. This will eventually help with scalability as well.
Challenge 2: Scaling up
How can we drive large architectural and infrastructure changes faster in a hyper-growing organization?
When:
- Large numbers of squads are impacted by the changes (more than 100 squads).
- There is a significant amount of work — including manual refactorings that need to be done by the squads during the migration.
- Progress is slow — there’s a huge number of stakeholders and dependencies.
- Stakeholders are overwhelmed with the ongoing migrations.
Don’t:
- Believe that large infrastructure and architecture changes are impossible or not needed in a large organization.
- Settle for moving slower when making infrastructure/architectural changes.
Do:
- Focus on stakeholder management. Identify the different stakeholders in order of priority. Be in contact with them (e.g., Slack, email groups). Tell them about the importance of your effort.
- Communicate. Keep your audience engaged by sharing the progress through newsletters and workplace posts.
- Look to automation. Simplify the migration process by investing in automation upfront. Do you need to refactor the code? Could the same step be done automatically using a script?
- Make time for spike weeks. Partner with squads and tribes to jointly dedicate a week to work on the migration.
- Use multiple resources to increase your reach. Education programs can help onboard teams to new concepts and migrations, with the possibility of applying them right away.
- Introduce the company’s engineering practices in the onboarding programs to ensure that new joiners understand and follow the recommended best practices from the start.
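As an illustration of the “Look to automation” point, here is a minimal sketch of a script that performs one mechanical refactoring step across a codebase, so squads do not have to do it by hand. The import names, file suffixes, and function names are hypothetical and chosen only for the example; a real migration would likely need a syntax-aware tool rather than a regular expression.

```python
import re
from pathlib import Path

# Hypothetical rename used only for illustration; not a real Spotify API.
OLD_IMPORT = "com.example.legacy.Logger"
NEW_IMPORT = "com.example.platform.Logger"

# Match a whole import line for OLD_IMPORT, with an optional trailing semicolon.
_PATTERN = re.compile(
    r"^([ \t]*import[ \t]+)" + re.escape(OLD_IMPORT) + r"([ \t]*;?[ \t]*)$",
    re.MULTILINE,
)


def migrate_source(text: str) -> tuple[str, int]:
    """Rewrite old import lines; return the new text and the number of rewrites."""
    return _PATTERN.subn(lambda m: m.group(1) + NEW_IMPORT + m.group(2), text)


def migrate_tree(root: Path, suffixes=(".kt", ".java"), dry_run=True) -> dict[Path, int]:
    """Apply migrate_source to every matching file under root.

    With dry_run=True nothing is written; the result only reports what would change.
    """
    changed = {}
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            original = path.read_text(encoding="utf-8")
            updated, count = migrate_source(original)
            if count:
                changed[path] = count
                if not dry_run:
                    path.write_text(updated, encoding="utf-8")
    return changed
```

Running `migrate_tree(Path("."))` first in dry-run mode gives a per-file count of pending changes, which can double as a rough measure of how much manual work the script removes.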
Challenge 3: Competing priorities
What’s the balance between having platform teams working to reduce tech debt and adopt new technologies versus holding squads accountable for the quality of the code they own?
When:
- Stakeholders are involved in high-priority projects and have little time to adopt your solutions.
- Stakeholders think they are slowed down by platform migrations and are not motivated to adopt new technologies.
- Projects lose focus and the teams driving the migration feel unmotivated, possibly with some members leaving.
- Code changes are necessary and in conflict with the direction of the migration.
Don’t:
- Believe that stakeholders understand the importance and impact of your migration and will prioritize it. They have plenty of other work to do as well.
- Give in. “We are busy working on a higher-priority bet” is a common argument to not execute on platform migrations.
- Consider the metrics and goals to be set in stone and difficult to change.
Do:
- Motivate. Showcase the positive impact of your migration to stakeholders and encourage them to get the work done.
- Continuously evaluate. Have regular checkpoints during the quarter to evaluate the speed of the migration to reach your quarterly, half-yearly, or yearly goals.
- Manage risk. If the migration is moving slowly, how can you tweak the approach to reach the goals? Can you streamline the process? Do you need more engineers? Can you hire contractors? Can you influence other tribes to bring your efforts to their backlog?
- Take the pain on-platform. When possible, make the required changes on behalf of the organization so they can focus on creating business value for our users.
- Monitor changes that go against migration KPIs and create channels with teams to support them.
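One way to monitor changes that go against migration KPIs is an automated check that compares a change against its base revision and flags files where usage of the old pattern has grown. The sketch below assumes the KPI is “number of references to a legacy symbol”; the symbol name `LegacyNavigator` and the function names are hypothetical.

```python
import re

# Hypothetical legacy symbol whose usage the migration is trying to drive to zero.
LEGACY_PATTERN = re.compile(r"\bLegacyNavigator\b")


def count_legacy_usages(files: dict[str, str]) -> dict[str, int]:
    """Map each file path to its number of legacy references (files with none are omitted)."""
    counts = {}
    for path, text in files.items():
        hits = len(LEGACY_PATTERN.findall(text))
        if hits:
            counts[path] = hits
    return counts


def find_regressions(base: dict[str, str], head: dict[str, str]) -> dict[str, tuple[int, int]]:
    """Return {path: (count_before, count_after)} for files where legacy usage increased."""
    base_counts = count_legacy_usages(base)
    head_counts = count_legacy_usages(head)
    return {
        path: (base_counts.get(path, 0), after)
        for path, after in head_counts.items()
        if after > base_counts.get(path, 0)
    }
```

A non-empty result could post a comment pointing the author to the migration’s support channel rather than blocking the change outright, which keeps the check in line with supporting teams instead of policing them.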
Challenge 4: Being accountable
How do we hold teams accountable and keep them engaged in the adoption of new technologies that require large infrastructure changes and refactoring?
When:
- There’s a lack of accountability in the adoption of new technologies, hence migrations are taking longer.
- It’s hard to predict when the migration will end.
Don’t:
- Expect there to be internal/external alignment when driving large changes over time.
- Think that it’s hard to measure the progress and impact of infrastructure changes.
Do:
- Delineate the definition of done. Use data and trend graphs to give projections. It’s important to understand where you are starting from and how fast you are progressing so that you can estimate how fast you will move in the future and whether you need to make any adjustments to speed up the migration.
- Present your progress. Take control of the definition of success and communicate often to keep your audience engaged and on board with your changes.
- Use dashboards. Metrics and dashboards will communicate the progress and impact, as well as help prioritize your work at scale over time.
- Maintain a timeline. Keep your roadmap up to date. Over time, your team and stakeholders might change, and they’ll need to be aware of your migration and timeline. A roadmap also increases transparency, enables feedback and collaboration, and helps identify any roadblocks.
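The “use data and trend graphs to give projections” advice can be sketched as a simple linear extrapolation. This is only an assumption about how such a projection might be computed: it fits a least-squares line through dated measurements of percent migrated and estimates when the line reaches the target. The function name and the sample data shape are hypothetical, and real migrations rarely progress linearly, so the result is a rough signal rather than a forecast.

```python
import math
from datetime import date, timedelta


def project_completion(samples, target=100.0):
    """Estimate the date on which the migration reaches `target` percent.

    samples: list of (date, percent_migrated) pairs, oldest first.
    Returns None when there is too little data or progress is flat or negative.
    """
    if len(samples) < 2:
        return None
    origin = samples[0][0]
    xs = [(day - origin).days for day, _ in samples]
    ys = [percent for _, percent in samples]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return None  # all samples taken on the same day
    # Ordinary least-squares slope (percent per day) and intercept.
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    if slope <= 0:
        return None  # no forward progress, so no meaningful projection
    intercept = mean_y - slope * mean_x
    days_to_target = (target - intercept) / slope
    return origin + timedelta(days=math.ceil(days_to_target))
```

Recomputing the projection at each checkpoint and comparing it with the roadmap date shows early whether the approach needs adjusting.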
Migrations of this nature and magnitude might become the new normal; otherwise, it might be impossible for the company to make certain changes. New technologies will come, and we will do the necessary migrations, though we should minimize the disruption for teams. Some platform products will inherently require migrations, potentially at a large scale, and we should consider them part of the development lifecycle, together with testing and design.
We’ve learned a great deal from this effort and hope that this knowledge is useful for other teams driving large migrations. If you want to learn more about our challenges and how we solved them, do not hesitate to reach out to us.
Special thanks to Marvin, Foundation, BoB, and Rubik for driving this effort.