Our Multi-Agent Architecture for Smarter Advertising

Introduction
When we kicked this off, we weren’t trying to ship an “AI feature.” We were trying to fix a structural problem in how our ads business actually runs in software.
On the business side, we have multiple ways of buying—Direct, Self‑Serve, Programmatic—all sitting on top of a mostly consolidated backend. The infrastructure is shared; the behavior isn’t. Each buying channel has its own workflows, its own decision logic, and its own flavor of “what good looks like.” On the engineering side, that shows up less as “different stacks” and more as “different brains” wired into the same body:
One set of services and data powering multiple buying experiences
Channel‑specific flows that encode slightly different rules and heuristics
Surface‑specific automation (Spotify Ads Manager, Salesforce, Slack, internal tools) solving overlapping problems in slightly different ways
A steady stream of “small workflow tweaks” that are all variants of the same planning / optimization problem, but need to be implemented and maintained in multiple places
So even though we’ve done the work to consolidate services, we still end up with fragmented behavior at the workflow layer. The same core decisions—how to allocate budget, how to choose inventory, how to balance reach vs efficiency vs STR—get re‑implemented per channel and per surface. Over time, they drift.
The standard playbook here would be familiar: design a new service, define the “right” state machine for planning and managing campaigns, add some REST endpoints, plug it into the UIs, and call it done.
The problem is that this doesn’t really fit the shape of the work anymore:
Workflows are combinatorial. Planning, forecasting, audience selection, creative guidance, pacing, and optimization all depend on who the user is, what inventory is available, business priorities, and advertiser goals. You can’t capture that in a couple of hard‑coded “happy paths” per channel and expect it to hold up as things change.
The same decisions need to show up everywhere. If we decide on a better way to allocate budget or prioritize inventory, that should consistently show up in Spotify Ads Manager recommendations, Salesforce plans, and Slack workflows. Re‑implementing the same decision logic three different times is an easy way to create tech debt and inconsistent behavior.
We’re missing an intent layer. Our systems are good at doing things (create a line item, run a forecast, fetch insights). They’re not good at taking a goal like “maximize reach in Brazil, protect video inventory, and still hit STR” and turning that into a sequence of tool calls, tradeoffs, and checks that look the same across channels.
So the core problem wasn’t “we need a new backend.” The problem was:
We don’t have a unified, programmable decision layer that can understand goals, reason over shared signals, and orchestrate our existing Ads APIs on behalf of users—consistently across buying channels and surfaces.
We also knew we didn’t want to swing to the other extreme and build a giant rules engine. Our ads logic is messy, probabilistic, and constantly changing. Forecasting, optimization, and insights already lean heavily on ML. Freezing all of that into a static decision tree would be brittle almost immediately and painful to maintain.
That’s the gap where we decided to bet on an agentic approach.
Instead of:
Hard‑wiring more deterministic workflows per channel
Burying orchestration logic inside each individual service or surface
Duplicating “smart” behavior in Spotify Ads Manager, Salesforce, Slack, and whatever comes next
We treat campaign planning and management as a set of modular agents that:
Consume the same underlying signals (inventory, audiences, STR, quality/risk, performance history)
Optimize jointly for advertiser goals and Spotify’s business constraints
Use our existing Ads services as tools instead of re‑implementing capabilities from scratch
For our org, this is new territory. Most of our AI work so far has looked like “put a model behind an endpoint” or “add a prompt‑based helper in the UI.” Here, we’re talking about:
A long‑running orchestration layer that delegates work to specialized agents
Agents with shared context and shared evaluation logic
A single agentic platform that can power all buying channels and surfaces off the same decision engine, instead of a patchwork of overlapping workflows
That’s a different mental model than “one more backend service with some workflows.” It forces us to think in terms of:
APIs designed as tools for agents, not just CRUD
Testing as behavioral evaluation, not only unit + integration tests
Observability as “what did the agent decide and why?” not just p95s and error budgets
Safety as guardrails on semi‑autonomous decisions, not just input validation
We’re taking this bet because the alternative is pretty clear: keep scaling complexity, coordination cost, and duplicated logic every time we improve planning or optimization in a new place. An agentic platform lets us centralize decision‑making once and project it everywhere—on top of the consolidated backend we already have—so the workflows can finally converge even as the products evolve.
From there, the question became: where do we prove this out first?
We chose Media Planning as the initial use case because it’s where all of this complexity shows up at once. It’s the point where sales, advertisers, inventory, pacing, and ad products collide. It’s also early enough in the lifecycle that if we get the decisions right here, everything downstream (booking, trafficking, delivery, optimization) benefits.
In the next section, we’ll go deeper on how we turned Media Planning into an agentic workflow: how we decomposed the planner’s job into tools and capabilities, how agents reason over constraints, and how we wired this into existing systems without rewriting the world.
Ads AI is our AI-powered advertising platform that leverages Google's Agent Development Kit (ADK) and Vertex AI to transform how advertisers create media plans at Spotify. By decomposing the complex media planning workflow into specialized AI agents that work in parallel, we've built a system that can understand natural language campaign requirements and generate optimized, data-driven media plans in seconds.
Goal: Build an intelligent, conversational interface that enables advertisers to generate optimized media plans through natural language interaction, backed by historical performance data.
Key Takeaway: A multi-agent architecture with parallel execution can dramatically simplify complex domain problems while improving both developer experience and system performance.
The Challenge
Media planning for advertising campaigns involves several interconnected decisions:
Goal Definition: What does the advertiser want to achieve? (brand awareness, website traffic, app installs)
Audience Targeting: Who should see the ads? (demographics, interests, geography)
Budget Allocation: How should the budget be distributed across ad sets?
Schedule Planning: When should the campaign run?
Format Selection: Which ad formats (audio, video, display) perform best?
Our previous approach required advertisers to manually configure each of these dimensions, often without insight into what historically performs well for similar campaigns.
Pain points:
Complex UI flows: Multiple screens and forms to fill out
No optimization guidance: Advertisers had to guess at optimal configurations
Slow iteration: Testing different approaches required starting over
Knowledge gap: Historical performance data wasn't easily accessible
The Solution: Approach and Implementation
Design and Architecture
We chose a multi-agent architecture where specialized AI agents handle distinct aspects of media planning. This approach offers several advantages:
Separation of concerns: Each agent has a focused responsibility and optimized prompts
Parallel execution: Independent agents can run simultaneously
Testability: Individual agents can be tested and improved in isolation
Flexibility: New capabilities can be added as new agents
| Component | Technology | Purpose |
|---|---|---|
| Agent Framework | Google ADK 0.2.0 | Provides agent orchestration, session management, and tool integration |
| LLM Backend | Vertex AI (Gemini 2.5 Pro) | Powers natural language understanding and generation |
| API Layer | gRPC | High-performance, strongly-typed service communication |
| Session Storage | Google Cloud | Persistent session state across conversations |
| Performance Data | PostgreSQL + In-Memory Cache | Historical campaign performance for optimization |
| Service Framework | Apollo (Spotify) | Service lifecycle, configuration, and observability |
Implementation Details
Agent Breakdown
1. RouterAgent - The Traffic Controller
The RouterAgent analyzes each incoming user message and determines which planning inputs (goal, audience, budget, schedule) are already present. This fast routing step prevents unnecessary LLM calls and enables conditional agent execution.
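As a sketch of the idea (the function and field names here are illustrative, not our production code), a router can do a cheap pre-pass over the message and dispatch only the resolution agents whose inputs are still missing:

```python
import re
from dataclasses import dataclass


@dataclass
class RoutingDecision:
    """Which resolution agents still need to run for this message."""
    needs_goal: bool = True
    needs_audience: bool = True
    needs_budget: bool = True
    needs_schedule: bool = True


def route(message: str) -> RoutingDecision:
    """Naive keyword pre-pass, for illustration only: mark which planning
    inputs the message appears to contain so only the relevant agents
    (and their LLM calls) are invoked downstream."""
    decision = RoutingDecision()
    text = message.lower()
    # A currency symbol followed by digits, or "5k"-style amounts.
    if re.search(r"[$€£]\s?\d|\b\d+\s?k\b", text):
        decision.needs_budget = False
    if re.search(r"\b(awareness|traffic|installs|reach|clicks)\b", text):
        decision.needs_goal = False
    if re.search(r"\b(month|week|days|q[1-4])\b", text):
        decision.needs_schedule = False
    if re.search(r"\b(age|gender|interest|interests|audience)\b", text):
        decision.needs_audience = False
    return decision
```

In the real system this decision gates which agents run in parallel; the sketch only shows the shape of the output.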
2. Specialized Resolution Agents
Each resolution agent has a focused responsibility:
GoalResolverAgent: Maps user intent to campaign objectives (REACH, CLICKS, APP_INSTALLS, etc.) and searches for appropriate ad categories
AudienceResolverAgent: Extracts targeting criteria including interests (from a predefined taxonomy), geographic targets, age ranges, and gender
BudgetAgent: Parses various budget formats ($5000, 5k, €10,000) and converts to micro-units
ScheduleAgent: Handles date parsing including relative dates ("next month", "30 days")
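A minimal sketch of the kind of normalization the BudgetAgent performs, assuming the common ads-API convention of 1,000,000 micro-units per currency unit (the helper name is hypothetical):

```python
import re

MICROS_PER_UNIT = 1_000_000  # common micro-unit convention in ads APIs


def parse_budget_micros(raw: str) -> int:
    """Parse budget strings like '$5000', '5k', or '€10,000' into micro-units.

    Illustrative only: a real implementation would also resolve the
    currency code rather than just stripping the symbol."""
    text = raw.strip().lower().replace(",", "").lstrip("$€£")
    match = re.fullmatch(r"(\d+(?:\.\d+)?)\s*(k|m)?", text)
    if not match:
        raise ValueError(f"unrecognized budget format: {raw!r}")
    value = float(match.group(1))
    multiplier = {"k": 1_000, "m": 1_000_000}.get(match.group(2), 1)
    return int(value * multiplier * MICROS_PER_UNIT)
```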
3. MediaPlannerAgent - The Optimizer
The MediaPlannerAgent is where the magic happens. It takes all resolved information and generates optimized ad set recommendations using a heuristics-based engine backed by historical performance data.
Key Optimization Rules:
Cost optimization: Minimize cost metrics (CPM, CPC, CPI) relative to historical medians
Delivery rate optimization: Target campaigns with delivery rates close to 100%
Budget matching: Find historically successful campaigns with similar budget ranges
Duration matching: Match campaign durations to proven performers
Targeting matching: Score based on demographic and interest overlap
Unique format/goal combinations: Ensure diversity in recommendations
Budget-based scaling: Automatically adjust number of recommendations:
€0-1,000: 1 recommendation
€1,000-5,000: 2 recommendations
€5,000-15,000: 3 recommendations
€15,000+: 4-5 recommendations
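The tiers above can be sketched as a simple mapping (the €50,000 cutoff for returning 5 rather than 4 is an assumption for illustration; the rule above only says “4-5”):

```python
def recommendation_count(budget_eur: float) -> int:
    """Map total campaign budget (EUR) to the number of ad set
    recommendations to generate, per the tiers described above."""
    if budget_eur < 1_000:
        return 1
    if budget_eur < 5_000:
        return 2
    if budget_eur < 15_000:
        return 3
    # The source rule says 4-5 here; the €50k split is an illustrative choice.
    return 4 if budget_eur < 50_000 else 5
```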
Tool Integration with Function Calling
We leverage Google ADK's FunctionTool to give agents access to real data. Schema annotations on each tool describe what it does and how its parameters are typed, so the LLM knows when and how to call it.
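In the Python flavor of ADK, a plain function with type hints and a docstring can be wrapped as a tool, with the framework deriving the parameter schema from the signature (the Java ADK expresses the same information with @Schema annotations). The function below is a hypothetical stand-in for one of our geo-target lookups:

```python
def search_geo_targets(query: str, limit: int = 5) -> list:
    """Search available geographic targets by name.

    Args:
        query: Free-text location name, e.g. "Brazil".
        limit: Maximum number of matches to return.

    Returns:
        Matching geo targets with their canonical IDs.
    """
    # Hypothetical in-memory catalog standing in for a real geo service call.
    catalog = [
        {"id": "BR", "name": "Brazil"},
        {"id": "BR-SP", "name": "Sao Paulo, Brazil"},
        {"id": "DE", "name": "Germany"},
    ]
    q = query.lower()
    return [g for g in catalog if q in g["name"].lower()][:limit]
```

Registering it is then roughly `FunctionTool(search_geo_targets)`; the exact wrapper API varies by ADK version, but the key point is that the docstring and type hints above are what the model sees.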
Prompt Engineering for Consistent Output
Getting LLMs to produce consistent, parseable output is challenging. We developed strict prompt guardrails.
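One guardrail pattern, sketched with an illustrative schema and helper (not our exact implementation): require the model to emit exactly one JSON object, then validate it strictly at the parsing layer so malformed output triggers a retry instead of flowing downstream.

```python
import json

# Example schema for illustration; the real agents have their own contracts.
REQUIRED_KEYS = {"goal", "budget_micros", "start_date", "end_date"}


def parse_agent_output(raw: str) -> dict:
    """Strictly parse an LLM response that is required to be one JSON object.

    Tolerates a markdown code fence the model may add despite instructions,
    and rejects payloads with missing keys so the caller can retry."""
    text = raw.strip()
    if text.startswith("```"):
        # Strip ```json ... ``` fencing, keeping only the body.
        text = text.strip("`")
        text = text.split("\n", 1)[1] if "\n" in text else text
    payload = json.loads(text)
    missing = REQUIRED_KEYS - payload.keys()
    if missing:
        raise ValueError(f"agent output missing keys: {sorted(missing)}")
    return payload
```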
Trade-offs considered
Single vs. multi-agent: A single agent could handle everything, but it would need a massive prompt and couldn't parallelize. Multi-agent adds complexity but improves latency and maintainability.
In-memory vs. database cache: We chose an in-memory cache for historical data to minimize latency. The tradeoff is memory usage, but campaign performance data is bounded and refreshed periodically.
Synchronous vs. streaming: We opted for synchronous responses initially for simplicity. Streaming would provide better UX for longer operations.
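The periodic-refresh pattern behind the in-memory cache can be sketched as follows (names are illustrative; the real loader would query PostgreSQL):

```python
import time


class RefreshingCache:
    """In-memory snapshot of historical performance data, reloaded
    from the backing store once the TTL expires (sketch; not thread-safe)."""

    def __init__(self, loader, ttl_seconds: float = 3600.0):
        self._loader = loader          # e.g. a function querying the database
        self._ttl = ttl_seconds
        self._data = None
        self._loaded_at = 0.0

    def get(self):
        """Return cached data, reloading it if stale or never loaded."""
        if self._data is None or time.monotonic() - self._loaded_at > self._ttl:
            self._data = self._loader()
            self._loaded_at = time.monotonic()
        return self._data
```

This trades memory for latency, which works here because the performance dataset is bounded in size.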
Results and Impact
Performance Metrics
| Metric | Manual | Agentic |
|---|---|---|
| Media Plan Creation Time | 15-30 minutes | 5-10 seconds |
| Required User Inputs | 20+ form fields | 1-3 natural language messages |
| Optimization Data Used | None (human intuition) | Historical performance from thousands of campaigns |
| Agent Response Latency | N/A | ~3-5s with parallel execution |
Overall Impact
Reduced cognitive load: Advertisers describe campaigns in natural language
Data-driven decisions: Every recommendation backed by historical performance
Faster iteration: Advertisers can refine by continuing the conversation
Democratized expertise: Optimization knowledge embedded in the system
Lessons Learned and Future Work
Key Learnings
Key learning 1: prompt engineering is software engineering
Treating prompts as code — with version control, testing, and iteration — was essential. Small changes in prompt wording can dramatically affect output consistency. We learned to:
Be explicit about output format requirements
Provide concrete examples in prompts
Build guardrails at both prompt and parsing layers
Key learning 2: agent boundaries matter
Drawing the right boundaries between agents is crucial. Too many agents increases latency and coordination overhead. Too few creates monolithic, hard-to-maintain prompts. Our rule of thumb: one agent per distinct skill or data source.
Key learning 3: tools enable grounding
LLMs are powerful but can hallucinate. By providing agents with tools that access real data (geo targets, ad categories, historical performance), we ground their outputs in reality. The LLM reasons about what to do; tools provide accurate data to work with.
Future Work
Streaming responses: Implement server-sent events for real-time feedback as agents process
Multi-turn refinement: Better support for iteratively refining a generated plan over the course of a conversation
A/B testing integration: Automatically test AI-recommended plans against baselines
Expanded agent capabilities: Creative suggestions, competitive analysis, cross-campaign optimization
Fine-tuned models: Domain-specific model fine-tuning for advertising terminology
Conclusion
Building Ads AI taught us that complex, multi-step workflows are well-suited to multi-agent architectures. By decomposing the media planning problem into specialized agents — each with focused prompts, relevant tools, and clear responsibilities — we created a system that's both powerful and maintainable.
The combination of Google's ADK for agent orchestration, Vertex AI for LLM capabilities, and our historical performance data creates a system that doesn't just understand what advertisers want — it knows what actually works.
We're excited to continue evolving Ads AI and bringing AI-powered optimization to more advertising workflows at Spotify.



