Data

84 articles

October 23, 2017
Big Data Processing at Spotify: The Road to Scio (Part 2)

In this part we’ll take a closer look at Scio, including basic concepts, its unique features, and concrete use cases here at Spotify [...]
Published by Neville Li
October 17, 2017
TC4D: Data Quality By Engineers, For Engineers

Changing an engineering culture is one of the biggest challenges for any organization. It requires challenging an existing way of working, and introducing compelling improvements [...]
Published by Nitzan Blouin
October 16, 2017
Big Data Processing at Spotify: The Road to Scio (Part 1)

This is the first part of a two-part blog series. In this series we will talk about Scio, a Scala API for Apache Beam and Google Cloud Dataflow, and [...]
Published by Neville Li
June 7, 2017
Meet our engineers – Charlie Pastuszenski

What’s your name and where are you from? My name is Charlie and I come from the US and grew [...]
Published by Spotify Engineering
April 26, 2017
Reliable export of Cloud Pub/Sub streams to Cloud Storage

Every day, Spotify users are generating more than 100 billion events. Every event is being generated as a response to [...]
Published by Igor Maravić
March 31, 2017
Spotify’s Love/Hate Relationship with DNS

Foreword: This blog post accompanies our presentation given at SRECon 2017 in San Francisco. The recording of the talk can be viewed here, [...]
Published by Lynn Root
August 7, 2016
Commoditizing Music Machine Learning: Services

Five years ago, music personalization at Spotify was a tiny team. The team read papers, developed models, wrote data pipelines [...]
Published by Spotify Engineering
March 25, 2016
Managing Machines at Spotify

Introduction: When you log into Spotify, browse through your Discover Weekly playlist, and play a track, you’re interacting with some of our [...]
Published by Nic Cope
March 10, 2016
Spotify’s Event Delivery – The Road to the Cloud (Part III)

Whenever a user performs an action in the Spotify client—such as listening to a song or searching for an artist—a [...]
Published by Igor Maravić
March 3, 2016
Spotify’s Event Delivery – The Road to the Cloud (Part II)

Whenever a user performs an action in the Spotify client—such as listening to a song or searching for an artist—a [...]
Published by Igor Maravić
February 25, 2016
Spotify’s Event Delivery – The Road to the Cloud (Part I)

Whenever a user performs an action in the Spotify client—such as listening to a song or searching for an artist—a [...]
Published by Igor Maravić
January 27, 2016
SDN Internet Router – Part 2

Introduction: In the previous post we talked about how the Internet finds its way to reach content and users; how Internet relations [...]
Published by dbarrosop
January 26, 2016
SDN Internet Router – Part 1

Introduction: This is the first part of a series of posts about a project we have been working with for [...]
Published by dbarrosop
December 9, 2015
ELS: a latency-based load balancer, part 2

What to Measure? In part 1, we already mentioned a few metrics that should be considered by the load balancer. Success [...]
Published by Lukáš Poláček
December 8, 2015
ELS: latency based load balancer, part 1

Load Balancing: Most Spotify clients connect to our back-end via accesspoint, which forwards client requests to other servers. In the picture below, the accesspoint has [...]
Published by Lukáš Poláček
November 17, 2015
Monitoring at Spotify: Introducing Heroic

This is the second part in a series about Monitoring at Spotify. In the previous post I discussed our history of operational [...]
Published by John-John Tedro
November 16, 2015
Monitoring at Spotify: The Story So Far

This is the first in a two-part series about Monitoring at Spotify. In this, I’ll be discussing our history, the [...]
Published by John-John Tedro
September 21, 2015
Cassandra: Data-Driven Configuration

Spotify currently runs over 100 production-level Cassandra clusters. We use Cassandra across user-facing features, in our internal monitoring and analytics [...]
Published by Noel Cody
August 27, 2015
Underflow bug

All of us are familiar with overflow bugs. However, sometimes you write code that counts on overflow. This is a [...]
Published by Lukáš Poláček
June 23, 2015
Switching user database on a running system

Introduction: All Spotify users are now stored in a Cassandra database instead of Postgres. The final switch was made on [...]
Published by Marcus Vesterlund
March 9, 2015
Understanding the Spotify Web API

Six months ago, when we launched our Web API, we provided twelve endpoints through which developers could retrieve Spotify catalog [...]
Published by Chris Hughes
January 9, 2015
Personalization at Spotify using Cassandra

At Spotify we have over 60 million active users who have access to a vast music catalog of over 30 million [...]
Published by Kinshuk Mishra and Matt Brown
January 5, 2015
How Spotify Scales Apache Storm

Spotify has built several real-time pipelines using Apache Storm for use cases like ad targeting, music recommendation, and data visualization. Each of these [...]
Published by Kinshuk Mishra
December 19, 2014
Solving MapReduce Performance Problems With Sharded Joins

Sometimes the answer to a sluggish data pipeline isn’t more power in the Hadoop cluster, but a shift in technique. [...]
Published by Noel Cody
