Big Data Processing at Spotify: The Road to Scio (Part 1)
This is the first part of a two-part blog series. In this series we will talk about Scio, a Scala API for Apache Beam and Google Cloud Dataflow, and [...]
Published by Neville Li

What’s your name and where are you from? My name is Charlie and I come from the US and grew [...]
Published by Spotify Engineering

Every day, Spotify users are generating more than 100 billion events. Every event is being generated as a response to [...]
Published by Igor Maravić

Foreword: This blog post accompanies our presentation given at SRECon 2017 in San Francisco. The recording of the talk can be viewed here, [...]
Published by Lynn Root

Five years ago, music personalization at Spotify was a tiny team. The team read papers, developed models, wrote data pipelines [...]
Published by Spotify Engineering

Introduction: When you log into Spotify, browse through your Discover Weekly playlist, and play a track, you’re interacting with some of our [...]
Published by Nic Cope

Whenever a user performs an action in the Spotify client, such as listening to a song or searching for an artist, a [...]
Published by Igor Maravić

Whenever a user performs an action in the Spotify client, such as listening to a song or searching for an artist, a [...]
Published by Igor Maravić

Whenever a user performs an action in the Spotify client, such as listening to a song or searching for an artist, a [...]
Published by Igor Maravić

Introduction: In the previous post we talked about how the Internet finds its way to reach content and users; how Internet relations [...]
Published by dbarrosop

Introduction: This is the first part of a series of posts about a project we have been working with for [...]
Published by dbarrosop

What to Measure? In part 1, we already mentioned a few metrics that should be considered by the load balancer. Success [...]
Published by Lukáš Poláček

Load Balancing: Most Spotify clients connect to our back-end via accesspoint which forwards client requests to other servers. In the picture below, the accesspoint has [...]
Published by Lukáš Poláček

This is the second part in a series about Monitoring at Spotify. In the previous post I discussed our history of operational [...]
Published by John-John Tedro

This is the first in a two-part series about Monitoring at Spotify. In this, I’ll be discussing our history, the [...]
Published by John-John Tedro

Spotify currently runs over 100 production-level Cassandra clusters. We use Cassandra across user-facing features, in our internal monitoring and analytics [...]
Published by Noel Cody

All of us are familiar with overflow bugs. However, sometimes you write code that counts on overflow. This is a [...]
Published by Lukáš Poláček

Introduction: All Spotify users are now stored in a Cassandra database instead of Postgres. The final switch was made on [...]
Published by Marcus Vesterlund

Six months ago, when we launched our Web API, we provided twelve endpoints through which developers could retrieve Spotify catalog [...]
Published by Chris Hughes

At Spotify we have over 60 million active users who have access to a vast music catalog of over 30 million [...]
Published by Kinshuk Mishra and Matt Brown

Spotify has built several real-time pipelines using Apache Storm for use cases like ad targeting, music recommendation, and data visualization. Each of these [...]
Published by Kinshuk Mishra

Sometimes the answer to a sluggish data pipeline isn’t more power in the Hadoop cluster, but a shift in technique. [...]
Published by Noel Cody

For my master’s thesis, I developed and benchmarked an Apache Cassandra compaction strategy optimized for time series. The result, the [...]
Published by Björn Hegerfors

All of our lovely Spotify users generate many terabytes of data every day. All the songs that are listened to, [...]
Published by davidawhiting