Tag archive: hadoop
Dec 19, 2014
Solving MapReduce Performance Problems With Sharded Joins
Sometimes the answer to a sluggish data pipeline isn’t more power in the Hadoop cluster, but a shift in...
Nov 27, 2014
Data Processing with Apache Crunch at Spotify
All of our lovely Spotify users generate many terabytes of data every day. All the songs that are listened...
May 7, 2013
Snakebite: a pure Python HDFS client
As we all know, Hadoop is great, and here at Spotify we are big fans of it. We use it to process data for a...