Tag archive: hadoop

Dec 19, 2014

Solving MapReduce Performance Problems With Sharded Joins

Sometimes the answer to a sluggish data pipeline isn’t more power in the Hadoop cluster, but a shift in...
Data, Infrastructure
Nov 27, 2014

Data Processing with Apache Crunch at Spotify

All of our lovely Spotify users generate many terabytes of data every day. All the songs that are listened...
Data, Developer Tools, Infrastructure
May 7, 2013

Snakebite: a pure Python HDFS client

As we all know, Hadoop is great and here at Spotify we are big fans of it. We use it to process data for a...
Data, Open Source
hadoop Archives | Spotify Engineering