Spotify Live: From Live to Recording

April 28, 2022 Published by Robert Segura, Senior Engineer

Spotify Live (formerly Spotify Greenroom) is a platform that democratizes live audio streams. Creators can open rooms and stream live directly to fans who join, and they can interact with those fans either through the in-app text chat or by bringing them up as speakers. But what about when the room ends? A creator could have had an incredible conversation they want to preserve or redistribute, yet once the room ends, the audio is lost. The most common feedback we received from creators was that they wanted to natively record their live room chats and distribute them to their audience after the room has ended. Knowing this, we got to work.

Spotify Live appeals to creators by letting them record a stream and rerelease it as a podcast for fans who could not make the live event. This is a low-friction way to produce high-quality conversational recordings. Without this feature, creators of conversational content would either need to record on their own device, compromising the audio quality of the other speakers, or have every speaker record their own audio and then manually sync and mix the tracks. In this post, we'll walk through how a recording starts, how it's processed, and how it's delivered to the room creator.

Room creation and starting the recording

Once the host creates the room, they simply select the option to record and provide an email address where we can send a copy of the recording. Then, voilà! The recording steps below are initiated (see Figure 1):

  1. The room creation call is received by our room microservice, which uses our message broker to stream a room creation event via Google Pub/Sub to our ecosystem of microservices.
  2. One of the services listening for this message is our Recording Service, a microservice that receives an event when a room is created and when it ends (a minimal subscriber sketch follows this list).
  3. The Recording Service connects to our cloud recording API to record our rooms. When a room is created, the Recording Service stores the information about the room and sends a request to the cloud recording API to begin recording the room's audio:
requestBody, err := json.Marshal(&CloudRequestBody{
  Cname: RoomId,
  Uid:   recorderUid,
  ClientRequest: CloudRecordStartClientRequest{
    Token: token,
    RecordingConfig: &CloudRecordingConfig{
      ChannelType: cloudChannelType,
      StreamTypes: cloudStreamTypes,
    },
    StorageConfig: &CloudStorageConfig{
      Vendor:         service.StorageConfig.Vendor,
      Region:         service.StorageConfig.Region,
      Bucket:         service.StorageConfig.Bucket,
      AccessKey:      service.StorageConfig.AccessKey,
      SecretKey:      service.StorageConfig.SecretKey,
      FileNamePrefix: append(service.StorageConfig.FileNamePrefix, RoomDir),
    },
  },
})
if err != nil {
  return
}

// Ask the cloud recording API to start recording the room in mixed mode.
recordStartUrl := fmt.Sprintf("%s/resourceid/%s/mode/mix/start", service.BaseUrl, resourceId)

recordStartReq, err := http.NewRequest("POST", recordStartUrl, bytes.NewBuffer(requestBody))
if err != nil {
  return
}

service.addHeaders(recordStartReq)

resp, err := client.Do(recordStartReq)
if err != nil {
  return
}
defer resp.Body.Close()

body, err := ioutil.ReadAll(resp.Body)
if err != nil {
  return
}

// The response contains the ID we use to track (and later stop) this recording.
var response *StartRecordResponse
err = json.Unmarshal(body, &response)
if err != nil {
  return
}
recordId = response.RecordId

return
  4. The cloud recording API acts as a listener, entering the room while remaining invisible to the actual users in the room. This listener streams the raw audio into a storage bucket as .ts files, each corresponding to an interval of audio time. Along with the .ts files, we also get an .m3u8 file that acts as a playlist of sorts, telling us the order in which the .ts files should be played.
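To make steps 1 and 2 a bit more concrete, here is a minimal sketch of what the subscription side of the Recording Service could look like with the Go Pub/Sub client. The project ID, subscription name, and RoomEvent payload below are illustrative assumptions, not our actual schema.

package main

import (
  "context"
  "encoding/json"
  "log"

  "cloud.google.com/go/pubsub"
)

// RoomEvent is an illustrative payload for room lifecycle events.
type RoomEvent struct {
  RoomId string `json:"roomId"`
  Type   string `json:"type"` // e.g. "room_created" or "room_ended"
}

func main() {
  ctx := context.Background()

  // Hypothetical project and subscription names.
  client, err := pubsub.NewClient(ctx, "live-audio-project")
  if err != nil {
    log.Fatalf("pubsub client: %v", err)
  }
  defer client.Close()

  sub := client.Subscription("recording-service-room-events")

  // Receive blocks and invokes the callback once per delivered message.
  err = sub.Receive(ctx, func(ctx context.Context, msg *pubsub.Message) {
    var event RoomEvent
    if err := json.Unmarshal(msg.Data, &event); err != nil {
      log.Printf("bad room event: %v", err)
      msg.Nack()
      return
    }

    switch event.Type {
    case "room_created":
      // Store the room info and ask the cloud recording API to start
      // recording (see the request-building snippet above).
      log.Printf("start recording for room %s", event.RoomId)
    case "room_ended":
      // Stop the recording and kick off post-processing.
      log.Printf("stop recording for room %s", event.RoomId)
    }
    msg.Ack()
  })
  if err != nil {
    log.Fatalf("receive: %v", err)
  }
}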

While the room is running, our listener should be silently recording in the background. But as we have seen in the past, many things can go wrong. Recordings can be corrupted due to intermittent connection issues to the host. Recordings might not have the audio of the full room. We have even seen situations where a recording contains the audio of only one speaker. We discovered that these errors would often occur due to issues in the real-time audio layer. In order to solve this, we implemented various callbacks from the real-time audio service into the Recording Service to handle these events. For example, if we see the real-time audio in a room is stopped and recreated, then we immediately restart the recording routine to make sure we don’t lose any of the recording.
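As a rough sketch of that recovery logic (the RTAudioEvent type, event name, and the stopRecording/startRecording helpers are assumptions for illustration, not our actual interfaces), a handler in the Recording Service can look something like this:

// RTAudioEvent is an illustrative callback payload from the real-time audio layer.
type RTAudioEvent struct {
  RoomId string
  Type   string // e.g. "channel_recreated"
}

// HandleRTAudioEvent restarts the recording routine whenever the real-time
// audio layer for a room is torn down and recreated, so no audio is lost.
func (service *RecordingService) HandleRTAudioEvent(ctx context.Context, event RTAudioEvent) {
  switch event.Type {
  case "channel_recreated":
    // Stop the current recording session (if any) and start a fresh one.
    if err := service.stopRecording(ctx, event.RoomId); err != nil {
      service.logger.LogError(ctx, "stopping recording: "+err.Error())
    }
    if err := service.startRecording(ctx, event.RoomId); err != nil {
      service.logger.LogError(ctx, "restarting recording: "+err.Error())
    }
  default:
    // Other callbacks are informational and don't affect the recording.
  }
}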

Room ending and delivery of the recording

Figure 2

Once the room is over, the next phase of the recording process begins, comprising the following six steps, as seen in Figure 2:

  1. Clients tell our API service to end the room, which sends a message through Pub/Sub that our room has ended.
  2. Our Recording Service receives the message that the room has ended, which starts the process of ending the recording.
  3. The cloud recording API is notified to remove the listener from the room and finalize any audio files in our storage bucket.
  4. Once the cloud recording API has successfully done so, the Recording Service downloads the .ts and .m3u8 files onto the service. We use the .m3u8 data to verify that we have obtained each of the required .ts files (see the sketch after this list).
  5. Then we use FFmpeg to stitch all the audio files together in the order described in the playlist file. We also encode our unique roomID into the metadata of the audio. This is also where we have the option to add an intro and/or outro to the recording:
encryptedMetadata, iv := encryptRoomID(service.metadataEncryptionKey, RoomId)

base64IV := base64.StdEncoding.EncodeToString(iv)
base64Metadata := base64.StdEncoding.EncodeToString(encryptedMetadata)

args := []string{
  "-i", inputFile,
  // Discard Corrupted packets
  "-fflags",
  "+discardcorrupt",
  // Strip off any silence at the beginning of the track
  "-af",
  "silenceremove=start_periods=1:start_duration=0.01:start_threshold=0.01",
  // Don't include video information
  "-vn",
  "-movflags",
  "+use_metadata_tags",
  "-metadata",
  fmt.Sprintf("comment=greenRoom:%s:%s", base64IV, base64Metadata),
  finalOutputFile,
}
ffmpegScript := exec.Command("ffmpeg", args...)

err := ffmpegScript.Run()
if err != nil {
  service.logger.LogError(ctx, "ffmpeg error stripping silence: "+err.Error())
  return
}
  6. The process then uploads this newly created .mp4 file back to our S3 storage. Afterwards, we produce a link to the recording and email it directly to the host.
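For the verification in step 4, the .m3u8 playlist is a plain-text file in which non-comment lines reference media segments in order. A minimal sketch of that check using only the standard library follows; the function name and directory layout are assumptions for illustration.

// verifySegments reads the .m3u8 playlist and confirms that every .ts segment
// it references was downloaded, returning the segment names in playlist order.
func verifySegments(playlistPath, segmentDir string) ([]string, error) {
  data, err := os.ReadFile(playlistPath)
  if err != nil {
    return nil, err
  }

  var segments []string
  for _, line := range strings.Split(string(data), "\n") {
    line = strings.TrimSpace(line)
    // Lines starting with '#' are playlist directives; everything else is a
    // media segment reference.
    if line == "" || strings.HasPrefix(line, "#") {
      continue
    }
    if _, err := os.Stat(filepath.Join(segmentDir, line)); err != nil {
      return nil, fmt.Errorf("missing segment %s: %w", line, err)
    }
    segments = append(segments, line)
  }
  return segments, nil
}

Only once every referenced segment is present does the pipeline move on to stitching and post-processing with FFmpeg.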

As the audio space grows, the ability to easily record live audio will become an increasingly important feature for content creators. Spotify Live puts creators first by offering an easy-to-use recording feature that produces high-quality audio from creators' live rooms. As Spotify Live grows, we will continue to listen to our creators and community about what they want from the recording process and how to improve the experience.

The live audio team is actively recruiting engineers for iOS and backend services, as well as engineering managers. If you are interested in working with live audio, whether for recordings or otherwise, check out our open roles!

