We will re-implement Scio* from scratch, step by step, in 15 self-contained, single-file mini frameworks. By doing so, we will learn about common patterns in Scala API design, distributed data processing frameworks, Scala-Java interop, and some under-the-hood optimizations. This talk will focus on code and have minimal slides.
*Scio is a Scala API for Apache Beam and Google Cloud Dataflow for unified batch and streaming data processing. It's used by 300+ developers within Spotify for 1500+ production batch and streaming data pipelines, and by many other companies worldwide.
Code for the talk: https://github.com/nevillelyh/scio-deep-dive
Scio: https://github.com/spotify/scio