Spark is a framework for parallel distributed data processing, which may sound like something difficult to work with even if you are interested in it. In reality, Spark is a data processing tool that runs on your local machine just as well as on a large-scale cluster. Scala users will also feel right at home, since Spark is open-source software written in Scala. You can therefore start quickly, running a data processing pipeline in Scala on your local machine, and keep building on it from there. Spark supports both processing of structured data, such as string and numeric types, and stream processing. This session will introduce how easily you can build a stream processing pipeline using Spark's new feature, Structured Streaming, which lets you write stream processing concisely as operations on structured data.
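To give a flavor of what "stream processing as structured data" looks like, here is a minimal sketch of the well-known streaming word count, assuming Spark 2.x with the `spark-sql` dependency on the classpath; the socket host and port are placeholder values:

```scala
import org.apache.spark.sql.SparkSession

object StructuredWordCount {
  def main(args: Array[String]): Unit = {
    // Runs on a local machine; swap the master URL to target a cluster
    val spark = SparkSession.builder
      .appName("StructuredWordCount")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Read lines from a socket as an unbounded DataFrame
    val lines = spark.readStream
      .format("socket")
      .option("host", "localhost") // placeholder host
      .option("port", 9999)        // placeholder port
      .load()

    // The same Dataset API as batch processing, applied to a stream
    val wordCounts = lines.as[String]
      .flatMap(_.split(" "))
      .groupBy("value")
      .count()

    // Print the running counts to the console as new data arrives
    val query = wordCounts.writeStream
      .outputMode("complete")
      .format("console")
      .start()
    query.awaitTermination()
  }
}
```

Note how the streaming query uses the ordinary DataFrame/Dataset operations (`flatMap`, `groupBy`, `count`); Structured Streaming treats the stream as an unbounded table, so batch-style code carries over almost unchanged.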