Since Apache Spark became one of the leading Apache community projects, demand for Scala skills has changed drastically at a global level. Because the entire Apache Spark framework is written in Scala from the ground up, it is a real pleasure to explore the beauty of Spark's functional Scala DSL.
This talk intends to present:
The usage of Spark's primary data structures (RDD, Dataset, DataFrame) in large-scale data processing with HBase (as a data lake) and Hive (as an analytical engine).
We will go through the importance of physical data split-up techniques such as coalesce, partition, and repartition when scaling to terabytes of data, using ~17 billion transactional records as a case study.
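As a minimal sketch of the coalesce/repartition distinction discussed here (assuming a running Spark environment; the `SparkSession`, partition counts, and input/output paths below are illustrative placeholders, not values from the talk):

```scala
import org.apache.spark.sql.SparkSession

// Assumes a Spark cluster or local Spark installation is available.
val spark = SparkSession.builder().appName("partition-sketch").getOrCreate()

// Hypothetical input path, for illustration only.
val df = spark.read.parquet("/data/transactions")

// repartition(n) performs a full shuffle and can increase or decrease the
// number of partitions -- useful to even out skew before a wide operation.
val rebalanced = df.repartition(400)

// coalesce(n) only merges existing partitions without a full shuffle, so it
// is cheaper, but it can only reduce the partition count -- typical before
// writing out a smaller number of files.
val compacted = rebalanced.coalesce(50)

compacted.write.mode("overwrite").parquet("/data/transactions_compacted")
```

The design trade-off is shuffle cost versus partition balance: `repartition` pays a full shuffle to rebalance data evenly, while `coalesce` avoids the shuffle at the risk of uneven partition sizes.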
We will also cover the crucial and very interesting aspects of parallel and concurrent distributed data processing: tuning memory, cache, disk I/O, memory leaks, internal shuffle, the Spark executor, the Spark driver, and more.
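The tuning knobs mentioned above (memory, shuffle, executors, driver) map to standard Spark configuration properties. A hedged sketch of how they might be set at submission time (the specific values and the jar name are illustrative assumptions, not recommendations from the talk):

```shell
# Illustrative spark-submit invocation; adjust values to your workload.
spark-submit \
  --master yarn \
  --driver-memory 8g \
  --executor-memory 16g \
  --num-executors 50 \
  --executor-cores 4 \
  --conf spark.sql.shuffle.partitions=400 \
  --conf spark.memory.fraction=0.6 \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  your-job.jar
```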
Talk length
40 minutes
Presentation language
English
Target audience
Beginner: no prior knowledge of the field required
Recommended for
Those who understand the basics of functional programming in Scala, and Java
Those who understand concurrent programming and multithreading in Java / Scala
Those with professional experience in, or a strong interest in, the Big Data and Fast Data space
Speaker
CHETANKUMAR KHATRI
(Lead - Data Science at Accion Labs)