The Web is the most comprehensive source of data available, but not all of that data is easily accessible. In this lightning talk, I will cover techniques for extracting web data from structured formats such as microdata, JSON-LD, and social media formats. I will also present the Scala libraries used to parse each of these structured data types. This talk has two aims:
To highlight the importance of the data collection phase by showcasing original techniques for extracting web data
To present the benefits of using Scala for such a sensitive task, namely guaranteeing the fault tolerance and reliability of the collection pipeline
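As a rough illustration of the kind of extraction the talk covers, the sketch below pulls raw JSON-LD payloads out of an HTML page using only the Scala standard library. It is not the approach or the libraries presented in the talk: the object name `JsonLdExtractor`, the method `extract`, and the regex-based matching are assumptions made for this example, and a real pipeline would more likely use a proper HTML parser and a JSON library. Wrapping the work in `Try` hints at how Scala lets a collection pipeline treat failures as values rather than crashes.

```scala
import scala.util.Try
import scala.util.matching.Regex

// Hypothetical helper, for illustration only.
object JsonLdExtractor {
  // Matches <script type="application/ld+json"> ... </script>,
  // case-insensitively (?i) and across line breaks (?s).
  private val JsonLdScript: Regex =
    """(?is)<script[^>]*type\s*=\s*["']application/ld\+json["'][^>]*>(.*?)</script>""".r

  // Returns the trimmed body of every JSON-LD script tag found in the page.
  // Any exception thrown while matching is captured as a Failure.
  def extract(html: String): Try[List[String]] =
    Try(JsonLdScript.findAllMatchIn(html).map(_.group(1).trim).toList)
}
```

A caller might pass the HTML of a fetched page to `JsonLdExtractor.extract(page)` and pattern match on `Success` or `Failure`, so that one malformed page does not stop the rest of the collection run. The extracted strings are still unparsed JSON; validating and decoding them is a separate step.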
Session length
10 minutes
Language of the presentation
English
Target audience
Intermediate: Requires a basic knowledge of the area
Who is your session intended for
People who are interested in web data collection techniques
Speaker
Kevin Eid
(Software Engineer at Zalando Ireland Ltd)