H2020 IOStack


Too Big to Eat: Boosting Analytics Data Ingestion from Object Stores with Scoop

Yosef Moatti, Eran Rom, Raúl Gracia Tinedo, Dalit Naor, Doron Chen, Josep Sampé, Marc Sánchez Artigas, Pedro García López, Filip Gluszak, Eric Deschdt, Francesco Pace, Daniele Venzano and Pietro Michiardi

IEEE International Conference on Data Engineering (ICDE '17)

Extracting value from data stored in object stores, such as OpenStack Swift and Amazon S3, can be problematic in common scenarios where analytics frameworks and object stores run in physically disaggregated clusters. One of the main problems is that analytics frameworks must ingest large amounts of data from the object store prior to the actual computation; this incurs significant resource and performance overhead. To overcome this problem, we present Scoop. Scoop enables analytics frameworks to benefit from the computational resources of object stores to optimize the execution of analytics jobs. Scoop achieves this by enabling the addition of ETL-type actions to the data upload path and by offloading querying functions to the object store through a rich and extensible active object storage layer. As a proof-of-concept, Scoop enables Apache Spark SQL selections and projections to be executed close to the data in OpenStack Swift for accelerating analytics workloads of a smart energy grid company (GridPocket). Our experiments in a 63-machine cluster with real IoT data and SQL queries from GridPocket show that Scoop exhibits query execution times up to 30x faster than the traditional “ingest-then-compute” approach.
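
The difference between "ingest-then-compute" and running selections and projections close to the data can be illustrated with a minimal Python sketch. This is not Scoop's API or implementation: the function names, the CSV layout, and the sample meter readings are all hypothetical, and both the "object store" and the "analytics side" are simulated in a single process. The sketch only shows that filtering rows and columns before transfer yields the same query result while moving fewer bytes.

```python
import csv
import io

# Hypothetical sample object: a few smart-meter readings in CSV form.
SAMPLE = (
    "meter_id,city,timestamp,kwh\n"
    "m1,Nice,2015-01-01T00:00,1.2\n"
    "m2,Paris,2015-01-01T00:00,0.8\n"
    "m3,Nice,2015-01-01T00:10,2.5\n"
)


def ingest_then_compute(raw_csv, columns, predicate):
    """Baseline: transfer the whole object, then filter on the analytics side."""
    transferred = len(raw_csv.encode("utf-8"))  # every byte crosses the network
    rows = csv.DictReader(io.StringIO(raw_csv))
    result = [{c: r[c] for c in columns} for r in rows if predicate(r)]
    return result, transferred


def pushdown(raw_csv, columns, predicate):
    """Store-side filter: apply selection and projection before transfer."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for r in csv.DictReader(io.StringIO(raw_csv)):
        if predicate(r):  # selection: keep matching rows only
            writer.writerow({c: r[c] for c in columns})  # projection
    filtered = out.getvalue()
    transferred = len(filtered.encode("utf-8"))  # only filtered bytes move
    result = [dict(r) for r in csv.DictReader(io.StringIO(filtered))]
    return result, transferred


def in_nice(row):
    """Hypothetical predicate, roughly WHERE city = 'Nice'."""
    return row["city"] == "Nice"


baseline_rows, baseline_bytes = ingest_then_compute(SAMPLE, ["meter_id", "kwh"], in_nice)
pushed_rows, pushed_bytes = pushdown(SAMPLE, ["meter_id", "kwh"], in_nice)
```

Both calls return the same rows; the second transfers only the matching rows and the requested columns. On a toy object the saving is small, and the numbers here say nothing about the speedups reported in the paper.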

Created	2018-02-15
