IOStack results and impact

The H2020 IOStack project came to an end in December 2017 and had a successful Final Review in Brussels in February 2018.

The project has produced outstanding results in the field of Software Defined Storage (SDS) technologies to improve Big Data Analytics services. In particular, IOStack has proposed the first Software Defined Storage architecture (FAST’17) including a well-defined filter framework in the data plane. We created open source SDS toolkits for Object Storage (Crystal) and Block Storage (Konnector) that implement the filter framework at both ends of the I/O stack.

Storage Filters are a relevant outcome of this project. We can define a filter as a piece of code that a system administrator can inject into the data plane to perform custom computations on incoming/outgoing I/O requests. Filters can be used to implement computation close to the data (compression, selections, ETL), data management (caching, prefetching), or even resource management such as bandwidth differentiation.
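As a minimal illustrative sketch of this definition, a filter can be pictured as a function applied to the data of each request as it crosses the data plane. The code below is not the Crystal, Konnector, or Storlets API: the `Filter` type, the `FilteredStore` class, and the `select_lines` helper are hypothetical names introduced only for this example, and the in-memory dictionary stands in for a real storage backend.

```python
import zlib
from typing import Callable, Dict, Iterable, List

# Assumption: a filter is modelled as a plain bytes -> bytes function.
Filter = Callable[[bytes], bytes]


def compress(data: bytes) -> bytes:
    """Data-reduction filter applied to incoming (PUT) requests."""
    return zlib.compress(data)


def decompress(data: bytes) -> bytes:
    """Inverse filter applied to outgoing (GET) requests."""
    return zlib.decompress(data)


def select_lines(keyword: bytes) -> Filter:
    """Build a selection filter that keeps only lines containing `keyword`."""
    def _filter(data: bytes) -> bytes:
        kept = [line for line in data.splitlines() if keyword in line]
        return b"\n".join(kept)
    return _filter


class FilteredStore:
    """Hypothetical in-memory store with filter chains on both I/O paths."""

    def __init__(self, put_filters: List[Filter], get_filters: List[Filter]):
        self._objects: Dict[str, bytes] = {}
        self._put_filters = put_filters
        self._get_filters = get_filters

    def put(self, name: str, data: bytes) -> None:
        # Incoming request: run every injected filter before storing.
        for f in self._put_filters:
            data = f(data)
        self._objects[name] = data

    def get(self, name: str, extra_filters: Iterable[Filter] = ()) -> bytes:
        # Outgoing request: run the injected filters, then per-request ones.
        data = self._objects[name]
        for f in (*self._get_filters, *extra_filters):
            data = f(data)
        return data


store = FilteredStore(put_filters=[compress], get_filters=[decompress])
store.put("meter.csv", b"id,kwh\nA,1.2\nB,3.4\nA,0.7\n")

# The selection runs next to the stored data, so only matching rows are returned.
rows = store.get("meter.csv", extra_filters=[select_lines(b"A,")])
print(rows)  # b'A,1.2\nA,0.7'
```

In this sketch the application only calls `put` and `get`. The compression and selection steps are configured on the store, which mirrors the idea of an administrator injecting filters without changing client code.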

We demonstrated filter technologies on the server side (Storlets in Object Stores) and on the client side (Block storage filters, Spark File Filters, File System filters). Thanks to computation close to the data, data reduction, and data indexing techniques, the project showcased significant performance improvements (50x, 35x, 18x) in data analytics. In D2.4 we outline a list of Key Performance Indicators, including data access throughput, application time speedup, and communication costs.

Another relevant result concerns data analytics over Cloud Object storage repositories. Capacity Stores are becoming the standard infrastructure for data analytics in cloud environments thanks to their scalability and low cost. Our improvements in data connectors (Stocator) and data throughput (filters) will become mainstream in the coming years in many different settings. Object Stores augmented with indexing technologies may effectively compete with NoSQL databases in many scenarios involving unstructured or semi-structured data with flexible schemas.

The project has three major use cases where we validated our results: Energy data management from smart meters (GridPocket), Automotive sector (Idiada), and Infrastructure providers (Arctur). However, the potential scenarios for IOStack technologies go beyond these three.

Let’s enumerate some settings where IOStack results may become very relevant in the coming years:

  • Mobile Edge Computing, Smart Cities, IoT: Computation in edge devices is a growing trend that is fully aligned with the filtering mechanisms on data flows implemented in IOStack. Many devices close to the edge receive data from sensors, which can be filtered there before it is sent to cloud services. The transparent interception of flows provided by the data plane simplifies the management of these edge data flows and computations. Smart cities and Internet of Things scenarios are therefore ideal candidates for the combination of Cloud technologies and edge computing devices. These technologies will benefit from IOStack contributions in the coming years.
  • BioInformatics, GeoSpatial data: In settings with large unstructured data sets like genomics, bioinformatics, geospatial, or satellite data, the technologies of IOStack (object stores, SDS, computation close to the data, filters) will become pivotal in the coming years. The ever-growing datasets in many of these fields pose challenging data management and analytics problems that IOStack is already addressing. Downloading this information will no longer be feasible, and computation close to the data located in cloud object stores will become the prevalent model.
  • Digital Video, Image Recognition and Streaming: The multimedia ecosystem already accounts for the majority of Internet traffic. In the coming years, we will see UltraHD and Immersive media content formats that all require resolutions beyond UHD. This will require expensive computing resources for encoding and transcoding large data streams. Again, IOStack is in a privileged position to offer advanced solutions in this challenging ecosystem.
