IOStack results and impact

The H2020 IOStack project came to an end in December 2017 and had a successful Final Review in Brussels in February 2018.

The project has produced outstanding results in the field of Software Defined Storage (SDS) technologies to improve Big Data Analytics services. In particular, IOStack has proposed the first Software Defined Storage architecture (FAST’17) that includes a well-defined filter framework in the data plane. We created open source SDS toolkits for Object Storage (Crystal) and Block Storage (Konnector) that implement the filter framework at both ends of the I/O stack.

Storage Filters are a relevant outcome of this project. We can define a filter as a piece of code that a system administrator can inject into the data plane to perform custom computations on incoming/outgoing I/O requests. Filters can be used to implement computation close to the data (compression, selections, ETL), data management (caching, prefetching), or even resource management such as bandwidth differentiation.
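To make the idea of a filter more concrete, here is a minimal sketch in Python. It is not the Crystal, Konnector or Storlets API: the names select_lines, compress and run_pipeline are hypothetical and chosen only for illustration. It assumes that data flows through the data plane as a stream of byte chunks and that each chunk ends on a line boundary.

```python
import zlib
from typing import Callable, Iterator, List

# Assumption: a filter maps one stream of byte chunks to another stream.
Filter = Callable[[Iterator[bytes]], Iterator[bytes]]


def select_lines(keyword: bytes) -> Filter:
    """Selection filter: keep only the lines that contain `keyword`."""
    def apply(stream: Iterator[bytes]) -> Iterator[bytes]:
        for chunk in stream:
            kept = [line for line in chunk.splitlines(keepends=True)
                    if keyword in line]
            if kept:
                yield b"".join(kept)
    return apply


def compress() -> Filter:
    """Compression filter: emit the stream as a single zlib stream."""
    def apply(stream: Iterator[bytes]) -> Iterator[bytes]:
        compressor = zlib.compressobj()
        for chunk in stream:
            out = compressor.compress(chunk)
            if out:  # the compressor may buffer and return nothing yet
                yield out
        yield compressor.flush()
    return apply


def run_pipeline(stream: Iterator[bytes], filters: List[Filter]) -> Iterator[bytes]:
    """Chain the filters in order, as an administrator-defined policy might."""
    for f in filters:
        stream = f(stream)
    return stream


# Hypothetical smart-meter readings arriving as two chunks.
readings = [b"meter=1 kwh=3\nmeter=2 kwh=9\n", b"meter=1 kwh=4\n"]
reduced = b"".join(run_pipeline(iter(readings),
                                [select_lines(b"meter=1"), compress()]))
```

In this sketch the selection runs before the compression, so only the relevant lines are compressed and sent onwards, which is the kind of data reduction that computation close to the data aims for.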

We demonstrated filter technologies on the server side (Storlets in Object Stores) and on the client side (Block storage filters, Spark File Filters, File System filters). Thanks to computation close to the data, data reduction and data indexing techniques, the project showcased significant performance improvements (50x, 35x, 18x) in data analytics. In D2.4 we outline a list of Key Performance Indicators, such as data access throughput, application time speedup and communication costs, among others.

Another relevant result concerns data analytics over Cloud Object storage repositories. Capacity Stores are becoming the standard infrastructure for data analytics in cloud environments thanks to their scalability and low cost. We expect our improvements in data connectors (Stocator) and data throughput (filters) to become mainstream in the coming years in many different settings. Object Stores augmented with indexing technologies may effectively compete with NoSQL databases in many scenarios involving unstructured or semi-structured data with flexible schemas.

The project has three major use cases where we validated our results: Energy data management from smart meters (GridPocket), Automotive sector (Idiada), and Infrastructure providers (Arctur). But the potential scenarios for IOStack technologies go beyond these three.

Let’s enumerate some settings where IOStack results may become very relevant in the coming years:

  • Mobile Edge Computing, Smart Cities, IoT: Computation in edge devices is a growing trend that is completely aligned with the filtering mechanisms on data flows implemented in IOStack. Many devices close to the edge receive data from sensors, which can be filtered there before being sent to cloud services. The transparent interception of flows in the data plane simplifies the management of these edge data flows and computations. Smart cities and Internet of Things scenarios are, along these lines, ideal candidates for the combination of Cloud technologies and edge computing devices. These technologies will benefit from IOStack contributions in the coming years.
  • BioInformatics, GeoSpatial data: In settings with large unstructured data sets like genomics, bioinformatics, geospatial, or satellite data, the technologies of IOStack (object stores, SDS, computation close to the data, filters) will become pivotal in the coming years. The ever-growing datasets in many of these fields pose challenging data management and analytics problems that IOStack is already addressing. Downloading this information will no longer be feasible, and computation close to the data located in cloud object stores will become the prevalent data model in the coming years.
  • Digital Video, Image Recognition and Streaming: The multimedia ecosystem already accounts for the majority of Internet traffic. In the coming years, we will see UltraHD and Immersive media content formats that all require resolutions beyond UHD. This will require expensive computing resources for encoding and transcoding large data streams. Again, IOStack is in a privileged position to offer advanced solutions in this challenging ecosystem.

Horizon 2020: Project SUNFISH

Horizon 2020 includes a large group of innovative projects with one common objective: to develop infrastructures, methods and tools for high performance, adaptive cloud applications and services that go beyond the current capabilities, strengthening the competitive position of the European industry.
Within the scope of federated cloud services, the H2020 SUNFISH project offers a service to federate private and public clouds, enabling them to exchange data and services in a secure and controlled manner, based on a “democratic” governance model: no federation member rules over the others. In more detail, SUNFISH conceives, designs and implements Federation-as-a-Service (FaaS), a secure-by-design cloud interoperability solution based on blockchain technology.
This service is realised via a software platform, named “SUNFISH Platform”, whose components each play an essential part in its overall operation. The SUNFISH Platform is a modular software solution that enables the dynamic and secure creation of cloud federations and their management.
Its main functionalities are:
  • Dynamic cloud management. A dynamic federation of clouds and their related services, with optimal service level and workload;
  • Democratic governance. An innovative cloud federation governance supporting trustless coalitions, as none of the federated organisations rules over the others, thanks to the service ledger empowering the governance;
  • Data security. Advanced, innovative privacy-preserving services enforcing access control and monitoring to protect provisioning of federated services and sharing of data.

The SUNFISH project has developed three concrete deployment examples, through three different use cases. These use cases are based on data and infrastructures made available by three public-sector Consortium partners, from Italy, Malta and the UK respectively.

For more information, please take a look at CloudWatch's Service Offer Catalogue.

IOStack exploitation: Use cases

We are happy to announce that the IOStack toolkit prototype is starting its path towards exploitation. We invite you to watch the videos from our use-case companies to discover how IOStack is making their daily operations with analytics frameworks simpler and more efficient.

GridPocket Use case: Smart Energy Grid

Idiada Use case: Automotive Company

OpenIO collaborates with IOStack

We are now entering the third year of the project. It is time to disseminate the outcomes of IOStack and let the European Big Data and storage ecosystem benefit from them.

One example is a new collaborator of the IOStack project: OpenIO. OpenIO is a French company specialized in software-defined storage and scalability challenges. OpenIO integrates an application grid that runs computations on top of an object storage platform. But wait, there is even more. They adopt an open source model; this means that the code of OpenIO is available to the open source community!

As you can see, there are many points in common with our objectives in the project. And now that OpenIO is seriously approaching Big Data workloads, a collaboration on our object storage technologies (e.g., Crystal, Stocator) was inevitable. For a great explanation of this story, we refer you to the post on OpenIO's blog.

This can be the beginning of a beautiful partnership :)

Welcome to IOStack

IOStack is a European research project funded by the H2020 initiative. The project is a consortium of several European industrial and research partners including University of Rovira i Virgili, IBM, MPSTOR, Eurocom and the Barcelona Supercomputing Center. A number of the partners will act as users of the system, including Idiada (Automotive), GridPocket (IoT) and Arctur (HPC).

IOStack is designed for deployments of data analytics in virtual environments. Virtual environments allow very flexible deployment of analytics frameworks but deliver lower performance than bare metal deployments. IOStack will focus on how Software Defined Storage can use its knowledge of the cloud topology and the real-time dynamic characteristics of the cloud to deploy analytics jobs that will run and complete within guaranteed SLAs and timescales.

The SDS knowledge of the static topology allows compute and storage locality to be optimized. Understanding the dynamic load of the cloud allows a further optimization of which resources, paths and devices should be used for a given workload.
