SDGen: Mimicking Datasets for Content Generation in Storage Benchmarks

Raúl Gracia-Tinedo, Danny Harnik, Dalit Naor, Dmitry Sotnikov, Sivan Toledo and Aviad Zuck

13th USENIX Conference on File and Storage Technologies (FAST 15)

Storage system benchmarks either use samples of proprietary data or synthesize artificial data in simple ways (such as using zeros or random data). However, many storage systems behave completely differently on such artificial data than they do on real-world data. This is the case with systems that include data reduction techniques, such as compression and/or deduplication.

To address this problem, we propose a benchmarking methodology called mimicking and apply it in the domain of data compression. Our methodology is based on characterizing the properties of real data that influence the performance of compressors. Then, we use these characterizations to generate new synthetic data that mimics the real one in many aspects of compression. Unlike current solutions that only address the compression ratio of data, mimicking is flexible enough to also emulate compression times and data heterogeneity. We show that these properties matter to the system’s performance.

In our implementation, called SDGen, characterizations take at most 2.5KB per data chunk (e.g., 64KB) and can be used to efficiently share benchmarking data in a highly anonymized fashion; sharing it carries few or no privacy concerns. We evaluated our data generator’s accuracy on compressibility and compression times using real-world datasets and multiple compressors (lz4, zlib, bzip2 and lzma). As a proof-of-concept, we integrated SDGen as a content generation layer in two popular benchmarks (LinkBench and Impressions).
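The abstract's opening observation, that zero-filled and random data behave very differently under compression, can be checked with a few lines of Python. This is only an illustrative sketch using the standard-library zlib module. It is not SDGen and does not reproduce the paper's methodology. The helper name `compression_ratio` and the constant `CHUNK_SIZE` are assumptions made for this example.

```python
import os
import zlib

# 64KB, matching the example chunk size mentioned in the abstract.
CHUNK_SIZE = 64 * 1024


def compression_ratio(data: bytes) -> float:
    """Return original size divided by zlib-compressed size (hypothetical helper)."""
    return len(data) / len(zlib.compress(data))


# All-zero content compresses almost completely.
zeros_ratio = compression_ratio(bytes(CHUNK_SIZE))

# Random content is essentially incompressible, so the ratio stays close to 1.
random_ratio = compression_ratio(os.urandom(CHUNK_SIZE))

print(f"zeros:  {zeros_ratio:.1f}x")
print(f"random: {random_ratio:.3f}x")
```

Real-world data typically falls between these two extremes, which is the gap the mimicking approach described above aims to fill.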



Created: 2015-02-25 | Size: 872.21 KB | Downloads: 710

The Power of Swarming in Personal Clouds Under Bandwidth Budget

Rahma Chaabouni, Marc Sánchez-Artigas and Pedro García-López

Elsevier Journal of Network and Computer Applications

Users increasingly rely on personal clouds (such as Dropbox and Box) to store, edit and retrieve files held on remote servers. These systems generally follow a client–server model to distribute the files to end-users. This means that they require a huge amount of bandwidth to meet the requirements of their clients. Personal clouds with a limited bandwidth budget can benefit from the upload speed of the clients interested in the same content to improve the quality of service. This can be done by introducing a peer-to-peer protocol, BitTorrent for instance, when the load on a given piece of content becomes high. The main challenge is to decide when to switch to BitTorrent and how to allocate the cloud's available bandwidth to the different clients. In this paper, we propose an algorithm for the allocation of the cloud's bandwidth. Based on the current load and the predefined quality of service constraints, the algorithm identifies the most suitable protocol for each swarm and provides the corresponding bandwidth allocation. We validate the algorithm using a real trace of the Ubuntu One system and the results show important gains in the download times experienced by the clients.



Created: 2016-04-14 | Size: 3.8 MB | Downloads: 401

Vertigo: Programmable Micro-controllers for Software-Defined Object Storage

J. Sampé, M. Sánchez-Artigas and P. García-López

IEEE International Conference on Cloud Computing (IEEE CLOUD'16)

Software-defined storage (SDS) aims to minimize the complexity of data management in the Cloud. SDS decouples the control plane from the data plane and simplifies the management of the storage system via automated storage policy enforcement. In this paper, we propose a novel SDS framework for Object Storage that decentralizes policy enforcement by deploying per-object management policies in the storage nodes. As in active storage systems, we leverage the underutilized CPU time in the storage nodes. But our framework goes one step further. It provides a new management abstraction called micro-controllers, which operate on objects depending on their state and content, thereby permitting the implementation of sophisticated management policies, such as the automated deletion of an object based on its access history, and even allowing the orchestration of active storage tasks.

Our SDS system avoids the massive interception of data flows by moving that logic to the appropriate objects. Furthermore, our extensible model simplifies the customization of Object Storage services. We present in the validation several interesting use cases such as automated deletion, content level access control, and Web prefetching. 



Created: 2016-06-13 | Size: 255.61 KB | Downloads: 140

Understanding Data Sharing in Private Personal Clouds

R. Gracia-Tinedo, P. García-López, A. Gómez and A. Illana

IEEE International Conference on Cloud Computing (IEEE CLOUD'16)

Data sharing in Personal Clouds blurs the lines between on-line storage and content distribution with a strong social component. Such social information may be exploited by researchers to devise optimized data management techniques for Personal Clouds. Unfortunately, due to their proprietary nature, data sharing is one of the least studied facets of these systems.

In this work, we present the first study of data sharing in a private Personal Cloud. Concretely, we contribute a dataset collected at the metadata back-end of NEC: an enterprise oriented Personal Cloud. First, our analysis provides a deep inspection of the storage layer of NEC, comparing it with a well-known public vendor (UbuntuOne). Second, we study the social structure of NEC user communities, as well as the storage characteristics of user sharing links via multiplex network techniques.

Finally, we discuss a battery of data management optimizations for NEC derived from our findings, which may be of independent interest for other similar systems. Our proposals include content distribution, caching and data placement. We believe that both our study and dataset will foster further research in this field.



Created: 2016-06-13 | Size: 1.26 MB | Downloads: 154

IOStack: Software-Defined Object Storage

Raúl Gracia-Tinedo, Pedro García-López, Marc Sánchez-Artigas, Josep Sampé, Yosef Moatti, Eran Rom, Dalit Naor, Ramon Nou, Toni Cortés, William Oppermann and Pietro Michiardi

IEEE Internet Computing

As the complexity and scale of cloud storage systems grow, software-defined storage (SDS) has become a prime candidate to simplify cloud storage management. In this work, the authors present IOStack: the first SDS architecture for object stores (such as OpenStack Swift). At the control plane, administrators provision SDS services to tenants according to policies expressed via a high-level DSL. At the data plane, IOStack helps build a variety of filters, ranging from arbitrary computations on objects to data management mechanisms. Experiments illustrate that IOStack enables easy and effective policy-based provisioning, which can significantly improve the operation of a multitenant object store.



Created: 2016-06-13 | Size: 1.47 MB | Downloads: 367

Dissecting UbuntuOne: Autopsy of a Global-scale Personal Cloud Back-end

Raúl Gracia-Tinedo, Yongchao Tian, Josep Sampé, Hamza Harkous, John Lenton, Pedro García-López, Marc Sánchez-Artigas and Marko Vukolic

ACM Internet Measurement Conference (IMC '15)

Personal Cloud services, such as Dropbox or Box, have been widely adopted by users. Unfortunately, very little is known about the internal operation and general characteristics of Personal Clouds since they are proprietary services.

In this paper, we focus on understanding the nature of Personal Clouds by presenting the internal structure and a measurement study of UbuntuOne (U1). We first detail the U1 architecture, core components involved in the U1 metadata service hosted in the datacenter of Canonical, as well as the interactions of U1 with Amazon S3 to outsource data storage. To our knowledge, this is the first research work to describe the internals of a large-scale Personal Cloud.

Second, by means of tracing the U1 servers, we provide an extensive analysis of its back-end activity for one month. Our analysis includes the study of the storage workload, the user behavior and the performance of the U1 metadata store. Moreover, based on our analysis, we suggest improvements to U1 that can also benefit similar Personal Cloud systems.

Finally, we contribute our dataset to the community, which is the first to contain the back-end activity of a large-scale Personal Cloud. We believe that our dataset provides unique opportunities for extending research in the field.



Created: 2016-06-13 | Size: 2.61 MB | Downloads: 322

Experimental Performance Evaluation of Cloud-Based Analytics-as-a-Service

Francesco Pace, Marco Milanesio, Daniele Venzano, Damiano Carra, Pietro Michiardi

IEEE International Conference on Cloud Computing (IEEE CLOUD'16)

An increasing number of Analytics-as-a-Service solutions have recently emerged in the landscape of cloud-based services. These services allow flexible composition of compute and storage components that create powerful data ingestion and processing pipelines. This work is a first attempt at an experimental evaluation of analytic application performance under a wide range of storage service configurations. We present an intuitive notion of data locality that we use as a proxy to rank different service compositions in terms of expected performance. Through an empirical analysis, we dissect the performance achieved by analytic workloads and unveil problems due to the impedance mismatch that arises in some configurations. Our work paves the way to a better understanding of modern cloud-based analytic services and their performance, both for their end-users and their providers.

 



Created: 2016-06-13 | Size: 532.01 KB | Downloads: 342

Crystal: Software-Defined Storage for Multi-tenant Object Stores

Raúl Gracia-Tinedo, Josep Sampé, Edgar Zamora, Marc Sánchez-Artigas, Pedro García-López, Yosef Moatti and Eran Rom

USENIX Conference on File and Storage Technologies (FAST '17)

Object stores are becoming pervasive due to their scalability and simplicity. Their broad adoption, however, contrasts with their rigidity for handling heterogeneous workloads and applications with evolving requirements, which prevents the adaptation of the system to such varied needs. In this work, we present Crystal, the first Software-Defined Storage (SDS) architecture whose core objective is to efficiently support multi-tenancy in object stores. Crystal adds a filtering abstraction at the data plane and exposes it to the control plane to enable high-level policies at the tenant, container and object granularities. Crystal translates these policies into a set of distributed controllers that can orchestrate filters at the data plane based on real-time workload information.



Created: 2017-01-30 | Size: not listed | Downloads: 0

Improving the QoE in Personal Clouds with Cross-Swarm Bundling

Rahma Chaabouni, Marc Sánchez-Artigas, Ala Chaabouni and Pedro García-López

IEEE 41st Conference on Local Computer Networks (IEEE LCN'16)

Personal cloud storage systems, like Dropbox, are revolutionizing the way people think about and access their files. As the prevailing model, these systems use unicast to push file changes to each of the “unsynced” devices. As a result, they transmit the same information multiple times, once per unsynced device. This puts an unnecessary strain on outgoing bandwidth at the datacenters. One way to address this is to leverage P2P-like content distribution to benefit from user resources at the edges of the Internet.

Although protocols like BitTorrent have proven to be effective in this scenario, we go a step further in this work and propose cross-swarm bundling as a mechanism for file distribution. One key contribution of this work is that, instead of using bundling as means to extend the lifetime of swarms, we show that it can be useful to improve the Quality of Experience (QoE). We validate our proposal using a trace of Ubuntu One, a real personal cloud system, obtaining significant improvements on the QoE levels.



Created: 2017-01-30 | Size: 483.28 KB | Downloads: 131

NG-DBSCAN: Scalable Density-Based Clustering for Arbitrary Data

Alessandro Lulli, Matteo Dell’Amico, Pietro Michiardi, Laura Ricci

Proceedings of the VLDB Endowment (VLDB '16)

We present NG-DBSCAN, an approximate density-based clustering algorithm that operates on arbitrary data and any symmetric distance measure. The distributed design of our algorithm makes it scalable to very large datasets; its approximate nature makes it fast, yet capable of producing high quality clustering results. We provide a detailed overview of the steps of NG-DBSCAN, together with their analysis. Our results, obtained through an extensive experimental campaign with real and synthetic data, substantiate our claims about NG-DBSCAN’s performance and scalability.



Created: 2017-01-30 | Size: 1.08 MB | Downloads: 124

Giving Wings to Your Data: A First Experience of Personal Cloud Interoperability

Raúl Gracia-Tinedo, Cristian Cotes, Edgar Zamora-Gómez, Genís Ortiz, Adrián Moreno-Martínez, Marc Sánchez-Artigas, Pedro García-López, Raquel Sánchez, Alberto Gómez and Anastasio Illana

Elsevier Future Generation Computer Systems (2017)

Personal Clouds are becoming increasingly popular storage services for end-users and organizations. However, the competition among Personal Clouds, their proprietary nature and the heterogeneity of synchronization protocols have led to a complete lack of interoperability among them. Regrettably, this situation prevents users from sharing data transparently across multiple providers. Even worse, the lack of interoperability carries serious risks, such as vendor lock-in, in which users get trapped in a single provider due to the cost of switching to another one.

In this work, we contribute DataWings: the first interoperability protocol for Personal Clouds. DataWings consists of an authentication management protocol and a storage API for file storage, synchronization and sharing that adhere to the current authentication (OAuth) and REST standards, respectively. Moreover, we demonstrate the feasibility of DataWings by implementing the protocol in various providers (NEC, StackSync, eyeOS) and performing a real deployment evaluated with real trace replays of production systems (UbuntuOne, NEC). To our knowledge, this is the first real-world experience of Personal Cloud interoperability. Our experiments provide new insights into the performance implications that different types of user activity and the underlying sharing network topology have on the implementation of our protocol. We conclude that DataWings is flexible enough to leverage interoperability for heterogeneous Personal Clouds, opening the door for a broader adoption by other vendors.



Created: 2017-01-30 | Size: not listed | Downloads: 0

Oblivious RAM as a Substrate for Cloud Storage -- The Leakage Challenge Ahead

Marc Sánchez-Artigas

ACM Cloud Computing Security Workshop (ACM CCSW '16)

Oblivious RAM (ORAM) is a well-established technology to hide data access patterns from an untrusted storage system. Although research in ORAM has been spurred in the last few years with the irruption of cloud computing, it is still unclear whether ORAM is ready for the cloud. As we demonstrate in this short paper, there are still some important hurdles to be overcome. One of those is the standard block-based ORAM interface, which can become a timing side-channel when used as a substrate to implement higher level abstractions such as filesystems, personal storage services, etc., typically found in the cloud. We analyze this form of leakage and discuss some possible solutions to this problem, concluding that thwarting it in an efficient manner calls for further research.



Created: 2017-01-30 | Size: 521.43 KB | Downloads: 152