Picky: Efficient and Reproducible Sharing of Large Datasets Using Merkle-Trees

Title: Picky: Efficient and Reproducible Sharing of Large Datasets Using Merkle-Trees
Publication Type: Conference Paper
Year of Publication: 2016
Authors: Hintze, D, Rice, A
Conference Name: 2016 IEEE 24th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS)
Date Published: 09/2016
Conference Location: London, UK
ISBN Number: 978-1-5090-3432-1
Keywords: Computational modeling, Computers, data access, data distribution, data sharing, file organisation, Indexing, information retrieval, large datasets sharing, Merkle-trees, Metadata, Picky, Ports (Computers), repeatable research, selective download, versioning

There is growing demand for researchers to share datasets in order to allow others to reproduce results or investigate new questions. The most common option is to simply deposit the data online in its entirety. However, this mechanism of distribution becomes impractical as the size of the dataset increases or if the dataset changes frequently as new data is collected. In this paper we describe Picky, a new Merkle-tree-based system for sharing large datasets, which allows users to download selected portions and to receive incremental updates. We demonstrate the viability of our approach by quantifying its benefit when applied to a number of large datasets used in the networking and measurement community.
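
To illustrate the general idea behind incremental updates with a Merkle tree, the following is a minimal sketch, not Picky's actual data format or protocol. It assumes SHA-256 as the hash, a binary tree that duplicates the last node on odd-sized levels, and two dataset versions with the same number of chunks; the function names `build_tree` and `changed_chunks` are hypothetical. Comparing two trees from the root downwards lets a client skip every subtree whose hash is unchanged and fetch only the chunks that differ.

```python
import hashlib


def _h(data: bytes) -> bytes:
    """Hash helper (SHA-256 is an assumption, not taken from the paper)."""
    return hashlib.sha256(data).digest()


def build_tree(chunks):
    """Return the tree as a list of levels: leaf hashes first, root last."""
    if not chunks:
        raise ValueError("at least one chunk is required")
    level = [_h(c) for c in chunks]
    levels = [level]
    while len(level) > 1:
        # Pair up nodes; duplicate the last one if the level has odd length.
        padded = level + [level[-1]] if len(level) % 2 else level
        level = [_h(padded[i] + padded[i + 1]) for i in range(0, len(padded), 2)]
        levels.append(level)
    return levels


def changed_chunks(old, new):
    """Indices of chunks that differ, found by descending only into
    subtrees whose hashes differ (assumes equal chunk counts)."""
    if len(old[0]) != len(new[0]):
        raise ValueError("this sketch only handles equal chunk counts")
    changed = []

    def walk(depth, idx):
        if old[depth][idx] == new[depth][idx]:
            return  # identical subtree: nothing below needs downloading
        if depth == 0:
            changed.append(idx)
            return
        for child in (2 * idx, 2 * idx + 1):
            if child < len(old[depth - 1]):
                walk(depth - 1, child)

    walk(len(old) - 1, 0)
    return changed


v1 = build_tree([b"a", b"b", b"c", b"d", b"e"])
v2 = build_tree([b"a", b"b", b"c", b"D", b"e"])
print(changed_chunks(v1, v2))
```

In this toy case only one of the five chunks would need to be transferred to bring a copy of the first version up to date; a real system would also have to handle chunks being added or removed.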

Refereed Designation: Refereed