COMPASS

Big Data Compression Tool using Attribute-based Signatures

Project Summary: A key challenge in the new era of big data management is to minimize storage and I/O costs while maintaining data fidelity and supporting the efficient and reliable retrieval required by innovative ICT (Information and Communications Technology) services and smart applications. Most big data is generated continuously and must be ingested in a timely manner to maximize its usefulness. In addition to the raw data, the associated metadata and indexes used for optimization also demand substantial storage, which increases the I/O footprint of data centers. To remain impactful and competitive, organizations and enterprises therefore need to: (i) store incremental big data in the most efficient manner; and (ii) improve the response time of data analytics and exploration queries. The project aims to investigate the applicability of a revolutionary signature-based compression tool, dubbed COMPASS, that can transform traditional and modern data management systems to efficiently support the new era of ICT applications, which take advantage of data mining and analytics for big data. Specifically, this project will allow us to verify the industrial application potential of the COMPASS research technology/knowhow before continuing with the further development and commercialization of our tool. Existing systems address the aforementioned challenge by using compression techniques (e.g., Google’s Snappy, Facebook’s Zstandard (zstd), LZMA, LZ4, GZIP, and 7z) in a coarse-grained fashion, i.e., they apply a single compression technique to all the data stored in the system, regardless of the characteristics of that data. In contrast, COMPASS optimizes data storage by exploiting data characteristics and using multiple compression techniques for different data types. Furthermore, COMPASS can be combined with indexing to offer faster access to individual data items through partial decompression.
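The contrast between coarse-grained and per-data-type compression can be illustrated with a minimal Python sketch. This is not the COMPASS implementation: the codec table uses standard-library stand-ins (zlib, bz2, lzma) rather than the codecs listed above, and the column names, sample data, and function names are hypothetical. It only shows that picking the smallest output per column can never do worse than applying any one codec to every column.

```python
import bz2
import lzma
import zlib

# Hypothetical codec table: stdlib stand-ins, not the codecs COMPASS uses.
CODECS = {
    "zlib": zlib.compress,
    "bz2": bz2.compress,
    "lzma": lzma.compress,
}


def best_codec(payload: bytes) -> tuple[str, int]:
    """Return (codec name, compressed size) for the smallest output."""
    sizes = {name: len(fn(payload)) for name, fn in CODECS.items()}
    name = min(sizes, key=sizes.get)
    return name, sizes[name]


def coarse_grained_size(columns: dict[str, bytes], codec: str) -> int:
    """Total size when the same codec is applied to every column."""
    return sum(len(CODECS[codec](data)) for data in columns.values())


def fine_grained_size(columns: dict[str, bytes]) -> int:
    """Total size when each column gets its own best codec."""
    return sum(best_codec(data)[1] for data in columns.values())


if __name__ == "__main__":
    # Illustrative data only: a repetitive text column and a numeric column.
    columns = {
        "sensor_id": b"".join(b"sensor-%04d\n" % (i % 8) for i in range(2000)),
        "reading": b"".join(b"%d\n" % ((i * 7919) % 1000) for i in range(2000)),
    }
    for codec in CODECS:
        print(codec, coarse_grained_size(columns, codec))
    print("per-column best", fine_grained_size(columns))
```

Because each column is compressed on its own in this sketch, a single column can also be decompressed without touching the others, which is the simplest form of the partial decompression mentioned above.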

Details:

Programme: PROOF OF CONCEPT FOR TECHNOLOGY / KNOWHOW APPLICATIONS
Proposal Number: CONCEPT/0823/0002
Proposal Acronym: COMPASS
Funding: Research and Innovation Foundation, Cyprus

Paper: Big Data Compression Tool using Attribute-based Signatures

This paper introduces COMPASS, a multi-scheme compression tool that uses attribute-based signatures. COMPASS uses K-means clustering to select the best compression scheme for each data subset in a database. The experimental results show that COMPASS significantly reduces disk space usage compared to monolithic (single-scheme) methods.
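As a rough illustration of clustering-based scheme selection, the sketch below groups data subsets by a small feature vector and assigns one codec per cluster. It rests on several assumptions that do not come from the paper: the signature used here (scaled byte entropy and digit fraction) is a hypothetical stand-in for the attribute-based signatures, the K-means routine is a plain Lloyd's iteration seeded from the first k points, and the codecs are standard-library ones.

```python
import bz2
import lzma
import math
import zlib
from collections import Counter

# Hypothetical codec table: stdlib stand-ins for illustration only.
CODECS = {"zlib": zlib.compress, "bz2": bz2.compress, "lzma": lzma.compress}


def signature(data: bytes) -> tuple[float, float]:
    """Assumed signature: (byte entropy scaled to [0, 1], fraction of ASCII digits)."""
    if not data:
        return (0.0, 0.0)
    counts = Counter(data)
    n = len(data)
    entropy = -sum(c / n * math.log2(c / n) for c in counts.values())
    digits = sum(counts[b] for b in range(48, 58)) / n
    return (entropy / 8.0, digits)


def kmeans(points: list[tuple[float, ...]], k: int, iterations: int = 20) -> list[int]:
    """Plain Lloyd's K-means, seeded from the first k points; requires k <= len(points)."""
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: move each non-empty cluster's centroid to its mean.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels


def choose_codecs(subsets: dict[str, bytes], k: int = 2) -> dict[str, str]:
    """Cluster subsets by signature, then pick one codec per cluster."""
    names = list(subsets)
    labels = kmeans([signature(subsets[n]) for n in names], k)
    plan: dict[str, str] = {}
    for j in set(labels):
        members = [n for n, lab in zip(names, labels) if lab == j]
        # The codec with the smallest total output over the cluster wins.
        best = min(CODECS, key=lambda c: sum(len(CODECS[c](subsets[m])) for m in members))
        plan.update({m: best for m in members})
    return plan


if __name__ == "__main__":
    # Illustrative subsets only.
    subsets = {
        "ids_a": b"sensor-0001\n" * 500,
        "nums": b"".join(b"%d\n" % ((i * 7919) % 1000) for i in range(1500)),
        "ids_b": b"sensor-0002\n" * 500,
    }
    print(choose_codecs(subsets, k=2))
```

Choosing a codec per cluster rather than per subset means every codec is still trialled on each subset here; a practical design would presumably trial codecs on a sample of each cluster and reuse the result for new subsets with a similar signature, but that is an assumption about intent rather than something stated in the abstract.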
