Publication

DedupKV: A Space-Efficient and High-Performance Key-Value Store via Fine-Grained Deduplication

by Safdar Jamil, Awais Khan, Xubin He, Youngjae Kim
Publication Type
Conference Paper
Book Title
ICS '25: Proceedings of the 39th ACM International Conference on Supercomputing
Page Numbers
580 to 595
Publisher Location
New York, United States of America
Conference Name
ACM International Conference on Supercomputing 2025 (ICS)
Conference Location
Salt Lake City, Utah, United States of America
Conference Sponsor
ACM

Key-value stores based on the Log-Structured Merge Tree (LSM-tree) excel in write-intensive environments but suffer from data duplication, which can consume up to 49% of storage space in such deployments. Traditional solutions such as compression and coarse-grained, file-system-level deduplication either introduce overhead or have limited effectiveness. In this study, we propose DedupKV, a fine-grained deduplication framework tailored for LSM-trees that maximizes data reduction efficiency while minimizing write stalls and read overheads. DedupKV features three key innovations: (1) FLUSH-integrated inline deduplication, which removes duplicates while data is written from memory to storage; (2) WAL file-based offline deduplication, which repurposes write-ahead logs to avoid double writes; and (3) elastic execution, which dynamically balances inline and offline deduplication based on memory pressure and workload intensity. Additionally, dynamic granularity management reduces deduplication metadata overhead. We implemented these four ideas in RocksDB for the first time and conducted experiments in a Linux environment. Our evaluation shows that WAL file-based offline deduplication and DedupKV outperform BlobDB by 33% and 23%, respectively, in write-heavy workloads, while reducing write amplification by 1.2×, 2×, and 1.6× on real KV datasets.
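To make the idea of removing duplicates at flush time concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a toy in-memory model in which a memtable is a plain dictionary and duplicate values are detected by a SHA-256 fingerprint. The names `flush_with_dedup`, `get`, and `value_store` are hypothetical and do not correspond to any RocksDB or DedupKV API.

```python
import hashlib


def flush_with_dedup(memtable, value_store):
    """Flush a toy memtable, writing each distinct value only once.

    memtable:    dict mapping key (bytes) -> value (bytes); a stand-in
                 for an in-memory write buffer (assumption).
    value_store: dict mapping fingerprint -> value; a stand-in for
                 on-storage value data (assumption).

    Returns (index, bytes_written), where index maps key -> fingerprint.
    """
    index = {}
    bytes_written = 0
    for key in sorted(memtable):  # flushed tables are written in key order
        value = memtable[key]
        fingerprint = hashlib.sha256(value).hexdigest()
        if fingerprint not in value_store:
            # First time this value is seen: store it.
            value_store[fingerprint] = value
            bytes_written += len(value)
        # Duplicate or not, the key only records a reference to the value.
        index[key] = fingerprint
    return index, bytes_written


def get(key, index, value_store):
    """Resolve a key through its fingerprint; None if the key is absent."""
    fingerprint = index.get(key)
    return None if fingerprint is None else value_store[fingerprint]


if __name__ == "__main__":
    memtable = {
        b"user:1": b"A" * 100,
        b"user:2": b"A" * 100,  # same value as user:1
        b"user:3": b"B" * 50,
    }
    store = {}
    index, written = flush_with_dedup(memtable, store)
    print(written, len(store))
```

In this toy model the two identical 100-byte values are stored once, so 150 bytes of value data are written instead of 250. The sketch also shows the read-side cost the abstract alludes to: every lookup goes through one extra level of indirection from key to fingerprint to value.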