From 21fb36a0781eef0bec541877e1bf7e5fd4ab1e87 Mon Sep 17 00:00:00 2001
From: gilbertchen
Date: Fri, 26 Feb 2016 12:36:12 -0500
Subject: [PATCH] Update DESIGN.md

---
 DESIGN.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/DESIGN.md b/DESIGN.md
index 988ecbc..cea6f79 100644
--- a/DESIGN.md
+++ b/DESIGN.md
@@ -19,6 +19,8 @@ before, one can just perform a file lookup via the file storage API using the fi
 This effectively turn a cloud storage offering only a very limited set of basic file operations into a powerful modern backup backend capable of both block-level and file-level deduplication.
 More importantly, the absence of a centralized indexing database means that there is no need to implement a distributed locking mechanism on top of the file storage.
 
+By eliminating the chunk indexing database, lock-free deduplication not only reduces the code complexity but also makes the deduplication less error-prone. Each chunk is saved individually in its own file, and once saved there is no need for modification. Data corruption is therefore less likely to occur because of the immutability of chunk files. Another benefit that comes naturally from lock-free deduplication is that when one client creates a new chunk, clients that happen to have the same file will notice that the chunk file already exists and therefore will not upload the chunk. This pushes the deduplication to its highest level -- clients without knowledge of each other can share identical chunks at no extra effort.
+
 There is one problem, though.
 Deletion of snapshots without an indexing database, when concurrent access is permitted, turns out to be a hard problem.
 If exclusive access to a file storage by a single client can be guaranteed, the deletion procedure can simply search for