mirror of
https://github.com/gilbertchen/duplicacy
synced 2025-12-06 00:03:38 +00:00
Update DESIGN.md
@@ -16,7 +16,7 @@ time the rolling hash window is shifted by one byte, thus significantly reducing
What is novel about lock-free deduplication is the absence of a centralized indexing database for tracking all existing
chunks and for determining which chunks are not needed any more. Instead, to check if a chunk has already been uploaded
before, one can just perform a file lookup via the file storage API using the file name derived from the hash of the chunk.
-This effectively turn a cloud storage offering only a very limited
+This effectively turns a cloud storage offering only a very limited
set of basic file operations into a powerful modern backup backend capable of both block-level and file-level deduplication. More importantly, the absence of a centralized indexing database means that there is no need to implement a distributed locking mechanism on top of the file storage.

By eliminating the chunk indexing database, lock-free deduplication not only reduces the code complexity but also makes the deduplication less error-prone. Each chunk is saved individually in its own file, and once saved there is no need for modification. Data corruption is therefore less likely to occur because of the immutability of chunk files. Another benefit that comes naturally from lock-free deduplication is that when one client creates a new chunk, other clients that happen to have the same original file will notice that the chunk already exists and therefore will not upload the same chunk again. This pushes the deduplication to its highest level -- clients without knowledge of each other can share identical chunks with no extra effort.
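
The lookup-by-name idea described above can be illustrated with a minimal sketch. This is not Duplicacy's actual code: the local directory standing in for cloud storage, the `chunks/` layout, the use of SHA-256, and the names `chunk_path` and `upload_chunk` are all assumptions made for illustration.

```python
import hashlib
import os
import tempfile
import uuid


def chunk_path(storage_dir: str, chunk: bytes) -> str:
    """Derive the chunk's file name from its content hash.

    SHA-256 and the "chunks" subdirectory are assumptions for this sketch.
    """
    digest = hashlib.sha256(chunk).hexdigest()
    return os.path.join(storage_dir, "chunks", digest)


def upload_chunk(storage_dir: str, chunk: bytes) -> bool:
    """Store the chunk unless a file with its hash-derived name already exists.

    Returns True if the chunk was written, False if the upload was skipped.
    """
    path = chunk_path(storage_dir, chunk)
    # A plain file lookup takes the place of a query against a chunk index.
    if os.path.exists(path):
        return False
    os.makedirs(os.path.dirname(path), exist_ok=True)
    # Write to a uniquely named temporary file, then rename it into place,
    # so a chunk file is never visible half-written and never modified later.
    tmp_path = f"{path}.{uuid.uuid4().hex}.tmp"
    with open(tmp_path, "wb") as f:
        f.write(chunk)
    os.replace(tmp_path, path)
    return True


if __name__ == "__main__":
    storage = tempfile.mkdtemp()
    data = b"bytes of a chunk shared by two clients"
    # Two clients that know nothing about each other back up the same chunk.
    print(upload_chunk(storage, data))  # first client writes the chunk
    print(upload_chunk(storage, data))  # second client finds it and skips
```

If two clients pass the existence check at the same moment, both write identical bytes to the same name, so the result is the same and no lock is needed. That property depends on chunk files being immutable and named purely by their content.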