1
0
mirror of https://github.com/gilbertchen/duplicacy synced 2025-12-06 00:03:38 +00:00

Update README.md

This commit is contained in:
gilbertchen
2018-04-06 00:06:48 -04:00
committed by GitHub
parent 2290c4ace0
commit fbaea6e8b1

View File

@@ -1,28 +1,22 @@
# Duplicacy: A lock-free deduplication cloud backup tool
Duplicacy is a new generation cross-platform cloud backup tool based on the idea of [Lock-Free Deduplication](https://github.com/gilbertchen/duplicacy/wiki/Lock-Free-Deduplication). It is the only cloud backup tool that allows multiple computers to back up to the same storage simultaneously without using any locks (thus readily amenable to various cloud storage services).
Duplicacy is a new generation cross-platform cloud backup tool based on the idea of [Lock-Free Deduplication](https://github.com/gilbertchen/duplicacy/wiki/Lock-Free-Deduplication).
The repository hosts source code, design documents, and binary releases of the command line version. There is also a Duplicacy GUI frontend built for Windows and Mac OS X available from https://duplicacy.com.
This repository hosts source code, design documents, and binary releases of the command line version of Duplicacy. There is also a Duplicacy GUI frontend built for Windows and Mac OS X available from https://duplicacy.com.
There is a special edition of Duplicacy developed for VMware vSphere (ESXi) named [Vertical Backup](https://www.verticalbackup.com) that can back up virtual machine files on ESXi to local drives, network or cloud storages.
## Features
Duplicacy currently supports major cloud storage providers (Amazon S3, Google Cloud Storage, Microsoft Azure, Dropbox, Backblaze B2, Google Drive, Microsoft OneDrive, and Hubic) and offers all essential features of a modern backup tool:
There are 3 core advantages of Duplicacy over any other open-source or commercial backup tools:
* Incremental backup: only back up what has been changed
* Full snapshot: although each backup is incremental, it must behave like a full snapshot for easy restore and deletion
* Deduplication: identical files must be stored as one copy (file-level deduplication), and identical parts from different files must be stored as one copy (block-level deduplication)
* Encryption: encrypt not only file contents but also file paths, sizes, times, etc.
* Deletion: every backup can be deleted independently without affecting others
* Concurrent access: multiple clients can back up to the same storage at the same time
* Snapshot migration: all or selected snapshots can be migrated from one storage to another
* Duplicacy is the *only* cloud backup tool that allows multiple computers to back up to the same cloud storage, taking advantage of cross-computer deduplication whenever possible, without direct communication among them. This feature basically turns any cloud storage server supporting only a basic set of file operations into a sophisticated deduplication-aware server.
The key idea of **[Lock-Free Deduplication](https://github.com/gilbertchen/duplicacy/wiki/Lock-Free-Deduplication)** can be summarized as follows:
* Unlike other chunk-based backup tools where chunks are grouped into pack files and a chunk database is used to track which chunks are stored inside each pack file, Duplicacy takes a database-less approach where every chunk is saved independently using its hash as the file name to facilitate quick lookups. The lack of a centralized chunk database not only makes the implementation less error-prone, but also produces a highly maintainable piece of software with plenty of room for development of new features and usability enhancements.
* Use variable-size chunking algorithm to split files into chunks
* Store each chunk in the storage using a file name derived from its hash, and rely on the file system API to manage chunks without using a centralized indexing database
* Apply a *two-step fossil collection* algorithm to remove chunks that become unreferenced after a backup is deleted
* Duplicacy is fast. While the performance wasn't the top-priority design goal, Duplicacy has been shown to outperform other backup tools by significant margins, as indicated by the following results obtained from a [benchmarking experiment](https://github.com/gilbertchen/benchmarking) backing up the [Linux code base](https://github.com/torvalds/linux) using Duplicacy and 3 other open-source backup tools.
[![Comparison of Duplicacy, restic, Attic, duplicity](https://github.com/gilbertchen/duplicacy/blob/master/images/duplicacy_benchmark_speed.png "Comparison of Duplicacy, restic, Attic, duplicity")](https://github.com/gilbertchen/benchmarking)
## Getting Started
@@ -32,41 +26,36 @@ The key idea of **[Lock-Free Deduplication](https://github.com/gilbertchen/dupli
## Storages
With Duplicacy, you can back up files to local or networked drives, SFTP servers, or many cloud storage providers. The following table compares the costs of all cloud storages supported by Duplicacy.
Duplicacy currently provides the following storage backends:
| Type | Storage (monthly) | Upload | Download | API Charge |
|:------------:|:-------------:|:------------------:|:--------------:|:-----------:|
| Amazon S3 | $0.023/GB | free | $0.090/GB | [yes](https://aws.amazon.com/s3/pricing/) |
| Wasabi | $3.99 first 1TB <br> $0.0039/GB additional | free | $0.04/GB | no |
| DigitalOcean Spaces| $5 first 250GB <br> $0.020/GB additional | free | first 1TB free <br> $0.01/GB additional| no |
| Backblaze B2 | 10GB free <br> $0.005/GB | free | 1GB free/day <br> $0.02/GB | [yes](https://www.backblaze.com/b2/b2-transactions-price.html) |
| Google Cloud Storage| $0.026/GB | free |$ 0.12/GB | [yes](https://cloud.google.com/storage/pricing) |
| Google Drive | 15GB free <br> $1.99/100GB <br> $9.99/TB | free | free | no |
| Microsoft Azure | $0.0184/GB | free | free | [yes](https://azure.microsoft.com/en-us/pricing/details/storage/blobs/) |
| Microsoft OneDrive | 5GB free <br> $1.99/50GB <br> $5.83/TB | free | free | no |
| Dropbox | 2GB free <br> $8.25/TB | free | free | no |
| Hubic | 25GB free <br> €1/100GB <br> €5/10TB | free | free | no |
* Local disk
* SFTP
* Dropbox
* Amazon S3
* Wasabi
* DigitalOcean Spaces
* Google Cloud Storage
* Microsoft Azure
* Backblaze B2
* Google Drive
* Microsoft OneDrive
* Hubic
* OpenStack Swift
* WebDAV (under beta testing)
* pcloud (via WebDAV)
* Box.com (via WebDAV)
Please consult the [wiki page](https://github.com/gilbertchen/duplicacy/wiki/Storage-Backends) on how to set up Duplicacy to work with each cloud storage.
It should be noted that their performances vary a lot. A [performance comparison](https://github.com/gilbertchen/cloud-storage-comparison) of these storages measured the running times (in seconds) of backing up and restoring the [Linux code base](https://github.com/torvalds/linux) as follows:
For reference, the following chart shows the running times (in seconds) of backing up the [Linux code base](https://github.com/torvalds/linux) to each of those supported storages:
| Storage | initial backup | 2nd | 3rd | initial restore | 2nd | 3rd |
|:--------------------:|:------:|:----:|:-----:|:----:|:-----:|:----:|
| SFTP | 31.5 | 6.6 | 20.6 | 22.5 | 7.8 | 18.4 |
| Amazon S3 | 41.1 | 5.9 | 21.9 | 27.7 | 7.6 | 23.5 |
| Wasabi | 38.7 | 5.7 | 31.7 | 25.7 | 6.5 | 23.2 |
| DigitalOcean Spaces | 51.6 | 7.1 | 31.7 | 29.3 | 6.4 | 27.6 |
| Backblaze B2 | 106.7 | 24.0 | 88.2 | 67.9 | 14.4 | 39.1 |
| Google Cloud Storage | 76.9 | 11.9 | 33.1 | 39.5 | 9.9 | 26.2 |
| Google Drive | 139.3 | 14.7 | 45.2 | 129.4 | 17.8 | 54.4 |
| Microsoft Azure | 35.0 | 5.4 | 20.4 | 30.7 | 7.1 | 21.5 |
| Microsoft OneDrive | 250.0 | 31.6 | 80.2 | 333.4 | 26.2 | 82.0 |
| Dropbox | 267.2 | 35.8 | 113.7 | 164.0 | 31.6 | 80.3 |
For more details please visit https://github.com/gilbertchen/cloud-storage-comparison.
[![Comparison of Cloud Storages](https://github.com/gilbertchen/duplicacy/blob/master/images/duplicacy_benchmark_cloud.png "Comparison of Cloud Storages")](https://github.com/gilbertchen/cloud-storage-comparison)
## Feature Comparison with Other Backup Tools
For complete bencmarh results please visit https://github.com/gilbertchen/cloud-storage-comparison.
## Comparison with Other Backup Tools
[duplicity](http://duplicity.nongnu.org) works by applying the rsync algorithm (or more specific, the [librsync](https://github.com/librsync/librsync) library)
to find the differences from previous backups and only then uploading the differences. It is the only existing backup tool with extensive cloud support -- the [long list](http://duplicity.nongnu.org/duplicity.1.html#sect7) of storage backends covers almost every cloud provider one can think of. However, duplicity's biggest flaw lies in its incremental model -- a chain of dependent backups starts with a full backup followed by a number of incremental ones, and ends when another full backup is uploaded. Deleting one backup will render useless all the subsequent backups on the same chain. Periodic full backups are required, in order to make previous backups disposable.
@@ -101,30 +90,6 @@ The following table compares the feature lists of all these backup tools:
| Cloud Support | Extensive | No | No | No | S3, B2, OpenStack | **S3, GCS, Azure, Dropbox, Backblaze B2, Google Drive, OneDrive, and Hubic**|
| Snapshot Migration | No | No | No | No | No | **Yes** |
## Performance Comparison with Other Backup Tools
Duplicacy is not only more feature-rich but also faster than other backup tools. The following table lists the running times in seconds of backing up the [Linux code base](https://github.com/torvalds/linux) using Duplicacy and 3 other tools. Clearly Duplicacy is the fastest by a significant margin.
| | Duplicacy | restic | Attic | duplicity |
|:------------------:|:----------------:|:----------:|:----------:|:-----------:|
| Initial backup | 13.7 | 20.7 | 26.9 | 44.2 |
| 2nd backup | 4.8 | 8.0 | 15.4 | 19.5 |
| 3rd backup | 6.9 | 11.9 | 19.6 | 29.8 |
| 4th backup | 3.3 | 7.0 | 13.7 | 18.6 |
| 5th backup | 9.9 | 11.4 | 19.9 | 28.0 |
| 6th backup | 3.8 | 8.0 | 16.8 | 22.0 |
| 7th backup | 5.1 | 7.8 | 14.3 | 21.6 |
| 8th backup | 9.5 | 13.5 | 18.3 | 35.0 |
| 9th backup | 4.3 | 9.0 | 15.7 | 24.9 |
| 10th backup | 7.9 | 20.2 | 32.2 | 35.0 |
| 11th backup | 4.6 | 9.1 | 16.8 | 28.1 |
| 12th backup | 7.4 | 12.0 | 21.7 | 37.4 |
For more details and other speed comparison results, please visit https://github.com/gilbertchen/benchmarking. There you can also find test scripts that you can use to run your own experiments.
## License
* Free for personal use or commercial trial