diff --git a/Back-up-to-multiple-storages.md b/Back-up-to-multiple-storages.md index 6e5c84a..05bbee5 100644 --- a/Back-up-to-multiple-storages.md +++ b/Back-up-to-multiple-storages.md @@ -1,3 +1,98 @@ -The wiki has moved to a new home on the [Duplicacy Forum](https://forum.duplicacy.com). +There is always a danger of losing all your data if it is stored only in one storage. It is therefore highly recommended to back up your data to at least two different storage providers. Duplicacy provides a unique tool to make this backup redundancy solution super easy. -Your page is available at: https://forum.duplicacy.com/t/back-up-to-multiple-storages/1075 \ No newline at end of file +When a repository is initialized you always provide the url to the default storage, which will be initialized if it hasn't been. (:bulb: Even though you are not told directly, this storage has the name `default` for easy access.) + +``` +cd /path/to/repository +duplicacy init repository_id onsite_storage_url +``` + +You can add additional storage providers to this repository by running the [[add]] command. +The first argument to `add` is the name of the new storage that can be used by other commands (instead of the normal `default`). + +``` +# add an additional storage named `offsite_storage`: + +duplicacy add offsite_storage repository_id offsite_storage_url +``` + +Now when you run the [[backup]] command, by default the backup will be stored to the default storage: + +``` +duplicacy backup +``` + +Therefore, when you want to back up to the new storage, you have to specifically select it when running `backup`, like this: + +``` +duplicacy backup -storage offsite_storage +``` + +This works, but is not the best practice for two reasons: +First, you are running the backup twice: once for the `default` storage and a second time for the `offsite_storage` storage, thus consuming double the CPU and disk resources and taking longer for what is _a single_ redundant backup. +Second, if some files change between these two backup commands (e.g. you edit the name of a picture, or the rating of a song) then you would get two different backups, making the management of backups on these two storages a bit more complex. + +--- + +The recommended way is to use the [[copy]] command to `copy` from the `default` storage to the additional storage (`offsite_storage`). This way, you'll always get **identical** backups on both storage providers: + +``` +duplicacy copy -from default -to offsite_storage +``` + +Of course you may be able to use third-party tools, such as rsync or rclone, to copy the content of one storage to another (:grey_exclamation: in this case don't forget about using `--bit-identical`, as explained in the [[copy]] command details). +But compared with rsync/rclone, the `copy` command can be used to copy only a selected set of revisions instead of everything. Moreover, if the two storages are set up differently (such as when one is encrypted and the other is not) then the `copy` command is your only choice. + +--- + +It is also possible to run the `copy` command directly on your onsite storage server.
All you need to do is to create a dummy repository there, and then initialize it with the same default storage (but as a local disk) and add the same additional storage: + +``` +# log in to your local storage server +mkdir -p /path/to/dummy/repository +cd /path/to/dummy/repository +duplicacy init repository_id onsite_storage_url +duplicacy add -copy default --bit-identical offsite_storage repository_id offsite_storage_url +duplicacy copy -from default -to offsite_storage +``` + +This not only frees your work computer from running the `copy` command, but also speeds up the copy process, since now Duplicacy can read chunks from a local disk +```(local server -> router -> offsite_storage)``` +instead of over the network +```(local server -> router -> your computer -> router -> offsite_storage)```. + +### Multiple cloud storages without a local storage + +If you're backing up to two cloud storage providers without a local storage, issuing a [[copy]] command between the two cloud storages will cause all data to be downloaded from one cloud storage, and uploaded to the other. + +This can be slow, and may incur extra download costs. +To avoid this - while maintaining identical backups on both storage destinations - you can add the destination storage twice, with two different snapshot ids. + +One is used to issue "direct" (you can also see this snapshot id as _dummy_ -- read on) backups to the destination cloud storage, and the other is used to `copy` the _real_ backups from the source storage to the destination storage. + +This should work better because a "direct" (_dummy_) backup should hopefully have many duplicate chunks with the copied (_real_) backup performed later by the `copy` operation (if there are file changes between the direct backup and the copy). + +Since the upload of files to the second storage is done in the _backup to dummy snapshot_ instead of in the _copy to real snapshot_, when the copy command is run only the (very few) chunks modified between the backups will have to be downloaded from the first storage, thus significantly reducing the amount of traffic needed for download. + +(:bulb: this trick is based on the knowledge that most storage providers offer free upload and only the download costs money, hence you should check if this is the case for your providers as well!) + +``` +duplicacy init -storage-name backblaze my-backups b2://bucket +duplicacy add -copy backblaze --bit-identical wasabi_real_storage my-backups wasabi://bucket # used for copying the real backups +duplicacy add -copy backblaze --bit-identical wasabi_dummy_storage my-backups_dummy wasabi://bucket # used for direct/dummy backup +duplicacy backup -storage backblaze +duplicacy backup -storage wasabi_dummy_storage +duplicacy copy -from backblaze -to wasabi_real_storage +``` + +### Pruning +It is worth mentioning that the `copy` command is non-destructive, so data pruned from one storage will not be automatically pruned on the copy. + +Example: `duplicacy copy -from onsite -to offsite` + +For a system running regular [[copy]] and [[prune]] operations, the following scenarios are possible: + +- If pruning onsite only, offsite storage will never be pruned. +- If onsite pruning is equal to offsite pruning, this is perfectly fine. +- If onsite pruning is more aggressive than offsite pruning, this would work (but is not a great idea). +- If onsite pruning is less aggressive than offsite pruning, this would work (but it would be inefficient to keep copying data that will be imminently pruned).
If you wanted to keep the offsite storage lighter than onsite you would need to use specific revision numbers during copy. \ No newline at end of file diff --git a/Backing-Up-Large-Datasets-to-Both-Local-and-Backblaze-B2-Destinations-Using-Duplicacy-CLI-on-Linux.md b/Backing-Up-Large-Datasets-to-Both-Local-and-Backblaze-B2-Destinations-Using-Duplicacy-CLI-on-Linux.md index e111373..be98f8c 100644 --- a/Backing-Up-Large-Datasets-to-Both-Local-and-Backblaze-B2-Destinations-Using-Duplicacy-CLI-on-Linux.md +++ b/Backing-Up-Large-Datasets-to-Both-Local-and-Backblaze-B2-Destinations-Using-Duplicacy-CLI-on-Linux.md @@ -1,3 +1,194 @@ -The wiki has moved to a new home on the [Duplicacy Forum](https://forum.duplicacy.com). +Backblaze B2 has become a popular cost-effective online storage mechanism, and is typically less expensive than competing services such as Amazon S3. + +This wiki describes how to take a large amount of data and back it up to both a local backup, such as an external hard drive, and a Backblaze B2 account using the CLI on Linux. + +The first step is to create a Backblaze account, and sign up for B2 storage. You will receive a B2 Account ID and a B2 Application Key from Backblaze. Next, from Backblaze’s website, create a B2 bucket. The name for the bucket must be unique across all buckets by all Backblaze B2 users. Substitute this bucket name for the placeholder MY-B2-BUCKET-NAME in this wiki. (Backblaze B2 buckets only allow alphanumeric characters and hyphens.) + +And last, if you’re going to be backing up more than the free amount of B2 storage (10 GB as of this writing), then on Backblaze’s website you will need to go to B2 Cloud Storage -> Caps and Alerts, and adjust the maximum daily storage cap. As of this writing (2018-Jun) the pricing for Backblaze B2 storage is $0.005/GB/month. Assuming 30 days/month, this amounts to $0.1667/TB/day, so a cap of $2 per day in storage cost allows for up to 12 TB. + +The current download costs are $0.01/GB past 1 GB per day, so the download caps may also need to be adjusted when restoring data, for instance. + +Identify the directory to be backed up: + +`[root@mycomputer ~]# cd /path/to/my/data` +`[root@mycomputer data]# pwd` +`/path/to/my/data` + +Let’s see how much data is to be backed up: + +`[root@mycomputer data]# du -s -h` +`3.4T` + +(Your results will reflect the amount of data contained under your current directory, which is likely to be different than this sample amount.) + +The first step will be to initialize the duplicacy backups at the directory to be backed up (“repository” in duplicacy terminology). The “-e” option indicates that this data will be encrypted with a password, so enter (and re-enter) the desired encryption password when prompted with the duplicacy init command. 
Assuming a destination for the backed up data is “/path/to/local/backup/destination”, you would issue the following command: + +`[root@mycomputer data]# duplicacy init -e data_backup /path/to/local/backup/location` +`Enter storage password for /path/to/local/backup/location /:*********************************` +`Re-enter storage password:*********************************` +`/path/to/my/data will be backed up to /path/to/local/backup/location with id data_backup` + +`[root@mycomputer data]# cat .duplicacy/preferences` +`[` + `{` + `"name": "default",` + `"id": "data_backup",` + `"storage": "/path/to/local/backup/location/",` + `"encrypted": true,` + `"no_backup": false,` + `"no_restore": false,` + `"no_save_password": false` + `}` +`]` + + +Now change the name “default” to something more descriptive. This will be the locally-connected backup (the external hard drive), so rename it to describe exactly what this storage is: + +`[root@mycomputer data]# perl -pi.bak -e 's/default/my_external_hard_drive_backup/' .duplicacy/preferences` + +The preceding command uses a Perl one-liner to substitute text in-place in a file. The command also creates a backup file, .duplicacy/preferences.bak, that contains the original preferences file (in case something goes wrong here). + +`[root@mycomputer data]# cat .duplicacy/preferences` + +`[` + `{` + `"name": "my_external_hard_drive_backup",` + `"id": "data_backup",` + `"storage": "/path/to/local/backup/location/",` + `"encrypted": true,` + `"no_backup": false,` + `"no_restore": false,` + `"no_save_password": false` + `}` +`]` + +Now add a second storage location for the Backblaze B2 bucket: + + +`[root@mycomputer data]# duplicacy add -e backblaze_b2_data data_backup b2://MY-B2-BUCKET-NAME` + +(Consider also adding the ‘-copy’ option to make the B2 and local backups copy-compatible.) 
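For reference, a copy-compatible version of that `add` command might look like the line below. This is only a sketch based on the documented `-copy` and `-bit-identical` options of the `add` command, and it has not been verified as part of this walkthrough:

`[root@mycomputer ~]# duplicacy add -e -copy my_external_hard_drive_backup -bit-identical backblaze_b2_data data_backup b2://MY-B2-BUCKET-NAME`

Making the two storages copy-compatible allows `duplicacy copy` to replicate existing revisions from the local backup to B2 later, instead of running two independent backups of the same data.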
+ +Now enter the B2 account ID, application key, and encryption password when prompted: + +`Enter Backblaze Account ID:MY_BACKBLAZE_ACCOUNT_ID` +`Enter Backblaze Application key:MY_BACKBLAZE_APPLICATION_KEY` +`Enter storage password for b2://MY-B2-BUCKET-NAME:*********************************` +`Re-enter storage password:*********************************` +`/path/to/my/data will be backed up to b2://MY-B2-BUCKET-NAME with id data_backup` + + +`[root@mycomputer data]# cat .duplicacy/preferences` + + `[` + `{` + `"name": "my_external_hard_drive_backup",` + `"id": "data_backup",` + `"storage": /path/to/local/backup/location/",` + `"encrypted": true,` + `"no_backup": false,` + `"no_restore": false,` + `"no_save_password": false` + `},` + `{` + `"name": "backblaze_b2_data",` + `"id": "data_backup",` + `"storage": "b2://MY-B2-BUCKET-NAME",` + `"encrypted": true,` + `"no_backup": false,` + `"no_restore": false,` + `"no_save_password": false` + `}` +`]` + +Now let’s load the Backblaze account ID, application key, and encryption password for the B2 storage into the preferences file to enable set-and-forget backups: + +`[root@mycomputer data]# duplicacy set -storage backblaze_b2_data -key b2_id -value MY_BACKBLAZE_ACCOUNT_ID` +`New options for storage b2://MY-B2-BUCKET-NAME have been saved` + +`[root@mycomputer data]# duplicacy set -storage backblaze_b2_data -key b2_key -value MY_BACKBLAZE_APPLICATION_KEY` +`New options for storage b2://MY-B2-BUCKET-NAME have been saved` + +`[root@mycomputer data]# duplicacy set -storage backblaze_b2_data -key password -value "MY_ENCRYPTION_PASSWORD"` +`New options for storage b2://MY-B2-BUCKET-NAME have been saved` + +`[root@mycomputer data]# cat .duplicacy/preferences` + + `[` + `{` + `"name": "my_external_hard_drive_backup",` + `"id": "data_backup",` + `"storage": "/path/to/local/backup/location/",` + `"encrypted": true,` + `"no_backup": false,` + `"no_restore": false,` + `"no_save_password": false` + `},` + `{` + `"name": "backblaze_b2_data",` + `"id": "data_backup",` + `"storage": "b2://MY-B2-BUCKET-NAME",` + `"encrypted": true,` + `"no_backup": false,` + `"no_restore": false,` + `"no_save_password": false,` + `"keys": {` + `"b2_id": "MY_BACKBLAZE_ACCOUNT_ID",` + `"b2_key": "MY_BACKBLAZE_APPLICATION_KEY",` + `"password": "MY_ENCRYPTION_PASSWORD"` + `}` + `}` +`]` + + +The preferences file now has enough information stored in it to be able to backup to a B2 account and not require user interaction. + +Since the preferences file has passwords, let’s lock down the access to it: + + +`[root@mycomputer data]# chmod -R 600 .duplicacy/` + +Now let’s start the local backup first. Depending on the amount of data to be backed up, the computer’s processing power, and the data connection speed to the external hard drive, it may take some time (possibly on the order of several hours) to complete. + +`[root@mycomputer data]# duplicacy backup -threads 4 -stats -storage my_external_hard_drive_backup` + +Enter the encryption password when prompted. + +`Storage set to /path/to/local/backup/location/` +`No previous backup found` +`Indexing /path/to/my/data` + `(lots of “packed” lines…)` +`Backup for /path/to/my/data at revision 1 completed` + +You can experiment with the number of threads used for this task (we assumed 4 here) to minimize the time required for the backup. + +Now that we have a local backup using duplicacy, it’s time to create the online backup to the B2 bucket. 
Depending on the amount of data being backed up and the upload speed, the duration required for this might be measured in weeks or even months. The following command has been tested using bash and might not apply to all possible shells: + +`[root@mycomputer data]# nohup duplicacy backup -threads 2 -stats -storage backblaze_b2_data > /path/to/logfile/for/this/backup 2>&1 &` +`[1] 27719` +`[root@mycomputer data]#` + +Let’s dissect this last command: +`nohup`: Continue running this process even after the user logs out. This is helpful for backing up very large datasets, particularly with slower upload speeds. +`duplicacy backup -threads 2 -stats -storage backblaze_b2_data > /path/to/logfile/for/this/backup`: Initiate a backup using 2 threads, show stats at the end, and backup to the B2 bucket that was set up previously. Send the outputs to the path to the logfile indicated. +`2>&1`: Send outputs to stderr to stdout, so anything sent to stderr gets redirected to the logfile as well +`&`: Start this as a background process + + +You can periodically monitor the backup by looking at the current logfile created: + +`[root@mycomputer data]# tail /path/to/logfile/for/this/backup` +`Uploaded chunk 4062 size 14940431, 1.24MB/s 21 days 16:40:28 0.8%` +`Uploaded chunk 4067 size 2498829, 1.24MB/s 21 days 16:46:32 0.8%` +`Uploaded chunk 4064 size 3486015, 1.24MB/s 21 days 16:47:06 0.8%` +`Uploaded chunk 4068 size 2619000, 1.24MB/s 21 days 16:45:05 0.8%` +`Uploaded chunk 4069 size 3637340, 1.24MB/s 21 days 16:55:20 0.8%` +`Uploaded chunk 4066 size 12429227, 1.24MB/s 21 days 16:54:05 0.8%` +`Uploaded chunk 4070 size 9584627, 1.24MB/s 21 days 16:41:25 0.8%` +`Uploaded chunk 4073 size 2962527, 1.24MB/s 21 days 16:48:43 0.8%` +`Uploaded chunk 4072 size 5215431, 1.24MB/s 21 days 16:48:39 0.8%` +`Uploaded chunk 4071 size 6534516, 1.24MB/s 21 days 16:48:33 0.8%` + +(This is sample data – your log should look similar, but obviously not the same as this.) + -Your page is available at: https://forum.duplicacy.com/t/backing-up-large-datasets-to-both-local-and-backblaze-b2-destinations-using-duplicacy-cli-on-linux/1076 \ No newline at end of file diff --git a/Cache.md b/Cache.md index 5da8e22..7749ed0 100644 --- a/Cache.md +++ b/Cache.md @@ -1,3 +1,7 @@ -The wiki has moved to a new home on the [Duplicacy Forum](https://forum.duplicacy.com). +Duplicacy maintains a local cache under the `.duplicacy/cache` folder in the repository. Only snapshot chunks may be stored in this local cache, and file chunks are never cached. -Your page is available at: https://forum.duplicacy.com/t/cache-usage-details/1079 \ No newline at end of file +At the end of a *backup* operation, Duplicacy will clean up the local cache in such a way that only chunks composing the snapshot file from the last backup will stay in the cache. All other chunks will be removed from the cache. However, if the *prune* command has been run before (which will leave a the `.duplicacy/collection` folder in the repository), then the *backup* command won't perform any cache cleanup and instead defer that to the *prune* command. + +At the end of a prune operation, Duplicacy will remove all chunks from the local cache except those composing the snapshot file from the last backup (those that would be kept by the *backup* command), as well as chunks that contain information about chunks referenced by *all* backups from *all* repositories connected to the same storage url. 
+ +Other commands, such as *list* and *check*, do not clean up the local cache at all, so the local cache may keep growing if many of these commands run consecutively. However, once a *backup* or a *prune* command is invoked, the local cache should shrink to its normal size. \ No newline at end of file diff --git a/Chunk-Size.md b/Chunk-Size.md index e2e441e..bf55bdc 100644 --- a/Chunk-Size.md +++ b/Chunk-Size.md @@ -1,3 +1,33 @@ -The wiki has moved to a new home on the [Duplicacy Forum](https://forum.duplicacy.com). +Duplicacy adopts a unique pack-and-split method to split files into chunks. First, all files to be backed up are packed together, in alphabetical order, as if it were building an imaginary tar file. Of course, this is only conceptual, otherwise Duplicacy would quickly run out of memory in most use cases. This imaginary tar file is then split into chunks, using a variable size chunking algorithm. The default settings of the [[init]] command would generate chunks that are 4M bytes on average (although the actual averages may vary), at least 1M bytes and at most 16M bytes. -Your page is available at: https://forum.duplicacy.com/t/chunk-size-details/1082 \ No newline at end of file +This pack-and-split method has two implications. First, any files smaller than the minimum chunk size will not individually benefit from deduplication. For instance, any change on a file that is 100K bytes or so will cause the entire file to be uploaded again (as a separate chunk, or part of a chunk if there are other changed files). On the other hand, when a directory consisting of many small files is to be moved or renamed, because these small files will be packed in the same order, most of the chunks generated after the move or rename will remain unchanged, except for a few at the beginning and at the end that are likely affected by files in adjacent directories. + +Another implication is that chunks do not usually align with file boundaries. As a result, when a file larger than the average chunk size is moved or renamed, the pack-and-split procedure will produce several new chunks at the beginning and the end of the file. At the same time, if the directory where the file originally resides is backed up again (using the `-hash` option), then the 'hole' left by this file will also cause several new chunks to be generated. There have been lengthy discussions on this topic, such as [here](https://github.com/gilbertchen/duplicacy/issues/334) and [here](https://forum.duplicacy.com/t/system-design-performance-issues/632). + +While there are techniques to achieve 'perfect' deduplication, keep in mind that the amount of overhead from such deduplication inefficiency is not unbounded. Specifically, the overhead is roughly proportional to the chunk size: + +``` + overhead = a * c * chunk_size +``` + +where `c` is the number of changes and `a` is a small number representing the number of new chunks caused by each change. Therefore, by reducing the average chunk size, the deduplication ratio can be improved to a satisfactory level: + +``` +duplicacy init -c 1M repository_id storage_url +``` + +A chunk size smaller than 1M bytes isn't generally recommended, because the overhead from the chunk transfer as well as the chunk lookup before uploading each chunk will start to dominate with small chunks (which however can be partially alleviated by using multiple uploading threads).
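As a rough back-of-the-envelope illustration of the overhead formula above (the values of `a` and `c` below are assumptions chosen only for the example):

```
# assume a = 2 new chunks per change and c = 50 changed or moved files
#
#   default 4M average chunk size:  2 * 50 * 4M  = ~400M of extra chunk data
#   reduced 1M average chunk size:  2 * 50 * 1M  = ~100M of extra chunk data
```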
+ +## Fixed Size Chunking + +Certain types of files, such as virtual machine disks, databases, and encrypted containers, are always updated in-place, and never subject to insertions and deletions. For these kinds of files, the default variable size chunking algorithm in Duplicacy is overly complicated, as it incurs the unnecessary overhead of calculating the rolling hash. The recommended configuration is to set all three chunk size parameters in the [[init]] or [[add]] command to the same value: + +``` +duplicacy init -c 1M -min 1M -max 1M repository_id storage_url +``` + +Duplicacy will then switch to the fixed size chunking algorithm, which is faster and leads to higher deduplication ratios. In fact, [Vertical Backup](https://verticalbackup.com), a special edition of Duplicacy built for VMware ESXi to back up virtual machines, uses this configuration by default, and it has proven to work well in practice. + +One important thing to note is that, when the fixed size chunking algorithm is chosen, Duplicacy doesn't deploy the default pack-and-split method. Instead, no packing is performed and each file is split into chunks individually. A file smaller than the chunk size will be uploaded in a single chunk. Therefore, the fixed size chunking algorithm is appropriate only when there are a limited number of files. + + diff --git a/Encryption.md b/Encryption.md index 37aaa41..1e26619 100644 --- a/Encryption.md +++ b/Encryption.md @@ -1,3 +1,18 @@ -The wiki has moved to a new home on the [Duplicacy Forum](https://forum.duplicacy.com). +When encryption is enabled (by the -e option with the *init* or *add* command), Duplicacy will generate 4 random 256-bit keys: -Your page is available at: https://forum.duplicacy.com/t/encryption-of-the-storage/1085 \ No newline at end of file +* *Hash Key*: for generating a chunk hash from the content of a chunk +* *ID Key*: for generating a chunk id from a chunk hash +* *Chunk Key*: for encrypting chunk files +* *File Key*: for encrypting non-chunk files such as snapshot files. + +Here is a diagram showing how these keys are used: + +[[https://github.com/gilbertchen/duplicacy-beta/blob/master/images/duplicacy_encryption.png|alt="encryption"]] + +Chunk hashes are used internally and stored in the snapshot file. They are never exposed unless the snapshot file is decrypted. Chunk ids are used as the file names for the chunks and are therefore exposed. When the *cat* command is used to print out a snapshot file, the chunk hashes stored in the snapshot file will be converted into chunk ids first, which are then displayed instead. + +Chunk content is encrypted by AES-GCM, with an encryption key that is the HMAC-SHA256 of the chunk hash with the *Chunk Key* as the secret key. + +The snapshot file is encrypted by AES-GCM too, using an encryption key that is the HMAC-SHA256 of the file path with the *File Key* as the secret key. + +These four random keys are saved in a file named 'config' in the storage, encrypted with a master key derived from the PBKDF2 function on the storage password chosen by the user. \ No newline at end of file diff --git a/Exit-Codes.md b/Exit-Codes.md index 4a2b974..2f77be9 100644 --- a/Exit-Codes.md +++ b/Exit-Codes.md @@ -1,3 +1,7 @@ -The wiki has moved to a new home on the [Duplicacy Forum](https://forum.duplicacy.com). +All Duplicacy commands return 0 when successful, and a non-zero value otherwise.
The specific non-zero codes are: -Your page is available at: https://forum.duplicacy.com/t/exit-codes-details/1086 \ No newline at end of file +* `1`: the command was interrupted by user +* `2`: the command arguments are malformed +* `3`: invalid value for a command argument +* `100`: the command encountered an error in the Duplicacy code (most run-time errors, including those from failed connections, will emit this exit code) +* `101`: the command encountered an error in a dependency library used by Duplicacy \ No newline at end of file diff --git a/Global-Options.md b/Global-Options.md index 03c07e4..29a4926 100644 --- a/Global-Options.md +++ b/Global-Options.md @@ -1,3 +1,83 @@ -The wiki has moved to a new home on the [Duplicacy Forum](https://forum.duplicacy.com). +
-Your page is available at: https://forum.duplicacy.com/t/global-options-details/1087 \ No newline at end of file +These options apply to all duplicacy commands and must be placed before any command. + +# Quick overview +``` + -verbose, -v show more detailed information + -debug, -d show even more detailed information, useful for debugging + -log enable log-style output + -stack print the stack trace when an error occurs + -no-script do not run scripts before or after command execution + -background read passwords, tokens, or keys only from keychain/keyring or env + -profile enable the profiling tool and listen on the specified address:port + -comment add a comment to identify the process + -help, -h show help +``` + +# Usage + +` duplicacy [global options] command` + +# Options + +--- +### `-verbose, -v` + +Show more detailed messages. For the highest level of detail, choose `-debug`. By default, only INFO, WARNING, and ERROR messages are displayed. + +--- +### `-debug, -d` + +Show DEBUG level messages (the highest, most detailed level of messages). By default, only INFO, WARNING, and ERROR messages are displayed. + +--- +### `-log` + +Show the timestamp, level, and the message id of each log message. + +--- +### `-stack` + +The `-stack` option is used to dump the stack trace when an error occurs to help locate where the error is. + +--- +### `-no-script` + +Using this option will stop [Pre Command and Post Command Scripts](https://forum.duplicacy.com/t/pre-command-and-post-command-scripts/1100) from running. This is usually used to avoid an infinite loop of command execution. + +--- +### :no_entry_sign: :no_entry:`-background` + +The `-background` option will instruct Duplicacy not to ask for interactive password input. As a result, Duplicacy will read all credentials only from the keychain/keyring or the environment variables. If a credential can't be found, an error will be reported. + +This is how the duplicacy GUI invokes the CLI by default. + +:no_entry_sign: This option is useful only for the duplicacy GUI and therefore should **never** be needed when running the duplicacy CLI manually! + +--- +### `-profile <address:port>` + +With the `-profile` option, you can open `http://address:port/debug/pprof/` in a browser to profile a running Duplicacy instance. + +Please refer to https://golang.org/pkg/net/http/pprof/ for instructions. + +--- +### `-comment` + +The `-comment` option was introduced to allow Duplicacy processes to be identified by their arguments, for example when using `ps`. + +##### Example + +Suppose you have two duplicacy processes running: +``` +duplicacy -comment LONG_OPERATION check -all & +duplicacy check -all & +``` +When you run `ps ax | grep duplicacy`, there will be nothing to help differentiate the two processes. +But when you use this option, `ps ax | grep LONG_OPERATION` will find the desired process. + +--- +### `-help, -h` + +Show the help. \ No newline at end of file diff --git a/Home.md b/Home.md index f0421fe..7769e7d 100644 --- a/Home.md +++ b/Home.md @@ -1,3 +1,52 @@ -The wiki has moved to a new home on the [Duplicacy Forum](https://forum.duplicacy.com). +Welcome to the duplicacy wiki! -Your page is available at: [Duplicacy Guide](https://forum.duplicacy.com/t/duplicacy-user-guide/1197).
\ No newline at end of file +[[Installation]] + +[[Quick Start]] + +[[Storage Backends]] + +Commands + * [[init]] - initialize a new repository and storage + * [[backup]] - save a snapshot of the repository to the storage + * [[restore]] - restore files + * [[list]] - list snapshots and files + * [[check]] - check the integrity of snapshots + * [[prune]] - prune snapshots by revision number, tag, or retention policy + * [[cat]] - print the specified file or snapshot + * [[history]] - show the history of a file + * [[diff]] - compare two snapshots or two revisions of a file + * [[password]] - Change the storage password + * [[add]] - Add an additional storage for the existing repository + * [[set]] - Change storage options + * [[copy]] - Copy snapshots between compatible storages + * [[benchmark]] - Test disk access and network transfer speeds + +Advanced Usage +* [[Global Options]] +* [[Exit Codes]] +* [[Include/Exclude Patterns]] +* [[Managing Passwords]] +* [[Cache]] +* [[Pre-Command and Post-Command Scripts]] +* [[Chunk Size]] +* [[Missing Chunks]] +* [[RSA encryption]]] + +Use Cases +* [[Restore to a different folder or computer]] +* [[Back up to multiple storages]] +* [[Multiple repositories with different accounts of the same cloud storage service]] +* [[Move .duplicacy folder]] +* [Monitor Duplicacy for Errors etc](https://forum.duplicacy.com/tags/monitoring) +* [Schedule Duplicacy to run at certain times](https://forum.duplicacy.com/tags/schedule) +* [Monitor backups status using healthchecks.io](https://forum.duplicacy.com/t/monitor-backups-status-windows-cli/2263) + +Design and Implementation +* [[Lock-Free Deduplication]] +* [[Snapshot Format]] +* [[Encryption]] + +Utilities + +* [[Scripts and utilities index]] \ No newline at end of file diff --git a/Include-Exclude-Patterns.md b/Include-Exclude-Patterns.md index 4018ccb..a26e76a 100644 --- a/Include-Exclude-Patterns.md +++ b/Include-Exclude-Patterns.md @@ -1,3 +1,209 @@ -The wiki has moved to a new home on the [Duplicacy Forum](https://forum.duplicacy.com). +For the *backup* command, the include/exclude patterns are read from a file named *filters* under the *.duplicacy* directory. For the *restore* command, the include/exclude patterns are specified as the command line arguments. -Your page is available at: https://forum.duplicacy.com/t/filters-include-exclude-patterns/1089 \ No newline at end of file +Duplicacy offers two different methods for providing include/exclude filters, wildcard matching and regular expression matching. You may use one method exclusively or you may combine them as you deem necessary. + +All paths are relative to the repository (the folder you execute duplicacy from), without a leading "/". As the upmost folder on Windows is a drive, this means drive letters are not part of the path of a pattern. The path separator is always a "/", even on Windows. Paths are case sensitive. + +## 1. Wildcard Matching + +An include pattern starts with "+", and an exclude pattern starts with "-". Patterns may contain wildcard characters "\*" which matches a path string of any length, and "?" matches +a single character. Note that both "\*" and "?" will match any character including the path separator "/". + +When matching a path against a list of patterns, the path is compared with the part after "+" or "-", one pattern at a time. Therefore, the order of the patterns is significant. If a match with an include pattern is found, the path is said to be included without further comparisons. 
If a match with an exclude pattern is found, the path is said to be excluded without further comparison. If a match is not found, the path will be excluded if all patterns are include patterns, but included otherwise. + +Patterns ending with a "/" apply to directories only, and patterns not ending with a "/" apply to files only. +Patterns ending with "\*" and "?", however, apply to both directories and files. When a directory is excluded, all files and subdirectories under it will also be excluded. Therefore, to include a subdirectory, all parent directories must be explicitly included. +For instance, the following pattern list doesn't do what is intended, since the `foo` directory will be excluded so the `foo/bar` will never be visited: + +``` ++foo/bar/* +-* +``` + +This does not work because +``` -* ``` +implies +``` -foo/ ``` +So when duplicacy examines the first level of the file tree for matches and exclusions, it excludes foo/ and everything underneath. That means that it never goes to the second level into foo/, and therefore never sees a match for foo/bar/. It also excludes all other top-level directories, producing an empty backup. +So, we have to make sure foo/ is included first, before the wildcard excludes it. Here is the correct way to include `foo` as well: + +``` ++foo/bar/* ++foo/ +-* +``` + +The following pattern list includes only files under the directory foo/ but not files under the subdirectory foo/bar: + +``` +-foo/bar/ ++foo/* +-* +``` + +To include a directory while excluding all files under that directory, use these patterns: + +``` ++cache/ +-cache/?* +``` + +## 2. Regular Expression Matching + +An include pattern starts with "i:", and exclude pattern starts with "e:". The part of the filter after the include/exclude prefix must be a valid regular expression. The +regular expression syntax is the same general syntax used by Perl, Python, and other languages. +Full details for the supported regular expression syntax and features are available [here](https://github.com/google/re2/wiki/Syntax "Go Lang Regular Exprssion Syntax"). + +When matching a path against a list of patterns, the path is compared with the part after "i:" or "e:" one pattern at a time. Therefore, the order of the patterns is significant. If a match with an include pattern is found, the path is said to be included without further comparisons. If a match with an exclude pattern is found, the path is said to be excluded without further comparison. If a match is not found, the path will be excluded if all patterns are include patterns, but included otherwise. 
+ +Some examples of regular expression filters are shown below: +``` +# always include sqlite databases +i:\.sqlite$ +``` + +``` +# exclude sqlite temp files +e:\.sqlite-.*$ +``` + +``` +# exclude temporary file names +e:.*/?~.*$ +``` + +``` +# exclude common file types (case insensitive) +e:(?i)\.(bak|mp4|mkv|o|obj|old|tmp)$ +``` + +``` +# exclude lotus notes full text directories +e:\.ft/.*$ +``` + +``` +# exclude any cache files/directories with cache in the name (case insensitive) +e:(?i).*cache.* +``` + +``` +# exclude lightroom previews +e:(?i).* Previews\.lrdata/.*$ +``` + +``` +# exclude Qt source +e:(?i)develop/qt[0-9]/.*$ +``` + +``` +# exclude any git stuff +e:\.git/.*$ +``` + +``` +# exclude cisco anyconnect log files: matches .cisco/log/* or .cisco/vpn/log/*, etc +e:\.cisco/.*/?log/.*$ +``` + +``` +# exclude trash bin stuff +e:\.Trash/.*$ +``` + +``` +# exclude old firefox stuff +e:Old Firefox Data/.*$ +``` + +``` +# exclude dirx stuff: excludes Documents/dir0/*, Documents/dir1/*, ... +e:Documents/dir[0-9]*/.*$ +``` + +``` +# exclude downloads +e:Downloads/.*$ +``` + +``` +# exclude duplicacy test stuff +e:DUPLICACY_TEST_ZONE/.*$ +``` + +``` +# exclude lotus notes stuff +e:Library/Application Support/IBM Notes Data/.*$ +``` + +``` +# exclude mobile backup stuff +e:Library/Application Support/MobileSync/Backup/.*$ +``` + +``` +# exclude movies +e:Movies/.*$ +``` + +``` +# exclude itunes stuff +e:Music/iTunes/iTunes Media/.*$ +``` + +``` +# include everything else +i:.* +``` + +``` +# include Firefox profile but nothing else from Mozilla +i:(?i)/AppData/[^/]+/Mozilla/$ +i:(?i)/AppData/[^/]+/Mozilla/Firefox/ +e:(?i)/AppData/[^/]+/Mozilla/ +``` + +Explanation of the regex above: +- `/[^/]+/`: has the purpose of assuring that there is exactly 1 folder between `AppData` and `Mozilla` +- we need to include + - the `Mozilla` folder, but nothing it contains (therefore the `$`) + - the `Firefox` folder, and everything it contains + - exclude everything in the `Mozilla` folder which is not contained in the rules above + - (important) put the `$` include rule(s) for each folder we want to include up to the actual folder where we take everything, (check Google Chrome profile below). (note: someone please explain this better) + +``` +# include Google Chrome profile but nothing else from Google +# note that we include the whole profile, because we are unsure how many "users" are added beside the "Default" profile +i:(?i)/AppData/[^/]+/Google/$ +i:(?i)/AppData/[^/]+/Google/Chrome/$ +i:(?i)/AppData/[^/]+/Google/Chrome/User Data/ +e:(?i)/AppData/[^/]+/Google/ +``` + +As seen in the examples above, you may add comments to your filters file by starting the line with a "#" **as the first character of the line**. +The entire comment line will be ignored and can be used to document the meaning of your include/exclude wildcard and regular expression filters. Completely blank lines are +also ignored and may be used to make your filters list more readable. Note that if you add # anywhere else but at the beginning of a line, it will be interpreted as part of the pattern, not as a comment. + +# Testing filters +Filters can be easily tested using the backup command: `duplicacy -d -log backup -enum-only`. This is further explained in https://forum.duplicacy.com/t/backup-command-details/1077. 
+ +# Importing patterns from other files +You can now `@import` other files into the [`filters`](https://forum.duplicacy.com/t/filters-include-exclude-patterns/1089) file by using: +``` +@/the/full/path/to/the/customised-filters-file +@/the/full/path/to/the/some-other-filters-file +other filters below +``` +See the details in https://forum.duplicacy.com/t/filters-just-got-a-big-upgrade-import-files/2120. + +# Custom filters file location +Starting with version 2.3.0, you can specify the location of the `filters` file rather than the default one at `.duplicacy/filters`. To do this, run the `set` command as: + +``` +set -storage second -filters +``` + +The path can be an absolute path, or relative to the repository path. +You can also edit the `.duplicacy/preferences` file directly to add the `filters` key. +This option means that you can now use a different `filters` file for each storage. \ No newline at end of file diff --git a/Installation.md b/Installation.md index d620397..d946be6 100644 --- a/Installation.md +++ b/Installation.md @@ -1,3 +1,82 @@ -The wiki has moved to a new home on the [Duplicacy Forum](https://forum.duplicacy.com). +Duplicacy is written in Go. If you have Go installed, you can run the following command to build from the latest source: -Your page is available at: https://forum.duplicacy.com/t/build-duplicacy-from-source/1091 \ No newline at end of file +``` +go get -u github.com/gilbertchen/duplicacy/... +``` + +The executable named `duplicacy` will then be created under `$GOPATH/bin`. + +You can also build the executable manually after running the `go get` command: +``` +cd $GOPATH/src/github.com/gilbertchen/duplicacy +go build duplicacy/duplicacy_main.go +``` + +The executable will be created under `$GOPATH/src/github.com/gilbertchen/duplicacy` with the name `duplicacy_main`. + +To download the pre-built executable, please visit the [releases page](https://github.com/gilbertchen/duplicacy-cli/releases/latest) and select the binary that works for your platform. +
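If you use a pre-built binary on Linux or macOS, it typically only needs to be made executable and placed somewhere on your `PATH`. The exact file name depends on the release you downloaded, so the name below is just an example:

```
chmod +x duplicacy_linux_x64_2.1.2
sudo mv duplicacy_linux_x64_2.1.2 /usr/local/bin/duplicacy
```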
+ Longer explanation of how to build from source +0. help: + - https://golang.org/doc/code.html +1. download and install go and git and make them available in `PATH` +2. set $GOPATH to whichever directory you want to use as workspace (if you don't do this, go uses as default location `your_homefolder/go` ) +3. in cmd run `go get github.com/gilbertchen/duplicacy/duplicacy` + 3.5 wait until the command finishes (it has to download A LOT) of sources needed for building duplicacy (all the libraries which duplicacy depends on -- for each supported storage and so on) +4. after the download is finished, `cd` to the folder `$GOPATH/src/github.com/gilbertchen/duplicacy` (see the similarity with the above github path? -> this is how go manages libraries and dependencies). There are all the sources for duplicacy. + - 4.5 if you want to build a different branch from `master`, now it's the time! you have to pull the remote branch which has the changes you need into local master -- I recommend using a good GUI: [Sourcetree](https://www.sourcetreeapp.com/) but you can do it from cmd as well. ([_you_](https://www.youtube.com/watch?v=BomVB3QkKQs) [_monster_](https://www.youtube.com/watch?v=f2pmcAjvWYQ)). + - 4.6 If you want to modify something in the sources, you should do it at this step (eg.: modify the number of retries for some storage) +5. now that you have the sources, you have to actually compile duplicacy: again in cmd `go install github.com/gilbertchen/duplicacy/duplicacy` +6. compiling should take no more than 10 seconds. You will find the executable file in the folder `$GOPATH/bin/duplicacy.exe`
+ +# Build on macOS + +To build from source on macOS, Xcode must be installed because one of the dependency libraries, `github.com/gilbertchen/keyring`, requires cgo for security reasons (see [#136](https://github.com/gilbertchen/duplicacy/issues/136) for details). If you don't want to install Xcode, you can switch to `github.com/tmc/keyring` (the original library that `github.com/gilbertchen/keyring` was forked from) by modifying this line in `src/duplicacy_keyring.go`: +``` + "github.com/gilbertchen/keyring" +``` +to +``` + "github.com/tmc/keyring" +``` + +# Build from a fork + +If you fork the repository `github.com/gilbertchen/duplicacy` and clone the fork locally as `github.com/yourusername/duplicacy`, +please remember that `duplicacy/duplicacy_main.go` still imports from `github.com/gilbertchen/duplicacy/src`, +so any changes you make under `github.com/yourusername/duplicacy/src` will have no effect when building the executable. + +###### Option 1 + +Therefore, the recommended way is to clone your fork into the original namespace: + +``` +git clone https://github.com/yourusername/duplicacy $GOPATH/src/github.com/gilbertchen/duplicacy +``` + +###### Option 2 + +There is one other option when working with a fork, which is explained fully [here](http://blog.campoy.cat/2014/03/github-and-go-forking-pull-requests-and.html). +The short version is: + +1. Make a normal github fork of duplicacy (so that you have `github.com/you/duplicacy`); +2. Clone the original duplicacy: `go get github.com/gilbertchen/duplicacy`, and **not** your fork; + - The repository is then located locally at `$GOPATH/src/github.com/gilbertchen/duplicacy`; +3. Add a remote to _your fork_ in the cloned repository noted above: `git remote add fork https://github.com/you/duplicacy`; +4. Instead of committing/pushing to `origin` (which is the original `github.com/gilbertchen/duplicacy`), + now you use `fork` (which is your github repository `github.com/you/duplicacy`). + +In this way, it is also very easy to test other people's code: just add a new remote to `$GOPATH/src/github.com/gilbertchen/duplicacy` +(the original duplicacy clone) like you did in step 3, but change the name `fork` to something else. + +Note about the second method: there is no clone of `github.com/you/duplicacy`, therefore the folder `$GOPATH/src/github.com/you/duplicacy` should not exist on your disk. + +# Cross compile + +An example for some NAS boxes - ARM, Debian with 64k pages: +``` +cd $GOPATH/src/github.com/gilbertchen/duplicacy +env GOARCH=arm GOOS=linux go build -o duplicacy_linux64k_arm --ldflags "-R 65536" -v duplicacy/duplicacy_main.go +``` diff --git a/Lock-Free-Deduplication.md b/Lock-Free-Deduplication.md index dd93ab4..b2c475d 100644 --- a/Lock-Free-Deduplication.md +++ b/Lock-Free-Deduplication.md @@ -59,6 +59,3 @@ Therefore, if a backup procedure references a chunk before the chunk is marked a delete the chunk until it sees that backup procedure finishes (as indicated by the appearance of a new snapshot file uploaded to the storage). This ensures that scenarios depicted in Figure 2 will never happen. [[https://github.com/gilbertchen/duplicacy-beta/blob/master/images/fossil_collection_2.png|alt="Reference before Rename"]] - -This wiki is also found on the [Duplicacy Forum](https://forum.duplicacy.com).
-Please also modify the post in the forum when you edit this wiki: https://forum.duplicacy.com/t/lock-free-deduplication-algorithm/1093 \ No newline at end of file diff --git a/Managing-Passwords.md b/Managing-Passwords.md index 2580169..e9eab05 100644 --- a/Managing-Passwords.md +++ b/Managing-Passwords.md @@ -1,3 +1,43 @@ -The wiki has moved to a new home on the [Duplicacy Forum](https://forum.duplicacy.com). +Duplicacy will attempt to retrieve in three ways the storage password and the storage-specific access tokens/keys. -Your page is available at: https://forum.duplicacy.com/t/passwords-credentials-and-environment-variables/1094 \ No newline at end of file +* If a secret vault service is available, Duplicacy will store passwords/keys entered by the user in such a secret vault and later retrieve them when needed. On Mac OS X it is Keychain, and on Linux it is gnome-keyring. On Windows the passwords/keys are encrypted and decrypted by the Data Protection API, and encrypted passwords/keys are stored in the file *.duplicacy/keyring*. However, if the -no-save-password option is specified for the storage, then Duplicacy will not save passwords this way. +* If an environment variable for a password is provided, Duplicacy will always take it. The table below shows the name of the environment variable for each kind of password. Note that if the storage is not the default one, the storage name will be included in the name of the environment variable (in uppercase). For example, if your storage name is b2, then the environment variable should be named DUPLICACY_B2_PASSWORD. +* If a matching key and its value are saved to the preference file (.duplicacy/preferences) by the *set* command, the value will be used as the password. The last column (_key in preferences_) in the table below lists the name of the preference key for each type of password. 
+ +| password type | environment variable (default storage) | environment variable (non-default storage in uppercase) | key in preferences | +|:----------------:|:----------------:|:----------------:|:----------------:| +| storage password | DUPLICACY_PASSWORD | DUPLICACY_<STORAGENAME>_PASSWORD | password | +| sftp password | DUPLICACY_SSH_PASSWORD | DUPLICACY_<STORAGENAME>_SSH_PASSWORD | ssh_password | +| sftp key file | DUPLICACY_SSH_KEY_FILE | DUPLICACY_<STORAGENAME>_SSH_KEY_FILE | ssh_key_file | +| Dropbox Token | DUPLICACY_DROPBOX_TOKEN | DUPLICACY_<STORAGENAME>>_DROPBOX_TOKEN | dropbox_token | +| S3 Access ID | DUPLICACY_S3_ID | DUPLICACY_<STORAGENAME>_S3_ID | s3_id | +| S3 Secret Key | DUPLICACY_S3_SECRET | DUPLICACY_<STORAGENAME>_S3_SECRET | s3_secret | +| BackBlaze Account ID | DUPLICACY_B2_ID | DUPLICACY_<STORAGENAME>_B2_ID | b2_id | +| Backblaze Application Key | DUPLICACY_B2_KEY | DUPLICACY_<STORAGENAME>_B2_KEY | b2_key | +| Azure Access Key | DUPLICACY_AZURE_KEY | DUPLICACY_<STORAGENAME>_AZURE_KEY | azure_key | +| Google Drive Token File | DUPLICACY_GCD_TOKEN | DUPLICACY_<STORAGENAME>_GCD_TOKEN | gcd_token | +| Google Cloud Storage Token File | DUPLICACY_GCS_TOKEN | DUPLICACY_<STORAGENAME>_GCS_TOKEN | gcs_token | +| Microsoft OneDrive Token File | DUPLICACY_ONE_TOKEN | DUPLICACY_<STORAGENAME>_ONE_TOKEN | one_token | +| Hubic Token File | DUPLICACY_HUBIC_TOKEN | DUPLICACY_<STORAGENAME>_HUBIC_TOKEN | hubic_token | +| Wasabi Key | DUPLICACY_WASABI_KEY | DUPLICACY_<STORAGENAME>_WASABI_KEY | wasabi_key | +| Wasabi Secret | DUPLICACY_WASABI_SECRET | DUPLICACY_<STORAGENAME>_WASABI_SECRET | wasabi_secret | +| webdav password | DUPLICACY_WEBDAV_PASSWORD | DUPLICACY_<STORAGENAME>_WEBDAV_PASSWORD | webdav_password | + + +The passwords stored in the environment variable and the preference need to be in plaintext and thus are insecure and should be avoided whenever possible. + +Note that you must use the wasabi environment variables instead of the s3 environment variables if you are using the wasabi storage URL. + +The passwords will be stored when the `backup` command (or any other command apart from `init` or `add`) is run for the first time. This means you need to make sure that you do that first run interactively, i.e. not via a script (unless it passes on the password prompts, of course). + +# Saving credentials to Duplicacy config file +Use one of the above environment variables, but lowercase and remove duplicacy_ + +Example: duplicacy set -key b2_id -value 6fdd6eeeefff + +or: duplicacy set -storage mybackupstorage -key b2_id -value 6fdd6eeeefff + +or: duplicacy set -key b2_id -value "passphrase with spaces" + +# Changing passwords +To change passwords that have been stored in the keychain/keyring, use the [`list`](https://github.com/gilbertchen/duplicacy/wiki/list) command with the `-reset-passwords` option. \ No newline at end of file diff --git a/Missing-Chunks.md b/Missing-Chunks.md index 35a93d4..f1c19e8 100644 --- a/Missing-Chunks.md +++ b/Missing-Chunks.md @@ -1,3 +1,64 @@ -The wiki has moved to a new home on the [Duplicacy Forum](https://forum.duplicacy.com). 
+Sometimes, when you run the [[check]] command, it may complain about missing chunks: +``` +$ duplicacy check +Storage set to sftp://gchen@192.168.1.125/AcrosyncTest/teststorage +Listing all chunks +Chunk 02c25aea4621acdd4c8751d5ab7ff438fb47308ce8738f030b7db0741c37ecb5 referenced by snapshot test at revision 1 does not exist +Some chunks referenced by snapshot test at revision 1 are missing +``` -Your page is available at: https://forum.duplicacy.com/t/fix-missing-chunks/1095 \ No newline at end of file +All other commands can also report the same missing chunk messages. If that happens, it is recommended to run the [[check]] command instead as it can identify all missing chunks at once for a given snapshot, without any side effects. + +The first thing to do in this situation is to check by hand if those chunks actually exist on the storage. Some cloud storage services (such as [OneDrive](https://github.com/OneDrive/onedrive-api-docs/issues/740) and [Hubic](https://github.com/gilbertchen/duplicacy/issues/290)) have a bug that prevents the complete chunk list to be returned. In other cases, a chunk may be stored in a wrong folder. For instance, the expected path for the chunk `02c25aea4621acdd4c8751d5ab7ff438fb47308ce8738f030b7db0741c37ecb5` may be `chunks\02\c2\25aea4621acdd4c8751d5ab7ff438fb47308ce8738f030b7db0741c37ecb5`, but if it were stored as `chunks\02\c225aea4621acdd4c8751d5ab7ff438fb47308ce8738f030b7db0741c37ecb5`, Duplicacy would have difficulty locating it. + +If a chunk reported as missing in fact does not exist in the storage, then you may need to find out why it is missing. The [[prune]] command is the only command that can delete chunks, and by default Duplicacy always produces a prune log and saved it under the `.duplicacy/logs` folder. + +Here is a sample prune log: +``` +$cat .duplicacy/logs/prune-log-20180124-205159 +Deleted chunk 2302e87bf0a8c863112bbdcd4d7e94e8a12a9939defaa8a3f30423c791119d4c (exclusive mode) +Deleted chunk 7aa4f3192ecbf5a67f52a2e791cfac445116658ec1e3bd00f8ee35dda6964fb3 (exclusive mode) +Deleted chunk 02c25aea4621acdd4c8751d5ab7ff438fb47308ce8738f030b7db0741c37ecb5 (exclusive mode) +Deleted chunk dbbd5c008e107703e59d8f6633d89f9a55075fa6695c113a2f191dd6cddacb53 (exclusive mode) +Deleted chunk 611c478edcc4201f8b48e206391e9929359e71eb31691afc23fb059418d53fb5 (exclusive mode) +Deleted chunk 297dcc3d83dc05b8e697535306a3af847435874cbe7d5a6b5e6918811d418649 (exclusive mode) +Deleted cached snapshot test at revision 1 +``` + +This log indicates that these chunks were removed when the [[prune]] command was invoked with the `-exclusive` option, because these chunks are only referenced by the snapshot to be deleted, and the `-exclusive` assumes there weren't any other ongoing backups. + +This is an excerpt from another prune log: +``` +Marked fossil 909a14a87d185b11ec933dba7069fc2b3744288bb169929a3fc096879348b4fc +Marked fossil 0e92f9aa69cc98cd3228fcfaea480585fe1ab64b098b86438a02f7a3c78e797a +Marked fossil 3ab0be596614dd39bcacc2279d49b6fc1e0095c71b594c509a7b5d504d6d111e +Marked fossil a8a1377cab0dd7f25cac4ac3fb451b9948f129904588d9f9b67bead7c878b7d0 +``` +These chunks weren't immediately removed but rather marked as fossils. This is because another ongoing backup that was seen by the prune command may reference any of these chunks. To be safe, the prune command will turn them into fossils, which can be either permanently removed if no such backup exists, or turned back into normal chunks otherwise. 
Please refer to [this wiki page](https://github.com/gilbertchen/duplicacy/wiki/Lock-Free-Deduplication) for a detailed explanation of this technique. + +If you can find the missing chunk in any of these prune logs, then it is clear that the [[prune]] command removed it in exclusive mode or marked it as a fossil (which may be removed at a later time). If you think the [[prune]] command mistakenly removed or marked the chunk due to a bug, then submit a GitHub issue with the relevant logs attached. + +# :exclamation: Exceptional cases + +Please be aware that there are some corner cases in which a fossil that is still needed may be mistakenly deleted. + +### Backups lasting longer than 7 days + +If there is a repository doing a backup which takes more than 7 days and the backup started before the chunk was marked as a fossil, then the `prune` command will think that that particular repository has become inactive, and it will be excluded from the criteria for determining which fossils are safe to delete. + +### Initial backups + +The other case happens when an initial backup from a newly recreated repository also started before the chunk was marked as a fossil. Since the `prune` command doesn't know about the existence of such a repository at the fossil deletion time, it may think the fossil isn't needed any more by any snapshot and thus delete it permanently. + +### `-exclusive` mode + +If you see from the log that a missing chunk was deleted in exclusive mode, then it means that the prune command was incorrectly invoked with the `-exclusive` option while there was still a backup in progress from a different computer to the same storage. + +# Fixing a missing chunk +In all these cases, a [[check]] command after the backup finishes will immediately reveal the missing chunk. + +What if the missing chunk can't be found in any of these prune logs? We may not be able to track down what the culprit was. It could be a bug in Duplicacy, a bug in the cloud storage service, or a user error. If you do not want to see this happen again, you may need to run a [[check]] command after every backup or before every prune. + +Is it possible to recover a missing chunk? Maybe, if the backup where the missing chunk comes from was done recently and the files in that backup haven't changed since the backup. In this case, you can modify the `.duplicacy/preferences` file to assign the repository a new id that hasn't been used by any repository connecting to the same storage, and then run a new backup. This backup will be an initial backup because of the new repository id and will therefore attempt to upload all chunks that do not exist in the storage. If you are lucky, this procedure will be able to produce an identical copy of the missing chunk. + +If you are uninterested in figuring out why the chunk went missing and just want to fix the issue, you can keep removing by hand the affected snapshot files under the `snapshots` folder in the storage, until the `check -a` command passes without reporting missing chunks. At this point, you should be able to run new backups. However, there will likely be many unreferenced chunks in the storage. To fix this, run `prune -exhaustive` and all unreferenced chunks will be identified and marked as fossils for removal by a subsequent prune command. Or, if you're very sure that no other backups are running, `prune -exhaustive -exclusive` can remove these unreferenced chunks immediately.
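As a concrete sketch of that last approach, assume a local or SFTP storage where revisions are stored as `snapshots/<snapshot id>/<revision>`, and assume revision 1 of snapshot `test` is the one referencing the missing chunk (both the path layout and the revision number here are assumptions for the example):

```
# remove the affected snapshot revision(s) by hand
rm /path/to/storage/snapshots/test/1

# repeat until no missing chunks are reported
duplicacy check -a

# then mark the now-unreferenced chunks as fossils
duplicacy prune -exhaustive
```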
\ No newline at end of file diff --git a/Move-.duplicacy-folder.md b/Move-.duplicacy-folder.md index 4867380..2ecf8f3 100644 --- a/Move-.duplicacy-folder.md +++ b/Move-.duplicacy-folder.md @@ -1,3 +1,7 @@ -The wiki has moved to a new home on the [Duplicacy Forum](https://forum.duplicacy.com). +By default a `.duplicacy` folder is created in the root of the repository to store configurations, filters, caches and logs. With the `-pref-dir` option of the `init` command, this folder can be moved to a different location, but still a `.duplicacy` file which points to that folder is created in the root of the repository. -Your page is available at: https://forum.duplicacy.com/t/move-duplicacy-folder-use-symlink-repository/1097 \ No newline at end of file +It is possible to move the `.duplicacy` folder to a desired location without creating the `.duplicacy` file: + +1. Pick the desired location for the `.duplicacy` folder, this will be your repository root. +2. In this directory create symlinks to the folders you want to include in your backup. On Unix use `ln -s /path_to/existing_folder /target_folder`, on Windows `mklink /D target_folder "C:\path_to\existing_folder\"`. Note that this is the only way to create a repository that includes multiple drives on Windows. This can also make include/exclude patterns configuration a lot simpler, or even unnecessary. +3. Execute the `init` command in this directory and then the `backup` command. By default Duplicacy will follow the first-level symlinks (those under the root of the repository). Symlinks located under any subdirectories of the repository will be backed up as symlinks and will not be followed. \ No newline at end of file diff --git a/Multiple-repositories-with-different-accounts-of-the-same-cloud-storage-service.md b/Multiple-repositories-with-different-accounts-of-the-same-cloud-storage-service.md index 91c321a..51cdffa 100644 --- a/Multiple-repositories-with-different-accounts-of-the-same-cloud-storage-service.md +++ b/Multiple-repositories-with-different-accounts-of-the-same-cloud-storage-service.md @@ -1,3 +1,78 @@ -The wiki has moved to a new home on the [Duplicacy Forum](https://forum.duplicacy.com). +Below are step-by-step instructions for setting up two repositories, each of which backs up to a different account of the same cloud storage (Google Drive in this case). -Your page is available at: https://forum.duplicacy.com/t/multiple-repositories-on-the-same-cloud-storage-service-but-with-different-accounts/1098 \ No newline at end of file +#### Run the `init` command in `folder1`: + +``` +cd path/to/folder1 +duplicacy init -storage-name gcd1 folder1 gcd://storage1 +``` + +`folder1` will be backed up to the `storage1` directory in your Google Drive account. The `-storage-name` option is important here, because Duplicacy manages credentials by the storage name, and you need to give each repository a unique storage name in order to separate its credentials from others. + +The generated `preferences` file will look like this: + +``` +{ + "name": "gcd1", + "id": "folder1", + "storage": "gcd://storage1", + ... +} +``` + +Duplicacy will then ask you for the path to the token file used to access the first Google Drive account. 
On macOS and Windows, Duplicacy will save the token file path automatically, but you can also set up this environment variable to avoid the need to enter the token file path: + +``` +export DUPLICACY_GCD1_GCD_TOKEN=/path/to/tokenfile1 +``` + +You can also run the `set` command to save the token file path in the `preferences` file: +``` +duplicacy set -key gcd_token -value /path/to/tokenfile1 +``` + +#### Run the `init` command in `folder2`: + +``` +cd path/to/folder2 +duplicacy init -storage-name gcd2 folder2 gcd://storage2 +``` + +Note that we use a different storage name than `gcd1`. Therefore, Duplicacy will not retrieve the token file path saved by the `init` command in step 1. Instead, it will ask you for the path to a token file again, and you can supply with the one that is authorized to access your second Google Drive account. + +The generated preferences file will look like this: + +``` +{ + "name": "gcd2", + "id": "folder2", + "storage": "gcd://storage2", + ... +} +``` +... + +To reference the second token file, set up a different environment variable with `gcd2` in the name: + +``` +export DUPLICACY_GCD2_GCD_TOKEN=/path/to/tokenfile2 +``` + +If you use the `set` command instead, note that the key will be the same: + +``` +duplicacy set -key gcd_token -value /path/to/tokenfile2 +``` + +This is because the `set` command writes the key/value pair to the `preferences` file, which is per repository, so there won't be any conflict here. + +#### To perform the backup of `folder1` use the command: + +``` +cd /path/to/folder1; duplicacy backup -storage gcd1 +``` + +#### To perform the backup of `folder2` use the command: +``` +cd /path/to/folder2; duplicacy backup -storage gcd1 +``` \ No newline at end of file diff --git a/Pre-Command-and-Post-Command-Scripts.md b/Pre-Command-and-Post-Command-Scripts.md index 327dbd6..ba01eaa 100644 --- a/Pre-Command-and-Post-Command-Scripts.md +++ b/Pre-Command-and-Post-Command-Scripts.md @@ -1,3 +1,7 @@ -The wiki has moved to a new home on the [Duplicacy Forum](https://forum.duplicacy.com). +You can instruct Duplicacy to run a script before or after executing a command. -Your page is available at: https://forum.duplicacy.com/t/pre-command-and-post-command-scripts/1100 \ No newline at end of file +For example, if you create a bash script with the name `pre-prune` under the `.duplicacy/scripts` directory, this bash script will be run before the `prune` command starts. A script named `post-prune` will be run after the `prune` command finishes. + +This rule applies to all commands except `init`. + +On Windows these scripts should have the `.bat` extension in their names, while on linux they should have no extension. \ No newline at end of file diff --git a/Quick-Start.md b/Quick-Start.md index 7186787..ef7a83d 100644 --- a/Quick-Start.md +++ b/Quick-Start.md @@ -1,3 +1,83 @@ -The wiki has moved to a new home on the [Duplicacy Forum](https://forum.duplicacy.com). +Once you have the Duplicacy executable on your path, you can change to the directory that you want to back up (called *repository*) and run the *init* command: -Your page is available at: https://forum.duplicacy.com/t/duplicacy-quick-start-cli/1101 \ No newline at end of file +``` +$ cd path/to/your/repository +$ duplicacy init mywork sftp://user@192.168.1.100/path/to/storage +``` + +This *init* command connects the repository with the remote storage at 192.168.1.100 via SFTP. 
It will initialize the remote storage if this has not been done before (create the required duplicacy config files and folders), but it requires that the folder already exists on the storage (duplicacy will not create it). It also assigns the repository id *mywork* to the repository. This repository id is used to uniquely identify this repository if there are other repositories that also back up to the same storage. + +You can now create backups of the repository by invoking the *backup* command. The first backup may take a while depending on the size of the repository and the upload bandwidth. Subsequent backups will be much faster, as only new or modified files will be uploaded. Each backup is identified by the repository id and an increasing revision number starting from 1. + +```sh +$ duplicacy backup -stats +``` + +The *restore* command rolls back the repository to a previous revision: +```sh +$ duplicacy restore -r 1 +``` + +Sometimes you may not want to run the restore operation directly from the original repository, as it may overwrite files that have not been backed up. Or you may want to run the restore operation from a different computer. Duplicacy is very flexible in this regard, as it allows you to create a new repository no matter where it is. As long as the new repository has the same repository id, Duplicacy will treat it as a clone of the original repository: + +``` +$ cd path/to/your/restore/dir # this can be on the same or a different computer +$ duplicacy init mywork sftp://user@192.168.1.100/path/to/storage +$ duplicacy restore -r 1 +``` + +It is possible to back up two different repositories to the same storage. In fact, this is the recommended way, because this way you will take advantage of cross-computer deduplication -- identical files from different repository will get deduplicated automatically. + +``` +$ cd path/to/your/repository2 # this can be on the same or a different computer +$ duplicacy init mywork2 sftp://user@192.168.1.100/path/to/storage # different repository id but same storage url +$ duplicacy backup +``` + +Duplicacy provides a set of commands, such as list, check, diff, cat history, to manage backups: + + +```makefile +$ duplicacy list # List all backups +$ duplicacy check # Check integrity of backups +$ duplicacy diff # Compare two backups of the same repository, or the same file in two backups +$ duplicacy cat # Print a file in a backup +$ duplicacy history # Show how a file changes over time +``` + + +The *prune* command removes backups by revisions, or tags, or retention policies: + +```sh +$ duplicacy prune -r 1 # Remove the backup with revision number 1 +$ duplicacy prune -t quick # Remove all backups with the tag 'quick' +$ duplicacy prune -keep 1:7 # Keep 1 backup per day for backups older than 7 days +$ duplicacy prune -keep 7:30 # Keep 1 backup every 7 days for backups older than 30 days +$ duplicacy prune -keep 0:180 # Remove all backups older than 180 days +``` + +The first time the *prune* command is called, it removes the specified backups but keeps all unreferenced chunks as fossils. +Since it uses the two-step fossil collection algorithm to clean chunks, you will need to run it again to remove those fossils from the storage: + +```sh +$ duplicacy prune # Chunks from deleted backups will be removed if deletion criteria are met +``` + +To back up to multiple storages, use the *add* command to add a new storage. 
The *add* command is similar to the *init* command, except that the first argument is a storage name used to distinguish different storages: + +```sh +$ duplicacy add s3 mywork s3://amazon.com/mybucket/path/to/storage +``` + +You can back up to any storage by specifying the storage name: + +```sh +$ duplicacy backup -storage s3 +``` + +However, backups created this way will be different on different storages, if the repository has been changed during two backup operations. A better approach, is to use the *copy* command to copy specified backups from one storage to another: + +```sh +$ duplicacy copy -r 1 -to s3 # Copy backup at revision 1 to the s3 storage +$ duplicacy copy -to s3 # Copy every backup to the s3 storage +``` diff --git a/RSA-encryption.md b/RSA-encryption.md new file mode 100644 index 0000000..9aac792 --- /dev/null +++ b/RSA-encryption.md @@ -0,0 +1,131 @@ +Starting from version 2.3.0, you can initialize a storage with an RSA public key. Backups can be created as usual, but to restore files you'll need to provide the corresponding private key. + +# Duplicacy with RSA Encryption + +#### Initialization + +To initialize a new encrypted storage with the RSA encryption enabled, run the following command: + +``` + +$ duplicacy init -e -key public.pem repository_id storage_url + +``` + +The RSA encryption can be only enabled if the storage is encrypted (by the `-e` option). + +The RSA public key, along with other configuration parameters, will be stored in the file named `config` which is then uploaded to the storage. + +You can verify if the RSA encryption is turned on by running the info command in the following way: + +``` + +$ duplicacy -d info storage_url + +... + +RSA public key: -----BEGIN PUBLIC KEY----- + +... + +-----END PUBLIC KEY----- + +``` + +#### Backup and Restore + +No extra option is needed when you run the backup command. You'll see a log message that says `RSA encryption is enabled`. + +``` + +$ duplicacy backup + +Storage set to ... + +RSA encryption is enabled + +... + +``` + +Note that when the RSA encryption is enabled, only file contents are encrypted by the RSA encryption. File metadata, such as modification times, permissions, and extended attributes are not protected by the RSA encryption (but still protected by the storage password). + +To restore you'll need the RSA private key: + +``` + +$ duplicacy restore -r 1 -key private.pem + +``` + +### Other commands + +Other commands that take the RSA private key are list, check, cat, diff, and copy. + +For the check command, you'll only need the RSA private key with the `-files` option, which is used to verify the integrity of every file. + +You can run the check and prune commands without the RSA private key to manage backups encrypted with the RSA public key. 
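To summarize which operations need the private key, here is a short sketch (key file names follow the examples above; exact flag placement mirrors the earlier examples on this page):

```
duplicacy backup                          # no private key needed; new chunks are encrypted with the public key
duplicacy check                           # no private key needed; only verifies that chunks exist
duplicacy prune -keep 0:360               # no private key needed
duplicacy check -files -key private.pem   # verifying every file requires the private key
duplicacy restore -r 1 -key private.pem   # restoring file contents requires the private key
```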
+ +### Copy with RSA encryption + +If you want to switch to the RSA encryption for an existing storage, you can create a new encrypted storage with the RSA encryption enabled and then copy existing backups to the new storage: + +``` + +duplicacy add -e -key public.pem -copy default new_storage_name repository_id new_storage_url + +duplicacy copy -from default -to new_storage_name + +``` + +Vice versa, you can copy from an RSA encrypted storage to a new storage without RSA encryption: + +``` + +duplicacy add -e -copy default new_storage_name repository_id new_storage_url + +duplicacy copy -key private.pem -from default -to new_storage_name + +``` + +### Key generation + +You can run these commands to generate the private and public key pair: + +``` +openssl genrsa -aes256 -out private.pem 2048 +openssl rsa -in private.pem -pubout -out public.pem +``` +The key needs to be in the PEM format. + +### How it works + +The RSA encryption is performed on the chunk level. Previously, an encrypted chunk always starts with the header `duplicacy\000`, followed by the nonce and encrypted chunk content: + +``` + +----------------------------------------------- + +duplicacy\000 | nonce | encrypted chunk content + +----------------------------------------------- + +``` + +Note that the key used to encrypt the chunk content isn’t stored here. Rather, that key is derived from the hash of the chunk content. + +Chunks with the RSA encryption enabled will start with a new header `duplicacy\002`. The key to encrypt the chunk content is no longer derived from the hash of the chunk content. Instead, the key is randomly generated (unique to each chunk), and then encrypted by the RSA public key, and stored after the chunk header: + +``` + +------------------------------------------------------------------- + +duplicacy\002 | RSA encrypted key | nonce | encrypted chunk content + +------------------------------------------------------------------- + +``` + +To decrypt such a chunk, Duplicacy will first recover the key from the RSA encrypted key (which requires the RSA private key), and then use that key to decrypt the chunk content. + +RSA encryption only applies to file chunks, not metadata chunks. Therefore, the file names, timestamps, permissions, attributes, etc are not protected by the RSA public key (but still protected by the storage password). \ No newline at end of file diff --git a/Restore-to-a-different-folder-or-computer.md b/Restore-to-a-different-folder-or-computer.md index e254889..e782052 100644 --- a/Restore-to-a-different-folder-or-computer.md +++ b/Restore-to-a-different-folder-or-computer.md @@ -1,3 +1,30 @@ -The wiki has moved to a new home on the [Duplicacy Forum](https://forum.duplicacy.com). +We've briefly explained how to restore a backup to a different folder or computer in the [[Quick Start]] page, but this topic is so important that it deserves its own page. -Your page is available at: https://forum.duplicacy.com/t/restore-to-a-different-folder-or-computer/1103 \ No newline at end of file +Suppose you've set up a repository with the repository id `backup1` and run the first backup: +``` +cd path/to/repository1 +duplicacy init backup1 sftp://user@192.168.1.100/path/to/storage +duplicacy backup -stats +``` + +If you edit some files in the repository and later want to undo the edits, you can simply run the restore command: +``` +cd path/to/repository1 +duplicacy restore -r 1 +``` + +But what if you want to restore the original files to a different folder other than `repository1`? 
You'll notice that the [[restore]] command does not have a `-restore-to` option. + +This is because the use of the repository id makes such an option unnecessary. In Duplicacy, a repository id is what uniquely identify a repository. In fact, Duplicacy does not care about the path of the repository nor does it keep track of it. + +Therefore, when you create a new repository on a different folder on the same or a different computer, if you pass the same repository id to Duplicacy, Duplicacy will think it is the same repository and allow you to restore files (or to continue to run new backups), regardless of the location of the new repository. + +``` +cd path/to/repository2 # this can be on the same or a different computer +duplicacy init backup1 sftp://user@192.168.1.100/path/to/storage # the storage url must be the same +duplicacy restore -r 1 +``` + +In other words, the repository id is the only thing that you need in order to restore an existing backup (if you know where the storage is). Do you need to write it down in a safe place in case your computer crashes? The answer is no. If you log in to the remote computer where the storage reside (or the website of the cloud storage provider) and look under the storage folder. You'll see a `snapshots` subfolder there under which you will find all repository ids -- the name of every subfolder under `snapshots` is a repository id. + +And yes, there may be more than one repository id under the `snapshots` folder, because multiple repositories can back up to the same storage, taking advantage of the unique feature of Duplicacy -- cross-computer deduplication. \ No newline at end of file diff --git a/Scripts-and-utilities-index.md b/Scripts-and-utilities-index.md index 76c6a0b..88d284e 100644 --- a/Scripts-and-utilities-index.md +++ b/Scripts-and-utilities-index.md @@ -1,3 +1,18 @@ -The wiki has moved to a new home on the [Duplicacy Forum](https://forum.duplicacy.com). +Since there are multiple use cases for Duplicacy on multiple operating systems, some users have also created various automation scripts and other utilities such as ignore file templates (`filters` file). Here is a list of them: -Your page is available at: https://forum.duplicacy.com/t/scripts-and-utilities-index/1104 \ No newline at end of file + + +* [Automation on Windows](https://github.com/TheBestPessimist/duplicacy-utils) contributed by [@TheBestPessimist](https://github.com/TheBestPessimist). + * Uses Powershell and Task Scheduler to run Duplicacy on a schedule. + +* [`filters` file template](https://github.com/TheBestPessimist/duplicacy-utils) contributed by [@TheBestPessimist](https://github.com/TheBestPessimist). + * One ignore file both for Windows and MacOS. + +* [duplicacy-util](https://github.com/jeffaco/duplicacy-util) contributed by [@jeffaco](https://github.com/jeffaco). + * Cross platform utility to run `duplicacy`. Tested on Windows, Mac OS/X, and Linux. Should run on any platform supported by `duplicacy` itself, as both were written in Go. Self-contained image (does not need any packages to be installed). + +* [duplicacy-autobackup](https://github.com/christophetd/duplicacy-autobackup) contributed by [@christophetd](https://github.com/christophetd) + * Painless automated backups to multiple storage providers with Docker and duplicacy. 
+ +* [duplicacy-script](https://github.com/mattjm/duplicacy-script) contributed by [@mattjm](https://github.com/mattjm) + * Getting Started guide for Windows with brief explanations of command line options, basic Powershell script for local and remote backups, and a filter file specifically targeted for Windows user profiles. \ No newline at end of file diff --git a/Snapshot-Format.md b/Snapshot-Format.md index 8887217..63afd28 100644 --- a/Snapshot-Format.md +++ b/Snapshot-Format.md @@ -1,3 +1,116 @@ -The wiki has moved to a new home on the [Duplicacy Forum](https://forum.duplicacy.com). +A snapshot file is a file that the backup procedure uploads to the file storage after it finishes splitting files into +chunks and uploading all new chunks. It mainly contains metadata for the backup overall, metadata for all the files, +and chunk references for each file. Here is an example snapshot file for a repository containing 3 files (file1, file2, +and dir1/file3): -Your page is available at: https://forum.duplicacy.com/t/snapshot-file-format/1106 \ No newline at end of file +```json +{ + "id": "host1", + "revision": 1, + "tag": "first", + "start_time": 1455590487, + "end_time": 1455590487, + "files": [ + { + "path": "file1", + "content": "0:0:2:6108", + "hash": "a533c0398194f93b90bd945381ea4f2adb0ad50bd99fd3585b9ec809da395b51", + "size": 151901, + "time": 1455590487, + "mode": 420 + }, + { + "path": "file2", + "content": "2:6108:3:7586", + "hash": "f6111c1562fde4df9c0bafe2cf665778c6e25b49bcab5fec63675571293ed644", + "size": 172071, + "time": 1455590487, + "mode": 420 + }, + { + "path": "dir1/", + "size": 102, + "time": 1455590487, + "mode": 2147484096 + }, + { + "path": "dir1/file3", + "content": "3:7586:4:1734", + "hash": "6bf9150424169006388146908d83d07de413de05d1809884c38011b2a74d9d3f", + "size": 118457, + "time": 1455590487, + "mode": 420 + } + ], + "chunks": [ + "9f25db00881a10a8e7bcaa5a12b2659c2358a579118ea45a73c2582681f12919", + "6e903aace6cd05e26212fcec1939bb951611c4179c926351f3b20365ef2c212f", + "4b0d017bce5491dbb0558c518734429ec19b8a0d7c616f68ddf1b477916621f7", + "41841c98800d3b9faa01b1007d1afaf702000da182df89793c327f88a9aba698", + "7c11ee13ea32e9bb21a694c5418658b39e8894bbfecd9344927020a9e3129718" + ], + "lengths": [ + 64638, + 81155, + 170593, + 124309, + 1734 + ] +} +``` + +When Duplicacy splits a file in chunks using the variable-size chunking algorithm, if the end of a file is reached and yet the boundary marker for terminating a chunk +hasn't been found, the next file, if there is one, will be read in and the chunking algorithm continues. It is as if all +files were packed into a big tar file which is then split into chunks. + +The *content* field of a file indicates the indexes of starting and ending chunks and the corresponding offsets. For +instance, *file1* starts at chunk 0 offset 0 while ends at chunk 2 offset 6108, immediately followed by *file2*. + +The backup procedure can run in one of two modes. In the default quick mode, only modified or new files are scanned. Chunks only +referenced by old files that have been modified are removed from the chunk sequence, and then chunks referenced by new +files are appended. Indices for unchanged files need to be updated too. + +In the safe mode (enabled by the -hash option), all files are scanned and the chunk sequence is regenerated. + +The length sequence stores the lengths for all chunks, which are needed when calculating some statistics such as the total +length of chunks. 
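As a concrete check of the *content* encoding, *file1*'s value `0:0:2:6108` means the file starts at chunk 0, offset 0 and ends at chunk 2, offset 6108, so its size is the whole of chunks 0 and 1 plus the first 6108 bytes of chunk 2: 64638 + 81155 + 6108 = 151901 bytes, exactly the `size` recorded for *file1* above. The same arithmetic reproduces the sizes of *file2* ((170593 - 6108) + 7586 = 172071) and *dir1/file3* ((124309 - 7586) + 1734 = 118457).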
For a repository containing a large number of files, the size of the snapshot file can be tremendous. +To make the situation worse, every time a big snapshot file would have been uploaded even if only a few files have been changed since +last backup. To save space, the variable-size chunking algorithm is also applied to the three dynamic fields of a snapshot +file, *files*, *chunks*, and *lengths*. + +Chunks produced during this step are deduplicated and uploaded in the same way as regular file chunks. The final snapshot file +contains sequences of chunk hashes and other fixed size fields: + +```json +{ + "id": "host1", + "revision": 1, + "start_time": 1455590487, + "tag": "first", + "end_time": 1455590487, + "file_sequence": [ + "21e4c69f3832e32349f653f31f13cefc7c52d52f5f3417ae21f2ef5a479c3437", + ], + "chunk_sequence": [ + "8a36ffb8f4959394fd39bba4f4a464545ff3dd6eed642ad4ccaa522253f2d5d6" + ], + "length_sequence": [ + "fc2758ae60a441c244dae05f035136e6dd33d3f3a0c5eb4b9025a9bed1d0c328" + ] +} +``` + +In the extreme case where the repository has not been modified since last backup, a new backup procedure will not create any new chunks, +as shown by the following output from a real use case: + +``` +$ duplicacy backup -stats +Storage set to sftp://gchen@192.168.1.100/Duplicacy +Last backup at revision 260 found +Backup for /Users/gchen/duplicacy at revision 261 completed +Files: 42367 total, 2,204M bytes; 0 new, 0 bytes +File chunks: 447 total, 2,238M bytes; 0 new, 0 bytes, 0 bytes uploaded +Metadata chunks: 6 total, 11,753K bytes; 0 new, 0 bytes, 0 bytes uploaded +All chunks: 453 total, 2,249M bytes; 0 new, 0 bytes, 0 bytes uploaded +Total running time: 00:00:05 +``` \ No newline at end of file diff --git a/Storage-Backends.md b/Storage-Backends.md index 18b6d67..d7e5500 100644 --- a/Storage-Backends.md +++ b/Storage-Backends.md @@ -1,3 +1,226 @@ -The wiki has moved to a new home on the [Duplicacy Forum](https://forum.duplicacy.com). +Duplicacy currently supports local file storage, SFTP, WebDav and many cloud storage providers. -Your page is available at: https://forum.duplicacy.com/t/supported-storage-backends/1107 \ No newline at end of file +
Local disk + +``` +Storage URL: /path/to/storage (on Linux or Mac OS X) + C:\path\to\storage (on Windows) +``` +
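For example, to use a locally mounted drive as the storage (paths here are hypothetical; note that the storage folder must already exist, since Duplicacy will not create it):

```
mkdir -p /mnt/backupdrive/duplicacy-storage
cd /path/to/repository
duplicacy init mywork /mnt/backupdrive/duplicacy-storage
```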
+ +
SFTP + +``` +Storage URL: sftp://username@server/path/to/storage (path relative to the home directory) + sftp://username@server//path/to/storage (absolute path) +``` + +Login methods include password authentication and public key authentication. You can set up SSH agent forwarding which is also supported by Duplicacy. + +**Note for Synology users** +If the SFTP server is a Synology NAS, it is highly recommended to use the absolute path (the one with double slashes) in the storage url. Otherwise, Synology's customized SFTP server may terminate the connections arbitrarily leading to frequent EOF errors. + + +
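For example, either URL form can be used when initializing a repository (user, host and paths are hypothetical); on a Synology NAS the second, absolute form with the double slash is the one to prefer:

```
duplicacy init mywork sftp://backupuser@nas.example.com/duplicacy-storage     # path relative to the home directory
duplicacy init mywork sftp://backupuser@nas.example.com//volume1/duplicacy    # absolute path (note the double slash)
```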
+ +
Dropbox + +``` +Storage URL: dropbox://path/to/storage +``` + +For Duplicacy to access your Dropbox storage, you must provide an access token that can be obtained in one of two ways: +* Create your own app on the [Dropbox Developer](https://www.dropbox.com/developers) page, and then generate the [access token](https://blogs.dropbox.com/developers/2014/05/generate-an-access-token-for-your-own-account/) +* Or authorize Duplicacy to access its app folder inside your Dropbox (following [this link](https://duplicacy.com/dropbox_start.html)), and Dropbox will generate the access token (which is not visible to us, as the redirect page showing the token is merely a static html hosted by Dropbox). The actual storage folder will be the path specified in the storage url relative to the `Apps` folder. + +
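For example, once you have obtained a token, initialization looks like any other backend (the folder name here is hypothetical); Duplicacy will prompt for the path to the token file:

```
duplicacy init mywork dropbox://Backups/laptop
```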
+ +
Amazon S3 + +``` +Storage URL: s3://amazon.com/bucket/path/to/storage (default region is us-east-1) + s3://region@amazon.com/bucket/path/to/storage (other regions must be specified) +``` + +You'll need to input an access key and a secret key to access your Amazon S3 storage. + +Minio-based S3-compatible storage services are also supported by using the `minio` or `minios` backends: +``` +Storage URL: minio://region@host/bucket/path/to/storage (without TLS) +Storage URL: minios://region@host/bucket/path/to/storage (with TLS) +``` + +There is another backend that works with S3-compatible storage providers that require V2 signing: +``` +Storage URL: s3c://region@host/bucket/path/to/storage +``` + +
+ +
Wasabi + +``` + +Storage URL: + wasabi://us-east-1@s3.wasabisys.com/bucket/path + wasabi://us-east-2@s3.us-east-2.wasabisys.com/bucket/path + wasabi://us-west-1@s3.us-west-1.wasabisys.com/bucket/path + wasabi://eu-central-1@s3.eu-central-1.wasabisys.com/bucket/path + +``` +Where `region` is the storage region, `bucket` is the name of the bucket and `path` is the path to the top of the Duplicacy storage within the bucket. Note that `us-west-1` additionally has the `region` in the host name but `us-east-1` does not. + + +[Wasabi](https://wasabi.com) is a relatively new cloud storage service providing a S3-compatible API. It is well-suited for storing backups, because it is much cheaper than Amazon S3 with a storage cost of $0.0049/GB/month (see note below), and no additional charges on API calls and download bandwidth. + +### S3 and Billing + +#### Short Version + +The `s3` storage backend renames objects with a copy and delete which is inexpensive for AWS but more expensive for Wasabi. Use the `wasabi` backend for it to be handled properly. + +#### Long Version + +Wasabi's billing model differs from Amazon's in that any object created incurs charges for 90 days of storage, even if the object is deleted earlier than that, and then the monthly rate thereafter. + +As part of the [process for purging data which is no longer needed](https://github.com/gilbertchen/duplicacy/wiki/Lock-Free-Deduplication#two-step-fossil-collection), Duplicacy renames objects. Because S3 does not support renaming objects, Duplicacy's `s3` backend does the equivalent by using S3's copy operation to create a second object with the new name then deleting the one with the old name. S3-style renaming with Wasabi will incur additional charges during fossilization because of the additional objects it creates. For example, if a new 1 GB file is backed up in chunks on day 1, the initial storage will incur fees of at least $0.0117 (three months at $0.0039 each). If the file goes away and all snapshots that contained it are pruned on day 50, renaming the chunks will create an additional 1 GB of objects with a newly-started 90-day clock at a cost of $0.0117. + +The `wasabi` backend uses Wasabi's rename operation to avoid these extra charges. + + +### Snapshot Pruning + +Wasabi's 90-day minimum for stored data means there is no financial incentive to reduce utilization through early pruning of snapshots. Because of this, the strategy shown in the documentation for the [[prune]] command can be shortened to the following without incurring additional charges: + +``` + # Keep all snapshots younger than 90 days by doing nothing +$ duplicacy prune -keep 7:90 # Keep 1 snapshot every 7 days for snapshots older than 90 days +$ duplicacy prune -keep 30:180 # Keep 1 snapshot every 30 days for snapshots older than 180 days +$ duplicacy prune -keep 0:360 # Keep no snapshots older than 360 days +``` + +
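For reference, initializing a storage against Wasabi with the native `wasabi` backend looks like this (region, bucket and path are hypothetical):

```
duplicacy init mywork wasabi://us-east-1@s3.wasabisys.com/my-backup-bucket/duplicacy
```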
+ +
DigitalOcean Spaces + +``` +Storage URL: s3://nyc3@nyc3.digitaloceanspaces.com/bucket/path/to/storage +``` + +[DigitalOcean Spaces](https://www.digitalocean.com/products/spaces/) is an S3-compatible cloud storage service provided by DigitalOcean. The storage cost starts at $5 per month for 250GB and $0.02 for each additional GB. DigitalOcean Spaces has the lowest bandwidth cost (1TB free per account and $0.01/GB additionally) among the providers that charge bandwidth fees. There are no API charges, which further lowers the overall cost. + +Here is a tutorial on how to set up Duplicacy to work with DigitalOcean Spaces: https://www.digitalocean.com/community/tutorials/manage-backups-cloud-duplicacy +
+ + +
Google Cloud Storage + +``` +Storage URL: gcs://bucket/path/to/storage +``` + +Starting from version 2.0.0, a new Google Cloud Storage backend is added which is implemented using the [official Google client library](https://godoc.org/cloud.google.com/go/storage). You must first obtain a credential file by [authorizing](https://duplicacy.com/gcp_start) Duplicacy to access your Google Cloud Storage account or by [downloading](https://console.cloud.google.com/projectselector/iam-admin/serviceaccounts) a service account credential file. + +You can also use the s3 protocol to access Google Cloud Storage. To do this, you must enable the [s3 interoperability](https://cloud.google.com/storage/docs/migrating#migration-simple) in your Google Cloud Storage settings and set the storage url as `s3://storage.googleapis.com/bucket/path/to/storage`. + +
+ +
Microsoft Azure + +``` +Storage URL: azure://account/container +``` + +You'll need to input the access key once prompted. + +
+ +
Backblaze B2 + +``` +Storage URL: b2://bucketname +``` + +You'll need to enter the account id and the master application key. However, if you are using an application key to access your B2 account, you'll need to enter the application key id and the application key instead. + +Backblaze's B2 storage is one of the least expensive options (at 0.5 cent per GB per month, with a download fee of 1 cent per GB, plus additional charges for API calls). + +Please note that if you back up multiple repositories to the same bucket, the [lifecycle rules](https://www.backblaze.com/b2/docs/lifecycle_rules.html) of the bucket should be set to `Keep all versions of the file`, which is the default. The `Keep prior versions for this number of days` option will also work if the number of days is more than 7. + +
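For reference, a (hypothetical) initialization looks like this; Duplicacy will then prompt for either the account id and master application key, or an application key id and its application key:

```
duplicacy init mywork b2://my-backup-bucket
```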
+ +
Google Drive + +``` +Storage URL: gcd://path/to/storage +``` + +To use Google Drive as the storage, you first need to download a token file from https://duplicacy.com/gcd_start by authorizing Duplicacy to access your Google Drive, and then enter the path to this token file to Duplicacy when prompted. + +
+ +
Microsoft OneDrive + +``` +Storage URL: one://path/to/storage +``` + +To use Microsoft OneDrive as the storage, you first need to download a token file from https://duplicacy.com/one_start by authorizing Duplicacy to access your OneDrive, and then enter the path to this token file to Duplicacy when prompted. + +
+ +
Hubic + +``` +Storage URL: hubic://path/to/storage +``` + +To use Hubic as the storage, you first need to download a token file from https://duplicacy.com/hubic_start by authorizing Duplicacy to access your Hubic drive, and then enter the path to this token file when Duplicacy prompts for it. + +Hubic used to offer the most free space (25GB) of all major cloud providers with no bandwidth charge (same as Google Drive and OneDrive). + +Note, however, that Hubic no longer allows the creation of new accounts. + +
+ +
OpenStack Swift + +``` +Storage URL: swift://user@auth_url/container/path +``` + +If the storage requires more parameters you can specify them in the query string: + +``` +swift://user@auth_url/container/path?tenant=&domain= +``` + +The following is the list of parameters accepted by the query string: + +* domain +* domain_id +* user_id +* retries +* user_agent +* timeout +* connection_timeout +* region +* tenant +* tenant_id +* endpiont_type +* tenant_domain +* tenant_domain_id +* trust_id + +This backend is implemented using https://github.com/ncw/swift. + +
+ +
WebDav + +``` +Storage URL: webdav://username@server/path/to/storage (path relative to the home directory) + webdav://username@server//path/to/storage (absolute path) +``` + +
\ No newline at end of file diff --git a/add.md b/add.md index 0d08997..dd1d482 100644 --- a/add.md +++ b/add.md @@ -1,3 +1,130 @@ -The wiki has moved to a new home on the [Duplicacy Forum](https://forum.duplicacy.com). +The `add` command connects another storage to the current repository. -Your page is available at: https://forum.duplicacy.com/t/add-command-details/1074 \ No newline at end of file +Like the `init` command, if the storage has not been initialized before a storage configuration file derived from the command line options will be created. +If the `add`-ed storage has already been initialised before, then the command line options will be ignored and existing options are used. + +[Click here for a list of related forum topics.](https://forum.duplicacy.com/tags/add) + +# Quick overview +``` +SYNOPSIS: + duplicacy add - Add an additional storage to be used for the existing repository + +USAGE: + duplicacy add [command options] + +OPTIONS: + -encrypt, -e encrypt the storage with a password + -chunk-size, -c the average size of chunks (default is 4M) + -max-chunk-size, -max the maximum size of chunks (default is chunk-size*4) + -min-chunk-size, -min the minimum size of chunks (default is chunk-size/4) + -iterations the number of iterations used in storage key derivation (default is 16384) + -copy make the new storage compatible with an existing one to allow for copy operations + -bit-identical (when using -copy) make the new storage bit-identical to also allow rsync etc. + -repository specify the path of the repository (instead of the current working directory) +``` + + +# Usage + +`duplicacy add [command options] ` + +##### Example + +Add an encrypted webdav storage with the name `webdav-storage`, the path `webdav://username@server//path/to/storage`, the snapshot name `my-documents-snap` and copy all settings from the existing `default` storage. + +```duplicacy add -e -copy default -bit-identical webdav-storage my-documents-snap webdav://username@server//path/to/storage``` + + +# Options + +### `-chunk-size, -c ` + +The average size of a chunk. + +##### Example + +``` +duplicacy add -c 4M # the average chunk size is 4MB +``` + +--- + +### `-max-chunk-size, -max ` + +The maximum size of a chunk. + +##### Example + +``` +duplicacy add -max 16M # the maximum chunk size is 16MB +``` + +--- + +### `-min-chunk-size, -min ` + +The minimum size of a chunk. + +##### Example + +``` +duplicacy add -min 1M # the minimumchunk size is 1MB +``` + +--- + +### `-iterations ` + +The `-iterations` option specifies how many iterations are used to generate the key that encrypts the `config` file from the storage password. + +--- + +### `-copy ` + +The `-copy` option is required if later you want to [copy](https://forum.duplicacy.com/t/copy-command-details/1083) snapshots between this storage and another storage. + +Two different storage are copy-compatible if they have the same `chunk size`, the same `maximum chunk size`, the same `minimum chunk size`, the same `chunk seed` (used in calculating the rolling hash in the variable-size chunks algorithm), and the same `hash key`. + +If the `-copy` option is specified, these parameters will be copied from the existing storage rather than from the command line. + +--- + +### `-bit-identical` + +The `-bit-identical` option is used along with the `-copy` option and will copy the `IDKey`, `ChunkKey` and `FileKey` to the new storage from the old one. 
In this case the names of the chunks generated by Duplicacy during backup will be identical in the source and new storage. + +This has the effect that you can [rclone](https://rclone.org/) the `chunks` folder for example from local (source) to Google Drive (new storage), and then only do backups on Google Drive, and the existing chunks will be identical (same name, same size) as if the backup was run locally. + +The `-bit-identical` option does not copy the encryption option. It is possible to have an encrypted source and an unencrypted new storage, or vice versa. The `-e` option determines whether or not the new storage will be encrypted. + +This means of course, that the added storage can have a different password from the source. + +##### Example + +`duplicacy add -copy default -bit-identical` + + +### `-repository ` + +There's a huge discussion about this option here: https://forum.duplicacy.com/t/repository-init-and-add-options/1195/ + +--- + +# Notes + +:bulb: Each storage name must be _unique_ in order to distinguish it from other storage. + + +### :exclamation: Never `-bit-idential` from unencrypted to encrypted + +Please note that when the `-bit-identical` option is used to make a new encrypted storage compatible with an unencrypted one, it is possible to replace the new `config` file with the standard `config` file (which contains keys with standard values) and the new storage will become unencrypted. Therefore, the `-bit-identical` option should never be used to copy the `config` file for a new encrypted storage from an unencrypted storage. + + +### :warning: Only rclone if `-bit-identical` is used + +Rclone is only feasible when `-copy -bit-identical` is used. Otherwise you will just create chunks which are not used in the backups! (storage space, time, and electricity will be wasted) + +This happens because new chunks have different names in the 2 storage. + +(_Help needed: there should be a topic about this -> please link it if you know which was_) \ No newline at end of file diff --git a/backup.md b/backup.md index f50daaa..cd980bb 100644 --- a/backup.md +++ b/backup.md @@ -1,3 +1,95 @@ -The wiki has moved to a new home on the [Duplicacy Forum](https://forum.duplicacy.com). - -Your page is available at: https://forum.duplicacy.com/t/backup-command-details/1077 \ No newline at end of file +The `backup` command creates a snapshot of the repository and uploads it to the storage. +[Click here for a list of related forum topics.](https://forum.duplicacy.com/tags/backup) +# Quick overview +``` +NAME: + duplicacy backup - Save a snapshot of the repository to the storage +USAGE: + duplicacy backup [command options] +OPTIONS: + -hash detect file differences by hash (rather than size and timestamp) + -t assign a tag to the backup + -stats show statistics during and after backup + -threads number of uploading threads + -limit-rate the maximum upload rate (in kilobytes/sec) + -dry-run dry run for testing, don't backup anything. Use with -stats and -d + -vss enable the Volume Shadow Copy service (Windows and macOS using APFS only) + -vss-timeout the timeout in seconds to wait for the Volume Shadow Copy operation to complete + -storage backup to the specified storage instead of the default one + -enum-only enumerate the repository recursively and then exit +``` +# Usage +`duplicacy backup [command options]` +# Options +### `-hash` +By default duplicacy detects modified files (since last backup) by comparing file sizes and timestamps. 
This is the default option and works in more than 99% of cases. +If `-hash` is provided, duplicacy will scan every file in your repository by reading its contents in order to detect the modifications. +Using this option leads to increased running time since all the files are read from the disk, so it is not advised to run all the backups with the option. +##### Example +``` +duplicacy backup -hash +``` +--- +### `-t ` +You can assign a tag to this snapshot so that later you can refer to it by tag in other commands, instead of revision number. +##### Example +``` +duplicacy backup -t some_revisions_are_more_special_and_have_this_tag +``` +--- +### `-stats` +If the `-stats` option is specified, statistical information such as transfer speed and the number of chunks uploaded or skipped will be displayed throughout the backup process. +--- +### `-threads ` +This option is used to specify more than one thread to upload chunks. This is generally useful to increase upload speed. +:bulb: You should test the best number of threads for your connection and storage provider but using more than 30 threads is unadvised as it will not improve speeds significantly. +:point_up: Reading the repository is always done using only 1 thread, since even a normal HDD can have read speeds of 50MB per second or more, which is more than most storage providers offer for upload per user. +##### Example +```duplicacy backup -threads 10 # use 10 threads for the upload process``` +--- +### `-limit-rate ` +The `-limit-rate` option sets a cap on the maximum upload rate. +This applies to the global upload speed, disregarding the number of threads used. +##### Example +``` +duplicacy backup -limit-rate 650 # maximum upload speed is 650 kb/s +duplicacy backup -threads 10 -limit-rate 650 # maximum total upload speed for the 10 threads is 650 kb/s +``` +--- +### `-dry-run` +This option is used to test what changes the backup command would have done. It is guaranteed not to make any changes on the storage. +##### Example: +After running this nothing will be modified in the storage, but duplicacy will show all output just like a normal backup: +```duplicacy backup -dry-run``` +or +```duplicacy backup -dry-run -threads 500 -vss -stats``` +--- +### `-vss`, `-vss-timeout ` +The `-vss` option works on Windows and on MacOS (APFS format only) but not on linux. +It is used to turn on the Volume Shadow Copy (Windows) or AFPS snapshot (MacOS) services such that files opened by other processes with exclusive locks can be read as usual. +The `-vss-timeout` option changes the grace period in which duplicacy will wait for the system to enable or disable the `vss`. The minimum is 60 seconds. This option should be used only with `-vss`. +##### Example +``` +duplicacy backup -vss +duplicacy backup -vss -vss-timeout 200 # wait 200 seconds instead of 60 +``` +--- +### `-storage ` +When the repository can have multiple storage destinations (added by the [add](https://forum.duplicacy.com/t/add-command-details/1074) command), you can select the storage to back up to by giving a storage name. +##### Example +``` +duplicacy backup -storage backblaze_b2_storage # backup to storage backblaze_b2_storage instead of the default one +``` +--- +### `-enum-only` +This option will only list all the files and folders included or excluded in the repository. It will not modify anything in the repository or on the storage. +This offers a quick way to test the [filters](https://forum.duplicacy.com/t/filters-include-exclude-patterns/1089) file. 
+Historical note: this was first proposed [here](https://github.com/gilbertchen/duplicacy/pull/405). +##### Example +``` +duplicacy -d -log backup -enum-only +``` +# Notes +### :page_facing_up: Include/exclude files and filters +You can specify patterns to include/exclude files by putting them in a file named `.duplicacy/filters`. +Please refer to [[Include/Exclude Patterns]] for how to specify the filtering patterns. \ No newline at end of file diff --git a/benchmark.md b/benchmark.md index c03d4ce..e72df20 100644 --- a/benchmark.md +++ b/benchmark.md @@ -1,3 +1,33 @@ -The wiki has moved to a new home on the [Duplicacy Forum](https://forum.duplicacy.com). +``` +SYNOPSIS: + duplicacy benchmark - Run a set of benchmarks to test download and upload speeds -Your page is available at: https://forum.duplicacy.com/t/benchmark-command-details/1078 \ No newline at end of file +USAGE: + duplicacy benchmark [command options] + +OPTIONS: + -file-size the size of the local file to write to and read from (in MB, default to 256) + -chunk-count the number of chunks to upload and download (default to 64) + -chunk-size the size of chunks to upload and download (in MB, default to 4) + -upload-threads the number of upload threads (default to 1) + -download-threads the number of download threads (default to 1) +``` +The benchmark command has been [introduced in duplicacy 2.1.1](https://github.com/gilbertchen/duplicacy/pull/449). It can be used to measure disk access and network speed. + +Sample output: +``` +duplicacy benchmark +Storage set to sftp://gchen@192.168.1.125/storage +Generating 244.14M byte random data in memory +Writing random data to local disk +Wrote 244.14M bytes in 3.05s: 80.00M/s +Reading the random data from local disk +Read 244.14M bytes in 0.18s: 1388.05M/s +Split 244.14M bytes into 53 chunks without compression/encryption in 1.69s: 144.25M/s +Split 244.14M bytes into 53 chunks with compression but without encryption in 2.32s: 105.02M/s +Split 244.14M bytes into 53 chunks with compression and encryption in 2.44s: 99.90M/s +Generating 64 chunks +Uploaded 256.00M bytes in 62.88s: 4.07M/s +Downloaded 256.00M bytes in 63.01s: 4.06M/s +Deleting 64 temporary files +``` \ No newline at end of file diff --git a/cat.md b/cat.md index 79af8cc..fc78a07 100644 --- a/cat.md +++ b/cat.md @@ -1,3 +1,48 @@ -The wiki has moved to a new home on the [Duplicacy Forum](https://forum.duplicacy.com). +The `cat` command prints to stdout a file or the entire snapshot content if no file is specified. -Your page is available at: https://forum.duplicacy.com/t/cat-command-details/1080 \ No newline at end of file +[Click here for a list of related forum topics.](https://forum.duplicacy.com/tags/cat) + + +# Quick overview +``` +NAME: + duplicacy cat - Print to stdout the specified file, or the snapshot content if no file is specified + +USAGE: + duplicacy cat [command options] [] + +OPTIONS: + -id retrieve from the snapshot with the specified id + -r the revision number of the snapshot + -storage retrieve the file from the specified storage +``` + +# Usage + +``` +duplicacy cat [command options] [] +``` + +# Options + + +--- +### `-id ` + +You can specify a different snapshot id rather than the default id. + +--- +### `-r ` + +By default the latest revision is selected. +If `-r` is specified, then that revision is selected instead. + +--- +### `-storage ` + +You can use the `-storage` option to select a different storage other than the default one. 
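For example, to print an old revision of a file (revision number and file path here are hypothetical):

```
duplicacy cat -r 3 notes/todo.txt               # print the file as stored in revision 3
duplicacy cat -r 3 notes/todo.txt > todo-r3.txt # or redirect it into a local copy
```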
+ + +# Notes + +The file must be specified with a path relative to the repository. \ No newline at end of file diff --git a/check.md b/check.md index e93d43a..d89ef2c 100644 --- a/check.md +++ b/check.md @@ -1,3 +1,31 @@ -The wiki has moved to a new home on the [Duplicacy Forum](https://forum.duplicacy.com). +``` +SYNOPSIS: + duplicacy check - Check the integrity of snapshots -Your page is available at: https://forum.duplicacy.com/t/check-command-details/1081 \ No newline at end of file +USAGE: + duplicacy check [command options] + +OPTIONS: + -all, -a check snapshots with any id + -id check snapshots with the specified id rather than the default one + -r [+] the revision number of the snapshot + -t check snapshots with the specified tag + -fossils search fossils if a chunk can't be found + -resurrect turn referenced fossils back into chunks + -files verify the integrity of every file + -stats show deduplication statistics (imply -all and all revisions) + -tabular show tabular usage and deduplication statistics (imply -stats, -all, and all revisions) + -storage retrieve snapshots from the specified storage +``` +The *check* command checks, for each specified snapshot, that all referenced chunks exist in the storage. + +By default the *check* command will check snapshots created from the +current repository, but you can check all snapshots stored in the storage at once by specifying the `-all` option, or snapshots from a different repository using the `-id` option, and/or snapshots with a particular tag with the `-t` option. + +The revision number is a number assigned to the snapshot when it is being created. This number will keep increasing every time a new snapshot is created from a repository. You can refer to snapshots by their revision numbers using the `-r` option, which either takes a single revision number `-r 123` or a range `-r 123-456`. There can be multiple `-r` options. + +By default the *check* command only verifies the existence of chunks. To verify the full integrity of a snapshot, you should specify the `-files` option, which will download chunks and compute file hashes in memory, to make sure that all hashes match. + +By default the *check* command does not find fossils. If the `-fossils` option is specified, it will find the fossil if the referenced chunk does not exist. if the `-resurrect` option is specified, it will turn the fossil back into a chunk. + +When the repository can have multiple storages (added by the *add* command), you can specify the storage to check by specifying the storage name. diff --git a/copy.md b/copy.md index b651c23..3658e03 100644 --- a/copy.md +++ b/copy.md @@ -1,3 +1,23 @@ -The wiki has moved to a new home on the [Duplicacy Forum](https://forum.duplicacy.com). +``` +SYNOPSIS: + duplicacy copy - Copy snapshots between compatible storages -Your page is available at: https://forum.duplicacy.com/t/copy-command-details/1083 \ No newline at end of file +USAGE: + duplicacy copy [command options] + +OPTIONS: + -id copy snapshots with the specified id instead of all snapshot ids + -r [+] copy snapshots with the specified revisions + -from copy snapshots from the specified storage + -to copy snapshots to the specified storage + -download-limit-rate the maximum download rate (in kilobytes/sec) + -upload-limit-rate the maximum upload rate (in kilobytes/sec) + -threads number of uploading threads + +``` + +The *copy* command copies snapshots from one storage to another storage. 
They must be copy-compatible, i.e., some configuration parameters must be the same. One storage must be initialized with the `-copy` option provided by the *add* command. + +Instead of copying all snapshots, you can specify a set of snapshots to copy by giving the `-r` options. The *copy* command preserves the revision numbers, so if a revision number already exists on the destination storage the command will fail. + +If no `-from` option is given, the snapshots from the default storage will be copied. The `-to` option specified the destination storage and is required. diff --git a/diff.md b/diff.md index 543ff2b..6ae6df6 100644 --- a/diff.md +++ b/diff.md @@ -1,3 +1,50 @@ -The wiki has moved to a new home on the [Duplicacy Forum](https://forum.duplicacy.com). -Your page is available at: https://forum.duplicacy.com/t/diff-command-details/1084 \ No newline at end of file +The `diff` command compares the same file in two different snapshots if a file is given, otherwise compares the two snapshots. + +[Click here for a list of related forum topics.](https://forum.duplicacy.com/tags/diff) + +# Quick overview +``` +NAME: + duplicacy diff - Compare two snapshots or two revisions of a file + +USAGE: + duplicacy diff [command options] [] + +OPTIONS: + -id diff snapshots with the specified id + -r [+] the revision number of the snapshot + -hash compute the hashes of on-disk files + -storage retrieve files from the specified storage +``` + +# Usage +` duplicacy diff [command options] []` + + +# Options + +--- +### `-id ` + +You can specify a different snapshot id rather than the default snapshot id. + +--- +### `-r [+]` + +If only one revision is given by `-r`, the right hand side of the comparison will be the on-disk file. + +--- +### `-hash` + +The `-hash` option can then instruct this command to compute the hash of the file. + +--- +### `-storage ` + +You can use the `-storage` option to select a different storage other than the default one. + + +# Notes + +The file must be specified with a path relative to the repository. \ No newline at end of file diff --git a/history.md b/history.md index 85fb98e..401e190 100644 --- a/history.md +++ b/history.md @@ -1,3 +1,21 @@ -The wiki has moved to a new home on the [Duplicacy Forum](https://forum.duplicacy.com). +``` +SYNOPSIS: + duplicacy history - Show the history of a file -Your page is available at: https://forum.duplicacy.com/t/history-command-details/1088 \ No newline at end of file +USAGE: + duplicacy history [command options] + +OPTIONS: + -id find the file in the snapshot with the specified id + -r [+] show history of the specified revisions + -hash show the hash of the on-disk file + -storage retrieve files from the specified storage +``` + +The *history* command shows how the hash, size, and timestamp of a file change over the specified set of revisions. + +You can specify a different snapshot id rather than the default snapshot id, and multiple `-r` options to specify the set of revisions. + +The `-hash` option is to compute the hash of the on-disk file. Otherwise, only the size and timestamp of the on-disk file will be included. + +You can use the `-storage` option to select a different storage other than the default one. \ No newline at end of file diff --git a/init.md b/init.md index da1bd32..5c1a4bc 100644 --- a/init.md +++ b/init.md @@ -1,3 +1,40 @@ -The wiki has moved to a new home on the [Duplicacy Forum](https://forum.duplicacy.com). 
+``` +SYNOPSIS: + duplicacy init - Initialize the storage if necessary and the current directory as the repository -Your page is available at: https://forum.duplicacy.com/t/init-command-details/1090 \ No newline at end of file +USAGE: + duplicacy init [command options] + +OPTIONS: + -encrypt, -e encrypt the storage with a password + -chunk-size, -c the average size of chunks (default is 4M) + -max-chunk-size, -max the maximum size of chunks (default is chunk-size*4) + -min-chunk-size, -min the minimum size of chunks (default is chunk-size/4) + -iterations the number of iterations used in storage key derivation (default is 16384) + -pref-dir alternate location for the .duplicacy directory (absolute or relative to current directory) + -storage-name assign a name to the storage +``` + +The *init* command first connects to the storage specified by the storage URL. If the storage has been already initialized before, it will download the storage configuration (stored in the file named *config*) and ignore the options provided in the command line. Otherwise, it will create the configuration file from the options and upload the file. + +Duplicacy will not create the destination folder on the storage if the folder does not exists. + +The initialized storage will then become the default storage for other commands if the `-storage` option is not specified for those commands. This default storage actually has a name, *default*. + +After that, it will prepare the current working directory as the repository to be backed up. Under the hood, it will create a directory named *.duplicacy* in the repository and put a file named *preferences* that stores the snapshot id and encryption and storage options. + +The snapshot id is an id used to distinguish different repositories connected to the same storage. Each repository must have a unique snapshot id. A snapshot id must contain only alphanumeric characters as well as `-` and `_`. + +The `-e` option controls whether or not encryption will be enabled for the storage. If encryption is enabled, you will be prompted to enter a storage password. The storage password is used to encrypt the `config` file only. + +If you have already created an encrypted storage to which you are now connecting, you will have to add the `-e` flag, so that you will be asked to enter the encryption password. + +The three chunk size parameters are passed to the variable-size chunking algorithm. Their values are important to the overall performance, especially for cloud storages. If the chunk size is too small, a lot of overhead will be in sending requests and receiving responses. If the chunk size is too large, the effect of de-duplication will be less obvious as more data will need to be transferred with each chunk. + +The `-iterations` option specifies how many iterations are used to generate the key that encrypts the `config` file from the storage password. + +The `-pref-dir` option is deprecated and the `-repository` option should be used instead. This option controls the location of the preferences directory. If not specified, a directory named .duplicacy is created in the repository. If specified, it must point to a non-existing directory. The directory is created and a .duplicacy file is created in the repository. The .duplicacy file contains the absolute path name to the preferences directory. + +Once a storage has been initialized with certain chunk size parameters, these parameters cannot be modified any more. 
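For example, an encrypted storage with a smaller average chunk size, a named storage, and a higher iteration count could be initialized like this (all values are illustrative only, and the chunk-size parameters cannot be changed afterwards):

```
duplicacy init -e -c 1M -iterations 65536 -storage-name mystorage mywork sftp://user@192.168.1.100/path/to/storage
```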
+
+The `-repository` option specifies how the repository root directory is defined in the preferences file. It may be specified as either an absolute or a relative path. Relative paths are relative to the current working directory of Duplicacy at the time it is executed (when the preferences file is being parsed). This option allows the repository configuration files and the repository itself to be maintained in separate file system locations. When not specified, an empty repository path is written to the preferences file, causing Duplicacy to treat its current working directory as the repository root.
\ No newline at end of file
diff --git a/list.md b/list.md
index 87abf84..7dc5557 100644
--- a/list.md
+++ b/list.md
@@ -1,3 +1,29 @@
-The wiki has moved to a new home on the [Duplicacy Forum](https://forum.duplicacy.com).
+```
+SYNOPSIS:
+   duplicacy list - List snapshots
-Your page is available at: https://forum.duplicacy.com/t/list-command-details/1092
\ No newline at end of file
+USAGE:
+   duplicacy list [command options]
+
+OPTIONS:
+   -all, -a                 list snapshots with any id
+   -id <snapshot id>        list snapshots with the specified id rather than the default one
+   -r <revision> [+]        the revision number of the snapshot
+   -t <tag>                 list snapshots with the specified tag
+   -files                   print the file list in each snapshot
+   -chunks                  print chunks in each snapshot or all chunks if no snapshot specified
+   -reset-passwords         take passwords from input rather than keychain/keyring or env
+   -storage <storage name>  retrieve snapshots from the specified storage
+```
+
+The *list* command lists information about the specified snapshots. By default it will list snapshots created from the current repository, but you can list all snapshots stored in the storage by specifying the `-all` option, list snapshots with a different snapshot id using the `-id` option, and/or list snapshots with a particular tag using the `-t` option.
+
+The revision number is a number assigned to the snapshot when it is being created. This number will keep increasing every time a new snapshot is created from a repository. You can refer to snapshots by their revision numbers using the `-r` option, which takes either a single revision number `-r 123` or a range `-r 123-456`. There can be multiple `-r` options.
+
+If `-files` is specified, for each snapshot to be listed, this command will also print information about every file contained in the snapshot.
+
+If `-chunks` is specified, the command will also print out every chunk the snapshot references.
+
+The `-reset-passwords` option is used to reset stored passwords and to allow passwords to be entered again. Please refer to the [[Managing Passwords]] section for more information.
+
+When the repository has multiple storages (added by the *add* command), you can select the storage to list from by specifying the storage name.
\ No newline at end of file
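+
+For example, the following might be used to inspect the backups of another machine (the snapshot id and revision range are only placeholders):
+
+```
+# list revisions 10 through 20 created by the repository "laptop", including their file lists
+duplicacy list -id laptop -r 10-20 -files
+```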
diff --git a/password.md b/password.md
index 7b34b49..f2ddf01 100644
--- a/password.md
+++ b/password.md
@@ -1,3 +1,33 @@
-The wiki has moved to a new home on the [Duplicacy Forum](https://forum.duplicacy.com).
+The `password` command decrypts the _storage configuration file_ (`config`) using the old password, and re-encrypts the file using a new password.
-Your page is available at: https://forum.duplicacy.com/t/password-command-details/1099
\ No newline at end of file
+It does not change the encryption keys themselves, which are used to encrypt and decrypt chunk files, snapshot files, etc.
+
+# Quick overview
+
+```
+USAGE:
+   duplicacy password [command options]
+
+OPTIONS:
+   -storage <storage name>  change the password used to access the specified storage
+   -iterations <i>          the number of iterations used in storage key derivation (default is 16384)
+```
+
+# Usage
+
+```
+duplicacy password [command options]
+```
+
+The `password` command requires no arguments; changing a password is an _interactive_ process.
+
+# Options
+
+### `-storage <storage name>`
+
+You can specify the storage to change the password for when working with multiple storages.
+
+---
+### `-iterations <i>`
+
+The `-iterations` option specifies how many iterations are used to generate the key that encrypts the `config` file from the storage password.
\ No newline at end of file
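+
+For example, to change the password of a non-default storage and use a stronger key derivation at the same time, you might run something like the following (the storage name and iteration count are only placeholders); Duplicacy then prompts for the old and new passwords as needed:
+
+```
+duplicacy password -storage offsite_storage -iterations 65536
+```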
diff --git a/prune.md b/prune.md
index 54bf698..ddc075e 100644
--- a/prune.md
+++ b/prune.md
@@ -1,3 +1,225 @@
-The wiki has moved to a new home on the [Duplicacy Forum](https://forum.duplicacy.com).
+
-Your page is available at: https://forum.duplicacy.com/t/prune-command-details/1005
\ No newline at end of file
+The `prune` command has the task of deleting old/unwanted revisions and unused chunks from a storage.
+
+[Click here for a list of related forum topics.](https://forum.duplicacy.com/tags/prune)
+
+# Quick overview
+
+```text
+NAME:
+   duplicacy prune - Prune revisions by number, tag, or retention policy
+
+USAGE:
+   duplicacy prune [command options]
+
+OPTIONS:
+   -id <snapshot id>        delete revisions with the specified snapshot ID instead of the default one
+   -all, -a                 match against all snapshot IDs
+   -r <revision> [+]        delete the specified revisions
+   -t <tag> [+]             delete revisions with the specified tags
+   -keep <n:m> [+]          keep 1 revision every n days for revisions older than m days
+   -exhaustive              remove all unreferenced chunks (not just those referenced by deleted snapshots)
+   -exclusive               assume exclusive access to the storage (disable two-step fossil collection)
+   -dry-run, -d             show what would have been deleted
+   -delete-only             delete fossils previously collected (if deletable) and don't collect fossils
+   -collect-only            identify and collect fossils, but don't delete fossils previously collected
+   -ignore <id> [+]         ignore revisions with the specified snapshot ID when deciding if fossils can be deleted
+   -storage <storage name>  prune revisions from the specified storage
+   -threads <n>             number of threads used to prune unreferenced chunks
+```
+
+# Usage
+`duplicacy prune [command options]`
+
+# Options
+
+Options marked with [+] can be passed more than once.
+
+### `-id <snapshot id>`
+Delete revisions with the specified snapshot ID instead of the default one.
+##### Example:
+```
+duplicacy prune -id computer-2
+```
+
+### `-all, -a`
+Run the prune command against all snapshot IDs in the selected storage.
+##### Example:
+```
+duplicacy prune -all
+```
+
+### `-r <revision> [+]`
+Delete the specified revisions.
+
+##### Examples:
+```
+duplicacy prune -r 6            # delete revision 6
+duplicacy prune -r 344-350      # delete revisions 344 through 350 (inclusive)
+duplicacy prune -r 310 -r 1322  # delete only revisions 310 and 1322
+```
+
+### `-t <tag> [+]`
+Delete revisions with the specified tags.
+
+
+
+### `-keep <n:m> [+]`
+Keep 1 revision every n days for revisions older than m days.
+
+The retention policies are specified by the `-keep` option, which accepts an argument in the form of two numbers `n:m`, where `n` indicates the number of days between two consecutive revisions to keep, and `m` means that the policy only applies to revisions at least `m` days old. If `n` is zero, any revisions older than `m` days will be removed.
+
+##### Examples:
+```
+duplicacy prune -keep 1:7     # Keep a revision per (1) day for revisions older than 7 days
+duplicacy prune -keep 7:30    # Keep a revision every 7 days for revisions older than 30 days
+duplicacy prune -keep 30:180  # Keep a revision every 30 days for revisions older than 180 days
+duplicacy prune -keep 0:360   # Keep no revisions older than 360 days
+```
+
+Multiple `-keep` options **must** be sorted by their `m` values in decreasing order.
+
+For example, to combine the above policies into one line, it would become:
+
+```
+duplicacy prune -keep 0:360 -keep 30:180 -keep 7:30 -keep 1:7
+```
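+
+In a scheduled job you would typically also add `-all`, so that a single prune run applies the retention policies to every snapshot ID in the storage (the policy values below are only an illustration):
+
+```
+duplicacy prune -all -keep 0:360 -keep 30:180 -keep 7:30 -keep 1:7
+```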
+
+### `-exhaustive`
+Remove all unreferenced chunks (not just those referenced by deleted revisions).
+
+
+The `-exhaustive` option will scan the list of all chunks in the storage; therefore it will find not only unreferenced chunks from deleted revisions, but also chunks that became unreferenced for other reasons, such as those from an incomplete backup.
+
+It will also find any file that does not look like a chunk file.
+
+In contrast, a normal `prune` command will only identify chunks that are referenced by the deleted revisions and not by any other revision.
+
+##### Example:
+```
+duplicacy prune -exhaustive
+```
+
+### `-exclusive`
+Assume exclusive access to the storage (disable two-step fossil collection).
+
+The `-exclusive` option will assume that no other clients are accessing the storage, effectively disabling the *two-step fossil collection* algorithm.
+
+With this option, the `prune` command will immediately remove unreferenced chunks.
+
+WARNING: Only run `-exclusive` when you are sure that **no other backup is running**, on any other device or repository.
+
+##### Example:
+```
+duplicacy prune -exclusive
+```
+
+### `-dry-run, -d`
+This option is used to test what changes the `prune` command would have made. It is guaranteed not to make any changes on the storage, not even creating the local fossil collection file.
+
+##### Example:
+After running this nothing will be modified in the storage, but duplicacy will show all output just like a normal run:
+```
+duplicacy prune -dry-run -all -exhaustive -exclusive
+```
+
+
+### `-delete-only`
+Delete fossils previously collected (if deletable) and don't collect fossils.
+
+##### Example:
+```
+duplicacy prune -delete-only
+```
+
+### `-collect-only`
+Identify and collect fossils, but don't delete fossils previously collected.
+
+##### Example:
+```
+duplicacy prune -collect-only
+```
+
+The `-delete-only` option will skip the fossil collection step, while the `-collect-only` option will skip the fossil deletion step.
+
+### `-ignore <id> [+]`
+Ignore revisions with the specified snapshot ID when deciding if fossils can be deleted.
+
+
+### `-storage <storage name>`
+Prune revisions from the specified storage instead of the default one.
+
+##### Example:
+```
+duplicacy prune -storage google-drive
+```
+
+
+---
+### `-threads <n>`
+
+This option is used to specify more than one thread to prune chunks. This is generally useful to increase pruning speed.
+
+:bulb: You should test the best number of threads for your connection and storage provider, but using more than 30 threads is not advised, as it will not improve speeds significantly.
+
+
+
+##### Example
+```
+duplicacy prune -keep 1:7 -threads 10   # use 10 threads for the pruning process
+```
+
+
+# Notes
+
+:bulb: Revisions to be deleted can be specified by numbers, by a tag, by retention policies, or by any combination of these categories.
+
+
+### :bulb: Only one repository should run prune
+
+Since :d: encourages multiple repositories backing up to the same storage (so that deduplication will be efficient), users might want to run prune from each different repository.
+
+The design of :d:, however, was based on the assumption that only one instance would run the prune command (using `-all`); this greatly simplifies the implementation.
+
+It is also somewhat wasteful to have a prune command work on only one repository id, since it still needs to download the backups of all other repository ids in order to decide which chunks are to be deleted.
+
+Finally, in theory race conditions can happen when two instances try to operate on the same chunk at the same time, but in practice this may never happen, especially if each prune command runs right after a backup, so that they start at different times.
+
+### Cache folder is extremely big! :scream:
+
+Please read https://forum.duplicacy.com/t/cache-folder-is-is-extremely-big/2118.
+
+### :bulb: Pruning is logged
+All prune actions are logged by default locally, on the machine where the prune command is executed, under `.duplicacy/logs`. The prune logs are named similarly to `prune-log-20171230-142510`.
+
+In the same folder you may also find log files which are empty. There is no need to worry about these: an empty file simply means that nothing was pruned from the storage during that particular prune operation.
+
+### :bulb: `-exhaustive` should be used sparingly
+The `-exhaustive` option is only needed when there are known unreferenced chunks in the storage, for example, when a backup was interrupted by the user or terminated due to an error **and** the files in the repository changed afterwards.
+
+It is not recommended to run the prune command with this option on a regular basis (i.e., when there is no recent incomplete backup), mainly because if there is an ongoing backup from a different computer, the prune command will mark as fossils all new chunks uploaded by that backup.
+
+Although in the fossil deletion step the prune command can correctly identify that these chunks are actually referenced and thus turn them back into chunks, the cost of the extra API calls can be excessive.
+
+### :bulb: The last revision can only be deleted in `-exclusive` mode
+
+The latest revision from each repository can’t be deleted in non-exclusive mode because, in theory, a backup for that repository may be in progress that uses the latest revision as its base, so removal of the latest revision could cause some chunks to be removed even though they are needed by the backup in progress.
+
+### :warning: Corner cases when prune may delete too much
+
+There are two corner cases in which a fossil that is still needed may be mistakenly deleted. The first is when a backup taking more than 7 days started before the chunk was marked as a fossil: the prune command will think the repository has become inactive and exclude it from the criteria for determining which fossils are safe to delete.
+
+The other case happens when an initial backup from a newly recreated repository also started before the chunk was marked as a fossil. Since the prune command doesn't know about the existence of such a repository at fossil deletion time, it may think the fossil isn't needed any more by any backup and thus delete it permanently.
+
+Therefore, the check command must be used if a backup is an initial backup or takes more than 7 days. Once a backup passes the check command, it is guaranteed that it won't be affected by any future prune operations.
+
+### :bulb: Individual files cannot be pruned
+
+Note that duplicacy always prunes entire revisions of entire snapshots, not individual files. In other words: it is not possible to remove backups of specific files from the storage. This means, for example, that if you realize after a couple of months that you have accidentally been backing up some huge useless files, the only way to remove them from the storage to free up space is to prune each and every revision in which they are included.
+
+### Two-step fossil collection algorithm
+
+The `prune` command implements the _two-step fossil collection algorithm_. It will first find fossil collection files from previous runs and check if the contained fossils are eligible for permanent deletion (the _fossil deletion_ step). Then it will search for snapshots to be deleted, mark unreferenced chunks as fossils (by renaming them) and save them in a new fossil collection file stored locally (the _fossil collection_ step).
+
+For fossils collected in the fossil collection step to be eligible for safe deletion in the fossil deletion step, at least one new snapshot from *each* snapshot id must be created between two runs of the *prune* command. However, some repositories may not be set up to back up on a regular schedule, which would block fossils from ever being deleted. Duplicacy by default will therefore ignore repositories that have no new backup in the past 7 days, and you can also use the `-ignore` option to skip certain repositories when deciding the deletion criteria.
\ No newline at end of file
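+
+As an illustration of the two-step process (the retention policy here is arbitrary), a fossil collected by one prune run is only deleted by a later run, after every active snapshot id has produced a new backup in between:
+
+```
+duplicacy prune -all -keep 1:7   # run 1: unreferenced chunks are renamed to fossils and recorded
+# ... later, every repository backing up to this storage completes at least one new backup ...
+duplicacy prune -all -keep 1:7   # run 2: fossils recorded by run 1 can now be deleted permanently
+```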
diff --git a/restore.md b/restore.md
index b48a321..47da08c 100644
--- a/restore.md
+++ b/restore.md
@@ -1,3 +1,41 @@
-The wiki has moved to a new home on the [Duplicacy Forum](https://forum.duplicacy.com).
+```
+SYNOPSIS:
+   duplicacy restore - Restore the repository to a previously saved snapshot
-Your page is available at: https://forum.duplicacy.com/t/restore-command-details/1102
\ No newline at end of file
+USAGE:
+   duplicacy restore [command options] [--] [pattern] ...
+
+OPTIONS:
+   -r <revision>            the revision number of the snapshot (required)
+   -hash                    detect file differences by hash (rather than size and timestamp)
+   -overwrite               overwrite existing files in the repository
+   -delete                  delete files not in the snapshot
+   -ignore-owner            do not set the original uid/gid on restored files
+   -stats                   show statistics during and after restore
+   -threads <n>             number of downloading threads
+   -limit-rate <kB/s>       the maximum download rate (in kilobytes/sec)
+   -storage <storage name>  restore from the specified storage instead of the default one
+```
+
+The *restore* command restores the repository to a previous revision. By default the restore procedure will treat files that have the same sizes and timestamps as those in the snapshot as unchanged files, but with the `-hash` option, every file will be fully scanned to make sure they are in fact unchanged.
+
+By default the restore procedure will not overwrite existing files, unless the `-overwrite` option is specified.
+
+The `-delete` option indicates that files not in the snapshot will be removed. This option is ignored if any `pattern` is specified.
+
+If the `-ignore-owner` option is specified, the restore procedure will not attempt to restore the original user/group id ownership on restored files (all restored files will be owned by the current user); this can be useful when restoring to a new or different machine. This option is available in versions later than 2.0.9.
+
+If the `-stats` option is specified, statistical information such as transfer speed and number of chunks will be displayed throughout the restore procedure.
+
+The `-threads` option can be used to specify more than one thread to download chunks.
+
+The `-limit-rate` option sets a cap on the maximum download rate.
+
+When the repository has multiple storages (added by the *add* command), you can select the storage to restore from by specifying the storage name.
+
+Unlike the *backup* procedure, which reads the include/exclude patterns from a file, the *restore* procedure reads them from the command line. If the patterns could confuse the command line argument parser, `--` should be prepended to the patterns. Please refer to the [[Include/Exclude Patterns]] section for how to specify patterns.
+
+# Example
+##### Restore the folder `C__Users_link\crist\Documents\StarCraft II\`
+```
+duplicacy.exe restore -r 113 "C__Users_link\crist\Documents\StarCraft II\*"
+```
\ No newline at end of file
diff --git a/set.md b/set.md
index d72c92f..01cc209 100644
--- a/set.md
+++ b/set.md
@@ -1,3 +1,36 @@
-The wiki has moved to a new home on the [Duplicacy Forum](https://forum.duplicacy.com).
+```
+SYNOPSIS:
+   duplicacy set - Change the options for the default or specified storage
-Your page is available at: https://forum.duplicacy.com/t/set-command-details/1105
\ No newline at end of file
+USAGE:
+   duplicacy set [command options]
+
+OPTIONS:
+   -encrypt, -e[=true]         encrypt the storage with a password
+   -no-backup[=true]           backup to this storage is prohibited
+   -no-restore[=true]          restore from this storage is prohibited
+   -no-save-password[=true]    don't save password or access keys to keychain/keyring
+   -nobackup-file <file name>  Directories containing a file with this name will not be backed up
+   -key <key>                  add a key/password whose value is supplied by the -value option
+   -value <value>              the value of the key/password
+   -storage <storage name>     use the specified storage instead of the default one
+   -filters <file path>        specify the path of the filters file containing include/exclude patterns
+```
+
+The *set* command changes the options for the specified storage.
+
+The `-e` option turns on the storage encryption. If specified as `-e=false`, it turns off the storage encryption.
+
+The `-no-backup` option will not allow backups from this repository to be created.
+
+The `-no-restore` option will not allow restoring this repository to a different revision.
+
+The `-no-save-password` option will require every password or token to be entered every time instead of being saved.
+
+The `-key` and `-value` options are used to store (in plain text) access keys or tokens needed by various storages. Please refer to the [[Managing Passwords]] section for more details.
+
+You can select a storage to change options for by specifying a storage name using the `-storage` option.
+
+With the `-filters` option, you can specify the path of the file containing include/exclude patterns rather than the default one (`.duplicacy/filters`).
+
+The `-nobackup-file` option was [introduced in version 2.1.1](https://forum.duplicacy.com/t/version-2-1-1-has-been-released/1031?u=christoph) after discussion [here](https://forum.duplicacy.com/t/file-exclusion-based-on-file-attributes-folder-exclusion-based-on-contained-file-name/485?u=christoph).
\ No newline at end of file
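+
+For example, on a storage that is only ever written to by the `copy` command, something like the following might be used (the storage name, key name, and value are only placeholders; the exact key names each backend expects are listed in [[Managing Passwords]]):
+
+```
+# forbid direct backups to the copy-only storage
+duplicacy set -storage offsite_storage -no-backup
+# store an access credential in the preferences file instead of the keychain/keyring
+duplicacy set -storage offsite_storage -key some_key_name -value some_key_value
+```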