mirror of https://github.com/rclone/rclone.git synced 2025-12-06 00:03:32 +00:00

fs: implement --metadata-mapper to transform metadata with a user supplied program

Nick Craig-Wood
2023-10-23 23:47:18 +01:00
parent 54196f34e3
commit 47ca0c326e
14 changed files with 423 additions and 51 deletions


@@ -475,6 +475,10 @@ Note that arbitrary metadata may be added to objects using the
`--metadata-set key=value` flag when the object is first uploaded.
This flag can be repeated as many times as necessary.
The [--metadata-mapper](#metadata-mapper) flag can be used to pass the
name of a program which can transform metadata when it is being
copied from source to destination.
### Types of metadata
Metadata is divided into two types: system metadata and user metadata.
@@ -1504,12 +1508,123 @@ from reaching the limit. Only applicable for `--max-transfer`
Setting this flag enables rclone to copy the metadata from the source
to the destination. For local backends this is ownership, permissions,
xattr etc. See the [#metadata](metadata section) for more info.
xattr etc. See the [metadata section](#metadata) for more info.
### --metadata-mapper SpaceSepList {#metadata-mapper}
If you supply the parameter `--metadata-mapper /path/to/program` then
rclone will use that program to map metadata from source object to
destination object.
The argument to this flag should be a command with an optional space separated
list of arguments. If one of the arguments contains a space then enclose
it in `"`. If you want a literal `"` in an argument then enclose the
argument in `"` and double the `"`. See [CSV encoding](https://godoc.org/encoding/csv)
for more info.
    --metadata-mapper "python bin/test_metadata_mapper.py"
    --metadata-mapper 'python bin/test_metadata_mapper.py "argument with a space"'
    --metadata-mapper 'python bin/test_metadata_mapper.py "argument with ""two"" quotes"'
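To check how a value will be split before passing it to the flag, the quoting rules can be approximated in Python. This is only a sketch: rclone does the parsing itself in Go, and the helper name `split_mapper_command` is made up for illustration. It assumes Python's `csv` module, with a space as the delimiter, behaves closely enough to the CSV encoding described above.

```python
import csv


def split_mapper_command(value):
    """Split a --metadata-mapper value into program and arguments.

    Approximation only: uses Python's csv reader with a space delimiter,
    "-quoted fields, and doubled "" for a literal quote.
    """
    return next(csv.reader([value], delimiter=" ", quotechar='"', doublequote=True))


# The three example values from the documentation above
for value in (
    'python bin/test_metadata_mapper.py',
    'python bin/test_metadata_mapper.py "argument with a space"',
    'python bin/test_metadata_mapper.py "argument with ""two"" quotes"',
):
    print(split_mapper_command(value))
```

The last value should split into three items, with the third being `argument with "two" quotes`.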
This uses a simple JSON based protocol with input on STDIN and output
on STDOUT. This will be called for every file and directory copied and
may be called concurrently.
The program's job is to take a metadata blob on the input and turn it
into a metadata blob on the output suitable for the destination
backend.
Input to the program (via STDIN) is a JSON blob like the example
below. As well as the `Metadata` itself it provides some context which
may be important.
- `SrcFs` is the config string for the remote that the object is currently on.
- `SrcFsType` is the name of the source backend.
- `DstFs` is the config string for the remote that the object is being copied to.
- `DstFsType` is the name of the destination backend.
- `Remote` is the path of the file relative to the root.
- `Size`, `MimeType`, `ModTime` are attributes of the file.
- `IsDir` is `true` if this is a directory (not yet implemented).
- `ID` is the source `ID` of the file if known.
- `Metadata` is the backend specific metadata as described in the backend docs.
```json
{
    "SrcFs": "gdrive:",
    "SrcFsType": "drive",
    "DstFs": "newdrive:user",
    "DstFsType": "onedrive",
    "Remote": "test.txt",
    "Size": 6,
    "MimeType": "text/plain; charset=utf-8",
    "ModTime": "2022-10-11T17:53:10.286745272+01:00",
    "IsDir": false,
    "ID": "xyz",
    "Metadata": {
        "btime": "2022-10-11T16:53:11Z",
        "content-type": "text/plain; charset=utf-8",
        "mtime": "2022-10-11T17:53:10.286745272+01:00",
        "owner": "user1@domain1.com",
        "permissions": "...",
        "description": "my nice file",
        "starred": "false"
    }
}
```
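The context fields let a mapper decide whether a transformation applies at all. The sketch below rewrites the owner only when copying from a `drive` backend to a `onedrive` backend. The rule itself and the names `map_metadata` and `main` are illustrative assumptions, not part of rclone. The demo at the end feeds a sample blob through `io.StringIO`; a real mapper would call `main(sys.stdin, sys.stdout)` instead.

```python
import io
import json
import sys


def map_metadata(blob):
    """Build the reply for one input blob, using the context fields."""
    metadata = dict(blob.get("Metadata") or {})
    # Hypothetical rule: only rewrite owners for drive -> onedrive copies
    if blob.get("SrcFsType") == "drive" and blob.get("DstFsType") == "onedrive":
        if "owner" in metadata:
            metadata["owner"] = metadata["owner"].replace("domain1.com", "domain2.com")
    return {"Metadata": metadata}


def main(stdin, stdout):
    """Read one JSON blob from stdin and write the reply to stdout."""
    json.dump(map_metadata(json.load(stdin)), stdout, indent="\t")


# Demo with a cut-down sample input instead of real STDIN
sample = {
    "SrcFsType": "drive",
    "DstFsType": "onedrive",
    "Remote": "test.txt",
    "Metadata": {"owner": "user1@domain1.com", "starred": "false"},
}
main(io.StringIO(json.dumps(sample)), sys.stdout)
```

Keeping the decision in a pure function such as `map_metadata` makes it easy to test without starting rclone.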
The program should then modify the input as desired and send it to
STDOUT. The returned `Metadata` field will be used in its entirety for
the destination object. Any other fields will be ignored. Note in this
example we translate user names and permissions and add something to
the description:
```json
{
    "Metadata": {
        "btime": "2022-10-11T16:53:11Z",
        "content-type": "text/plain; charset=utf-8",
        "mtime": "2022-10-11T17:53:10.286745272+01:00",
        "owner": "user1@domain2.com",
        "permissions": "...",
        "description": "my nice file [migrated from domain1]",
        "starred": "false"
    }
}
```
Metadata can be removed here too.
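Because the returned `Metadata` is used in its entirety, removing metadata amounts to leaving keys out of the reply. A minimal sketch follows; the set of keys in `DROP_KEYS` and the function name `drop_metadata` are arbitrary choices for illustration.

```python
import json

# Hypothetical choice of keys to leave out of the destination metadata
DROP_KEYS = {"permissions", "starred"}


def drop_metadata(blob, drop_keys=DROP_KEYS):
    """Return a reply whose Metadata omits every key in drop_keys."""
    metadata = blob.get("Metadata") or {}
    kept = {key: value for key, value in metadata.items() if key not in drop_keys}
    return {"Metadata": kept}


# Demo using part of the example input shown earlier
example = {
    "Metadata": {
        "owner": "user1@domain1.com",
        "permissions": "...",
        "description": "my nice file",
        "starred": "false",
    }
}
print(json.dumps(drop_metadata(example), indent="\t"))
```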
An example Python program implementing the above transformations
might look something like this.
```python
import sys, json

i = json.load(sys.stdin)
metadata = i["Metadata"]
# Add tag to description
if "description" in metadata:
    metadata["description"] += " [migrated from domain1]"
else:
    metadata["description"] = "[migrated from domain1]"
# Modify owner
if "owner" in metadata:
    metadata["owner"] = metadata["owner"].replace("domain1.com", "domain2.com")
o = { "Metadata": metadata }
json.dump(o, sys.stdout, indent="\t")
```
You can find this example (slightly expanded) in the rclone source code at
[bin/test_metadata_mapper.py](https://github.com/rclone/rclone/blob/master/bin/test_metadata_mapper.py).
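A mapper can be exercised without rclone by sending it a JSON blob on STDIN and parsing what comes back on STDOUT. The harness below is a sketch of that idea. `run_mapper` and `IDENTITY_MAPPER` are hypothetical names, and the stand-in mapper simply returns the `Metadata` unchanged so the example runs on its own.

```python
import json
import subprocess
import sys


def run_mapper(argv, blob):
    """Run a mapper command with blob on STDIN and return its parsed STDOUT."""
    result = subprocess.run(
        argv,
        input=json.dumps(blob),
        capture_output=True,
        text=True,
        check=True,  # raise if the mapper exits with an error
    )
    return json.loads(result.stdout)


# Stand-in mapper for the demo: echoes the Metadata field back unchanged
IDENTITY_MAPPER = [
    sys.executable,
    "-c",
    "import sys, json; json.dump({'Metadata': json.load(sys.stdin)['Metadata']}, sys.stdout)",
]

sample = {
    "SrcFsType": "drive",
    "DstFsType": "onedrive",
    "Remote": "test.txt",
    "Metadata": {"owner": "user1@domain1.com"},
}
reply = run_mapper(IDENTITY_MAPPER, sample)
print(reply["Metadata"])
```

To try a real mapper, replace `IDENTITY_MAPPER` with its command split into a list, for example `["python", "bin/test_metadata_mapper.py"]`.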
If you want to see the input to the metadata mapper and the output
returned from it in the log you can use `-vv --dump mapper`.
See the [metadata section](#metadata) for more info.
### --metadata-set key=value
Add metadata `key` = `value` when uploading. This can be repeated as
many times as required. See the [#metadata](metadata section) for more
many times as required. See the [metadata section](#metadata) for more
info.
### --modify-window=TIME ###
@@ -1752,9 +1867,9 @@ for more info.
Eg
--password-command echo hello
--password-command echo "hello with space"
--password-command echo "hello with ""quotes"" and space"
--password-command "echo hello"
--password-command 'echo "hello with space"'
--password-command 'echo "hello with ""quotes"" and space"'
See the [Configuration Encryption](#configuration-encryption) for more info.
@@ -2503,6 +2618,12 @@ This dumps a list of the open files at the end of the command. It
uses the `lsof` command to do that so you'll need that installed to
use it.
#### --dump mapper ####
This shows the JSON blobs being sent to the program supplied with
`--metadata-mapper` and received from it. It can be useful for
debugging the metadata mapper interface.
### --memprofile=FILE ###
Write memory profile to file. This can be analysed with `go tool pprof`.