This commit introduces a new validation step to ensure data integrity
during file uploads.
- The API's returned file name (new.File.Name) is now verified
against the requested file name (leaf) immediately after
the initial upload ticket is created.
- If a mismatch is detected, the upload process is aborted with an error,
and the deferred cleanup logic is triggered to delete any partially created
file (see the sketch after this list).
- This addresses an unexpected API behavior where numbered suffixes
might be appended to filenames even without conflicts.
- This change prevents corrupted or misnamed files from being uploaded
without client-side awareness.
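A minimal sketch of the check, with hypothetical names (`checkReturnedName`, `returned`) standing in for the real upload code:

```go
package sketch

import "fmt"

// Hypothetical illustration of the verification step described above.
func checkReturnedName(returned, leaf string) error {
	if returned != leaf {
		// Returning an error aborts the upload; the caller's deferred
		// cleanup then deletes the partially created file.
		return fmt.Errorf("upload created %q but %q was requested", returned, leaf)
	}
	return nil
}
```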
Before this change, a server-side copy of files containing & gave the error:
Invalid Argument</Message><Resource>x-(amz|archive)-copy-source
header has bad character
This fix switches from url.PathEscape, which doesn't escape &, to
url.QueryEscape, which escapes it (and more).
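For reference, a quick standard-library comparison of the two (illustration only, not the actual patch):

```go
package main

import (
	"fmt"
	"net/url"
)

func main() {
	// PathEscape leaves & alone, which the copy-source header rejects.
	fmt.Println(url.PathEscape("a & b.txt"))  // a%20&%20b.txt
	// QueryEscape escapes & (and more).
	fmt.Println(url.QueryEscape("a & b.txt")) // a+%26+b.txt
}
```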
Fixes #8754
This reverts commit 64ed9b175f.
This fails the integration tests with
s3_internal_test.go:434: Creating a bucket we already have created returned code: No Error
s3_internal_test.go:439:
Error Trace: backend/s3/s3_internal_test.go:439
Error: Should be true
Test: TestIntegration/FsMkdir/FsPutFiles/Internal/Versions/Mkdir
Messages: Need to set UseAlreadyExists quirk
In the current design, OpenWriterAt provides the interface for random-access
writes, and openChunkWriterFromOpenWriterAt wraps this interface to enable
parallel chunk uploads using multiple goroutines. A global connection pool is
already in place to manage SMB connections across files.
However, currently only one connection is used per file, which makes multiple
goroutines compete for the connection during multithreaded writes.
This change creates a separate connection for each goroutine, which allows true
parallelism by giving each goroutine its own SMB connection.
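Roughly, the chunk writer now does something like this (hypothetical `connPool`/`chunk` types standing in for the real backend ones):

```go
package sketch

import (
	"context"

	"golang.org/x/sync/errgroup"
)

type chunk struct {
	data   []byte
	offset int64
}

type smbConn interface {
	WriteAt(p []byte, off int64) (int, error)
}

type connPool interface {
	Get(ctx context.Context) (smbConn, error)
	Put(smbConn)
}

// writeChunks gives every goroutine its own SMB connection from the pool,
// instead of all goroutines contending for a single shared connection.
func writeChunks(ctx context.Context, pool connPool, chunks []chunk) error {
	g, gCtx := errgroup.WithContext(ctx)
	for _, c := range chunks {
		c := c
		g.Go(func() error {
			conn, err := pool.Get(gCtx) // one connection per goroutine
			if err != nil {
				return err
			}
			defer pool.Put(conn)
			_, err = conn.WriteAt(c.data, c.offset)
			return err
		})
	}
	return g.Wait()
}
```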
Signed-off-by: sudipto baral <sudiptobaral.me@gmail.com>
Before this change, bisync used some global variables, which could cause errors
if running multiple concurrent bisync runs through the rc. (Running normally
from the command line was not affected.)
This change deglobalizes those variables so that multiple bisync runs can be
safely run at once, from the same rclone instance.
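Conceptually the change looks like this (illustrative variable, not the real names):

```go
package sketch

// Before: shared by every bisync invocation in the process, so concurrent
// rc calls could race on it:
//
//	var logTimeFormat string
//
// After: owned by the per-run state, so each rc call gets its own copy:
type bisyncRun struct {
	logTimeFormat string // formerly a package-level variable
}
```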
`Content-Type: aws-chunked` is used on S3 PUT requests to signal SigV4
streaming uploads: the body is sent in AWS-formatted chunks, each
chunk framed and HMAC-signed.
When copying from a non-AWS, S3-compatible object store (like Digital
Ocean) the objects can have `Content-Type: aws-chunked` (which you
won't see on AWS S3). Attempting to copy these objects to S3 with
`--metadata` produces this error:
aws-chunked encoding is not supported when x-amz-content-sha256 UNSIGNED-PAYLOAD is supplied
This patch makes sure `aws-chunked` is removed from the `Content-Type`
metadata both on the way in and the way out.
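A rough sketch of the filtering (assuming the stored metadata value may contain aws-chunked alongside the real media type as a comma-separated list):

```go
package main

import (
	"fmt"
	"strings"
)

// stripAwsChunked drops the aws-chunked coding from a Content-Type value.
func stripAwsChunked(contentType string) string {
	var kept []string
	for _, part := range strings.Split(contentType, ",") {
		if p := strings.TrimSpace(part); p != "" && !strings.EqualFold(p, "aws-chunked") {
			kept = append(kept, p)
		}
	}
	return strings.Join(kept, ",")
}

func main() {
	fmt.Println(stripAwsChunked("aws-chunked,text/plain")) // text/plain
	fmt.Println(stripAwsChunked("aws-chunked"))            // (empty)
}
```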
Fixes #8724
Before this change we were reading input from stdin using the terminal
in its default line mode, which has a limit of 4095 characters.
The typical culprit was onedrive tokens (which are very long), giving the error:
Couldn't decode response: invalid character 'e' looking for beginning of value
This change swaps over to using the github.com/peterh/liner readline
library, which does not have that limitation and also enables more
sensible cursor editing.
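For illustration, reading a line with the library looks like this:

```go
package main

import (
	"fmt"
	"log"

	"github.com/peterh/liner"
)

func main() {
	l := liner.NewLiner()
	defer l.Close()

	// Prompt reads the whole line with cursor editing and without the
	// 4095 character limit of the terminal's canonical line mode.
	token, err := l.Prompt("Enter token> ")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("read", len(token), "characters")
}
```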
Fixes #8688 #8323 #5835
Before this change we used the current context to start the average
loop. This means that if the context came from the rc, the average loop
would be cancelled at the end of the rc request, leading to the speed
not being measured.
This uses the background context for the accounting loop so it doesn't
get cancelled when its parent gets cancelled.
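The shape of the fix, roughly (illustrative names only):

```go
package sketch

import (
	"context"
	"time"
)

// startAverageLoop runs the speed-averaging loop on a context derived from
// context.Background() rather than from the rc request, so it keeps running
// after the rc request's context is cancelled. The returned stop function
// cancels the loop explicitly.
func startAverageLoop(update func()) (stop func()) {
	ctx, cancel := context.WithCancel(context.Background())
	go func() {
		ticker := time.NewTicker(time.Second)
		defer ticker.Stop()
		for {
			select {
			case <-ctx.Done():
				return
			case <-ticker.C:
				update() // recompute the moving average of the speed
			}
		}
	}()
	return cancel
}
```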
Before this change multipart uploads using OpenChunkWriter would
account for twice the space used.
This fixes the problem by adjusting the accounting delay.
Before this change we used an overcomplicated method of memory
reservations in the pool.RW which caused deadlocks.
This changes it to use a much simpler reservation system where we
actually reserve the memory and store it in the pool.RW. This allows
us to use the semaphore.Weighted to count the actual memory in use
(rather than the memory in use and in the cache). This in turn allows
accurate use of the semaphore by users wanting memory.
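In miniature, the reservation looks like this, using golang.org/x/sync/semaphore (the real pool.RW is more involved):

```go
package sketch

import (
	"context"

	"golang.org/x/sync/semaphore"
)

// rw is a cut-down stand-in for pool.RW: the weighted semaphore counts
// bytes actually reserved and held, not bytes sitting in a cache.
type rw struct {
	mem *semaphore.Weighted
}

func newRW(maxMemory int64) *rw {
	return &rw{mem: semaphore.NewWeighted(maxMemory)}
}

// reserve blocks until size bytes are available, then allocates them, so
// the semaphore always reflects real memory in use.
func (r *rw) reserve(ctx context.Context, size int64) ([]byte, error) {
	if err := r.mem.Acquire(ctx, size); err != nil {
		return nil, err
	}
	return make([]byte, size), nil
}

// release hands size bytes back once the buffer is no longer needed.
func (r *rw) release(size int64) {
	r.mem.Release(size)
}
```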
Before this change the azureblob backend could deadlock when using
--max-connections. This is because when it receives an InvalidBlockOrBlob
error it attempts to clear the condition before retrying. This in turn
involves recursively calling the pacer. At this point the pacer can
easily have no connections left which causes a deadlock as all the
other pacer connections are waiting for the InvalidBlockOrBlob to be
resolved.
This fixes the problem by using a temporary pacer when resolving the
InvalidBlockOrBlob errors.
Before this change, the bisync tests were directly setting the time.Local
variable to UTC.
The reason for overriding the time zone on the tests is to make them
deterministic regardless of where in the world the user happens to be. There are
some goldenized strings which have the time zone hard-coded and would result in a
miscompare failure outside of that time zone.
However, mutating the time.Local variable is not the right way to do this, as OP
correctly pointed out on #8272.
Setting the TZ environment variable from within the code was also not an ideal
solution because, while it worked on unix, it did not work on Windows. See
fbac94a799/src/time/zoneinfo.go (L79-L80)
This change fixes the issue by defining a new bisync.LogTZ setting for use when
printing timestamps in /cmd/bisync/resolve.go. We override this on the tests
instead of time.Local.
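A simplified version of the approach (names are approximate):

```go
package sketch

import "time"

// LogTZ is the location used when formatting bisync timestamps. It defaults
// to the machine's local zone; the tests override it with time.UTC instead
// of mutating time.Local.
var LogTZ = time.Local

func formatStamp(t time.Time) string {
	return t.In(LogTZ).Format("2006-01-02 15:04:05 MST")
}
```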
In addition to googlecloudstorage's general rate limiting, it apparently has a
separate limit for updating the same object more than once per second:
googleapi: Error 429: The object rclone-test-
demilaf1fexu/015108so/check_access/path2/modtime_write_test exceeded the rate
limit for object mutation operations (create, update, and delete). Please reduce
your request rate. See https://cloud.google.com/storage/docs/gcs429.,
rateLimitExceeded
We were encountering this in the part of the bisync tests where we create an
object, verify that we can edit its modtime, then remove it. We were not
encountering it elsewhere because it only concerns manipulations of the same
object -- not the rate of API calls in general. For the same reason, the standard
pacer is not an effective solution for enforcing this (unless, of course, we
want to slow the entire test down by setting a 1s MinSleep across the board).
While ideally this would be handled in the backend, this gets around it by
sleeping for 1s in the relevant part of the bisync tests.
These were habitually failing at some point and ignored for that reason, but
seem to be passing now. It is possible that in the interim, the underlying issue
was resolved by another commit. If there is still an issue lurking, the nightly
tests will surely reveal it (and give us a log to look at).
Connection string remotes like "TestGoogleCloudStorage,directory_markers:" use
commas. Before this change, these could not be passed with the -remotes flag,
which expected commas to be used only as separators.
After this change, CSV parsing is used so that commas will be properly
recognized inside a terminal-escaped and quoted value, like:
-remotes local,\"TestGoogleCloudStorage,directory_markers:\"
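Illustration of the parsing with encoding/csv:

```go
package main

import (
	"encoding/csv"
	"fmt"
	"log"
	"strings"
)

func main() {
	// After shell unescaping, the -remotes value contains a quoted field
	// with an embedded comma.
	in := `local,"TestGoogleCloudStorage,directory_markers:"`
	record, err := csv.NewReader(strings.NewReader(in)).Read()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%q\n", record)
	// ["local" "TestGoogleCloudStorage,directory_markers:"]
}
```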
Before this change, setting an object's modtime with o.SetModTime() (without
updating the file's content) would inadvertently erase its md5 hash.
The documentation notes: "If this property isn't specified on the request, the
property is cleared for the file. Subsequent calls to Get File Properties won't
return this property, unless it's explicitly set on the file again."
https://learn.microsoft.com/en-us/rest/api/storageservices/set-file-properties#common-request-headers
This change fixes the issue by setting ContentMD5 (and ContentType), to the
extent we have them, during SetModTime.
Discovered on bisync integration tests such as TestBisyncRemoteRemote/resolve
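The gist of the fix, as a sketch with hypothetical stand-in types (the real code uses the Azure Files SDK):

```go
package sketch

import (
	"context"
	"time"
)

// Hypothetical stand-ins for the backend's real types.
type fileProperties struct {
	ModTime     time.Time
	ContentMD5  []byte
	ContentType string
}

type propertySetter interface {
	SetProperties(ctx context.Context, p fileProperties) error
}

type object struct {
	client      propertySetter
	md5         []byte
	contentType string
}

// setModTime re-sends the ContentMD5 and ContentType we already know along
// with the new modtime, because the service clears any property omitted
// from the Set File Properties request.
func (o *object) setModTime(ctx context.Context, t time.Time) error {
	return o.client.SetProperties(ctx, fileProperties{
		ModTime:     t,
		ContentMD5:  o.md5,
		ContentType: o.contentType,
	})
}
```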
Before this change, TestSFTPOpenssh integration tests would fail due to setting
copy_is_hardlink=true in /fstest/testserver/init.d/TestSFTPOpenssh.
For example, if a file was server-side copied from path1 to path2 and then the
bisync tests set the path2 modtime, the path1 modtime would also unexpectedly
mutate.
Hardlinks are not the same as copies. The bisync tests assume that they can
modify a file on one side without affecting a file on the other. This change
essentially sets --sftp-copy-is-hardlink to the default of false for the bisync
tests.
Before this change we opened the connection before allocating memory.
This meant a long wait sometimes for memory and too many connections
open.
Now we allocate the memory first before opening the connection.
Because multipart transfers can need more than one buffer to complete,
if transfers was set very high it was possible for lots of multipart
transfers to start, grab fewer buffers than a full chunk needs, and then
deadlock because no more memory was available.
This fixes the problem by introducing a reservation system which the
multipart transfer uses to ensure it can reserve all the memory for
one chunk before starting.
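A stripped-down sketch of the new ordering and the per-chunk reservation (hypothetical helper names):

```go
package sketch

import "context"

// memoryPool is a stand-in for the buffer pool's reservation API.
type memoryPool interface {
	// Reserve blocks until size bytes can be claimed for one whole chunk.
	Reserve(ctx context.Context, size int64) error
	Release(size int64)
}

type dialer interface {
	Open(ctx context.Context) (connection, error)
}

type connection interface{ Close() error }

// uploadChunk reserves the whole chunk's memory first and only then opens
// the connection, so we never hold an idle connection while waiting for
// memory, and never start a chunk we cannot buffer completely.
func uploadChunk(ctx context.Context, pool memoryPool, d dialer, chunkSize int64) error {
	if err := pool.Reserve(ctx, chunkSize); err != nil {
		return err
	}
	defer pool.Release(chunkSize)

	conn, err := d.Open(ctx)
	if err != nil {
		return err
	}
	defer conn.Close()

	// ... fill the reserved buffers from the source and send the chunk ...
	return nil
}
```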