Sync with S3 with s3cmd, but not re-download files that only changed name

Sync with S3 with s3cmd, but not re-download files that only changed name - amazon-s3

I'm syncing a bunch of files between my computer and Amazon S3. Say a couple of the files change name, but their content is still the same. Do I have to have the local file removed by s3cmd and then the "new" file re-downloaded, just because it has a new name? Or is there any other way of checking for changes? I would like s3cmd to, in that case, simply change the name of the local file in accordance with the new name on the server.

s3cmd upstream (github.com/s3tools/s3cmd master branch) and 1.5.0-rc1 latest published version, can figure this out, if you used a recent version to put the file into S3 in the first place that used the --preserve option to store the md5sum of each file. Using the md5sums, it knows that you have a duplicate (even if renamed) file locally, and won't re-download it, but instead will do a local copy (or hardlink) from the file system name to the name from S3.

Related

How can I have Alluxio show all the not-yet-accessed files in the directory?

When mounted an s3 bucket under alluxio://s3/, the bucket already has objects. However, when I get the directory list (either by alluxio fs ls or ls the fuse-mounted directory or on the web ui) i see no files. When I write a new file or read an already existing object via Alluxio, it appears in the dir list. Is there a way I can have Alluxio show all the not-yet-accessed files in the directory? (rather than only showing files after writing or accessing them)

a simple way is to run bin/alluxio fs loadMetadata /s3 to force refresh the Alluxio directory. There are other ways to trigger it, checkout “How to Trigger Metadata Sync” section in this latest blog:
https://www.alluxio.io/blog/metadata-synchronization-in-alluxio-design-implementation-and-optimization/

Prevent rclone from re-copying files to AWS S3 Deep Archive

I'm using rclone in order to copy some files to an S3 bucket (deep archive). The command I'm using is:
rclone copy --ignore-existing --progress --max-delete 0 "/var/vmail" foo-backups:foo-backups/vmail
This is making rclone to copy files that I know for sure that already exist in the bucket. I tried removing the --ignore-existing flag (which IMHO is badly named, as it does exactly the opposite of what you'd initially expect), but I still get the same behaviour.
I also tried adding --size-only, but the "bug" doesn't get fixed.
How can I make rclone copy only new files?

You could use rclone sync, check out https://rclone.org/commands/rclone_sync/
Doesn’t transfer unchanged files, testing by size and modification time or MD5SUM. Destination is updated to match source, including deleting files if necessary.

It turned out to be a bug in rclone. https://github.com/rclone/rclone/issues/3834

Can make figure out file dependencies in AWS S3 buckets?

The source directory contains numerous large image and video files.
These files need to be uploaded to an AWS S3 bucket with the aws s3 cp command. For example, as part of this build process, I copy my image file my_image.jpg to the S3 bucket like this: aws s3 cp my_image.jpg s3://mybucket.mydomain.com/
I have no problem doing this copy to AWS manually. And I can script it too. But I want to use the makefile to upload my image file my_image.jpg iff the same-named file in my S3 bucket is older than the one in my source directory.
Generally make is very good at this kind of dependency checking based on file dates. However, is there a way I can tell make to get the file dates from files in S3 buckets and use that to determine if dependencies need to be rebuilt or not?

The AWS CLI has an s3 sync command that can take care of a fair amount of this for you. From the documentation:
A s3 object will require copying if:
the sizes of the two s3 objects differ,
the last modified time of the source is newer than the last modified time of the destination,
or the s3 object does not exist under the specified bucket and prefix destination.

I think you'll need to make S3 look like a file system to make this work. On Linux it is common to use FUSE to build adapters like that. Here are some projects to present S3 as a local filesystem. I haven't tried any of those, but it seems like the way to go.

Is it possible to sync a single file to s3?

I'd like to sync a single file from my filesystem to s3.
Is this possible or can only directories by synced?

Use include/exclude options for the sync-directory command:
e.g. To sync just /var/local/path/filename.xyz to S3 use:
s3 sync /var/local/path s3://bucket/path --exclude='*' --include='*/filename.xyz'

cp can be used to copy a single file to S3. If the filename already exists in the destination, this will replace it:
aws s3 cp local/path/to/file.js s3://bucket/path/to/file.js
Keep in mind that per the docs, sync will only make updates to the target if there have been file changes to the source file since the last run: s3 sync updates any files that have a size or modified time that are different from files with the same name at the destination. However, cp will always make updates to the target regardless of whether the source file has been modified.
Reference: AWS CLI Command Reference: cp

Just to comment on pythonjsgeo's answer. That seems to be the right solution but make sure so execute the command without the = symbol after the include and exclude tag. I was including the = symbol and getting weird behavior with the sync command.
s3 sync /var/local/path s3://bucket/path --exclude '*' --include '*/filename.xyz'

You can mount S3 bucket as a local folder (using RioFS, for example) and then use your favorite tool to synchronize file(-s) or directories.

Empty files on S3 prevent from downloading using s3cmd and s3sync

I am trying to setup a backup/restore using S3. The upload sync worked well using s3sync. However, next to each folder there is an empty file with matching name. I read somewhere that this is created to define the folder structure but I am not sure about that as it doesn't happen if I create a folder using a different method s3fox etc.
These empty files prevent me from restoring the directories/files. When I do s3cmd sync, I get an error message "can not make directory: File exists" as it first creates that empty file and that fails when trying to create the directory. Any ideas how I can solve this problem?

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Sync with S3 with s3cmd, but not re-download files that only changed name - amazon-s3

Related

How can I have Alluxio show all the not-yet-accessed files in the directory?

Prevent rclone from re-copying files to AWS S3 Deep Archive

Can make figure out file dependencies in AWS S3 buckets?

Is it possible to sync a single file to s3?

Empty files on S3 prevent from downloading using s3cmd and s3sync

Categories

Resources