How to download S3-Bucket, compress on the fly and reupload to another s3 bucket without downloading locally? - amazon-s3

I want to download the contents of an S3 bucket (hosted on Wasabi, which claims to be fully S3-compatible) to my VPS, tar, gzip and gpg it, and re-upload the archive to another S3 bucket on Wasabi!
My VPS only has 30 GB of storage, while the whole bucket is about 1000 GB in size, so I need to download, archive, encrypt and re-upload all of it on the fly without storing the data locally.
The secret seems to be the | pipe. But I am stuck at the very first step of downloading a bucket into a local archive (I want to go step by step):
s3cmd sync s3://mybucket | tar cvz archive.tar.gz -
In my mind, the end result would be something like this:
s3cmd sync s3://mybucket | tar cvz | gpg --passphrase secretpassword | s3cmd put s3://theotherbucket/archive.tar.gz.gpg
but it's not working so far!
What am I missing?

The aws s3 sync command copies multiple files to the destination. It does not copy to stdout.
You could use aws s3 cp s3://mybucket - (including the dash at the end) to copy the contents of the file to stdout.
From cp — AWS CLI Command Reference:
The following cp command downloads an S3 object locally as a stream to standard output. Downloading as a stream is not currently compatible with the --recursive parameter:
aws s3 cp s3://mybucket/stream.txt -
This will only work for a single file.
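Building on that, a per-object pipeline could look like the following sketch (the bucket names match the question, but the object key, passphrase handling and gpg flags are assumptions, not tested against Wasabi):

```shell
# Hypothetical sketch: stream one object through gzip and gpg to another
# bucket without touching local disk. Object key is a placeholder.
aws s3 cp s3://mybucket/data.bin - \
  | gzip \
  | gpg --batch --pinentry-mode loopback --passphrase "$PASSPHRASE" --symmetric \
  | aws s3 cp - s3://theotherbucket/data.bin.gz.gpg
```

To cover a whole bucket you would have to loop over the keys (e.g. from `aws s3api list-objects`), since streaming is not compatible with `--recursive`.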

You may try https://github.com/kahing/goofys. I guess in your case the algorithm could be the following:
$ goofys source-s3-bucket-name /mnt/src
$ goofys destination-s3-bucket-name /mnt/dst
$ tar -cvzf - /mnt/src | gpg -e -r your-key-id -o /mnt/dst/archive.tgz.gpg
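Putting the pieces together, a fuller sketch might look like this (mount points and the gpg key ID are placeholders; this assumes goofys is installed and S3 credentials are configured, and is untested):

```shell
# Mount both buckets via FUSE, stream the archive across, then unmount.
mkdir -p /mnt/src /mnt/dst
goofys source-s3-bucket-name /mnt/src
goofys destination-s3-bucket-name /mnt/dst
# tar writes to stdout (-f -); gpg encrypts the stream into the destination mount.
tar -czf - -C /mnt/src . | gpg -e -r your-key-id -o /mnt/dst/archive.tgz.gpg
fusermount -u /mnt/src && fusermount -u /mnt/dst
```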

Related

Scaleway GLACIER class object storage with restic

Scaleway recently launched GLACIER class storage "C14 Cold Storage Class"
They have a great plan of 75GB free and I'd like to take advantage of this using the restic backup tool.
To get this working I have successfully followed the S3 instructions for repository creation and uploading, with one caveat: I cannot successfully pass the storage-class header as GLACIER.
Using awscliv2, I can successfully pass a header that looks very much like this from my local machine: aws s3 cp object s3://bucket/ --storage-class GLACIER
But with restic, having dug through some GitHub issues, I can see an option to pass a -o flag. The linked issue's resolution is not clear to me, so I have tried the following restic commands without seeing the "GLACIER" storage class next to the objects in the Scaleway bucket console:
restic -r s3:s3.fr-par.scw.cloud/restic-testing -o GLACIER --verbose backup ~/test.txt
restic -r s3:s3.fr-par.scw.cloud/restic-testing -o storage-class=GLACIER --verbose backup ~/test.txt
Can someone suggest another option?
I'm starting to use C14's GLACIER storage class with restic, and so far it seems to be working very well.
I suggest creating the repository in the usual way with restic -r s3:s3.fr-par.scw.cloud/test-bucket init, which will create the config file and keys in the STANDARD storage class.
For backups, I'm using the command:
$ restic backup -r s3:s3.fr-par.scw.cloud/test-bucket -o s3.storage-class=GLACIER --host host /path
similar to what you did, except the option is s3.storage-class, not storage-class.
This way, files in the data and snapshots directories are in the GLACIER storage class, and you can add backups with no problem.
I can also mount the repository while the data is in the GLACIER class (I suppose all the info is taken from the cache), so I can run restic mount /mnt/c14 and browse the files, even though I cannot copy them or see their contents.
If I need to restore files, I restore the whole bucket to the STANDARD class with s3cmd restore --recursive s3://test-bucket/ (see s3cmd), then I check whether any file is still in the GLACIER class with:
$ aws s3 ls s3://test-bucket --recursive | tr -s ' ' | cut -d' ' -f 4 | xargs -n 1 -I {} sh -c "aws s3api head-object --bucket test-bucket --key '{}' | jq -r .StorageClass" | grep --quiet GLACIER
which returns true (exit code 0) if at least one file is still in the GLACIER class, so you have to wait until this command returns false.
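That wait can be scripted as a simple polling loop (a sketch reusing the check pipeline above; the bucket name follows the example and the sleep interval is arbitrary):

```shell
# Poll until no object in the bucket reports the GLACIER storage class.
# Requires awscli and jq; bucket name is the example one.
while aws s3 ls s3://test-bucket --recursive | tr -s ' ' | cut -d' ' -f 4 \
    | xargs -n 1 -I {} sh -c "aws s3api head-object --bucket test-bucket --key '{}' | jq -r .StorageClass" \
    | grep --quiet GLACIER
do
    echo "still restoring, sleeping 10 minutes..."
    sleep 600
done
echo "all objects back in STANDARD class"
```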
Obviously a restore will need more time, but I'm using C14 glacier as a second or third backup, while using another restic repository in Backblaze B2 which is a warm storage.
In addition to vstefanoxx's answer, here is my workflow.
I set up the restic repository just like vstefanoxx.
Now, if you want to prune the repository... you cannot, as the files are in Glacier and restic needs read-write access to the bucket to prune.
What is interesting about Scaleway is that transfers between the GLACIER and STANDARD classes are free. So let's move the data back to the standard class:
s3cmd restore --recursive s3://test-bucket
And wait until the end of the process, using the command given by vstefanoxx. Once your data is in the standard class it costs you five times more, so we have to be efficient :-)
So we now prune the repository:
restic prune -r s3:s3.fr-par.scw.cloud/test-bucket
And once it is finished, move everything (in fact data, index and snapshots, but not keys) back to Glacier:
s3cmd cp s3://test-bucket/data/ s3://test-bucket/data/ --recursive --storage-class=GLACIER
s3cmd cp s3://test-bucket/index/ s3://test-bucket/index/ --recursive --storage-class=GLACIER
s3cmd cp s3://test-bucket/snapshots/ s3://test-bucket/snapshots/ --recursive --storage-class=GLACIER
So we are now at a point where we have pruned the repository while paying the least amount of money!
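The whole cycle can be sketched as one script (bucket and repository names follow the example; the wait step is simplified to a fixed sleep, which you would replace with a real check on the restore status):

```shell
#!/bin/sh
# Sketch of the restore -> prune -> re-freeze cycle described above.
BUCKET=test-bucket
REPO=s3:s3.fr-par.scw.cloud/$BUCKET

s3cmd restore --recursive "s3://$BUCKET"   # thaw GLACIER objects
sleep 3600                                 # wait for the restore (simplified)
restic prune -r "$REPO"                    # prune while the repo is readable/writable
for prefix in data index snapshots; do     # re-freeze everything except keys
    s3cmd cp "s3://$BUCKET/$prefix/" "s3://$BUCKET/$prefix/" \
        --recursive --storage-class=GLACIER
done
```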
The chosen answer doesn't seem to work when doing incremental backups. I went with a different solution.
I set up a normal bucket, initialized with your usual restic init. Then I set up the following lifecycle rule:
<?xml version="1.0" ?>
<LifecycleConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Rule>
    <ID>data-to-glacier</ID>
    <Filter>
      <Prefix>data/</Prefix>
    </Filter>
    <Status>Enabled</Status>
    <Transition>
      <Days>0</Days>
      <StorageClass>GLACIER</StorageClass>
    </Transition>
  </Rule>
</LifecycleConfiguration>
Days is set to 0, which means the rule applies to all files. Rules are not applied continuously, though; they run once a day at midnight UTC.
This rule will only apply to the files in data/, which are the big files.
This rule description is meant to be used with s3cmd, but you can also set it from the dashboard if you prefer a GUI.
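For example, if the XML above is saved as lifecycle.xml, it can be applied with s3cmd's setlifecycle command (bucket name follows the example; untested sketch):

```shell
# Apply the lifecycle policy to the bucket, then print it back to verify.
s3cmd setlifecycle lifecycle.xml s3://test-bucket
s3cmd getlifecycle s3://test-bucket
```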

How to upload a directory to a AWS S3 bucket along with a KMS ID through CLI?

I want to upload a directory (a folder consisting of other folders and .txt files) to a folder (prefix) in a specific S3 bucket, along with a given KMS id, via the CLI. I found the following command, which uploads a jar file to an S3 bucket:
aws s3 sync /?? s3://???-??-dev-us-east-2-813426848798/build/tmp/snapshot --sse aws:kms --sse-kms-key-id alias/nbs/dev/data --delete --region us-east-2 --exclude "*" --include "*.?????"
Suppose:
Location (Bucket Name with folder name) - "s3://abc-app-us-east-2-12345678/tmp"
KMS-id - https://us-east-2.console.aws.amazon.com/kms/home?region=us-east-2#/kms/keys/aa11-123aa-45/
Directory to be uploaded - myDirectory
And I want to know whether the same command can be used to upload a directory with a bunch of files and folders in it, and if so, how the command should be changed.
The cp command works this way:
aws s3 cp ./localFolder s3://awsexamplebucket/abc --recursive --sse aws:kms --sse-kms-key-id a1b2c3d4-e5f6-7890-g1h2-123456789abc
I haven't tried the sync command with KMS, but the way you use sync is:
aws s3 sync ./localFolder s3://awsexamplebucket/remotefolder
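Combining the two, a sync with SSE-KMS should look like the following untested sketch (bucket, prefix and directory names are taken from the question; note the key is passed by its ID or alias, not the console URL):

```shell
# Sync a local directory to the bucket prefix, encrypting with the given KMS key.
aws s3 sync ./myDirectory s3://abc-app-us-east-2-12345678/tmp \
    --sse aws:kms \
    --sse-kms-key-id aa11-123aa-45
```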

S3 : Download Multiple Files in local git-bash console

I have multiple files in an S3 bucket, like:
file1.txt
file2.txt
file3.txt
another-file1.txt
another-file1.txt
another-file1.txt
Now I want to download the first 3 files, whose names start with "file". How can I download them from AWS S3 in a local git-bash console?
You can simply download them with the command below:
aws s3 cp --recursive s3://bucket-name/ /local-destination-folder/ --exclude "*" --include "file*"

Compare (not sync) the contents of a local folder and an AWS S3 bucket

I need to compare the contents of a local folder with an AWS S3 bucket so that, where there are differences, a script is executed on the local files.
The idea is that local files (pictures) get encrypted and uploaded to S3. Once the upload has occurred, I delete the encrypted copies of the pictures to save space. The next day, new files get added to the local folder. I need to check between the local folder and the S3 bucket which pictures have already been encrypted and uploaded, so that I only encrypt the newly added pictures rather than all of them all over again. I have a script that does exactly this between two local folders, but I'm struggling to adapt it so that the comparison is performed between a local folder and an S3 bucket.
Thank you to anyone who can help.
Here is the actual script I am currently using for my picture sorting, encryption and back up to S3:
#!/bin/bash
perl /volume1/Synology/scripts/Exiftool/exiftool '-createdate
perl /volume1/Synology/scripts/Exiftool/exiftool '-model=camera model missing' -r -if '(not $model)' -overwrite_original -r /volume1/photo/"input"/ --ext .DS_Store -i "#eaDir"
perl /volume1/Synology/scripts/Exiftool/exiftool '-Directory
cd /volume1/Synology/Pictures/"Pictures Glacier back up"/"Compressed encrypted pics for Glacier"/post_2016/ && (cd /volume1/Synology/Pictures/Pictures/post_2016/; find . -type d ! -name .) | xargs -i mkdir -p "{}"
while IFS= read -r file; do /usr/bin/gpg --encrypt -r xxx#yyy.com /volume1/Synology/Pictures/Pictures/post_2016/**///$(basename "$file" .gpg); done < <(comm -23 <(find /volume1/Synology/Pictures/Pictures/post_2016 -type f -printf '%f.gpg\n'|sort) <(find /volume1/Synology/Pictures/"Pictures Glacier back up"/"Compressed encrypted pics for Glacier"/post_2016 -type f -printf '%f\n'|sort))
rsync -zarv --exclude=#eaDir --include="*/" --include="*.gpg" --exclude="*" /volume1/Synology/Pictures/Pictures/post_2016/ /volume1/Synology/Pictures/"Pictures Glacier back up"/"Compressed encrypted pics for Glacier"/post_2016/
find /volume1/Synology/Pictures/Pictures/post_2016/ -name "*.gpg" -type f -delete
/usr/bin/aws s3 sync /volume1/Synology/Pictures/"Pictures Glacier back up"/"Compressed encrypted pics for Glacier"/post_2016/ s3://xyz/Pictures/post_2016/ --exclude "*" --include "*.gpg" --sse
It would be inefficient to continually compare the local and remote folders, especially as the quantity of objects increases.
A better flow would be:
- Unencrypted files are added to a local folder.
- Each file is:
  - copied to another folder in an encrypted state, and
  - once that action is confirmed, the original file is deleted.
- Files in the encrypted local folder are copied to S3.
- Once that action is confirmed, the source file is deleted.
The AWS Command-Line Interface (CLI) has an aws s3 sync command that makes it easy to copy new/modified files to an Amazon S3 bucket, but this could be slow if you have thousands of files.
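The flow above could be sketched like this (paths, the gpg recipient and the bucket are taken from the question's script or are placeholders; each file is only deleted after the step that consumes it succeeds):

```shell
#!/bin/sh
# Sketch: encrypt new files into a staging folder, upload, then clean up.
SRC=/volume1/photo/input             # hypothetical source folder
STAGE=/volume1/photo/encrypted       # hypothetical staging folder
BUCKET=s3://xyz/Pictures

for f in "$SRC"/*; do
    [ -f "$f" ] || continue
    gpg --encrypt -r xxx@yyy.com -o "$STAGE/$(basename "$f").gpg" "$f" \
        && rm "$f"                   # delete original only if encryption succeeded
done

aws s3 sync "$STAGE" "$BUCKET" --sse \
    && rm "$STAGE"/*.gpg             # delete staged copies only after upload
```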

Files will not move or copy from folder on file system to local bucket

I am using the command
aws s3 mv --recursive Folder s3://bucket/dsFiles/
The AWS console is not giving me any feedback. I changed the permissions of the directory:
sudo chmod -R 666 ds000007_R2.0.1/
It looks like AWS is passing over those files and giving "File does not exist" for every directory.
I am confused about why AWS is not actually performing the copy. Is there some size limitation or recursion-depth limitation?
I believe you want cp, not mv. Try the following:
aws s3 cp $local/folder s3://your/bucket --recursive --include "*"
Source, my answer here.
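As an aside (not from the original answer): the "File does not exist" errors may also stem from the chmod -R 666 in the question, since directories need the execute bit to be traversed. A sketch of restoring sane permissions (the directory layout below is a hypothetical stand-in for the dataset folder):

```shell
# Demo layout mirroring the question's dataset folder (hypothetical).
mkdir -p ds000007_R2.0.1/sub && touch ds000007_R2.0.1/sub/file.txt
# Directories need +x to be listed/traversed; plain files can stay 644.
find ds000007_R2.0.1/ -type d -exec chmod 755 {} +
find ds000007_R2.0.1/ -type f -exec chmod 644 {} +
```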