gsutil cp: download files to a Windows server

I'm very new at this and need some help; I'm sure I'm doing something wrong. I have a Synology NAS that has a nice option to sync files to Google Cloud Storage. This is a great way to get my backups off site.
I have my backups syncing to a Coldline storage bucket. Now that my files are syncing, I want to document the process in case I need to retrieve them.
I want to download a whole folder and all of the files inside it to a Windows server. I installed gsutil and am trying to run this command:
gsutil -m cp -R dir gs://bhp_backup_sync/backup/foldername
but after I run this I get the following exception.
CommandException: No URLs matched: dir
CommandException: 1 file/object could not be transferred.
Noob here: what am I missing?
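One likely reading of the error, offered as a sketch rather than a confirmed diagnosis: gsutil cp takes the source first and the destination second, so in the command above dir is treated as a local source, and nothing matches it. The helper below only builds the argument list for a download, with the bucket URL first. The function name and the local path C:\restore are illustrative assumptions, not something from the question.

```python
def build_download_command(bucket_url, local_dir):
    """Return the argv for a recursive, parallel gsutil download.

    gsutil cp expects the source before the destination, so the gs:// URL
    comes first when downloading.
    """
    if not bucket_url.startswith("gs://"):
        raise ValueError("source must be a gs:// URL when downloading")
    return ["gsutil", "-m", "cp", "-R", bucket_url, local_dir]


# Hypothetical local target directory; the bucket path is the one from the question.
command = build_download_command(
    "gs://bhp_backup_sync/backup/foldername", r"C:\restore"
)
print(" ".join(command))
```

The resulting list can be passed to subprocess.run(command, check=True) once the local directory exists.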

Related

How to use rclone to download data from S3

I have a quick question on rclone.
I am trying to download data from tradestatistics.io, where it gives a sample code for downloading:
rclone sync spaces:tradestatistics/hs-rev1992-visualization hs-rev1992-visualization
My question is: how do I get a list of the files in that source, and can it be done directly from the terminal?
Assuming you've already installed rclone (https://rclone.org/downloads/)
To configure rclone to see storage on S3, see https://rclone.org/s3/
Assuming spaces: is your correctly configured rclone source remote, you can list all files from the terminal with the rclone lsl command:
rclone lsl spaces:tradestatistics/hs-rev1992-visualization
where tradestatistics is the bucket and hs-rev1992-visualization is the root folder.
A more human-readable list can be done with lsf. It's not recursive, so add -R:
rclone lsf -R spaces:
More details at https://rclone.org/commands/rclone_lsl/ with info on other lists.
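If the listing needs to be processed further, a small sketch like the following may help. It assumes the text was captured from rclone lsf -R (for example with subprocess.run(..., capture_output=True, text=True).stdout) and relies on lsf marking directories with a trailing slash. The function name and the sample entries are made up for illustration.

```python
def split_lsf_output(output):
    """Split `rclone lsf` output into (files, directories).

    rclone lsf prints one entry per line and marks directories with a
    trailing slash.
    """
    files, directories = [], []
    for entry in output.splitlines():
        if not entry:
            continue
        if entry.endswith("/"):
            directories.append(entry)
        else:
            files.append(entry)
    return files, directories


# Made-up sample output, only to show the shape of the data.
sample = "year=2018/\nyear=2018/data.parquet\nREADME.txt\n"
files, directories = split_lsf_output(sample)
print(files)
print(directories)
```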

Removing files from GCS: "gsutil -m rm" throws CommandException: files/objects could not be removed

gsutil -m rm gs://{our_bucket}/{dir}/{subdir}/*
...
Removing gs://our_bucket/dir/subdir/staging-000000000102.json...
Removing gs://our_bucket/dir/subdir/staging-000000000101.json...
CommandException: 103 files/objects could not be removed.
The command is able to find the directory with the 103 .JSON files, and "tries" removing them per the Removing gs://... being output. For what reason might we be receiving CommandException: 103 files/objects could not be removed.?
This works on my local machine
This works in our docker container run locally
This does not work in our docker container on the GCP compute engine where we need it to be working.
Perhaps this is a permissions issue with the compute engine not having permission to remove files in our GCS?
Edit: We have a service account JSON in the /config folder of our Airflow project, and that service account is shared to an IAM user with Storage Admin permission. Perhaps having the JSON in the /config folder is not sufficient for assigning permissions to the entire GCP compute engine? I am particularly confused because this server is able to query from our BQ database, and WRITE to GCS, but cannot delete from GCS...
The solution in this link - https://gist.github.com/ryderdamen/926518ddddd46dd4c8c2e4ef5167243d was exactly what we needed:
Stop the instance
Edit the settings
Remove gsutil cache

Amazon S3 console: download multiple files at once

When I log in to my S3 console, I am unable to download multiple selected files (the web UI allows downloads only when one file is selected):
https://console.aws.amazon.com/s3
Is this something that can be changed in the user policy or is it a limitation of Amazon?
It is not possible through the AWS Console web user interface.
But it's a very simple task if you install AWS CLI.
You can find the installation and configuration steps under "Installing" in the AWS Command Line Interface documentation.
After that you go to the command line:
aws s3 cp --recursive s3://<bucket>/<folder> <local_folder>
This will copy all the files from the given S3 path to the given local path.
Selecting a bunch of files and clicking Actions->Open opened each in a browser tab, and they immediately started to download (6 at a time).
If you use the AWS CLI, you can use the --exclude flag along with the --include and --recursive flags to accomplish this:
aws s3 cp s3://path/to/bucket/ . --recursive --exclude "*" --include "things_you_want"
Eg.
--exclude "*" --include "*.txt"
will download all files with .txt extension. More details - https://docs.aws.amazon.com/cli/latest/reference/s3/
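To see why --exclude "*" has to come before --include, here is a simplified model of the filter logic, not the CLI's actual code: everything starts out included, filters are applied in the order given, and a later matching filter overrides an earlier one. The function name and the sample keys are illustrative assumptions.

```python
from fnmatch import fnmatchcase


def is_selected(key, filters):
    """Apply (kind, pattern) filters in order; the last matching one wins.

    Everything is included by default, which is why --exclude "*" comes first.
    """
    selected = True
    for kind, pattern in filters:
        if fnmatchcase(key, pattern):
            selected = (kind == "include")
    return selected


filters = [("exclude", "*"), ("include", "*.txt")]
print(is_selected("notes/readme.txt", filters))
print(is_selected("photos/cat.jpg", filters))
```

Swapping the two filters would exclude every key, because the trailing "*" pattern matches last.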
I believe it is a limitation of the AWS console web interface, having tried (and failed) to do this myself.
Alternatively, perhaps use a 3rd party S3 browser client such as http://s3browser.com/
If you have Visual Studio with the AWS Explorer extension installed, you can also browse to Amazon S3 (step 1), select your bucket (step 2), select all the files you want to download (step 3) and right click to download them all (step 4).
The S3 service has no meaningful limits on simultaneous downloads (easily several hundred downloads at a time are possible) and there is no policy setting related to this... but the S3 console only allows you to select one file for downloading at a time.
Once the download starts, you can start another and another, as many as your browser will let you attempt simultaneously.
In case someone is still looking for an S3 browser and downloader, I have just tried FileZilla Pro (it's a paid version). It worked great.
I created a connection to S3 with Access key and secret key set up via IAM. Connection was instant and downloading of all folders and files was fast.
Using AWS CLI, I ran all the downloads in the background using "&" and then waited on all the pids to complete. It was amazingly fast. Apparently the "aws s3 cp" knows to limit the number of concurrent connections because it only ran 100 at a time.
aws --profile $awsProfile s3 cp "$s3path" "$tofile" &
pids[${npids}]=$! ## save the spawned pid
let "npids=npids+1"
followed by
echo "waiting on $npids downloads"
for pid in ${pids[*]}; do
    echo $pid
    wait $pid
done
I downloaded 1500+ files (72,000 bytes) in about a minute
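The same spawn-then-wait idea can be sketched with a thread pool, which caps the number of simultaneous transfers explicitly and collects failures. This is only an illustration: run_all and fake_download are hypothetical names, and fake_download stands in for a real call such as subprocess.run(["aws", "s3", "cp", ...], check=True).

```python
from concurrent.futures import ThreadPoolExecutor


def run_all(download_one, items, max_workers=8):
    """Run download_one(item) for every item, at most max_workers at a time.

    Returns a list of (item, error) pairs for the items that raised.
    """
    failures = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(download_one, item): item for item in items}
        for future, item in futures.items():
            error = future.exception()  # blocks until this task finishes
            if error is not None:
                failures.append((item, error))
    return failures


def fake_download(key):
    # Stand-in for a real transfer; fails for keys ending in ".bad".
    if key.endswith(".bad"):
        raise RuntimeError("download failed: " + key)


failures = run_all(fake_download, ["a.csv", "b.csv", "c.bad"], max_workers=2)
print([item for item, _ in failures])
```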
I wrote a simple shell script to download not just all files but also all versions of every file from a specific folder in an S3 bucket. Here it is; you may find it useful.
# Script generates the version info file for all the
# content under a particular bucket and then parses
# the file to grab the versionId for each of the versions
# and finally generates a fully qualified http url for
# the different versioned files and use that to download
# the content.
s3region="s3.ap-south-1.amazonaws.com"
bucket="your_bucket_name"
# note the location has no forward slash at beginning or at end
location="data/that/you/want/to/download"
# file names were like ABB-quarterly-results.csv, AVANTIFEED--quarterly-results.csv
fileNamePattern="-quarterly-results.csv"
# AWS CLI command to get version info
content="$(aws s3api list-object-versions --bucket $bucket --prefix "$location/")"
#save the file locally, if you want
echo "$content" >> version-info.json
# read from stdin (no -r, which would make grep search the working directory); the value is the 2nd ":"-separated field
versions=$(echo "$content" | grep -i VersionId | awk -F ":" '{gsub(/[", ]/, "", $2); print $2}')
for version in $versions
do
    # NOTE: fileId is never assigned in the script as posted; set it to the file-name prefix (e.g. ABB) before this loop
    echo "############### $fileId ###################"
    #echo $version
    url="https://$s3region/$bucket/$location/$fileId$fileNamePattern?versionId=$version"
    echo "$url"
    content="$(curl -s "$url")"
    echo "$content" >> "$fileId$fileNamePattern-$version.csv"
    echo "##########################################"
done
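Since aws s3api list-object-versions returns JSON, the key and version ID of each entry can also be read together instead of being extracted with grep and awk. The sketch below is an alternative under assumptions, not the author's method: the function name and the sample listing are made up, and plain HTTPS URLs like these only work if the objects are readable without signed requests, as the curl call in the script implies.

```python
import json
from urllib.parse import quote


def versioned_urls(listing_json, endpoint, bucket):
    """Return (key, version_id, url) for every entry under "Versions"."""
    listing = json.loads(listing_json)
    results = []
    for entry in listing.get("Versions", []):
        key, version_id = entry["Key"], entry["VersionId"]
        url = "https://{}/{}/{}?versionId={}".format(
            endpoint, bucket, quote(key), quote(version_id, safe="")
        )
        results.append((key, version_id, url))
    return results


# Made-up listing in the shape returned by `aws s3api list-object-versions`.
sample = json.dumps({
    "Versions": [
        {"Key": "data/ABB-quarterly-results.csv", "VersionId": "v1", "IsLatest": True},
        {"Key": "data/ABB-quarterly-results.csv", "VersionId": "v0", "IsLatest": False},
    ]
})
urls = versioned_urls(sample, "s3.ap-south-1.amazonaws.com", "your_bucket_name")
for key, version_id, url in urls:
    print(url)
```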
You can also repeat --include "filename" several times in a single command, each time with a different filename in the double quotes, e.g.
aws s3 mycommand --include "file1" --include "file2"
This saves time compared with repeating the command to download one file at a time.
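One caveat worth spelling out: since every file is included by default, repeated --include flags only narrow the selection when they follow an --exclude "*". A hedged sketch of building such a command is below; the function name and the bucket path s3://my-bucket/reports/ are hypothetical.

```python
def build_cp_command(source, destination, filenames):
    """Build an `aws s3 cp` argv that copies only the named files."""
    command = ["aws", "s3", "cp", source, destination,
               "--recursive", "--exclude", "*"]
    for name in filenames:
        command += ["--include", name]
    return command


# Hypothetical bucket path; file1 and file2 are the names from the example above.
command = build_cp_command("s3://my-bucket/reports/", ".", ["file1", "file2"])
print(command)
```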
Also, if you are running Windows, WinSCP now allows drag and drop of a selection of multiple files, including sub-folders.
Many enterprise workstations will have WinSCP installed for editing files on servers by means of SSH.
I am not affiliated, I simply think this was really worth doing.
In my case Aur's didn't work and if you're looking for a quick solution to download all files in a folder just using the browser, you can try entering this snippet in your dev console:
(function() {
    const rows = Array.from(document.querySelectorAll('.fix-width-table tbody tr'));
    const downloadButton = document.querySelector('[data-e2e-id="button-download"]');
    const timeBetweenClicks = 500;

    function downloadFiles(remaining) {
        if (!remaining.length) {
            return
        }
        const row = remaining[0];
        row.click();
        downloadButton.click();
        setTimeout(() => {
            downloadFiles(remaining.slice(1));
        }, timeBetweenClicks)
    }

    downloadFiles(rows)
}())
I did this by creating a shell script that uses the AWS CLI (e.g. example.sh):
#!/bin/bash
aws s3 cp s3://s3-bucket-path/example1.pdf LocalPath/Download/example1.pdf
aws s3 cp s3://s3-bucket-path/example2.pdf LocalPath/Download/example2.pdf
Give example.sh executable rights (e.g. chmod +x example.sh),
then run your shell script: ./example.sh
I think the simplest way to download or upload files is the aws s3 sync command. You can also use it to sync two S3 buckets.
aws s3 sync <LocalPath> <S3Uri> or <S3Uri> <LocalPath> or <S3Uri> <S3Uri>
# Download file(s)
aws s3 sync s3://<bucket_name>/<file_or_directory_path> .
# Upload file(s)
aws s3 sync . s3://<bucket_name>/<file_or_directory_path>
# Sync two buckets
aws s3 sync s3://<1st_s3_path> s3://<2nd_s3_path>
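As far as I understand the default behaviour, sync skips objects that already exist at the destination with the same size and a last-modified time that is not older. The sketch below is a simplified model of that decision for the download direction, not the CLI's implementation; it ignores flags such as --size-only, --exact-timestamps and --delete, and the function name and dict layout are assumptions.

```python
def needs_download(remote, local):
    """Decide whether one object should be copied from S3 to disk.

    remote and local are dicts with "size" (bytes) and "mtime" (seconds
    since the epoch); local is None when the file does not exist yet.
    """
    if local is None:
        return True
    if remote["size"] != local["size"]:
        return True
    return remote["mtime"] > local["mtime"]


print(needs_download({"size": 10, "mtime": 200}, None))
print(needs_download({"size": 10, "mtime": 200}, {"size": 10, "mtime": 300}))
```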
What I usually do is mount the S3 bucket (with s3fs) on a Linux machine and zip the files I need into one archive; then I just download that file from any PC/browser.
# mount bucket in file system
/usr/bin/s3fs s3-bucket -o use_cache=/tmp -o allow_other -o uid=1000 -o mp_umask=002 -o multireq_max=5 /mnt/local-s3-bucket-mount
# zip files into one
cd /mnt/local-s3-bucket-mount
zip all-processed-files.zip *.jpg
import os
import boto3
import json
s3 = boto3.resource('s3', aws_access_key_id="AKIAxxxxxxxxxxxxJWB",
                    aws_secret_access_key="LV0+vsaxxxxxxxxxxxxxxxxxxxxxry0/LjxZkN")
my_bucket = s3.Bucket('s3testing')
# download file into current directory
for s3_object in my_bucket.objects.all():
    # Need to split s3_object.key into path and file name, else it will give error file not found.
    path, filename = os.path.split(s3_object.key)
    my_bucket.download_file(s3_object.key, filename)
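Because the loop above writes every object into the current directory, keys that share a file name overwrite each other, and keys ending in "/" (folder placeholders) leave an empty file name. A possible variation, sketched under the assumption that you want to mirror the key hierarchy under a local root; the helper names and the "downloads" directory are hypothetical.

```python
import os


def local_path_for(key, root):
    """Map an S3 key to a path under root, or None for folder placeholders."""
    if key.endswith("/"):
        return None  # zero-byte "folder" marker objects have no file name
    return os.path.join(root, *key.split("/"))


def ensure_parent_dir(path):
    """Create the directory that will hold path, if it is missing."""
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)


print(local_path_for("reports/2020/q1.csv", "downloads"))
print(local_path_for("reports/2020/", "downloads"))
```

Inside the download loop, the idea would be to compute target = local_path_for(s3_object.key, "downloads"), skip the object when target is None, and otherwise call ensure_parent_dir(target) before my_bucket.download_file(s3_object.key, target).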

How can I use boto or boto-rsync for a full backup of 1000+ files to an S3-compatible cloud?

I'm trying to back up my entire collection of over 1000 work files, mainly text but also pictures and a few large (0.5-1G) audio recordings, to an S3-compatible cloud (Dreamhost DreamObjects). I have tried to use boto-rsync to perform the first full 'put' with this:
$ boto-rsync --endpoint objects.dreamhost.com /media/Storage/Work/ \
> s3:/work.personalsite.net/ > output.txt
where '/media/Storage/Work/' is on a local hard disk, 's3:/work.personalsite.net/' is a bucket named after my personal web site for uniqueness, and output.txt is where I wanted a list of the files uploaded and error messages to go.
Boto-rsync grinds its way through the whole dirtree, but refreshing output about each file's progress doesn't look so good when it's printed in a file. Still as the upload is going, I 'tail output.txt' and I see that most files are uploaded, but some are only uploaded to less than 100%, and some are skipped altogether. My questions are:
Is there any way to confirm that a transfer is 100% complete and correct?
Is there a good way to log the results and errors of a transfer?
Is there a good way to transfer a large number of files in a big directory hierarchy to one or more buckets for the first time, as opposed to an incremental backup?
I am on a Ubuntu 12.04 running Python 2.7.3. Thank you for your help.
You can wrap the command in a script and start it with nohup:
nohup script.sh
nohup automatically generates a nohup.out file in which all the output of the script/command is captured.
To choose the log file yourself, you can do:
nohup script.sh > /path/to/log
br
Eddi
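On the point in the question about refreshing progress output looking bad in a file: if the progress display is redrawn with carriage returns (an assumption about boto-rsync's output, which I have not verified), the log can be collapsed to the final state of each line afterwards. A minimal sketch with a made-up log sample and a hypothetical function name:

```python
def final_lines(raw_log):
    """Collapse carriage-return progress updates to their final state."""
    cleaned = []
    for line in raw_log.split("\n"):
        segments = [segment for segment in line.split("\r") if segment.strip()]
        if segments:
            cleaned.append(segments[-1])
    return cleaned


# Made-up log text; the real boto-rsync output format may differ.
sample = "a.txt 10%\ra.txt 55%\ra.txt 100%\nb.wav 40%\rb.wav 73%\n"
incomplete = [line for line in final_lines(sample) if not line.endswith("100%")]
print(final_lines(sample))
print(incomplete)
```

Filtering for lines that do not end in 100% gives a rough list of transfers to re-check, assuming the log really reports percentages in that form.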

Can I move an object into a 'folder' inside an S3 bucket using the s3cmd mv command?

I have the s3cmd command line tool for linux installed. It works fine to put files in a bucket. However, I want to move a file into a 'folder'. I know that folders aren't natively supported by S3, but my Cyberduck GUI tool converts them nicely for me to view my backups.
For instance, I have a file in the root of the bucket, called 'test.mov' that I want to move to the 'idea' folder. I am trying this:
s3cmd mv s3://mybucket/test.mov s3://mybucket/idea/test.mov
but I get strange errors like:
WARNING: Retrying failed request: /idea/test.mov (timed out)
WARNING: Waiting 3 sec...
I also tried quotes, but that didn't help either:
s3cmd mv 's3://mybucket/test.mov' 's3://mybucket/idea/test.mov'
Neither did just the folder name
s3cmd mv 's3://mybucket/test.mov' 's3://mybucket/idea/'
Is there a way to do this without having to delete and re-upload this 3GB file?
Update: Just FYI, I can put new files directly into a folder like this:
s3cmd put test2.mov s3://mybucket/idea/test2.mov
But still don't know how to move them around....
To move or copy objects from one bucket to another, or within the same bucket, I use the s3cmd tool and it works fine. For instance:
s3cmd cp --recursive s3://bucket1/directory1 s3://bucket2/directory1
s3cmd mv --recursive s3://bucket1/directory1 s3://bucket2/directory1
Your file is probably quite big; try increasing the socket_timeout setting in the s3cmd configuration.
http://sumanrs.wordpress.com/2013/03/19/s3cmd-timeout-problems-moving-large-files-on-s3-250mb/
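For illustration, the setting can be changed by editing the s3cmd configuration file by hand or with a small script. The sketch below works on a config string and assumes the file has a [default] section containing socket_timeout, as the files generated by s3cmd --configure normally do. The function name, the sample fragment and the value 300 are assumptions, not recommendations from the linked post.

```python
import configparser
import io


def set_socket_timeout(config_text, seconds):
    """Return config_text with socket_timeout in [default] set to seconds."""
    # interpolation=None keeps values containing "%" (e.g. host_bucket) untouched
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    parser["default"]["socket_timeout"] = str(seconds)
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


# Made-up fragment in the style of ~/.s3cfg
sample = "[default]\nhost_bucket = %(bucket)s.s3.amazonaws.com\nsocket_timeout = 10\n"
updated = set_socket_timeout(sample, 300)
print(updated)
```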
Remove the ' signs. Your code should be:
s3cmd mv s3://mybucket/test.mov s3://mybucket/idea/test.mov
Also check the permissions of your bucket: your username should have all the permissions.
You could also try connecting CloudFront to your bucket. I know it doesn't make sense, but I had a similar problem with a bucket that did not have a CloudFront instance connected to it.