Can I pass a key file as a parameter to connect to Google Cloud Storage using gsutil? - authentication

When I use gsutil to connect to my bucket on Google Cloud Storage, I usually use the following command:
gcloud auth activate-service-account --key-file="pathKeyFile"
What should I do if two scripts that are running on the same machine at the same time need two different Service Accounts?
I would like to use a command such as:
gsutil ls mybucket --key-file="mykeyspath"
I ask because if my script is running and another script switches the currently active Service Account, my script would no longer have permission to access the bucket.

You can do this with a BOTO file. You can create one as explained in the documentation.
Then you can specify which file to use when you run your gsutil command (here is an example on Linux):
# if you have several gsutil commands to run
export BOTO_CONFIG=/path/to/.botoMyBucket
gsutil ls myBucket
# For only one command, you can define an env var inline like this
BOTO_CONFIG=/path/to/.botoMyBucket2 gsutil ls myBucket2
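Each .boto file can then point at its own service-account key, so the two scripts never share credentials. A sketch, assuming JSON key files (gs_service_key_file is the usual option name, but check the BOTO documentation above for your gsutil version):
# contents of /path/to/.botoMyBucket
[Credentials]
gs_service_key_file = /path/to/serviceAccount1-key.json
# contents of /path/to/.botoMyBucket2
[Credentials]
gs_service_key_file = /path/to/serviceAccount2-key.json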

Related

Copy files from GCLOUD to S3 with SDK GCloud

I am trying to copy a file from Google Cloud Storage to AWS S3 with the gcloud SDK console and it shows me an error. I have found a way to copy the Cloud Storage file to a local directory (gsutil -D cp gs://mybucket/myfile C:\tmp\storage\file) and to upload that local file to S3 using the AWS CLI (aws s3 cp C:\tmp\storage\file s3://my_s3_dirctory/file), and it works perfectly, but I would like to do all of this directly, with no need to download the files, using only the gcloud SDK console.
When I try to do this, the system shows me an error:
gsutil -D cp gs://mybucket/myfile s3://my_s3_dirctory/file.csv
Failure: Host [...] returned an invalid certificate. (remote hostname
"....s3.amazonaws.com" does not match certificate)...
I have edited and uncommented those lines in the .boto file, but the error continues:
# To add HMAC aws credentials for "s3://" URIs, edit and uncomment the
# following two lines:
aws_access_key_id = [MY_AWS_ACCESS_KEY_ID]
aws_secret_access_key = [MY_AWS_SECRET_ACCESS_KEY]
I am new to this: I don't know what boto is, and I have no idea whether I am editing it correctly or not. I don't know if I can put the keys directly into those lines, because I don't know how the .boto file works...
Can somebody help me with that, please, and explain the whole process so this works? I would really appreciate it; it would be very helpful for me!
Thank you so much.

How to use S3 adapter cli for snowball

I'm using the S3 adapter to copy files from a Snowball device to a local machine.
Everything appears to be in order as I was able to run this command and see the bucket name:
aws s3 ls --endpoint http://snowballip:8080
But beyond this, AWS doesn't offer any examples of calling the cp command. How do I provide the bucket name and the key with this --endpoint flag?
Further, when I ran this:
aws s3 ls --endpoint http://snowballip:8080/bucketname
It returned 'Bucket'... I'm not sure what that means, because I expected to see the files.
I can confirm the following is correct for Snowball and Snowball Edge, as @sqlbot says in the comments:
aws s3 ls --endpoint http://snowballip:8080 s3://bucketname/[optionalprefix]
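The cp command takes the same flag; give it the usual source and destination arguments plus the endpoint (a sketch based on the ls example above; the file name and prefix are placeholders):
aws s3 cp ./localfile s3://bucketname/optionalprefix/localfile --endpoint http://snowballip:8080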
References:
http://docs.aws.amazon.com/cli/latest/reference/
http://docs.aws.amazon.com/snowball/latest/ug/using-adapter-cli.html

How do I design a Bootup script for setting up an apache web server running on a brand new AWS EC2 Ubuntu instance?

I want to configure an EC2 instance so that it installs, configures and starts an Apache web server without my (human) intervention.
To this end, I am taking advantage of the "User Data" section and I have written the following script:
#!/bin/bash
sudo apt-get -y upgrade
sudo apt-get install -y apache2
sudo apt-get install -y awscli
# Retry until index.html actually lands in the web root (the loop must test the destination path)
while [ ! -e /var/www/html/index.html ]; do aws s3 cp s3://vietnhi-s-bucket/index.html /var/www/html; done
sudo systemctl restart apache2
Description of the Functionality of the Bootup script:
The script upgrades the packages on the Ubuntu instance from whatever state they were in when the AMI image was created to their current versions at the time the EC2 instance is created from the image.
The script installs the Apache 2 server.
The script installs the AWS CLI, because the aws s3 cp command on the next line is not going to work without it.
The script copies the sample index.html file from the vietnhi-s-bucket S3 bucket to the /var/www/html directory of the Apache web server and overwrites its default index.html file.
The script restarts the Apache web server. I could have used "Start" but I chose to use "restart".
Explanatory Notes:
The script assumes that I have created an IAM role that permits the EC2 instance to copy the file index.html from an S3 bucket called "vietnhi-s-bucket". I have given the name "S3" to the IAM role and assigned the "S3ReadAccess" policy to that role.
The script assumes that I have created an S3 bucket called "vietnhi-s-bucket" where I have stashed a sample index.html file.
For reference, here are the contents of the sample index.html file:
<html>
<body>
This is a test
</body>
</html>
Does the bootup script work as intended?
The script works as-is.
To arrive at that script, I had to overcome three challenges:
Create an appropriate IAM role. The minimum viable role MUST include the "S3ReadAccess" policy. This role is what allows the instance to obtain AWS credentials in its environment; copying the index.html file from the vietnhi-s-bucket S3 bucket is not feasible if the instance cannot authenticate to your AWS account.
Install the AWS CLI (awscli). For whatever reason, I never saw that line included in any of the official AWS documentation or in any of the support offered on the web, including the AWS forums. You can't run the aws s3 cp command if you don't install the AWS CLI.
I originally used "aws s3 cp s3://vietnhi-s-bucket/index.html /var/www/html" on its own as my copy-from-S3 instruction. Bad call. https://forums.aws.amazon.com/thread.jspa?threadID=220187
The link above refers to a timing issue that AWS hasn't resolved, and the only workaround is to wrap retries around the aws s3 cp command.
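For reference, a bounded version of that retry wrapper (a sketch; the 30-attempt limit and 5-second pause are illustrative, not part of the original script):
# Retry the copy a limited number of times instead of looping forever
for attempt in $(seq 1 30); do
    aws s3 cp s3://vietnhi-s-bucket/index.html /var/www/html && break
    sleep 5
done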

Exporting data from Google Cloud Storage to Amazon S3

I would like to transfer data from a table in BigQuery, into another one in Redshift.
My planned data flow is as follows:
BigQuery -> Google Cloud Storage -> Amazon S3 -> Redshift
I know about Google Cloud Storage Transfer Service, but I'm not sure it can help me. From Google Cloud documentation:
Cloud Storage Transfer Service
This page describes Cloud Storage Transfer Service, which you can use
to quickly import online data into Google Cloud Storage.
I understand that this service can be used to import data into Google Cloud Storage and not to export from it.
Is there a way I can export data from Google Cloud Storage to Amazon S3?
You can use gsutil to copy data from a Google Cloud Storage bucket to an Amazon bucket, using a command such as:
gsutil -m rsync -rd gs://your-gcs-bucket s3://your-s3-bucket
Note that the -d option above will cause gsutil rsync to delete objects from your S3 bucket that aren't present in your GCS bucket (in addition to adding new objects). You can leave off that option if you just want to add new objects from your GCS to your S3 bucket.
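Note also that gsutil needs AWS credentials to talk to the S3 side. One common way to supply them (a sketch; the values are placeholders) is the [Credentials] section of your .boto file:
[Credentials]
aws_access_key_id = YOUR_AWS_ACCESS_KEY_ID
aws_secret_access_key = YOUR_AWS_SECRET_ACCESS_KEY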
Go to any instance or Cloud Shell in GCP.
First of all, configure your AWS credentials in your GCP environment:
aws configure
If the command is not recognised, install the AWS CLI by following this guide: https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-install.html
Follow this URL to run aws configure:
https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html
Then, using gsutil:
gsutil -m rsync -rd gs://storagename s3://bucketname
16 GB of data transferred in a few minutes.
You can use Rclone (https://rclone.org/).
Rclone is a command line program to sync files and directories to and from:
Google Drive
Amazon S3
Openstack Swift / Rackspace cloud files / Memset Memstore
Dropbox
Google Cloud Storage
Amazon Drive
Microsoft OneDrive
Hubic
Backblaze B2
Yandex Disk
SFTP
The local filesystem
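Once the GCS and S3 remotes are set up with rclone config, the transfer itself is a single command. A sketch, assuming the remotes were named gcs and s3 during configuration:
rclone copy gcs:your-gcs-bucket s3:your-s3-bucket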
Using the gsutil tool we can do a wide range of bucket and object management tasks, including:
Creating and deleting buckets.
Uploading, downloading, and deleting objects.
Listing buckets and objects. Moving, copying, and renaming objects.
We can copy data from a Google Cloud Storage bucket to an Amazon S3 bucket using the gsutil rsync and gsutil cp operations, whereby:
gsutil rsync collects all the metadata from the bucket and syncs the data to S3:
gsutil -m rsync -r gs://your-gcs-bucket s3://your-s3-bucket
gsutil cp copies the files one by one; since the transfer rate is good, it copies approximately 1 GB per minute:
gsutil cp gs://<gcs-bucket> s3://<s3-bucket-name>
If you have a large number of files with a high data volume, use the bash script below and run it in the background with multiple sessions, using the screen command on an Amazon or GCP instance that has AWS credentials configured and GCP auth verified (see the sketch after the script for running several sessions at once).
Before running the script, list all the files and redirect the output to a file, which the script then reads as its input list of files to copy:
gsutil ls gs://<gcs-bucket> > file_list_part.out
Bash script:
#!/bin/bash
echo "start processing"
# Input list can be passed as the first argument; defaults to file_list_part.out
input="${1:-file_list_part.out}"
while IFS= read -r line
do
    now=$(date)
    command="gsutil cp ${line} s3://<bucket-name>"
    echo "command :: $command :: $now"
    eval "$command"
    retVal=$?
    if [ $retVal -ne 0 ]; then
        echo "Error copying file"
        exit 1
    fi
    echo "Copy completed successfully"
done < "$input"
echo "completed processing"
Execute the bash script and write the output to a log file to check the progress of completed and failed files:
bash file_copy.sh > /root/logs/file_copy.log 2>&1
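To parallelise, split the listing into chunks and run one copy of the script per chunk in separate screen sessions. A sketch (the four-way split, session names and log paths are illustrative, and it assumes the script accepts the list file as its first argument, as above):
split -n l/4 file_list_part.out file_chunk_
for chunk in file_chunk_*; do
    screen -dmS "copy_${chunk}" bash -c "bash file_copy.sh ${chunk} > /root/logs/${chunk}.log 2>&1"
done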
I needed to transfer 2 TB of data from a Google Cloud Storage bucket to an Amazon S3 bucket.
For the task, I created a Google Compute Engine instance with 8 vCPUs (30 GB of RAM).
Allow login using SSH on the Compute Engine instance.
Once logged in, create an empty .boto configuration file to hold the AWS credential information, and add the AWS credentials by referring to the link mentioned.
Then run the command:
gsutil -m rsync -rd gs://your-gcs-bucket s3://your-s3-bucket
The data transfer rate is ~1GB/s.
Hope this helps.
(Do not forget to terminate the compute instance once the job is done)
For large numbers of large files (100 MB+) you might get issues with broken pipes and other annoyances, probably due to the multipart upload requirement (as Pathead mentioned).
In that case you're left with simply downloading all the files to a machine and uploading them back. Depending on your connection and the amount of data, it might be more effective to create a VM instance to take advantage of a high-speed connection and the ability to run the transfer in the background on a machine other than your own.
Create a VM (make sure its service account has access to your buckets), connect via SSH, install the AWS CLI (apt install awscli) and configure access to S3 (aws configure).
Run these two lines, or turn them into a bash script if you have many buckets to copy (one way to wrap them is sketched below).
gsutil -m cp -r "gs://$1" ./
aws s3 cp --recursive "./$1" "s3://$1"
(It's better to use rsync in general, but cp was faster for me)
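A minimal way to wrap those two lines as a script (a sketch; the script name is illustrative, and it assumes, as the lines above do, that the bucket keeps the same name on both sides and is passed as the first argument):
#!/bin/bash
# Usage: bash gcs_to_s3.sh <bucket-name>
set -e
BUCKET="$1"
# Pull the GCS bucket down to the local disk, then push it up to S3
gsutil -m cp -r "gs://$BUCKET" ./
aws s3 cp --recursive "./$BUCKET" "s3://$BUCKET"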
Tools like gsutil and aws s3 cp won't use multipart uploads/downloads, so they will have poor performance for large files.
Skyplane is a much faster alternative for transferring data between clouds (up to 110x for large files). You can transfer data with the command:
skyplane cp -r s3://aws-bucket-name/ gcs://google-bucket-name/
(disclaimer: I am a contributor)
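For the direction asked about in the question (Google Cloud Storage to S3), the same syntax applies with the source and destination swapped (a sketch reusing the placeholder bucket names above):
skyplane cp -r gcs://google-bucket-name/ s3://aws-bucket-name/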

Migration from S3 to Google Cloud Storage and ACLs

I am currently planning a possible migration from S3 to Google Cloud Storage (GCS). I have decided to spin up a GCE instance and use gsutil to rsync several million files. I would like to know whether the permissions will be preserved or not.
For example, if a file has public read on Amazon S3, what will the ACL be on GCS?
Thanks.
If you use the gsutil cp command you can specify a canned ACL on the command line, like this:
gsutil cp -R -a public-read s3://your-s3-bucket gs://your-gs-bucket
The rsync command doesn't have a way to do that. However, the other option is to set a default object ACL on the destination bucket, using the gsutil defacl command. Then you can use gsutil cp without specifying the canned ACL, or you can use gsutil rsync.
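For example, a sketch of the defacl approach (the bucket names are the same placeholders as above):
gsutil defacl set public-read gs://your-gs-bucket
gsutil -m rsync -r s3://your-s3-bucket gs://your-gs-bucket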