Google Cloud gsutil instructions say update but there is no update command - gsutil

On this page:
https://developers.google.com/storage/docs/gsutil_install?hl=ja#install
The gsutil install instructions recommend running, right after installation:
gsutil update
which returns
CommandException: Invalid command "update".
Am I just seeing incorrect documentation? Is there some other way to update?
Checking the usage output doesn't show any update command:
Usage: gsutil [-d][-D] [-h header]... [-m] [command [opts...] args...] [-q]
Available commands:
acl Get, set, or change bucket and/or object ACLs
cat Concatenate object content to stdout
compose Concatenate a sequence of objects into a new composite object.
config Obtain credentials and create configuration file
cors Set a CORS XML document for one or more buckets
cp Copy files and objects
defacl Get, set, or change default ACL on buckets
du Display object size usage
help Get help about commands and topics
lifecycle Get or set lifecycle configuration for a bucket
logging Configure or retrieve logging on buckets
ls List providers, buckets, or objects
mb Make buckets
mv Move/rename objects and/or subdirectories
notification Configure object change notification
perfdiag Run performance diagnostic
rb Remove buckets
rm Remove objects
setmeta Set metadata on already uploaded objects
stat Display object status
test Run gsutil tests
version Print version info about gsutil
versioning Enable or suspend versioning for one or more buckets
web Set a main page and/or error page for one or more buckets
Additional help topics:
acls Working With Access Control Lists
anon Accessing Public Data Without Credentials
crc32c CRC32C and Installing crcmod
creds Credential Types Supporting Various Use Cases
dev Contributing Code to gsutil
metadata Working With Object Metadata
naming Object and Bucket Naming
options Top-Level Command-Line Options
prod Scripting Production Transfers
projects Working With Projects
subdirs How Subdirectories Work
support Google Cloud Storage Support
versions Object Versioning and Concurrency Control
wildcards Wildcard Names
Use gsutil help for detailed help.
EDIT:
It is gsutil version 3.42

Maybe you have a very old version?
Try gsutil version to see yours.
You can check the release notes here:
https://github.com/GoogleCloudPlatform/gsutil/blob/master/CHANGES.md
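For reference, the way to update depends on how gsutil was installed; a hedged sketch (both commands exist in current releases):
gsutil version -l            # show the installed version and build details
gsutil update                # standalone gsutil installs
gcloud components update     # if gsutil came bundled with the Cloud SDK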

Related

Read bucket object in CDK

In Terraform, to read an object from an S3 bucket at deployment time, I can use the data source
data aws_s3_bucket_object { }
Is there a similar concept in CDK? I've seen various methods of uploading assets to s3, as well as importing an existing bucket, but not getting an object from the bucket. I need to read a configuration file from the bucket that will affect further deployment.
It's important to remember that CDK itself is not a deployment option. It can deploy, but the code you are writing in a CDK stack is the definition of your resources, not a method for deployment.
So, you can do one of a few things.
Use the SDK for your language to make a call to the S3 bucket and load the data directly. This is perfectly acceptable and a well-understood way to gather information you need before deployment: each time the stack synths (which it does before every cdk deploy), that code will run and pull your data (see the sketch after this answer).
Use CodePipeline to set up a proper pipeline, and give it two sources: one is your version control repo and the second is your S3 bucket:
https://docs.aws.amazon.com/codebuild/latest/userguide/sample-multi-in-out.html
The preferred way: drop the JSON file and use Parameter Store. CDK contains modules that will create a token version of this parameter at synth time, and when it deploys it will resolve that reference properly against Systems Manager Parameter Store:
https://docs.aws.amazon.com/cdk/v2/guide/get_ssm_value.html
If your parameters change after deployment, you can have that as part of your CDK stack pretty easily (using CloudFormation outputs). If they change during deployment, you really need to be using CodePipeline to manage these steps instead of just CDK.
Remember: cdk deploy is just a convenience. It executes everything and has no way to pause in the middle and run specific steps (other than very basic "this depends on that resource" ordering).
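For illustration, here is a minimal shell sketch of options 1 and 3 (the bucket, key, and parameter names are placeholders, not anything from the question):
# Option 1: pull the config object before synth/deploy so the stack code can read it from a local file
aws s3 cp s3://my-config-bucket/deploy-config.json ./deploy-config.json
cdk deploy --require-approval never
# Option 3: publish the JSON to Parameter Store so CDK can resolve it at deploy time
aws ssm put-parameter --name /myapp/deploy-config --type String --value file://deploy-config.json --overwrite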

Sharing data between several Google projects

A question about Google Storage:
Is it possible to give r/o access to a (not world-accessible) storage bucket to a user from another Google project?
If yes, how?
I want to back up data to another Google project, in case somebody accidentally deletes all the storage buckets from our project.
Yes. Access to Google Cloud Storage buckets and objects is controlled by ACLs that allow you to specify individual users, service accounts, groups, or project roles.
You can add users to any existing object through the UI, the gsutil command-line utility, or via any of the APIs.
If you want to grant one specific user the ability to write objects into project X, you need only specify the user's email:
$> gsutil acl ch -u bob.smith@gmail.com:W gs://bucket-in-project-x
If you want to say that every editor of the project my-project is permitted to write into some bucket in a different project, you can do that as well:
$> gsutil acl ch -p editors-my-project:W gs://bucket-in-project-x
The "-u" means user, "-p" means 'project'. User names are just email addresses. Project names are the strings "owners-", "viewers-", or "editors-" and then the project's ID. The ":W" bit at the end means "WRITE" permission. You could also use O or R or OWNER or READ or WRITE instead.
You can find out more by reading the help page: $> gsutil help acl ch
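For the read-only access the question actually asks about, the same command works with the R permission; a sketch with a placeholder service account:
$> gsutil acl ch -u backup-user@other-project.iam.gserviceaccount.com:R gs://bucket-in-project-x
$> gsutil acl ch -u backup-user@other-project.iam.gserviceaccount.com:R gs://bucket-in-project-x/**
The second command repeats the grant on the objects that already exist in the bucket.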

How to authenticate google APIs with different service account credentials?

As anyone who has ever had the misfortune of having to interact with the panoply of Google CLI binaries programmatically will have realised, authenticating with the likes of gcloud, gsutil, bq, etc. is far from intuitive or trivial, especially when you need to work across different projects.
I am running various cron jobs that interact with Google Cloud Storage and BigQuery for different projects. Since the cron jobs may overlap, renaming config files is clearly not an option, and nor would any sane person take that approach.
There must surely be some sort of method of passing a path to a service account's key pair file to these CLI binaries, but bq help yields nothing.
The Google documentation, while verbose, is largely useless, taking one on a tour of how OAuth2 works, etc, instead of explaining what must surely be a very common requirement, vis-a-vis, how to actually authenticate a service account without running commands that modify central config files.
Can any enlightened being tell me whether the engineers at Google decided to add a feature as simple as passing the path to a service account's key pair file to the likes of gsutil and bq? Or perhaps I could simply export some variable so they know which key pair file to use for authentication?
I realise these simplistic approaches may be an insult to the intelligence, but we aren't concerning ourselves with harnessing nuclear fusion, so we needn't even consider what Amazon got so right with their approach to authentication in comparison...
Configuration in the Cloud SDK is global for the user, but you can specify which aspects of that config to use on a per-command basis. To accomplish what you are trying to do, you can run:
gcloud auth activate-service-account foo@developer.gserviceaccount.com --key-file ...
gcloud auth activate-service-account bar@developer.gserviceaccount.com --key-file ...
At this point, both sets of credentials are in your global credentials store.
Now you can run:
gcloud --account foo@developer.gserviceaccount.com some-command
gcloud --account bar@developer.gserviceaccount.com some-command
in parallel, and each will use the given account without interfering.
A larger extension of this is 'configurations' which do the same thing, but for your entire set of config (including settings like account and project).
# Create first configuration
gcloud config configurations create myconfig
gcloud config configurations activate myconfig
gcloud config set account foo@developer.gserviceaccount.com
gcloud config set project foo
# Create second configuration
gcloud config configurations create anotherconfig
gcloud config configurations activate anotherconfig
gcloud config set account bar@developer.gserviceaccount.com
gcloud config set project bar
And you can say which configuration to use on a per command basis.
gcloud --configuration myconfig some-command
gcloud --configuration anotherconfig some-command
You can read more about configurations by running: gcloud topic configurations
All properties have corresponding environment variables that allow you to set that particular property for a single command invocation or for a terminal session. They take the form:
CLOUDSDK_<SECTION>_<PROPERTY>
for example: CLOUDSDK_CORE_ACCOUNT
You can see all the available config settings by running: gcloud help config
The equivalent of the --configuration flag is: CLOUDSDK_ACTIVE_CONFIG_NAME
If you really want complete isolation, you can also change the Cloud SDK's config directory by setting CLOUDSDK_CONFIG to a directory of your choosing. Note that if you do this, the config is completely separate including the credential store, all configurations, logs, etc.
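Putting this together for the cron-job scenario, a hedged sketch (the paths, key file, and names are placeholders; it assumes gsutil and bq are the Cloud SDK-bundled versions, which share gcloud's credential store):
export CLOUDSDK_CONFIG=/var/lib/cron/foo-gcloud      # per-job config and credential store
gcloud auth activate-service-account --key-file=/etc/keys/foo.json
gcloud config set project foo-project
gsutil ls gs://foo-bucket                            # uses the isolated credentials
bq ls                                                # likewise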

Cannot delete Amazon S3 object

I have an Amazon S3 object that I cannot delete. I have tried the AWS CLI, 3Hub, and Amazon's Management Console. When I try the AWS CLI or 3Hub, I get a "key does not exist" error. When I try the Management Console, the object always reappears with the same last-modified date.
I have noticed that the object's key ends with %0A (a linefeed?) and suspect that this is part of the problem.
How can I delete this object?
I have also opened a thread in the AWS forums here: https://forums.aws.amazon.com/thread.jspa?threadID=142946&tstart=0. I have also created a private support ticket -- which is getting good Amazon attention.
Update
Other things I am trying:
Using the s3curl tool (didn't work)
Using the AWS S3 CLI rm tool (didn't work)
Using the fixbucket command from s3cmd (didn't work)
Using a lifecycle rule (this worked after about 24 hours):
S3's lifecycle rules unfortunately do not accept wildcards. You will have to fill in the ** in 'media/**/' with the actual path. You do not need the * after 'Icon', however, since lifecycle rules take a prefix, which means that all keys beginning with what you supply will be deleted.
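The lifecycle rule can also be applied from the command line; a sketch assuming a placeholder bucket and prefix:
cat > lifecycle.json <<'EOF'
{
  "Rules": [
    {
      "ID": "expire-stuck-object",
      "Filter": { "Prefix": "media/path/to/Icon" },
      "Status": "Enabled",
      "Expiration": { "Days": 1 }
    }
  ]
}
EOF
aws s3api put-bucket-lifecycle-configuration --bucket my-bucket --lifecycle-configuration file://lifecycle.json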

Access files in s3n://elasticmapreduce/samples/wordcount/input

How can I access the files sitting in the following S3 folder, which is owned by someone else?
s3n://elasticmapreduce/samples/wordcount/input
The files in s3n://elasticmapreduce/samples/wordcount/input are public, and made available as input by Amazon to the sample word count Hadoop program. The best way to fetch them is to
Start a new Amazon Elastic MapReduce Job Flow (it doesn't matter which one) from the Amazon Web Services console, and make sure that you keep the job alive with the Keep Alive option
Once the EC2 machines have started, find the instances on EC2 from the Amazon Web Services console
ssh into one of the running EC2 instances, using the hadoop user, for example
ssh -i keypair.pem hadoop@ec2-IPADDRESS.compute-1.amazonaws.com
Obtain the files you need, using hadoop dfs -copyToLocal s3://elasticmapreduce/samples/wordcount/input/0002 .
sftp or scp the files to your local system (see the sketch below)
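A consolidated sketch of the ssh, copy, and scp steps above, run from your local machine (the key file and hostname are placeholders):
ssh -i keypair.pem hadoop@ec2-IPADDRESS.compute-1.amazonaws.com 'hadoop dfs -copyToLocal s3://elasticmapreduce/samples/wordcount/input/0002 .'
scp -i keypair.pem hadoop@ec2-IPADDRESS.compute-1.amazonaws.com:0002 .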
You can access wordSplitter.py here:
https://elasticmapreduce.s3.amazonaws.com/samples/wordcount/wordSplitter.py
You can access the input files here:
https://elasticmapreduce.s3.amazonaws.com/samples/wordcount/input/0012
https://elasticmapreduce.s3.amazonaws.com/samples/wordcount/input/0011
https://elasticmapreduce.s3.amazonaws.com/samples/wordcount/input/0010
https://elasticmapreduce.s3.amazonaws.com/samples/wordcount/input/0009
https://elasticmapreduce.s3.amazonaws.com/samples/wordcount/input/0008
https://elasticmapreduce.s3.amazonaws.com/samples/wordcount/input/0007
https://elasticmapreduce.s3.amazonaws.com/samples/wordcount/input/0006
https://elasticmapreduce.s3.amazonaws.com/samples/wordcount/input/0005
https://elasticmapreduce.s3.amazonaws.com/samples/wordcount/input/0004
https://elasticmapreduce.s3.amazonaws.com/samples/wordcount/input/0003
https://elasticmapreduce.s3.amazonaws.com/samples/wordcount/input/0002
https://elasticmapreduce.s3.amazonaws.com/samples/wordcount/input/0001
The owner of the folder (more precisely, of the files in the folder) must have made it accessible to anonymous readers.
If that is the case, s3n://x/y... is translated to
http://s3.amazonaws.com/x/y...
or
http://x.s3.amazonaws.com/y...
x is the name of the bucket.
y... is the path within the bucket.
If you want to make sure the file exists, e.g. if you suspect the name was misspelled, you can open
http://s3.amazonaws.com/x
in your browser, and you'll see XML describing the "files", that is, the S3 objects, available.
Try this:
http://s3.amazonaws.com/elasticmapreduce
I tried this, and it seems that the path you want is not public.
The AWS EMR documentation quotes s3://elasticmapreduce/samples/wordcount/input in one of the "getting started" examples. But s3 is different from s3n, so the input might be available to EMR but not via HTTP access.
In Amazon S3 there is no concept of folders; a bucket is just a flat collection of objects. But you can list all the files you are interested in from a browser with the following URL:
s3.amazonaws.com/elasticmapreduce?prefix=samples/wordcount/input/
Then you can download them by specifying the whole name, e.g.
s3.amazonaws.com/elasticmapreduce/samples/wordcount/input/0001
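If anonymous access is still enabled on the bucket, the same listing and download can be scripted; a hedged sketch:
curl "https://s3.amazonaws.com/elasticmapreduce?prefix=samples/wordcount/input/"
curl -O "https://s3.amazonaws.com/elasticmapreduce/samples/wordcount/input/0001"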