Long-lived key/token-based way to download Google Storage bucket objects with curl? - authentication

O.k. my fellow devops and coders. I have spent the last week trying to figure this out with Google (GCP) Cloud Storage objects. Here is my objective.
The solution needs to be lightweight as it will be used to download images inside a docker image, hence the curl requirement.
The GCP bucket and object need to be secure and not public.
I need a "long"-lived ticket/key/client ID.
I have tried the OAuth 2.0 setup that Google's documentation mentions, but every time I try to set up an OAuth 2.0 key I do not get the option for "offline" access. And to top it off, it requires you to put in the source URLs that will be accessing the auth request.
Also, Google Cloud Storage does not support the key= parameter like some of Google's other services. So here I have an API key for my project as well as an OAuth JSON file for my service user, and they are useless.
I can get a curl command to work with the temporary OAuth bearer token, but I need a long-term solution for this.
RUN curl -X GET \
-H "Authorization: Bearer ya29.GlsoB-ck37IIrXkvYVZLIr3u_oGB8e60UyUgiP74l4UZ4UkT2aki2TI1ZtROKs6GKB6ZMeYSZWRTjoHQSMA1R0Q9wW9ZSP003MsAnFSVx5FkRd9-XhCu4MIWYTHX" \
-o "/home/shmac/test.tar.gz" \
"https://www.googleapis.com/storage/v1/b/mybucket/o/my.tar.gz?alt=media"
A long-term key/ID/secret that will allow me to download a GCP bucket object from any location.

The solution needs to be lightweight as it will be used to download
images inside a docker image, hence the curl requirement.
This is a vague requirement. What is lightweight? No external libraries, everything written in assembly language, must fit in 1 KB, etc.
The GCP bucket and object needs to be secure and not public.
This is a normal requirement. With some exceptions (static file storage for websites, etc.), you want your buckets to be private.
I need a "long" lived ticket/key/client_ID.
My advice is to stop thinking in terms of "long-term keys". The trend in security is to implement short-term keys. In Google Cloud Storage, seven days is considered long-term; 3600 seconds (one hour) is the norm almost everywhere in Google Cloud.
For Google Cloud Storage you have several options. You did not specify the environment, so I will cover user credentials, service accounts, presigned URLs, and IAM-based access.
User Credentials
You can authenticate with User Credentials (e.g. username@gmail.com) and save the Refresh Token. Then, when an Access Token is required, you can generate one from the Refresh Token. In my website article about learning the Go language, I wrote a program on Day #8 which implements Google OAuth, saves the necessary credentials, and creates Access Tokens and ID Tokens as needed with no further "login" required. The comments in the source code should help you understand how this is done. https://www.jhanley.com/google-cloud-and-go-my-journey-to-learn-a-new-language-in-30-days/#day_08
This is the choice if you need to use User Credentials. This technique is more complicated and requires protecting the secrets file, but it will give you refreshable long-term tokens.
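To give a feel for the refresh-token exchange, here is a minimal Python sketch using the google-auth library. The client ID, client secret, and refresh token are placeholders you would load from your own protected secrets file; the resulting short-lived token can be used as a Bearer token with curl.
```python
from google.auth.transport.requests import Request
from google.oauth2.credentials import Credentials

# Placeholders: load these from the secrets file produced by your OAuth consent flow.
creds = Credentials(
    token=None,
    refresh_token="YOUR_REFRESH_TOKEN",
    token_uri="https://oauth2.googleapis.com/token",
    client_id="YOUR_CLIENT_ID.apps.googleusercontent.com",
    client_secret="YOUR_CLIENT_SECRET",
    scopes=["https://www.googleapis.com/auth/devstorage.read_only"],
)

# Exchange the long-lived refresh token for a short-lived access token.
creds.refresh(Request())
print(creds.token)  # usable as "Authorization: Bearer <token>" with curl
```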
Service Account Credentials
Service Account JSON key files are the standard method for service-to-service authentication and authorization. Using these keys, Access Tokens valid for one hour are generated. When they expire, new ones are created. The maximum lifetime is 3600 seconds.
This is the choice if you are programmatically accessing Cloud Storage with programs under your control (the service account JSON file must be protected).
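As a rough sketch of what this looks like in practice (Python with the google-auth library; the key file path is a placeholder), you can mint a one-hour access token from the JSON key and feed it to the curl command from the question:
```python
from google.auth.transport.requests import Request
from google.oauth2 import service_account

# Placeholder path to the service account JSON key file (protect this file).
SCOPES = ["https://www.googleapis.com/auth/devstorage.read_only"]
creds = service_account.Credentials.from_service_account_file(
    "/secrets/sa-key.json", scopes=SCOPES
)

# Produces an access token valid for about one hour; refresh again when it expires.
creds.refresh(Request())
print(creds.token)  # pass as "Authorization: Bearer <token>" to curl
```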
Presigned-URLs
This is the standard method of providing access to private Google Cloud Storage objects. This method takes the object URL and generates a signature with an expiration, so that objects can be accessed for a defined period of time. One of your requirements (which is unrealistic) is that you don't want to use source URLs. The maximum lifetime is seven days.
This is the choice if you need to provide access to third-parties to access your Cloud Storage Objects.
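For completeness, here is a hedged sketch of generating a V4 signed URL with the google-cloud-storage Python client; the key file, bucket, and object names are placeholders. The resulting URL can be fetched with plain curl and no Authorization header, which fits the Docker use case in the question. The same thing can be done from the command line with gsutil signurl.
```python
from datetime import timedelta
from google.cloud import storage

# Placeholder key file, bucket, and object names.
client = storage.Client.from_service_account_json("/secrets/sa-key.json")
blob = client.bucket("mybucket").blob("my.tar.gz")

# V4 signed URLs are limited to a maximum expiration of seven days.
url = blob.generate_signed_url(
    version="v4",
    expiration=timedelta(days=7),
    method="GET",
)
print(url)  # curl -o my.tar.gz "<url>"
```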
IAM Based Access
This method does not use Access Tokens; instead, it uses Identity Tokens. Permissions are assigned to Cloud Storage buckets and objects and not to the IAM member account. This method requires a solid understanding of how identities work in Google Cloud Storage and is the future direction for Google security - meaning that for many services, access will be controlled on a service/object basis and not via roles that grant wide access to an entire service in a project. I talk about this in my article on Identity Based Access Control.
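As a rough illustration of bucket-level permissions (not the Identity Token flow itself), the Python client can attach an IAM binding to a single bucket rather than granting a project-wide role; the bucket name and service account email below are placeholders.
```python
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("mybucket")  # placeholder bucket name

# Grant one service account read access to this bucket only,
# instead of a project-wide role on every bucket in the project.
policy = bucket.get_iam_policy(requested_policy_version=3)
policy.bindings.append({
    "role": "roles/storage.objectViewer",
    "members": {"serviceAccount:downloader@my-project.iam.gserviceaccount.com"},
})
bucket.set_iam_policy(policy)
```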
Summary
You have not clearly defined what will be accessing Cloud Storage, how secrets are stored, whether the secrets need to be protected from users (public URL access), etc. The choice depends on a number of factors.
If you read the latest articles on my website I discuss a number of advanced techniques on Identity Based Access Control. These features are starting to appear on a number of Google Services in the beta level commands. This includes Cloud Scheduler, Cloud Pub/Sub, Cloud Functions, Cloud Run, Cloud KMS and soon more. Cloud Storage supports Identity Based Access which requires no permissions at all - the identity is used to control access.

Related

Using object storage with authorization for a web app

I'm contributing to developing a web (front+back) application, which uses OpenID Connect (with auth0) for authentication & authorization.
The web app needs authentication to access some public & some restricted information (restrictions are per-user or depend on certain group-related rules).
We want to provide upload/download features for documents such as .pdf, and we have implemented minIO (pretty similar to AWS S3) for public documents.
However, we can't wrap our heads around restricted-access files:
should we implement OIDC on minIO so that users access the buckets directly with temporary access tokens, allowing for a fine-grained authorization policy,
or should the back office be the only one to have keys to minIO and act as the intermediary between the object storage and users?
Looking for good practices here, thanks in advance for your help.
Interesting question, since PDF docs are web static content unless they contain sensitive data. I would aim to separate secured (API) and non-secured (web) concerns on this one.
UNSECURED RESOURCES
If there is no security involved, connecting to a bucket from the front end makes sense. The bucket contents can also be distributed to a content delivery network, for best global performance. The PDF can be considered a web resource.
SECURED RESOURCES
Requests for these need to be treated as API requests if a PDF doc contains sensitive data. APIs should receive an access token and enforce access to documents via scopes and claims.
You might use a Documents API for this. The implementation might still connect to a bucket, but this might be a different bucket that the browser does not have access to.
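One possible shape for such a Documents API, sketched in Python with the minio SDK: the backend validates the caller's access token, enforces the per-user/group rules from the token's claims, and hands back a short-lived presigned URL for the private bucket. The endpoint, credentials, bucket name, and the "docs:read" scope are placeholders for whatever your auth0 setup provides.
```python
from datetime import timedelta
from minio import Minio

# Placeholder minIO endpoint and credentials, known only to the backend.
client = Minio("minio.internal:9000",
               access_key="BACKEND_KEY",
               secret_key="BACKEND_SECRET",
               secure=True)

def get_secure_doc_url(user_claims: dict, doc_name: str) -> str:
    # Enforce your per-user / group rules here (claims come from the validated token).
    if "docs:read" not in user_claims.get("scope", ""):
        raise PermissionError("missing docs:read scope")
    # Short-lived presigned URL for the private bucket; the browser never sees minIO keys.
    return client.presigned_get_object("secure-docs", doc_name,
                                       expires=timedelta(minutes=10))
```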
SUMMARY
This type of solution is often clearer if you think in terms of URL design. E.g. the front end might have two document URLs:
publicDocs
secureDocs
By default I would treat docs that users upload as secure, unless they select an upload option such as make public.

Use Google Storage Transfer API to transfer data from external GCS into my GCS

I am working on a web application which comprises a ReactJs frontend and a Java SpringBoot backend. This application would require users to upload data from their own Google Cloud Storage into my Google Cloud Storage.
The application flow will be as follows -
The frontend asks the user for read access on their storage. For this I have used OAuth 2.0 access tokens as described here
The generated OAuth token will be passed to the backend.
The backend will also have credentials for my service account to allow it to access my Google Cloud APIs. I have created the service account with required permissions and generated the key using the instructions from here
The backend will use the generated access token and my service account credentials to transfer the data.
In the final step, I want to create a transfer job using the google Storage-Transfer API. I am using the Java API client provided here for this.
I am having difficulty providing the authentication credentials to the transfer api.
In my understanding, there are two different authentications required - one for reading the user's bucket and another for starting the transfer job and writing the data in my cloud storage. I haven't found any relevant documentation or working examples for my use-case. In all the given samples, it is always assumed that the same service account credentials will have access to both the source and sink buckets.
tl;dr
Does the Google Storage Transfer API allow setting different source and target credentials for GCS-to-GCS transfers? If yes, how does one provide these credentials in the transfer job specification?
Any help is appreciated. Thanks!
Unfortunately, this is not allowed for the GCS Transfer API; for this to work, the Service Account would need access to both the source and the sink buckets, as you mentioned.
You can try opening a feature request in Google's Issue Tracker so that Google's Product Team can consider such functionality for newer versions of the API. You could also mention that this subject is not covered in the documentation, so that it can be improved.

How to get short lived access to specific Google Cloud Storage bucket from client mobile app?

I have a mobile app which authenticates users on my server. I'd like to store images of authenticated users in a Google Cloud Storage bucket, but I'd like to avoid uploading images via my server to the Google bucket; they should be uploaded (or downloaded) directly from the bucket.
(I also don't want to display another Google login to users to grant access to their bucket)
So my best case scenario would be that when user authenticates to my server, my server also generates short lived access token to specific Google storage bucket with read and write access.
I know that service accounts can generate access tokens, but I couldn't find any documentation on whether it is good practice to pass these access tokens from the server to the client app, and whether it is possible to limit the scope of the access token to a specific bucket.
I found the authorization documentation quite confusing, so I am asking here: what would be the best-practice approach to provide access to Cloud Storage for my case?
I think you are looking for signed URLs.
A signed URL is a URL that provides limited permission and time to
make a request. Signed URLs contain authentication information in
their query string, allowing users without credentials to perform
specific actions on a resource.
Here you can see more about them in GCP. Here you have an explanation of how you can adapt them for your program.
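In this setup your server holds the service account key and, after it has authenticated the user, mints a short-lived signed URL for one specific object. A hedged Python sketch with the google-cloud-storage client follows; the key file, bucket name, object path, and content type are placeholders.
```python
from datetime import timedelta
from google.cloud import storage

# Placeholder key file and bucket; these stay on the server, never on the phone.
client = storage.Client.from_service_account_json("/secrets/sa-key.json")
bucket = client.bucket("user-images")

def signed_upload_url(user_id: str, filename: str) -> str:
    # Called only after your own server-side authentication has succeeded.
    blob = bucket.blob(f"{user_id}/{filename}")
    return blob.generate_signed_url(
        version="v4",
        expiration=timedelta(minutes=15),
        method="PUT",               # the app uploads directly with this URL
        content_type="image/jpeg",  # the upload must send the same Content-Type
    )
```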

Can GSuite be accessed by means of API key?

Suppose I have a simple node backend application which, when run, needs to connect to a specific GSuite instance, query some things (users, groups, etc.) and then close and not run again until needed, which can mean either a very long time or a few seconds. From what I gathered from Google's documentation there may be multiple ways of doing this, including having an OAuth client and following the whole flow of setting it up, managing token lifecycle, etc.
However, I do NOT want to go with this option for now for various reasons, and I am wondering if there is any way of getting access by means of an API key / secret, like many other 3rd-party services allow nowadays. Simply put, I would like to generate a key pair somewhere on GSuite (no idea where) and use those keys for auth instead of OAuth, something Google suggests is possible, both in the GSuite Admin app (with a broken link that leads nowhere - not surprising) and on the GCloud API and Credentials subpage where you set up credentials (however, there it says that API keys can only be used for very limited resources, none of them having anything to do with GSuite).
I think your best option is to see if what you want to do can be done by a service account. You can create a service account, grant administrator privileges to it in GSuite, enable some APIs, and then that account can do a lot of things without using OAuth directly. The credentials for the service account can then be provided to your application as a JSON key file, which it can use to authenticate to GSuite. You can also grant service accounts permissions to specific objects like files in Drive, but it doesn't sound like that would be sufficient for your needs.
A guide that may be helpful in the details of how to do this is https://m.fin.com/2017/10/04/navigating-the-google-suite-directory-api/
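If the service account route works for you, the flow looks roughly like the Python sketch below (shown in Python rather than node only for brevity). The key file path, delegated admin address, and scope are placeholders, and the service account must have domain-wide delegation enabled in the GSuite admin console.
```python
from google.oauth2 import service_account
from googleapiclient.discovery import build

SCOPES = ["https://www.googleapis.com/auth/admin.directory.user.readonly"]

# Placeholder key file; the service account needs domain-wide delegation.
creds = service_account.Credentials.from_service_account_file(
    "/secrets/gsuite-sa.json", scopes=SCOPES
).with_subject("admin@example.com")  # impersonate a GSuite admin user

directory = build("admin", "directory_v1", credentials=creds)
users = directory.users().list(customer="my_customer", maxResults=10).execute()
for user in users.get("users", []):
    print(user["primaryEmail"])
```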

Google Cloud Storage: How can I grant an installed application access to only one bucket?

I'm developing an application that manipulates data in Google Cloud Storage
buckets owned by the user. I would like to set it up so the user can arrange to
grant the application access to only one of his or her buckets, for the sake of
compartmentalization of damage if the app somehow runs amok (or it is
impersonated by a bad actor or whatever).
But I'm more than a bit confused by the documentation around GCS authorization.
The docs on OAuth 2.0 authentication show that there are only three
choices for scopes: read-only, read-write, and full-control. Does this
mean that what I want is impossible, and if I grant access to read/write one
bucket I'm granting access to read/write all of my buckets?
What is extra confusing to me is that I don't understand how this all plays in
with GCS's notion of projects. It seems like I have to create a project to get
a client ID for my app, and the N users also have to create N projects for
their buckets. But then it doesn't seem to matter -- the client ID from project
A can access the buckets from project B. What are project IDs actually for?
So my questions, in summary:
Can I have my installed app request an access token that is good for only a
single bucket?
If not, are there any other ways that developers and/or careful users
typically limit access?
If I can't do this, it means the access token has serious security
implications. But I don't want to have to ask the user to go generate a new one
every time they run the app. What is the typical story for caching the token?
What exactly are project IDs for? Are they relevant to authorization in any
way?
I apologize for the scatter-brained question; it reflects what appears to be
scatter-brained documentation to me. (Or at least documentation that isn't
geared toward the installed application use case.)
I had the same problem as you.
Go to: https://console.developers.google.com
Go to Credentials and create a new Client ID.
You have to delete the email* in "permissions" of your project.
And add it manually in the ACL of your bucket.
*= the email of the Service Account: xxxxxxxxxxxx-xxxxxxxxx@developer.gserviceaccount.com
If you are building an app, it's server-to-server OAuth.
https://developers.google.com/accounts/docs/OAuth2ServiceAccount
"Can you be clearer about which project I create the client ID on (the developer's project that owns the installed application, or the user's project that own's the bucket)?"
the user's project that own's the bucket
It's the user taht own the bucket who grant access.
It turns out I'm using the wrong OAuth flow if I want to do this. Thanks to Euca
for the inspiration to figure this out.
At the time I asked the question, I was assuming there were multiple projects
involved in the Google Developers Console:
One project for me, the developer, that contained generated credentials for
an "installed application", with the client ID and (supposed) secret baked into
my source code.
One project for each of my users, owning and being billed for a bucket that
they were using the application to access.
Instead of using "installed application" credentials, what I did was switch to
"service account" credentials, generated by the user in the project that owns
their bucket. That allows them to create and download a JSON key file that they
can feed to my application, which then uses the JSON Web Tokens flow of OAuth
2.0 (aka "two-legged OAuth") to obtain authorization. The benefits of this are:
There is no longer a need for me to have my own project, which was a weird
wart in the process.
By default, the service account credentials allow my application to access
only the buckets owned by the project for which they were generated. If the
user has other projects with other buckets, the app cannot access them.
But, the service account has an "email address" just like any other user, and
can be added to the ACLs for any bucket regardless of project, granting
access to that bucket.
About your answer.
Glad you solved your problem.
You can also reduce the access to only ONE bucket of the project. For example, if you have several buckets and the application does not need access to all of them.
By default, the service account has FULL access (read, write, and ACL) to all buckets. I usually limit it to the needed bucket.
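A short sketch of what that looks like with the google-cloud-storage Python client; the bucket name and service account email are placeholders, and the same thing can be done with gsutil acl ch or from the console.
```python
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("the-one-bucket")  # placeholder: the only bucket the app needs

# Add the service account to this bucket's ACL; it gets no access to other buckets.
acl = bucket.acl
acl.user("my-app@my-project.iam.gserviceaccount.com").grant_read()
acl.save()
```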