Cloud storage services and session-based file-URLs - amazon-s3

I have the following use-case that I am seeking a solution for:
Our website shares files with our clients. The files are stored on a third-party cloud service, while the file access permissions are managed on our website. When a client on our site requests a file that they have permission to see, it should be served directly from the cloud service (instead of through our own web server, using our CPU, RAM and bandwidth).
I see that services like Amazon S3 and Google Cloud Storage use signed URLs with a timeout for this purpose, but I would prefer a solution where that URL is only available to the client who requested the resource (and not to everyone who has the link during the lifetime of the URL). The reason for this is that it feels wrong to rely on a duration of arbitrary length instead of using a one-time token or otherwise validating access to the resource before the request is completed.
Do any of the major services provide a feature that would allow for this? Or is it considered "safe enough" to protect sensitive data behind a random URL plus a timeout period (to me it feels like the answer to the latter is "no")?
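For context, the signed-URL mechanism I am referring to looks roughly like this with boto3 (a minimal sketch; the bucket, key and expiry are placeholders):

# Minimal sketch of the signed-URL-with-timeout approach (placeholders throughout):
# anyone holding this URL can fetch the object until ExpiresIn elapses.
import boto3

s3 = boto3.client("s3")
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "client-files", "Key": "reports/2023-q1.pdf"},
    ExpiresIn=60,  # seconds -- the arbitrary duration I would like to avoid relying on
)
print(url)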

Related

Using object storage with authorization for a web app

I'm contributing to developing a web (front+back) application, which uses OpenID Connect (with auth0) for authentication & authorization.
The web app needs authentication to access some public & some restricted information (restrictions are per-user or depend on certain group-related rules).
We want to provide upload/download features for documents such as .pdf, and we have implemented minIO (pretty similar to AWS S3) for public documents.
However, we can't wrap our heads around restricted-access files:
should we implement OIDC on minIO so that users access the buckets directly with temporary access tokens, allowing for a fine-grained authorization policy,
or should the back office be the only one to hold keys to minIO and act as the intermediary between the object storage and the users?
Looking for good practices here, thanks in advance for your help.
Interesting question, since PDF docs are web static content unless they contain sensitive data. I would aim to separate secured (API) and non-secured (web) concerns on this one.
UNSECURED RESOURCES
If there is no security involved, connecting to a bucket from the front end makes sense. The bucket contents can also be distributed to a content delivery network, for best global performance. The PDF can be considered a web resource.
SECURED RESOURCES
Requests for these need to be treated as API requests if a PDF doc contains sensitive data. APIs should receive an access token and enforce access to documents via scopes and claims.
You might use a Documents API for this. The implementation might still connect to a bucket, but this might be a different bucket that the browser does not have access to.
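As a rough illustration of that Documents API idea (Flask and boto3 against an S3-compatible store are used here only for the sketch; the decode_access_token() helper, endpoint and bucket name are assumptions):

# Sketch of a Documents API endpoint: the browser never talks to the secure bucket,
# only to this API, which checks the access token's scopes/claims first.
import boto3
from flask import Flask, Response, abort, request

app = Flask(__name__)
s3 = boto3.client("s3", endpoint_url="https://minio.example.internal")  # private store (assumed)
SECURE_BUCKET = "secure-docs"  # a bucket the front end has no direct access to

@app.route("/secureDocs/<doc_id>")
def get_secure_doc(doc_id):
    claims = decode_access_token(request.headers.get("Authorization", ""))  # hypothetical helper
    if "documents:read" not in claims.get("scope", "").split():
        abort(403)
    # Per-user restriction: objects are keyed by the token's subject.
    obj = s3.get_object(Bucket=SECURE_BUCKET, Key=f"{claims['sub']}/{doc_id}.pdf")
    return Response(obj["Body"].read(), mimetype="application/pdf")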
SUMMARY
This type of solution is often clearer if you think in terms of URL design, e.g. the front end might have two document URLs:
publicDocs
secureDocs
By default I would treat docs that users upload as secure, unless they select an upload option such as make public.

How to restrict public user access to s3 buckets or minIO?

I have got a question about minio or S3 policies. I am using a stand-alone minio server for my project. Here is the situation:
There is only one admin account that receives files and uploads them to the minio server.
My users need to access just their own uploaded objects. I mean, a user is not supposed to see other people's objects publicly (e.g. by visiting a direct link in the URL).
Admin users are allowed to see all objects under any circumstances.
1. How can I implement such policies for my project, considering I have my own database for user authentication, and how can I combine the two to authenticate the user?
2. If not, what other options do I have to ease the process?
Communicate with your storage through the application. Do the policy checks, authentication and authorization in the app, store and fetch files to and from storage there, and build the proper response. I guess this is the only way you can put limitations on uploading/downloading files using Minio.
If you're using a framework like Laravel, its built-in S3 driver works perfectly with Minio; otherwise it's just a matter of an HTTP call, since Minio provides HTTP APIs.
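A rough sketch of that flow with the MinIO Python SDK, assuming the app checks ownership against its own database before handing out a short-lived presigned URL (the endpoint, bucket and user_owns_object() helper are illustrative):

# App-mediated download: authenticate the user, check ownership in your own database,
# then return a presigned URL that is valid only briefly and only for that one object.
from datetime import timedelta
from minio import Minio

client = Minio("minio.example.com", access_key="APP_KEY", secret_key="APP_SECRET", secure=True)

def download_url_for(user_id: str, object_name: str) -> str:
    if not user_owns_object(user_id, object_name):  # hypothetical check against your DB
        raise PermissionError("not this user's object")
    return client.presigned_get_object(
        "user-uploads", object_name, expires=timedelta(minutes=10)
    )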

How to get short lived access to specific Google Cloud Storage bucket from client mobile app?

I have a mobile app which authenticates users on my server. I'd like to store images of authenticated users in a Google Cloud Storage bucket, but I'd like to avoid uploading images to the Google bucket via my server; they should be uploaded (or downloaded) directly from the bucket.
(I also don't want to show users another Google login to grant access to their bucket.)
So my best-case scenario would be that when a user authenticates to my server, my server also generates a short-lived access token for a specific Google Storage bucket with read and write access.
I know that service accounts can generate access tokens, but I couldn't find any documentation on whether it is good practice to pass these access tokens from the server to the client app, or whether it is possible to limit the scope of the access token to a specific bucket.
I found the authorization documentation quite confusing, so I am asking here: what would be the best-practice approach to provide access to Cloud Storage in my case?
I think you are looking for signed urls.
A signed URL is a URL that provides limited permission and time to make a request. Signed URLs contain authentication information in their query string, allowing users without credentials to perform specific actions on a resource.
You can see more about them in the GCP documentation, along with an explanation of how you can adapt them for your program.
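As a minimal sketch of what the server side could look like for direct uploads (the bucket name, object path and 15-minute expiry are assumptions, and the server's service account must be allowed to sign URLs and write to the bucket):

# Server-side generation of a V4 signed URL that lets the mobile app PUT one object
# straight into the bucket for a limited time; no Google login is shown to the user.
from datetime import timedelta
from google.cloud import storage

def make_upload_url(user_id: str, image_name: str) -> str:
    client = storage.Client()
    blob = client.bucket("user-images-bucket").blob(f"{user_id}/{image_name}")
    return blob.generate_signed_url(
        version="v4",
        expiration=timedelta(minutes=15),
        method="PUT",
        content_type="image/jpeg",  # the client must send the same Content-Type header
    )

The app then performs a plain HTTP PUT against the returned URL; a matching GET signed URL can be generated the same way for downloads.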

Long lived key/token based way to download google storage bucket objects with curl?

O.k. my fellow devops and coders. I have spent the last week trying to figure this out with Google Cloud Platform (GCP) Cloud Storage objects. Here is my objective.
The solution needs to be lightweight as it will be used to download images inside a docker image, hence the curl requirement.
The GCP bucket and object needs to be secure and not public.
I need a "long" lived ticket/key/client_ID.
I have tried the OAuth 2.0 setup that Google's documentation mentions, but every time I want to set up an OAuth 2.0 key I do not get the option for "offline" access. And to top it off, it requires you to put in the source URLs that will be accessing the auth request.
Also, Google Cloud Storage does not support the key= parameter like some of their other services. So here I have an API key for my project as well as an OAuth JSON file for my service user, and they are useless.
I can get a curl command to work with the temporary OAuth bearer key, but I need a long-term solution for this.
RUN curl -X GET \
-H "Authorization: Bearer ya29.GlsoB-ck37IIrXkvYVZLIr3u_oGB8e60UyUgiP74l4UZ4UkT2aki2TI1ZtROKs6GKB6ZMeYSZWRTjoHQSMA1R0Q9wW9ZSP003MsAnFSVx5FkRd9-XhCu4MIWYTHX" \
-o "/home/shmac/test.tar.gz" \
"https://www.googleapis.com/storage/v1/b/mybucket/o/my.tar.gz?alt=media"
What I need: a long-term key/ID/secret that will allow me to download a GCP bucket object from any location.
The solution needs to be lightweight as it will be used to download images inside a docker image, hence the curl requirement.
This is a vague requirement. What is lightweight? No external libraries, everything written in assembly language, must fit in 1 KB, etc.
The GCP bucket and object needs to be secure and not public.
This is a normal requirement. With some exceptions (static file storage for websites, etc.) you want your buckets to be private.
I need a "long" lived ticket/key/client_ID.
My advice is to stop thinking in terms of long-term keys. The trend in security is to implement short-term keys. In Google Cloud Storage, seven days is considered long-term; 3600 seconds (one hour) is the norm almost everywhere in Google Cloud.
For Google Cloud Storage you have several options. You did not specify the environment, so I will cover user-credential, service-account, and presigned-URL based access.
User Credentials
You can authenticate with User Credentials (e.g. username@gmail.com) and save the Refresh Token. Then, when an Access Token is required, you can generate one from the Refresh Token. In my website article about learning the Go language, I wrote a program on Day #8 which implements Google OAuth, saves the necessary credentials, and creates Access Tokens and ID Tokens as required with no further "login" required. The comments in the source code should help you understand how this is done. https://www.jhanley.com/google-cloud-and-go-my-journey-to-learn-a-new-language-in-30-days/#day_08
This is the choice if you need to use User Credentials. This technique is more complicated and requires protecting the secrets file, but it will give you refreshable long-term tokens.
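A small sketch of that Refresh Token flow in Python with the google-auth library (the client ID, client secret and stored refresh token are placeholders you would load from your protected secrets store):

# Mint a short-lived Access Token from a saved Refresh Token; no interactive login.
from google.auth.transport.requests import Request
from google.oauth2.credentials import Credentials

creds = Credentials(
    token=None,
    refresh_token="YOUR_SAVED_REFRESH_TOKEN",
    token_uri="https://oauth2.googleapis.com/token",
    client_id="YOUR_CLIENT_ID",
    client_secret="YOUR_CLIENT_SECRET",
)
creds.refresh(Request())  # exchanges the refresh token for a fresh access token
print(creds.token)        # usable as "Authorization: Bearer <token>"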
Service Account Credentials
Service Account JSON key files are the standard method for service-to-service authentication and authorization. Using these keys, Access Tokens valid for one hour are generated. When they expire new ones are created. The max time is 3600 seconds.
This is the choice if you are programmatically accessing Cloud Storage with programs under your control (the service account JSON file must be protected).
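For instance, a sketch of minting the one-hour token from a service account key file with google-auth (the key path and read-only scope are assumptions); the resulting token is what goes into the Authorization header of the curl call shown above:

# Generate a ~3600-second Access Token from a service account JSON key.
from google.auth.transport.requests import Request
from google.oauth2 import service_account

creds = service_account.Credentials.from_service_account_file(
    "/secrets/storage-sa.json",
    scopes=["https://www.googleapis.com/auth/devstorage.read_only"],
)
creds.refresh(Request())  # token expires after roughly one hour
print(creds.token)        # e.g. curl -H "Authorization: Bearer <token>" ...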
Presigned-URLs
This is the standard method of providing access to private Google Cloud Storage objects. This method requires the URL and generates a signature with an expiration so that objects can be accessed for a defined period of time. One of your requirements (which is unrealistic) is that you don't want to use source URLs. The max time is seven days.
This is the choice if you need to provide access to third-parties to access your Cloud Storage Objects.
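A sketch of the longest-lived variant: a V4 signed URL capped at seven days, generated once by a small job and then fetched with plain curl (the bucket, object and key-file path are placeholders):

# Generate a seven-day (maximum) signed download URL that works with plain curl.
from datetime import timedelta
from google.cloud import storage

client = storage.Client.from_service_account_json("/secrets/storage-sa.json")
blob = client.bucket("mybucket").blob("my.tar.gz")
url = blob.generate_signed_url(version="v4", expiration=timedelta(days=7), method="GET")
print(url)  # then: curl -o my.tar.gz "<url>"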
IAM Based Access
This method does not use Access Tokens; instead, it uses Identity Tokens. Permissions are assigned to Cloud Storage buckets and objects, not to the IAM member account. This method requires a solid understanding of how identities work in Google Cloud Storage and is the future direction for Google security, meaning that for many services access will be controlled on a per-service/per-object basis and not via roles that grant wide access to an entire service in a project. I talk about this in my article on Identity Based Access Control.
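To give a feel for bucket-level permissions with the Cloud Storage Python client, here is a small sketch that grants a single member object-read access on one bucket only (the bucket name and member are placeholders):

# Attach a member-specific role binding to one bucket rather than project-wide.
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("secure-docs")

policy = bucket.get_iam_policy(requested_policy_version=3)
policy.bindings.append({
    "role": "roles/storage.objectViewer",
    "members": {"serviceAccount:reader@my-project.iam.gserviceaccount.com"},
})
bucket.set_iam_policy(policy)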
Summary
You have not clearly defined what will be accessing Cloud Storage, how secrets are stored, if the secrets need to be protected from users (public URL access), etc. The choice depends on a number of factors.
If you read the latest articles on my website I discuss a number of advanced techniques on Identity Based Access Control. These features are starting to appear on a number of Google Services in the beta level commands. This includes Cloud Scheduler, Cloud Pub/Sub, Cloud Functions, Cloud Run, Cloud KMS and soon more. Cloud Storage supports Identity Based Access which requires no permissions at all - the identity is used to control access.

How to protect secrets properly?

I am using the HERE API in both the frontend and backend. If I put my app_id and app_code into the frontend code, they will be available to anyone viewing my site.
I can try to create a domain whitelist and put my domain in it. But still, if I set the HTTP "Referer" header to my domain, I am able to access the API from any IP.
So, what do I do?
The Difference Between WHO and WHAT is Accessing the API Server
Before I dive into your problem I would like to first clear a misconception about WHO and WHAT is accessing an API server.
To better understand the differences between WHO and WHAT is accessing an API server, let's use this picture:
So replace the mobile app with a web app, and keep following my analogy around this picture.
The Intended Communication Channel represents the web app being used as you expected: by a legit user without any malicious intentions, communicating with the API server from the browser, not using Postman or any other tool to perform a man-in-the-middle (MitM) attack.
The actual channel may represent several different scenarios, like a legit user with malicious intentions using curl or a tool like Postman to perform the requests, or a hacker using a MitM attack tool, like mitmproxy, to understand how the communication between the web app and the API server is done in order to replay the requests or even automate attacks against the API server. Many other scenarios are possible, but we will not enumerate each one here.
I hope that by now you may already have a clue why the WHO and the WHAT are not the same, but if not it will become clear in a moment.
The WHO is the user of the web app that we can authenticate, authorize and identify in several ways, like using OpenID Connect or OAUTH2 flows.
OAUTH
Generally, OAuth provides to clients a "secure delegated access" to server resources on behalf of a resource owner. It specifies a process for resource owners to authorize third-party access to their server resources without sharing their credentials. Designed specifically to work with Hypertext Transfer Protocol (HTTP), OAuth essentially allows access tokens to be issued to third-party clients by an authorization server, with the approval of the resource owner. The third party then uses the access token to access the protected resources hosted by the resource server.
OpenID Connect
OpenID Connect 1.0 is a simple identity layer on top of the OAuth 2.0 protocol. It allows Clients to verify the identity of the End-User based on the authentication performed by an Authorization Server, as well as to obtain basic profile information about the End-User in an interoperable and REST-like manner.
While user authentication may let the API server know WHO is using the API, it cannot guarantee that the requests have originated from WHAT you expect: the browser where your web app should be running, with a real user.
Now we need a way to identify WHAT is calling the API server, and here things become trickier than most developers may think. The WHAT is the thing making the request to the API server. Is it really a genuine instance of the web app, or is it a bot, an automated script, or an attacker manually poking around the API server with a tool like Postman?
To your surprise, you may end up discovering that it is one of your legit users manually manipulating the requests, or an automated script trying to game and take advantage of the service provided by the web app.
Well, to identify the WHAT, developers tend to resort to an API key that is usually sent in the headers by the web app. Some developers go the extra mile and compute the key at run-time in the web app, inside obfuscated JavaScript, so that it becomes a runtime secret; but it can be reverse engineered with deobfuscation tools and by inspecting the traffic between the web app and the API server with the browser's F12 dev tools or MitM tools.
The above write-up was extracted from an article I wrote, entitled WHY DOES YOUR MOBILE APP NEED AN API KEY?. While it is in the context of a mobile app, the overall idea is still valid in the context of a web app. You can read the article in full here; it is the first article in a series of articles about API keys.
Your Problem
I can try to create a domain whitelist and put my domain in it. But still, if I set the HTTP "Referer" header to my domain, I am able to access the API from any IP.
So this seems to be related to using the HERE admin interface, and I cannot help you there...
So, what do I do?
I am using the HERE API in both the frontend and backend.
The frontend MUST always delegate access to third-party APIs to a backend that is under the control of the owner of the frontend; this way you don't expose the credentials for these third-party services in your frontend.
So the difference is that it is now under your direct control how you protect against abuse of HERE API access, because you are no longer exposing the HERE app_id and app_code to the public. Access to them must go through your backend, where your access secrets are hidden from public prying eyes, and where you can easily monitor and throttle usage before your bill skyrockets with the HERE API.
If I put my app_id and app_code into the frontend code, they will be available to anyone viewing my site.
So to recap, the only credentials you SHOULD expose in your frontend are the ones to access your backend, the usual api-key and Authorization tokens, or whatever you want to name them, not the app_id or app_code used to access the HERE API. This approach leaves you with only one access point to protect, instead of multiple ones.
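A rough sketch of such a backend proxy (Flask is used here for illustration; the HERE geocoding endpoint and query parameters shown are illustrative, and the point is simply that app_id/app_code stay in server-side environment variables):

# Backend proxy: the browser calls /api/geocode on YOUR server; the HERE credentials
# never leave the server environment.
import os
import requests
from flask import Flask, jsonify, request

app = Flask(__name__)
HERE_APP_ID = os.environ["HERE_APP_ID"]      # kept server-side only
HERE_APP_CODE = os.environ["HERE_APP_CODE"]

@app.route("/api/geocode")
def geocode():
    # Authenticate/authorize YOUR user here (session, api-key, token), then forward.
    resp = requests.get(
        "https://geocoder.api.here.com/6.2/geocode.json",  # illustrative HERE endpoint
        params={
            "searchtext": request.args.get("q", ""),
            "app_id": HERE_APP_ID,
            "app_code": HERE_APP_CODE,
        },
        timeout=10,
    )
    return jsonify(resp.json()), resp.status_code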
Defending an API Server
As I already said, but want to reinforce: a web app should only communicate with an API server that is under your control, and any access to third-party API services must be done by this same API server you control. This way you limit the attack surface to only one place, where you will employ as many layers of defence as what you are protecting is worth.
For an API serving a web app, you can employ several layers of defence, starting with reCAPTCHA V3, followed by a Web Application Firewall (WAF) and finally, if you can afford it, a User Behavior Analytics (UBA) solution.
Google reCAPTCHA V3:
reCAPTCHA is a free service that protects your website from spam and abuse. reCAPTCHA uses an advanced risk analysis engine and adaptive challenges to keep automated software from engaging in abusive activities on your site. It does this while letting your valid users pass through with ease.
...helps you detect abusive traffic on your website without any user friction. It returns a score based on the interactions with your website and provides you more flexibility to take appropriate actions.
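As a sketch of how the API server would consume reCAPTCHA V3 (the secret key and the 0.5 score threshold are placeholders; the web app sends along the token it obtained from the reCAPTCHA JavaScript):

# Server-side verification of a reCAPTCHA V3 token before processing an API request.
import requests

def verify_recaptcha(token: str, secret_key: str) -> bool:
    result = requests.post(
        "https://www.google.com/recaptcha/api/siteverify",
        data={"secret": secret_key, "response": token},
        timeout=5,
    ).json()
    # V3 returns a score between 0.0 (likely a bot) and 1.0 (likely a human).
    return result.get("success", False) and result.get("score", 0.0) >= 0.5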
WAF - Web Application Firewall:
A web application firewall (or WAF) filters, monitors, and blocks HTTP traffic to and from a web application. A WAF is differentiated from a regular firewall in that a WAF is able to filter the content of specific web applications while regular firewalls serve as a safety gate between servers. By inspecting HTTP traffic, it can prevent attacks stemming from web application security flaws, such as SQL injection, cross-site scripting (XSS), file inclusion, and security misconfigurations.
UBA - User Behavior Analytics:
User behavior analytics (UBA) as defined by Gartner is a cybersecurity process about the detection of insider threats, targeted attacks, and financial fraud. UBA solutions look at patterns of human behavior, and then apply algorithms and statistical analysis to detect meaningful anomalies from those patterns—anomalies that indicate potential threats. Instead of tracking devices or security events, UBA tracks a system's users. Big data platforms like Apache Hadoop are increasing UBA functionality by allowing them to analyze petabytes worth of data to detect insider threats and advanced persistent threats.
All these solutions work based on a negative identification model; in other words, they try their best to differentiate the bad from the good by identifying what is bad, not what is good, thus they are prone to false positives, despite the advanced technology used by some of them, like machine learning and artificial intelligence.
So you may find yourself, more often than not, having to relax how you block access to the API server in order not to affect the good users. This also means that these solutions require constant monitoring to validate that the false positives are not blocking your legit users and that, at the same time, they are properly keeping the unauthorized ones at bay.
Summary
Anything that runs on the client side and needs some secret to access an API can be abused in different ways, and you must delegate access to all third-party APIs to a backend under your control, so that you reduce the attack surface and at the same time protect their secrets from public prying eyes.
In the end, the solution to use in order to protect your API server must be chosen in accordance with the value of what you are trying to protect and the legal requirements for that type of data, like the GDPR regulations in Europe.
So using API keys may sound like locking the door of your home and leaving the key under the mat, but not using them is like leaving your car parked with the doors closed but the key in the ignition.
Going the Extra Mile
OWASP Web Top 10 Risks
The OWASP Top 10 is a powerful awareness document for web application security. It represents a broad consensus about the most critical security risks to web applications. Project members include a variety of security experts from around the world who have shared their expertise to produce this list.