I am attempting to use an S3 bucket as a deployment location for an internal, auto-updating application's files. It would be the location where the new version's files are dumped for the application to puck up on an update. Since this is an internal application, I was hoping to have the URL be private, but to be able to access it using only a URL. I was hoping to look into using third party auto updating software, which means I can't use the Amazon API to access it.
Does anyone know a way to get a URL to a private bucket on S3?
You probably want to use one of the available AWS Software Development Kits (SDKs), which all implement the respective methods to generate these URLs by means of the GetPreSignedURL() method (e.g. Java: generatePresignedUrl(), C#: GetPreSignedURL()):
The GetPreSignedURL operations creates a signed http request. Query
string authentication is useful for giving HTTP or browser access to
resources that would normally require authentication. When using query
string authentication, you create a query, specify an expiration time
for the query, sign it with your signature, place the data in an HTTP
request, and distribute the request to a user or embed the request in
a web page. A PreSigned URL can be generated for GET, PUT and HEAD
operations on your bucket, keys, and versions.
There are a couple of related questions already and e.g. Why is my S3 pre-signed request invalid when I set a response header override that contains a “+”? contains a working sample in C# (aside from the content type issue Ragesh is experiencing of course).
Good luck!
Related
Our application data storage is backed by Google Cloud Storage (and S3 and Azure Blob Storage). We need to give access to this storage to random outside tools (upload from local disk using CLI tools, unload from analytical database like Redshift, Snowflake and others). The specific use case is that users need to upload multiple big files (you can think about it much like m3u8 playlists for streaming videos - it's m3u8 playlist and thousands of small video files). The tools and users MAY not be affiliated with Google in any way (may not have Google account). We also absolutely need to data transfer to be directly to the storage, outside of our servers.
In S3 we use federation tokens to give access to a part of the S3 bucket.
So model scenario on AWS S3:
customer requests some data upload via our API
we give customers S3 credentials, that are scoped to s3://customer/project/uploadId, allowing upload of new files
client uses any tool to upload the data
client uploads s3://customer/project/uploadId/file.manifest, s3://customer/project/uploadId/file.00001, s3://customer/project/uploadId/file.00002, ...
other data (be it other uploadId or project) in the bucket is safe because the given credentials are scoped
In ABS we use STS token for the same purpose.
GCS does not seem to have anything similar, except for Signed URLs. Signed URLs have a problem though that they refer to a single file. That would either require us to know in advance how many files will be uploaded (we don't know) or the client would need to request each file's signed URL separately (strain on our API and also it's slow).
ACL seemed to be a solution, but it's only tied to Google-related identities. And those can't be created on demand and fast. Service users are also and option, but their creation is slow and generally they are discouraged for this use case IIUC.
Is there a way to create a short lived credentials that are limited to a subset of the CGS bucket?
Ideal scenario would be that the service account we use in the app would be able to generate a short lived token that would only have access to a subset of the bucket. But nothing such seems to exist.
Unfortunately, no. For retrieving objects, signed URLs need to be for exact objects. You'd need to generate one per object.
Using the * wildcard will specify the subdirectory you are targeting and will identify all objects under it. For example, if you are trying to access objects in Folder1 in your bucket, you would use gs://Bucket/Folder1/* but the following command gsutil signurl -d 120s key.json gs://bucketname/folderName/** will create a SignedURL for each of the files inside your bucket but not a single URL for the entire folder/subdirectory
Reason : Since subdirectories are just an illusion of folders in a bucket and are actually object names that contain a ‘/’, every file in a subdirectory gets its own signed URL. There is no way to create a single signed URL for a specific subdirectory and allow its files to be temporarily available.
There is an ongoing feature request for this https://issuetracker.google.com/112042863. Please raise your concern here and look for further updates.
For now, one way to accomplish this would be to write a small App Engine app that they attempt to download from instead of directly from GCS which would check authentication according to whatever mechanism you're using and then, if they pass, generate a signed URL for that resource and redirect the user.
Reference : https://stackoverflow.com/a/40428142/15803365
Originally I set up an S3 bucket "bucket.mydomain.com" and used a CNAME in my DNS so I could pull files from there as if it was a subdomain. This worked for http with:
bucket.mydomain.com/image.jpg
or with https like:
s3.amazonaws.com/bucket.mydomain.com/image.jpg
Some files in this bucket were public access but some were "authenticated read" so that I would have to generate a signed URL with expiration in order for them to be read/downloaded.
I wanted to be able to use https without the amazon name in the URL, so I setup a CloudFront distribution with the S3 bucket as the origin. Now I can use https like:
bucket.mydomain.com/image.jpg
The problem I have now is that it seems either all my files in the bucket have to be public read, or they all have to be authenticated read.
How can I force signed URLs to be used for some files, but have other files be public read?
it seems either all my files in the bucket have to be public read, or they all have to be authenticated read
That is -- sort of -- correct, at least in a simple configuration.
CloudFront has a feature called an Origin Access Identity (OAI) that allows it to authenticate requests that it sends to your bucket.
CloudFront also supports controlling viewer access to your resources using CloudFront signed URLs (and signed cookies).
But these two features are independent of each other.
If an OAI is configured, it always sends authentication information to the bucket, regardless of whether the object is private or public.
Similarly, if you enable Restrict Viewer Access for a cache behavior, CloudFront will always require viewer requests to be signed, regardless of whether the object is private or public (in the bucket), because CloudFront doesn't know.
There are a couple of options.
If your content is separated logically by path, the solution is simple: create multiple Cache Behaviors, with Path Patterns to match, like /public/* or /private/* and configure them with individual, appropriate Restrict Viewer Access settings. Whether the object is public in the bucket doesn't matter, since CloudFront will pass-through requests for (e.g.) /public/* without requiring a signed URL if that Cache Behavior does not "Restrict Viewer Access." You can create 25 unique Cache Behavior Path Patterns by default.
If that is not a solution, you could create two CloudFront distributions. One would be without an OAI and without Restrict Viewer Acccess enabled. This distribution can only fetch public objects. The second distribution would have an OAI and would require signed URLs. You would use this for private objects (it would work for public objects, too -- but they would still need signed URLs). There would be no price difference here, but you might have cross-origin issues to contend with.
Or, you could modify your application to sign all URLs for otherwise public content when HTML is being rendered (or API responses, or whatever the context is for your links).
Or, depending on the architecture of your platform, there are probably other more complex approaches that might make sense, depending on the mix of public and private and your willingness to add some intelligence at the edge with Lambda#Edge triggers, which can do things like inspect/modify requests in flight, consult external logic and data sources (e.g. look up a session cookie in DynamoDB), intercept errors, and generate redirects.
Michael's description is good. Amazon has also stated (link below) "Signature Version 2 is being deprecated, and the final support for Signature Version 2 will end on June 24, 2019."
https://docs.aws.amazon.com/AmazonS3/latest/dev/auth-request-sig-v2.html
The Amazon S3 integration docs for Fine Uploader instruct users to create an AJAX handler to sign an S3 upload policy generated by the client after performing server-side verification.
In my application, it would make more sense to construct the policy on the server, sign it, and return the entire package to the client to present to S3 for the upload. Is there any way to configure Fine Uploader to pull a server-generated policy instead of asking the server to validate and sign a client-generated one?
To answer your initial question, it is possible to override some elements of the generated policy, but there are some items, such as the key, that you cannot change via the policy document. This is discussed more in Github issue #1120.
If you want to override portions of the policy document, you'll have to disable chunking (since policy documents aren't part of chunked uploads, as described in the comments). Your best bet is to simply validate the policy/header strings. It's pretty easy to do this, and what elements you validate depending entirely on your application requirements.
I'm building a web application and am looking into using Amazon S3 to store user uploads.
My concern is, I dont want user A to see his download link for a document he uploaded is urltoMyS3/doc1234.pdf and try urltoMyS3/doc1235.pdf and get another users document.
The only way I can think of to do this, is to only allow the web application to connect to S3, then check if the user has access to a file on the web application, have the web app download the file, and then serve it to the client. The problem with this method is the application would have to download the file first and would inevitably slow the download process down for the user.
How is user files typically handled with Amazon S3? Or is it simply not typically used in a scenario where the files should not be public? Is there another service for something like this?
Thanks
You can implement Query String Authentication, which will solve your problem.
Query string authentication is useful for giving HTTP or browser
access to resources that would normally require authentication. The
signature in the query string secures the request. Query string
authentication requests require an expiration date. You can specify
any future expiration time in epoch or UNIX time (number of seconds
since January 1, 1970).
You can do this by generating the appropriate links, see the following
https://docs.aws.amazon.com/AmazonS3/latest/dev/RESTAuthentication.html#RESTAuthenticationQueryStringAuth
If time-bound authentication will not work for (as suggested in other answers). You could consider implementing something like s3fs to mount your S3 bucket as a drive on your web application server. In this manner you can simply make your authentication and then serve up the file directly to the user, without them having any idea that the file resides in S3. Similarly, you can simply write uploaded files directly to this s3fs mount.
S3fs, also allows you to configure a local cache of the S3 directory on your machine for faster access.
This works nicely in a cluster web server environment as well, as you can just have each server mount the s3fs drive and perform/read/writes on it independently.
A link with more info
I'm thinking about whether to host uploaded media files (video and audio) on S3 instead of locally. I need to check user's permissions on each download.
So there would be an action like get_file, which first checks the user's permissions and then gets the file from S3 and sends it using send_file to the user.
def get_file
if #user.can_download(params[:file_id])
# first, download the file from S3 and then send it to the user using send_file
end
end
But in this case, the server (unnecessarily) downloads the file first from S3 and then sends it to the user. I thought the use case for S3 was to bypass the Rails/HTTP server stack for reduced load.
Am I thinking this wrong?
PS. I'm using CarrierWave for file uploads. Not sure if that's relevant.
Amazon S3 provides something called RESTful authenticated reads, which are basically timeoutable URLs to otherwise protected content.
CarrierWave provides support for this. Simply declare S3 access policy to authenticated read:
config.s3_access_policy = :authenticated_read
and then model.file.url will automatically generate the RESTful URL.
Typically you'd embed the S3 URL in your page, so that the client's browser fetches the file directly from Amazon. Note however that this exposes the raw unprotected URL. You could name the file with a long hash instead of something predictable, so it's at least not guessable -- but once that URL is exposed, it's essentially open to the Internet. So if you absolutely always need access control on the files, then you'll need to proxy it like you're currently doing. In that case, you may decide it's just better to store the file locally.