Amazon S3 ACL for read-only and write-once access

I'm developing a web application and I currently have the following ACL assigned to the AWS account it uses to access its data:
{
  "Statement": [
    {
      "Sid": "xxxxxxxxx", // don't know if this is supposed to be confidential
      "Action": [
        "s3:*"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::cdn.crayze.com/*"
      ]
    }
  ]
}
However, I'd like to make this a bit more restrictive so that if our AWS credentials were ever compromised, an attacker could not destroy any data.
From the documentation, it looks like I want to allow just the following actions: s3:GetObject and s3:PutObject, but I specifically want the account to only be able to create objects that don't exist already - i.e. a PUT request on an existing object should be denied. Is this possible?

This is not possible in Amazon S3 in the way you probably envisioned it; however, you can work around this limitation by Using Versioning, which is a means of keeping multiple variants of an object in the same bucket and was developed with use cases like this in mind:
You might enable versioning to prevent objects from being deleted or overwritten by mistake, or to archive objects so that you can retrieve previous versions of them.
There are a couple of related FAQs as well, for example:
What is Versioning? - Versioning allows you to preserve, retrieve, and restore every version of every object stored in an Amazon S3 bucket. Once you enable Versioning for a bucket, Amazon S3 preserves existing objects anytime you perform a PUT, POST, COPY, or DELETE operation on them. By default, GET requests will retrieve the most recently written version. Older versions of an overwritten or deleted object can be retrieved by specifying a version in the request.
Why should I use Versioning? - Amazon S3 provides customers with a highly durable storage infrastructure. Versioning offers an additional level of protection by providing a means of recovery when customers accidentally overwrite or delete objects. This allows you to easily recover from unintended user actions and application failures. You can also use Versioning for data retention and archiving. [emphasis mine]
How does Versioning protect me from accidental deletion of my objects? - When a user performs a DELETE operation on an object, subsequent default requests will no longer retrieve the object. However, all versions of that object will continue to be preserved in your Amazon S3 bucket and can be retrieved or restored. Only the owner of an Amazon S3 bucket can permanently delete a version. [emphasis mine]
If you are really paranoid about the AWS credentials of the bucket owner (who can of course be different from the accessing users), you can take this even one step further; see How can I ensure maximum protection of my preserved versions?:
Versioning’s MFA Delete capability, which uses multi-factor authentication, can be used to provide an additional layer of security. [...] If you enable Versioning with MFA Delete on your Amazon S3 bucket, two forms of authentication are required to permanently delete a version of an object: your AWS account credentials and a valid six-digit code and serial number from an authentication device in your physical possession. [...]
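Enabling versioning itself is a one-time bucket setting. A minimal sketch with boto3 (the bucket name is the one from the question; MFA Delete additionally requires the root account's MFA device, so it is only hinted at in a comment):

import boto3

s3 = boto3.client("s3")

# Turn on versioning so overwrites and deletes preserve prior versions
# instead of destroying data.
s3.put_bucket_versioning(
    Bucket="cdn.crayze.com",
    VersioningConfiguration={"Status": "Enabled"},
    # MFA Delete would additionally set "MFADelete": "Enabled" here and pass
    # MFA="<device-serial> <code>", authenticated as the root account.
)

# Verify the new state.
print(s3.get_bucket_versioning(Bucket="cdn.crayze.com"))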

If it is accidental overwrites you are trying to avoid, and your business requirements allow a short time window of inconsistency, you can do the rollback in a Lambda function:
Make it a policy that there are no new objects with the same name. Most of the time this will not happen. To enforce it (see the sketch below):
Listen for s3:ObjectCreated:Put events in an AWS Lambda function.
When the event fires, check whether more than one version of the object is present.
If there is more than one version present, delete all but the oldest (original) one, rolling the overwrite back.
Notify the uploader of what happened (it's useful to keep the original uploader in the object's x-amz-meta-* metadata. More info here).
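A rough sketch of such a function with boto3, assuming versioning is enabled on the bucket (pagination and the notification step are left out for brevity):

import boto3
from urllib.parse import unquote_plus

s3 = boto3.client("s3")

def handler(event, context):
    """Roll back overwrites: keep only the oldest version of each uploaded key."""
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])

        # List the versions of this exact key.
        resp = s3.list_object_versions(Bucket=bucket, Prefix=key)
        versions = [v for v in resp.get("Versions", []) if v["Key"] == key]
        if len(versions) <= 1:
            continue  # nothing was overwritten

        # Keep the oldest (original) version and delete every newer one.
        versions.sort(key=lambda v: v["LastModified"])
        for newer in versions[1:]:
            s3.delete_object(Bucket=bucket, Key=key, VersionId=newer["VersionId"])
        # Notifying the uploader (e.g. via SNS or SES) would go here.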

You can now lock versions of objects with S3 Object Lock. It's a per-bucket setting, and allows you to place one of two kinds of WORM locks.
"retention period" - can't be changed
"legal hold" - can be changed by the bucket owner at any time
https://docs.aws.amazon.com/AmazonS3/latest/dev/object-lock.html
As mentioned by @Kijana Woodard below, this does not prevent the creation of new versions of objects.

Edit: Applicable if you came here from this question.
Object Locks only work in versioned buckets. If you cannot enable versioning for your bucket, but can tolerate brief inconsistencies where files are presumed to exist while a DELETE is still in flight (S3 is only eventually consistent), possibly resulting in a PUT-after-DELETE failing intermittently when used in a tight loop, or conversely, successive PUTs falsely succeeding intermittently, then the following solution may be appropriate.
Given the object path, read the object's Content-Length from its metadata with a HeadObject request. Write the object only if that request fails because the object does not exist (or, where zero-length placeholders are acceptable, if the reported length is zero); if the object is already there with content, refuse the write.
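A rough sketch of that check with boto3 (bucket, key and body are placeholders; note the HEAD-then-PUT pair is not atomic, which is exactly the window of inconsistency described above):

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")

def put_if_absent(bucket, key, body):
    """Write the object only if it does not already exist (or is empty)."""
    try:
        head = s3.head_object(Bucket=bucket, Key=key)
        if head["ContentLength"] > 0:
            return False  # object already exists with content: refuse to overwrite
    except ClientError as err:
        if err.response["Error"]["Code"] not in ("404", "NoSuchKey"):
            raise  # some unrelated failure: don't risk an overwrite
    s3.put_object(Bucket=bucket, Key=key, Body=body)
    return True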

Equivalent of AWS s3 x-amz-acl header in Azure and Google Cloud

In AWS S3, when uploading an object you can add "x-amz-acl=bucket-owner-full-control" to the URL (as a query parameter) to make the object belong to the bucket owner and not the uploader. How do you achieve the same when using Cloud Storage or Azure Storage?
In Azure Storage, you don't have to do anything special. The ownership of objects (blobs) always lies with the owner of the storage account into which the blob is uploaded. They can delegate permissions to manage the blob to other users, but the ownership always remains with the account owner.
For Google Cloud Storage, the equivalent of uploading an object with the x-amz-acl=bucket-owner-full-control is to upload an object with the x-goog-acl=bucket-owner-full-control header. Switching the amz to goog works for most headers. There's a translation table of S3 to GCS headers.
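For example, with the google-cloud-storage Python client the same intent can be expressed through the predefined_acl parameter instead of a raw header (the bucket and file names below are placeholders):

from google.cloud import storage

client = storage.Client()
bucket = client.bucket("my-bucket")        # placeholder bucket name
blob = bucket.blob("uploads/report.csv")   # placeholder object name

# Equivalent of sending the x-goog-acl=bucket-owner-full-control header.
blob.upload_from_filename("report.csv", predefined_acl="bucketOwnerFullControl")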
In addition, if you're looking to make sure that all objects in a bucket are accessible by only the bucket owner, you may find it more convenient to use Uniform Bucket Level Access. Once enabled, individual object ownership within the bucket no longer exists, and you no longer need to specify that header with each upload.
You can enable Uniform Bucket Level Access from the UI, the API, or via this command: gsutil uniformbucketlevelaccess set on gs://BUCKET_NAME
Firebase Storage is closer to Dropbox or Google Drive in that the owner is technically the bucket. Should you want to track who the owner is, you can however use the metadata:
var newMetadata = {
  customMetadata: {
    'owner': auth().currentUser.uid
  }
};

storageItemReference.updateMetadata(newMetadata)
  .then((metadata) => {
    // Updated metadata for your storage item is returned in the Promise
  }).catch((error) => {
    // Uh-oh, an error occurred!
  });
https://firebase.google.com/docs/storage/web/file-metadata#custom_metadata
If you are finding that users are able to delete storage objects when they shouldn't, you can also control this behavior from Security Rules:
service firebase.storage {
  match /b/{bucket}/o {
    // A read rule can be divided into read and list rules
    match /images/{imageId} {
      // Applies to single document read requests
      allow get: if <condition>;
      // Applies to list and listAll requests (Rules Version 2)
      allow list: if <condition>;
    }
    // A write rule can be divided into create, update, and delete rules
    match /images/{imageId} {
      // Applies to writes to nonexistent files
      allow create: if <condition>;
      // Applies to updates to file metadata
      allow update: if <condition>;
      // Applies to delete operations
      allow delete: if <condition>;
    }
  }
}
Source: https://firebase.google.com/docs/storage/security/core-syntax

Passing AWS role to the application that uses default boto3 configs

I have an AWS setup that requires me to assume a role and get the corresponding credentials in order to write to S3. For example, to write with the AWS CLI, I need to use the --profile readwrite flag. If I write code myself with boto3, I'd assume the role via STS, get credentials, and create a new session.
However, there is a bunch of applications and packages relying on boto3's configuration, e.g. internal code runs like this:
s3 = boto3.resource('s3')
result_s3 = s3.Object(bucket, s3_object_key)
result_s3.put(
    Body=value.encode(content_encoding),
    ContentEncoding=content_encoding,
    ContentType=content_type,
)
From the documentation, boto3 can be told which profile to use via (among other things) the AWS_PROFILE environment variable, and it clearly "works" in the sense that boto3.Session().profile_name does match the variable - but the applications still won't write to S3.
What would be the cleanest/correct way to set them up properly? I tried to pull credentials from STS and write them out as AWS_SECRET_TOKEN etc., but that didn't work for me...
Have a look at the answer here:
How to choose an AWS profile when using boto3 to connect to CloudFront
You can get boto3 to use the other profile like so:
rw = boto3.session.Session(profile_name='readwrite')
s3 = rw.resource('s3')
I think the correct answer to my question is one shared by Nathan Williams in the comment.
In my specific case, given that I had to initiate the code from Python and was a bit worried about setting AWS environment variables that might spill into other operations, I used the fact that boto3 has a DEFAULT_SESSION singleton, used whenever no explicit session is supplied, and just overwrote it with a session that assumed the proper role:
hook = S3Hook(aws_conn_id=aws_conn_id)
boto3.DEFAULT_SESSION = hook.get_session()
(Here, S3Hook is Airflow's S3 handling object.) After that, everything in the same runtime worked perfectly.
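Outside of Airflow, a rough equivalent would look something like the sketch below (the role ARN and session name are placeholders, and note that boto3.DEFAULT_SESSION is not a documented public API, so this relies on an implementation detail):

import boto3

sts = boto3.client("sts")
creds = sts.assume_role(
    RoleArn="arn:aws:iam::123456789012:role/readwrite",  # placeholder ARN
    RoleSessionName="app-readwrite",                     # placeholder name
)["Credentials"]

# Overwrite the default session so that libraries calling boto3.resource() or
# boto3.client() without an explicit session pick up the assumed-role
# credentials. The temporary credentials expire and are not auto-refreshed.
boto3.DEFAULT_SESSION = boto3.session.Session(
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)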

s3 file downloads are unpredictable

We have uploaded several zip files to s3. All are in the hundreds of MB range.
When we download the files, typically via a script, it appears that both the file size and type change. The downloaded file is typically about 300 bytes and turns out to be XML.
The content of the files look similar to this (whitespace added for clarity):
<?xml version="1.0" encoding="UTF-8"?>
<Error>
<Code>NoSuchKey</Code>
<Message>The specified key does not exist.</Message>
<Key>gpdb-5.0.0.0/greenplum-db-5.0.0.0-rhel5-x86_64.zip</Key>
<RequestId>83D2047BDBA195A6</RequestId>
<HostId>tXKFaiRaNjD26j6fcrTjCk858PGBH2RAjLE1aO4+8hovD6mf+hUzJvCdWKKgrDJGaHXsjWbQP2A=</HostId>
</Error>
Any thoughts as to what might be causing this? It does not happen all of the time. It's somewhat intermittent.
As you will note in the S3 API Reference, this isn't a file -- it's an error message.
Files in S3 are called Objects, and the path + filename of the object is referred to as the object key.
A key is the unique identifier for an object within a bucket. Every object in a bucket has exactly one key. [...] Every object in Amazon S3 can be uniquely addressed through the combination of the web service endpoint, bucket name, key, and optionally, a version. For example, in the URL http://doc.s3.amazonaws.com/2006-03-01/AmazonS3.wsdl, "doc" is the name of the bucket and "2006-03-01/AmazonS3.wsdl" is the key.
http://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html#BasicsKeys
This error message, which should have been accompanied by a 404 Not Found error code when you tried to access the object, indicates that there is no object in the bucket whose key (path + filename) is the one shown in the error -- the one you requested. You should be able to confirm its absence in the S3 console.
If the object should have been uploaded some time in the past, this error means the object wasn't actually uploaded, or has subsequently been deleted.
If the object was very recently uploaded (typically within seconds), you should not get this error, but it is possible for this error to occur under either of two additional conditions:
If you tried to check whether an object exists by sending a GET or HEAD request to the bucket before uploading the object. If you do this, a short period of time may elapse before the object is accessible because of internal optimizations inside S3: when you try to fetch a non-existent object, S3 may -- for a brief time -- retain an internal notion that the object is not there, even though it has safely been stored. Retry your request.
If you already had an object with the same key, deleted it, and then uploaded a new object with the same key, then for a brief time after the new upload you could either get the error above or actually download the old object again.
These conditions are somewhat uncommon, but they can occur, particularly if your bucket has a lot of traffic, due to S3's consistency model, which is an engineered tradeoff between performance, reliability, and immediate availability of uploaded objects when the same object has been recently downloaded, attempted to be downloaded, deleted, or overwritten.
The <RequestId> and <HostId> codes in the error response are opaque diagnostic codes that you can provide to AWS support, if you need to submit a support request about a specific problem you are experiencing with S3... they can use these to find the specific request and identify the problem. They are not considered sensitive information, since they have no meaning outside of AWS.
In this case, there is no apparent need to contact AWS support, because it appears that you are simply trying to download an object that is not in the specific bucket from which you tried to download this particular file. If you get alternating success and failure for the exact same file, that's unexpected, and a support case might be in order... but typically, an internal error in S3 should result in a very different response.
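If you want the download script to fail loudly instead of saving the XML error body as if it were the zip, a pre-check along these lines may help (boto3 assumed; the bucket name is a placeholder, the key is the one from the error above):

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")

def download_if_present(bucket, key, dest):
    """Download the object, raising a clear error if the key does not exist."""
    try:
        s3.head_object(Bucket=bucket, Key=key)
    except ClientError as err:
        if err.response["Error"]["Code"] == "404":
            raise FileNotFoundError(f"s3://{bucket}/{key} does not exist")
        raise
    s3.download_file(bucket, key, dest)

download_if_present("my-bucket", "gpdb-5.0.0.0/greenplum-db-5.0.0.0-rhel5-x86_64.zip", "greenplum.zip")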

S3 notification when file is overwritten, or deleted

We store our log files on S3, and to meet PCI requirements we have to be notified when someone tampers with the log files.
How can I be notified every time a PUT request replaces an existing object, or an existing object is deleted? The alert should not fire if a new object is created, unless it replaces an existing one.
S3 does not currently provide overwrite notifications. Deletion notifications were added after the initial launch of the notification feature and can notify you when an object is explicitly deleted, but they do not fire when an object is implicitly replaced by an overwrite.
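For reference, a minimal sketch of wiring up those delete notifications with boto3 (the bucket name and SNS topic ARN are placeholders; again, this will not catch overwrites):

import boto3

s3 = boto3.client("s3")

s3.put_bucket_notification_configuration(
    Bucket="my-log-bucket",  # placeholder
    NotificationConfiguration={
        "TopicConfigurations": [
            {
                "TopicArn": "arn:aws:sns:us-east-1:123456789012:log-tamper-alerts",  # placeholder
                "Events": ["s3:ObjectRemoved:*"],  # explicit deletes only
            }
        ]
    },
)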
However, S3 does have functionality to accomplish what you need, in a way that seems superior to what you are contemplating: object versioning and multi-factor authentication for deletion, both discussed here:
http://docs.aws.amazon.com/AmazonS3/latest/dev/Versioning.html
With versioning enabled on the bucket, an overwrite of a file doesn't remove the old version of the file. Instead, each version of the file is identified by an opaque string, assigned by S3, called the version ID.
If someone overwrites a file, you would then have two versions of the same file in the bucket -- the original one and the new one -- so you not only have evidence of tampering, you also have the original file, undisturbed. Any object with more than one version in the bucket has, by definition, been overwritten at some point.
If you also enable Multi-Factor Authentication (MFA) Delete, then none of the versions of any object can be removed without access to the hardware or virtual MFA device.
As a developer of third-party AWS utilities, tools, and libraries (I'm not affiliated with Amazon), I am highly impressed by Amazon's implementation of object versioning in S3, because it works in such a way that client utilities that are unaware of versioning, or unaware that versioning is enabled on the bucket, should not be affected in any way. This means you should be able to activate versioning on a bucket without changing anything in your existing code. For example:
fetching an object without an accompanying version id in the request simply fetches the newest version of the object
objects in versioned buckets aren't really deleted unless you explicitly delete a particular version; however, you can still "delete an object," and get the expected response back. Subsequently fetching the "deleted" object without specifying an accompanying version id still returns a 404 Not Found, as in the non-versioned environment, with the addition of an unobtrusive x-amz-delete-marker: header included in the response to indicate that the "latest version" of the object is in fact a delete marker placeholder. The individual versions of the "deleted" object remain accessible to version-aware code, unless purged.
other operations that are unrelated to versioning, which work on non-versioned buckets, continue to work the same way they did before versioning was enabled on the bucket.
But, again... with code that is version-aware, including the AWS console (two new buttons appear when you're looking at a versioned bucket -- you can choose to view it with a versioning-aware console view or versioning-unaware console view) you can iterate through the different versions of an object and fetch any version that has not been permanently removed... but preventing unauthorized removal of objects is the point of MFA delete.
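A small sketch of what such version-aware code looks like with boto3 (bucket and key are placeholders):

import boto3

s3 = boto3.client("s3")
bucket, key = "my-log-bucket", "logs/2016-06-01.log"  # placeholders

# Enumerate every preserved version of the object, plus any delete markers.
resp = s3.list_object_versions(Bucket=bucket, Prefix=key)
versions = [v for v in resp.get("Versions", []) if v["Key"] == key]
for v in versions:
    print(v["VersionId"], v["LastModified"], v["IsLatest"])
for marker in resp.get("DeleteMarkers", []):
    print("delete marker:", marker["VersionId"], marker["LastModified"])

# Fetch the original (oldest) version explicitly by its version id.
if versions:
    original = min(versions, key=lambda v: v["LastModified"])
    body = s3.get_object(Bucket=bucket, Key=key, VersionId=original["VersionId"])["Body"].read()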
Additionally, of course, there's bucket logging, which is typically only delayed by a few minutes from real-time and could be used to detect unusual activity... the history of which would be preserved by the bucket versioning.

Gemfire region with data expiration

Regarding this document, "entry-time-to-live-expiration" means how long the region's entries can remain in the cache without being accessed or updated; the default is no expiration of this type. However, when I use Spring Cache and a client region with the following configuration, I find that the setting does not behave as described when entries are being accessed. Going further, regarding this document (XML -> TTL tab), it says "Configures a replica region to invalidate entries that have not been modified for 15 seconds." So I am confused whether TTL takes accesses into account, or only modifications.
<gfe:client-region id="Customer2" name="Customer2" destroy="false" load-factor="0.5" statistics="true" cache-ref="client-cache">
    <gfe:entry-ttl action="DESTROY" timeout="60"/>
    <gfe:eviction threshold="5"/>
</gfe:client-region>
So, the documentation you might want to refer to is here and here. Perhaps relevant to your situation is...
"Requests for entries that have expired on the consumers will be forwarded to the producer."
Based on your configuration, given you did not set either a ClientRegionShortcut or DataPolicy, your Client Region, "Customer2", defaults to ClientRegionShortcut.LOCAL, which sets a DataPolicy of "NORMAL". DataPolicy.NORMAL states...
"Allows the contents in this cache to differ from other caches. Data that this region is interested in is stored in local memory."
And for the shortcut of "LOCAL"...
"A LOCAL region only has local state and never sends operations to a server. ..."
However, it does not mean the client Region cannot receive data (of interests) from the Server. It simply implies operations are not distributed to the Server. It may be expiring the entry and then repopulating it from the Server (producer).
Of course, I am speculating and have not tested these ideas. You might try setting the Expiration Action to "LOCAL_DESTROY" and/or changing your distribution properties through different ClientRegionShortcuts.
Post back if you are still having problems. I too echo what @hubbardr is asking.
Cheers!