S3 notification when file is overwritten, or deleted - amazon-s3

We store our log files on S3, and to meet PCI requirements we have to be notified when someone tampers with them.
How can I be notified every time a PUT request replaces an existing object, or when an existing object is deleted? The alert should not fire when a new object is created, unless it replaces an existing one.

S3 does not currently provide deletion-only or overwrite-only notifications. Deletion notifications were added after the initial launch of the notification feature and can tell you when an object is deleted, but they do not tell you when an object is implicitly deleted by an overwrite.
However, S3 does have functionality to accomplish what you need, in a way that seems superior to what you are contemplating: object versioning and multi-factor authentication for deletion, both discussed here:
http://docs.aws.amazon.com/AmazonS3/latest/dev/Versioning.html
With versioning enabled on the bucket, an overwrite of a file doesn't remove the old version of the file. Instead, each version of the file is identified by an opaque string, assigned by S3, called the version ID.
If someone overwrites a file, you would then have two versions of the same file in the bucket -- the original one and the new one -- so you not only have evidence of tampering, you also have the original file, undisturbed. Any object with more than one version in the bucket has, by definition, been overwritten at some point.
If you also enable Multi-Factor Authentication (MFA) Delete, then none of the versions of any object can be removed without access to the hardware or virtual MFA device.
As a developer of AWS utilities, tools, and libraries (3rd party; I'm not affiliated with Amazon), I am highly impressed by Amazon's implementation of object versioning in S3, because it works in such a way that client utilities that are unaware of versioning, or unaware that versioning is enabled on the bucket, should not be affected in any way. This means you should be able to activate versioning on a bucket without changing anything in your existing code. For example:
fetching an object without an accompanying version id in the request simply fetches the newest version of the object
objects in versioned buckets aren't really deleted unless you explicitly delete a particular version; however, you can still "delete an object," and get the expected response back. Subsequently fetching the "deleted" object without specifying an accompanying version id still returns a 404 Not Found, as in the non-versioned environment, with the addition of an unobtrusive x-amz-delete-marker: header included in the response to indicate that the "latest version" of the object is in fact a delete marker placeholder. The individual versions of the "deleted" object remain accessible to version-aware code, unless purged.
other operations that are unrelated to versioning, which work on non-versioned buckets, continue to work the same way they did before versioning was enabled on the bucket.
But, again, with code that is version-aware -- including the AWS console, where two new buttons appear when you're looking at a versioned bucket, letting you switch between a versioning-aware view and a versioning-unaware view -- you can iterate through the different versions of an object and fetch any version that has not been permanently removed... and preventing unauthorized removal of objects is exactly the point of MFA Delete.
Additionally, of course, there's bucket logging, which is typically only delayed by a few minutes from real-time and could be used to detect unusual activity... the history of which would be preserved by the bucket versioning.
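To make that concrete, here is a minimal sketch -- using the v1 AWS SDK for Java, with the bucket name as a placeholder -- of how version-aware code could walk a versioned bucket and flag any key that has more than one version (or a delete marker), which, as noted above, is evidence that the key was overwritten or deleted at some point:

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.S3VersionSummary;
import com.amazonaws.services.s3.model.VersionListing;

import java.util.HashMap;
import java.util.Map;

public class OverwriteAudit {
    public static void main(String[] args) {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        Map<String, Integer> versionCounts = new HashMap<>();

        // Walk every version (and delete marker) in the bucket, page by page.
        VersionListing listing = s3.listVersions("example-log-bucket", null);
        while (true) {
            for (S3VersionSummary v : listing.getVersionSummaries()) {
                versionCounts.merge(v.getKey(), 1, Integer::sum);
            }
            if (!listing.isTruncated()) {
                break;
            }
            listing = s3.listNextBatchOfVersions(listing);
        }

        // Any key with more than one version has, at some point, been overwritten or deleted.
        versionCounts.forEach((key, count) -> {
            if (count > 1) {
                System.out.println(key + " has " + count + " versions -- possible tampering");
            }
        });
    }
}

Version-aware code can also fetch a specific older version explicitly, for example with new GetObjectRequest(bucket, key, versionId).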

Related

What HTTP method should I use for an endpoint that updates a status field of multiple entities

I like to use the correct HTTP methods when I'm creating an API. And usually it's very straightforward. POST for creating entities, PUT for updating them, GET for retrieving etc.
But I have a use-case here where I will create an endpoint that updates the status of multiple objects given 1 identifier.
e.g.:
/api/v1/entity/update-status
But note that I mentioned multiple objects. My team's initial thought would be to map it as POST, but it won't actually be creating anything; plus, if you were to call the same endpoint multiple times with the same identifier, nothing would change after the first time, making it idempotent.
With this in mind, my idea was to create it as a PUT or even PATCH endpoint.
What do you smart people think?
I imagine PATCH would be the most correct way. Although if you use a PUT it would also not be incorrect.
The difference between the PUT and PATCH requests is reflected in the way the server processes the enclosed entity to modify the resource identified by the Request-URI. In a PUT request, the enclosed entity is considered to be a modified version of the resource stored on the origin server, and the client is requesting that the stored version be replaced. With PATCH, however, the enclosed entity contains a set of instructions describing how a resource currently residing on the origin server should be modified to produce a new version. The PATCH method affects the resource identified by the Request-URI, and it also MAY have side effects on other resources; i.e., new resources may be created, or existing ones modified, by the application of a PATCH.
Whilst it is a convention in REST APIs that POST is used to create a resource, it doesn't necessarily have to be constrained to this purpose.
Referring back to the definition of POST in RFC 7231:
The POST method requests that the target resource process the representation enclosed in the request according to the resource's own specific semantics. For example, POST is used for the following functions (among others):
Providing a block of data, such as the fields entered into an HTML form, to a data-handling process;
Posting a message to a bulletin board, newsgroup, mailing list, blog, or similar group of articles;
Creating a new resource that has yet to be identified by the origin server; and
Appending data to a resource's existing representation(s).
Clearly creation is only one of those purposes and updating existing resources is also legitimate.
The PUT operation is not appropriate for your intended operation because, again per the RFC, a PUT is supposed to replace the content of the target resource (URL). The same also applies to PATCH, but since it is intended for partial updates of the target resource, you can target it at the URL of the collection.
So I think your viable options are:
POST /api/v1/entity/update-status
PATCH /api/v1/entity
Personally, I would choose to go with PATCH as I find it semantically more pleasing, but POST is not wrong. Note that using PATCH doesn't gain you anything in terms of communicating an idempotency guarantee to a consumer: per RFC 5789, "PATCH is neither safe nor idempotent", which is the same as POST.
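To make the shape concrete, here is a minimal sketch of what a client call to the PATCH variant could look like, using the Java 11+ HttpClient; the host, endpoint, and payload fields are assumptions for illustration, not part of the question:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class BulkStatusUpdate {
    public static void main(String[] args) throws Exception {
        // Hypothetical payload: the shared identifier plus the new status for all matching entities.
        String json = "{\"identifier\": \"order-1234\", \"status\": \"CANCELLED\"}";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://api.example.com/api/v1/entity"))
                .header("Content-Type", "application/json")
                // HttpClient has no patch() shortcut, so the verb is passed explicitly.
                .method("PATCH", HttpRequest.BodyPublishers.ofString(json))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
    }
}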

Notifications when an S3 object is overwritten with versioning ON

Is there a way to configure an S3 bucket to send notifications/alerts whenever another version of an object is created? I am using versioning to prevent overwriting an existing object, but I would also like to be able to detect such events.
With versioning, an overwrite will not actually delete the existing object; it will add another object under the same key but with a different version. I would like to configure the bucket to receive a notification only when an object is created on top of another object (again, since we have versioning).
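As the first answer above explains, S3's event notifications cannot distinguish an overwrite from a first write, so the closest you can get is to subscribe to object-created events and have the consumer check the key's version history. A minimal sketch with the v1 AWS SDK for Java (the bucket name and SNS topic ARN are placeholders, and the topic must already allow s3.amazonaws.com to publish to it):

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.BucketNotificationConfiguration;
import com.amazonaws.services.s3.model.S3Event;
import com.amazonaws.services.s3.model.TopicConfiguration;

import java.util.EnumSet;

public class EnableCreateNotifications {
    public static void main(String[] args) {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();

        // Fire on every object-created event (PUT, POST, COPY, multipart complete).
        TopicConfiguration topicConfig = new TopicConfiguration(
                "arn:aws:sns:us-east-1:123456789012:object-created-alerts",
                EnumSet.of(S3Event.ObjectCreated));

        BucketNotificationConfiguration config = new BucketNotificationConfiguration();
        config.addConfiguration("object-created-alerts", topicConfig);

        s3.setBucketNotificationConfiguration("example-versioned-bucket", config);
    }
}

The notification consumer can then call listVersions for the reported key and treat any key that already has an older version as an overwrite.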

s3 file downloads are unpredictable

We have uploaded several zip files to s3. All are in the hundreds of MB range.
When we download the files, typically via a script, it appears that the file size and type both change. The new file size is typically about 300 bytes, and the downloaded file type is XML.
The content of the files look similar to this (whitespace added for clarity):
<?xml version="1.0" encoding="UTF-8"?>
<Error>
<Code>NoSuchKey</Code>
<Message>The specified key does not exist.</Message>
<Key>gpdb-5.0.0.0/greenplum-db-5.0.0.0-rhel5-x86_64.zip</Key>
<RequestId>83D2047BDBA195A6</RequestId>
<HostId>tXKFaiRaNjD26j6fcrTjCk858PGBH2RAjLE1aO4+8hovD6mf+hUzJvCdWKKgrDJGaHXsjWbQP2A=</HostId>
</Error>
Any thoughts as to what might be causing this? It does not happen all of the time. It's somewhat intermittent.
As you will note in the S3 API Reference, this isn't a file -- it's an error message.
Files in S3 are called Objects, and the path + filename of the object is referred to as the object key.
A key is the unique identifier for an object within a bucket. Every object in a bucket has exactly one key. [...] Every object in Amazon S3 can be uniquely addressed through the combination of the web service endpoint, bucket name, key, and optionally, a version. For example, in the URL http://doc.s3.amazonaws.com/2006-03-01/AmazonS3.wsdl, "doc" is the name of the bucket and "2006-03-01/AmazonS3.wsdl" is the key.
http://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html#BasicsKeys
This error message, which should have been accompanied by a 404 Not Found error code when you tried to access the object, indicates that there is no object in the bucket whose key (path + filename) is the one shown in the error -- the one you requested. You should be able to confirm its absence in the S3 console.
If the object should have been uploaded some time in the past, this error means the object wasn't actually uploaded, or has subsequently been deleted.
If the object was very recently uploaded (typically within seconds), you should not get this error, but it is possible for this error to occur under either of two additional conditions:
If you tried to check whether the object exists by sending a GET or HEAD request for it, and only then uploaded the object. If you do this, a short period of time may elapse before the object is accessible, because of internal optimizations inside S3: when you try to fetch a non-existent object, S3 may, for a brief time, retain an internal notion that the object is not there, even though it has since been safely stored. Retry your request.
If you already had an object with the same key, then deleted it, and then uploaded a new object with the same key, then for a brief time after the new upload you could either get the error above or actually download the old object again.
These conditions are somewhat uncommon, but they can occur, particularly if your bucket has a lot of traffic, due to S3's consistency model -- an engineered tradeoff between performance, reliability, and the immediate availability of uploaded objects when the same key has recently been downloaded (or requested), deleted, or overwritten.
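If your script is hitting the first case, a bounded retry is usually enough. A rough sketch with the v1 AWS SDK for Java (bucket, key, retry count, and backoff are placeholders):

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.AmazonS3Exception;
import com.amazonaws.services.s3.model.GetObjectRequest;

import java.io.File;

public class RetryingDownload {
    public static void download(AmazonS3 s3, String bucket, String key, File target)
            throws InterruptedException {
        for (int attempt = 1; attempt <= 5; attempt++) {
            try {
                // Saves the object body straight to the target file.
                s3.getObject(new GetObjectRequest(bucket, key), target);
                return;
            } catch (AmazonS3Exception e) {
                // NoSuchKey surfaces as a 404; anything else is a different problem.
                if (e.getStatusCode() != 404) {
                    throw e;
                }
                Thread.sleep(2000L * attempt); // back off and try again
            }
        }
        throw new IllegalStateException(key + " still missing after retries");
    }
}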
The <RequestId> and <HostId> codes in the error response are opaque diagnostic codes that you can provide to AWS support, if you need to submit a support request about a specific problem you are experiencing with S3... they can use these to find the specific request and identify the problem. They are not considered sensitive information, since they have no meaning outside of AWS.
In this case, there is no apparent need to contact AWS support, because it appears that you are simply trying to download an object that is not in the specific bucket from which you tried to download this particular file. If you get alternating success and failure for the exact same file, that's unexpected, and a support case might be in order... but typically, an internal error in S3 should result in a very different response.

S3 Copy Object with new metadata

I am trying to set the Cache-Control header on all our existing files in the S3 storage by executing a copy to the exact same key but with new metadata. This is supported by the S3 API through the x-amz-metadata-directive: REPLACE header. In the documentation of the S3 API compatibility at https://docs.developer.swisscom.com/service-offerings/dynamic.html#s3-api the Object Copy method is listed as neither supported nor unsupported.
The copy itself works fine (to another key), but the option to set new metadata does not seem to work when copying to either the same or a different key. Is this not supported by the ATMOS S3-compatible API, and/or is there any other way to update the metadata without having to read all the content and write it back to the storage?
I am currently using the Amazon Java SDK (v. 1.10.75.1) to make the calls.
UPDATE:
After some more testing it seems that the issue I am having is more specific. The copy works and I can change other metadata like Content-Disposition or Content-Type successfully. Just the Cache-Control is ignored.
As requested here is the code I am using to make the call:
// accessKey, sharedsecret, endPoint, bucketName and storageKey are defined elsewhere.
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3Client;
import com.amazonaws.services.s3.model.CopyObjectRequest;
import com.amazonaws.services.s3.model.ObjectMetadata;

BasicAWSCredentials awsCreds = new BasicAWSCredentials(accessKey, sharedsecret);
AmazonS3 amazonS3 = new AmazonS3Client(awsCreds);
amazonS3.setEndpoint(endPoint);
ObjectMetadata metadata = amazonS3.getObjectMetadata(bucketName, storageKey).clone();
metadata.setCacheControl("private, max-age=31536000");
CopyObjectRequest copyObjectRequest = new CopyObjectRequest(bucketName, storageKey, bucketName, storageKey).withNewObjectMetadata(metadata);
amazonS3.copyObject(copyObjectRequest);
Maybe the Cache-Control header on the PUT (Copy) request to the API is dropped somewhere on the way?
According to the latest ATMOS Programmer's Guide, version 2.3.0, Tables 11 and 12, there is nothing that says whether COPY of objects is supported or unsupported.
I've been working with ATMOS for quite some time, and what I believe is that the S3 copy function is somehow internally translated into a sequence of commands using ATMOS object versioning (page 76). So they might translate the Amazon copy operation into "create a version" and then "delete or truncate the old referenced object". Maybe I'm totally wrong (since I don't work for EMC :-)) and they handle it in a different way... but that's how I see it after reading the native ATMOS API documentation.
What you could try to do:
Use the native ATMOS API (which is a bit painful, yes, I know), and then, create a version of the original object (page 76), update the metadata of such version (User Metadata, page 12), and then restore the version to the top-level object (page 131). After that, check if the metadata will be properly returned in the S3 API.
That's my 2 cents. If you decide to try this approach, post back here whether it worked.
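Either way, a quick way to check whether the header survives the copy (against the Swisscom endpoint or against AWS itself) is to read the metadata back afterwards, continuing from the snippet in the question:

// Read the metadata back after the copy to see whether Cache-Control was preserved.
// amazonS3, bucketName and storageKey are the same values used in the copy request above.
ObjectMetadata check = amazonS3.getObjectMetadata(bucketName, storageKey);
System.out.println("Cache-Control: " + check.getCacheControl());
System.out.println("Content-Type:  " + check.getContentType());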

Amazon S3 ACL for read-only and write-once access

I'm developing a web application and I currently have the following ACL assigned to the AWS account it uses to access its data:
{
  "Statement": [
    {
      "Sid": "xxxxxxxxx", // don't know if this is supposed to be confidential
      "Action": [
        "s3:*"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::cdn.crayze.com/*"
      ]
    }
  ]
}
However I'd like to make this a bit more restrictive so that if our AWS credentials were ever compromised, an attacker could not destroy any data.
From the documentation, it looks like I want to allow just the following actions: s3:GetObject and s3:PutObject, but I specifically want the account to only be able to create objects that don't exist already - i.e. a PUT request on an existing object should be denied. Is this possible?
This is not possible in Amazon S3 in the way you probably envisioned it; however, you can work around this limitation by Using Versioning, which is a means of keeping multiple variants of an object in the same bucket and has been developed with use cases like this in mind:
You might enable versioning to prevent objects from being deleted or overwritten by mistake, or to archive objects so that you can retrieve previous versions of them.
There are a couple of related FAQs as well, for example:
What is Versioning? - Versioning allows you to preserve, retrieve, and restore every version of every object stored in an Amazon S3 bucket. Once you enable Versioning for a bucket, Amazon S3 preserves existing objects anytime you perform a PUT, POST, COPY, or DELETE operation on them. By default, GET requests will retrieve the most recently written version. Older versions of an overwritten or deleted object can be retrieved by specifying a version in the request.
Why should I use Versioning? - Amazon S3 provides customers with a highly durable storage infrastructure. Versioning offers an additional level of protection by providing a means of recovery when customers accidentally overwrite or delete objects. This allows you to easily recover from unintended user actions and application failures. You can also use Versioning for data retention and archiving. [emphasis mine]
How does Versioning protect me from accidental deletion of my objects? - When a user performs a DELETE operation on an object, subsequent default requests will no longer retrieve the object. However, all versions of that object will continue to be preserved in your Amazon S3 bucket and can be retrieved or restored. Only the owner of an Amazon S3 bucket can permanently delete a version. [emphasis mine]
If you are really concerned about the AWS credentials of the bucket owner (who can of course be different from the accessing users), you can take that one step further still; see How can I ensure maximum protection of my preserved versions?:
Versioning's MFA Delete capability, which uses multi-factor authentication, can be used to provide an additional layer of security. [...] If you enable Versioning with MFA Delete on your Amazon S3 bucket, two forms of authentication are required to permanently delete a version of an object: your AWS account credentials and a valid six-digit code and serial number from an authentication device in your physical possession. [...]
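For reference, turning versioning on is a single API call. A minimal sketch with the v1 AWS SDK for Java, using the bucket from the question (MFA Delete additionally requires the bucket owner's credentials and MFA device, indicated only as a comment):

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.BucketVersioningConfiguration;
import com.amazonaws.services.s3.model.SetBucketVersioningConfigurationRequest;

public class EnableVersioning {
    public static void main(String[] args) {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();

        BucketVersioningConfiguration config =
                new BucketVersioningConfiguration().withStatus(BucketVersioningConfiguration.ENABLED);

        // To also turn on MFA Delete, pass the bucket owner's device as a third argument:
        // new SetBucketVersioningConfigurationRequest(bucket, config.withMfaDeleteEnabled(true),
        //         new MultiFactorAuthentication(serialNumber, token))
        s3.setBucketVersioningConfiguration(
                new SetBucketVersioningConfigurationRequest("cdn.crayze.com", config));
    }
}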
If it is accidental overwrites you are trying to avoid, and your business requirements allow a short window of inconsistency, you can do the rollback in a Lambda function:
Make it a policy that no new objects are written to an existing name. Most of the time that simply won't happen. To enforce it (a sketch follows the list):
Listen for s3:ObjectCreated:Put events in an AWS Lambda function.
When the event is fired, check whether more than one version is present.
If there is more than one version present, delete all but the newest one.
Notify the uploader what happened (it's useful to have the original uploader in x-amz-meta-* of the object. More info here).
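Here is a rough sketch of such a handler, assuming the v1 AWS SDK for Java and the aws-lambda-java-events library (names are placeholders; version-listing pagination and the notification step are omitted for brevity):

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.S3Event;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.S3VersionSummary;

public class RollbackOverwriteHandler implements RequestHandler<S3Event, Void> {
    private final AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();

    @Override
    public Void handleRequest(S3Event event, Context context) {
        event.getRecords().forEach(record -> {
            String bucket = record.getS3().getBucket().getName();
            String key = record.getS3().getObject().getKey(); // may need URL decoding for special characters

            // Check how many versions this key now has (listVersions treats the key as a prefix,
            // so the equals() guard below filters out other keys that merely start with it).
            for (S3VersionSummary v : s3.listVersions(bucket, key).getVersionSummaries()) {
                // More than one version present: keep only the newest one, as described above.
                if (v.getKey().equals(key) && !v.isLatest()) {
                    s3.deleteVersion(bucket, key, v.getVersionId());
                }
            }
            // Not shown: notify the uploader recorded in the object's x-amz-meta-* metadata.
        });
        return null;
    }
}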
You can now lock versions of objects with S3 Object Lock. It's a per-bucket setting, and allows you to place one of two kinds of WORM locks.
"retention period" - can't be changed
"legal hold" - can be changed by the bucket owner at any time
https://docs.aws.amazon.com/AmazonS3/latest/dev/object-lock.html
As mentioned by @Kijana Woodard below, this does not prevent creation of new versions of objects.
Edit: Applicable if you came here from this question.
Object Locks only work in versioned buckets. If you cannot enable versioning for your bucket, but you can tolerate brief inconsistencies -- where files are presumed to exist while a DELETE is still in flight (S3 is only eventually consistent), possibly causing a PUT-after-DELETE to fail intermittently in a tight loop or, conversely, successive PUTs to falsely succeed intermittently -- then the following approach may be appropriate.
Given the object path, issue a HeadObject request and read the object's Content-Length from the returned metadata. Write the object only if that request fails with 404 Not Found (the key does not exist) or, where applicable, if the reported length is zero.
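A minimal sketch of that check with the v1 AWS SDK for Java (bucket and key are placeholders; the race window described above still exists between the HEAD and the PUT):

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.AmazonS3Exception;
import com.amazonaws.services.s3.model.ObjectMetadata;

import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

public class WriteOnce {
    public static boolean putIfAbsent(AmazonS3 s3, String bucket, String key, byte[] body) {
        try {
            // HeadObject: succeeds only if the key already exists.
            s3.getObjectMetadata(bucket, key);
            return false; // object already there, refuse to overwrite
        } catch (AmazonS3Exception e) {
            if (e.getStatusCode() != 404) {
                throw e; // some other failure (permissions, throttling, ...)
            }
        }
        ObjectMetadata meta = new ObjectMetadata();
        meta.setContentLength(body.length);
        s3.putObject(bucket, key, new ByteArrayInputStream(body), meta);
        return true;
    }

    public static void main(String[] args) {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        byte[] body = "log entry".getBytes(StandardCharsets.UTF_8);
        System.out.println(putIfAbsent(s3, "example-bucket", "logs/2017-01-01.log", body));
    }
}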