Deleting log files in an Amazon S3 bucket according to created date - asp.net-mvc-4

How do I delete log files in Amazon S3 according to date? I have log files in a logs folder inside my bucket.
string sdate = datetime.ToString("yyyy-MM-dd");
string key = "logs/" + sdate + "*";
AmazonS3 s3Client = AWSClientFactory.CreateAmazonS3Client();
DeleteObjectRequest delRequest = new DeleteObjectRequest()
    .WithBucketName(S3_Bucket_Name)
    .WithKey(key);
DeleteObjectResponse res = s3Client.DeleteObject(delRequest);
I tried this, but it doesn't seem to work. I can delete individual files if I put the whole name in the key, but I want to delete all the log files created on a particular date.

You can use S3's Object Lifecycle feature, specifically Object Expiration, to delete all objects under a given prefix that are over a given age. It's not instantaneous, but it beats having to make myriad individual requests. To delete everything, just make the age small.
http://docs.aws.amazon.com/AmazonS3/latest/dev/object-lifecycle-mgmt.html
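For example, a rule scoped to the logs/ prefix with a one-day expiration will clean up the dated log files automatically. Below is a minimal sketch using Python and boto3 purely for illustration (the bucket name is a placeholder; the .NET SDK exposes the same bucket lifecycle operation):
import boto3

s3 = boto3.client("s3")

# Expire everything under the logs/ prefix one day after creation.
# "my-bucket" is a placeholder.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-old-logs",
                "Filter": {"Prefix": "logs/"},
                "Status": "Enabled",
                "Expiration": {"Days": 1},
            }
        ]
    },
)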

Related

How to push data from AWS IoT MQTT broker to a random file in S3 bucket

I have created a rule to forward all messages published to any topic, e.g. foo/bar, of my AWS IoT Core managed MQTT broker to a nested folder in an S3 bucket. For that, I am using the key section. I can send data to a nested folder like a/b/c. The problem is that it takes c as the destination file, and this file gets updated with new data as it arrives. Is there any configuration I can use to put the data into the bucket as a new file (with any random name) as it arrives, similar to what happens when we forward data from Firehose to S3?
You can change your key to use the newuuid() function. e.g.
a/b/${newuuid()}
This will write the data to a file in the a/b folder with a filename that is a generated UUID.
The key in AWS IoT S3 actions allows you to use the IoT SQL Reference Functions to form the folder and filename.
The documentation for the key states:
The path to the file where the data is written. For example, if the value of this argument is "${topic()}/${timestamp()}", the topic the message was sent to is "this/is/my/topic,", and the current timestamp is 1460685389, the data is written to a file called "1460685389" in the "this/is/my/topic" folder on Amazon S3.
If you don't want to use a timestamp, you could form the name of the file using other functions such as a random float (rand()), a hash (md5()), a UUID (newuuid()), or the trace id of the message (traceid()).
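For reference, here is roughly what such a rule looks like when created programmatically; a sketch using boto3, where the rule name, bucket name, and role ARN are placeholders:
import boto3

iot = boto3.client("iot")

# Forward every message published to foo/bar into S3, writing each message
# to a new object whose name is a freshly generated UUID.
iot.create_topic_rule(
    ruleName="forward_foo_bar_to_s3",
    topicRulePayload={
        "sql": "SELECT * FROM 'foo/bar'",
        "awsIotSqlVersion": "2016-03-23",
        "ruleDisabled": False,
        "actions": [
            {
                "s3": {
                    "roleArn": "arn:aws:iam::123456789012:role/iot-to-s3",
                    "bucketName": "my-bucket",
                    "key": "a/b/${newuuid()}",
                }
            }
        ],
    },
)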

S3 java SDK - set expiry to object

I am trying to upload a file to S3 and set an expiration date for it using the Java SDK.
This is the code I have:
Instant expiration = Instant.now().plus(3, ChronoUnit.DAYS);
ObjectMetadata metadata = new ObjectMetadata();
metadata.setExpirationTime(Date.from(expiration));
metadata.setHeader("Expires", Date.from(expiration));
s3Client.putObject(bucketName, keyName, new FileInputStream(file), metadata);
The object shows no expiration date in the S3 console.
What can I do?
Regards,
Ido
These are two unrelated things. The expiration time shown in the console is x-amz-expiration, which is populated by the system when a lifecycle policy applies to the object. It is read-only.
x-amz-expiration
Amazon S3 will return this header if an Expiration action is configured for the object as part of the bucket's lifecycle configuration. The header value includes an "expiry-date" component and a URL-encoded "rule-id" component.
https://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectHEAD.html
Expires is a header which, when set on an object, is returned in the response when the object is downloaded.
Expires
The date and time at which the object is no longer able to be cached. For more information, go to http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.21.
https://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectPUT.html
It isn't possible to tell S3 when to expire (delete) a specific object -- this is only done as part of bucket lifecycle policies, as described in the User Guide under Object Lifecycle Management.
According to the documentation, the method setExpirationTime() is for internal use only and does not define an expiration time for the uploaded object:
public void setExpirationTime(Date expirationTime)
For internal use only. This will not set the object's expiration
time, and is only used to set the value in the object after receiving
the value in a response from S3.
So you can't directly set an expiration date for a particular object. To solve this problem you can:
Define a lifecycle rule for the whole bucket (remove all of its objects after a number of days)
Define a lifecycle rule at the bucket level that removes objects with a specific tag or prefix after a number of days (sketched below)
To define those rules, see the documentation:
https://docs.aws.amazon.com/AmazonS3/latest/userguide/how-to-set-lifecycle-configuration-intro.html
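As an illustration of the second option, you can tag the object at upload time and let a tag-scoped lifecycle rule expire it. A sketch in Python with boto3 (the Java SDK exposes the same object-upload and bucket-lifecycle operations); the bucket, key, file, and tag names are placeholders:
import boto3

s3 = boto3.client("s3")

# Upload the object with a tag that marks it for early expiration.
with open("file.bin", "rb") as body:
    s3.put_object(
        Bucket="my-bucket",
        Key="my-key",
        Body=body,
        Tagging="expire=3-days",
    )

# One-time setup: expire any object carrying that tag after 3 days.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-tagged-objects",
                "Filter": {"Tag": {"Key": "expire", "Value": "3-days"}},
                "Status": "Enabled",
                "Expiration": {"Days": 3},
            }
        ]
    },
)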

boto - more concise way to get key's value from bucket?

I'm trying to figure out a concise way to get data from S3 via boto.
My current code looks like this; s3_manager is simply a class that does all the S3 setup for my app.
log.debug("generating downloader")
downloader = s3_manager()
log.debug("accessing bucket")
bucket_archive = downloader.s3_buckets['#archive']
log.debug("getting key")
key = bucket_archive.get_key(archive_filename)
log.debug("getting key into string")
source = key.get_contents_as_string()
The problem is that, looking at my debug logs, I'm making two requests to Amazon S3:
key = bucket_archive.get_key(archive_filename)
source = key.get_contents_as_string()
Looking at the docs [ http://boto.readthedocs.org/en/latest/ref/s3.html ], it seems that the call to get_key checks whether the key exists, while the second call gets the actual data. Does anyone know of a method to do both at once? A more concise way of doing this with one request is preferable for our app.
The get_key() method performs a HEAD request on the object to verify that it exists. If you are certain that the bucket and key exist and would prefer not to have the overhead of a HEAD request, you can simply create a Key object directly. Something like this would work:
import boto
s3 = boto.connect_s3()
bucket = s3.get_bucket('mybucket', validate=False)
key = bucket.new_key('myexistingkey')
contents = key.get_contents_as_string()
The validate=False on the call to get_bucket eliminates a GET request whose only purpose is to validate that the bucket exists.
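For what it's worth, in the newer boto3 library the same single-request pattern is a one-liner (bucket and key names are placeholders):
import boto3

s3 = boto3.client("s3")

# GetObject both verifies the key exists and returns its contents,
# so only one request is made.
contents = s3.get_object(Bucket="mybucket", Key="myexistingkey")["Body"].read()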

Is there a fast way of accessing line in AWS S3 file?

I have a collection of JSON messages in a file stored on S3 (one message per line). Each message has a unique key as part of the message. I also have a simple DynamoDB table where this key is used as the primary key. The table contains the name of the S3 file where the corresponding JSON message is located.
My goal is to extract a JSON message from the file given the key. Of course, the worst case scenario is when the message is the very last line in the file.
What is the fastest way of extracting the message from the file using the boto library? In particular, is it possible to somehow read the file line by line directly? Of course, I can read the entire contents into a local file using boto.s3.key.get_file(), then open the file, read it line by line, and check for a matching id. But is there a more efficient way?
Thanks much!
S3 cannot do this. That said, you have some other options:
Store the record's length and position (byte offset) instead of the line number in DynamoDB. This would allow you to retrieve just that record using the Range: header (see the sketch after this list).
Use a caching layer to store { S3 object key, line number } => { position, length } tuples. When you want to look up a record by { S3 object key, line number }, reference the cache. If you don't already have this data, you have to fetch the whole file like you do now -- but having fetched the file, you can calculate offsets for every line within it, and save yourself work down the line.
Store the JSON record in DynamoDB directly. This may or may not be practical, given the 64 KB item limit.
Store each JSON record in S3 separately. You could then eliminate the DynamoDB key lookup, and go straight to S3 for a given record.
Which is most appropriate for you depends on your application architecture, the way in which this data is accessed, concurrency issues (probably not significant given your current solution), and your sensitivities for latency and cost.
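To illustrate the first option: once DynamoDB stores the byte offset and length of a record, a ranged GET retrieves just that slice of the object. A sketch with boto3, where the bucket, key, offset, and length are placeholders:
import boto3

s3 = boto3.client("s3")

# Offset and length would come from DynamoDB alongside the record's key.
offset, length = 10240, 512
resp = s3.get_object(
    Bucket="my-bucket",
    Key="messages.jsonl",
    Range="bytes={}-{}".format(offset, offset + length - 1),
)
record = resp["Body"].read().decode("utf-8")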
You can use Node's built-in readline module with streams:
const readline = require('readline');
const AWS = require('aws-sdk');
const s3 = new AWS.S3();
const params = {Bucket: 'yourbucket', Key: 'somefile.txt'};
const readStream = s3.getObject(params).createReadStream();
const lineReader = readline.createInterface({
  input: readStream,
});
lineReader.on('line', (line) => console.log(line));
You can use S3 Select to accomplish this. It also works on Parquet files.
https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-glacier-select-sql-reference-select.html
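A sketch of the S3 Select approach with boto3, treating the file as JSON Lines and filtering on the message's key field (the bucket, key, and field names are placeholders):
import boto3

s3 = boto3.client("s3")

resp = s3.select_object_content(
    Bucket="my-bucket",
    Key="messages.jsonl",
    ExpressionType="SQL",
    Expression="SELECT * FROM s3object s WHERE s.id = 'my-message-id'",
    InputSerialization={"JSON": {"Type": "LINES"}},
    OutputSerialization={"JSON": {}},
)

# The response payload is an event stream; the matching record arrives
# in one or more Records events.
for event in resp["Payload"]:
    if "Records" in event:
        print(event["Records"]["Payload"].decode("utf-8"))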

Need to set multiple canned ACLs on S3 item

We have a service where customers of ours give us access to their S3 buckets and we push items into those S3 buckets. We need to be able to do 2 things:
Set the permissions on the item to be publicly readable
Set the owner of the bucket to have full permissions to the item
Here is what I already know:
I cannot send two canned ACLs with the PUT
Problem:
I "could" set ACL headers, but AFAIK there is no way to set "owner-has-full-permissions" via a header without knowing information about the owner (like the canonical_id or email), correct? Is there a "uri" version of "owner-has-full-permissions" like there is for "public-read" (e.g. "http://acs.amazonaws.com/groups/global/AllUsers")?
I don't want to have to make 2 separate calls (one to get the bucket's owner info and one to put the item with both permissions).
I had the same problem; the following code grants the permissions you require.
AccessControlList accessControlList = new AccessControlList();
// Public read access for everyone.
accessControlList.grantPermission(GroupGrantee.AllUsers, Permission.Read);
// Full control for the canonical owner of the account the client is authenticated as.
accessControlList.grantPermission(
        new CanonicalGrantee(s3Client.getS3AccountOwner().getId()),
        Permission.FullControl);
putReq.setAccessControlList(accessControlList);
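For comparison, the same idea sketched in Python with boto3: a single PutObject can carry both grants as headers once you have a canonical ID to grant full control to (here it is the ID of the account the client is authenticated as, mirroring the Java snippet; bucket, key, and file names are placeholders):
import boto3

s3 = boto3.client("s3")

# Canonical ID of the account these credentials belong to (what the Java
# snippet's getS3AccountOwner() returns).
owner_id = s3.list_buckets()["Owner"]["ID"]

# One PUT that grants public read plus full control to that canonical ID.
with open("file.bin", "rb") as body:
    s3.put_object(
        Bucket="customer-bucket",
        Key="my-key",
        Body=body,
        GrantRead='uri="http://acs.amazonaws.com/groups/global/AllUsers"',
        GrantFullControl='id="{}"'.format(owner_id),
    )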