Unable to back up log files during instance termination issued by an Auto Scaling policy - amazon-s3

I have EC2 instances with Auto Scaling enabled.
As part of the scale-down policy, when an instance is sent a termination request, the log files remaining on it need to be backed up to S3, but I cannot find a way to upload that instance's log files before it goes away. I have tried placing the script in the rc0.d directory through chkconfig with the highest priority. I also tried hooking my script into /lib/systemd/system/halt.service (as well as reboot.service and poweroff.service), but no luck so far.
I have found some related threads on Stack Overflow and the AWS forum, but no proper solution.
Can anyone please suggest a solution to this problem?

The only reliable way I have found of achieving this behaviour is to use rsyslog/syslog to transfer the log files to a central host as soon as they are written to the syslog subsystem.
This means you will need to run another instance that receives the log files and ships them to S3, or use an SQS-based system such as logstash.
Unfortunately, there is no other way to ensure all of your log messages will be stored on S3: you cannot guarantee that your script will finish before autoscaling "pulls the plug".
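As a rough illustration of the forwarding idea, an application can also send its own records to the central host as they are produced, instead of writing them to a local file first. This is only a minimal sketch using Python's standard-library SysLogHandler; the function name, the host and the port are assumptions for illustration, not part of the original setup.

```python
import logging
import logging.handlers


def make_forwarding_logger(name, host, port=514):
    """Return a logger that sends each record to a remote syslog host over UDP."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # SysLogHandler sends every record as soon as it is emitted, so nothing
    # is left waiting in a local file when the instance is terminated.
    handler = logging.handlers.SysLogHandler(address=(host, port))
    handler.setFormatter(logging.Formatter("%(name)s: %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger
```

Something like make_forwarding_logger("myapp", "logs.internal.example").info("request served") would then ship the record immediately (the host name is hypothetical). UDP delivery is best-effort, so this reduces the window for loss rather than eliminating it.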

Related

Mount S3 bucket as an NFS share on an EC2 instance

long time reader but I've usually been able to find the answers I've been looking for in existing posts - but this time I've not been able to.
I am essentially teaching myself AWS CDK from scratch and have only just started with it, so not finding anything that helps may be a result of not knowing enough yet to ask the right questions... so please bear with me.
Thus far I've used the AWS CDK with Python to create a stack which creates an S3 bucket, and also fires up an EC2 instance with an AWS file storage gateway AMI loaded on it (so running Amazon Linux). This deploys and runs fine - however now I'd like to programmatically set up the S3 bucket to be accessed via an NFS share on the EC2 instance. From what I've seen I'd assumed it is or should be fairly trivial however I keep getting a bit lost in documentation and internet hunts and not quite sure I'm looking in the right places or asking search engines the right questions to unlock the path to achieve this.
It looks like I should be able to script something to make it happen when the instance starts, using user-data, but I'm a bit lost. Is anyone able to throw me some crumbs to follow to find a good way of achieving this, or a better way of achieving what I want (which is basically accessing the S3 bucket contents as though they were files on the EC2 instance), or simply tell me how to do it if it's trivial enough?
Much appreciated :)
Dan
You are on the right track. user_data can be used for that.
I don't have full code to give you, as it's use-case specific (e.g. which OS are you using?), but the user_data would have to download and install s3fs:
s3fs allows Linux and macOS to mount an S3 bucket via FUSE. s3fs preserves the native object format for files, allowing use of other tools like AWS CLI.
However, S3 is an object storage system, and it can't really be mounted on an instance the way NFS or EBS storage can. With s3fs-fuse you can mimic that behavior, and for some use cases this will be sufficient.
So what you can do is set up the user_data script through the console, verify that it works, and then basically just copy and paste it into CDK. It's more of a trial-and-error approach, but this is the best way to learn.
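To give a rough idea of what such a script could contain, here is a sketch that builds the commands as a Python list. It assumes Amazon Linux 2 (where s3fs-fuse is installed from EPEL) and an instance role that grants access to the bucket; the function name, bucket name and mount point are made up for illustration and will need adjusting for your OS.

```python
def s3fs_user_data(bucket, mount_point="/mnt/s3"):
    """Build the shell commands that install s3fs and mount `bucket`."""
    return [
        # Assumption: Amazon Linux 2, where s3fs-fuse comes from EPEL.
        "amazon-linux-extras install epel -y",
        "yum install -y s3fs-fuse",
        f"mkdir -p {mount_point}",
        # iam_role=auto asks s3fs to use the credentials of the instance role.
        f"s3fs {bucket} {mount_point} -o iam_role=auto -o allow_other",
    ]


def render_script(commands):
    """Join the commands into a user-data shell script."""
    return "#!/bin/bash\n" + "\n".join(commands) + "\n"
```

In CDK for Python, the same list could probably be passed to ec2.UserData.for_linux() via add_commands(*commands) once the script has been verified by hand on a test instance.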

AWS elasticbeanstalk automating deletion of logs published to S3

I have enabled publishing of logs from AWS elasticbeanstalk to AWS S3 by following these instructions: http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/using-features.loggingS3.title.html
This is working fine. My question is how do I automate the deletion of old logs from S3, say over one week old? Ideally I'd like a way to configure this within AWS but I can't find this option. I have considered using logrotate but was wondering if there is a better way. Any help is much appreciated.
I eventually discovered how to do this. You can create an S3 Lifecycle rule to delete particular files, or all files in a folder, that are more than N days old. Note: you can also archive instead of deleting, or archive for a while before deleting, among other things. It's a great feature.
Reference: http://docs.aws.amazon.com/AmazonS3/latest/dev/ObjectExpiration.html
and http://docs.aws.amazon.com/AmazonS3/latest/dev/manage-lifecycle-using-console.html
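The same rule can also be created from code rather than the console. Below is a minimal sketch with boto3; the function names and the idea of passing your own log prefix are assumptions for illustration, so check which prefix your environment actually writes its logs under before applying anything.

```python
def expire_logs_rule(prefix, days=7):
    """Build a lifecycle configuration that deletes objects under `prefix` after `days`."""
    return {
        "Rules": [
            {
                "ID": f"expire-after-{days}-days",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Expiration": {"Days": days},
            }
        ]
    }


def apply_rule(bucket, prefix, days=7):
    """Apply the rule to `bucket` (this replaces any existing lifecycle configuration)."""
    import boto3  # imported here so the builder above has no dependencies

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration=expire_logs_rule(prefix, days),
    )
```

Because put_bucket_lifecycle_configuration overwrites the bucket's whole lifecycle configuration, any rules already on the bucket would need to be included in the same call.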

Good logging strategy for AppHarbor and Amazon S3

I'm hosting an application on AppHarbor that uses NLog for logging. I've been trying the Logentries add-on, which is a nice service to pipe all the application logging through to and then view via their web interface. That has now come to the end of its free trial and I'd like to look at doing my own logging before paying for that service.
Because I'm using AppHarbor, they recommend not writing to the file system because it's wiped on each deploy and, when in flow, I do multiple deployments per day. I'm using S3 for storing images anyway, so it seems natural to store logs there as well.
The problem I can see with that approach is that I would be firing log statements to a text file stored on S3, which I would need to append to. Once the site gets some traffic, there will be multiple, simultaneous calls to store log entries, which will probably end up locking the write mechanism. Is there a better way to do this that I'm not aware of? Maybe batching the log entries somehow before sending them across? I'm using Raven as my database so may look at writing logs directly into Raven if there's no better option.
It doesn't look like there are NLog targets for S3 or RavenDB, but there are a bunch of other options: http://nlog-project.org/wiki/Targets
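On the batching idea from the question: S3 objects cannot be appended to, so one workable pattern is to buffer entries in memory and write each batch as its own object under a unique key. The sketch below shows the pattern in Python rather than NLog, purely to illustrate the shape of it; the class name, key layout and the upload callable are all hypothetical.

```python
import logging
import logging.handlers
import time
import uuid


class BatchingUploadHandler(logging.handlers.BufferingHandler):
    """Buffer log records and hand each full batch to `upload(key, body)`."""

    def __init__(self, capacity, upload):
        super().__init__(capacity)
        self.upload = upload

    def flush(self):
        self.acquire()
        try:
            if not self.buffer:
                return
            body = "\n".join(self.format(r) for r in self.buffer) + "\n"
            # A unique key per batch means concurrent writers never collide.
            stamp = time.strftime("%Y/%m/%d/%H%M%S", time.gmtime())
            key = f"logs/{stamp}-{uuid.uuid4().hex}.log"
            self.upload(key, body)
            self.buffer.clear()
        finally:
            self.release()
```

The upload callable could wrap an S3 put-object call. Records still sitting in the buffer are lost if the process dies before a flush, so the batch size is a trade-off between request volume and how much you can afford to lose.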

AWS EC2- Synching source code files with S3 - is it a proper approach?

On an app server in which a few source files change frequently, is the following approach recommended?
Use a cron job with S3tools to sync the source files with S3 private bucket (every 15 mins for example).
On server start up - Use user data script to sync with the sources bucket to retrieve the latest sources.
Advantages:
1. No need to attach EBS for app server just to save a few files
2. Similar setup to all app servers
3. Sources automatically backed up.
4. As a byproduct, distributes code to multiple app servers automatically.
Disadvantages:
1. Keeping source code on S3
2. Other?
What do you think about this methodology? Is this the right way to use EC2 when source code changes frequently (a few times a day)? Please recommend the best approach for running EC2 instances where sources change often.
I think you're better off using a proper source code repository, like Subversion or Git, rather than storing the source files on S3. That way you can have a central location for the source files while avoiding the update consistency problems that kdgregory mentioned.
You can put the source repository on one of your own servers outside of EC2, or host it on an EC2 instance (make sure the repository files are on an EBS volume in the latter case).
If you're going to be running a large number of EC2 instances, then it will be less effort to have them sync themselves from a central location (i.e., you sync to a private bucket, and the app servers sync from that bucket).
HOWEVER, recognize that updates to an S3 bucket are atomic only at the object level, and more importantly, are not guaranteed to be immediately consistent (although I recall seeing a recent note that the us-west endpoint does offer read-after-write consistency).
This means that your app-servers may load a set of new files that are internally inconsistent -- some will be old, some will be new. If this is a problem for you, then you should implement a scheme that uploads directly to the app-servers, and ensures changeset consistency (perhaps by uploading to a temporary directory that is then renamed).
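The "upload to a temporary directory, then rename" idea could look roughly like the sketch below. It uses a symlink swap, which is one common variant of the technique rather than anything the answer prescribes, and it assumes a POSIX filesystem; the function name and directory layout are hypothetical.

```python
import os
import shutil
import tempfile


def publish_release(src_dir, releases_dir, current_link, release_id):
    """Copy a complete changeset into place, then switch to it in one step."""
    os.makedirs(releases_dir, exist_ok=True)
    # Stage in a temporary directory so readers never see a half-copied tree.
    staging = tempfile.mkdtemp(prefix=".staging-", dir=releases_dir)
    shutil.copytree(src_dir, staging, dirs_exist_ok=True)
    target = os.path.join(releases_dir, release_id)
    os.rename(staging, target)  # release_id must not already exist
    # Repoint "current" by creating a new link and renaming it over the old one.
    tmp_link = current_link + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(target, tmp_link)
    os.replace(tmp_link, current_link)
    return target
```

The application reads its sources through the current link, so it sees either the old changeset or the new one in full, never a mixture.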

Fastest / best way copy data between S3 to EC2?

I have a fairly large amount of data (~30G, split into ~100 files) I'd like to transfer between S3 and EC2: when I fire up the EC2 instances I'd like to copy the data from S3 to EC2 local disks as quickly as I can, and when I'm done processing I'd like to copy the results back to S3.
I'm looking for a tool that'll do a fast / parallel copy of the data back and forth. I have several scripts hacked up, including one that does a decent job, so I'm not looking for pointers to basic libraries; I'm looking for something fast and reliable.
Unfortunately, Adam's suggestion won't work as his understanding of EBS is wrong (although I wish he was right and often thought myself it should work that way)... as EBS has nothing to do with S3, but it will only give you an "external drive" for EC2 instances that are separate, but connectable to the instances. You still have to do copying between S3 and EC2, even though there are no data transfer costs between the two.
You didn't mention the operating system of your instance, so I cannot give tailored information. A popular command-line tool I use is http://s3tools.org/s3cmd ... it is based on Python and therefore, according to the info on its website, it should work on Windows as well as Linux, although I use it ALL the time on Linux. You could easily whip up a quick script that uses its built-in "sync" command, which works similarly to rsync, and have it triggered every time you're done processing your data. You could also use the recursive put and get commands to get and put data only when needed.
There are graphical tools like Cloudberry Pro that also have some command-line options for Windows, which you can use to set up scheduled commands. http://s3tools.org/s3cmd is probably the easiest.
By now, there is a sync command in the AWS Command line tools, that should do the trick: http://docs.aws.amazon.com/cli/latest/reference/s3/sync.html
On startup:
aws s3 sync s3://mybucket /mylocalfolder
before shutdown:
aws s3 sync /mylocalfolder s3://mybucket
Of course, the details are always fun to work out, e.g. how parallel it is (and whether you can make it more parallel, and whether that is any faster given the virtual nature of the whole setup).
Btw hope you're still working on this... or somebody is. ;)
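If you want explicit control over the parallelism, a thread pool around per-file transfers is one option. This is a sketch only: the function names and the worker count are assumptions, and whether it beats aws s3 sync on your instance type is something you would have to measure.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def copy_in_parallel(keys, dest_dir, fetch, workers=16):
    """Fetch every key into dest_dir using a pool of worker threads."""
    os.makedirs(dest_dir, exist_ok=True)

    def fetch_one(key):
        local_path = os.path.join(dest_dir, os.path.basename(key))
        fetch(key, local_path)
        return local_path

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() drains the results; an exception raised in a worker is
        # re-raised here, so a failed transfer is not silently dropped.
        return list(pool.map(fetch_one, keys))


def s3_fetcher(bucket):
    """Return a fetch(key, path) callable backed by boto3's download_file."""
    import boto3  # imported here so copy_in_parallel works without it

    s3 = boto3.client("s3")
    return lambda key, path: s3.download_file(Bucket=bucket, Key=key, Filename=path)
```

For example, copy_in_parallel(keys, "/mnt/data", s3_fetcher("mybucket")) would pull the listed keys with 16 transfers in flight. Uploading the results back would follow the same shape with an upload callable instead.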
I think you might be better off using an Elastic Block Store to store your files instead of S3. An EBS is akin to a 'drive' on S3 that can be mounted into your EC2 instance without having to copy the data each time, thereby allowing you to persist your data between EC2 instances without having to write to or read from S3 each time.
http://aws.amazon.com/ebs/
Install the s3cmd package with
yum install s3cmd
or
sudo apt-get install s3cmd
depending on your OS,
then copy the data with
s3cmd get s3://tecadmin/file.txt
s3cmd ls can also list the files.
For more details, see this
For me the best form is:
wget http://s3.amazonaws.com/my_bucket/my_folder/my_file.ext
run from a PuTTY session