What's the recommended way to write Serilog logs to Amazon S3?

I'm looking to use Serilog to write structured log data to an Amazon S3 bucket, then analyze using Databricks. I assumed there would be an S3 sink for Serilog but I found I was wrong. I think perhaps using the File sink along with something else might be the ticket, but I'm unsure what that might look like. I suppose I could mount the S3 bucket on my EC2 instance and write to it, but I'm told that's problematic. Could one of you fine folks point me in the right direction?

I would now recommend using Serilog.Sinks.AmazonS3, which was created for exactly the scenario described.
Disclaimer: I'm the maintainer of the project :)
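Once the sink is delivering log files to the bucket, here is a minimal sketch of the Databricks side mentioned in the question. It assumes the sink is configured to write newline-delimited JSON and that the bucket, path, and field names are placeholders; adjust them to whatever formatter you configure.

```python
# Minimal sketch: read JSON-formatted Serilog output from S3 in a Databricks notebook.
# Assumptions: newline-delimited JSON files, a cluster with read access to the bucket,
# and a placeholder bucket/prefix. Field names depend on the formatter you choose.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # already available as `spark` in Databricks

logs = spark.read.json("s3a://my-log-bucket/serilog/*.json")  # hypothetical path
logs.printSchema()

# Example analysis: count events per log level (field name depends on the formatter)
logs.groupBy("Level").count().show()
```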

As of this writing, there are no sinks that write to Amazon S3, so you'd have to write your own.
I'd start by taking a look at the Serilog.Sinks.AzureBlobStorage sink, as it can probably serve as a base for writing a sink for Amazon S3.
Links to the source code for several other sinks are available in the wiki and can give you some more ideas too: https://github.com/serilog/serilog/wiki/Provided-Sinks

Related

What is the best approach to handle file upload in graphql?

I'm looking for a way to handle file upload in my backend powered by Prisma (Graphcool). However, I am a beginner, it looks very intimidating, and I don't know anything about how file upload works. What is the best approach to do this? Can I do it using Prisma? I have read about Amazon S3 buckets, but it looks like a complicated approach to begin with.
There are many ways to accomplish this. I will lay out a few solutions and resources that should give you something to get going with.
Two common approaches differ primarily in where the uploaded files are persisted: directly on the server file system, or in a cloud storage service, typically S3.
For most use cases the second option, uploading to a cloud service, is superior: it scales more easily, data in something like S3 is easier to back up, and you get additional security features such as signed URLs (a sketch of generating one follows the links below). One more neat thing about S3 in particular is that you can take advantage of AWS Athena, described by AWS as
an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL
๐Ÿ‘Note that the examples below all leverage the awesome work by Jayden Seric, who has done tons of work on uploads using GraphQL.๐Ÿ‘
Upload to FS, API and react example: Upload files to node server file system
middleware for apollo-upload-server: small abstraction on top of apollo-upload-server which simplifies server development. This example also shows how to integrate with S3.
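For the signed-URL part of the cloud option, here is a minimal sketch of generating a presigned S3 upload URL with boto3. The bucket name and key are placeholders, and returning the URL from your GraphQL resolver is an assumption about how you would wire it in.

```python
# Minimal sketch: generate a presigned S3 PUT URL that a client can upload to directly.
# Assumptions: AWS credentials are already configured; bucket and key names are placeholders.
import boto3

s3 = boto3.client("s3")

def create_upload_url(key: str, expires_in: int = 3600) -> str:
    """Return a time-limited URL the browser can PUT the file to."""
    return s3.generate_presigned_url(
        "put_object",
        Params={"Bucket": "my-upload-bucket", "Key": key},
        ExpiresIn=expires_in,
    )

# e.g. hand this back from a GraphQL resolver and upload from the client with a PUT request
print(create_upload_url("uploads/avatar.png"))
```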

AWS S3 ETL tool options

Trying to get a handle on what I would use to schedule and run jobs that move data into S3, run scripts on it, and move it around S3 afterwards.
My requirement is to be able to ingest from APIs and also directly from databases. Some formats to ingest will be XML, and others could be flat files. The raw files need to be joined, transformed, and turned into a format that graphs can be produced from.
What is AWS Glue like as an ETL tool? My specific question is: once the pipelines are created, can you see them in a graphical view showing the data sources and processing steps?
I have used Azure Data Factory, and it had a graphical UI to view and monitor the pipelines, which I found quite useful. I'm just wondering if AWS Glue has something similar.
If not, would NiFi on AWS S3 be a good way to do this?
Thanks
If you are looking for the best GUI, I would recommend NiFi. It is commonly used with S3 and has many connectors out of the box for other data sources. It becomes even more interesting if you want to do things outside of the AWS cloud.
That being said, I would think that Glue will also get the job done.
Running Data Factory when you have a heavy AWS footprint feels like an anti-pattern.
Full disclosure: I have not worked with Glue/Data Factory, and I work for Cloudera, the driving force behind NiFi.
I'm currently using AWS Glue to extract data from a database into S3, manipulate it, and save it back to Redshift/S3 or send it via an API to my client. The AWS Glue GUI is not that good: you won't see a diagram of your flow, and you will sometimes need other tools such as Step Functions or Airflow to orchestrate your jobs. Also, for most of my jobs I have to use PySpark directly because the built-in AWS Glue methods are too limited (a sketch of such a job follows below).
As for monitoring, you can see whether there was an error, how much CPU and memory your job consumed, and the S3 bytes read/written. If you want additional information, you need to use a logger or print statements to send it to the logs.
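To give a flavour of the PySpark route mentioned above, here is a minimal sketch of a Glue job. The database, table, filter column, and output path are hypothetical placeholders, and the awsglue modules are only available inside the Glue job environment.

```python
# Minimal sketch of an AWS Glue PySpark job: read a catalogued table,
# transform it with plain Spark, and write the result back to S3.
# Database, table, column, and output names are placeholders.
from awsglue.context import GlueContext
from pyspark.context import SparkContext
from pyspark.sql import functions as F

glue_context = GlueContext(SparkContext.getOrCreate())
spark = glue_context.spark_session

# Read a source table registered in the Glue Data Catalog
dyf = glue_context.create_dynamic_frame.from_catalog(
    database="raw_db", table_name="orders_xml"
)

# Drop to a plain Spark DataFrame for transformations the Glue API doesn't cover
df = dyf.toDF().filter(F.col("status") == "COMPLETE")

# Write the transformed data back to S3 as Parquet, ready for graphing or Athena
df.write.mode("overwrite").parquet("s3://my-curated-bucket/orders/")
```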

How to track Amazon AWS S3 bucket downloads using Google Analytics Measurement Protocol?

I'm using AWS S3 as my CDN to store files. Often these are directly linked from places all over the world. I'd like to track the file downloads in the S3 bucket using Google Analytics. It appears Google Analytics Measurement Protocol may be able to do this. But since I'm new to both the AWS environment and GAMP, I was hoping I'm not the first to ever do this. Anyone know of a way this can be accomplished?
I doubt this is possible without you doing extra work on top.
You could create a proxy site that, when hit, records an event to Google Analytics and then redirects to the download page/bucket.
You could also have a script or scheduled job scrape events from the AWS dashboards and write them to Google Analytics, although this would probably be less than real-time.
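Here is a minimal sketch of the proxy approach described above, using Flask and the Universal Analytics Measurement Protocol. The property ID, bucket URL, route, and event names are placeholders, and error handling is omitted.

```python
# Minimal sketch: a redirect proxy that records a Measurement Protocol event
# before sending the visitor on to the S3 object. Property ID, bucket URL, and
# event names are placeholders; error handling is omitted for brevity.
import uuid

import requests
from flask import Flask, redirect, request

app = Flask(__name__)
GA_ENDPOINT = "https://www.google-analytics.com/collect"
GA_PROPERTY_ID = "UA-XXXXXXXX-1"                       # placeholder
BUCKET_URL = "https://my-cdn-bucket.s3.amazonaws.com"  # placeholder

@app.route("/dl/<path:key>")
def track_and_redirect(key):
    requests.post(GA_ENDPOINT, data={
        "v": "1",                    # protocol version
        "tid": GA_PROPERTY_ID,       # tracking ID
        "cid": str(uuid.uuid4()),    # anonymous client ID
        "t": "event",                # hit type
        "ec": "download",            # event category
        "ea": "s3-file",             # event action
        "el": key,                   # event label: the file requested
        "uip": request.remote_addr,  # pass the visitor's IP along
    }, timeout=2)
    return redirect(f"{BUCKET_URL}/{key}", code=302)

if __name__ == "__main__":
    app.run()
```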
You can turn on logging for the buckets you care about, then download the little logfile fragments that Amazon delivers and feed them into an off-the-shelf analytics package such as Webalizer, if you're willing to spend the time and effort to build a pipeline and massage the data so that it fits.
I've written about how to do that here:
https://www.expatsoftware.com/articles/2007/11/roll-your-own-web-stats-for-amazon-s3.html
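For the first step (turning logging on), a minimal sketch with boto3 follows. The bucket names are placeholders, and the target bucket must already grant the S3 log-delivery service permission to write to it.

```python
# Minimal sketch: enable S3 server access logging on the CDN bucket so that
# access-log fragments are delivered to a separate log bucket.
# Bucket names are placeholders; the target bucket needs a policy/ACL that
# allows the S3 logging service to write to it.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_logging(
    Bucket="my-cdn-bucket",
    BucketLoggingStatus={
        "LoggingEnabled": {
            "TargetBucket": "my-access-log-bucket",
            "TargetPrefix": "cdn-logs/",
        }
    },
)
```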
If you just want the reports today, there are a handful of 3rd party services built around doing this for you, so if you have ~$10/month to spend that's probably the best solution.
S3stat (https://www.s3stat.com/) is my suggestion. But then it should be since it's also my product.

Monitoring AWS account spends

I am planning to build a dashboard to monitor AWS expenditure. After googling, I realized AWS has no API that developers can hook into to build an app that gets the real-time data. Is there any way to achieve this? Kindly help me out.
I believe you are looking to monitor current AWS usage.
AWS provides an option for this through "AWS programmatic billing access".
Once you enable it, AWS will upload a CSV file of your current usage to the specified S3 bucket every few hours.
You then need to write a program, using the AWS S3 SDK for your favourite programming language, to download and parse the CSV file and get near-real-time data.
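A minimal sketch of that download-and-parse step in Python with boto3 follows. The bucket name, file name, and the column names ("ProductName", "TotalCost") are assumptions; check the header row of your own billing file.

```python
# Minimal sketch: pull a programmatic-billing CSV from S3 and total the cost
# per product. Bucket/key and the column names are assumptions -- verify them
# against the header of your own billing file.
import csv
import io
from collections import defaultdict

import boto3

s3 = boto3.client("s3")
BUCKET = "my-billing-bucket"                       # placeholder
KEY = "123456789012-aws-billing-csv-2024-05.csv"   # placeholder

body = s3.get_object(Bucket=BUCKET, Key=KEY)["Body"].read().decode("utf-8")

totals = defaultdict(float)
for row in csv.DictReader(io.StringIO(body)):
    try:
        totals[row.get("ProductName") or "unknown"] += float(row.get("TotalCost") or 0)
    except ValueError:
        continue  # skip summary/blank lines that don't parse as numbers

for product, cost in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{product}: {cost:.2f}")
```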
Newvem has a very good set of how-to guides for working with AWS.
One of the guides,
http://www.newvem.com/how-to-set-up-programmatic-billing-access-for-your-aws-account/
talks about enabling programmatic billing access.
Also refer to http://www.newvem.com/how-to-track-costs-of-amazon-s3-cloud-objects/ , which talks about how to track the costs of Amazon S3 objects.
As mentioned by Mike, AWS also provides a way to get billing alerts using CloudWatch.
I hope the above helps.
I recommend referring to the Newvem how-to guides to get more insight into AWS and its offerings.
Thanks,
Taral Shah
If you're looking to monitor actual spending data, @Taral is correct. AWS Programmatic Billing Access is the best tool for recording the data. However, you don't actually have to write a dashboard tool to view it; there are a lot of them out there already.
The company I work for, Cloudability, automatically imports all of your AWS Detailed Billing data and lets you build out all of the AWS spending and usage reports you need without ever having to write any code or mess with any spreadsheets.
If you want to learn more, there's a good blog post at http://blog.cloudability.com/introducing-cloudabilitys-aws-cost-analytics-powered-by-aws-detailed-billing-files/
For more information about CloudWatch-enabled monitoring, refer to
http://aws.amazon.com/about-aws/whats-new/2012/05/10/announcing-aws-billing-alerts/
To learn AWS faster, refer to the Newvem how-to guides at
http://www.newvem.com/amazon-cloud-knowledge-center/how-to-guides/
Regards
Taral
The first thing is to enable detailed billing export to an S3 bucket (see here).
Then I wrote a simplistic server in Python (BSD licensed) that retrieves your detailed bill and breaks it down per service type and usage type (see it on this GitHub repo).
That way you can check at any time what your costs are and which services cost you the most, etc.
If you tag your EC2 instances, S3 buckets, etc., they will also show up on a dedicated line.
CloudWatch has an "estimated billing" API, which will get you most of the way there. See this ServerFault question for more detail: https://serverfault.com/questions/350971/how-can-i-monitor-daily-spending-on-aws
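A minimal sketch of querying that estimated-billing metric with boto3 follows. Note that billing metrics live in us-east-1 and only appear after billing alerts/metrics are enabled in the account's billing preferences.

```python
# Minimal sketch: read the EstimatedCharges metric from CloudWatch.
# Billing metrics are only published in us-east-1 and only once billing
# metrics are enabled for the account.
from datetime import datetime, timedelta

import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

resp = cloudwatch.get_metric_statistics(
    Namespace="AWS/Billing",
    MetricName="EstimatedCharges",
    Dimensions=[{"Name": "Currency", "Value": "USD"}],
    StartTime=datetime.utcnow() - timedelta(days=1),
    EndTime=datetime.utcnow(),
    Period=21600,           # 6-hour datapoints
    Statistics=["Maximum"],
)

for point in sorted(resp["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Maximum"], "USD")
```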
If you are looking for more detail, you will need to download your CSV-formatted bill and parse it, but even that will not be real-time. Beyond that, your question is too generic to give a more specifically useful answer.

Is there a way to have complete control of your EBS volume snapshots?

From this question I have learned that they are stored on S3, but we can't see them.
My question is really asking whether there is a way to store them elsewhere.
EBS snapshots are backed by S3 but not in a bucket that your account has access to. They can't be accessed directly via S3.
What are you trying to accomplish? There is no way that I know of to store them somewhere else, and why would you want to? S3 is cheap and redundant, and easy to restore from.
S3 might have some performance limitations, but only when you are trying to do something it is not intended for.