How to track Amazon AWS S3 bucket downloads using Google Analytics Measurement Protocol?

I'm using AWS S3 as my CDN to store files. Often these are directly linked from places all over the world. I'd like to track the file downloads in the S3 bucket using Google Analytics. It appears Google Analytics Measurement Protocol may be able to do this. But since I'm new to both the AWS environment and GAMP, I was hoping I'm not the first to ever do this. Anyone know of a way this can be accomplished?

I doubt this is possible without you doing extra work on top.
You could create a proxy site that, when hit, records an event to Google Analytics and then redirects to the download page/bucket.
You could also have a script or scheduled job scrape access data from the AWS side and write events to Google Analytics, although this would be less than real time.
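A minimal sketch of the proxy approach might look like the following. It assumes Flask and requests are available; the tracking ID and bucket URL are placeholders, and the hit uses the standard Measurement Protocol v1 collect parameters:

```python
# Sketch of a tiny redirect endpoint that records a Measurement Protocol event
# and then sends the visitor on to the real S3 object.
# Assumptions: Flask and requests are installed; GA_TRACKING_ID and S3_BASE_URL
# are placeholders you would replace with your own values.
import uuid

import requests
from flask import Flask, redirect

app = Flask(__name__)

GA_TRACKING_ID = "UA-XXXXXXXX-1"                     # hypothetical GA property ID
S3_BASE_URL = "https://my-bucket.s3.amazonaws.com"   # hypothetical bucket URL


@app.route("/dl/<path:key>")
def tracked_download(key):
    # Fire a Measurement Protocol "event" hit (v1 collect endpoint).
    try:
        requests.post(
            "https://www.google-analytics.com/collect",
            data={
                "v": "1",
                "tid": GA_TRACKING_ID,
                "cid": str(uuid.uuid4()),  # anonymous client id
                "t": "event",
                "ec": "s3-download",       # event category
                "ea": "download",          # event action
                "el": key,                 # event label = object key
            },
            timeout=2,
        )
    except requests.RequestException:
        pass  # a tracking failure shouldn't block the download

    # Then redirect to the actual file in the bucket.
    return redirect(f"{S3_BASE_URL}/{key}", code=302)
```

The obvious trade-off is that every download now passes through your endpoint instead of being a direct S3 link.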

You can turn on logging for the buckets you care about, then download the little logfile fragments that Amazon delivers and feed them into an off-the-shelf analytics package such as Webalizer, if you're willing to spend the time and effort to build a pipeline and massage the data so that it fits.
I've written about how to do that here:
https://www.expatsoftware.com/articles/2007/11/roll-your-own-web-stats-for-amazon-s3.html
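If you do go the roll-your-own route, the core of that pipeline is fairly small. Here is a rough sketch, assuming boto3 is configured with credentials; the log bucket name and prefix are hypothetical placeholders for wherever you told S3 to deliver its server access logs:

```python
# Pull down the S3 server access log fragments and tally GET requests per key.
from collections import Counter

import boto3

s3 = boto3.client("s3")
downloads = Counter()

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket="my-log-bucket", Prefix="logs/"):
    for obj in page.get("Contents", []):
        body = s3.get_object(Bucket="my-log-bucket", Key=obj["Key"])["Body"].read()
        for line in body.decode("utf-8", errors="replace").splitlines():
            # Each access-log line records one request; count object GETs.
            fields = line.split()
            if "REST.GET.OBJECT" in fields:
                key = fields[fields.index("REST.GET.OBJECT") + 1]
                downloads[key] += 1

# Print the 20 most-downloaded objects.
for key, count in downloads.most_common(20):
    print(f"{count:8d}  {key}")
```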
If you just want the reports today, there are a handful of third-party services built around doing this for you, so if you have ~$10/month to spend that's probably the best solution.
S3stat (https://www.s3stat.com/) is my suggestion. But then it should be, since it's also my product.

Related

What is the best approach to handle file upload in GraphQL?

I'm looking for a way to handle file upload in my backend powered by Prisma (Graphcool). However, I am a beginner and it looks very intimidating, and I don't know anything about how file upload works. What is the best approach to do this? Can I do it using Prisma? I have read about Amazon S3 buckets, but it looks like a complicated approach to begin with.
There are many ways to accomplish this. I will lay out a few solutions and resources and you should have something to get you going.
Two common approaches differ mainly in where the uploaded files are persisted: directly on the server file system, or in a cloud storage service, typically S3.
For most use cases the second option, uploading to a cloud service, is superior due to ease of scaling, ease of backing up data in something like S3, and additional security features such as signed URLs (a sketch of the signed-URL idea follows the examples below). One more neat thing about using S3 in particular is that you can take advantage of AWS Athena, described by AWS as
"an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL".
๐Ÿ‘Note that the examples below all leverage the awesome work by Jayden Seric, who has done tons of work on uploads using GraphQL.๐Ÿ‘
Upload to FS, API and react example: Upload files to node server file system
middleware for apollo-upload-server: small abstraction on top of apollo-upload-server which simplifies server development. This example also shows how to integrate with S3.
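As a generic illustration of the signed-URL idea mentioned above (not the apollo-upload-server examples themselves), here is a minimal backend helper that hands the client a presigned S3 upload URL; boto3 is assumed and the bucket name is a placeholder:

```python
# The resolver (or any backend endpoint) returns a short-lived presigned URL,
# and the client PUTs the file straight to S3 instead of through your server.
import boto3

s3 = boto3.client("s3")


def create_upload_url(filename: str, content_type: str) -> str:
    """Return a presigned PUT URL the client can upload to directly."""
    return s3.generate_presigned_url(
        "put_object",
        Params={
            "Bucket": "my-upload-bucket",     # placeholder bucket name
            "Key": f"uploads/{filename}",
            "ContentType": content_type,
        },
        ExpiresIn=900,  # URL valid for 15 minutes
    )


if __name__ == "__main__":
    print(create_upload_url("avatar.png", "image/png"))
```

The advantage of this pattern is that large files never pass through your GraphQL server at all.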

AWS S3 ETL tool options

Trying to get a handle on what I would use to schedule and run jobs to move data into S3, run scripts on it, and move it around S3 afterward.
My requirement is to be able to ingest from APIs and also directly from databases. Some formats to ingest will be XML, and others could be flat files. The raw files need to be joined, transformed, and turned into a format that graphs can be produced from.
What is AWS Glue like as an ETL tool? My specific question is: can you see the finished pipelines, showing the data sources and processing steps in a graphical view, once they are created?
I have used Azure Data Factory, which had a graphical UI to view and monitor pipelines that I found quite useful. Just wondering if AWS Glue has a similar thing.
If not, would NiFi on AWS S3 be a good way to do this?
Thanks
If you are looking for the best GUI, I would recommend NiFi. It is commonly used with S3 and has many connectors out of the box for other data sources. It becomes even more interesting if you want to do things outside of the AWS cloud.
That being said, I would think that Glue will also get the job done.
Running Data Factory when you have a heavy AWS footprint feels like an anti-pattern.
Full disclosure: I have not worked with Glue or Data Factory, and I work for Cloudera, the driving force behind NiFi.
I'm currently using AWS Glue to extract data from a database into S3, manipulate the data, and save it back to Redshift/S3 or send it via API to my client. The AWS Glue GUI is not that good: you won't see a diagram of your flow, and sometimes you will need to use other tools like Step Functions or Airflow to orchestrate your jobs. Also, for most of my jobs I have to use PySpark because the AWS Glue methods are too limited.
Related to monitoring, you can see if there is an error, how much CPU and memory is being consumed by your job, and S3 bytes read/written. If you want additional information, you need to use a logger or print statements to send it to the logs.
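For what it's worth, a Glue job typically ends up shaped like the sketch below, with most of the real transformation logic written in plain PySpark; the catalog database, table, and S3 path are placeholders, not a tested job:

```python
# Minimal shape of a Glue PySpark job: read from the Data Catalog, drop into
# plain Spark for the transformation, write the result back to S3.
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read from the Glue Data Catalog (populated by a crawler) -- placeholder names.
source = glue_context.create_dynamic_frame.from_catalog(
    database="raw_db", table_name="orders"
)

# Drop into plain Spark for anything the DynamicFrame API doesn't cover.
df = source.toDF().filter("order_total > 0")

# Write the transformed data back to S3 as Parquet (placeholder path).
df.write.mode("overwrite").parquet("s3://my-bucket/curated/orders/")

job.commit()
```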

Putting an aged site's files on Amazon AWS (S3)

For the last six or so years, I've been running an online video hosting site.
Back then, hosting your content on services like S3 wasn't the big deal it is today, so everything is currently stored on our very expensive dedicated servers, and the data overage charges are even more expensive.
My question is this: what is the process for moving TBs of data onto a service like Amazon AWS when the site is all coded to link to files on our current server?
Editing hundreds of thousands of video links to point to the new AWS locations surely wouldn't be ideal?
Is there an easier approach to this dilemma?
All files are in a single folder structure, e.g. example.com/files/video.mp4.
Would it be more a case of leaving the current files where they are and just putting all future videos on AWS?
I feel like I'm missing a piece of how this works, because if, for example, Imgur (which hosts trillions of images) wanted to move from AWS to another similar storage service, it would be impossible to re-link that many URLs.
Any help is appreciated.

Monitoring AWS account spend

I am planning to build a dashboard to monitor AWS expenditure. After googling, I realized AWS has no API that developers can hook into to build an app that gets real-time data. Is there any way to achieve this? Kindly help me out.
I believe you are looking to monitor current AWS usage.
AWS provides options for this through "AWS programmatic billing access".
Once you enable it, AWS will upload a CSV file of your current usage to a specified S3 bucket every few hours.
You then need to write a program, using the AWS S3 SDK for your favourite programming language, to download and parse the CSV file and get near-real-time data.
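A minimal sketch of that download-and-parse step is below. It assumes boto3 credentials are configured; the bucket name, object key, and the "ProductName"/"Cost" column names are placeholders, so check the actual header row of your report, since the columns vary by report type:

```python
# Fetch the billing CSV that AWS drops into your S3 bucket and total cost per product.
import csv
import io
from collections import defaultdict

import boto3

s3 = boto3.client("s3")

# Placeholder bucket and key -- use your own billing bucket and report name.
obj = s3.get_object(Bucket="my-billing-bucket", Key="aws-billing-csv-2024-05.csv")
reader = csv.DictReader(io.StringIO(obj["Body"].read().decode("utf-8")))

totals = defaultdict(float)
for row in reader:
    product = row.get("ProductName") or "unallocated"
    try:
        totals[product] += float(row.get("Cost") or 0)
    except ValueError:
        continue  # skip summary/blank rows without a numeric cost

# Print products from most to least expensive.
for product, cost in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{cost:12.2f}  {product}")
```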
Newvem has a very good set of how-to guides available for working with AWS.
One of the guides,
http://www.newvem.com/how-to-set-up-programmatic-billing-access-for-your-aws-account/
talks about enabling programmatic billing access.
Also refer to http://www.newvem.com/how-to-track-costs-of-amazon-s3-cloud-objects/ , which talks about how to track the cost of Amazon S3 objects.
As mentioned by Mike, AWS also provides a way to get billing alerts using CloudWatch.
I hope the above helps.
I recommend referring to the Newvem how-to guides to get more insight into AWS and its offerings.
Thanks,
Taral Shah
If you're looking to monitor actual spending data, Taral is correct. AWS Programmatic Billing Access is the best tool for recording the data. However, you don't actually have to write a dashboard tool to view it; there are a lot of them out there already.
The company I work for, Cloudability, automatically imports all of your AWS Detailed Billing data and lets you build out all of the AWS spending and usage reports you need without ever having to write any code or mess with any spreadsheets.
If you want to learn more, there's a good blog post at http://blog.cloudability.com/introducing-cloudabilitys-aws-cost-analytics-powered-by-aws-detailed-billing-files/
For more information about CloudWatch-enabled billing monitoring, refer to
http://aws.amazon.com/about-aws/whats-new/2012/05/10/announcing-aws-billing-alerts/
To learn AWS faster, refer to the Newvem how-to guides at
http://www.newvem.com/amazon-cloud-knowledge-center/how-to-guides/
Regards,
Taral
The first thing is to enable detailed billing export to an S3 bucket (see here).
Then I wrote a simple server in Python (BSD licensed) that retrieves your detailed bill and breaks it down per service type and usage type (see it on this GitHub repo).
Thus you can check at any time what your costs are, which services cost you the most, etc.
If you tag your EC2 instances, S3 buckets etc, they will also show up on a dedicated line.
CloudWatch has an "estimated billing" API, which will get you most of the way there. See this ServerFault question for more detail: https://serverfault.com/questions/350971/how-can-i-monitor-daily-spending-on-aws
If you are looking for more detail, you will need to download your CSV-formatted bill and parse it, but your question is too generic to give a specifically useful answer. Even this will not be real time, though.
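To illustrate the CloudWatch route mentioned above, pulling the account-wide estimated charges might look like the sketch below. It assumes billing alerts/metrics have been enabled on the account and that boto3 is configured; billing metrics are only published in us-east-1:

```python
# Query the CloudWatch "EstimatedCharges" billing metric for the last 24 hours.
from datetime import datetime, timedelta, timezone

import boto3

# Billing metrics live in us-east-1 regardless of where your resources run.
cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

now = datetime.now(timezone.utc)
resp = cloudwatch.get_metric_statistics(
    Namespace="AWS/Billing",
    MetricName="EstimatedCharges",
    Dimensions=[{"Name": "Currency", "Value": "USD"}],
    StartTime=now - timedelta(days=1),
    EndTime=now,
    Period=21600,          # 6-hour buckets; billing data only updates a few times a day
    Statistics=["Maximum"],
)

# Print the running estimated charge at each datapoint.
for point in sorted(resp["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], f'${point["Maximum"]:.2f}')
```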

High load image uploader/resizer in conjunction with Amazon S3

We are running a product-oriented service that requires us to download and resize thousands and thousands of photos daily from various web sources, upload them to an Amazon S3 bucket, and serve them with CloudFront.
The problem is that downloading and resizing is really resource-intensive, and it would take many hours to process them all.
What we are looking for is a service that would do this for us quickly and, of course, for a reasonable price.
Does anybody know of such a service? I tried to google it but don't really know how to phrase the search to get what I need.
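For context, the per-image work being described boils down to something like the sketch below (requests, Pillow, and boto3 assumed; the source URL and bucket name are placeholders). The question is really about running this step at scale rather than how to write it:

```python
# Download an image, resize it in memory, and upload the result to S3.
import io

import boto3
import requests
from PIL import Image

s3 = boto3.client("s3")


def ingest_photo(source_url: str, key: str, max_size: int = 1024) -> None:
    # Download the original image.
    original = requests.get(source_url, timeout=10).content

    # Resize in memory, preserving aspect ratio.
    img = Image.open(io.BytesIO(original)).convert("RGB")
    img.thumbnail((max_size, max_size))
    buffer = io.BytesIO()
    img.save(buffer, format="JPEG", quality=85)
    buffer.seek(0)

    # Upload the resized copy to the bucket CloudFront serves from (placeholder).
    s3.upload_fileobj(buffer, "my-photo-bucket", key,
                      ExtraArgs={"ContentType": "image/jpeg"})


if __name__ == "__main__":
    ingest_photo("https://example.com/photo.jpg", "resized/photo.jpg")
```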