Amazon S3 bucket-level stats

I'd like to know if there's a way for me to get bucket-level stats in Amazon S3.
Basically, I want to charge customers for storage and GET requests on my system (which is hosted on S3).
So I created a specific bucket for each client, but I can't seem to get the stats for just a specific bucket.
I see the API lets me call
GET Bucket
or
GET Bucket requestPayment
But I just can't find how to get the number of requests issued to a given bucket or the total size of the bucket.
Thanks for your help!
Regards

I don't think that what you are trying to achieve is possible using the Amazon S3 API. The GET Bucket request does not return usage statistics (request counts, etc.), only per-object metadata such as the timestamp of the latest modification (LastModified).
My suggestion would be to enable logging on your buckets and perform the analysis you want from there.
The S3 starting page gives you an overview of it:
Amazon S3 also supports logging of requests made against your Amazon S3 resources. You can configure your Amazon S3 bucket to create access log records for the requests made against it. These server access logs capture all requests made against a bucket or the objects in it and can be used for auditing purposes.
And I am sure there is plenty of documentation on that matter.
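If you go the logging route, here is a minimal boto3 sketch of both halves, with placeholder bucket names (the log target bucket must already grant the S3 log delivery group write access, otherwise no logs will arrive): it enables server access logging on a customer's bucket and computes that bucket's current total size by listing its objects.

import boto3

s3 = boto3.client("s3")

CLIENT_BUCKET = "client-bucket"  # placeholder: the per-customer bucket
LOG_BUCKET = "my-logs-bucket"    # placeholder: bucket that receives the access logs

# 1. Enable server access logging so every request is recorded for later analysis.
s3.put_bucket_logging(
    Bucket=CLIENT_BUCKET,
    BucketLoggingStatus={
        "LoggingEnabled": {
            "TargetBucket": LOG_BUCKET,
            "TargetPrefix": CLIENT_BUCKET + "/",
        }
    },
)

# 2. Compute the current total size of the bucket by listing its objects.
total_bytes = 0
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=CLIENT_BUCKET):
    for obj in page.get("Contents", []):
        total_bytes += obj["Size"]
print(CLIENT_BUCKET, "holds", total_bytes, "bytes")

The GET request counts would then come from parsing the delivered log files rather than from any single API call.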
HTH.

Related

How to stop AWS ElasticBeanstalk from creating an S3 Bucket or inserting into it?

It created an S3 bucket. If I delete it, it just creates a new one. How can I set it not to create a bucket, or stop it from writing to it?
You cannot prevent AWS Elastic Beanstalk from creating an S3 bucket, as it stores your application and settings as a bundle in that bucket and uses it for deployments. That bucket is required for as long as you run/deploy your application with AWS EB. Be wary of deleting these buckets, as this may cause your deployments/applications to crash. You may, however, remove older objects that are no longer in use.
Take a look at this link for detailed information on how EB uses S3 buckets for deployments: https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/AWSHowTo.S3.html
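If the goal is simply to keep that bucket from growing, here is a hedged boto3 sketch (application name and retention count are placeholders) that removes old application versions together with their source bundles, which is one supported way to reclaim space:

import boto3

eb = boto3.client("elasticbeanstalk")

APP_NAME = "my-app"  # placeholder application name
KEEP = 5             # how many of the most recent versions to keep

# List versions, newest first, and delete everything beyond the most recent KEEP.
versions = eb.describe_application_versions(ApplicationName=APP_NAME)["ApplicationVersions"]
versions.sort(key=lambda v: v["DateCreated"], reverse=True)

for v in versions[KEEP:]:
    # DeleteSourceBundle=True also removes the bundle object from the EB-managed S3 bucket.
    # Deleting a version that is currently deployed will raise an error, so keep enough of them.
    eb.delete_application_version(
        ApplicationName=APP_NAME,
        VersionLabel=v["VersionLabel"],
        DeleteSourceBundle=True,
    )
    print("Deleted", v["VersionLabel"])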

Presigned S3 URLs with Cloudfront

I want to append my pre-signed URL to a CloudFront URL and use that instead.
Any idea how to achieve this?
Use an Amazon CloudFront Signed URL instead of attempting to use an Amazon S3 pre-signed URL with CloudFront.
See: Using Signed URLs - Amazon CloudFront
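For reference, a minimal sketch of generating a CloudFront signed URL with botocore's CloudFrontSigner, assuming you have a CloudFront key pair registered as a trusted signer on the distribution (the key pair ID, private key path, and distribution domain below are placeholders):

from datetime import datetime, timedelta, timezone

from botocore.signers import CloudFrontSigner
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import padding

KEY_PAIR_ID = "KXXXXXXXXXXXXX"        # placeholder CloudFront key pair / public key ID
PRIVATE_KEY_PATH = "private_key.pem"  # placeholder path to the matching private key

def rsa_signer(message):
    # CloudFront expects an RSA-SHA1 signature made with the trusted private key.
    with open(PRIVATE_KEY_PATH, "rb") as f:
        key = serialization.load_pem_private_key(f.read(), password=None)
    return key.sign(message, padding.PKCS1v15(), hashes.SHA1())

signer = CloudFrontSigner(KEY_PAIR_ID, rsa_signer)

# Sign the CloudFront URL, not the S3 URL; the distribution fetches the object from S3.
url = signer.generate_presigned_url(
    "https://dxxxxxxxxxxxx.cloudfront.net/path/to/object.jpg",
    date_less_than=datetime.now(timezone.utc) + timedelta(hours=1),
)
print(url)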
I find the question relevant; it matches my needs. I have files stored in S3 Singapore and external consumers in Europe. The default AWS bandwidth is quite poor (it takes several minutes to download a 50 MB file for quite a few of my end users), so I'd like to optimize their network path through a layer of "dumb" CDN (not leveraging any caching, just using it for a better network path).
Turns out "Amazon S3 Transfer Acceleration" does exactly that:
https://docs.aws.amazon.com/AmazonS3/latest/dev/transfer-acceleration.html
============
Why Use Amazon S3 Transfer Acceleration?
You might want to use Transfer Acceleration on a bucket for various reasons, including the following:
You have customers that upload to a centralized bucket from all over the world.
You transfer gigabytes to terabytes of data on a regular basis across continents.
You are unable to utilize all of your available bandwidth over the Internet when uploading to Amazon S3.
Getting Started with Amazon S3 Transfer Acceleration
To get started using Amazon S3 Transfer Acceleration, perform the following steps:
Enable Transfer Acceleration on a bucket
Transfer data to and from the acceleration-enabled bucket by using one of the following s3-accelerate endpoint domain names:
bucketname.s3-accelerate.amazonaws.com – to access an acceleration-enabled bucket.
============
Remarks:
It's more expensive than S3 + CloudFront. You pay normal S3 bandwidth plus something like 0.04 USD/GB for the acceleration (whereas when using CloudFront, the S3 <-> CloudFront bandwidth is free).
You will probably need to re-sign the URLs. Usually the host is part of the signature, and acceleration requires using a different host. However, this is just normal S3 signing, not the completely different CloudFront signing.
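On the re-signing point, a small boto3 sketch (bucket and key names are placeholders) that signs directly against the accelerate endpoint, so the host baked into the signature is already the s3-accelerate one:

import boto3
from botocore.config import Config

# Tell the client to use the accelerate endpoint; the host is part of the signature,
# so URLs signed for the regular endpoint cannot simply be rewritten afterwards.
s3 = boto3.client("s3", config=Config(s3={"use_accelerate_endpoint": True}))

url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "my-bucket", "Key": "big-file.bin"},  # placeholder names
    ExpiresIn=3600,
)
print(url)  # host will be my-bucket.s3-accelerate.amazonaws.com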

Best approach for setting up AEM S3 Data Store

We have an existing setup of AEM 6.1 which uses TarMK for data storage. To migrate all the assets to S3, I followed all the steps here: https://docs.adobe.com/docs/en/aem/6-1/deploy/platform/data-store-config.html#Data%20Store%20Configurations (Amazon S3 Data Store). Apparently the data synced to S3, but when I checked the disk usage report, I still see that assets are using disk space, even for existing and newly added assets. What's the purpose of using S3 for assets if they still use disk space? Or am I doing something wrong? How can I verify that my setup is really using S3? Here is my S3DataStore.config:
accessKey="xxxxxxxxxx"
secretKey="xxxxxxxxxx"
s3Bucket="dev-aem-assets-local"
s3Region="eu-west-1"
connectionTimeout="120000"
socketTimeout="120000"
maxConnections="40"
writeThreads="30"
maxErrorRetry="10"
continueOnAsyncUploadFailure=B"true"
cacheSize="0"
minRecordLength="10"
Another question is: Do I need to do the same setup on publisher? Or is it ok just to do it on author and use publisher as is by replicating the binary data?
There are a few parts to your question, so I'll break the answer down into logical blocks. Shout if I miss anything.
Your setup for migration is correct, and yes, S3 will still use local disk space. This is the write-through cache.
AEM uses a write-through cache for writing to S3, and all the settings for this cache are in your S3 config file. Any writes to the data store are first written to this cache; asynchronous background threads then upload them to the S3 bucket. This mechanism makes AEM very responsive, as it's not blocked by slow S3 writes. Also, reads of recently written blobs are fast because they don't need slow reads from S3. In short, S3 IO traffic is too slow for AEM, so this cache boosts performance. You cannot disable it, as it is required for asynchronous writes to S3. You can reduce its size, but it's recommended to be at least 50% of your S3 bucket size.
You can verify your S3 setup by looking at your logs for messages related to AWS (grep for aws).
As for publisher, yes you need to migrate from your old publisher to the new publisher. Assuming that you are not using binary-less replication, you will need a different S3 bucket for your publisher. In general, you migrate from author to author and publisher to publisher for a standard implementation.
You can also verify your S3 data usage by looking at the S3 bucket and the traffic on it. If versioning is enabled on your S3 bucket, all the blobs will show version stamping.
Async upload of blobs can be monitored from the logs, and IP traffic monitoring will show activity related to your S3 bucket. The most useful check is to watch the network traffic between your AEM server and the S3 endpoint.
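As an extra check, a small boto3 sketch (using the bucket name from the config above; adjust region and credentials as needed) that lists a handful of objects so you can confirm blobs are actually landing in S3:

import boto3

s3 = boto3.client("s3", region_name="eu-west-1")

# After migration you should see content-addressed blob keys appearing here,
# and the count/size should grow as new assets are uploaded.
resp = s3.list_objects_v2(Bucket="dev-aem-assets-local", MaxKeys=10)
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["Size"], obj["LastModified"])
print("Objects listed:", resp.get("KeyCount", 0))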

Serving files on S3 bucket through Cloudflare

I wanted to serve S3 bucket files through the Cloudflare network, but encountered some issues. Integration instructions are given here, but they are suitable only for new buckets, since the bucket is required to be named subdomain.domain.com, while my bucket is just named domain.
Are there any other solutions for using Cloudflare with S3 without copying files from one bucket to another, like setting up some redirects? The problem is that my bucket contains more than 6 million files which take up 200 GB of storage.
Amazon S3 pricing rules are also hard to understand. I'm struggling to find information on how much it costs to transfer data from one bucket to another when they are in the same location.
Thanks for answers.
Unfortunately, Amazon S3 requires that the CNAME match the bucket name, as you already found out. So basically you'll have to fix the name.
Here https://serverfault.com/questions/349460/how-to-move-files-between-two-s3-buckets-with-minimum-cost you can find how to copy files between buckets with minimum cost. Within the same region, and with the right tools, you will not incur bandwidth costs, only the duplicate storage costs for the duration and the access costs; details are in the linked answer.
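If you do end up copying into a new bucket whose name matches the CNAME, here is a hedged boto3 sketch of a server-side copy with placeholder bucket names. The data never leaves S3, so a same-region copy incurs request charges (and temporary duplicate storage) but no bandwidth charges; for 6 million objects a parallel tool such as the AWS CLI's sync command will be much faster, but the cost characteristics are the same.

import boto3

s3 = boto3.client("s3")

SRC = "old-bucket-name"       # placeholder: existing bucket
DST = "subdomain.domain.com"  # placeholder: new bucket named to match the CNAME

# Server-side copy, object by object.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=SRC):
    for obj in page.get("Contents", []):
        s3.copy(
            CopySource={"Bucket": SRC, "Key": obj["Key"]},
            Bucket=DST,
            Key=obj["Key"],
        )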
Your link to the Cloudflare docs doesn't seem to be working anymore; this is the correct link: https://support.cloudflare.com/hc/en-us/articles/200168926-How-do-I-use-CloudFlare-with-Amazon-s-S3-Service-

What's the best way to serve images across an EC2 cluster on AWS?

We want to be able to have a folder that can securely serve images across a cluster of web servers. What's the best way to handle this with Amazon Web Services (AWS)? Amazon S3? Amazon Elastic Block Store (EBS)? Amazon Cloudfront?
EDIT: Answer no longer needed...thanks.
I'm not sure what your main goal is or whether you have already read about the services you mention, but I'll try to explain them as far as I understand AWS and your choices:
S3 is STORAGE (with buckets and objects, a sort of folder structure with metadata access)
EBS is a VOLUME (these are attached to an EC2 instance as an extra drive you can access like a local hard drive)
CloudFront is a WEB CACHE (you select which datacenters you want your content in, point it at an S3 bucket, and Amazon will replicate the content for you)
So we only need to figure out what you mean by "securely", as there are two options as I see it:
You can protect buckets in S3 or set up access levels with accounts, e.g. "administrator access" only versus publicly readable (see the sketch after this list)...
You can store the data on an EBS volume and keep it there; then it is very secure and NOT public, but shareable (I believe) among the servers (I plan to check this out myself within the next week)
You cannot protect "CloudFront" data separately, as it's controlled by the bucket permissions from S3...
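On the first option, a minimal boto3 sketch (bucket and key are placeholders) of keeping the bucket private and handing out short-lived pre-signed URLs, so every web server in the cluster can serve images securely without making anything publicly readable:

import boto3

s3 = boto3.client("s3")

# The bucket stays private; each web server generates a short-lived URL per image.
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "my-image-bucket", "Key": "images/photo.jpg"},  # placeholders
    ExpiresIn=300,  # URL valid for 5 minutes
)
print(url)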
Hope this helps a little. I've not said anything about SPEED or COST; that's for you to benchmark/test with your data requirements. :o)