Presigned S3 URLs with Cloudfront - amazon-s3

I want to append my pre-signed URL to a CloudFront URL and use that instead.
Any idea how to achieve this?

Use an Amazon CloudFront Signed URL instead of attempting to use an Amazon S3 pre-signed URL with CloudFront.
See: Using Signed URLs - Amazon CloudFront
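To make the difference from S3 pre-signing concrete, here is a minimal sketch of the two stdlib-only pieces of a CloudFront signed URL with a canned policy: the policy document that gets signed, and the URL-safe base64 variant CloudFront uses. The distribution domain and expiry are hypothetical placeholders. The RSA signing step itself is left out because it needs a crypto library and your CloudFront private key.

```python
import base64
import json


def canned_policy(resource_url: str, expires_epoch: int) -> str:
    """Build the compact canned-policy JSON that gets signed."""
    policy = {
        "Statement": [
            {
                "Resource": resource_url,
                "Condition": {"DateLessThan": {"AWS:EpochTime": expires_epoch}},
            }
        ]
    }
    # Compact separators: the policy is signed as a string without whitespace.
    return json.dumps(policy, separators=(",", ":"))


def cloudfront_safe_b64(raw: bytes) -> str:
    """Base64-encode, then replace the characters that are not URL-safe."""
    encoded = base64.b64encode(raw).decode("ascii")
    return encoded.replace("+", "-").replace("=", "_").replace("/", "~")


# Hypothetical distribution domain and expiry time, for illustration only.
policy = canned_policy("https://d111111abcdef8.cloudfront.net/file.zip", 1700000000)
print(policy)
```

The policy string is then signed with the private key, the signature is passed through `cloudfront_safe_b64`, and the result goes into the `Signature` query parameter next to `Expires` and `Key-Pair-Id`.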

I find the question relevant; it matches my needs. I have files stored in S3 Singapore and external consumers in Europe. The default AWS network path is quite slow for them (it takes several minutes to download a 50 MB file for quite a few of my end users), so I'd like to route their traffic through a layer of "dumb" CDN (not leveraging any caching, just using it for better network paths).
Turns out "Amazon S3 Transfer Acceleration" does exactly that:
https://docs.aws.amazon.com/AmazonS3/latest/dev/transfer-acceleration.html
============
Why Use Amazon S3 Transfer Acceleration?
You might want to use Transfer Acceleration on a bucket for various reasons, including the following:
You have customers that upload to a centralized bucket from all over the world.
You transfer gigabytes to terabytes of data on a regular basis across continents.
You are unable to utilize all of your available bandwidth over the Internet when uploading to Amazon S3.
Getting Started with Amazon S3 Transfer Acceleration
To get started using Amazon S3 Transfer Acceleration, perform the following steps:
Enable Transfer Acceleration on a bucket
Transfer data to and from the acceleration-enabled bucket by using one of the following s3-accelerate endpoint domain names:
bucketname.s3-accelerate.amazonaws.com – to access an acceleration-enabled bucket.
============
Remarks:
It's more expensive than S3 + CloudFront. You pay normal S3 bandwidth plus something like 0.04 USD / GB for the acceleration (whereas when using CloudFront, the S3 <> CloudFront bandwidth is free).
You will probably need to re-sign the URLs. Usually the host is part of the signature, and acceleration requires using a different host. However, this is just normal S3 signing, not the completely different CloudFront signing.
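To illustrate why the host change forces a re-sign, here is a simplified stdlib-only sketch of SigV4 query-string presigning for a GET, with only the `host` header signed. The bucket name, key, and timestamp are made up; the credentials are the dummy example values from the AWS documentation. In practice you would let an SDK generate the URL, so treat this only as a demonstration that the host feeds into the signature.

```python
import hashlib
import hmac
from urllib.parse import quote


def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def presign_get(host, key, access_key, secret_key, region, amz_date, expires):
    """Simplified SigV4 presign for GET; the host is part of the signed data."""
    date = amz_date[:8]
    scope = f"{date}/{region}/s3/aws4_request"
    params = {
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key}/{scope}",
        "X-Amz-Date": amz_date,
        "X-Amz-Expires": str(expires),
        "X-Amz-SignedHeaders": "host",
    }
    query = "&".join(
        f"{quote(k, safe='-_.~')}={quote(v, safe='-_.~')}"
        for k, v in sorted(params.items())
    )
    path = "/" + quote(key, safe="/-_.~")
    # Canonical request: method, path, query, headers, signed headers, payload.
    canonical_request = "\n".join(
        ["GET", path, query, f"host:{host}", "", "host", "UNSIGNED-PAYLOAD"]
    )
    digest = hashlib.sha256(canonical_request.encode("utf-8")).hexdigest()
    string_to_sign = "\n".join(["AWS4-HMAC-SHA256", amz_date, scope, digest])
    # Derive the signing key: date -> region -> service -> "aws4_request".
    signing_key = _hmac(("AWS4" + secret_key).encode("utf-8"), date)
    for part in (region, "s3", "aws4_request"):
        signing_key = _hmac(signing_key, part)
    signature = hmac.new(
        signing_key, string_to_sign.encode("utf-8"), hashlib.sha256
    ).hexdigest()
    return f"https://{host}{path}?{query}&X-Amz-Signature={signature}"


# Dummy credentials from the AWS docs; bucket, key and date are hypothetical.
common = dict(
    key="reports/big-file.zip",
    access_key="AKIAIOSFODNN7EXAMPLE",
    secret_key="wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
    region="ap-southeast-1",
    amz_date="20240101T000000Z",
    expires=3600,
)
standard_url = presign_get("examplebucket.s3.amazonaws.com", **common)
accelerated_url = presign_get("examplebucket.s3-accelerate.amazonaws.com", **common)
print(standard_url != accelerated_url)
```

Everything except the host is identical between the two calls, yet the signatures differ, so a URL signed for the regular endpoint cannot simply be rewritten to the `s3-accelerate` host.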

AWS Next.js on EC2 caching

I'm planning to use Next.js SSR/SSG/ISR on Amazon's EC2 and store images on S3 Bucket. Also to add CloudFront CDN on top of it.
The question is:
Should I cache images from S3 in Next.js (which is on EC2), thus "doubling" the images (originals in S3, optimised copies in the Next.js cache on EC2), or does that make no sense, since everything is located within one cloud (AWS) and covered by a CDN layer (CloudFront)?
Or is there a way to move the Next.js caching to CloudFront?
I do understand that next/image is providing image optimisation (different sizes and quality), but I'm bothered by "doubling" the images, thus paying more for storage.
P.S. I've seen this question, I'm just not experienced with lambda, so currently looking for something I understand already.
CloudFront lets you use a different origin for each behaviour, and you can also apply a different cache policy per behaviour. What you can do is have a behaviour for /images that goes to S3, while the default behaviour points to the EC2 origin.
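As a rough mental model of that routing (not the real CloudFront API), the sketch below mimics ordered path-pattern matching: the first matching behaviour wins and the default `*` behaviour comes last. The origin IDs are hypothetical, and `fnmatchcase` is only an approximation of CloudFront's wildcard rules.

```python
from fnmatch import fnmatchcase

# Ordered like cache behaviours: first matching pattern wins,
# and the default behaviour ("*") is evaluated last.
BEHAVIOURS = [
    ("/images/*", "s3-images-origin"),  # hypothetical origin ID for the S3 bucket
    ("*", "ec2-nextjs-origin"),         # hypothetical origin ID for the EC2 app
]


def pick_origin(path: str) -> str:
    """Return the origin ID of the first behaviour whose pattern matches."""
    for pattern, origin in BEHAVIOURS:
        if fnmatchcase(path, pattern):
            return origin
    raise LookupError(f"no behaviour matches {path!r}")


print(pick_origin("/images/logo.png"))
print(pick_origin("/about"))
```

With this split, image requests never reach the EC2 instance, and each behaviour can carry its own cache policy.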

Best approach for setting up AEM S3 Data Store

We have an existing setup of AEM 6.1 which uses TarMK for data storage. To migrate all the assets to S3, I followed all the steps here: https://docs.adobe.com/docs/en/aem/6-1/deploy/platform/data-store-config.html#Data%20Store%20Configurations (Amazon S3 Data Store). Apparently the data was synced to S3, but when I checked the disk usage report, I still see that assets are using disk space, both existing and newly added ones. What's the purpose of using S3 for assets if they still use disk space? Or am I doing something wrong? How can I verify that my setup is really using S3? Here is my S3DataStore.config
accessKey="xxxxxxxxxx"
secretKey="xxxxxxxxxx"
s3Bucket="dev-aem-assets-local"
s3Region="eu-west-1"
connectionTimeout="120000"
socketTimeout="120000"
maxConnections="40"
writeThreads="30"
maxErrorRetry="10"
continueOnAsyncUploadFailure=B"true"
cacheSize="0"
minRecordLength="10"
Another question is: Do I need to do the same setup on publisher? Or is it ok just to do it on author and use publisher as is by replicating the binary data?
There are a few parts to your question so I'll break down the answer into logical blocks. Shout if I miss anything.
Your setup for migration is correct and S3 will use disk space. This is for the write-through cache.
AEM uses a write-through cache for writing to S3, and all the settings for this cache are in your S3 config file. Any writes to the data store are first written to this cache. Asynchronous background threads then upload them to the S3 bucket. This mechanism makes AEM very responsive, as it's not blocked by slow S3 writes. Also, data reads for recently written blobs are fast because they don't need slow reads from S3. In short, S3 IO traffic is too slow for AEM, so this cache boosts the performance. You cannot disable it, as it is required for asynchronous writes to S3. You can reduce its size, but it's recommended to be at least 50% of your S3 bucket size.
You can verify your S3 setup by looking at your logs for messages related to AWS (grep for aws).
As for publisher, yes you need to migrate from your old publisher to the new publisher. Assuming that you are not using binary-less replication, you will need a different S3 bucket for your publisher. In general, you migrate from author to author and publisher to publisher for a standard implementation.
You can also verify your S3 data usage by looking at the S3 bucket and the traffic on it. If versioning is enabled on your S3 bucket, all the blobs will show version stamping.
Async upload of blobs can be monitored from logs and IP traffic monitoring will show activities related to your S3 bucket. The most useful way is to see the network traffic between your AEM server and S3 end-point.
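The "grep for aws" log check mentioned above can be sketched as a small filter. The sample lines are placeholders, not real AEM output, and the log path in the comment is an assumption about a default install.

```python
import re
from typing import Iterable, Iterator

# Case-insensitive equivalent of `grep -i aws`.
AWS_PATTERN = re.compile(r"aws", re.IGNORECASE)


def aws_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the log lines that mention AWS."""
    for line in lines:
        if AWS_PATTERN.search(line):
            yield line.rstrip("\n")


# Placeholder lines for illustration; real log entries will look different.
sample = [
    "placeholder entry mentioning com.amazonaws.services.s3\n",
    "placeholder entry about something unrelated\n",
    "placeholder entry mentioning AWS credentials\n",
]
matches = list(aws_lines(sample))
print(len(matches))

# Against a real instance you would pass an open file instead, e.g.
# open("crx-quickstart/logs/error.log") (path is an assumption).
```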

Serving files on S3 bucket through Cloudflare

I wanted to serve S3 bucket files through the Cloudflare network, but encountered some issues. Integration instructions are given here, but they are suitable only for new buckets, since the bucket is required to be named subdomain.domain.com while my bucket is named domain.
Are there any other solutions for using Cloudflare with S3 without copying files from one bucket to another, like setting up some redirects? The problem is that my bucket contains more than 6 million files that take up 200 GB of storage.
Amazon S3 pricing rules are also hard to understand. I struggle to find information how much it costs to transfer information from one bucket to another if they are in the same location.
Thanks for answers.
Unfortunately, Amazon S3 requires that the CNAME matches the bucket name, as you already found out. So basically you'll have to fix the name.
Here https://serverfault.com/questions/349460/how-to-move-files-between-two-s3-buckets-with-minimum-cost you can find how to copy files between buckets with minimum cost. Within the same zone, and with the right tools, you will not incur bandwidth costs, only the duplicate storage costs for the duration and access costs, details in the linked answer.
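For a rough feel of the remaining costs, here is a back-of-the-envelope sketch for the 6 million objects and 200 GB from the question. Both prices are placeholder assumptions, not current AWS pricing; substitute the figures for your region from the S3 pricing page.

```python
def copy_cost_usd(
    objects: int,
    size_gb: float,
    price_per_1000_copy_requests: float,
    storage_price_per_gb_month: float,
    months_duplicated: float,
) -> tuple[float, float]:
    """Return (request cost, duplicate-storage cost) for a same-region copy."""
    request_cost = objects / 1000 * price_per_1000_copy_requests
    storage_cost = size_gb * storage_price_per_gb_month * months_duplicated
    return request_cost, storage_cost


# Placeholder prices (assumptions): 0.005 USD per 1,000 COPY requests and
# 0.023 USD per GB-month, with the data duplicated for one month.
requests_usd, storage_usd = copy_cost_usd(6_000_000, 200, 0.005, 0.023, 1)
print(round(requests_usd, 2), round(storage_usd, 2))
```

With these placeholder prices, the per-request charge for 6 million small files (about 30 USD) outweighs the extra month of storage (about 4.60 USD), which is why the object count matters more than the 200 GB.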
Your link to the Cloudflare docs doesn't seem to be working anymore; this is the correct link: https://support.cloudflare.com/hc/en-us/articles/200168926-How-do-I-use-CloudFlare-with-Amazon-s-S3-Service-

Why do I need Amazon S3 and Cloudfront?

I've read a lot of articles stating that I should be using Amazon S3 in conjunction with the CDN Cloudfront. I'm currently not doing this. I'm simply using Cloudfront with my standard shared hosting package.
Is it OK to use Cloudfront on its own with my standard shared hosting package? Surely there is no added benefit to using S3 also as the files are already located within Cloudfront.
Any enlightenment on this is much appreciated.
Leigh
S3 allows you to do things like static web hosting, with logging and redirection, e.g. www.example.com redirects to example.com. You can then use CloudFront to place your assets as close to the end user as possible ("nearest edge location"). An excellent guide on how to do this is in the AWS docs. Two main things are that S3 supports HTTPS, and that changes to files in S3 are reflected instantly. Because CloudFront is a CDN, you have to manually expire (invalidate) files if you change them, otherwise it could take up to 24 hours for your changes to show up.
http://docs.aws.amazon.com/gettingstarted/latest/swh/website-hosting-intro.html
A quick comparison between the two is given here:
http://www.bucketexplorer.com/documentation/cloudfront--amazon-s3-vs-amazon-cloudfront.html
There is no problem with using CloudFront in front of your own origin server rather than S3.
There are some benefits of using S3:
Data transfer is faster between S3 and CloudFront
Don't need to worry about the stability and maintenance of origin S3 server
Multiple origin regions
There are also benefits if you use your own server:
Cost saving of S3 hosting (this depends on whether you need to pay for your own server)
Easy for customization should you need it
Data storage location for company/country regulation
So it all depends on your specific circumstances, such as how much you pay for your hosting package, whether you need low-level configuration of your origin server, and how sensitive your data is.
I would say that for the majority of small/medium projects, S3 is a perfect place to store data.

amazon S3 bucket level stats

I'd like to know if there's a way for me to have bucket-level stats in Amazon S3.
Basically I want to charge customers for storage and GET requests on my system (which is hosted on S3).
So I created a specific bucket for each client, but I can't seem to get the stats just for a specific bucket.
I see the API lets me
GET Bucket
or
GET Bucket requestPayment
But I just can't find how to get the number of requests issued to said bucket and the total size of the bucket.
Thanks for the help!
Regards
I don't think that what you are trying to achieve is possible using the Amazon API. The GET Bucket response does not contain usage statistics (requests, etc.) other than the timestamp of the latest modification (LastModified).
My suggestion would be that you enable logging in your buckets and perform the analysis that you want from there.
S3 starting page gives you an overview on it:
Amazon S3 also supports logging of requests made against your Amazon S3 resources. You can configure your Amazon S3 bucket to create access log records for the requests made against it. These server access logs capture all requests made against a bucket or the objects in it and can be used for auditing purposes.
And I am sure there is plenty of documentation on that matter.
HTH.
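As a minimal sketch of the log-analysis approach suggested above, the snippet below counts GET operations and bytes sent per bucket from server access log lines. The regular expression covers only the leading fields of the access log format, lines it cannot parse are skipped, and the sample lines are made up for illustration.

```python
import re
from collections import defaultdict

# Leading fields of an S3 server access log record.
LOG_LINE = re.compile(
    r'^(?P<owner>\S+) (?P<bucket>\S+) \[(?P<time>[^\]]+)\] (?P<ip>\S+) '
    r'(?P<requester>\S+) (?P<request_id>\S+) (?P<operation>\S+) (?P<key>\S+) '
    r'"(?P<request_uri>[^"]*)" (?P<status>\S+) (?P<error>\S+) (?P<bytes_sent>\S+)'
)


def summarize(lines):
    """Aggregate GET request counts and bytes sent per bucket."""
    stats = defaultdict(lambda: {"get_requests": 0, "bytes_sent": 0})
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or not m["operation"].startswith("REST.GET."):
            continue
        entry = stats[m["bucket"]]
        entry["get_requests"] += 1
        if m["bytes_sent"].isdigit():  # "-" means no bytes were reported
            entry["bytes_sent"] += int(m["bytes_sent"])
    return dict(stats)


# Made-up sample records; bucket names and IDs are placeholders.
sample_logs = [
    'ownerid client-a [06/Feb/2019:00:00:38 +0000] 192.0.2.3 req REQ1 REST.GET.OBJECT a.jpg "GET /a.jpg HTTP/1.1" 200 - 1024 1024 10 9 "-" "curl" -',
    'ownerid client-a [06/Feb/2019:00:01:00 +0000] 192.0.2.3 req REQ2 REST.GET.OBJECT b.jpg "GET /b.jpg HTTP/1.1" 200 - 2048 2048 12 11 "-" "curl" -',
    'ownerid client-a [06/Feb/2019:00:02:00 +0000] 192.0.2.3 req REQ3 REST.PUT.OBJECT c.jpg "PUT /c.jpg HTTP/1.1" 200 - - 512 20 19 "-" "curl" -',
    'ownerid client-b [06/Feb/2019:00:03:00 +0000] 192.0.2.9 req REQ4 REST.GET.OBJECT d.pdf "GET /d.pdf HTTP/1.1" 200 - 4096 4096 15 14 "-" "curl" -',
]
stats = summarize(sample_logs)
print(stats)
```

This covers the request side of the billing only; the total size of each bucket would have to come from somewhere else, since access logs record traffic rather than stored bytes.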