CloudFront and S3 folder problems

I have created one bucket and one CloudFront distribution.
I followed this guide, https://opsdocks.com/posts/multiple-websites-one-s3/, because I am going to host three different domains from a single bucket.
I have this folder structure in my bucket:
folder1 (domain1)
--- index
--- web1
--- web2
folder2 (domain2)
--- index
--- web1
--- web2
folder3 (domain3)
--- index
--- web1
--- web2
I used the Lambda@Edge function from that resource:
def lambda_handler(event, context):
    # Origin-request handler: route each domain to its own folder in the bucket
    request = event['Records'][0]['cf']['request']
    # Prepend the requested Host header to the origin path, e.g. /domain1
    request['origin']['s3']['path'] = "/" + request['headers']['host'][0]['value']
    # Reset the Host header to the S3 origin domain so S3 accepts the request
    request['headers']['host'][0]['value'] = request['origin']['s3']['domainName']
    return request
It works correctly for paths like folder1/web1.
However, I have tried hard to make the function serve folder1/index when folder1/ is requested, but it downloads an empty file every time.
I have tested many variations of the function, and I do not know why it still behaves the same way.
Thank you
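A minimal sketch of the kind of change usually tried for this (assuming the default document really is an object named index with no extension, as in the listing above, and that the function runs as an origin-request trigger). The empty download often comes from S3 returning the zero-byte "folder" placeholder object, so the idea is to rewrite the URI itself before the request reaches S3; this is a sketch, not a verified fix for this distribution:

def lambda_handler(event, context):
    request = event['Records'][0]['cf']['request']

    # Route the requested domain to its folder in the bucket, as before
    request['origin']['s3']['path'] = "/" + request['headers']['host'][0]['value']
    request['headers']['host'][0]['value'] = request['origin']['s3']['domainName']

    # For "directory" requests, ask S3 for the default document instead of the
    # zero-byte folder key ('index' matches the object names listed above)
    if request['uri'].endswith('/'):
        request['uri'] += 'index'

    return request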

rclone failing with "AccessControlListNotSupported" on cross-account copy -- AWS CLI Works

Quick summary, now that I think I see the problem:
rclone seems to always send an ACL with a copy request, with a default value of "private". This fails on a (2022) default AWS bucket, which (correctly) assumes "no ACLs". I need a way to suppress sending the ACL in rclone.
Detail
I assume an IAM role and attempt an rclone copy from a data-center Linux box to a private, no-ACL bucket created with default options in the same account as the role I assume. It succeeds.
I then configure a private, no-ACL bucket with default options in a different account from the role I assume. I attach a bucket policy to the cross-account bucket that trusts the role I assume. The role I assume has global permissions to write to S3 buckets anywhere.
I test the cross-account bucket policy by using the AWS CLI to copy the same Linux box source file to the cross-account bucket. The copy works fine with the AWS CLI, suggesting that the connection and access permissions to the cross-account bucket are fine. DataSync (another AWS service) works fine too.
Problem: an rclone copy fails with the AccessControlListNotSupported error below.
status code: 400, request id: XXXX, host id: ZZZZ
2022/08/26 16:47:29 ERROR : bigmovie: Failed to copy: AccessControlListNotSupported: The bucket does not allow ACLs
status code: 400, request id: XXXX, host id: YYYY
And of course it is true that the bucket does not support ACLs ... which is the desired best practice and the AWS default for new buckets. However, the bucket does support a bucket policy that trusts my assumed role, and that role and bucket policy pair works just fine with the AWS CLI copy across accounts, but not with the rclone copy.
Given that AWS CLI copies just fine cross account to this bucket, am I missing one of rclone's numerous flags to get the same behaviour? Anyone think of another possible cause?
I tested older, current, and beta rclone versions; all behave the same.
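As to why the AWS CLI succeeds where rclone fails: the CLI and the SDKs send no x-amz-acl header unless explicitly asked to, so an ACL-disabled (bucket-owner-enforced) bucket accepts the upload. A minimal sketch of that working path using boto3 (the bucket and file names are placeholders taken from the failing command below):

import boto3

s3 = boto3.client("s3")

# No ACL header is sent unless explicitly requested via ExtraArgs={"ACL": ...},
# which is why the same cross-account copy is accepted here but rejected when
# a default "private" ACL is added to the request.
s3.upload_file("bigmovie", "SOMEBUCKET", "bigmovie")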
Version Info
os/version: centos 7.9.2009 (64 bit)
os/kernel: 3.10.0-1160.71.1.el7.x86_64 (x86_64)
os/type: linux
os/arch: amd64
go/version: go1.18.5
go/linking: static
go/tags: none
Failing Command
$ rclone copy bigmovie s3-standard:SOMEBUCKET/bigmovie -vv
Failing RClone Config
type = s3
provider = AWS
env_auth = true
region = us-east-1
endpoint = https://bucket.vpce-REDACTED.s3.us-east-1.vpce.amazonaws.com
#server_side_encryption = AES256
storage_class = STANDARD
#bucket_acl = private
#acl = private
Note that I've tested all permutations of the commented-out lines with similar results.
Note that I have tested with and without the private endpoint listed, with the same results for both the AWS CLI and rclone, i.e. the CLI works and rclone fails.
A log from the command with the -vv flag
2022/08/25 17:25:55 DEBUG : Using config file from "PERSONALSTUFF/rclone.conf"
2022/08/25 17:25:55 DEBUG : rclone: Version "v1.55.1" starting with parameters ["/usr/local/rclone/1.55/bin/rclone" "copy" "bigmovie" "s3-standard:SOMEBUCKET" "-vv"]
2022/08/25 17:25:55 DEBUG : Creating backend with remote "bigmovie"
2022/08/25 17:25:55 DEBUG : fs cache: adding new entry for parent of "bigmovie", "MYDIRECTORY/testbed"
2022/08/25 17:25:55 DEBUG : Creating backend with remote "s3-standard:SOMEBUCKET/bigmovie"
2022/08/25 17:25:55 DEBUG : bigmovie: Need to transfer - File not found at Destination
2022/08/25 17:25:55 ERROR : bigmovie: Failed to copy: s3 upload: 400 Bad Request: <?xml version="1.0" encoding="UTF-8"?>
<Error><Code>AccessControlListNotSupported</Code><Message>The bucket does not allow ACLs</Message><RequestId>8DW1MQSHEN6A0CFA</RequestId><HostId>d3Rlnx/XezTB7OC79qr4QQuwjgR+h2VYj4LCZWLGTny9YAy985be5HsFgHcqX4azSDhDXefLE+U=</HostId></Error>
2022/08/25 17:25:55 ERROR : Attempt 1/3 failed with 1 errors and: s3 upload: 400 Bad Request: <?xml version="1.0" encoding="UTF-8"?>

AWS S3 lifecycle configuration filters to disable the Glacier storage class for some folders in the same bucket

I have a bucket that contains many folders. I initially applied a lifecycle rule to the whole bucket, and now I would like to exclude some folders (prefixes such as folderA and FolderB/ContentA) within the bucket.
Bucket structure: s3://mybucket/
- folderA
- FolderB/ContentA
LifecycleConfiguration:
  Rules:
    - Prefix: organization_excluded/
      Status: Disabled
      Transitions:
        - StorageClass: GLACIER
          TransitionInDays: 1
    - Prefix: dbo/organization_excluded/
      Status: Disabled
      Transitions:
        - StorageClass: GLACIER
          TransitionInDays: 1
    - Status: Enabled
      Transitions:
        - StorageClass: GLACIER
          TransitionInDays: 1
With the above rules, after one day the entire bucket's content transitioned to the Glacier storage class, but my requirement is that prefixes such as folderA and FolderB/ContentA should not transition to Glacier, which is why I disabled those rules. Is this happening because TransitionInDays is set to 1? If I remove the TransitionInDays parameter from the prefix rules, the stack fails. Is there something I have missed here?
Status: Disabled only means that Amazon S3 does not take any action on objects matched by that rule (check the S3 lifecycle documentation); it does not shield those prefixes from other rules. So basically those disabled rules are useless. At the moment there is no way to directly exclude a prefix; one workaround is to scope the enabled rule to only the prefixes that should transition, as sketched below.
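A sketch of that workaround (the logs/ prefix is hypothetical; adjust it to your layout): drop the bucket-wide rule and enable transitions only for the prefixes that should go to Glacier, so that folderA/ and FolderB/ContentA/ are simply not matched by any rule and stay in their current storage class.

LifecycleConfiguration:
  Rules:
    - Prefix: logs/              # hypothetical prefix that SHOULD transition
      Status: Enabled
      Transitions:
        - StorageClass: GLACIER
          TransitionInDays: 1
    # No rule covers folderA/ or FolderB/ContentA/, so those objects are not transitioned.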

CloudFront distribution fails to load subdirectories in one stack but works in another

I'm using CloudFormation (via the Serverless Framework) to deploy static sites to S3 and set up a CloudFront distribution that is aliased from a Route 53 domain.
I have this working for two domains, each of which is a new domain created in Route 53. I am trying the same setup with an older domain that I am transferring to Route 53 from an existing registrar.
The CloudFront distribution for this new domain fails to load subdirectories, i.e. https://[mydistid].cloudfront.net/sub/dir/ does not load the resource at https://[mydistid].cloudfront.net/sub/dir/index.html.
There is a common gotcha covered in other SO questions: you must specify the S3 bucket as a custom origin in order for CloudFront to apply the default root object to subdirectories.
I have done this, as can be seen from my serverless.yml CloudFrontDistribution resource:
XxxxCloudFrontDistribution:
  Type: AWS::CloudFront::Distribution
  Properties:
    DistributionConfig:
      Aliases:
        - ${self:provider.environment.CUSTOM_DOMAIN}
      Origins:
        - DomainName: ${self:provider.environment.BUCKET_NAME}.s3.amazonaws.com
          Id: Xxxx
          CustomOriginConfig:
            HTTPPort: 80
            HTTPSPort: 443
            OriginProtocolPolicy: https-only
      Enabled: 'true'
      DefaultRootObject: index.html
      CustomErrorResponses:
        - ErrorCode: 404
          ResponseCode: 200
          ResponsePagePath: /error.html
      DefaultCacheBehavior:
        AllowedMethods:
          - DELETE
          - GET
          - HEAD
          - OPTIONS
          - PATCH
          - POST
          - PUT
        TargetOriginId: Xxxx
        Compress: 'true'
        ForwardedValues:
          QueryString: 'false'
          Cookies:
            Forward: none
        ViewerProtocolPolicy: redirect-to-https
      ViewerCertificate:
        AcmCertificateArn: ${self:provider.environment.ACM_CERT_ARN}
        SslSupportMethod: sni-only
This results in a CloudFront distribution with the S3 bucket as a 'Custom Origin' in AWS.
However, when accessed, subdirectories route to the error page rather than to the default root object in that directory.
What is extremely odd is that this uses the same config as another stack that is fine. The only difference I can see (so far) is that the working stack has a Route 53-created domain, whereas this one uses a domain that originated from another registrar, so I'll see what happens once the name-server migration completes. I'm skeptical this will resolve the issue, though, as the CloudFront distribution shouldn't be affected by the Route 53 domain status.
I have both stacks working now. The problem was the use of the S3 REST API URL:
${self:provider.environment.BUCKET_NAME}.s3.amazonaws.com
Changing both to the S3 website URL works:
${self:provider.environment.BUCKET_NAME}.s3-website-us-east-1.amazonaws.com
I have no explanation as to why the former URL worked with one stack but not the other.
The other change I needed to make was to set the OriginProtocolPolicy of CustomOriginConfig to http-only, because S3 website endpoints don't support HTTPS.
Here is my updated CloudFormation config:
XxxxCloudFrontDistribution:
  Type: AWS::CloudFront::Distribution
  Properties:
    DistributionConfig:
      Aliases:
        - ${self:provider.environment.CUSTOM_DOMAIN}
      Origins:
        - DomainName: ${self:provider.environment.BUCKET_NAME}.s3-website-us-east-1.amazonaws.com
          Id: Xxxx
          CustomOriginConfig:
            HTTPPort: 80
            OriginProtocolPolicy: http-only
      Enabled: 'true'
      DefaultRootObject: index.html
      CustomErrorResponses:
        - ErrorCode: 404
          ResponseCode: 200
          ResponsePagePath: /error.html
      DefaultCacheBehavior:
        AllowedMethods:
          - DELETE
          - GET
          - HEAD
          - OPTIONS
          - PATCH
          - POST
          - PUT
        TargetOriginId: Xxxx
        Compress: 'true'
        ForwardedValues:
          QueryString: 'false'
          Cookies:
            Forward: none
        ViewerProtocolPolicy: redirect-to-https
      ViewerCertificate:
        AcmCertificateArn: ${self:provider.environment.ACM_CERT_ARN}
        SslSupportMethod: sni-only
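One thing the website endpoint relies on (not shown above, so this is a sketch with a hypothetical resource name): static website hosting must be enabled on the bucket, since it is the website endpoint's index-document handling, not CloudFront's DefaultRootObject, that serves index.html for subdirectory paths.

XxxxWebsiteBucket:
  Type: AWS::S3::Bucket
  Properties:
    BucketName: ${self:provider.environment.BUCKET_NAME}
    WebsiteConfiguration:
      IndexDocument: index.html
      ErrorDocument: error.html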

Fluentd grep + output logs

I have a service deployed into a Kubernetes cluster, with Fluentd set up as a DaemonSet, and I need to split the logs it receives so they end up in different S3 buckets.
One bucket would be for all logs generated by Kubernetes and our debug/error-handling code, and another bucket would be for a subset of logs generated by the service, parsed by a structured logger and identified by a specific field in the JSON. Think of it as one bucket for machine state and errors, and another for "user_id created resource image_id at ts" descriptions of user actions.
The service itself is unaware of Fluentd, so I cannot manually set the tag for logs based on which S3 bucket I want them to end up in.
Now, the fluentd.conf I use sets up S3 like this:
<match **>
  # docs: https://docs.fluentd.org/v0.12/articles/out_s3
  # note: this configuration relies on the nodes having an IAM instance profile with access to your S3 bucket
  type copy
  <store>
    type s3
    log_level info
    s3_bucket "#{ENV['S3_BUCKET_NAME']}"
    s3_region "#{ENV['S3_BUCKET_REGION']}"
    aws_key_id "#{ENV['AWS_ACCESS_KEY_ID']}"
    aws_sec_key "#{ENV['AWS_SECRET_ACCESS_KEY']}"
    s3_object_key_format %{path}%{time_slice}/cluster-log-%{index}.%{file_extension}
    format json
    time_slice_format %Y/%m/%d
    time_slice_wait 1m
    flush_interval 10m
    utc
    include_time_key true
    include_tag_key true
    buffer_chunk_limit 128m
    buffer_path /var/log/fluentd-buffers/s3.buffer
  </store>
  <store>
    ...
  </store>
</match>
So, what I would like to do is have something like a grep plugin:
<store>
  type grep
  <regexp>
    key type
    pattern client-action
  </regexp>
</store>
which would send those logs to a separate S3 bucket from the one defined for all logs.
I am assuming that the user-action logs are generated by your service and that system logs include the docker, kubernetes and systemd logs from the nodes.
I found your example yaml file at the official fluent github repo.
If you check out the folder in that link, you'll see two more files called kubernetes.conf and systemd.conf. These files have source sections where they tag their data.
The match section in fluent.conf matches **, i.e. all logs, and sends them to S3. You want to split your log types here.
Your container logs are being tagged kubernetes.* in kubernetes.conf on this line.
so your above config turns into
<match kubernetes.**>
  type s3
  # user-action log s3 bucket
  ...
</match>
and for system logs, match every other tag except kubernetes.** (see the sketch below).
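A sketch of how the two match sections could then be laid out (only the routing-relevant lines are shown, and S3_USER_ACTION_BUCKET is a hypothetical environment variable for the second bucket). Fluentd evaluates <match> blocks in order, so the kubernetes.** block has to come before the catch-all:

<match kubernetes.**>
  # container logs from the service: user-action bucket
  type copy
  <store>
    type s3
    s3_bucket "#{ENV['S3_USER_ACTION_BUCKET']}"
    ...
  </store>
</match>

<match **>
  # everything not tagged kubernetes.** (systemd, docker, etc.): machine-state bucket
  type copy
  <store>
    type s3
    s3_bucket "#{ENV['S3_BUCKET_NAME']}"
    ...
  </store>
</match>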

Amazon S3 + Fog warning: connecting to the matching region will be more performant

I get the following warning while querying Amazon S3 via the Fog gem:
[WARNING] fog: followed redirect to my-bucket.s3-external-3.amazonaws.com, connecting to the matching region will be more performant
How exactly do I "connect to the matching region"?
Set the :region option in the Fog connection parameters to the name of the region in which your bucket exists.
For example, I have a bucket called "bucket-a" in region "eu-west-1" and my s3 key and secret are in variables s3_key and s3_secret respectively.
I can connect to this region directly by opening my Fog connection as follows:
s3 = Fog::Storage.new(provider: 'AWS', aws_access_key_id: s3_key, aws_secret_access_key: s3_secret, region: 'eu-west-1')
And now when I list the contents, no region warning is issued:
s3.directories.get('bucket-a').files
If you want to do this for all your buckets, rather than on a bucket-by-bucket basis, you can set the following:
Fog::Storage::AWS::DEFAULT_REGION = 'eu-west-1'