I have a service deployed in a Kubernetes cluster, with fluentd running as a DaemonSet, and I need to split the logs it receives so they end up in different S3 buckets.
One bucket would be for all logs generated by Kubernetes and our debug/error-handling code, and another bucket would be for a subset of logs generated by the service, parsed by a structured logger and identified by a specific field in the JSON. Think of it as: one bucket is for machine state and errors, the other is for "user_id created resource image_id at ts" descriptions of user actions.
The service itself is ignorant of fluentd, so I cannot manually set the tag for logs based on which S3 bucket I want them to end up in.
Now, the fluentd.conf I use sets up the S3 output like this:
<match **>
# docs: https://docs.fluentd.org/v0.12/articles/out_s3
# note: this configuration relies on the nodes having an IAM instance profile with access to your S3 bucket
type copy
<store>
type s3
log_level info
s3_bucket "#{ENV['S3_BUCKET_NAME']}"
s3_region "#{ENV['S3_BUCKET_REGION']}"
aws_key_id "#{ENV['AWS_ACCESS_KEY_ID']}"
aws_sec_key "#{ENV['AWS_SECRET_ACCESS_KEY']}"
s3_object_key_format %{path}%{time_slice}/cluster-log-%{index}.%{file_extension}
format json
time_slice_format %Y/%m/%d
time_slice_wait 1m
flush_interval 10m
utc
include_time_key true
include_tag_key true
buffer_chunk_limit 128m
buffer_path /var/log/fluentd-buffers/s3.buffer
</store>
<store>
...
</store>
</match>
So what I would like to do is have something like a grep plugin:
<store>
type grep
<regexp>
key type
pattern client-action
</regexp>
</store>
which would send the matching logs into a separate S3 bucket from the one defined for all logs.
I am assuming that user-action logs are generated by your service and that system logs include the docker, kubernetes and systemd logs from the nodes.
I found your example yaml file in the official fluent GitHub repo.
If you check out the folder in that link, you'll see two more files called kubernetes.conf and systemd.conf. These files have source sections where they tag their data.
The match section in fluent.conf is matching **, i.e. all logs, and sending them to S3. You want to split your log types here.
Your container logs are being tagged kubernetes.* in kubernetes.conf on this line.
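For reference, that source section looks roughly like this in the daemonset's kubernetes.conf (the exact options vary between image versions, so treat this as a sketch rather than a verbatim copy):
<source>
  @type tail
  path /var/log/containers/*.log
  pos_file /var/log/fluentd-containers.log.pos
  tag kubernetes.*
  read_from_head true
  <parse>
    @type json
  </parse>
</source>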
So your above config turns into:
<match kubernetes.*>
  type s3
  # user-action log S3 bucket
  ...
</match>
and for system logs, match every other tag except kubernetes.* (see the sketch below for one way to combine this with your grep idea).
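One possible arrangement of the grep idea, sketched rather than prescribed: it assumes a fluentd recent enough to support labels, the relabel output and the <regexp> form of the grep filter (which your own snippet already uses), and S3_USER_ACTION_BUCKET_NAME is a hypothetical environment variable for the second bucket.
# fan every record out to two pipelines
<match **>
  type copy
  <store>
    type relabel
    @label @ALL_LOGS
  </store>
  <store>
    type relabel
    @label @USER_ACTIONS
  </store>
</match>

# keep only records whose "type" field is "client-action"
<label @USER_ACTIONS>
  <filter **>
    type grep
    <regexp>
      key type
      pattern ^client-action$
    </regexp>
  </filter>
  <match **>
    type s3
    s3_bucket "#{ENV['S3_USER_ACTION_BUCKET_NAME']}"
    # same region/credential/buffer settings as your existing <store>,
    # but with its own buffer_path
  </match>
</label>

# everything (including the user-action records) still reaches the original bucket
<label @ALL_LOGS>
  <match **>
    type s3
    s3_bucket "#{ENV['S3_BUCKET_NAME']}"
    # rest of your existing <store> settings
  </match>
</label>
If only container logs can ever carry the client-action field, you can narrow the first pattern from ** to kubernetes.** and keep a separate plain <match **> for the remaining system logs, as described above.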
Quick Summary now that I think I see the problem:
rclone seems to always send ACL with a copy request, with a default value of "private". This will fail in a (2022) default AWS bucket which (correctly) assumes "No ACL". Need a way to suppress ACL send in rclone.
Detail
I assume an IAM role and attempt an rclone copy from a data-center Linux box to a default-options, private, no-ACL bucket in the same account as the role I assume. It succeeds.
I then configure a default-options, private, no-ACL bucket in a different account from the role I assume. I attach a bucket policy to the cross-account bucket that trusts the role I assume. The role I assume has global permissions to write to S3 buckets anywhere.
I test the cross-account bucket policy by using the AWS CLI to copy the same Linux-box source file to the cross-account bucket. The copy works fine with the AWS CLI, suggesting that the connection and access permissions to the cross-account bucket are fine. DataSync (another AWS service) works fine too.
Problem: an rclone copy fails with the AccessControlListNotSupported error below.
status code: 400, request id: XXXX, host id: ZZZZ
2022/08/26 16:47:29 ERROR : bigmovie: Failed to copy: AccessControlListNotSupported: The bucket does not allow ACLs
status code: 400, request id: XXXX, host id: YYYY
And of course it is true that the bucket does not support ACL ... which is the desired best practice and AWS default for new buckets. However the bucket does support a bucket policy that trusts my assumed role, and that role and bucket policy pair works just fine with the AWS CLI copy across account, but not with the rclone copy.
Given that AWS CLI copies just fine cross account to this bucket, am I missing one of rclone's numerous flags to get the same behaviour? Anyone think of another possible cause?
Tested older, current and beta rclone versions, all behave the same
Version Info
os/version: centos 7.9.2009 (64 bit)
os/kernel: 3.10.0-1160.71.1.el7.x86_64 (x86_64)
os/type: linux
os/arch: amd64
go/version: go1.18.5
go/linking: static
go/tags: none
Failing Command
$ rclone copy bigmovie s3-standard:SOMEBUCKET/bigmovie -vv
Failing RClone Config
type = s3
provider = AWS
env_auth = true
region = us-east-1
endpoint = https://bucket.vpce-REDACTED.s3.us-east-1.vpce.amazonaws.com
#server_side_encryption = AES256
storage_class = STANDARD
#bucket_acl = private
#acl = private
Note that I've tested all permutations of the commented-out lines, with similar results.
Note that I have tested with and without the private endpoint listed, with the same results for both the AWS CLI and rclone, i.e. the CLI works and rclone fails.
A log from the command with the -vv flag
2022/08/25 17:25:55 DEBUG : Using config file from "PERSONALSTUFF/rclone.conf"
2022/08/25 17:25:55 DEBUG : rclone: Version "v1.55.1" starting with parameters ["/usr/local/rclone/1.55/bin/rclone" "copy" "bigmovie" "s3-standard:SOMEBUCKET" "-vv"]
2022/08/25 17:25:55 DEBUG : Creating backend with remote "bigmovie"
2022/08/25 17:25:55 DEBUG : fs cache: adding new entry for parent of "bigmovie", "MYDIRECTORY/testbed"
2022/08/25 17:25:55 DEBUG : Creating backend with remote "s3-standard:SOMEBUCKET/bigmovie"
2022/08/25 17:25:55 DEBUG : bigmovie: Need to transfer - File not found at Destination
2022/08/25 17:25:55 ERROR : bigmovie: Failed to copy: s3 upload: 400 Bad Request: <?xml version="1.0" encoding="UTF-8"?>
<Error><Code>AccessControlListNotSupported</Code><Message>The bucket does not allow ACLs</Message><RequestId>8DW1MQSHEN6A0CFA</RequestId><HostId>d3Rlnx/XezTB7OC79qr4QQuwjgR+h2VYj4LCZWLGTny9YAy985be5HsFgHcqX4azSDhDXefLE+U=</HostId></Error>
2022/08/25 17:25:55 ERROR : Attempt 1/3 failed with 1 errors and: s3 upload: 400 Bad Request: <?xml version="1.0" encoding="UTF-8"?>
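A direction worth testing rather than a confirmed fix: the current rclone S3 documentation states that when acl is set to an empty string, rclone does not add an X-Amz-Acl header at all, which is what a bucket with ACLs disabled requires. Whether the specific versions you tested honour the empty setting is an assumption to verify against their changelogs. The remote would then look something like:
[s3-standard]
type = s3
provider = AWS
env_auth = true
region = us-east-1
endpoint = https://bucket.vpce-REDACTED.s3.us-east-1.vpce.amazonaws.com
storage_class = STANDARD
# empty values (rather than "private") so no X-Amz-Acl header is sent
acl =
bucket_acl =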
I am currently setting up my AWS S3 buckets for different environments so I can have data in dev, tqa, stg, and prd. The bucket in dev is named s3.dev.kafka.sink, while in tqa it is named s3.tqa.kafka.sink, each associated with its own environment. The documentation on the Kafka Connect website doesn't specify how to set this per environment, so I did it the following way; however, I keep getting errors that the bucket name is not valid.
I put it in the secret YAML file:
apiVersion: kubernetes-client.io/v1
kind: ExternalSecret
metadata:
  name: kafka-sink-s3-secret
  namespace: namespace
spec:
  backendType: secretManager
  data:
    - key: s3.tqa.kafka.sink
      name: bucket_name
      property: bucket_name
While in the deployment file:
env:
  - name: bucket_name
    valueFrom:
      secretKeyRef:
        name: kafka-sink-s3-secret
        key: bucket_name
And I will specify the bucket name in the config:
"s3.bucket.name":"'"$bucket_name"'"
But it fails to deploy. Any idea how I can specify it as s3.{{ENV}}.kafka.sink so that the correct bucket name is used in each environment in AWS?
Out of the box, Kafka Connect doesn't have any way to access environment variables other than those defined by the AWS SDK (the keys and profile, at least).
Sounds like you will need to use a ConfigProvider from the Kafka Connect API.
Here's one example on Github, which you'd need to compile and load into your Docker images - https://github.com/giogt/kafka-env-config-provider
Inside the connector properties, use it like this:
"s3.bucket.name": "${env:ENVIRONMENT_VARIABLE_NAME}"
You should be able to use Helm to better separate/template out the full bucket name within the secret/deployment resource definition
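As a rough sketch of the wiring (the provider alias "env" is arbitrary; the class shown is the built-in EnvVarConfigProvider that ships with newer Kafka releases (3.5+), so if you use the GitHub project above instead, take the class name from its README):
# Kafka Connect worker config (e.g. connect-distributed.properties)
config.providers=env
config.providers.env.class=org.apache.kafka.common.config.provider.EnvVarConfigProvider
The connector config can then reference the variable your Deployment injects:
"s3.bucket.name": "${env:bucket_name}"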
I recently implemented AWS Signature version 4 using the REST API. This is verified by an extensive regression test working perfectly.
The problem I'm experiencing is that the regression test succeeds when run against a bucket residing in the eu-central-1 region, but consistently fails with an Access Denied error for buckets residing in us-east-1 or us-west-2.
Here are snippets from successful and failed attempts.
eu-central-1 : successful
HTTP request:
GET
/

host:s3.eu-central-1.amazonaws.com
x-amz-content-sha256:e3b0...b855
x-amz-date:Wed, 25 May 2016 03:13:21 +0000

host;x-amz-content-sha256;x-amz-date
e3b0...b855
Signed string:
AWS4-HMAC-SHA256
Credential=AKIAJZN7UY6XHIZPWIKQ/20160525/eu-central-1/s3/aws4_request,
SignedHeaders=host;x-amz-content-sha256;x-amz-date,
Signature=cf5f...4dc8
Server response:
<?xml version="1.0" encoding="UTF-8"?>
<ListAllMyBucketsResult
xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
<Owner>
<ID>100a...a575</ID>
</Owner>
<Buckets>
<Bucket>
. . .
</Bucket>
</Buckets>
</ListAllMyBucketsResult>
us-east-1 : failed
HTTP request:
GET
/

host:s3.us-east-1.amazonaws.com
x-amz-content-sha256:e3b0...b855
x-amz-date:Wed, 25 May 2016 03:02:27 +0000

host;x-amz-content-sha256;x-amz-date
e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
Signed string:
AWS4-HMAC-SHA256
Credential=AKIAJZN7UY6XHIZPWIKQ/20160525/us-east-1/s3/aws4_request,
SignedHeaders=host;x-amz-content-sha256;x-amz-date,
Signature=01e97...4d00
Server response:
<?xml version="1.0" encoding="UTF-8"?>
<Error>
<Code>AccessDenied</Code>
<Message>Access Denied</Message>
<RequestId>92EEF2A86ECA88EF</RequestId>
<HostId>i3wTU6OzBrlX89xR4KnnezBx1Tb2IGN2wtgPJMRtKLjHxF/B6VdCQqPz1279J7e5</HostId>
</Error>
us-west-2 : failed
HTTP request:
GET
/

host:s3.us-west-2.amazonaws.com
x-amz-content-sha256:e3b0...b855
x-amz-date:Wed, 25 May 2016 07:04:47 +0000

host;x-amz-content-sha256;x-amz-date
e3b0...b855
Signed string:
AWS4-HMAC-SHA256
Credential=AKIAJZN7UY6XHIZPWIKQ/20160525/us-west-2/s3/aws4_request,
SignedHeaders=host;x-amz-content-sha256;x-amz-date,
Signature=cf70...36b9
Server response:
<?xml version="1.0" encoding="UTF-8"?>
<Error>
<Code>AccessDenied</Code>
<Message>Access Denied</Message>
<RequestId>DB143DBF0F316EB8</RequestId>
<HostId>5hWJ0AHM466QcT+BK4UaEFpqXFNaJFEuAPlN/ZZPBhL+NDYBoGaySRkXQ3BRdyfy9PBDuSb0oHA=</HostId>
</Error>
Attempts made to date include:
I found references (like here) saying that when using US Standard (i.e., us-east-1) the REST endpoint should not include "us-east-1". I have not yet found this stated officially. I therefore created a us-west-2 bucket, in the hope that the REST endpoint needs to contain "us-west-2", but that also fails.
I searched on Google and StackOverflow for possible reasons for "Access Denied", which led me to adding a bucket policy that gives permissions to all -- to no avail.
The permissions of the EU and US accounts in the AWS console look the same, so no hint there, yet.
I added logging to the buckets in the hope of seeing a failure entry, but nothing is logged until authentication is completed.
Does anyone have an idea why AWS v4 authentication consistently succeeds for an eu-central-1 bucket, but just as consistently fails for us-east-1 and us-west-2 buckets?
Here's your issue.
For unknown reasons,¹ eu-central-1 is an oddball in S3. The REST endpoint works with two variations in hostname: bucket.s3.eu-central-1.amazonaws.com or bucket.s3-eu-central-1.amazonaws.com.
The difference is the dot or dash after s3.
All other regions (as of now) except us-east-1 and ap-northeast-2 (which is just like eu-central-1) work only with the dash after s3, e.g. bucket.s3-us-west-2.amazonaws.com... not with a dot.
And us-east-1 expects either bucket.s3.amazonaws.com or bucket.s3-external-1.amazonaws.com.
And finally, any region will work with just bucket.s3.amazonaws.com within a few minutes after the original creation of a bucket, because the DNS is integrated with the bucket location database and automatically routes requests to the right place, for each bucket.
But note that when you sign the requests, you always use the actual region name in the signing algorithm itself -- not the endpoint -- as you appear to already be doing.
http://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region
¹I'll speculate that this convention is actually the "new normal" for new regions -- it's more consistent with other AWS services. S3 is one of the oldest, so it makes sense that legacy design decisions are more likely to exist, as seems to be the case, here.
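To make the endpoint-versus-signing-scope distinction concrete, here is the shape of it for a hypothetical bucket named mybucket (bucket name and key ID are placeholders; the endpoint conventions are those described above, as they stood at the time):
# us-west-2: dash form in the hostname, real region name in the credential scope
Host: mybucket.s3-us-west-2.amazonaws.com
Credential=AKIAEXAMPLE/20160525/us-west-2/s3/aws4_request

# us-east-1: no region in the hostname, but "us-east-1" still appears in the scope
Host: mybucket.s3.amazonaws.com
Credential=AKIAEXAMPLE/20160525/us-east-1/s3/aws4_request

# eu-central-1: dot or dash both work in the hostname
Host: mybucket.s3.eu-central-1.amazonaws.com
Credential=AKIAEXAMPLE/20160525/eu-central-1/s3/aws4_request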
I have been trying to get fluentd's out_s3 plugin working for the past 2 days, but I am unable to see any logs in my S3 bucket.
This is my current config:
<match web.all>
type s3
aws_key_id ......
aws_sec_key ......
s3_bucket ......
path logs/
buffer_path /var/log/td-agent/s3
s3_region ap-southeast-1
time_slice_format %Y%m%d%H%M
time_slice_wait 1m
utc
buffer_chunk_limit 256m
</match>
If I try to match 'web.all' and store it to a file, it works properly:
<match web.all>
type file
path /var/log/td-agent/web-all.log
</match>
For some reason, and not knowing how to debug it, I am unable to get the logs onto S3. Any direction on how to go about debugging this?
EDIT
2015-10-18 22:46:51 -0400 [error]: unexpected error error_class=RuntimeError error=#<RuntimeError: can't call S3 API. Please check your aws_key_id / aws_sec_key or s3_region configuration. error = #<AWS::S3::Errors::InvalidAccessKeyId: The AWS Access Key Id you provided does not exist in our records.>>
But the key provided is valid. I can see the access key's "last used time" update every time I restart td-agent.
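One way to narrow this down (a sketch; it assumes the AWS CLI is installed on the same node, and the bucket name below is a placeholder) is to take td-agent out of the picture and exercise the very same key pair directly:
# use the same credentials you put in the td-agent config
export AWS_ACCESS_KEY_ID=......
export AWS_SECRET_ACCESS_KEY=......
# if this also fails with InvalidAccessKeyId, the problem is the key / IAM user, not fluentd
aws s3 ls s3://YOUR-BUCKET --region ap-southeast-1
If the CLI succeeds with the same pair, re-check the plugin parameter spellings (aws_key_id / aws_sec_key) and that td-agent was actually restarted with the edited config.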
I have created an AMI image and installed Hadoop from the Cloudera CDH2 build. I configured my core-site.xml like so:
<property>
<name>fs.default.name</name>
<value>s3://<BUCKET NAME>/</value>
</property>
<property>
<name>fs.s3.awsAccessKeyId</name>
<value><ACCESS ID></value>
</property>
<property>
<name>fs.s3.awsSecretAccessKey</name>
<value><SECRET KEY></value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/var/lib/hadoop-0.20/cache/${user.name}</value>
</property>
But I get the following error message in the namenode log when I start up the Hadoop daemons:
2010-11-03 23:45:21,680 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.lang.IllegalArgumentException: Invalid URI for NameNode address (check fs.default.name): s3://<BUCKET NAME>/ is not of scheme 'hdfs'.
at org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:177)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:198)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:306)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1006)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1015)
2010-11-03 23:45:21,691 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
However, I am able to execute hadoop commands from the command line like so:
hadoop fs -put sun-javadb-common-10.5.3-0.2.i386.rpm s3://<BUCKET NAME>/
hadoop fs -ls s3://poc-jwt-ci/
Found 3 items
drwxrwxrwx - 0 1970-01-01 00:00 /
-rwxrwxrwx 1 16307 1970-01-01 00:00 /sun-javadb-common-10.5.3-0.2.i386.rpm
drwxrwxrwx - 0 1970-01-01 00:00 /var
You will notice there are / and /var folders in the bucket. I ran hadoop namenode -format when I first saw this error, then restarted all services, but I still receive the weird Invalid URI for NameNode address (check fs.default.name): s3://<BUCKET NAME>/ is not of scheme 'hdfs' error.
I also notice that the file system created looks like this:
hadoop fs -ls s3://<BUCKET NAME>/var/lib/hadoop-0.20/cache/hadoop/mapred/system
Found 1 items
-rwxrwxrwx 1 4 1970-01-01 00:00 /var/lib/hadoop0.20/cache/hadoop/mapred/system/jobtracker.info
Any ideas of what's going on?
First, I suggest you just use Amazon Elastic MapReduce. There is zero configuration required on your end. EMR also has a few internal optimizations and monitoring features that work to your benefit.
Second, do not use s3: as your default FS. S3 is too slow to be used for storing intermediate data between jobs (a typical unit of work in Hadoop is a dozen to dozens of MR jobs). It also stores the data in a 'proprietary' format (blocks, etc.), so external apps can't effectively touch the data in S3.
Note that s3: in EMR is not the same as s3: in the standard Hadoop distro. The Amazon folks actually alias s3: to s3n: (s3n: is just raw/native S3 access).
You could also use Apache Whirr for this workflow like this:
Start by downloading the latest release (0.7.0 at this time) from http://www.apache.org/dyn/closer.cgi/whirr/
Extract the archive and try to run ./bin/whirr version. You need to have Java installed for this to work.
Make your Amazon AWS credentials available as environment variables:
export AWS_ACCESS_KEY_ID=...
export AWS_SECRET_ACCESS_KEY=...
Update the Hadoop EC2 config to match your needs by editing recipes/hadoop-ec2.properties. Check the Configuration Guide for more info.
Start a Hadoop cluster by running:
./bin/whirr launch-cluster --config recipes/hadoop-ec2.properties
You can see verbose logging output by doing tail -f whirr.log
Now you can login to your cluster and do your work.
./bin/whirr list-cluster --config recipes/hadoop-ec2.properties
ssh namenode-ip
start jobs as needed, or copy data from/to S3 using distcp (see the sketch after this list)
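As an illustration of that distcp step (bucket name and paths are placeholders, and the s3n URI with embedded credentials matches the Hadoop generation discussed in this thread):
# copy input data from S3 into the cluster's HDFS
hadoop distcp s3n://<ACCESS ID>:<SECRET KEY>@<BUCKET NAME>/input hdfs:///user/hadoop/input
# and copy results back to S3 when the jobs finish
hadoop distcp hdfs:///user/hadoop/output s3n://<ACCESS ID>:<SECRET KEY>@<BUCKET NAME>/output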
For more explanations you should read the Quick Start Guide and the 5 minutes guide.
Disclaimer: I'm one of the committers.
I think you should not execute bin/hadoop namenode -format, because it is used to format HDFS. In later versions, Hadoop has moved these functions into a separate script called bin/hdfs. After you set the configuration parameters in core-site.xml and the other configuration files, you can use S3 as the underlying file system directly.
Use
fs.defaultFS = s3n://awsAccessKeyId:awsSecretAccessKey@BucketName in your /etc/hadoop/conf/core-site.xml
Then do not start your datanode or namenode. If you have services that need the datanode and namenode, this will not work.
I did this and can access my bucket using commands like
sudo hdfs dfs -ls /
Note that if your awsSecretAccessKey contains a "/" character, you will have to URL-encode it.
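If you prefer to keep the credentials out of the URI, an equivalent core-site.xml sketch for the s3n connector looks like this (property names as used by the s3n filesystem of that Hadoop generation; bucket and keys are placeholders):
<property>
  <name>fs.defaultFS</name> <!-- fs.default.name on older Hadoop releases -->
  <value>s3n://<BUCKET NAME>/</value>
</property>
<property>
  <name>fs.s3n.awsAccessKeyId</name>
  <value><ACCESS ID></value>
</property>
<property>
  <name>fs.s3n.awsSecretAccessKey</name>
  <value><SECRET KEY></value>
</property>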
Use s3n instead of s3.
hadoop fs -ls s3n://<BUCKET NAME>/etc