Send larger log files from fluentd to S3 - amazon-s3

I am sending logs to S3 from Fluentd configured as a sidecar, but I want the log file (zip) that is uploaded to S3 to be larger. Is there any configuration to specify the size? My buffer fills with 1.1 MB of logs, but on S3 I only get 1 KB of logs.
By specifying timekey 1h I get logs every hour, but not all of the logs are shipped to S3.
I took these as references: https://docs.fluentd.org/configuration/buffer-section and https://docs.fluentd.org/output/s3
My S3 output plugin configuration is:
<match **>
  @type s3
  aws_key_id xxx
  aws_sec_key xxx
  s3_bucket bucket-name-x
  s3_region eu-north-1
  path logs/
  <buffer tag,time>
    @type file
    path /var/log/fluent/s3
    timekey 1h
    timekey_wait 600s
    chunk_limit_size 256m
  </buffer>
  time_slice_format %Y%m%d%H
</match>
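For reference, the size of each object on S3 is bounded by the buffer settings: out_s3 uploads one object per flushed chunk, and a chunk is flushed when either timekey elapses or chunk_limit_size is reached, whichever comes first. A minimal sketch with placeholder bucket name and paths and illustrative values:
<match **>
  @type s3
  s3_bucket my-bucket              # placeholder
  s3_region eu-north-1
  path logs/
  <buffer tag,time>
    @type file
    path /var/log/fluent/s3
    timekey 1h                     # flush (and upload) once per hour per tag
    timekey_wait 10m               # extra wait for late-arriving events
    chunk_limit_size 256m          # or earlier, if a chunk grows to this size
  </buffer>
</match>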

Related

How to use fields from record_transformer in the S3 path (fluentd-output-s3 plugin)

We have the record_transformer config below in our Fluentd pipeline:
<filter docker.**>
  @type record_transformer
  enable_ruby true
  <record>
    servername as1
    hostname "#{Socket.gethostname}"
    project xyz
    env prod
    service ${record["docker"]["labels"]["com.docker.compose.service"]}
  </record>
  remove_keys $.docker.container_hostname,$.docker.id, $.docker.image_id,$.docker.labels.com.docker.compose.config-hash, $.docker.labels.com.docker.compose.oneoff, $.docker.labels.com.docker.compose.project, $.docker.labels.com.docker.compose.service
</filter>
We are using the S3 plugin to push logs to S3. Now we want to save logs on S3 under a custom path like ProjectName/Env/Service, so we created the S3 output plugin config below:
<store>
  @type s3
  s3_bucket test
  s3_region us-east-1
  store_as gzip_command
  path logs
  s3_object_key_format %{path}/${project}/${env}/${service}/%Y/%m/%d/%{time_slice}_%{index}.%{file_extension}
  <buffer tag,time,project,env,service>
    @type file
    path /var/log/td-agent/container-buffer-s3
    timekey 300 # 5 minutes
    timekey_wait 1m
    timekey_use_utc true
    chunk_limit_size 256m
  </buffer>
  time_slice_format %Y%m%d%H
</store>
Unfortunately this is not working for us; we get the warning below:
{"time":"2021-08-07 17:59:49","level":"warn","message":"chunk key placeholder 'project' not replaced. template:logs/${project}/${env}/${service}/%Y/%m/%d/%{time_slice}_%{index}.gz","worker_id":0}
Looking forward to any guidance or suggestions on this.
This config is correct and it is working for us:
<store>
  @type s3
  s3_bucket test
  s3_region us-east-1
  store_as gzip_command
  path logs
  s3_object_key_format %{path}/${project}/${env}/${service}/%Y/%m/%d/%{time_slice}_%{index}.%{file_extension}
  <buffer tag,time,project,env,service>
    @type file
    path /var/log/td-agent/container-buffer-s3
    timekey 300 # 5 minutes
    timekey_wait 1m
    timekey_use_utc true
    chunk_limit_size 256m
  </buffer>
  time_slice_format %Y%m%d%H
</store>
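As a general note on the warning above: every ${field} placeholder used in s3_object_key_format has to appear as a chunk key in the <buffer ...> line, because the value is taken from the chunk's metadata; otherwise Fluentd logs "chunk key placeholder ... not replaced". A hypothetical minimal sketch grouping objects by a single record field called project (bucket name and paths are placeholders):
<match app.**>
  @type s3
  s3_bucket some-bucket            # placeholder
  s3_region us-east-1
  path logs
  s3_object_key_format %{path}/${project}/%Y/%m/%d/%{time_slice}_%{index}.%{file_extension}
  <buffer tag,time,project>        # "project" listed here so ${project} can be expanded
    @type file
    path /var/log/td-agent/buffer-s3
    timekey 3600
    timekey_wait 1m
  </buffer>
</match>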

Logs shipped with wrong timestamp and timekey ignored

I want to ship my Vault logs to S3. Based on this issue, I did this:
## vault input
<source>
  @type tail
  path /var/log/vault_audit.log
  pos_file /var/log/td-agent/vault.audit_log.pos
  <parse>
    @type json
  </parse>
  tag s3.vault.audit
</source>
## s3 output
<match s3.*.*>
  @type s3
  s3_bucket vault
  path logs/
  <buffer time>
    @type file
    path /var/log/td-agent/s3
    timekey 30m
    timekey_wait 5m
    chunk_limit_size 256m
  </buffer>
  time_slice_format %Y/%m/%d/%H%M
</match>
What I'd expect is for my logs to be shipped to S3 every 30 minutes and placed in directories such as logs/2019/05/01/1030.
Instead, my logs are shipped every 2-3 minutes on average, and the output time in S3 starts from the epoch, e.g. logs/1970/01/01/0030_0.gz.
(the time is correctly set on my system)
Here is a sample configuration that worked fine for me.
You need to make sure you pass time to the buffer section, and also explicitly specify what format it should be.
Check whether your match expression is working by looking at the agent start-up logs. Also, try <match s3.**>:
<match s3.**>
  @type s3
  s3_bucket somebucket
  s3_region "us-east-1"
  path "logs/%Y/%m/%d/%H"
  s3_object_key_format "%{path}/%{time_slice}_%{index}.%{file_extension}"
  include_time_key true
  time_format "%Y-%m-%dT%H:%M:%S.%L"
  <buffer tag,time>
    @type file
    path /fluentd/buffer/s3
    timekey_wait 5m
    timekey 30m
    chunk_limit_size 64m
    flush_at_shutdown true
    total_limit_size 256m
    overflow_action block
  </buffer>
  <format>
    @type json
  </format>
  time_slice_format %Y%m%d%H%M%S
</match>
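If the objects land under logs/1970/01/01/..., it usually means no event time was extracted from the records, so the time chunk key falls back to zero. A sketch of the tail source with explicit time extraction; the field name time and the ISO8601 format are assumptions about the Vault audit log, not verified:
<source>
  @type tail
  path /var/log/vault_audit.log
  pos_file /var/log/td-agent/vault.audit_log.pos
  tag s3.vault.audit
  <parse>
    @type json
    time_key time                  # assumed name of the field carrying the event time
    time_format %iso8601           # assumed timestamp format
    keep_time_key true             # keep the original field in the record
  </parse>
</source>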

Fluentd grep + output logs

I have a service deployed in a Kubernetes cluster, with Fluentd set up as a DaemonSet, and I need to split the logs it receives so they end up in different S3 buckets.
One bucket would be for all logs generated by Kubernetes and our debug/error-handling code, and another bucket would be a subset of logs generated by the service, parsed by a structured logger and identified by a specific field in the JSON. Think of it as one bucket for machine state and errors, and another for "user_id created resource image_id at ts" descriptions of user actions.
The service itself is ignorant of Fluentd, so I cannot manually set the tag for logs based on which S3 bucket I want them to end up in.
Now, the fluentd.conf I use sets up S3 like this:
<match **>
  # docs: https://docs.fluentd.org/v0.12/articles/out_s3
  # note: this configuration relies on the nodes having an IAM instance profile with access to your S3 bucket
  type copy
  <store>
    type s3
    log_level info
    s3_bucket "#{ENV['S3_BUCKET_NAME']}"
    s3_region "#{ENV['S3_BUCKET_REGION']}"
    aws_key_id "#{ENV['AWS_ACCESS_KEY_ID']}"
    aws_sec_key "#{ENV['AWS_SECRET_ACCESS_KEY']}"
    s3_object_key_format %{path}%{time_slice}/cluster-log-%{index}.%{file_extension}
    format json
    time_slice_format %Y/%m/%d
    time_slice_wait 1m
    flush_interval 10m
    utc
    include_time_key true
    include_tag_key true
    buffer_chunk_limit 128m
    buffer_path /var/log/fluentd-buffers/s3.buffer
  </store>
  <store>
    ...
  </store>
</match>
So, what I would like to do is have something like a grep plugin:
<store>
  type grep
  <regexp>
    key type
    pattern client-action
  </regexp>
</store>
which would send logs to a separate S3 bucket from the one defined for all logs.
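For reference, grep ships as a filter plugin rather than an output <store>; a minimal sketch of its current syntax, reusing the key and pattern from the snippet above:
<filter **>
  @type grep
  <regexp>
    key type
    pattern /client-action/
  </regexp>
</filter>
Note that a filter drops non-matching events from the stream it is applied to, so routing matching events to a second bucket usually also needs retagging (for example with fluent-plugin-rewrite-tag-filter) or a tag-based match split like the one described in the answer below.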
I am assuming that user-action logs are generated by your service and that system logs include Docker, Kubernetes, and systemd logs from the nodes.
I found your example YAML file at the official fluent GitHub repo.
If you check out the folder in that link, you'll see two more files called kubernetes.conf and systemd.conf. These files have source sections where they tag their data.
The match section in fluent.conf matches **, i.e. all logs, and sends them to S3. You want to split your log types here.
Your container logs are tagged kubernetes.* in kubernetes.conf on this line.
So your config above turns into:
<match kubernetes.*>
  @type s3
  # user log s3 bucket
  ...
and for system logs, match every other tag except kubernetes.* (see the sketch below).
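Putting that suggestion into config form, a sketch with placeholder bucket names and paths: matches are evaluated top to bottom, so logs tagged kubernetes.** go to the first bucket and everything else falls through to the catch-all:
# container / user-action logs
<match kubernetes.**>
  @type s3
  s3_bucket user-logs-bucket       # placeholder
  s3_region us-east-1
  path user-logs/
  <buffer tag,time>
    @type file
    path /var/log/fluentd-buffers/s3-user
    timekey 1h
  </buffer>
</match>
# everything that did not match above (docker, systemd, ...)
<match **>
  @type s3
  s3_bucket system-logs-bucket     # placeholder
  s3_region us-east-1
  path system-logs/
  <buffer tag,time>
    @type file
    path /var/log/fluentd-buffers/s3-system
    timekey 1h
  </buffer>
</match>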

Is it possible to set host in fluentd s3_output

Is it possible to somehow set %{host} (not %{hostname}: that points to the local Fluentd server) in the Fluentd S3 path, like:
s3://logs/2018/07/10/web01/misc-20180710-07_1.gz
host is one of the message fields: "host":"ip-10-78-46-14"
<match **>
  @type s3
  s3_bucket logs
  s3_region us-west-2
  path %Y/%m/%d
  time_slice_format %Y%m%d-%H
  s3_object_key_format %{path}/%{host}/misc-%{time_slice}_%{index}.%{file_extension}
  include_timestamp true
  <buffer host,tag,time>
    @type file
    path /buffers/s3/infra/misc/
    timekey 1h # 1 hour partition
    timekey_wait 10m
    timekey_use_utc true # use utc
    chunk_limit_size 256m
  </buffer>
  <format>
    @type json
  </format>
</match>
This config is not working...
Thank you,
AP
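One possible reading, sketched under the assumption that the records really carry a host field: out_s3's %{...} placeholders cover built-ins such as path, time_slice, index and file_extension, while record fields listed as chunk keys are referenced with ${...} instead. The bucket name below is a placeholder:
<match **>
  @type s3
  s3_bucket logs                   # placeholder
  s3_region us-west-2
  path %Y/%m/%d
  time_slice_format %Y%m%d-%H
  s3_object_key_format %{path}/${host}/misc-%{time_slice}_%{index}.%{file_extension}
  <buffer host,tag,time>           # "host" as a chunk key makes ${host} available
    @type file
    path /buffers/s3/infra/misc/
    timekey 1h
    timekey_wait 10m
    timekey_use_utc true
  </buffer>
  <format>
    @type json
  </format>
</match>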

fluentd s3 output plugin configuration

I have been trying to get out_s3 for Fluentd working for the past 2 days, and I am unable to see the logs in my S3 bucket.
This is my current config:
<match web.all>
  type s3
  aws_key_id ......
  aws_sec_key ......
  s3_bucket ......
  path logs/
  buffer_path /var/log/td-agent/s3
  s3_region ap-southeast-1
  time_slice_format %Y%m%d%H%M
  time_slice_wait 1m
  utc
  buffer_chunk_limit 256m
</match>
If I try to match 'web.all' and store it to a file, it works properly:
<match web.all>
  type file
  path /var/log/td-agent/web-all.log
</match>
For some reason, and not knowing how to debug this, I am unable to put it on S3. Any direction on how to go about debugging this?
EDIT
2015-10-18 22:46:51 -0400 [error]: unexpected error error_class=RuntimeError error=#<RuntimeError: can't call S3 API. Please check your aws_key_id / aws_sec_key or s3_region configuration. error = #<AWS::S3::Errors::InvalidAccessKeyId: The AWS Access Key Id you provided does not exist in our records.>>
But the key provided is valid. I can see the access key's "last used time" update every time I restart td-agent.
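As a general debugging note rather than a diagnosis of this specific error: a minimal v1-style out_s3 block with a short timekey makes it quicker to see whether any object appears at all, and the AWS CLI can confirm the key pair can reach the bucket independently of Fluentd. Bucket name, keys and paths below are placeholders.
# independent credential check from the same host:
#   aws s3 ls s3://my-bucket
<match web.all>
  @type s3
  aws_key_id YOUR_KEY
  aws_sec_key YOUR_SECRET
  s3_bucket my-bucket
  s3_region ap-southeast-1
  path logs/
  <buffer tag,time>
    @type file
    path /var/log/td-agent/s3
    timekey 60                     # flush every minute while testing
    timekey_wait 10s
    flush_at_shutdown true
  </buffer>
</match>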