Logstash sprintf formatting for elasticsearch output plugin not working - indexing

I am having trouble using sprintf to reference the event fields in the elasticsearch output plugin and I'm not sure why. Below is the event received from Filebeat and sent to Elasticsearch after filtering is complete:
{
"beat" => {
"hostname" => "ca86fed16953",
"name" => "ca86fed16953",
"version" => "6.5.1"
},
"@timestamp" => 2018-12-02T05:13:21.879Z,
"host" => {
"name" => "ca86fed16953"
},
"tags" => [
[0] "beats_input_codec_plain_applied",
[1] "_grokparsefailure"
],
"fields" => {
"env" => "DEV"
},
"source" => "/usr/share/filebeat/dockerlogs/logstash_DEV.log",
"@version" => "1",
"prospector" => {
"type" => "log"
},
"bgp_id" => "42313900",
"message" => "{<some message here>}",
"offset" => 1440990627,
"input" => {
"type" => "log"
},
"docker" => {
"container" => {
"id" => "logstash_DEV.log"
}
}
}
I am trying to index the events based on Filebeat's environment field. Here is my config file:
input {
http { }
beats {
port => 5044
}
}
filter {
grok {
patterns_dir => ["/usr/share/logstash/pipeline/patterns"]
break_on_match => false
match => { "message" => ["%{RUBY_LOGGER}"]
}
}
}
output {
elasticsearch {
hosts => ["elasticsearch:9200"]
index => "%{[fields][env]}-%{+yyyy.MM.dd}"
}
stdout { codec => rubydebug }
}
I would think the referenced event fields would already be populated by the time the event reaches the elasticsearch output plugin. However, on the Kibana end, the formatted index does not register. Instead, it shows up like this:
What have I done wrong?

In Elasticsearch Output plugin docs:
https://www.elastic.co/guide/en/logstash/current/plugins-outputs-elasticsearch.html#plugins-outputs-elasticsearch-manage_template
Should you require support for other index names, or would like to
change the mappings in the template in general, a custom template can
be specified by setting template to the path of a template file.
Setting manage_template to false disables this feature. If you require
more control over template creation, (e.g. creating indices
dynamically based on field names) you should set manage_template to
false and use the REST API to apply your templates manually.
By default, the elasticsearch output plugin expects a custom template when the index name is anything other than logstash-%{+YYYY.MM.dd}. To disable this template management, add manage_template => false.
With this new information, the working config should be:
output {
elasticsearch {
hosts => ["elasticsearch:9200"]
index => "%{[fields][env]}-%{+yyyy.MM.dd}"
manage_template => false
}
stdout { codec => rubydebug }
}
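One more thing that may be worth checking, offered as a sketch rather than a confirmed diagnosis: Elasticsearch only accepts lowercase index names, and the event above carries "env" => "DEV". The filter below copies the value into a metadata field, lowercases it, and falls back to a placeholder when the field is missing. The names [@metadata][index_prefix] and "unknown" are made up for illustration.

```
filter {
  if [fields][env] {
    mutate { add_field => { "[@metadata][index_prefix]" => "%{[fields][env]}" } }
  } else {
    # Hypothetical fallback so the index name never contains an unresolved %{...}
    mutate { add_field => { "[@metadata][index_prefix]" => "unknown" } }
  }
  # Separate mutate block so lowercase runs after the field has been added
  mutate { lowercase => ["[@metadata][index_prefix]"] }
}
output {
  elasticsearch {
    hosts => ["elasticsearch:9200"]
    index => "%{[@metadata][index_prefix]}-%{+yyyy.MM.dd}"
    manage_template => false
  }
}
```

Fields under [@metadata] are not sent to Elasticsearch, so the helper field does not end up in the indexed document.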

Related

Create field from file name condition in logstash

I have several logs with the following names, where [E-1].[P-28], [E-1].[P-45] and [E-1].[P-51] are operators that generate these logs (They do not appear within the data. I can only identify them by obtaining from the file name)
p2sajava131.srv.gva.es_11101.log.online.[E-1].[P-28].21.01.21.log
p1sajava130.srv.gva.es_11101.log.online.[E-1].[P-45].21.03.04.log
p1sajava130.srv.gva.es_11101.log.online.[E-1].[P-51].21.03.04.log
...
Is it possible to use the translate filter to create a new field?
Something like:
translate{
field => "[log.file.path]"
destination => "[operator_name]"
dictionary => {
if contains "[E-1].[P-28]" => "OPERATOR-1"
if contains "[E-1].[P-45]" => "OPERATOR-2"
if contains "[E-1].[P-51]" => "OPERATOR-3"
Thanks.
I don't have ELK here so I can't test it, but this should work. The brackets and dots are escaped so that the regular expression matches them literally instead of treating [E-1] as a character class:
if [log][file][path] =~ /\[E-1\]\.\[P-28\]/ {
mutate {
add_field => { "[operator][name]" => "OPERATOR-1" }
}
}
if [log][file][path] =~ /\[E-1\]\.\[P-45\]/ {
mutate {
add_field => { "[operator][name]" => "OPERATOR-2" }
}
}
if [log][file][path] =~ /\[E-1\]\.\[P-51\]/ {
mutate {
add_field => { "[operator][name]" => "OPERATOR-3" }
}
}
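Since the question asked about the translate filter, here is an untested sketch of how that route might look: grok first extracts the operator marker from the path, then translate maps it through a dictionary. The field name operator_id and the "UNKNOWN" fallback are assumptions for illustration. The option names field and destination follow the question; newer releases of the translate filter call them source and target, so check the version you run.

```
filter {
  # Pull the "[E-x].[P-y]" part of the file name into a temporary field
  grok {
    match => { "[log][file][path]" => "(?<operator_id>\[E-\d+\]\.\[P-\d+\])" }
  }
  translate {
    field => "operator_id"
    destination => "[operator][name]"
    # Keys are compared as exact strings, so the brackets need no escaping here
    dictionary => {
      "[E-1].[P-28]" => "OPERATOR-1"
      "[E-1].[P-45]" => "OPERATOR-2"
      "[E-1].[P-51]" => "OPERATOR-3"
    }
    fallback => "UNKNOWN"
  }
  # Drop the temporary field once the lookup is done
  mutate { remove_field => ["operator_id"] }
}
```

Compared with one conditional per operator, adding a new operator here only means adding a dictionary entry.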

Using RabbitMQ fields in Logstash output

I want to use some fields from RabbitMQ messages in the Logstash Elasticsearch output (for the index name, etc.).
If I use [@metadata][rabbitmq_properties][timestamp] in a filter it works nicely, but not in the output statement (config below).
What am I doing wrong?
input {
rabbitmq {
host => "rabbitmq:5672"
user => "user"
password => "password"
queue => "queue "
durable => true
prefetch_count => 1
threads => 3
ack => true
metadata_enabled => true
}
}
filter {
if [@metadata][rabbitmq_properties][timestamp] {
date {
match => ["[@metadata][rabbitmq_properties][timestamp]", "UNIX"]
}
}
}
output {
elasticsearch {
hosts => ['http://elasticsearch:9200']
index => "%{[@metadata][rabbitmq_properties][IndexName]}_%{+YYYY.MM.dd}"
}
stdout {codec => rubydebug}
}
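Try the mutate filter's replace option, as shown below. It builds the full index name, including the date, in a metadata field that the output then references.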
input {
rabbitmq {
host => "rabbitmq:5672"
user => "user"
password => "password"
queue => "queue "
durable => true
prefetch_count => 1
threads => 3
ack => true
metadata_enabled => true
}
}
filter {
if [@metadata][rabbitmq_properties][timestamp] {
date {
match => ["[@metadata][rabbitmq_properties][timestamp]", "UNIX"]
}
}
mutate {
replace => {
"[@metadata][index]" => "%{[@metadata][rabbitmq_properties][IndexName]}_%{+YYYY.MM.dd}"
}
}
}
output {
elasticsearch {
hosts => ['http://elasticsearch:9200']
# [@metadata][index] already ends with the date, so it is not appended again
index => "%{[@metadata][index]}"
}
stdout {codec => rubydebug}
}
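If the index name still comes out unresolved, it may help to look at what the input actually stores. The sketch below rests on an assumption that is not confirmed by the question: that IndexName is a custom message header, which the rabbitmq input places under [@metadata][rabbitmq_headers] rather than [@metadata][rabbitmq_properties]. The fallback name "rabbitmq-unrouted" is hypothetical.

```
filter {
  if [@metadata][rabbitmq_headers][IndexName] {
    mutate { add_field => { "[@metadata][index]" => "%{[@metadata][rabbitmq_headers][IndexName]}" } }
  } else {
    # Hypothetical fallback for messages that carry no IndexName header
    mutate { add_field => { "[@metadata][index]" => "rabbitmq-unrouted" } }
  }
  # Elasticsearch index names must be lowercase
  mutate { lowercase => ["[@metadata][index]"] }
}
output {
  elasticsearch {
    hosts => ['http://elasticsearch:9200']
    index => "%{[@metadata][index]}_%{+YYYY.MM.dd}"
  }
  # metadata => true prints the @metadata tree, which rubydebug hides by default
  stdout { codec => rubydebug { metadata => true } }
}
```

The stdout output with metadata => true shows exactly which keys exist under [@metadata], so the field reference can be corrected if the assumption above is wrong.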

only strings in influxdb

I have this config file in Logstash:
input {
redis{
host => "localhost"
data_type => "list"
key => "vortex"
threads => 4
type => "testrecord"
codec => "plain"
}
}
filter {
kv {
add_field => {
"test1" => "yellow"
"test" => "ife"
"feild" => "pink"
}
}
}
output {
stdout { codec => rubydebug }
influxdb {
db => "toast"
host => "localhost"
measurement => "myseries"
allow_time_override => true
use_event_fields_for_data_points => true
exclude_fields => ["@version", "@timestamp", "sequence", "message", "type", "host"]
send_as_tags => ["bar", "feild", "test1", "test"]
}
}
and a list in redis with the following data:
foo=10207 bar=1 sensor2=1 sensor3=33.3 time=1489686662
Everything works fine, but every field in InfluxDB is defined as a string regardless of its value.
Does anybody know how to get around this issue?
The mutate filter may be what you're looking for here.
filter {
mutate {
convert => {
"value" => "integer"
"average" => "float"
}
}
}
It means you need to know what your fields are beforehand, but it will convert them into the right data type.
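Applied to the sample record from the question (foo=10207 bar=1 sensor2=1 sensor3=33.3 time=1489686662), a minimal sketch could look like the following. The choice of integer versus float for each field is a guess based on that single sample line. bar is left alone because the question sends it as a tag, and time is left out because its handling depends on allow_time_override.

```
filter {
  # kv parses the key=value pairs; every value it produces is a string
  kv { }
  # Convert after kv has run, in its own filter block
  mutate {
    convert => {
      "foo" => "integer"
      "sensor2" => "integer"
      "sensor3" => "float"
    }
  }
}
```

The influxdb output also documents a coerce_values option that takes a similar field-to-type hash; whether it is available depends on the plugin version, so treat it as something to verify rather than a given.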

Logstash with multiple kafka inputs

I am trying to filter Kafka events from multiple topics, but once all events from one topic have been filtered, Logstash is not able to fetch events from the other Kafka topic. I am using topics with 3 partitions and 2 replicas. Here is my Logstash config file:
input {
kafka{
auto_offset_reset => "smallest"
consumer_id => "logstashConsumer1"
topic_id => "unprocessed_log1"
zk_connect=>"192.42.79.67:2181,192.41.85.48:2181,192.10.13.14:2181"
type => "kafka_type_1"
}
kafka{
auto_offset_reset => "smallest"
consumer_id => "logstashConsumer1"
topic_id => "unprocessed_log2"
zk_connect => "192.42.79.67:2181,192.41.85.48:2181,192.10.13.14:2181"
type => "kafka_type_2"
}
}
filter{
if [type] == "kafka_type_1"{
csv {
separator=>" "
source => "data"
}
}
if [type] == "kafka_type_2"{
csv {
separator => " "
source => "data"
}
}
}
output{
stdout{ codec=>rubydebug{metadata => true }}
}
It's a very late reply, but if you want to take input from multiple topics and output to multiple Kafka topics, you can do something like this:
input {
kafka {
topics => ["topic1", "topic2"]
codec => "json"
bootstrap_servers => "kafka-broker-1:9092,kafka-broker-2:9092,kafka-broker-3:9092"
decorate_events => true
group_id => "logstash-multi-topic-consumers"
consumer_threads => 5
}
}
output {
if [kafka][topic] == "topic1" {
kafka {
codec => "json"
topic_id => "new_topic1"
bootstrap_servers => "output-kafka-1:9092"
}
}
else if [kafka][topic] == "topic2" {
kafka {
codec => "json"
topic_id => "new_topic2"
bootstrap_servers => "output-kafka-1:9092"
}
}
}
Be careful when specifying your bootstrap servers: use the names on which your Kafka brokers have advertised listeners.
Ref-1: https://www.elastic.co/guide/en/logstash/current/plugins-inputs-kafka.html#plugins-inputs-kafka-group_id
Ref-2: https://www.elastic.co/guide/en/logstash/current/plugins-inputs-kafka.html#plugins-inputs-kafka-decorate_events
The previous answer didn't work for me; it seems it does not recognize the conditional statements in the output. Here is my answer, which is correct and valid at least for my case: I have defined tags in the input for both Kafka consumers, and the documents (in my case, logs) are ingested into separate indexes according to their consumer topics.
input {
kafka {
group_id => "35834"
topics => ["First-Topic"]
bootstrap_servers => "localhost:9092"
codec => json
tags => ["First-Topic"]
}
kafka {
group_id => "35834"
topics => ["Second-Topic"]
bootstrap_servers => "localhost:9092"
codec => json
tags => ["Second-Topic"]
}
}
filter {
}
output {
if "Second-Topic" in [tags]{
elasticsearch {
hosts => ["localhost:9200"]
document_type => "_doc"
index => "logger"
}
stdout { codec => rubydebug
}
}
else if "First-Topic" in [tags]{
elasticsearch {
hosts => ["localhost:9200"]
document_type => "_doc"
index => "saga"
}
stdout { codec => rubydebug
}
}
}
Probably this is what you need:
input {
kafka {
client_id => "logstash_server"
topics => ["First-Topic", "Second-Topic"]
codec => "json"
decorate_events => true
bootstrap_servers => "localhost:9092"
}
}
filter { }
output {
if [@metadata][kafka][topic] == "First-Topic" {
elasticsearch {
hosts => ["localhost:9200"]
index => "logger"
}
}
else if [@metadata][kafka][topic] == "Second-Topic" {
elasticsearch {
hosts => ["localhost:9200"]
index => "saga"
}
}
else {
elasticsearch {
hosts => ["localhost:9200"]
index => "catchall"
}
}
}
There is no need for two separate Kafka inputs if they point to the same bootstrap servers; you just have to specify the list of topics you want Logstash to read from.
You could also add "stdout { codec => rubydebug }" if you want to, but that is usually only used for debugging; in a production environment it would cause a lot of noise. 'document_type => "_doc"' can also be used but is not required, and since the option is already deprecated in the new version of Elasticsearch (8.0), I would simply get rid of it.
I also added a final "else" branch to the output: if for some reason none of the conditions match, it is still important to send the events to a default index, in this case "catchall".
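A variation on the same idea, shown only as a sketch: decide the index name in the filter stage and keep a single elasticsearch output. The field name [@metadata][target_index] is made up for illustration; the topic and index names are the ones used above, and decorate_events is assumed to be enabled on the input so that [@metadata][kafka][topic] exists.

```
filter {
  if [@metadata][kafka][topic] == "First-Topic" {
    mutate { add_field => { "[@metadata][target_index]" => "logger" } }
  } else if [@metadata][kafka][topic] == "Second-Topic" {
    mutate { add_field => { "[@metadata][target_index]" => "saga" } }
  } else {
    # Same catch-all as above for events from any other topic
    mutate { add_field => { "[@metadata][target_index]" => "catchall" } }
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "%{[@metadata][target_index]}"
  }
}
```

With one output block, settings such as hosts or credentials only have to be maintained in one place.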

Variables in logstash config not substituted

None of the variables in prefix are substituted. Why?
It was working with an old version of Logstash (1.5.4) but doesn't anymore with 2.3.
Part of the output filter in logstash.cfg (dumps to s3):
output {
if [bucket] == "bucket1" {
s3 {
bucket => "bucket1"
access_key_id => "****"
secret_access_key => "****"
region => "ap-southeast-2"
prefix => "%{env}/%{year}/%{month}/%{day}/"
size_file => 50000000 #50mb
time_file => 1
codec => json_lines # save log as json line (no newlines)
temporary_directory => "/var/log/temp-logstash"
tags => ["bucket1"]
}
}
..
}
Example dataset (taken from stdout):
{
"random_person" => "Kenneth Cumming 2016-04-14 00:53:59.777647",
"@timestamp" => "2016-04-14T00:53:59.917Z",
"host" => "192.168.99.1",
"year" => "2016",
"month" => "04",
"day" => "14",
"env" => "dev",
"bucket" => "bucket1"
}
Just in case, here is the filter:
filter {
mutate {
add_field => {
"request_uri" => "%{[headers][request_path]}"
}
}
grok {
break_on_match => false # default behaviour is to stop matching after first match, we don't want that
match => { "@timestamp" => "%{NOTSPACE:date}T%{NOTSPACE:time}Z"} # break timestamp field into date and time
match => { "date" => "%{INT:year}-%{INT:month}-%{INT:day}"} # break date into year month and day fields
match => { "request_uri" => "/%{WORD:env}/%{NOTSPACE:bucket}"} # break request uri into environment and bucket fields
}
mutate {
remove_field => ["request_uri", "headers", "@version", "date", "time"]
}
}
It's a known issue that field variables aren't allowed in 'prefix'.
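That limitation applies to the plugin version discussed here. Later releases of the s3 output document prefix as supporting event interpolation, so upgrading the plugin may be an option; check the documentation for the version you actually run before relying on it. Under that assumption, a trimmed sketch of the output (credentials and the other settings from the question omitted) might look like this:

```
output {
  if [bucket] == "bucket1" {
    s3 {
      bucket => "bucket1"
      region => "ap-southeast-2"
      # %{env} comes from the event; the date parts are taken from @timestamp
      prefix => "%{env}/%{+YYYY}/%{+MM}/%{+dd}/"
      codec => json_lines
    }
  }
}
```

Using the %{+YYYY}/%{+MM}/%{+dd} date syntax would also make the separate year, month and day fields from the grok filter unnecessary for this purpose.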