Multiple Logstash instances causing duplication of lines - redis

We're receiving logs using Logstash with the following configuration:
input {
  udp {
    type => "logs"
    port => 12203
  }
}
filter {
  grok {
    type => "tracker"
    pattern => '%{GREEDYDATA:message}'
  }
  date {
    type => "tracker"
    match => [ "timestamp", "yyyy-MM-dd HH:mm:ss,SSS" ]
  }
}
output {
  tcp {
    type => "logs"
    host => "host"
    port => 12203
  }
}
We're then picking the logs up on the machine "host" with the following settings:
input {
  tcp {
    type => "logs"
    port => 12203
  }
}
output {
  pipe {
    command => "python /usr/lib/piperedis.py"
  }
}
From here, we're doing parsing of the lines and putting them into a Redis database. However, we've discovered an interesting problem.
Logstash 'wraps' the log message in a JSON style package i.e.:
{\"#source\":\"source/\",\"#tags\":[],\"#fields\":{\"timestamp\":[\"2013-09-16 15:50:47,440\"],\"thread\":[\"ajp-8009-7\"],\"level\":[\"INFO\"],\"classname\":[\"classname\"],\"message\":[\"message"\]}}
We then, on receiving it and passing it on on the next machine, take that as the message and put it in another wrapper! We're only interested in the actual log message and none of the other stuff (source path, source, tags, fields, timestamp e.t.c.)
Is there a way we can use filters or something to do this? We've looked through the documentation but can't find any way to just pass the raw log lines between instances of Logstash.
Thanks,
Matt

The Logstash documentation is wrong here - it indicates that the default "codec" is plain, but in fact it doesn't use a codec - it uses an output format.
To get a simpler output, change your output to something like
output {
  pipe {
    command => "python /usr/lib/piperedis.py"
    message_format => "%{message}"
  }
}

Why not just extract those messages from stdout?
import sys
import json

line = sys.stdin.readline()
line_json = json.loads(line)
line_json['message']  # will be your @message
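For completeness, a minimal sketch of what a piperedis.py along these lines might look like, assuming the redis-py client and a local Redis instance (the host, port, and list key are assumptions, not details from the question):

#!/usr/bin/env python
# Hypothetical sketch of piperedis.py: read events from stdin and push
# the extracted message onto a Redis list.
import json
import sys

import redis  # requires the redis-py package

r = redis.Redis(host="localhost", port=6379, db=0)  # assumed connection details

for line in sys.stdin:
    line = line.strip()
    if not line:
        continue
    try:
        event = json.loads(line)
    except ValueError:
        continue  # skip anything that is not valid JSON
    r.rpush("parsed-logs", event.get("message", line))  # "parsed-logs" is an assumed key name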

Related

Logstash current date logstash.conf as backup_add_prefix (s3 input plugin)

I want to add the current date to every filename that is incoming to my s3 bucket.
My current config looks like this:
input {
  s3 {
    access_key_id => "some_key"
    secret_access_key => "some_access_key"
    region => "some_region"
    bucket => "mybucket"
    interval => "10"
    sincedb_path => "/tmp/sincedb_something"
    backup_add_prefix => '%{+yyyy.MM.dd.HH}'
    backup_to_bucket => "mybucket"
    additional_settings => {
      force_path_style => true
      follow_redirects => false
    }
  }
}
Is there a way to use the current date in backup_add_prefix => '%{+yyyy.MM.dd.HH}'? The current syntax does not work: it produces "%{+yyyy.MM.dd.HH}test_file.txt" in my bucket.
Though it's not supported in s3 input plugin directly, it can be achieved. Use the following steps:
1. Go to the Logstash home path.
2. Open the file vendor/bundle/jruby/2.3.0/gems/logstash-input-s3-3.4.1/lib/logstash/inputs/s3.rb. The exact path will depend on your Logstash version.
3. Look for the method backup_to_bucket.
4. There is a line backup_key = "#{@backup_add_prefix}#{object.key}"
5. Add the following lines before the above line:
   t = Time.new
   date_s3 = t.strftime("%Y.%m.%d")
6. Now change the backup_key line to use "#{@backup_add_prefix}#{date_s3}#{object.key}"
Now you are done: restart your Logstash pipeline and it should produce the desired result (the assembled change is sketched below).
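Put together, the patched section of backup_to_bucket would look roughly like this (a sketch only; the exact surrounding code may differ between plugin versions):

# inside backup_to_bucket in s3.rb
t = Time.new
date_s3 = t.strftime("%Y.%m.%d")
backup_key = "#{@backup_add_prefix}#{date_s3}#{object.key}"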

How can I use 'puppetlabs/rabbitmq' module to set up HA rabbitMQ?

I am in no way an expert on RabbitMQ, but I am trying to puppetize the setup of a RabbitMQ cluster. According to the documentation a co-worker of mine wrote, I need to implement the equivalent of executing ...
rabbitmqctl set_policy HA '^(?!amq.).*' '{"ha-mode": "all"}'
... in my puppet manifest. I tried this ...
rabbitmq_policy { 'HA':
  pattern    => '^(?!amq.).*',
  priority   => 0,
  applyto    => 'all',
  definition => {
    'ha-mode'      => 'all',
    'ha-sync-mode' => 'automatic',
  },
}
... but I get this error when I do my "puppet agent -t" on my rabbit code:
Error: Failed to apply catalog: Parameter name failed on Rabbitmq_policy[HA]: Invalid value "HA". Valid values match /^\S+@\S+$/. at /etc/puppetlabs/code/environments/production/modules/core/wraprabbitmq/manifests/init.pp:59
What am I doing wrong? Also do I have/need to have something like this ...
rabbitmq_vhost { 'myvhost':
  ensure => present,
}
... if I am setting up HA rabbitMQ?
Update: Thanks Matt.
I am using this now:
rabbitmq_policy { 'HA@/':
  pattern    => '^(?!amq.).*',
  priority   => 0,
  applyto    => 'all',
  definition => {
    'ha-mode'      => 'all',
    'ha-sync-mode' => 'automatic',
  },
}
Also I did not need to use this:
rabbitmq_vhost { 'myvhost':
  ensure => present,
}
Checking the source code here: https://github.com/puppetlabs/puppetlabs-rabbitmq/blob/master/lib/puppet/type/rabbitmq_policy.rb#L21-L24
we see that the name parameter for that type needs to be a 'combination of policy@vhost to create policy for.' Your value of 'HA' does not follow that nomenclature and thus fails the regexp check of /^\S+@\S+$/.
You need to give the rabbitmq_policy resource a name following the 'policy@vhost' format, and then your code will compile.
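For illustration, the same policy scoped to a hypothetical vhost named 'myvhost' would look like this (the vhost name is just an example):

rabbitmq_policy { 'HA@myvhost':
  pattern    => '^(?!amq.).*',
  priority   => 0,
  applyto    => 'all',
  definition => {
    'ha-mode'      => 'all',
    'ha-sync-mode' => 'automatic',
  },
}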

perl6 'do(file)' equivalent

In perl5 I used to 'do (file)' for configuration files like this:
---script.pl start ---
our @conf = ();
do '/path/some_conf_file';
...
foreach $item (@conf) {
    $item->{rules} ...
...
---script.pl end ---
---/path/some_conf_file start ---
# arbitrary code to 'fill' @conf
@conf = (
    {name => 'gateway',
     rules => [
         {verdict => 'allow', srcnet => 'gw', dstnet => 'lan2'}
     ]
    },
    {name => 'lan <-> lan2',
     rules => [
         {srcnet => 'lan', dstnet => 'lan2',
          verdict => 'allow', dstip => '192.168.5.0/24'}
     ]
    },
);
---/path/some_conf_file end ---
Larry Wall's "Programming Perl" also mentions this method:
But do FILE is still useful for such things as reading program configuration files. Manual error checking can be done this way:
# read in config files: system first, then user
for $file ("/usr/share/proggie/defaults.rc",
           "$ENV{HOME}/.someprogrc") {
    unless ($return = do $file) {
        warn "couldn't parse $file: $@" if $@;
        warn "couldn't do $file: $!" unless defined $return;
        warn "couldn't run $file" unless $return;
    }
}
Benefits:
- does not require writing your own parser each time: Perl parses and creates the data structures for you;
- faster/simpler: native Perl data structures/types, without the overhead of converting from an external format (like YAML);
- does not require manipulating @INC to load the module from somewhere, as a module-based config file would;
- less extra code compared to using modules as config files;
- the "syntax" of the "configuration file" is as powerful as Perl itself;
- "ad hoc" format;
Disadvantages:
- no isolation: the "configuration file" can execute or destroy anything;
How do I get the same with Perl 6?
Is there a way to do it better in Perl 6 (without the disadvantages) and without parsing my own syntax, writing grammars, or including a module?
Something like "load hashes or arrays from a text representation in a file"?
You can use EVALFILE($file) (ref. http://doc.perl6.org/language/5to6-perlfunc#do).
As you pointed out, using EVALFILE has disadvantages, so I'm not going to add anything in that direction :-)
Here's a sample configuration file:
# Sample configuration (my.conf)
{
    colour      => "yellow",
    pid         => $*PID,
    homedir     => %*ENV<HOME> ~ "/.myscript",
    data_source => {
        driver => "postgres",
        dbname => "test",
        user   => "test_user",
    }
}
and here's a sample script using it:
use v6;

# Our configuration is in this file
my $config_file = "my.conf";
my %config := EVALFILE($config_file);

say "Hello, world!\n";
say "My homedir is %config<homedir>";
say "My favourite colour is %config<colour>";
say "My process ID is %config<pid>";
say "My database configuration is:";
say %config<data_source>;

if $*PID != %config<pid> {
    say "Strange. I'm not the same process that evaluated my configuration.";
}
else {
    say "BTW, I am still the same process after reading my own configuration.";
}

Logstash: how to use filter to match filename when using s3

I am new to logstash. I have some logs stored in AWS S3 and I am able to import them to logstash. My question is: is it possible to use the grok filter to add tags based on the filenames? I try to use:
grok {
  match => {"path" => "%{GREEDYDATA}/%{GREEDYDATA:bitcoin}.err.log"}
  add_tag => ["bitcoin_err"]
}
This is not working. I guess the reason is that "path" only works with file inputs.
Here is the structure of my S3 buckets:
my_buckets
----A
    ----2014-07-02
        ----a.log
        ----b.log
----B
    ----2014-07-02
        ----a.log
        ----b.log
I am using this inputs conf:
s3 {
  bucket => "my_buckets"
  region => "us-west-1"
  credentials => ["XXXXXX","XXXXXXX"]
}
What I want is that, for any log messages in:
"A/2014-07-02/a.log": they will have tag ["A","a"].
"A/2014-07-02/b.log": they will have tag ["A","b"].
"B/2014-07-02/a.log": they will have tag ["B","a"].
"B/2014-07-02/b.log": they will have tag ["B","b"].
Sorry about my english....
There is no "path" in S3 inputs. I mount the S3 storage on my server and use the file inputs. With file inputs, I can use the filter to match the path now.
With Logstash 6.0.1, I was able to get the key for each file from S3. In your case, you can use this key (or path) in a filter to add tags.
Example:
input {
  s3 {
    bucket => "<bucket-name>"
    prefix => "<prefix>"
  }
}
filter {
  mutate {
    add_field => {
      "file" => "%{[@metadata][s3][key]}"
    }
  }
  ...
}
Use the file field from the mutate filter above to add tags, for example as sketched below.
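A grok filter along these lines could turn parts of the key into tags (the pattern and field names are assumptions based on the bucket layout described in the question):

filter {
  grok {
    match => [ "file", "%{DATA:folder}/%{GREEDYDATA}/%{DATA:logname}\.log" ]
    add_tag => ["%{folder}", "%{logname}"]
  }
}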
Reference:
Look for eye8's answer in this issue.
If you want to add tags based on the filename, I think this will work (I have not tested it):
filter {
  grok {
    match => [ "path", "%{GREEDYDATA:content}"]
  }
  mutate {
    add_tag => ["%{content}"]
  }
}
The "content" tag will be the filename; now it's up to you to adjust the pattern to create different tags from the specific parts of the filename.

How to purge data older than 30 days from Redis Server

I am using Logstash, Redis DB, Elasticsearch and Kibana 3 for my centralized log server. It's working fine and I am able to see the logs in Kibana. Now I want to keep only the last 30 days of logs in Elasticsearch and the Redis server. Is it possible to purge data from Redis?
I am using the below configuration
indexer.conf
input {
  redis {
    host => "127.0.0.1"
    port => 6379
    type => "redis-input"
    data_type => "list"
    key => "logstash"
    format => "json_event"
  }
}
output {
  stdout { debug => true debug_format => "json" }
  elasticsearch {
    host => "127.0.0.1"
  }
}
shipper.conf
input {
  file {
    type => "nginx_access"
    path => ["/var/log/nginx/**"]
    exclude => ["*.gz", "error.*"]
    discover_interval => 10
  }
}
filter {
  grok {
    type => nginx_access
    pattern => "%{COMBINEDAPACHELOG}"
  }
}
output {
  stdout { debug => true debug_format => "json" }
  redis { host => "127.0.0.1" data_type => "list" key => "logstash" }
}
As per this configuration, the shipper is sending data to Redis with the key "logstash". From the Redis documentation I learned that we can set a TTL on any key with the EXPIRE command to purge it. But when I search for the key "logstash" in Redis with keys logstash or keys * I do not get any results. Please let me know if my question is not understandable. Thanks in advance.
Redis is a key:value store. Keys are unique by definition. So if you want to store several logs, you need to add a new entry, with a new key and associated value, for each log.
So it seems to me you have a fundamental flaw here, as you're always using the same key for all your logs. Try a different key for each log entry (not sure how to do that).
Then set the TTL on each key to 30 days, for example as sketched below.
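For reference, once each log entry lives under its own key, a 30-day TTL (2592000 seconds) can be set from redis-cli like this (the key name is just an example):

redis-cli SET log:2013-09-16:0001 "some log line"
redis-cli EXPIRE log:2013-09-16:0001 2592000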