Using Redis with Logstash

I'm wondering what the pros and cons are of using Redis as a broker in an infrastructure.
At the moment, all my agents send to a central NXLog server, which proxies the requests to Logstash --> ES.
What would I gain by putting a Redis server between my NXLog collector and Logstash? To me it seems pointless, as NXLog already has good memory and disk buffers in case Logstash is down.
What would I gain?
Thank you

Under heavy load, calling ES (HTTP) directly can be dangerous, and you will have problems if ES breaks down.
Redis can handle many more write requests and lets you forward them to ES (HTTP) asynchronously.
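For illustration, a minimal pair of Logstash configurations with Redis as the broker might look like the sketch below (hostnames, paths, and the key name are placeholders, not part of the original setup):

    # shipper.conf - runs near the log source and pushes raw events into Redis
    input {
      file { path => "/var/log/app/*.log" }
    }
    output {
      redis { host => "redis.example.com" data_type => "list" key => "logstash" }
    }

    # indexer.conf - pops events from Redis, filters them, and writes to ES
    input {
      redis { host => "redis.example.com" data_type => "list" key => "logstash" }
    }
    output {
      elasticsearch { hosts => ["http://es.example.com:9200"] }
    }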

I started using Redis because I felt it would separate the input part from the filter part, at least during periods in which I change the configuration a lot.
As you know, if you change the Logstash configuration you have to restart the whole thing. All clients (in my case via syslog) are doomed to reconnect to the Logstash daemon when it is back in business.
By putting a shipper in front, which holds the relatively static input configuration and pushes everything to Redis, I am able to restart Logstash without causing hiccups throughout the datacenter.
I encountered some issues because our developers hadn't (yet) found time to reduce the amount of useless logs sent to syslog, thus overflowing the server. Before we had Logstash they overflowed the disk space for logs - a more general issue, though... :)

When used with Logstash, Redis acts as a message queue: you can have multiple writers and multiple readers.
Using Redis (or any other queuing service) allows you to scale Logstash horizontally by adding more servers to the 'cluster'. This will not matter for small operations but can be extremely useful for larger installations.
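To make the multiple-readers point concrete, here is a sketch of the queue semantics in Python with redis-py (the host, key name, and handle function are assumptions for illustration):

    import redis

    r = redis.Redis(host="redis.example.com")
    while True:
        # BLPOP blocks until an event arrives; each event is handed to
        # exactly one of the competing indexers, so adding more consumers
        # of the same list scales processing horizontally
        _key, event = r.blpop("logstash")
        handle(event)  # 'handle' is a placeholder for the filter/output stage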

When using Logstash with Redis, you can configure Redis to keep all the log entries in memory only, which makes it behave like an in-memory queue (much like memcached).
You may come to the point where the number of logs sent cannot be processed by Logstash, and that can bring down your system on a constant basis (we observed this in our environment).
If you feel Redis is an overhead for your disk, you can configure it to store all the logs in memory until they are processed by Logstash.
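As a sketch, a memory-only queue configuration in redis.conf could look like this (the 2gb cap is an arbitrary example):

    save ""                       # disable RDB snapshots
    appendonly no                 # disable the append-only file
    maxmemory 2gb                 # cap memory so the host itself stays healthy
    maxmemory-policy noeviction   # reject new writes instead of silently dropping queued events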

As we built our ELK infrastructure, we originally had a lot of problems with the logstash indexer (reading from redis). Redis would back up and eventually die. I believe this was because, in the hope of not losing log files, redis was configured to persist the cache to disk once in a while. When the queue got "too large" (but still within available disk space), redis would die, taking all of the cached entries with it.
If this is the best redis can do, I wouldn't recommend it.
Fortunately, we were able to resolve the issues with the indexer, which typically kept the redis queue empty. We set our monitoring to alert quickly when the queue did back up, and it was a good sign that the indexer was unhappy again.
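If it's useful, a minimal queue-depth check along those lines might look like this in Python with redis-py (host, key name, and threshold are assumptions):

    import redis

    r = redis.Redis(host="redis.example.com")
    depth = r.llen("logstash")   # length of the list the indexer consumes
    if depth > 10000:
        print(f"ALERT: logstash queue backed up ({depth} entries)")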
Hope that helps.

Related

Unresponsive Redis: works fine after FLUSHALL

We observed a sporadic issue (once every month or two) in our Redis. Suddenly we see that Redis read/write calls from our application return a failure. At that point we are still able to connect to Redis from redis-cli, and doing a FLUSHALL resolves the issue.
Has anyone encountered this kind of issue? Our Redis version is 4.0.9. Since we are not observing any issue in connecting to Redis, we do not suspect an issue with the Jedis client.
What kind of redis calls? Something that writes the data or reads the data?
All in all, just as an idea, examine your redis.conf file.
It has a maxmemory configuration directive, and when Redis reaches the maxmemory limit it acts according to its eviction policy.
If the policy is set to 'noeviction', the behavior sounds like what you've described (from their documentation):
return errors when the memory limit was reached and the client is trying to execute commands that could result in more memory to be used (most write commands, but DEL and a few more exceptions).
You can configure Redis as a cache so that it deletes some entries by itself. Read about this in the Redis documentation on using Redis as an LRU cache.
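As a sketch, the cache-style settings in redis.conf would look something like this (the 2gb limit is just an example):

    maxmemory 2gb
    maxmemory-policy allkeys-lru   # evict least-recently-used keys instead of returning errors on writes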

Redis as a queue - Configuration review

I need to set up a Redis DB (2.8) which I intend to use as a queue, which means it must be fully persistent (no message can be missed).
I'm pretty new to Redis, and I would like to get a review of my configuration:
I want to use both the AOF and RDB persistence models, with always selected as the appendfsync policy. According to the documentation, always is not recommended, but I must select this option since I use Redis as a queue and I can't afford to lose any messages.
I would like to create a Master-Slave-Slave cluster using Sentinel with automatic failover.
The Redis service will be started automatically after server boot.
Any kind of comments and suggestions would be great. The administration point of view is more important to me (persistence, backup, restore, high availability, etc.).
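For reference, the settings described above would translate to roughly the following directives (a sketch; IPs, quorum, and snapshot thresholds are placeholders):

    # redis.conf
    appendonly yes
    appendfsync always      # fsync on every write: slow, but no acknowledged write is lost
    save 900 1              # keep RDB snapshots as a second line of defense
    save 300 10

    # sentinel.conf (one Sentinel per node)
    sentinel monitor mymaster 10.0.0.1 6379 2
    sentinel down-after-milliseconds mymaster 5000
    sentinel failover-timeout mymaster 60000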

Transferring results from Zookeeper to webserver

In my project I am calculating about 10-100 MB of data on a ZooKeeper worker. I then use HTTP PUT to transfer the data from the worker process to my webserver, which eventually delivers it to the client. Is there any way to transfer that data using ZooKeeper or Curator, or am I on my own to get the data out of the worker process and onto a process outside my ensemble?
I wouldn't recommend using ZooKeeper to transfer data, especially data of such relatively large size. It is not really designed to do that. ZooKeeper works best when it is used to synchronize distributed processes or to store relatively small configuration data shared among multiple hosts.
There is a hard limit of 1 MB per znode, and if you try to push it to the limit, ZooKeeper clients may get timeouts and go into a disconnected state while the ZooKeeper service processes the large chunk of data.
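A common pattern is to keep the large payload out of ZooKeeper entirely and store only a small pointer to it in a znode. Here is a sketch with the Python kazoo client rather than Curator (same idea either way; hosts, paths, and the URL are assumptions):

    from kazoo.client import KazooClient

    zk = KazooClient(hosts="zk1.example.com:2181")
    zk.start()
    # the worker uploads the 10-100 MB result via HTTP PUT as before,
    # then records only the result's URL in ZooKeeper for coordination
    zk.create("/results/job-42",
              b"http://webserver.example.com/results/job-42",
              makepath=True)
    zk.stop()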

How to push a requestDto to Redis and have it persisted until it's been read?

I'd like to take a RequestDTO that has been POST'd to a ServiceStack service and push it to Redis with the built-in messaging capabilities provided by ServiceStack.Server's RedisMqServer. This message should be durable (if it was successfully pushed) in the sense that it should survive reboots of the Redis server (if that were to happen) and other unfortunate events. It should persist until it has been read/processed by another, yet to be determined, service.
Is this Pub/Sub, Request/Response, or just Request/no Response?
Thank you, Stephen
You need to ensure your Redis Server is configured to persist to disk, the docs on redis persistence describe how to configure Redis RDB Snapshots and support for Append Only File.
Important note about Append Only File in Redis:
The suggested (and default) policy is to fsync every second. It is both very fast and pretty safe. The always policy is very slow in practice (although it was improved in Redis 2.0) – there is no way to make fsync faster than it is.
In practice this means there is a chance of losing data if the redis-server process terminates unexpectedly.
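As a sketch, the combination described above looks like this in redis.conf (snapshot thresholds are examples):

    save 900 1            # RDB snapshot if at least 1 change in 900 seconds
    appendonly yes
    appendfsync everysec  # the suggested default: fast, loses at most ~1 second on a crash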
ServiceStack's Rabbit MQ Server provides a more durable option for MQ services, as it has true ACK support: a message is only removed from the MQ broker when the client explicitly acknowledges it.

Redis PUBLISH/SUBSCRIBE limits

I'm considering Redis for a section of the architecture of a new project. It will consist of a lot of clients (node.js connections) SUBSCRIBING to particular channels, with one process PUBLISHING to those channels as needed.
I'm curious about the limits of the PUBLISH/SUBSCRIBE commands and how to mitigate them. An obvious limit is the number of open file descriptors on the machine running Redis, so at some point I'll need to implement master-slave replication or consistent hashing across multiple Redis instances.
Does anyone have any solutions about how to scale this architecture with Redis' PubSub?
Redis PubSub scales really easily, since master/slave replication automatically publishes messages to all slaves.
The easiest way is to load-balance the node.js connections with, for instance, HAProxy, and run a Redis slave on each webserver that syncs with a single master, which publishes the messages.
I can't give you exact numbers since that greatly depends on the underlying system, but this should scale extremely well. And you don't need to manage the clients and which server they connect to manually. You obviously need some way to handle session state, so you might need to do that anyway, but that's a lot easier to do in the load balancer than in your application.
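As a sketch of both ends in Python with redis-py (a node.js client would look similar; hostnames and the channel name are placeholders):

    import redis

    # --- subscriber process: clients subscribe against the local slave ---
    r = redis.Redis(host="redis-slave.example.com")
    p = r.pubsub()
    p.subscribe("updates:42")
    for message in p.listen():
        if message["type"] == "message":
            print(message["data"])

    # --- publisher process (runs separately): the single writer publishes
    # to the master; replication fans the message out to subscribers on
    # every slave ---
    master = redis.Redis(host="redis-master.example.com")
    master.publish("updates:42", "hello")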