Background of problem
Hello all, I have built a project in Go with Gin, and I have integrated the Redis ClusterClient into it using "github.com/go-redis/redis/v7".
P.S. The Redis I am using is a Redis cluster hosted on AWS.
The only Redis commands I am using are redis.Get and redis.Set.
I have made one API and used caching in it, and when I run it locally, response times are around 200 to 300ms, which is awesome (thanks to Redis).
Main Problem
Now, when I load test the same API with around 100 concurrent users, the response time increases significantly (to around 4 seconds). I used spans to monitor the time taken by different parts of the code, and this is what I got:
"Getting from primary" and "getting from secondary" are the spans for the redis.Get command.
"Setting the primary" and "setting the secondary" are the spans for redis.Set.
Both commands are taking around 1 second to execute, which is unacceptable.
Can anyone please suggest a way to tackle this problem
and reduce the time the Redis commands take to execute?
OK, so I have solved this somehow.
First, I updated my Go Redis client library from go-redis/v7 to go-redis/v8, and it made a significant improvement. I would advise everyone to do the same.
But I was still suffering from high response times, so the next step for me was to change the Redis infra. Earlier I was using a Redis cluster which had only 1 shard, but now I have moved to another Redis cluster with 4 shards.
And it made a huge difference: my response time went from 1200ms to 180ms. Kindly note that these response times were measured during a load test with 100 concurrent users and an average of about 130rps.
So in short: upgrade your Redis client, and upgrade your Redis infra.
Related
I have a single redis pod running in my k8s cluster, and I would like to get an idea of how many requests per second my redis server is currently handling in my production environment. I have tried redis-cli monitor, which prints out live requests on the console, but I cannot seem to find a way to get a numerical measure that simply tells me something like "redis server is handling x requests per second on average in the past 24 hours". Any pointers would be highly appreciated.
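One way to get such a number, sketched here as a suggestion rather than a definitive answer: the output of the Redis INFO command contains the counters total_commands_processed and uptime_in_seconds, so two INFO samples taken some time apart (for example 24 hours) give an average rate over that window. The Go code below only does the parsing and arithmetic on the INFO text; how you fetch and store the samples is up to you, and the function names parseInfo, field and avgOpsPerSec are made up for illustration.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// parseInfo turns the "key:value" lines of an INFO reply into a map,
// skipping blank lines and "# Section" headers.
func parseInfo(info string) map[string]string {
	fields := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(info))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		if kv := strings.SplitN(line, ":", 2); len(kv) == 2 {
			fields[kv[0]] = kv[1]
		}
	}
	return fields
}

// field reads one numeric INFO field.
func field(fields map[string]string, key string) (float64, error) {
	raw, ok := fields[key]
	if !ok {
		return 0, fmt.Errorf("INFO field %q not found", key)
	}
	return strconv.ParseFloat(raw, 64)
}

// avgOpsPerSec returns the average commands per second between two INFO samples.
func avgOpsPerSec(first, second string) (float64, error) {
	a, b := parseInfo(first), parseInfo(second)
	c1, err := field(a, "total_commands_processed")
	if err != nil {
		return 0, err
	}
	c2, err := field(b, "total_commands_processed")
	if err != nil {
		return 0, err
	}
	u1, err := field(a, "uptime_in_seconds")
	if err != nil {
		return 0, err
	}
	u2, err := field(b, "uptime_in_seconds")
	if err != nil {
		return 0, err
	}
	if u2 <= u1 {
		return 0, errors.New("second sample is not later than the first (or the server restarted)")
	}
	return (c2 - c1) / (u2 - u1), nil
}

func main() {
	// Made-up sample values, only to show the call.
	first := "# Server\r\nuptime_in_seconds:100\r\n# Stats\r\ntotal_commands_processed:1000\r\n"
	second := "# Server\r\nuptime_in_seconds:220\r\n# Stats\r\ntotal_commands_processed:7000\r\n"
	rate, err := avgOpsPerSec(first, second)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%.1f ops/sec\n", rate)
}
```

INFO also reports instantaneous_ops_per_sec if you only need the current rate rather than an average over a window.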
We have a Redis cluster which holds more than 2 million keys, and these keys are updated at an interval of 1 minute. Now we have a requirement to take a snapshot of the Redis DB at a regular interval, e.g. every 10 minutes. This snapshot should not pause Redis command execution.
Is there any async way of taking a snapshot from Redis?
It would be really helpful if we could get any suggestions on open source tools or frameworks.
The Redis BGSAVE command is asynchronous and takes a snapshot.
It calls the fork() function of the OS. According to the Redis manual:
Fork() can be time consuming if the dataset is big, and may result in Redis to stop serving clients for some millisecond or even for one second if the dataset is very big and the CPU performance not great
Two million updates in one minute is 30K+ QPS (2,000,000 / 60 is roughly 33,000 writes per second).
So you really have to try it out: run a benchmark that simulates your workload, then issue BGSAVE, monitor the I/O and CPU usage of your system, and see if there's a spike in your Redis call latency.
Then issue LASTSAVE, which will tell you when your last successful snapshot happened, so you can adjust your backup schedule.
Our Pipeline:
VMware-Netflow -> Logstash -> Redis -> Logstash-indexer -> 3xElastic
Data I have gathered:
I noticed in kibana that the flows coming in were 1 hour old, then 2, then 3 and so on.
Running 'redis-cli llen netflow' shows a very large number that is slowly increasing.
Running 'redis-cli INFO' shows pretty constant input at 80kbps and output at 1kbps. I would think these should be nearly equal.
The CPU load on all nodes is pretty negligible.
What I've tried:
I ensured that the logstash-indexer was sending to all 3 elastic nodes.
I launched many additional logstash instances on the indexers, redis now shows 40 clients.
I am not sure what else to try.
TLDR: rebooted all three elasticsearch nodes, and life is good again.
I inadvertently disabled elasticsearch as an output, and sent my netflows into the ether. The queue size in redis dropped down to 0 in minutes. Although sad, this did prove that the problem was elasticsearch, not logstash or redis.
I watched the elastic instances, and it seemed like something was wrong with the communication between them. All three showed logs indicating that 2/3 were dropping out of the cluster, and taking forever to respond to cluster pings. What I think was happening is that writes were accepted by elastic, and just bounced around a while before being written successfully.
Upon rebooting them all, they negotiated correctly, and writes are happening as they should.
I'm using ElastiCache Redis and storing a small amount of data (~5-10MB) in it. Everything works perfectly for a while, and then suddenly it takes a lot longer than usual to respond (like 2000ms instead of 100ms). Most of what I'm doing is simply selecting a single entry from Redis and then providing it to the client. I noticed this problem only in benchmarks, not in real usage.
According to Google and StackOverflow it can be related to Redis persistence, but I found that persistence is disabled in the group options of ElastiCache.
I used redis-stat to monitor Redis, and it seems like there are regular system CPU usage spikes every n minutes.
Does anyone know what could cause such a problem?
I'm interested in SignalR + Redis solution for implementing a server application that is scalable. And my concern is that Redis cluster is not production ready yet! So my question is:
Is Redis a bottleneck in SignalR + Redis when it comes to scaling out? If it is, is there any Linux-based solution that solves the problem?
On a single redis server you can easily handle up to 10K concurrent clients using pubsub. If you are still evaluating what to use, this should be more than you need at your current stage.
Redis cluster is supposed to be production ready by the end of the year or early 2014. You can actually download it and try it already. Lots of people are using it now and reporting the odd bug. The creator of redis is focused on making the cluster work and as of now it is very mature.
By using the proxy you could have up to 1000 nodes simultaneously, with over 10K clients on pubsub each, so 10 million concurrent users. The limit of the cluster is theoretically 16384 nodes, but a maximum of 1000 is recommended right now.
Unless you are at facebook scale, you can probably use redis for your use case (and even if you are at twitter scale, given that twitter uses redis intensively for storing all the timelines).
I've been asked in a comment to add some references, so here are the relevant links:
On the number of concurrent connections per redis process http://redis.io/topics/clients
On how twitter is using redis http://highscalability.com/blog/2013/7/8/the-architecture-twitter-uses-to-deal-with-150m-active-users.html
On cluster size/specs http://redis.io/topics/cluster-spec
Is Redis a bottleneck in SignalR + Redis when it comes to scaling out? If it is, is there any Linux-based solution that solves the problem?
I don't think so. Check the article below on how to scale out using Redis:
http://www.asp.net/signalr/overview/performance-and-scaling/scaleout-with-redis