since redis is single-threaded, then our concurrent requests become serialized requests when accessing redis. What is the significance of using redis? - redis

We usually use redis for caching in the Spring‘s project. My problem is that since redis is single-threaded, then our concurrent requests become serialized requests when accessing redis. then,what is the significance of using redis?
Is it only because of "It's not very frequent that CPU becomes your bottleneck with Redis, as usually Redis is either memory or network bound.
......
using pipelining Redis running on an average Linux system can deliver even 1 million requests per second......
"?
I am learning redis, Redis document FAQ

You've basically asked two questions in one question:
What is the significance of using Redis.
Well, Redis is known to be fast because it keeps the data in memory. If you ask whether being a single-threaded application is very restrictive - well, its a product, that works like this by design, maybe it could be even more performant if it was multithreaded, it depends on actual implementation under the hood after all.
In any case, it offers much more than just a "get data in memory":
- Many primitives to work with
- Configurable persistence
- Replication of data
And much more
If the question is whether the in-memory cache will be faster (you've mentioned Spring framework, so you're at Java Land) - then yes.
In fact, Spring Cache support Guava Cache (spring 5/spring boot 2 use Caffeine for the same purpose instead) - and yes it will be faster in a head-to-head comparison with Redis. But what if you have a distributed application with many instances and one instance calculated something and put it to cache, how do you get the same information from another instance without distributing the information between the instances. Well, there are tools like Hazelcast but it's out of scope for this question, the point is that when the application is beyond basic, the tasks like cache synchronization /keeping it up-to-date becomes much less obvious.
If you can deliver 1 million operations per second.
Now this question is too vague to answer:
What is the hardware that runs Redis?
What are the network configurations? (after all Redis calls are done over the network)
How often do you persist on disk (Redis has configurations for that)
Do you use replication and split the load between many Redis servers reaching an overall much faster throughput?
What commands exactly are being running under that hood?
In any case, when it comes to benchmarking you can set up your system in the option way and use the tool offered by Redis itself:
Redis Benchmarking Chapter in Redis tutorial
The tool is called redis-benchmark you can run it with various parameters and see how fast redis really is:
Here is an example (I encourage you to read the full article in the link):
$ redis-benchmark -t set,lpush -n 100000 -q
SET: 74239.05 requests per second
LPUSH: 79239.30 requests per second
This says: Connect to redis server available on localhost, run (-n) 100000 requests in a quiet mode (-q parameter) and run only tests specific for two commands: set and lpush

Related

Evcache vs redis

I have read that netflix uses evcache , which is a wrapper over memcache and evcache proves better than memcache
In general it is said that redis server as a better cache than memcache, was trying to find the comparisons of redis and evcache.
Does redis scale as well as evcache or memcache? I am assuming that evcache scaling is tried and tested (hence works good for netflix)
EVCache is a functionality add wrapper over memcache. It is an application that Netflix devs wrote to add functionality they need in their cache layer while using memcache as the underlying data store. You can write your own EVCache to use redis as the data store
Comparing redis to Evcache is not the correct comparison as they operate on two different layers.
Does redis scale as well as evcache or memcache?
Redis can scale to many hundreds of thousands of requests per second.
In general, redis is preferred over memcache because of its many in built data structures
Redis is single threaded so once CPU usage hits 80+% it is better to run another instance instead of giving it a bigger server

Redis: Efficient cluster of servers for large key set

I have a very large set of keys, 200M keys, with small values, <100 bytes, to store and I'm trying to use Redis. The problem is such that I have 10 Redis DB to split the keys over, but currently I'm on a single server with those 10 Redis DB. By a Redis DB I mean using SELECT. From my calculations it looks like I'm going to blow out memory. I think I'll need over 4TB of memory for this case! What are my options? First, my calculation is based on 10000 keys with 100 byte values taking 220MB of RAM (this is from a table I found). So simply put (2*10^8 / 10^4) * 220MB = 4.4TB.
If my calculation looks correct, what are my options? I've read on different posts that Redis VM is no longer an option. Can I use a Redis cluster? This still appears to require too many servers to be practical. I understand I could switch to another DB, but I'd like that to be the last resort option.
Firstly, using shared databases (i.e. the SELECT command) isn't a recommended practice since all of these databases are essentially managed by the same Redis process. It is preferable having 10 separate Redis processes (even on the same server) in order to avoid contention (more info here).
Next, there are ways to reduce the memory footprint of your database. You could, for example, perform client-side compression (see here) or consider other optimizations such as using Hashes to keep multiple values (as described here).
That said, a Redis server is ultimately bound by the amount of RAM that the host provides. Once you've reached that limit you'll need to shard your database and use a Redis cluster. Since you're already using multiple databases this shouldn't pose a big challenge as your code should already be compatible with that to a degree. Sharding can be done in one of three approaches: client, proxy or Redis Cluster. Client-side sharding can be implemented in your code or by the Redis client that you're using (if the client library that you're using supports that). Redis Cluster (v3) is expected to be released in the very near future and already has a stable release candidate. As for proxy-based sharding, there are several open source solutions out there, including Twitter's twemproxy, Netflix's dynomite and codis. Additional information about sharding and partitioning can be found here.
Disclaimer: I work at Redis Labs. Lastly, AFAIK there's only one Redis-as-a-Service provider that already provides built-in support for clustering Redis. Redis Labs' Redis Cloud is a fully-managed service that can scale seamlessly to any required capacity. Our clusters support both the '{}' hashtag standard as well as sharding by RegEx - more about this can be found here.
You can use LMDB with Dynomite to store data beyond your memory capacity. LMDB uses both disk and memory to store data. Dynomite make LMDB to be distributed.
We have done a POC with this combo and they work nicely together.
For more information, please check out our open issue here:
https://github.com/Netflix/dynomite/issues/254

Does Redis persist data?

I understand that Redis serves all data from memory, but does it persist as well across server reboot so that when the server reboots it reads into memory all the data from disk. Or is it always a blank store which is only to store data while apps are running with no persistence?
I suggest you read about this on http://redis.io/topics/persistence . Basically you lose the guaranteed persistence when you increase performance by using only in-memory storing. Imagine a scenario where you INSERT into memory, but before it gets persisted to disk lose power. There will be data loss.
Redis supports so-called "snapshots". This means that it will do a complete copy of whats in memory at some points in time (e.g. every full hour). When you lose power between two snapshots, you will lose the data from the time between the last snapshot and the crash (doesn't have to be a power outage..). Redis trades data safety versus performance, like most NoSQL-DBs do.
Most NoSQL-databases follow a concept of replication among multiple nodes to minimize this risk. Redis is considered more a speedy cache instead of a database that guarantees data consistency. Therefore its use cases typically differ from those of real databases:
You can, for example, store sessions, performance counters or whatever in it with unmatched performance and no real loss in case of a crash. But processing orders/purchase histories and so on is considered a job for traditional databases.
Redis server saves all its data to HDD from time to time, thus providing some level of persistence.
It saves data in one of the following cases:
automatically from time to time
when you manually call BGSAVE command
when redis is shutting down
But data in redis is not really persistent, because:
crash of redis process means losing all changes since last save
BGSAVE operation can only be performed if you have enough free RAM (the amount of extra RAM is equal to the size of redis DB)
N.B.: BGSAVE RAM requirement is a real problem, because redis continues to work up until there is no more RAM to run in, but it stops saving data to HDD much earlier (at approx. 50% of RAM).
For more information see Redis Persistence.
It is a matter of configuration. You can have none, partial or full persistence of your data on Redis. The best decision will be driven by the project's technical and business needs.
According to the Redis documentation about persistence you can set up your instance to save data into disk from time to time or on each query, in a nutshell. They provide two strategies/methods AOF and RDB (read the documentation to see details about then), you can use each one alone or together.
If you want a "SQL like persistence", they have said:
The general indication is that you should use both persistence methods if you want a degree of data safety comparable to what PostgreSQL can provide you.
The answer is generally yes, however a fuller answer really depends on what type of data you're trying to store. In general, the more complete short answer is:
Redis isn't the best fit for persistent storage as it's mainly performance focused
Redis is really more suitable for reliable in-memory storage/cacheing of current state data, particularly for allowing scalability by providing a central source for data used across multiple clients/servers
Having said this, by default Redis will persist data snapshots at a periodic interval (apparently this is every 1 minute, but I haven't verified this - this is described by the article below, which is a good basic intro):
http://qnimate.com/redis-permanent-storage/
TL;DR
From the official docs:
RDB persistence [the default] performs point-in-time snapshots of your dataset at specified intervals.
AOF persistence [needs to be explicitly configured] logs every write operation received by the server, that will be played again at server startup, reconstructing the
original dataset.
Redis must be explicitly configured for AOF persistence, if this is required, and this will result in a performance penalty as well as growing logs. It may suffice for relatively reliable persistence of a limited amount of data flow.
You can choose no persistence at all.Better performance but all the data lose when Redis shutting down.
Redis has two persistence mechanisms: RDB and AOF.RDB uses a scheduler global snapshooting and AOF writes update to an apappend-only log file similar to MySql.
You can use one of them or both.When Redis reboots,it constructes data from reading the RDB file or AOF file.
All the answers in this thread are talking about the possibility of redis to persist the data: https://redis.io/topics/persistence (Using AOF + after every write (change)).
It's a great link to get you started, but it is defenently not showing you the full picture.
Can/Should You Really Persist Unrecoverable Data/State On Redis?
Redis docs does not talk about:
Which redis providers support this (AOF + after every write) option:
Almost none of them - redis labs on the cloud does NOT provide this option. You may need to buy the on-premise version of redis-labs to support it. As not all companies are willing to go on-premise, then they will have a problem.
Other Redis Providers does not specify if they support this option at all. AWS Cache, Aiven,...
AOF + after every write - This option is slow. you will have to test it your self on your production hardware to see if it fits your requirements.
Redis enterpice provide this option and from this link: https://redislabs.com/blog/your-cloud-cant-do-that-0-5m-ops-acid-1msec-latency/ let's see some banchmarks:
1x x1.16xlarge instance on AWS - They could not achieve less than 2ms latency:
where latency was measured from the time the first byte of the request arrived at the cluster until the first byte of the ‘write’ response was sent back to the client
They had additional banchmarking on a much better harddisk (Dell-EMC VMAX) which results < 1ms operation latency (!!) and from 70K ops/sec (write intensive test) to 660K ops/sec (read intensive test). Pretty impresive!!!
But it defenetly required a (very) skilled devops to help you create this infrastructure and maintain it over time.
One could (falsy) argue that if you have a cluster of redis nodes (with replicas), now you have full persistency. this is false claim:
All DBs (sql,non-sql,redis,...) have the same problem - For example, running set x 1 on node1, how much time it takes for this (or any) change to be made in all the other nodes. So additional reads will receive the same output. well, it depends on alot of fuctors and configurations.
It is a nightmare to deal with inconsistency of a value of a key in multiple nodes (any DB type). You can read more about it from Redis Author (antirez): http://antirez.com/news/66. Here is a short example of the actual ngihtmare of storing a state in redis (+ a solution - WAIT command to know how much other redis nodes received the latest change change):
def save_payment(payment_id)
redis.rpush(payment_id,”in progress”) # Return false on exception
if redis.wait(3,1000) >= 3 then
redis.rpush(payment_id,”confirmed”) # Return false on exception
if redis.wait(3,1000) >= 3 then
return true
else
redis.rpush(payment_id,”cancelled”)
return false
end
else
return false
end
The above example is not suffeint and has a real problem of knowing in advance how much nodes there actually are (and alive) at every moment.
Other DBs will have the same problem as well. Maybe they have better APIs but the problem still exists.
As far as I know, alot of applications are not even aware of this problem.
All in all, picking more dbs nodes is not a one click configuration. It involves alot more.
To conclude this research, what to do depends on:
How much devs your team has (so this task won't slow you down)?
Do you have a skilled devops?
What is the distributed-system skills in your team?
Money to buy hardware?
Time to invest in the solution?
And probably more...
Many Not well-informed and relatively new users think that Redis is a cache only and NOT an ideal choice for Reliable Persistence.
The reality is that the lines between DB, Cache (and many more types) are blurred nowadays.
It's all configurable and as users/engineers we have choices to configure it as a cache, as a DB (and even as a hybrid).
Each choice comes with benefits and costs. And this is NOT an exception for Redis but all well-known Distributed systems provide options to configure different aspects (Persistence, Availability, Consistency, etc). So, if you configure Redis in default mode hoping that it will magically give you highly reliable persistence then it's team/engineer fault (and NOT that of Redis).
I have discussed these aspects in more detail on my blog here.
Also, here is a link from Redis itself.

Redis configuration for production

I'm developing project with redis.My redis configuration is normal redis setup configuration.
I don't know how should I do redis configuration? Master-Slave? Cluster?
Do you have anything suggestion redis configuration for production?
Standard approach would be to have one master and at least one slave. Depending on your I/O requirements and number of ops/sec, you can always have multiple read-only slaves. Slaves can be read from but not written to. So you'll want to design your application to take advantage of doing round-robin requests to the slaves and writes only to the single master.
Depending on your data storage/backup requirement, you can set fsync for append-only mode to be every second. So while this means you can lose up to one second worth of data, it's really much less than that because your slaves serve as hot backups, and they will have the data within milliseconds.
You'll at least want to do a BGSAVE every hour to get a dump.rdp produced. You can then save this file live while the server is still running, and store it to some off-site backup facility.
But if you're just using Redis as a standard memcache replacement and don't care about data, then you can ignore all of this. Much of it will be changing in Redis Cluster in the 3.0 version.
It depends on what your Read/Writes requirements are. Could you give us more informations on that matter ?
I think 10,000 people use instant my application.I persist member login token on redis.It's important for me.If I don't write redis, member don't login on application.
Even a Redis single instance will be enough to process 10K users (start redis-bench to the throughput available), so just to be sure use a Master/Slave configuration with autopromotion of the slave if the master goes down.
Since you want persistence, use RDB (maybe along with AOF), see this topic on Redisio.

Sharing a Redis database?

I'm using Redis as a session store in my app. Can I use the same instance (and db) of Redis for my job queue? If it's of any significance, it's hosted with redistogo.
It is perfectly fine to use the same redis for multiple operations.
We had a similar use case where we used Redis as a key value store as well as a job queue.
However you may want to consider other aspects like the performance requirements for your application. Redis can ideally handle around 70k operations per second and if at some time in future you think you may hit these benchmarks it's much better to split your operations to multiple redis instances based on the kind of operations you perform. This will allow you to make decisions about availability and replication at a more finer level depending on the requirements. As a simple use case once your key size grows you may be able to flush your session app redis or shard your keys using redis cluster without affecting job queing infrastructure.