I am interested in using Redis to store a customer's session on the server side for authorization. Basically, when a customer logs in, a hash will be stored in a Redis cluster and the key returned to the client. On each request, the client will pass the key as a header and the service will check that the hash still exists in Redis; if it doesn't, an error will be returned. The key will expire after X minutes, causing any requests that use it to fail. However, I have been reading online that some people experienced issues because of the way expiration is replicated to slaves: slaves only expire a key when they receive a DEL command from the master, so if a GET is made on a slave before that command arrives, the value at that key will be returned.
https://github.com/antirez/redis/issues/187
Does this issue still exist? It seems like a big issue to me and would create a bit of a security hole. Maybe not a big deal for stale data, but when used for authorization it is a big deal.
A) no, not really — since 2014, a GET of an expired key will return "not found" on a slave even if the slave hasn't yet received a DEL from the replication stream. The outstanding issue has to do with EXISTS being inconsistent with GET, which only matters if you rely on the output of the EXISTS command.
B) Completely independent of this issue, the possibility of replication lag always exists. The security of your app shouldn't depend on the premise that replicas are always up-to-date.
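To make point A concrete, here is a minimal sketch of that session pattern using the redis-py client. The key prefix, TTL, and helper names are illustrative, not from the question; the point is that the per-request check reads the value (HGETALL/GET) rather than relying on EXISTS.

    import secrets
    import redis

    r = redis.Redis(host="localhost", port=6379)
    SESSION_TTL_SECONDS = 15 * 60  # the "X minutes" from the question (illustrative)

    def create_session(customer_id):
        token = secrets.token_urlsafe(32)
        # Store the session hash and set its expiry in one round trip.
        pipe = r.pipeline()
        pipe.hset(f"session:{token}", mapping={"customer_id": customer_id})
        pipe.expire(f"session:{token}", SESSION_TTL_SECONDS)
        pipe.execute()
        return token

    def check_session(token):
        # Read the value instead of calling EXISTS (see point A above): an
        # expired key reads as empty even on a replica that has not yet
        # received the DEL from the master.
        session = r.hgetall(f"session:{token}")
        if not session:
            raise PermissionError("session expired or unknown")
        return session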
Related
What does Redis mean by "they'll still take the full state of the expires existing in the dataset"? Can someone explain what this actually means in more layman's terms?
Below is the extract from the Redis documentation on how Redis expires keys:
https://redis.io/commands/expire/#how-redis-expires-keys
"However while the replicas connected to a master will not expire keys independently (but will wait for the DEL coming from the master), they'll still take the full state of the expires existing in the dataset"
Means exactly what it says. The replica has replicated all of the EXPIRE / EXPIREAT / SET .. EX / etc. commands, so it knows the expiration times of every key; it just doesn't act on them directly unless it becomes the master. Instead, it waits until it hears that the master has expired a given key to delete it from its own dataset.
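A quick way to see this for yourself, assuming a master on port 6379 and a replica on port 6380 (the ports and key name are illustrative):

    import redis

    master = redis.Redis(port=6379)
    replica = redis.Redis(port=6380)

    master.set("token", "abc", ex=60)   # SET token abc EX 60 on the master

    # The replica has replicated the expire, so it knows the remaining TTL...
    print(replica.ttl("token"))         # roughly 60

    # ...but it will only remove the key once the master sends a DEL/UNLINK
    # down the replication stream, or immediately if it is promoted to master.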
Backstory: The keyspace of the Redis database in question reports a large number of expired keys, and memory usage is maxed out. The application using this database is experiencing (rare) intermittent timeouts, and I thought (in my limited knowledge) that perhaps it is because Redis is having to evict expired keys each time a new key is created.
So to my question: how do I tell Redis to remove all the expired keys?
Secondarily -- is it possible to access/see expired keys with redis-cli?
Here's a slice of the INFO I'm looking at:
maxmemory_policy:allkeys-lru
expired_keys:24326586
evicted_keys:134022997
keyspace_hits:2684031719
keyspace_misses:186380210
slave_expires_tracked_keys:0
active_defrag_key_hits:0
active_defrag_key_misses:0
db2:keys=12994468,expires=3193,avg_ttl=1891176
Answer for myself, posterity, and any other Redis newbies out there. I was looking at the wrong "database". I was under the WRONG impression that Redis only had a single database, but looking at my question you see "db2". I looked into that and found that Redis has 16 databases by default, identified by a zero-based index. In this case:
SELECT 2
That selects "db2" and now doing a DBSIZE gives a more accurate output.
Oye -- so the problem is that the keys are still there! When Redis actually expires a key, it deletes it.
Whoops! I'm leaving my question up because someone else might ask the same thing and head down the same wrong route.
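For anyone following the same route, here is a rough way to inspect a specific numbered database from redis-py (db index 2, matching the db2 line in the INFO output above; host and port are illustrative):

    import redis

    # Connect directly to database index 2, equivalent to SELECT 2 in redis-cli.
    r = redis.Redis(host="localhost", port=6379, db=2)

    print(r.dbsize())            # key count for db2, same as DBSIZE
    print(r.info("keyspace"))    # per-db summary: keys, expires, avg_ttl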
I want to use Redis for a particular use case. I am not sure to go with a Redis Cluster or with Twemproxy + Sentinel.
I know Cluster is a winner any day. I am just skeptical because of the MOVED responses. On a MOVED response, the client has to connect to another node, and in case of resharding it may have to connect to yet another node. But Twemproxy knows where the data resides, so it will never get a MOVED response.
There are different problems with Twemproxy, though: the added hop may increase overall turnaround time; adding new nodes is a problem; and if it ejects some nodes, it won't be able to serve requests for the keys on those nodes. There is also an extra maintenance headache, such as running Sentinels for my Redis instances and a mechanism for HA of Twemproxy itself.
Can anyone suggest whether I should go with Twemproxy or Cluster? I am thinking of going with Twemproxy as I will not be going back and forth on MOVED responses, but I am skeptical about it, considering the concerns mentioned above.
P.S. I am planning to use the Jedis client for Redis (if that helps).
First of all, I'm not familiar with Twemproxy, so I'll only talk about your concerns on Redis Cluster.
A Redis client can get the complete slot-node mapping, i.e. the location of keys, from Redis Cluster. It can cache the mapping on the client side and send requests to the right node. So most of the time, it won't be redirected, i.e. won't get the MOVED message.
However, if you add/delete nodes or reshard the data set, the client will receive MOVED messages, since it still uses the old mapping. In this case, the client can update its local cache, and any subsequent requests will be sent to the right node, i.e. no more MOVED messages.
A decent client library implements the above optimization to make this efficient. So if your client library has it, you don't need to worry about the MOVED penalty.
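Since you mention Jedis: its JedisCluster class does this slot caching and refresh-on-MOVED for you. The same idea sketched with redis-py's cluster client (host and port are illustrative):

    from redis.cluster import RedisCluster

    # The client fetches the slot -> node mapping at startup and caches it,
    # so requests normally go straight to the owning node with no redirect.
    rc = RedisCluster(host="10.0.0.1", port=6379)

    rc.set("user:42", "some value")
    print(rc.get("user:42"))

    # After a reshard, the first request for a migrated slot gets a MOVED
    # reply; the client refreshes its cached mapping and retries, so later
    # requests again go directly to the correct node.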
I have an application that runs a single Membase server (1.7.1.1) that I use to cache data I'd otherwise fetch from our central SQL Server DB. I have one default bucket associated to the Membase server, and follow the traditional data-fetching pattern of:
When specific data is requested, lookup the relevant key in Membase
If data is returned, use it.
If no data is returned, fetch data from the DB
Store the newly returned data in Membase
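For reference, a rough sketch of that read-through pattern, assuming a memcached-protocol client (pymemcache here) in front of the Membase bucket; the host, port, TTL, and the fetch_from_sql helper are illustrative, not from the question:

    from pymemcache.client.base import Client

    cache = Client(("membase-host", 11211))
    CACHE_TTL = 300  # seconds (illustrative)

    def get_data(key):
        value = cache.get(key)                    # 1. look up the key in Membase
        if value is not None:
            return value                          # 2. hit: use it
        value = fetch_from_sql(key)               # 3. miss: fetch from SQL Server (hypothetical helper)
        cache.set(key, value, expire=CACHE_TTL)   # 4. store it back in Membase
        return value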
I am looking to add an additional server to my default cluster, and rebalance the keys. (I also have replication enabled for one additional server).
In this scenario, I am curious as to how I can use the current pattern (or modify it) to make sure that I am not getting data out of sync when one of my two servers goes down in either an auto-failover or manual failover scenario.
From my understanding, if one server goes down (call it Server A), during the period that it is down but still attached to the cluster, there will be a cache key miss (if the active key is associated to Server A, not Server B). In that case, in the data-fetching pattern above, I would get no data returned and fetch straight from SQL Server. But, when I attempt to store the data back to my Membase cluster, will it store the data in Server B and remap that key to Server B on the next fetch?
I understand that once I mark Server A as "failed over", Server B's replica key will become the active one, but I am unclear about how to handle the intermittent situation when Server A is inaccessible but not yet marked as failed over.
Any help is greatly appreciated!
That's a pretty old version, but several things to clarify:
If you are performing caching you are probably using a memcached bucket, and in this case there is no replica.
Nodes are always considered attached to the cluster until they are explicitly removed by administrative action (autofailover attempts to automate this administrative action for you by attempting to remove the node from the cluster if it's determined to be down for n amount of time).
If the server is down (but not failed over), you will not get a "Cache Miss" per se, but some other kind of connectivity error from your client. Many older memcached clients do not make this distinction and simply return a NULL, False, or similar value for any kind of failure. I suggest you use a proper Couchbase client for your application which should help differentiate between the two.
As far as Couchbase is concerned, data routing for any kind of operation remains the same. So if you were not able to reach the item on Server A. because it was not available, you will encounter this same issue upon attempting to store it back again. In other words, if you tried to get data from Server A and it was down, attempting to store data to Server A will fail in the exact same way, unless the server was failed over between the last fetch and the current storage attempt -- in which case the client will determine this and route the request to the appropriate server.
In "newer" versions of Couchbase (> 2.x) there is a special get-from-replica command available for use with couchbase (or membase)-style buckets which allow you to explicitly read information from a replica node. Note that you still cannot write to such a node, though.
Your overall strategy seems very sane for a cache; except that you need to understand that if a node is unavailable, then a certain percentage of your data will be unavailable (for both reads and writes) until the node is either brought back up again or failed over. There is no way around that.
The documentation about Redis Keyspace Notifications (http://redis.io/topics/notifications) says, near its end, that a key with a timeout is removed from the database
"When the key is accessed by a command and is found to be expired."
..
Question: Is it enough to retrieve the very key, e.g. via KEYS *, or do I have to access the content the key refers to?
Background: The second process I omitted (the .. above) is a probabilistic process, and the real deletion of an expired key may be delayed, and thus the delivery of the EXPIRED event. I want to ensure the notification is given to a subscriber, so just accessing the keys would be easiest.
Redis also implements an active, periodic expiry check: each cycle it picks a random sample of keys that have a TTL and deletes any that have already expired.
What I understand is that you're concerned that, with the above logic, there can be expired keys that have not yet been deleted, and therefore their expired events have not yet been delivered.
To avoid such a case, simply checking the keys for existence will delete them, since accessing an expired key triggers its lazy deletion. The cost of Redis calls should be kept in mind, so design a Lua script or a pipelined/bulk command that is invoked periodically, iterates over a list of keys, and runs EXISTS on them, causing any that have expired to be deleted (and their events published) automatically.
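A rough sketch of such a periodic sweep against the master, using a pipeline of EXISTS calls; the key pattern and batch size are illustrative (SCAN is used instead of KEYS * so the sweep does not block the server):

    import redis

    r = redis.Redis(host="localhost", port=6379)

    def sweep(pattern="session:*", batch=500):
        # Touching each key with EXISTS makes Redis lazily delete it (and emit
        # the expired event) if its TTL has already passed.
        pipe = r.pipeline()
        pending = 0
        for key in r.scan_iter(match=pattern, count=batch):
            pipe.exists(key)
            pending += 1
            if pending >= batch:
                pipe.execute()
                pending = 0
        if pending:
            pipe.execute()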
To test this you would need a large dataset.