Redis occupies more memory

OS: RHEL 7.6
Cluster set up:
Single node: 6 instances of Redis, 3 masters and 3 slaves.
16384 slots shared between three masters.
Sample data:
smembers 201904138
1) "0"
2) "1"
3) "2"
4) "3"
5) "4"
Each set contains 5 IDs, and I have 2.49 million such keys.
Sample size occupied by each entry:
:7001> memory usage 201904138
(integer) 76
:7001> memory usage 201904132
(integer) 76
:7001> memory usage 201904134
(integer) 76
:7001> dbsize
(integer) 2489174
So logically it should occupy about 2.49M * 76 = 189MB. I do understand that Redis stores some extra information as well.
But the total memory occupied by this cluster is Memory = 367M, RSS = 389M.
Why is it double the size of the original data? How can I reduce it?
Please help.
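One likely factor is replication: with 3 masters and 3 slaves, every key is held twice in the cluster, so the data alone accounts for roughly double the 189MB estimate before per-instance overhead (hash-table buckets, replication and client buffers) is even counted. A back-of-envelope sketch in Python, using only the numbers from the question:

keys = 2_489_174               # reported by dbsize
per_key = 76                   # MEMORY USAGE per key, in bytes

data = keys * per_key          # held by the three masters combined
total = data * 2               # each slave keeps a full copy of its master's data
print(f"{data / 2**20:.0f} MiB on masters, ~{total / 2**20:.0f} MiB cluster-wide")
# 180 MiB on masters, ~361 MiB cluster-wide

That lands close to the observed 367M; reducing it would mean running fewer replicas or shrinking the per-key footprint.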

Related

Redis HyperLogLog - Too many errors

The scenario is really simple. I'm adding 50 elements (different each time) to an HLL. Usually by the third batch, I get a wrong PFCOUNT (151 instead of 150). I know that the HLL has a low error rate, but is it really that easy to get a wrong count? Can this error be handled?
Thanks in advance.
Here are the logs:
127.0.0.1:6379> PFADD test DaG4yPCb vrTDeJde SCcK4rvG K0UJPxeT s1RtvWyf EpkUaxhY y4ot0BQW vt13T2eS 5rFe0TKj yXm25gXb 4nnw8YYy Fnqdb4C6 rwuPLUyC W9uS0az7 koOtrENo hIjAa00k eT3VvI7Q zQVhYnYY 1Cshhbbk 8q3B82gH NWlnW5QH fbNYBXoy 4ti95TeI TiUyXs0W TAepHjdd CK26UGuC ESt9opXO ihYIo1L9 0XqFKx8x coh31ZxE 01G7eCjb wJZYByUo ZHfJIKoQ tFGPsdgZ 19DUQvNX 20QtyIVq Xjx4wT9z nJazaXtH cHEqmQjZ hz8j0uhT hpeygfWk hWBf44rU iUJbsPSY nIYDiV80 FgaEU3pI 7EEkDGY6 tPF0KHFM twVbY3wR xFpEg4jP 4JEW0pue
127.0.0.1:6379> PFCOUNT test
(integer) 50
127.0.0.1:6379> PFADD test elapxije pbjtcvbg pjoiaarc pogpnjqd ujzfiuyu kykxhqpl hnkwmwpq gljpsnwu rlnflrdb wexqthqe hwbcgbvt yjdddtpo lnkqcoaz tcjgnxme aiflckyh rfsmwzgw eooownar pkvhdwae tywuoxgv mojqkmqd gepsxhqj cbgrmzih jkormrfk irasppno mmealsye fdumtspr anisssut tuqlufyr coqebpyn zijsoauj akvcvkda jruskmma kalinqpr lsazgswh ozyajcpm edvodqnt befvtsbx bcaurnjh psgdgval pyktekgo kucfjnov xruaulrl rrwqzjac ppbbhdhz iohaeoiq fbztqesn zsfnxzsa masqfqjo fsybqced xzfdhtzv
(integer) 1
127.0.0.1:6379> PFCOUNT test
(integer) 100
127.0.0.1:6379> PFADD test hukqyega olgswnll ufzjkscd oygfsgdu bttlwivr xrvtjsfc criuaabz idxilrvd kitvpuzb ehwrvcip ljthitya clgciaex bagxomaq ziszyehx uuhytedx xycrfcgf nmbnxkav ylxxyyrp rfwniodp vezvqefz gomrekbf tirdnpbp fpbokjjz dwppiomo zgypqxyh kavukjeb wsomngmh oawosnvf tinruzjc bbfqchbn airifskr dqcaznzt vnpfejep jmdlwbek eubhstbo iamgnktp gfojfegy hvmbszlu poauswtc tdgozdfy cxdsprqo pjsuxult nctztxwb fbayirlw dcitezyn zufryoro tisxdwtn mmgztjie vykdkvwm dqogmhnm
(integer) 1
127.0.0.1:6379> PFCOUNT test
(integer) 151
From https://redis.io/commands/PFCOUNT
The returned cardinality of the observed set is not exact, but approximated with a standard error of 0.81%.
In your case it is 1/150 ≈ 0.67%, which is well within the documented standard error.
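For reference, the 0.81% figure is the standard error of a HyperLogLog with m registers, 1.04/sqrt(m); Redis's HLL implementation uses m = 2^14 = 16384 registers. A quick check in Python:

import math

m = 2 ** 14                          # registers in a Redis HyperLogLog
std_err = 1.04 / math.sqrt(m)
print(f"expected standard error: {std_err:.4%}")   # 0.8125%

observed = abs(151 - 150) / 150
print(f"observed error: {observed:.4%}")           # 0.6667%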

Redis `SCAN`: how to maintain a balance between incoming keys that might match and ensure an eventual result in a reasonable time?

I am not that familiar with Redis. At the moment I am designing a realtime service and I'd like to rely on it. I expect ~10000-50000 keys per minute to be SET with some reasonable EX, and to match over them using SCAN rarely enough not to create performance bottlenecks.
What I'm unsure about is the in/out rate: the keyspace could keep being flooded with keys that match some SCAN query, so the scan never terminates (i.e., it always replies with the latest cursor position and forces you to continue; this can happen easily if you consume x items per second while x + y items per second arrive, with y > 0).
Obviously, I could set the SCAN COUNT high enough; but I wonder if there is a better solution, or whether Redis itself guarantees that SCAN will grow its COUNT automatically in such a case?
First some context, solution at the end:
From SCAN command > Guarantee of termination
The SCAN algorithm is guaranteed to terminate only if the size of the iterated collection remains bounded to a given maximum size, otherwise iterating a collection that always grows may result into SCAN to never terminate a full iteration.
This is easy to see intuitively: if the collection grows there is more and more work to do in order to visit all the possible elements, and the ability to terminate the iteration depends on the number of calls to SCAN and its COUNT option value compared with the rate at which the collection grows.
But in The COUNT option it says:
Important: there is no need to use the same COUNT value for every iteration. The caller is free to change the count from one iteration to the other as required, as long as the cursor passed in the next call is the one obtained in the previous call to the command.
Important to keep in mind, from Scan guarantees:
A given element may be returned multiple times. It is up to the application to handle the case of duplicated elements, for example only using the returned elements in order to perform operations that are safe when re-applied multiple times.
Elements that were not constantly present in the collection during a full iteration, may be returned or not: it is undefined.
The key to a solution is in the cursor itself. See Making sense of Redis’ SCAN cursor. It is possible to deduce the percentage of progress of your scan because the cursor is really the bit-reversed index into the hash table.
Using DBSIZE or the INFO keyspace command, you can get how many keys you have at any time:
> DBSIZE
(integer) 200032
> info keyspace
# Keyspace
db0:keys=200032,expires=0,avg_ttl=0
Another source of information is the undocumented DEBUG htstats <db-index> subcommand, just to get a feeling:
> DEBUG htstats 0
[Dictionary HT]
Hash table 0 stats (main hash table):
table size: 262144
number of elements: 200032
different slots: 139805
max chain length: 8
avg chain length (counted): 1.43
avg chain length (computed): 1.43
Chain length distribution:
0: 122339 (46.67%)
1: 93163 (35.54%)
2: 35502 (13.54%)
3: 9071 (3.46%)
4: 1754 (0.67%)
5: 264 (0.10%)
6: 43 (0.02%)
7: 6 (0.00%)
8: 2 (0.00%)
[Expires HT]
No stats available for empty dictionaries
The table size is the next power of 2 above your number of keys:
Keys: 200032 => Table size: 262144
The solution:
We will calculate a desired COUNT argument for every scan.
Say you will be calling SCAN with a frequency (F, in Hz) of 10 Hz (every 100 ms) and you want it done in 5 seconds (T, in s). So you want this finished in N = F*T calls; N = 50 in this example.
Before your first scan, you know your current progress is 0, so your remaining percent is RP = 1 (100%).
Before every SCAN call (or every few calls, if you want to save the round-trip time (RTT) of the extra DBSIZE call), call DBSIZE to get the current number of keys K.
You will use COUNT = K*RP/N
For the first call, this is COUNT = 200032*1/50 = 4000.
For any other call, you need to calculate RP = 1 - ReversedCursor/NextPowerOfTwo(K).
For example, let's say you have done 20 calls already, so now N = 30 (remaining number of calls). You called DBSIZE and got K = 281569. This means NextPowerOfTwo(K) = 524288, which is 2^19.
Your next cursor is 14509 in decimal = 0000011100010101101 in binary. As the table size is 2^19, we represent it with 19 bits.
You reverse the bits and get 1011010100011100000 in binary = 370912 in decimal. This means the scan has covered 370912 out of 524288 slots. And:
RP = 1 - ReversedCursor/NextPowerOfTwo(K) = 1 - 370912 / 524288 ≈ 0.2925, or about 29%
So you have to adjust:
COUNT = K*RP/N = 281569 * 0.2925 / 30 ≈ 2745
So in your next SCAN call you use about 2745. It makes sense that this is lower than the initial 4000: even though the number of keys has grown from 200032 to 281569, the scan has already covered about 71% of the keyspace in only 40% of the planned calls, so it is ahead of schedule and each remaining call has less ground to cover.
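A minimal Python sketch of the bit-reversal arithmetic above (assuming, as in the example, that the table size is taken as the next power of two of the current DBSIZE):

def scan_progress(cursor: int, table_size: int) -> float:
    """Fraction of the hash table a SCAN has covered so far.

    Works because the SCAN cursor is the bucket index with its
    log2(table_size) bits reversed (reverse-binary iteration).
    """
    bits = table_size.bit_length() - 1            # table_size is 2**bits
    reversed_cursor = int(format(cursor, f"0{bits}b")[::-1], 2)
    return reversed_cursor / table_size

# Worked example from above: cursor 14509, table size 2**19
rp = 1 - scan_progress(14509, 2 ** 19)            # ~0.2925 remaining
count = int(281569 * rp / 30)                     # ~2745, the next COUNT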
All this was assuming you are retrieving all keys. If you're pattern-matching, you need to use the past to estimate the remaining number of keys to be found; we add PM (percent of matches) as a factor in the COUNT calculation.
COUNT = PM * K*RP/N
PM = keysFound / ( K * ReversedCursor/NextPowerOfTwo(K))
If after 20 calls you have found only keysFound = 2000 keys, then:
PM = 2000 / ( 281569 * 370912 / 524288 ) ≈ 0.01
This means only about 1% of the keys scanned so far match our pattern, so
COUNT = PM * K*RP/N = 0.01 * 2745 ≈ 27
This algorithm can probably be improved, but you get the idea.
Make sure to run some benchmarks on the COUNT number you'll start with, to measure how many milliseconds your SCAN takes, as you may need to moderate your expectations about how many calls (N) you can do in a reasonable time without blocking the server, and adjust your F and T accordingly.
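Putting it together, a sketch of the adaptive loop with redis-py (the client library and the floor of 10 on COUNT are assumptions, not part of the answer above):

import redis

def timed_scan(r: redis.Redis, n_calls: int, match: str = "*") -> list:
    """SCAN the whole keyspace in roughly n_calls calls, recomputing
    COUNT = K * RP / N before every call, as described above."""
    cursor, found, remaining = 0, [], n_calls
    while True:
        k = max(r.dbsize(), 1)                   # K: current number of keys
        table = 1 << (k - 1).bit_length()        # NextPowerOfTwo(K)
        bits = table.bit_length() - 1
        rev = int(format(cursor, f"0{bits}b")[::-1], 2)
        rp = 1 - rev / table                     # remaining percent
        count = max(10, int(k * rp / max(remaining, 1)))
        cursor, keys = r.scan(cursor=cursor, match=match, count=count)
        found.extend(keys)
        remaining -= 1
        if cursor == 0:
            return found

The PM correction for pattern matching can be layered on top by tracking len(found) against k * rev / table before computing count.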

Sorted set of a fixed size in Redis?

I'm a rookie at using Redis, and I've recently been thinking about how to handle high concurrency in our system. I want to use Redis, since in-memory access is much faster than IO. A sorted set looks like a possible tool: we want a fixed-size sorted set to contain users' mobile numbers. I Googled/Baidued a lot but didn't find anything meaningful, so can anybody tell me how to give a Redis sorted set a fixed size? The set should also tell me whether each add operation succeeded or not.
Thanks
I don't think you can specify a size; you will have to check it yourself.
You can use ZCARD:
ZCARD key
Or you could just trim the set with ZREMRANGEBYRANK, removing the lowest-ranked elements so that only the YOURSIZE highest remain:
ZREMRANGEBYRANK key 0 -(YOURSIZE+1)
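If you take the trimming route, run the add and the trim together so the cap holds under concurrency; a minimal redis-py sketch (the key name and cap are hypothetical):

import redis

r = redis.Redis()
MAXSIZE = 3                                        # hypothetical cap

def zadd_capped(key: str, score: float, member: str) -> None:
    pipe = r.pipeline()                            # MULTI/EXEC: both run or neither
    pipe.zadd(key, {member: score})
    pipe.zremrangebyrank(key, 0, -(MAXSIZE + 1))   # keep the MAXSIZE highest
    pipe.execute()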
Alternatively, this can be done atomically with a Lua script that checks the set size (ZCARD) before adding the element (ZADD). Complexity is O(log(N)) for a set of size N:
-- KEYS[1] the sorted set
-- ARGV[1] the score
-- ARGV[2] the member
-- ARGV[3] the max size of the sorted set
-- Returns number of elements added
--
if redis.call('ZCARD', KEYS[1]) < tonumber(ARGV[3]) then
    return redis.call('ZADD', KEYS[1], ARGV[1], ARGV[2])
else
    return 0
end
You can try it out by saving the above snippet as a file, test.lua, and running it a few times, e.g.:
redis-cli --eval test.lua myset , 1 A 3
redis-cli --eval test.lua myset , 2 B 3
redis-cli --eval test.lua myset , 3 C 3
redis-cli --eval test.lua myset , 4 D 3
redis-cli --eval test.lua myset , 5 E 3
Verify
127.0.0.1:6379> ZRANGE myset 0 100
1) "A"
2) "B"
3) "C"

How to get DIFF on sorted set

How do I get the most-weighted elements from a sorted set, excluding those found in another set (or list or hash)?
>zadd all 1 one
>zadd all 2 two
>zadd all 3 three
>sadd disabled 2
>sdiff all disabled
(error) WRONGTYPE Operation against a key holding the wrong kind of value
Is my only option to fetch elements from the sorted set one by one and compare them against the list of "disabled" items? Wouldn't that be very slow because of all the round trips to the server?
What is the right approach here?
Note: I assume you meant sadd disabled two.
As you've found out, SDIFF does not operate on sorted sets; that is because defining the difference between sorted sets isn't trivial.
What you could do is first create a temporary sorted set with ZUNIONSTORE, setting the scores of members that appear in the other set to 0, and then do a range query that leaves out the 0 scores, e.g.:
127.0.0.1:6379> ZADD all 1 one 2 two 3 three
(integer) 3
127.0.0.1:6379> SADD disabled two
(integer) 1
127.0.0.1:6379> ZUNIONSTORE tmp 2 all disabled WEIGHTS 1 0 AGGREGATE MIN
(integer) 3
127.0.0.1:6379> ZREVRANGEBYSCORE tmp +inf 1 WITHSCORES
1) "three"
2) "3"
3) "one"
4) "1"

Redis - how to sort by hash field in redis as opposed to key?

Let's suppose I have a Redis hash a = {1:10, 2:15, 3:5, 4:0, 5:20} and a set b = (5,3,4). I want to get a list containing the elements of b, sorted by the values a[b] (the result in this case should be [4,3,5]).
When I try to do this, it doesn't work well:
redis 127.0.0.1:6379> hmset a 1 10 2 15 3 5 4 0 5 20
redis 127.0.0.1:6379> sadd b 5 3 4
redis 127.0.0.1:6379> sort b by a->*
1) "3"
2) "4"
3) "5"
Obviously, the asterisk in the hash-field placeholder doesn't work. Are there other ways, besides declaring separate keys a:1 through a:5, to do this task in Redis?
P.S. This is not a duplicate of Redis: How can I sort my hash by keys?, as that question clearly discusses the a:* approach.
This is a known issue: link.
You could do the following, modelling a as a sorted set instead of a hash:
redis 127.0.0.1:6379> sadd b 5 3 4
redis 127.0.0.1:6379> zadd a 10 1 15 2 5 3 0 4 20 5
redis 127.0.0.1:6379> zinterstore result 2 a b
redis 127.0.0.1:6379> zrange result 0 -1
1) "4"
2) "3"
3) "5"
Maybe you can model it using sorted sets instead? Use the values as scores and the keys as members; sorted sets are more or less like hashes sorted by value. I'd love to give you an example, but I'm not sure exactly what problem you're trying to solve. If you could elaborate, maybe I could help.