I have around 3 million keys with the following format, for example:
3000000:60
Each key holds a hash (HSET) of string-converted integer and double values, with around 10 such entries per key.
Key (String): 3000000:60
Value: field (String) = abcc, value = 1000:-123.456:1001:234.57:.....
Reading the data with hgetAll, hget, or hmget takes 2-3 minutes.
When I switch to a pipeline I can finish in about 20 seconds, but I am not sure how to process the data in batches.
My code snippet looks like this:
readPipeline.multi();                              // MULTI inside the pipeline queues the commands
for (long i = start; i < end; ++i) {
    readPipeline.hgetAll(i + ":60");               // queue a read for each key (key format assumed)
    if (pipelineBatch == ++currentBatchCount) {
        readPipeline.exec();
        currentBatchCount = 0;
    }
}
List<Object> allObjects = readPipeline.syncAndReturnAll();
When I read the results, some of the Objects just contain the value QUEUED. So, how do I process the pipelined data?
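For reference, here is a minimal sketch of batched reads with a plain Jedis pipeline and no MULTI/EXEC (the MULTI is where the QUEUED replies come from). The variables start, end, and pipelineBatch are from the snippet above, the key format i + ":60" is assumed, and process() is just a placeholder for whatever you do with one hash:

import redis.clients.jedis.Jedis;
import redis.clients.jedis.Pipeline;
import redis.clients.jedis.Response;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

Jedis jedis = new Jedis("localhost", 6379);
Pipeline pipeline = jedis.pipelined();
List<Response<Map<String, String>>> batch = new ArrayList<>();

for (long i = start; i < end; ++i) {
    batch.add(pipeline.hgetAll(i + ":60"));      // queue the read; the Response is filled on sync()
    if (batch.size() == pipelineBatch) {
        pipeline.sync();                         // flush this batch to Redis and read the replies
        for (Response<Map<String, String>> r : batch) {
            process(r.get());                    // process() is a placeholder for your own handling
        }
        batch.clear();
    }
}
pipeline.sync();                                 // flush the final, partial batch
for (Response<Map<String, String>> r : batch) {
    process(r.get());
}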
I have to count unique entries from a stream of transactions using Redis. At least 1K jobs concurrently check whether a transaction is unique and, if it is, store the transaction type as the key with an incremented counter as the value. The counter itself is shared by all threads.
If every thread does:
Check if the key exists: exists(transactionType)
Increment the counter: val count = incr(counter)
Set the new value: setnx(transactionType, count)
This creates two problems:
The counter is incremented unnecessarily, since another thread may already have set the key.
Every transaction needs three operations: an exists, an incr, and a setnx.
Is there a better way to increment the counter and set the key only when the value does not already exist?
private void checkAndIncrement(String transactionType, Jedis redisHandle) {
    if (transactionType != null) {
        // Not atomic: another thread may set the key between exists() and setnx()
        if (!redisHandle.exists(transactionType)) {
            long count = redisHandle.incr("t_counter");
            redisHandle.setnx(transactionType, "" + count);
        }
    }
}
EDIT:
Once a value has been created, say T1 = 100, the transaction should also be identifiable by the number 100, so I would have to store another map with the counter as key and the transaction type as value.
Two options (both sketched in code below):
Use a hash, HSETNX to add keys to the hash (just set the value to 1 or "" or anything), and HLEN to get the count of keys in the hash. You can always start over with HDEL. You could also use HINCRBY instead of HSETNX to additionally find out how many times each key appears.
Use a hyperloglog. Use PFADD to insert elements and PFCOUNT to retrieve the count. HyperLogLog is a probabilistic algorithm; the memory usage for a HLL doesn't go up with the number of unique items the way a hash does, but the count returned is only approximate (usually within about 1% of the true value).
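For illustration, a minimal Jedis sketch of both options; the key names txn:unique and txn:hll are made up:

import redis.clients.jedis.Jedis;

try (Jedis jedis = new Jedis("localhost", 6379)) {
    String transactionType = "T1";

    // Option 1: a hash used as a set; HSETNX is a no-op if the field already exists,
    // and HLEN returns the exact number of unique fields seen so far.
    jedis.hsetnx("txn:unique", transactionType, "1");
    long exactCount = jedis.hlen("txn:unique");

    // Option 2: a HyperLogLog; PFADD/PFCOUNT give an approximate count in constant memory.
    jedis.pfadd("txn:hll", transactionType);
    long approxCount = jedis.pfcount("txn:hll");
}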
I am not that familiar with Redis. At the moment I am designing a realtime service and I'd like to rely on Redis for it. I expect ~10000-50000 keys per minute to be SET with a reasonable EX, and to SCAN over them rarely enough that performance should not be a bottleneck.
What I am unsure about is the "in/out rate": the keyspace could be flooded with keys that match a SCAN query, so the scan might never terminate (i.e. it keeps returning a new cursor and forces you to continue; that can easily happen if you consume x items per second while x + y items per second are coming in, with y > 0).
Obviously, I could set the SCAN COUNT high enough, but I wonder whether there is a better solution, or whether Redis itself guarantees that SCAN grows its batch size automatically in such a case.
First some context, solution at the end:
From SCAN command > Guarantee of termination
The SCAN algorithm is guaranteed to terminate only if the size of the
iterated collection remains bounded to a given maximum size, otherwise
iterating a collection that always grows may result into SCAN to never
terminate a full iteration.
This is easy to see intuitively: if the collection grows there is more
and more work to do in order to visit all the possible elements, and
the ability to terminate the iteration depends on the number of calls
to SCAN and its COUNT option value compared with the rate at which the
collection grows.
But in The COUNT option it says:
Important: there is no need to use the same COUNT value for every
iteration. The caller is free to change the count from one iteration
to the other as required, as long as the cursor passed in the next
call is the one obtained in the previous call to the command.
Important to keep in mind, from Scan guarantees:
A given element may be returned multiple times. It is up to the
application to handle the case of duplicated elements, for example
only using the returned elements in order to perform operations that
are safe when re-applied multiple times.
Elements that were not
constantly present in the collection during a full iteration, may be
returned or not: it is undefined.
The key to a solution is in the cursor itself. See Making sense of Redis’ SCAN cursor. It is possible to deduce the percentage of progress of your scan because the cursor is essentially the bit-reversed index into the hash table.
Using DBSIZE or the INFO keyspace command, you can get how many keys you have at any time:
> DBSIZE
(integer) 200032
> info keyspace
# Keyspace
db0:keys=200032,expires=0,avg_ttl=0
Another source of information is the undocumented DEBUG htstats <index> command, just to get a feeling for the hash table:
> DEBUG htstats 0
[Dictionary HT]
Hash table 0 stats (main hash table):
table size: 262144
number of elements: 200032
different slots: 139805
max chain length: 8
avg chain length (counted): 1.43
avg chain length (computed): 1.43
Chain length distribution:
0: 122339 (46.67%)
1: 93163 (35.54%)
2: 35502 (13.54%)
3: 9071 (3.46%)
4: 1754 (0.67%)
5: 264 (0.10%)
6: 43 (0.02%)
7: 6 (0.00%)
8: 2 (0.00%)
[Expires HT]
No stats available for empty dictionaries
The table size is the next power of 2 above your number of keys:
Keys: 200032 => Table size: 262144
The solution:
We will calculate a desired COUNT argument for every scan.
Say you will be calling SCAN with a frequency (F in Hz) of 10 Hz (every 100 ms) and you want it done in 5 seconds (T in s). So you want this finished in N = F*T calls, N = 50 in this example.
Before your first scan, you know your current progress is 0, so your remaining percent is RP = 1 (100%).
Before every SCAN call (or every given number of calls that you want to adjust your COUNT if you want to save the Round Trip Time (RTT) of a DBSIZE call), you call DBSIZE to get the number of keys K.
You will use COUNT = K*RP/N
For the first call, this is COUNT = 200032*1/50 = 4000.
For any other call, you need to calculate RP = 1 - ReversedCursor/NextPowerOfTwo(K).
For example, let's say you have done 20 calls already, so now N = 30 (the remaining number of calls). You called DBSIZE and got K = 281569, which means NextPowerOfTwo(K) = 524288, i.e. 2^19.
Your next cursor is 14509 in decimal = 000011100010101101 in binary. As the table size is 2^19, we represent it with 18 bits.
You reverse the bits and get 101101010001110000 in binary = 185456 in decimal. This means we have covered 185456 out of 524288. And:
RP = 1 - ReversedCursor/NextPowerOfTwo(K) = 1 - 185456 / 524288 = 0.65 or 65%
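As a sanity check of the arithmetic, a small Java helper (the name reverseBits is mine) reproduces these numbers:

// Reverse the lowest `bits` bits of a SCAN cursor.
static long reverseBits(long cursor, int bits) {
    return Long.reverse(cursor) >>> (64 - bits);
}

long reversed = reverseBits(14509, 18);              // 185456, as in the worked example
double rp = 1.0 - (double) reversed / 524288;        // ~0.646, i.e. about 65% remaining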
So you have to adjust:
COUNT = K*RP/N = 281569 * 0.65 / 30 = 6100
So in your next SCAN call you use 6100. It makes sense that it increased, because:
The number of keys has increased from 200032 to 281569.
Although we have only 60% of our initial estimate of calls remaining, progress is behind as 65% of the keyspace is pending to be scanned.
All this was assuming you are scanning all keys. If you're pattern-matching, you need to use the history so far to estimate how many matching keys remain. We add a factor PM (percent of matches) to the COUNT calculation.
COUNT = PM * K*RP/N
PM = keysFound / ( K * ReversedCursor/NextPowerOfTwo(K))
If after 20 calls, you have found only keysFound = 2000 keys, then:
PM = 2000 / ( 281569 * 185456 / 524288) = 0.02
This means only 2% of the keys are matching our pattern so far, so
COUNT = PM * K*RP/N = 0.02 * 6100 = 122
This algorithm can probably be improved, but you get the idea.
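Here is a rough Jedis sketch of the whole loop under the assumptions above. It assumes Jedis 3.x (where ScanResult.getCursor() returns the next cursor as a String), that the cursor is the bit-reversed bucket index over log2(tableSize) bits, and an illustrative key pattern; treat it as a starting point, not a reference implementation:

import redis.clients.jedis.Jedis;
import redis.clients.jedis.ScanParams;
import redis.clients.jedis.ScanResult;

public class AdaptiveScan {

    // Reverse the lowest `bits` bits of the cursor.
    static long reverseBits(long cursor, int bits) {
        return Long.reverse(cursor) >>> (64 - bits);
    }

    // Smallest power of two >= n (the hash table size Redis would be using).
    static long nextPowerOfTwo(long n) {
        long p = 1;
        while (p < n) p <<= 1;
        return p;
    }

    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            int remainingCalls = 50;                           // N = F * T, e.g. 10 Hz for 5 seconds
            long keysFound = 0;
            String cursor = ScanParams.SCAN_POINTER_START;     // "0"

            do {
                long k = jedis.dbSize();                       // K: current number of keys
                long tableSize = nextPowerOfTwo(k);
                int bits = Math.max(1, Long.numberOfTrailingZeros(tableSize));

                long reversed = reverseBits(Long.parseLong(cursor), bits);
                double progress = (double) reversed / tableSize;   // fraction already covered
                double rp = 1.0 - progress;                        // remaining percent

                // PM: percent of matches so far (1.0 until there is some history)
                double pm = progress > 0 ? keysFound / (k * progress) : 1.0;

                long count = Math.max(10, (long) (pm * k * rp / Math.max(1, remainingCalls)));

                ScanResult<String> res = jedis.scan(cursor,
                        new ScanParams().match("someprefix:*").count((int) count)); // pattern is illustrative
                keysFound += res.getResult().size();
                cursor = res.getCursor();
                remainingCalls--;
            } while (!"0".equals(cursor));
        }
    }
}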
Make sure to benchmark the COUNT value you start with and measure how many milliseconds each SCAN takes; you may need to moderate your expectations about how many calls (N) you can make in a reasonable time without blocking the server, and adjust F and T accordingly.
I have several sorted sets with a common prefix (itemmovements:) in Redis.
I know we can use ZCOUNT to get the number of items for a single (sorted set) key like this:
127.0.0.1:6379> zcount itemmovements:8 0 1000000000
(integer) 23
(I am able to do this, since I know the range of the item scores.)
How to run this in a loop for all keys prefixed itemmovements:?
Taking a hint from How to atomically delete keys matching a pattern using Redis I tried this:
127.0.0.1:6379> EVAL "return redis.call('zcount', unpack(redis.call('keys', ARGV[1])), 0, 1000000000)" 0 itemmovements:*
(integer) 150
but as you can see it just returns a single number (which happens to be the count for itemmovements:0, the first key returned by keys).
I realized I did not understand what that Lua code in EVAL was doing: unpack() in the middle of an argument list is truncated to a single value, so ZCOUNT only ever saw the first key. The code below works fine:
eval "local a = {}; for _,k in ipairs(redis.call('keys', 'itemmovements:*')) do table.insert(a, k); table.insert(a, redis.call('zcount', k, 0, 1000000000)); end; return a" 0
How do I push integers onto a Redis list with LPUSH?
I want to test whether my finagle-redis client works correctly, so I insert sample data manually into Redis like this:
127.0.0.1:6379> rpush key:214 1 1 1
(integer) 3
127.0.0.1:6379> LRANGE key:214 0 -1
1) "1"
2) "1"
3) "1"
Redis already displays the numbers as characters. When I extract them, I also get characters:
val data: List[ChannelBuffer] = Await.result(redisClient.lRange(key, 0, -1))
val buffer: ChannelBuffer = data(0)
buffer.readChar() // 1
buffer.readInt() // 49
Is writing integers to a list possible with the CLI client? And if not, will the following even work?
val key = ChannelBuffers.copiedBuffer("listkey", utf8)
val list: List[ChannelBuffer] = List(1,1,1).map { number =>
val buffer = ChannelBuffers.buffer(4) // int sized buffer
buffer.writeInt(number)
buffer
}
// will this store the ints correctly?
redisClient.lpush(key, list)
In Redis, almost everything is a string. This is because of the protocol it uses: it is text-based, so the number 1234 cannot be distinguished from the string "1234".
Redis might store integers more compactly internally, but you will get strings back regardless, so you should convert the values back to numbers in your application.
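For example, with Jedis (not finagle-redis, so take this only as an illustration of the string round trip), you parse the replies back into numbers yourself:

import redis.clients.jedis.Jedis;
import java.util.List;
import java.util.stream.Collectors;

try (Jedis jedis = new Jedis("localhost", 6379)) {
    jedis.rpush("key:214", "1", "1", "1");              // numbers go in as strings
    List<String> raw = jedis.lrange("key:214", 0, -1);  // and come back as strings: ["1", "1", "1"]
    List<Integer> numbers = raw.stream()
            .map(Integer::parseInt)                     // convert in the application
            .collect(Collectors.toList());
}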
I want to be able to store data such as "store x is open between 9am and 5pm on Monday, but only between 9am and 12pm on Saturday".
What's the best way to store this using Redis?
I would later like to query it with something like: show me all stores that are open on Saturday at 10:30am.
In Redis, like most if not all other NoSQL databases, you want to store your data in the manner that's most suitable for answering the query. There are quite a few ways you can represent this data and answer the query, choosing between them requires knowledge about the other access patterns that you need to support.
However, in the context of this specific question alone, the simplest way of doing that IMO is to use two Sorted Sets for each day of the week. Assuming that stores are open continuously and at most once each day (i.e. no siestas), the members of these Sorted Sets are the store ids and the scores are their opening hours: the first Sorted Set's scores hold the time each store opens, and the second's the time it closes. For example:
ZADD monday:open 9 store:x
ZADD monday:close 17 store:x
ZADD saturday:open 9 store:x
ZADD saturday:close 12 store:x
Once you have all the Sorted Sets in place, answering the query requires two calls to ZRANGEBYSCORE and intersecting the results. The snippet below demonstrates how to do it in Lua, since a server-side script will in most cases be more efficient than shipping the whole thing to the client.
Note: an alternative approach to doing the intersect in Lua is actually storing the temporary results in Redis' Sets and calling SINTER.
-- helper function to make a "set" out of a table
local function makeset(t)
  local r = {}
  for _, v in ipairs(t) do r[v] = true end
  return(r)
end

-- get opening and closing hours for a given day
local ot = redis.call('ZRANGEBYSCORE', KEYS[1], '-inf', ARGV[1])
local ct = redis.call('ZRANGEBYSCORE', KEYS[2], '(' .. ARGV[1], '+inf')

-- convert to sets and choose the smaller set as s1
local s1 = {}
local s2 = {}
if #ot < #ct then
  s1 = makeset(ot)
  s2 = makeset(ct)
else
  s1 = makeset(ct)
  s2 = makeset(ot)
end

-- intersect s1 and s2
local t = {}
for k in pairs(s1) do
  t[k] = s2[k]
end

-- prepare a response table
local r = {}
for k in pairs(t) do
  r[#r+1] = k
end

return(r)
Run this script by passing it the two keys and the hour, like so:
redis-cli --eval storehours.lua saturday:open saturday:close , 10.5
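If you drive this from Java instead of redis-cli, a hedged Jedis sketch of the same call (loading storehours.lua from disk; the helper name openStoresAt is mine) would look roughly like:

import redis.clients.jedis.Jedis;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

static List<?> openStoresAt(String day, String hour) throws IOException {
    String script = new String(Files.readAllBytes(Paths.get("storehours.lua")));
    try (Jedis jedis = new Jedis("localhost", 6379)) {
        // KEYS = the day's two sorted sets, ARGV = the hour as a decimal string
        Object reply = jedis.eval(script,
                Arrays.asList(day + ":open", day + ":close"),
                Arrays.asList(hour));
        return (List<?>) reply;   // member ids of the stores open at that hour
    }
}

// e.g. openStoresAt("saturday", "10.5") for Saturday at 10:30am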