How do I represent this data using Redis? - redis

I want to be able to store data such as "store x is open between 9am and 5pm on Monday but it's only open during 9am and 12pm on Saturday"
What's the best way to store this using redis?
I would later like to query it using something like this. Show me all stores that are open on Saturday at 10:30am

In Redis, like most if not all other NoSQL databases, you want to store your data in the manner that's most suitable for answering the query. There are quite a few ways you can represent this data and answer the query, choosing between them requires knowledge about the other access patterns that you need to support.
However, in the context of this specific question alone, the simplest way of doing that IMO is to use two Sorted Sets per for each day of the week. Assuming that stores are open continuously and at most once each day (i.e. no siestas), the members of these Sorted Sets should be the store ids and the scores their opening hours - the first Sort Set's scores will denote the time that the store opens whereas the second's the time it closes. For example:
ZADD monday:open 9 store:x
ZADD monday:close 17 store:x
ZADD saturday:open 9 store:x
ZADD saturday:close 12 store:x
Once you have all the Sorted Sets in place, answering the query requires two calls to ZRANGEBYSCORE and intersecting the results. The snippet below demonstrates how to do it using Lua since doing using server scripts will be more efficient than moving the entire thing to the client in most cases.
Note: an alternative approach to doing the intersect in Lua is actually storing the temporary results in Redis' Sets and calling SINTER.
-- helper function to make a "set" out of a table
local function makeset(t)
local r = {}
for _, v in ipairs(t) do r[v] = true end
return(r)
end
-- get opening and closing hours for a given day
local ot = redis.call('ZRANGEBYSCORE', KEYS[1], '-inf', ARGV[1])
local ct = redis.call('ZRANGEBYSCORE', KEYS[2], '(' .. ARGV[1], '+inf')
-- convert to sets and choose the smaller set as s1
local s1 = {}
local s2 = {}
if #ot < #ct then
s1 = makeset(ot)
s2 = makeset(ct)
else
s1 = makeset(ct)
s2 = makeset(ot)
end
-- intersect s1 and s2
local t = {}
for k in pairs(s1) do
t[k] = s2[k]
end
-- prepare a response table
local r = {}
for k in pairs(t) do
r[#r+1] = k
end
return(r)
Run this script by passing to it the two keys and the hour, like so:
redis-cli --eval storehours.lua saturday:open saturday:close , 10.5

Related

Queue or other methods to handle tick data?

In our electronic trading system, we need to do calculation based on tick data from 100+ contracts.
Tick data of contracts is not received in one message. One message only include tick data for one contract. Timestamp of contracts are slightly different (sometimes big diff, but let's ignore this case).
eg: (first column is timestamp. Second is contract name)
below 2 data has 1ms diff
10:34:03.235,10002007,510050C2006A03500 ,0.0546
10:34:03.236,10001909,510050C2003A02750 ,0.3888
below 2 data has 3ms diff
10:34:03.594,10002154,510300C2003M03700 ,0.4985
10:34:03.597,10002118,510300C2001M03700 ,0.4514
Only those with price change will have data. So I can't count contract number to know if I have received all data for this tick.
But on the other hand, we don't want to wait till we receive all data for the tick, because sometimes data could be late for long time, we will want to exclude them.
Low latency is required. So I think we will define a window - say 50 ms - and start to calculate based on whatever data we received in past 50ms.
What will be the best way to handle such use case?
Originally I want to use redis stream to maintain a small queue, that whenever a contract's data is received, I will push it to redis stream. But I couldn't figure out what's the best way to pull data as soon as specific time (say 50ms) passed.
I am thinking about maybe I should use some other technicals?
Any suggestions are appreciated.
Use XRANGE myStream - + COUNT 1 to get the first entry.
Use XREVRANGE myStream + - COUNT 1 to get the last entry.
XINFO STREAM myStream also brings first and last entry, but the docs say it is O(log N).
Assuming you are using a timestamp as ID, or as a field, then you can compute the time difference.
If you are using Redis Streams auto-ID (XADD myStream * ...), the first part of the ID is the UNIX timestamp in milliseconds.
Assuming the above, you can do the check atomically with a Lua script:
EVAL "local first = redis.call('XRANGE', KEYS[1], '-', '+', 'COUNT', '1') local firstTime = {} if next(first) == nil then return redis.error_reply('Stream is empty or key doesn`t exist') end for str in string.gmatch(first[1][1], '([^-]+)') do table.insert(firstTime, tonumber(str)) end local last = redis.call('XREVRANGE', KEYS[1], '+', '-', 'COUNT', '1') local lastTime = {} for str in string.gmatch(last[1][1], '([^-]+)') do table.insert(lastTime, tonumber(str)) end local ms = lastTime[1] - firstTime[1] if ms >= tonumber(ARGV[1]) then return redis.call('XRANGE', KEYS[1], '-', '+') else return redis.error_reply('Only '..ms..' ms') end" 1 myStream 50
The arguments are numKeys(1 here) streamKey timeInMs(50 here): 1 myStream 50.
Here a friendly view of the Lua script:
local first = redis.call('XRANGE', KEYS[1], '-', '+', 'COUNT', '1')
local firstTime = {}
if next(first) == nil then
return redis.error_reply('Stream is empty or key doesn`t exist')
end
for str in string.gmatch(first[1][1], '([^-]+)') do
table.insert(firstTime, tonumber(str))
end
local last = redis.call('XREVRANGE', KEYS[1], '+', '-', 'COUNT', '1')
local lastTime = {}
for str in string.gmatch(last[1][1], '([^-]+)') do
table.insert(lastTime, tonumber(str))
end
local ms = lastTime[1] - firstTime[1]
if ms >= tonumber(ARGV[1]) then
return redis.call('XRANGE', KEYS[1], '-', '+')
else
return redis.error_reply('Only '..ms..' ms')
end
It returns:
(error) Stream is empty or key doesn`t exist
(error) Only 34 ms if we don't have the required time elapsed
The actual list of entries if the required time between first and last message has elapsed.
Make sure to check Introduction to Redis Streams to get familiar with Redis Streams, and EVAL command to learn about Lua scripts.

How to get same rank for same scores in Redis' ZRANK?

If I have 5 members with scores as follows
a - 1
b - 2
c - 3
d - 3
e - 5
ZRANK of c returns 2, ZRANK of d returns 3
Is there a way to get same rank for same scores?
Example: ZRANK c = 2, d = 2, e = 3
If yes, then how to implement that in spring-data-redis?
Any real solution needs to fit the requirements, which are kind of missing in the original question. My 1st answer had assumed a small dataset, but this approach does not scale as dense ranking is done (e.g. via Lua) in O(N) at least.
So, assuming that there are a lot of users with scores, the direction that for_stack suggested is better, in which multiple data structures are combined. I believe this is the gist of his last remark.
To store users' scores you can use a Hash. While conceptually you can use a single key to store a Hash of all users scores, in practice you'd want to hash the Hash so it will scale. To keep this example simple, I'll ignore Hash scaling.
This is how you'd add (update) a user's score in Lua:
local hscores_key = KEYS[1]
local user = ARGV[1]
local increment = ARGV[2]
local new_score = redis.call('HINCRBY', hscores_key, user, increment)
Next, we want to track the current count of users per discrete score value so we keep another hash for that:
local old_score = new_score - increment
local hcounts_key = KEYS[2]
local old_count = redis.call('HINCRBY', hcounts_key, old_score, -1)
local new_count = redis.call('HINCRBY', hcounts_key, new_score, 1)
Now, the last thing we need to maintain is the per score rank, with a sorted set. Every new score is added as a member in the zset, and scores that have no more users are removed:
local zdranks_key = KEYS[3]
if new_count == 1 then
redis.call('ZADD', zdranks_key, new_score, new_score)
end
if old_count == 0 then
redis.call('ZREM', zdranks_key, old_score)
end
This 3-piece-script's complexity is O(logN) due to the use of the Sorted Set, but note that N is the number of discrete score values, not the users in the system. Getting a user's dense ranking is done via another, shorter and simpler script:
local hscores_key = KEYS[1]
local zdranks_key = KEYS[2]
local user = ARGV[1]
local score = redis.call('HGET', hscores_key, user)
return redis.call('ZRANK', zdranks_key, score)
You can achieve the goal with two Sorted Set: one for member to score mapping, and one for score to rank mapping.
Add
Add items to member to score mapping: ZADD mem_2_score 1 a 2 b 3 c 3 d 5 e
Add the scores to score to rank mapping: ZADD score_2_rank 1 1 2 2 3 3 5 5
Search
Get score first: ZSCORE mem_2_score c, this should return the score, i.e. 3.
Get the rank for the score: ZRANK score_2_rank 3, this should return the dense ranking, i.e. 2.
In order to run it atomically, wrap the Add, and Search operations into 2 Lua scripts.
Then there's this Pull Request - https://github.com/antirez/redis/pull/2011 - which is dead, but appears to make dense rankings on the fly. The original issue/feature request (https://github.com/antirez/redis/issues/943) got some interest so perhaps it is worth reviving it /cc #antirez :)
The rank is unique in a sorted set, and elements with the same score are ordered (ranked) lexically.
There is no Redis command that does this "dense ranking"
You could, however, use a Lua script that fetches a range from a sorted set, and reduces it to your requested form. This could work on small data sets, but you'd have to devise something more complex for to scale.
unsigned long zslGetRank(zskiplist *zsl, double score, sds ele) {
zskiplistNode *x;
unsigned long rank = 0;
int i;
x = zsl->header;
for (i = zsl->level-1; i >= 0; i--) {
while (x->level[i].forward &&
(x->level[i].forward->score < score ||
(x->level[i].forward->score == score &&
sdscmp(x->level[i].forward->ele,ele) <= 0))) {
rank += x->level[i].span;
x = x->level[i].forward;
}
/* x might be equal to zsl->header, so test if obj is non-NULL */
if (x->ele && x->score == score && sdscmp(x->ele,ele) == 0) {
return rank;
}
}
return 0;
}
https://github.com/redis/redis/blob/b375f5919ea7458ecf453cbe58f05a6085a954f0/src/t_zset.c#L475
This is the piece of code redis uses to compute the rank in sorted sets. Right now ,it just gives rank based on the position in the Skiplist (which is sorted based on scores).
What does the skiplistnode variable "span" mean in redis.h? (what is span ?)

Pyomo scheduling

I am trying to formulate a flowshop scheduling problem in Pyomo. This is an Abstract model
Problem description
There are 3 jobs (chest, door and chair) and 3 machines (cutting, welding, packing in that order). Objective is to minimise the makespan. The python code and the data are as follows.
## flowshop.py ##
from pyomo.environ import *
flowshop = AbstractModel()
flowshop.jobs = Set()
flowshop.machines = Set()
flowshop.machinesN = Param()
flowshop.jobsN = Param()
flowshop.proc_T = Param(flowshop.jobs,
flowshop.machines,
within=NonNegativeReals)
flowshop.start_T = Var(flowshop.jobs,
flowshop.machines,
within=NonNegativeReals)
flowshop.makespan = Var(within=NonNegativeReals)
def makespan_rule(flowshop,i,j):
return flowshop.makespan >= flowshop.start_T[i,j]+flowshop.proc_T[i,j]
flowshop.makespan_cons = Constraint(flowshop.jobs,
flowshop.machines,
rule=makespan_rule)
def objective_rule(flowshop):
return flowshop.makespan
flowshop.objc = Objective(rule=objective_rule,sense=minimize)
## data.dat ##
set jobs := chest door chair ;
set machines := cutting welding packing ;
param: machinesN := 3 ;
param: jobsN := 3 ;
param proc_T:
cutting welding packing :=
chest 10 40 45
door 30 20 25
chair 05 30 15
;
I havent added all the constraints yet, I plan to add them after this issue gets fixed. In the code (flowhop.py) above, for the makespan_rule, I want the makespan to be more that the completion time of only the last machine.
Currently, it is set to be more than completion times of all the machines.
For that, I believe, I have to get the last index of the machines set.
For that, I tried flowshop.machines[-1], but it gives an error saying:
Cannot index unordered set machines
How do I solve this issue?
Thanks for the help.
PS - I am also struggling to model the binary variables used to define the precedence of a job. If you have any ideas regarding that, that would also be helpful.
As the error says Cannot index unordered sets, the set flowshop.machines is not ordered. One needs to provide ordered=True argument in the while declaring the set -
flowshop.machines = Set(ordered=True)
After this, one can access any element by normal indexing - flowshop.machines[i]
For the binary variables, one can declare them as -
c = flowshop.jobsN*(flowshop.jobsN-1)/2
flowshop.prec = Var(RangeSet(1,c),within=Binary)
Then, this variable can be used to decide the precedence between 2 jobs and to formulate the assignment constraints. The precedence variable corresponding to a pair of jobs can be found out using the indices of the jobs (for which the flowshop.jobs has to be an ordered set - flowshop.jobs = Set(ordered=True))

redis - compare 10 million sets with each other

This is the setup that I have going on:
[domain:id] => [keyword_id, keyword_id2, keyword_id3....]
....
What I want to do is for each domain, find other similar domains that contain similar keywords. The way I "measure" similarity between domain:1 and domain:2, for example, is by dividing intersection(domain:1, domain:2) by union(domain:1, domain:2).
The problem is that I have about 5 million domains with each having about a couple hundred keywords on average. Doing this similarity calculation in a nested loop comparing each domain with every other domain would take years on the hardware I have now. I tested this just for one domain:
keys = redis.keys("domain:*");
foreach(keys as key){
long inter = sinterstore("inter_temp", "domain:1", key);
long union = sunionstore("union_temp", "domain:1", key);
float similarity = inter / union;
if(similarity > 0.1){
similar_domains.add(key);
}
}
...
^ and computing similar domains for just this one domain took about 2 minutes. Doing this for 5 million domains would have taken years.
So what could I do? I have no problem moving this program to the most expensive Amazon EC2 instance for an hour once a week to compute it all, and send it back to my host, but would that even help or do I simply have too much data?
Instead of comparing each domain one by one. can't you create a batch of say 100 and pass all the keys in that domain to Redis where it will do the union/intersection for you.
For example
SADD domain:1 a b c d e f
SADD domain:2 a c e
SADD domain:3 c e f h
SINTERSTORE destination domain:1 domain:2 domain:3
will result following keys [a, b ,c ,d ,e ,f, h] in destination set
and
SINTERSTORE destination domain:1 domain:2 domain:3
will result following keys [c ,e] in destination set

Redis: Sum of SCORES in Sorted Set

What's the best way to get the sum of SCORES in a Redis sorted set?
The only option I think is iterating the sorted set and computing the sum client side.
Available since Redis v2.6 is the most awesome ability to execute Lua scripts on the Redis server. This renders the challenge of summing up a Sorted Set's scores to trivial:
local sum=0
local z=redis.call('ZRANGE', KEYS[1], 0, -1, 'WITHSCORES')
for i=2, #z, 2 do
sum=sum+z[i]
end
return sum
Runtime example:
~$ redis-cli zadd z 1 a 2 b 3 c 4 d 5 e
(integer) 5
~$ redis-cli eval "local sum=0 local z=redis.call('ZRANGE', KEYS[1], 0, -1, 'WITHSCORES') for i=2, #z, 2 do sum=sum+z[i] end return sum" 1 z
(integer) 15
If the sets are small, and you don't need killer performance, I would just iterate (zrange/zrangebyscore) and sum the values client side.
If, on the other hand, you are talking about many thousands - millions of items, you can always keep a reference set with running totals for each user and increment/decrement them as the gifts are sent.
So when you do your ZINCR 123:gifts 1 "3|345", you could do a seperate ZINCR command, which could be something like this:
ZINCR received-gifts 1 <user_id>
Then, to get the # of gifts received for a given user, you just need to run a ZSCORE:
ZSCORE received-gifts <user_id>
Here is a little lua script that maintains the zset score total as you go, in a counter with key postfixed with '.ss'. You can use it instead of ZADD.
local delta = 0
for i=1,#ARGV,2 do
local oldScore = redis.call('zscore', KEYS[1], ARGV[i+1])
if oldScore == false then
oldScore = 0
end
delta = delta - oldScore + ARGV[i]
end
local val = redis.call('zadd', KEYS[1], unpack(ARGV))
redis.call('INCRBY', KEYS[1]..'.ss', delta)