How to process a single update per resource in a given time window - Redis

I am receiving multiple location updates from a device for a particular resource ID. These updates can arrive seconds or even milliseconds apart.
I want to process only one update per resource ID in a given time window, let's say 10 seconds.
Currently I read the last processed request for that resource ID from a Redis cluster and process the new request only if the time difference is greater than 30 seconds.
The issue arises when we receive requests at almost the same time (milliseconds apart) and two different machines pick them up. When both machines fetch the last processed request for that resource ID, they both get the same record, so both requests enter the system.
Is there any technique to prevent this scenario?

You can use SETNX (set if not exists). For resource ID 1, take the timestamp in milliseconds and integer-divide it by 10000 (10 seconds). For example, for timestamp 1583906206000 the value is 158390620. Construct the key as resource_1_time_158390620. The key stays the same until the next 10-second boundary.
setnx resource_1_time_158390620 value
If multiple app servers try, or multiple requests arrive within the same 10-second bucket, the value is set for the first one only. The command returns 1 if the value was set and 0 if the key already existed, so you can use the return value to decide whether to process the request.
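A minimal sketch of the bucketed-key idea in Python. The helper names `bucket_key` and `try_claim` are hypothetical, and `client` is assumed to be a redis-py style client (for example `redis.Redis()`). It uses `SET` with the `nx` option, which behaves like SETNX, and adds an expiry so old bucket keys do not pile up; the expiry length is an arbitrary choice, not part of the original answer.

```python
def bucket_key(resource_id, timestamp_ms, window_ms=10_000):
    """Build a key that is identical for every update in the same window."""
    return "resource_%s_time_%d" % (resource_id, timestamp_ms // window_ms)


def try_claim(client, resource_id, timestamp_ms, window_ms=10_000):
    """Return True only for the first caller in this resource's window.

    SET with nx=True is atomic on the Redis side, so two app servers
    racing on the same key cannot both succeed. px expires the key after
    twice the window (an assumed safety margin) so stale keys are removed.
    """
    key = bucket_key(resource_id, timestamp_ms, window_ms)
    return bool(client.set(key, "1", nx=True, px=2 * window_ms))
```

A caller would process the update only when `try_claim(...)` returns True. Note that the buckets are fixed: two updates a few milliseconds apart can still both be processed if they fall on opposite sides of a 10-second boundary.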

Related

After importing a metric into Victoria Metrics, the metric is repeated for 5 minutes. What controls this behavior?

I am writing some software that will be pushing data to Victoria Metrics, as below:
curl -d 'foo{bar="baz"} 30' -X POST 'http://[Victoria]/insert/0/prometheus/api/v1/import/prometheus'
I noticed that if I push a single metric like this, it shows up as not a single data point but rather shows up repeatedly as if it was being scraped every 15 seconds, either until I push a new value for that metric or 5 minutes passes.
What setting/mechanism is causing this 5-minute repeat period?
Pushing data with a timestamp does not change this. Metric gets repeated for 5 minutes after that time or until a change regardless.
I don't necessarily need to alter this behavior, just trying to understand why it's happening.
How do you query the database?
I guess this behaviour is due to the range query concept and ephemeral datapoints, check this out:
https://docs.victoriametrics.com/keyConcepts.html#range-query
The interval between datapoints depends on the step parameter, which is 5 minutes when omitted.
If you want to receive only the real datapoints, go via export functions.
https://docs.victoriametrics.com/#how-to-export-time-series
VictoriaMetrics (a TSDB) fills gaps with ephemeral data points: when there is no raw sample at the requested timestamp, it uses the closest sample to the left of that timestamp.
So if you make the instant request:
curl "http://<victoria-metrics-addr>/api/v1/query?query=foo_bar&time=2022-05-10T10:03:00.000Z"
The time range at which VictoriaMetrics will try to locate a missing data sample is equal to 5m by default and can be overridden via step parameter.
step - optional max lookback window for searching for raw samples when executing the query. If step is skipped, then it is set to 5m (5 minutes) by default.
GET | POST /api/v1/query?query=...&time=...&step=...
You can read more in the key concepts section of the documentation, which also covers range queries and other TSDB concepts.
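To see why a single pushed sample looks repeated, here is a simplified model of the lookback behaviour described above. It is an illustration only, not VictoriaMetrics code: the function name `instant_value` is hypothetical, and the exact boundary handling in the real engine may differ.

```python
def instant_value(samples, query_time, lookback=300):
    """Return the value of the newest sample in (query_time - lookback, query_time].

    `samples` is a list of (timestamp_seconds, value) pairs. Returns None
    when no raw sample falls inside the lookback window.
    """
    best = None
    for ts, value in samples:
        if query_time - lookback < ts <= query_time:
            if best is None or ts > best[0]:
                best = (ts, value)
    return None if best is None else best[1]


# One pushed sample at t=1000, then a graph evaluated every 15 seconds.
samples = [(1000, 30)]
points = [(t, instant_value(samples, t)) for t in range(1000, 1400, 15)]
# Timestamps at which the graph shows a value for the single raw sample.
visible = [t for t, v in points if v is not None]
```

In this model the one raw sample is returned for every evaluation point during the 5 minutes after it was written, which matches the repetition described in the question. Exporting the series returns only the raw sample.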

Storing time intervals efficiently in redis

I am trying to track server uptimes using redis.
So the approach I have chosen is as follows:
Server xyz will keep sending my service a ping indicating that it was alive and working during the last 30 seconds.
My service will store a list of all time intervals during which the server was active. This will be done by storing a list of {startTime, endTime} in redis, with key as name of the server (xyz)
Depending on a user query, I will use this list to generate server uptime metrics. Like % downtime in between times (T1, T2)
Example:
assume that the time is T currently.
at T+30, server sends a ping.
xyz:["{start:T end:T+30}"]
at T+60, server sends another ping
xyz:["{start:T end:T+30}", "{start:T+30 end:T+60}"]
and so on for all pings.
This works fine, but over a long time period the list will accumulate a lot of elements. To avoid this, on each ping I currently pop the last element of the list and check whether it can be merged with the latest time interval. If it can, I coalesce them and push a single time interval onto the list. If not, both time intervals are pushed.
So with this, my list looks like this after step 2: xyz:["{start:T end:T+60}"]
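The merge step described above can be sketched in plain Python on an in-memory list. The function name `record_ping` and the tuple representation are assumptions for illustration; this does not talk to Redis and does not address the concurrency concern raised below.

```python
def record_ping(intervals, start, end):
    """Add (start, end) to a time-ordered interval list.

    If the new interval touches or overlaps the last stored one, the two
    are coalesced into a single interval; otherwise it is appended.
    """
    if intervals and start <= intervals[-1][1]:
        last_start, last_end = intervals[-1]
        intervals[-1] = (last_start, max(last_end, end))
    else:
        intervals.append((start, end))
    return intervals
```

With pings every 30 seconds the list stays at one element while the server is up, and grows by one element per outage.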
Some problems I see with this approach are:
The merging is done in my service, not in Redis.
If my service is distributed, the list ordering might get corrupted due to multiple readers and writers.
Is there a more efficient or elegant way to handle this, such as merging the time intervals in Redis itself?

Redis race condition: `Keys *` sometimes returns incorrect amount of keys

I have 2 applications. First one is writing keys such as
SET MyKey_1
SET MyKey_2
and then sends notification to second application via network.
The other application waits for the notification and then counts how many keys with a specific prefix are in the DB:
KEYS MyKey_*
if key count is different from expected it raises an error:
waitNotification(firstAppSocket)
if redisCount("KEYS MyKey_*") != 2 {
panic("wrong key count")
}
Sometimes I run into a race condition where the first app sets a key and receives OK from Redis, then notifies the second app, but the count returns 1. This happens approximately 1 in 10 times. If I retry the count operation after a very short timeout (microseconds), it becomes correct.
Is there a race condition in Redis for such an operation? Is there a key population delay?

Keep item in list for certain time

I am not an expert in Redis at all. Today I came up with an idea, but I don't know if it is possible in Redis.
I want to store a list of values but only for some time, for example a list of IP addresses that visited a page in the last 5 minutes. As far as I know I can't set EXPIRE on a single list/hash item, right? So I am pushing 1, 2, 3 into a list/hash, but after a certain constant time I want each item to expire/disappear. Or maybe a hash would be more suitable than a list: { '1': timestamp-when-disappear, ... }?
Or maybe the only solution is
SET test.1.1 1
EXPIRE test.1.1 60
SET test.1.2 2
EXPIRE test.1.2 60
SET test.1.3 3
EXPIRE test.1.3 60
# to retrieve, can I pipeline KEYS output to MGET?
KEYS test.1.*
Use a sorted set instead.
Log the IP address as the member, with the timestamp as its score, in a sorted set. On retrieval, use that timestamp to select the entries you need. In a scheduler, periodically delete entries that fall outside the range.
Example:
zadd test 1465371055 1.1
zadd test 1465381055 1.3
zadd test 1465391055 1.1
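Your sorted set will contain 1.1 and 1.3, where 1.1 carries the new score 1465391055.
Now on retrieval use
zrangebyscore test min max
min -> currenttime - (5*60)
max -> currenttime
The scores in the example are in seconds, so the offsets are in seconds too (use milliseconds throughout if your scores are in milliseconds). You will get the IPs that visited in the last 5 minutes.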
In a separate scheduler-style thread, delete the stale entries.
zremrangebyscore test min max
min -> -inf
max -> currenttime - (10*60) -> you can choose any cutoff you want.
Also understand that if the number of distinct IPs is very large, the sorted set will grow rapidly. Your scheduler thread must run reliably to keep memory under control.
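The sorted-set approach can be sketched in Python as follows. The function names and the `WINDOW_SECONDS` constant are hypothetical, `client` is assumed to be a redis-py style client, and scores are Unix timestamps in seconds. The cleanup here uses the same 5-minute cutoff as the read; any larger cutoff works as well.

```python
import time

WINDOW_SECONDS = 5 * 60  # assumed retention window


def record_visit(client, ip, key="test", now=None):
    """Store the IP as a member with the visit time as its score."""
    now = time.time() if now is None else now
    client.zadd(key, {ip: now})


def recent_visitors(client, key="test", now=None):
    """Return the IPs whose latest visit is inside the window."""
    now = time.time() if now is None else now
    return client.zrangebyscore(key, now - WINDOW_SECONDS, now)


def purge_old(client, key="test", now=None):
    """Delete entries older than the window; meant to run periodically."""
    now = time.time() if now is None else now
    return client.zremrangebyscore(key, "-inf", now - WINDOW_SECONDS)
```

Because an IP is a set member, a repeat visit only updates its score, so the set holds one entry per distinct IP.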

Is there any option to use redis.expire more elastically?

I got a quick simple question,
Assume that if server receives 10 messages from user within 10 minutes, server sends a push email.
At first I thought it very simple using redis,
incr("foo"), expire("foo",60*10)
and in Java, handle the occurrence count like below
if (Long.parseLong(jedis.get("foo")) >= 10) { sendEmail(); jedis.del("foo"); }
But imagine the user sends one message in the first minute and 8 messages in the 10th minute.
Then the key expires, and the user sends 3 more messages in the next minute.
The Redis key will be created again with value 3, which will not trigger sendEmail() even though the user actually sent 11 messages in 2 minutes.
We're going to use Redis, and we don't want to store receive-time values in Redis.
Is there any solution?
There are two ways of solving this: one optimizes for space and the other for speed (though the speed difference should really be marginal).
Optimizing for Space:
Keep up to 9 different counters; foo1 ... foo9. Basically, we'll keep one counter for each of the possible up to 9 different messages before we email the user, and let each one expire as it hits the 10 minute mark. This will work like a circular queue. Now do this (in Python for simplicity, assuming we have a connection to Redis called r):
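new_created = False
for i in range(1, 10):
    var_name = 'foo%d' % i
    # Start one new 10-minute counter in the first free slot.
    if not (new_created or r.exists(var_name)):
        r.set(var_name, 0)
        r.expire(var_name, 600)
        new_created = True
    if not r.exists(var_name):
        continue
    # Count this message in every live counter; incr returns the new value.
    if r.incr(var_name, 1) >= 10:
        send_email(user)
        r.delete(var_name)  # `del` is a Python keyword; redis-py uses delete()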
If you go with this approach, put the above logic in a Lua script instead of the example Python, and it should be quite fast. Since you'll at most be storing 9 counters per user, it'll also be quite space efficient.
Optimizing for speed:
Keep one Redis sorted set per user. Every time a user sends a message, add an entry to their sorted set with the score equal to the timestamp and a unique member (members must be distinct, otherwise a later ZADD just updates the existing entry's score). Then do a ZCOUNT(now - 10 minutes, now) and send an email if the result is at least 10. Then run ZREMRANGEBYSCORE(-inf, now - 10 minutes) to drop entries that have left the window. I know you said you didn't want to keep timestamps in Redis, but IMO this is a better solution, and you're going to have to hold some variant of timestamps somewhere no matter what.
Personally I'd go with the latter approach because the space differences are probably not that big, and the code can be done quickly in pure Redis, but up to you.
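A minimal sketch of the sorted-set variant in Python. The function name `record_message`, the key format, and the constants are assumptions, and `client` is assumed to be a redis-py style client. The three Redis calls are not atomic as written; wrapping them in a transaction or Lua script is left out for brevity.

```python
import time
import uuid

WINDOW_SECONDS = 10 * 60  # 10-minute sliding window
THRESHOLD = 10            # messages that trigger the email


def record_message(client, user_id, now=None):
    """Record one message and report whether the threshold was reached.

    Returns True when the user has at least THRESHOLD messages inside the
    window; the caller would then send the email.
    """
    now = time.time() if now is None else now
    key = "messages:%s" % user_id  # hypothetical key naming
    # Drop entries that have left the window.
    client.zremrangebyscore(key, "-inf", now - WINDOW_SECONDS)
    # A random member keeps messages with equal timestamps distinct.
    client.zadd(key, {uuid.uuid4().hex: now})
    if client.zcount(key, now - WINDOW_SECONDS, now) >= THRESHOLD:
        client.delete(key)  # start counting afresh after notifying
        return True
    return False
```

In the scenario from the question (1 message in the first minute, 8 in the 10th, 3 in the 11th), this returns True on the second message of the 11th minute, because the 8 earlier messages are still inside the window.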