How to set multiple keys to expire at different times? - redis

Is there a better way to set multiple keys to expire at different times without having to set expiration of each individual one?
Here's an example to explain my question better:
redis.mSet(["key1", "value1", "key2", "value2"])
redis.expire("key1", 1000)
redis.expire("key2", 2000)
//set them both to expire in a single query?

What you are doing is the right solution. I would enhance it with two things:
Use a transaction (MULTI/EXEC) so the server treats it all as a single atomic unit.
Use pipelining so the performance impact of sending multiple commands is minimal (I think this is done automatically when using Node-Redis).
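For example, a rough sketch with node-redis v4 (assuming redis is an already-connected client as in the question) that queues all three commands and sends them in one round trip:
// minimal sketch, assuming a connected node-redis v4 client named redis
await redis
  .multi()
  .mSet(["key1", "value1", "key2", "value2"])
  .expire("key1", 1000) // TTL in seconds
  .expire("key2", 2000)
  .exec(); // sent to the server as one MULTI ... EXEC block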

Related

REDIS SISMEMBERS Performance: LUA script on multiple keys vs single transaction

I have a dozen Redis keys of type SET, say
PUBSUB_USER_SET-1-1668985588478915880,
PUBSUB_USER_SET-2-1668985588478915880,
PUBSUB_USER_SET-3-1668988644477632747,
...
PUBSUB_USER_SET-10-1668983464477632083
Each set contains userIds, and the problem is to check whether a given user is present in any of these sets.
The solution I tried is to join all the keys with a delimiter (a comma), pass the result as an argument to a Lua script, split it back apart with gmatch, and run SISMEMBER against each key until there is a hit.
-- KEYS[1] is the comma-joined list of set keys, KEYS[2] is the userId to look for
local vals = KEYS[1]
for match in (vals .. ","):gmatch("(.-)" .. ",") do
    local exist = redis.call('sismember', match, KEYS[2])
    if exist == 1 then
        return 1
    end
end
return 0
Now, as the number of keys grows to PUBSUB_USER_SET-20 or PUBSUB_USER_SET-30, I see latency increase and throughput degrade.
Is this a reasonable way to do it? Is it better to batch the Lua script calls, passing 10 keys per call instead of 30 and returning as soon as the user is found? Or is there a better way to do this altogether?
I would propose a different solution: instead of scattering users randomly across many sets, place each user in exactly one set and query only that set to check whether the user is there or not.
Let's say we have N sets numbered s-0, s-1, s-2, ..., s-19.
You should put each user in one of these sets based on a hash of the userId, which means you need to query only one set instead of checking all of them. You can use any hashing algorithm.
To make it further interesting, you can try consistent hashing.
You can also use a Redis pipeline with batching (10 keys per iteration) to improve performance.
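A minimal sketch of the idea in Node (ioredis and a simple modulo hash; the set names, hash function, and helper names are illustrative, not from the original answer):
// sketch only: route each userId to one of NUM_SETS sets via a hash
const crypto = require("crypto");
const Redis = require("ioredis");
const redis = new Redis();

const NUM_SETS = 20; // s-0 ... s-19

function setForUser(userId) {
  // hash the userId and map it onto one of the set names
  const hash = crypto.createHash("md5").update(String(userId)).digest();
  return "s-" + (hash.readUInt32BE(0) % NUM_SETS);
}

async function addUser(userId) {
  await redis.sadd(setForUser(userId), userId);
}

async function isUserPresent(userId) {
  // a single SISMEMBER against one set, instead of checking every set
  return (await redis.sismember(setForUser(userId), userId)) === 1;
}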

HAProxy general purpose counters and stick tables

I'm trying to use HAProxy for rate-limiting.
I need to keep track of several endpoints and limit them individually.
So far I have been using general purpose counters. However, there are only 3 of them, sc0 to sc2.
The documentation mentions that all the operations on these counters take an optional table parameter, but it's not clear whether I can track different things in different tables while reusing the same counters.
In other words: is the limit of 3 general purpose counters global, or per stick table?
If, after proper table definition and track instructions, I do
sc1_inc_gpc0(table1)
(and, under different conditions)
sc1_inc_gpc0(table2)
And then have 2 acl rules like
acl X sc1_get_gpc0(table1) gt 1
acl Y sc1_get_gpc0(table2) gt 1
Will the two ACLs work independently, or would they in effect track the same counter?
Thanks for all help!
(In case you are wondering: for a number of reasons, at the moment I could not use a different solution than HAProxy for rate-limiting)
Self-answered after looking at the source code and testing.
Yes, it is possible to use the same counter on different tables.
Moreover, you can also increase the number of available counters at build time. The default is 3, but it can be raised (at least up to 10). The generic versions of the functions, like sc_gpc0_rate(<ctr>[,<table>]), can then be used, passing the index of the new counter as the first argument.
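An illustrative configuration sketch of the setup from the question (table names, sizes, and path conditions are hypothetical): two named stick tables, the client tracked under the same sc1 slot in each, and two ACLs reading gpc0 from their own table so they count independently:
# dummy backends just to hold the two named stick tables
backend table1
    stick-table type ip size 100k expire 10m store gpc0

backend table2
    stick-table type ip size 100k expire 10m store gpc0

frontend fe_main
    bind :80
    # track the client address under slot sc1, in a different table per endpoint
    http-request track-sc1 src table table1 if { path_beg /endpoint-a }
    http-request track-sc1 src table table2 if { path_beg /endpoint-b }
    # each ACL reads gpc0 from the table it names, so X and Y act independently
    acl X sc1_get_gpc0(table1) gt 1
    acl Y sc1_get_gpc0(table2) gt 1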

How do I delete all keys matching a specified key pattern using StackExchange.Redis?

I've got about 150,000 keys in a Redis cache, and need to delete > 95% of them - all keys matching a specific key prefix - as part of a cache rebuild. As I see it, there are three ways to achieve this:
Use server.Keys(pattern) to pull out the entire key list matching my prefix pattern, and iterate through the keys calling KeyDelete for each one.
Maintain a list of keys in a Redis set - each time I insert a value, I also insert the key in the corresponding key set, and then retrieve these sets rather than using Keys. This would avoid the expensive Keys() call, but still relies on deleting tens of thousands of records one by one.
Isolate all of my volatile data in a specific numbered database, and just flush it completely at the start of a cache rebuild.
I'm using .NET and the StackExchange.Redis client - I've seen solutions elsewhere that use the CLI or rely on Lua scripting, but nothing that seems to address this particular use case - have I missed a trick, or is this just something you're not supposed to do with Redis?
(Background: Redis is acting as a view model in front of the Microsoft Dynamics CRM API, so the cache is populated on first run by pulling around 100K records out of CRM, and then kept in sync by publishing notifications from within CRM whenever an entity is modified. Data is cached in Redis indefinitely and we're dealing with a specific scenario here where the CRM plugins fail to fire for a period of time, which causes cache drift and eventually requires us to flush and rebuild the cache.)
Both options 2 & 3 are reasonable.
Steer clear of option 1. KEYS really is slow and only gets slower as your keyspace grows.
I'd normally go for 2 (without Lua; adding Lua would increase the learning curve needed to support the solution, which of course is fine when justified and assuming its existence is clear/documented), but 3 could definitely be a contender: fast and simple, as long as you can be sure you won't exceed the configured database limit.
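A rough sketch of option 2 in Node (ioredis; the set name and helper names are made up for illustration, and StackExchange.Redis exposes the same SADD/SMEMBERS/DEL commands from .NET):
// sketch only: record every cache key in a set, then delete via that set
const Redis = require("ioredis");
const redis = new Redis();

const KEY_SET = "cache:view-model:keys";

async function cacheSet(key, value) {
  // write the value and remember its key in one transaction
  await redis.multi().set(key, value).sadd(KEY_SET, key).exec();
}

async function cacheFlush() {
  // delete the tracked keys without ever calling KEYS
  const keys = await redis.smembers(KEY_SET);
  const pipeline = redis.pipeline();
  for (const key of keys) {
    pipeline.del(key);
  }
  pipeline.del(KEY_SET);
  await pipeline.exec();
}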
Use scanStream instead of KEYS and it will work like a charm.
Docs: https://redis.io/commands/scan
The code below collects an array of keys starting with LOGIN::, which you can then delete with the DEL command.
Example code in Node.js (ioredis):
const Redis = require('ioredis');
const redis = new Redis();

const keysArray = [];
const stream = redis.scanStream({
  match: "LOGIN::*"
});
stream.on("data", (keys = []) => {
  for (const key of keys) {
    if (!keysArray.includes(key)) {
      keysArray.push(key);
    }
  }
});
stream.on("end", async () => {
  // DEL accepts multiple keys, so the collected keys can be removed in one call
  if (keysArray.length > 0) {
    await redis.del(keysArray);
  }
  redis.quit();
});

Problems loading a series of snapshots by date

I have been running into a consistent problem using the LBAPI which I feel is probably a common use case given its purpose. I am generating a chart which uses LBAPI snapshots of a group of Portfolio Items to calculate the chart series. I know the minimum and maximum snapshot dates, and need to query once a day in between these two dates. There are two main ways I have found to accomplish this, both of which are not ideal:
Use the _ValidFrom and _ValidTo filter properties to limit the results to snapshots within the selected timeframe. This is bad because it will also load snapshots which I don't particularly care about. For instance if a PI is revised several times throughout the day, I'm really only concerned with the last valid snapshot of that day. Because some of the PIs I'm looking for have been revised several thousand times, this method requires pulling mostly data I'm not interested in, which results in unnecessarily long load times.
Use the __At filter property and send a separate request for each query date. This method is not ideal because some charts would require several hundred requests, with many requests returning redundant results. For example if a PI wasn't modified for several days, each request within that time frame would return a separate instance of the same snapshot.
My workaround for this was to simulate the effect of __At, but with several filters per request. To do this, I added this filter to my request:
Rally.data.lookback.QueryFilter.or(_.map(queryDates, function(queryDate) {
    return Rally.data.lookback.QueryFilter.and([{
        property : '_ValidFrom',
        operator : '<=',
        value : queryDate
    }, {
        property : '_ValidTo',
        operator : '>=',
        value : queryDate
    }]);
}))
But of course, a new problem arises... Adding this filter results in a request that is much too large to be sent via the LBAPI, unless querying for fewer than ~20 dates. Is there a way I can send larger filters to the LBAPI? Or will I need to break this up into several requests, which would make this solution only slightly better than the second option above?
Any help would be much appreciated. Thanks!
Conner, my recommendation is to download all of the snapshots, even the ones you don't want, and marshal them on the client side. There is functionality in the Lumenize library that's bundled with the App SDK that makes this relatively easy, and the TimeSeriesCalculator will also accomplish this for you with even more features, like aggregating the data into series.

neo4j count nodes performance on 200K nodes and 450K relations

We're developing an application based on neo4j and PHP with about 200K nodes, where every node has a property like type='user' or type='company' to denote a specific entity type in our application. We need to get the count of all nodes of a specific type in the graph.
We created an index for every entity type (users, companies) which holds the nodes with that property. So the users index holds 130K nodes, and the rest are in companies.
With Cypher we are querying like this:
START u=node:users('id:*')
RETURN count(u)
And the results are
Returned 1 row. Query took 4080ms
The server is configured with the defaults plus a few tweaks, but 4 seconds is too long for our needs. Consider that the database will grow by about 20K nodes a month, so we need this query to perform very well.
Is there any other way to do this, maybe with Gremlin, or with some other server plugin?
I'll cache those results, but I want to know if it is possible to tweak this.
Thanks a lot, and sorry for my poor English.
Finally, using Gremlin instead of Cypher, I found the solution.
g.getRawGraph().index().forNodes('NAME_OF_USERS_INDEX').query(
new org.neo4j.index.lucene.QueryContext('*')
).size()
This method uses the Lucene index to get an "approximate" row count.
Thanks again to all.
Mmh,
this is really about the performance of that Lucene index. If you mostly need just this single query, why not keep an integer with the total count on some node somewhere, update it together with the index insertions, and for good measure refresh it every night by running the query above?
You could instead keep a property on a specific node up to date with the number of such nodes, where updates are done guarded by write locks:
Transaction tx = db.beginTx();
try {
    ...
    ...
    // take the write lock so concurrent updates cannot race on the counter
    tx.acquireWriteLock( countingNode );
    countingNode.setProperty( "user_count",
        ((Integer) countingNode.getProperty( "user_count" )) + 1 );
    tx.success();
} finally {
    tx.finish();
}
If you want the best performance, don't model your entity categories as properties on the node. Instead, do it like this:
company1-[:IS_ENTITY]->companyentity
Or if you are using 2.0
company1:COMPANY
The second would also allow you to update your index automatically in a separate background thread, by the way; imo one of the best new features of 2.0.
The first method should also prove more efficient, since making a "hop" in general takes less time than reading a property from a node. It does, however, require you to create a separate index for the entities.
Your queries would look like this:
v2.0
MATCH (company:COMPANY)
RETURN count(company)
v1.9
START entity=node:entityindex(value='company')
MATCH company-[:IS_ENTITY]->entity
RETURN count(company)