I use Redis to store statistics for a website with relatively high traffic. I keep one Redis key per statistical item and increment it on each call.
Now I need to read this information every minute, write it to SQL, and reset the keys to zero. Since the number of keys and the traffic are both high, what is the optimal way to read the keys?
The easiest method is to SCAN the Redis database every minute. But scanning is costly, and it would also pick up new keys that have been alive for less than a minute, so this method does not seem suitable.
In another method, I first enabled keyspace notifications so I receive a notification when a key expires:
CONFIG SET notify-keyspace-events Ex
Then, for each key, I created an empty companion key with a TTL of one minute; when its expiration notification arrives, I read the associated key and its value and put them in an output queue. This method also has disadvantages, such as the possibility of losing notifications on reconnect.
public void SubscribeOnStatKeyExpire()
{
    // Expired-key notifications are published on the __keyevent@<db>__:expired channel
    sub.Subscribe("__keyevent@0__:expired").OnMessage(async channelMessage => {
        await ExportExpiredKeys((string)channelMessage.Message);
    });
}

private async Task ExportExpiredKeys(string expiredKey)
{
    if (expiredKey.StartsWith(_notifyKeyPrefix))
    {
        var key = expiredKey.Remove(0, _notifyKeyPrefix.Length);
        var value = await _cache.StringGetDeleteAsync(key);
        if (!value.HasValue || value.IsNullOrEmpty) return;

        var count = (int)value;
        if (count <= _negligibleValue)
        {
            // Too small to export yet; add the count back and wait for the next cycle
            await _cache.StringIncrementAsync(key, count);
            return;
        }

        var listLength = await Enqueue(_exportQueue, key + "." + value);
        if (listLength > _numberOfKeysInOneMessage) // _numberOfKeysInOneMessage is currently 500
        {
            var list = await Dequeue(_exportQueue, _numberOfKeysInOneMessage);
            await SendToSqlServiceBrocker(list);
        }
    }
}
Is there a better way, for example using a stream, a sorted set, or something else?
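One alternative I am considering, instead of the TTL-notification trick, is to keep a plain SET of the counter names that were touched and drain it once a minute. This is only a rough sketch; the registry key name "stats:active" and the method names are made up, and it reuses the same Enqueue helper as above:

public async Task IncrementStatAsync(string key)
{
    await _cache.StringIncrementAsync(key);
    // Register the key so the exporter can enumerate touched counters without SCAN
    await _cache.SetAddAsync("stats:active", key);
}

// Runs once a minute. SPOP drains the registry, so keys touched during the run
// are simply picked up on the next pass.
public async Task ExportActiveKeysAsync()
{
    RedisValue member;
    while ((member = await _cache.SetPopAsync("stats:active")).HasValue)
    {
        var key = (string)member;
        var value = await _cache.StringGetDeleteAsync(key); // read and reset atomically
        if (value.HasValue && (long)value > 0)
            await Enqueue(_exportQueue, key + "." + value);
    }
}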
Related
I'm setting data in Redis, but I update that data every minute, and I need Redis to replace the old data with the new, updated data. I am confused as to how I can achieve that. I tried using client.expire('key', 60), but that doesn't seem to work. Below is my code. Any help would be appreciated. Thanks in advance.
var redis = require("redis"),
    client = redis.createClient();
const { promisify } = require("util");
const setAsync = promisify(client.set).bind(client);

async function run() {
    // do something
    const success = await setAsync('mykey', list);
    console.log({ success });
    client.expire('mykey', 59);
}
The key will be deleted (or, more precisely, marked for deletion) 59 seconds after calling client.expire, but there are two things you need to be aware of:
1. If you call the EXPIRE command on a key that already has a TTL, it will override the previous TTL.
2. The expiration timeout is cleared once the key's value is replaced or the key is removed (using SET, DEL, etc.).
By the way, if you're updating the exact same key, the data will be overwritten anyway. For example:
await setAsync('key', 'value1'); // set 'key' to 'value1'
await setAsync('key', 'value2'); // override 'key' to 'value2'
await getAsync('key'); // 'value2'
Also, instead of running two commands (one to set the value and one to set the TTL), you can use the EX option like this:
setAsync('key', 'value', 'EX', 60); // set `key` to `value` with TTL of 60 seconds
I want to implement real-time applications with Redis.
Data is pushed to Redis in real time, as in the source code below, which uses the Lettuce library.
RedisClient redisClient = RedisClient.create(uri);
StatefulRedisConnection<String, String> connection = redisClient.connect();
RedisStringAsyncCommands<String, String> asyncCommands = connection.async();

List<RedisFuture<?>> futures = Lists.newArrayList();
while (true) {
    futures.add(asyncCommands.set("key", "value"));
}
If I want to read this data on the client in real time, how can I implement it?
At first I used pub/sub, but pub/sub cannot retrieve data that has already been stored; it only delivers messages published to a channel to subscribers connected at that moment.
Kafka, for example, can continuously fetch data through a consumer, like this:
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    for (ConsumerRecord<String, String> record : records) {
        logger.info("offset = {}, value = {}", record.offset(), record.value());
    }
}
Is there a way to do something similar with Redis?
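One thing that looks close to what I want is Redis Streams: producers XADD entries, and a consumer keeps reading past the last id it has seen, much like the Kafka poll loop above. Below is only a rough sketch of that loop, written with the .NET StackExchange.Redis client rather than Lettuce; the stream name "events" and the 100 ms polling interval are placeholders:

using System;
using System.Threading.Tasks;
using StackExchange.Redis;

class StreamPollSketch
{
    static async Task Main()
    {
        var mux = await ConnectionMultiplexer.ConnectAsync("localhost");
        var db = mux.GetDatabase();

        // Producer side: append to a stream instead of overwriting a single key
        await db.StreamAddAsync("events", "key", "value");

        // Consumer side: remember the last-seen id and keep reading past it
        RedisValue lastId = "0-0";
        while (true)
        {
            var entries = await db.StreamReadAsync("events", lastId, count: 100);
            foreach (var entry in entries)
            {
                Console.WriteLine($"id = {entry.Id}, values = {string.Join(", ", entry.Values)}");
                lastId = entry.Id;
            }
            // StackExchange.Redis does not expose blocking XREAD, so poll with a short delay
            await Task.Delay(100);
        }
    }
}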
I have an ASP.NET Core web application set up with SignalR scaled out with Redis.
Using the built-in groups works fine:
Clients.Group("Group_Name");
and survives multiple load balancers. I'm assuming that SignalR persists those groups in Redis automatically, so all servers know which groups we have and who is subscribed to them.
However, in my situation I can't rely on Groups (or Users) alone, because there is no way to map a connectionId back to its group (say, when overriding OnDisconnectedAsync, where only the connection id is known); you always need the Group_Name to identify the group. I need that mapping to know which members of a group are online, so that when OnDisconnectedAsync is called I know which group the connection belongs to and which side of the conversation it is on.
I've done some research, and the suggestions (including the Microsoft docs) all amount to using something like:
static readonly ConcurrentDictionary<string, ConversationInformation> connectionMaps;
in the hub itself.
Now, this is a great solution (and thread-safe), except that it exists only in the memory of one of the load-balanced servers, and the other servers each have a different instance of this dictionary.
The question is, do I have to persist connectionMaps manually? Using Redis for example?
Something like:
public class ChatHub : Hub
{
    static readonly ConcurrentDictionary<string, ConversationInformation> connectionMaps;

    ChatHub(IDistributedCache distributedCache)
    {
        connectionMaps = distributedCache.Get("ConnectionMaps");
        // I think connectionMaps should not be static any more.
    }
}
And if yes, is it thread-safe? If no, can you suggest a better solution that works with load balancing?
I have been battling with the same issue on this end. What I've come up with is to persist the collections in the Redis cache, using a StackExchange.Redis.IDatabaseAsync together with locks to handle concurrency.
Unfortunately this makes the entire process effectively synchronous, but I couldn't quite figure out a way around that.
Here's the core of what I'm doing; this acquires a lock and returns a deserialized collection from the cache:
private async Task<ConcurrentDictionary<int, HubMedia>> GetMediaAttributes(bool requireLock)
{
    if (requireLock)
    {
        var retryTime = 0;
        try
        {
            while (!await _redisDatabase.LockTakeAsync(_mediaAttributesLock, _lockValue, _defaultLockDuration))
            {
                // Wait until we can get a lock on the data, retrying every 100ms
                await Task.Delay(100);
                retryTime += 100;
                if (retryTime > _defaultLockDuration.TotalMilliseconds)
                {
                    _logger.LogError("Failed to get Media Attributes");
                    return null;
                }
            }
        }
        catch (TaskCanceledException e)
        {
            _logger.LogError("Failed to take lock within the default 5 second wait time " + e);
            return null;
        }
    }

    var mediaAttributes = await _redisDatabase.StringGetAsync(MEDIA_ATTRIBUTES_LIST);
    if (!mediaAttributes.HasValue)
    {
        return new ConcurrentDictionary<int, HubMedia>();
    }
    return JsonConvert.DeserializeObject<ConcurrentDictionary<int, HubMedia>>(mediaAttributes);
}
I update the collection like this after I've finished manipulating it:
private async Task<bool> UpdateCollection(string redisCollectionKey, object collection, string lockKey)
{
    var success = false;
    try
    {
        success = await _redisDatabase.StringSetAsync(redisCollectionKey, JsonConvert.SerializeObject(collection, new JsonSerializerSettings
        {
            ReferenceLoopHandling = ReferenceLoopHandling.Ignore
        }));
    }
    finally
    {
        await _redisDatabase.LockReleaseAsync(lockKey, _lockValue);
    }
    return success;
}
When I'm done, I simply make sure the lock is released so other instances can grab it and use the data:
private async Task ReleaseLock(string lockKey)
{
    await _redisDatabase.LockReleaseAsync(lockKey, _lockValue);
}
I'd be happy to hear if you find a better way of doing this. I struggled to find any documentation on scaling out with data retention and sharing.
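One variation I have been considering, to avoid serializing and locking the whole dictionary, is to keep the connection-to-group mapping in a Redis hash instead, since single-field HSET/HGET/HDEL operations are atomic on the server and don't need the lock. A rough sketch only; the hash key name and method names below are made up:

private const string ConnectionGroupsKey = "hub:connection-groups";

// Record which group a connection belongs to (e.g. from OnConnectedAsync)
public Task TrackConnectionAsync(string connectionId, string groupName)
    => _redisDatabase.HashSetAsync(ConnectionGroupsKey, connectionId, groupName);

// Look up the group for a connection (e.g. from OnDisconnectedAsync)
public async Task<string> GetGroupForConnectionAsync(string connectionId)
{
    var group = await _redisDatabase.HashGetAsync(ConnectionGroupsKey, connectionId);
    return group.HasValue ? (string)group : null;
}

// Remove the mapping once the connection is gone
public Task UntrackConnectionAsync(string connectionId)
    => _redisDatabase.HashDeleteAsync(ConnectionGroupsKey, connectionId);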
Unless I am missing something, I don't see a Multiple Set/Add overload that allows you to set multiple keys with an expiration.
var conn = new RedisConnection("server");
Dictionary<string,string> keyvals;
conn.Strings.Set(0,keyvals,expiration);
or even doing it with multiple operations
conn.Strings.Set(0,keyvals);
conn.Expire(keyvals.Keys,expiration);
No such Redis operation exists; EXPIRE is not variadic. However, since the API is pipelined, just call the method multiple times. If you want to ensure the absolute best performance, you can suspend eager socket flushing while you do this:
conn.SuspendFlush();
try {
    foreach(...)
        conn.Keys.Expire(...);
} finally {
    conn.ResumeFlush();
}
Here is my approach:
var expireTime = ...
var batchOp = redisCache.CreateBatch();
foreach (...) {
    batchOp.StringSetAsync(key, value, expireTime);
}
batchOp.Execute();
I am working on an app and need to keep track of how many views a page has, almost like SO does it. The value is used to determine how popular a given page is.
I am concerned that writing to the DB every time a new view needs to be recorded will impact performance. I know this is borderline premature optimization, but I have run into the problem before. In any case, the value doesn't need to be real-time; it is OK if it is delayed by 10 minutes or so. I was thinking that caching the data and doing one large write every X minutes should help.
I am running on Windows Azure, so the AppFabric cache is available to me. My original plan was to create some sort of compound key (PostID:UserID) and tag the key with "pageview". AppFabric allows you to get all keys by tag, so I could let them build up and do one bulk insert into my table instead of many small writes. The table looks like this, but is open to change:
int PageID | guid userID | DateTime ViewTimeStamp
The website would still read the value from the database; only the writes would be delayed. Does that make sense?
I just read that the Windows Azure AppFabric cache does not support tag-based searches, which pretty much negates my idea.
My question is: how would you accomplish this? I am new to Azure, so I am not sure what my options are. Is there a way to use the cache without tag-based searches? I am just looking for advice on how to delay these writes to SQL.
You might want to take a look at http://www.apathybutton.com (and the Cloud Cover episode it links to), which talks about a highly scalable way to count things. (It might be overkill for your needs, but hopefully it gives you some options.)
You could keep a queue in memory and, on a timer, drain the queue, collapse the queued items by totaling the counts per page, and write them in one SQL batch/round trip. For example, using a table-valued parameter (TVP) you could write all the queued totals with a single stored-procedure call.
That of course doesn't guarantee the view counts get written, since they are held in memory and written with some latency, but page counts shouldn't be critical data and crashes should be rare.
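For illustration, the drain-and-write step could look roughly like this. This is only a sketch: the table type dbo.PageViewCountType (PageID int, ViewCount int) and the stored procedure dbo.AddPageViews are made-up names you would have to create yourself.

using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

static class ViewCountWriter
{
    // Writes all queued per-page totals in a single stored-procedure call via a TVP
    public static void WriteCounts(IDictionary<int, int> countsByPage, string connectionString)
    {
        var table = new DataTable();
        table.Columns.Add("PageID", typeof(int));
        table.Columns.Add("ViewCount", typeof(int));
        foreach (var pair in countsByPage)
            table.Rows.Add(pair.Key, pair.Value);

        using (var con = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand("dbo.AddPageViews", con))
        {
            cmd.CommandType = CommandType.StoredProcedure;
            var p = cmd.Parameters.AddWithValue("@PageViews", table);
            p.SqlDbType = SqlDbType.Structured;
            p.TypeName = "dbo.PageViewCountType"; // user-defined table type
            con.Open();
            cmd.ExecuteNonQuery(); // one batch/round trip for all queued totals
        }
    }
}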
You might want to have a look at how the "diagnostics" feature in Azure works. Not because you would use diagnostics for what you are doing, but because it deals with a similar problem and may provide some inspiration. I am just about to implement a data-auditing feature that logs to table storage, so I also want to delay and batch the updates, and I have taken a lot of inspiration from diagnostics.
Now, the way diagnostics in Azure works is that each role starts a small background "transfer" thread. Whenever you write any traces, they get stored in a list in local memory, and the background thread will (by default) batch up all the entries and transfer them to table storage every minute.
In your scenario, I would let each role instance keep track of a count of hits and then use a background thread to update the database every minute or so.
I would probably use something like a static ConcurrentDictionary (or one hanging off a singleton) on each web role, with each hit incrementing the counter for its page identifier. You'd need some thread-handling code to allow multiple requests to update the same counter. Alternatively, just have each "hit" add a new record to a shared thread-safe list.
Then have a background thread, once per minute, increment the database by the number of hits per page since the last run and reset the local counters to zero (or empty the shared list if you go that route; again, be careful about multi-threading and locking).
The important thing is to make sure your database update is atomic: if you read the current count from the database, increment it, and then write it back, two different web role instances may do this at the same time and one update will be lost.
EDIT:
Here is a quick sample of how you could go about this.
using System.Collections.Concurrent;
using System.Data.SqlClient;
using System.Threading;
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    static void Main(string[] args)
    {
        // You would put this in your Application_start for the web role
        // You'd probably want the transfer to happen once a minute rather than once a second
        Thread hitTransfer = new Thread(() => HitCounter.Run(new TimeSpan(0, 0, 1)));
        hitTransfer.Start();

        // Testing code - this just simulates various web threads being hit and adding hits to the counter
        RunTestWorkerThreads(5);
        Thread.Sleep(5000);

        // You would put the following line in your Application shutdown
        // You could do some cleverer stuff with aborting threads, joining the thread etc but you probably won't need to
        HitCounter.StopRunning();

        Console.WriteLine("Finished...");
        Console.ReadKey();
    }

    private static void RunTestWorkerThreads(int workerCount)
    {
        Thread[] workerThreads = new Thread[workerCount];
        for (int i = 0; i < workerCount; i++)
        {
            workerThreads[i] = new Thread(
                (tagname) =>
                {
                    Random rnd = new Random();
                    for (int j = 0; j < 300; j++)
                    {
                        HitCounter.LogHit(tagname.ToString());
                        Thread.Sleep(rnd.Next(0, 5));
                    }
                });
            workerThreads[i].Start("TAG" + i);
        }
        foreach (var t in workerThreads)
        {
            t.Join();
        }
        Console.WriteLine("All threads finished...");
    }
}
public static class HitCounter
{
    private static System.Collections.Concurrent.ConcurrentQueue<string> hits;
    private static object transferlock = new object();
    private static volatile bool stopRunning = false;

    static HitCounter()
    {
        hits = new ConcurrentQueue<string>();
    }

    public static void LogHit(string tag)
    {
        hits.Enqueue(tag);
    }

    public static void Run(TimeSpan transferInterval)
    {
        while (!stopRunning)
        {
            Transfer();
            Thread.Sleep(transferInterval);
        }
    }

    public static void StopRunning()
    {
        stopRunning = true;
        Transfer();
    }

    private static void Transfer()
    {
        lock (transferlock)
        {
            var tags = GetPendingTags();
            var hitCounts = from tag in tags
                            group tag by tag into g
                            select new KeyValuePair<string, int>(g.Key, g.Count());
            WriteHits(hitCounts);
        }
    }

    private static void WriteHits(IEnumerable<KeyValuePair<string, int>> hitCounts)
    {
        // NOTE: I don't usually use SQL commands directly and have not tested the below.
        // The idea is that the update should be atomic, so even though you have multiple
        // web servers all issuing similar update commands, potentially at the same time,
        // they should all commit. I do urge you to test this part as I cannot promise this
        // code will work as-is.
        //using (SqlConnection con = new SqlConnection("xyz"))
        //{
        //    con.Open();
        //    foreach (var hitCount in hitCounts.OrderBy(h => h.Key))
        //    {
        //        var cmd = con.CreateCommand();
        //        cmd.CommandText = "update hits set count = count + @count where tag = @tag";
        //        cmd.Parameters.AddWithValue("@count", hitCount.Value);
        //        cmd.Parameters.AddWithValue("@tag", hitCount.Key);
        //        cmd.ExecuteNonQuery();
        //    }
        //}

        Console.WriteLine("Writing....");
        foreach (var hitCount in hitCounts.OrderBy(h => h.Key))
        {
            Console.WriteLine(String.Format("{0}\t{1}", hitCount.Key, hitCount.Value));
        }
    }

    private static IEnumerable<string> GetPendingTags()
    {
        List<string> hitlist = new List<string>();
        var currentCount = hits.Count();
        for (int i = 0; i < currentCount; i++)
        {
            string tag = null;
            if (hits.TryDequeue(out tag))
            {
                hitlist.Add(tag);
            }
        }
        return hitlist;
    }
}