Readding all key batch wise using stack exchange Redis client - redis

i am using StackExchange.Redis (2.1.58) , i write below code by using cursor .
Dictionary<string, string> keyResult = new Dictionary<string, string>();
var Server = Connection.GetServer(Connection.GetEndPoints()[0]);
long current_cursor = 0;
int next_cursor =-1;
long page_size = 10;
while(next_cursor!=0)
{
var allkeys = Server.Keys(RedisCache.Database, argKeyPattern, 10, 0, next_cursor==-1 ? 0 : next_cursor);
var cursor = ((StackExchange.Redis.IScanningCursor)allkeys);
foreach (var key in allkeys)
{
if (current_cursor == cursor.Cursor)
{
keyResult.Add(key, RedisCache.StringGet(key));
}
else
{
next_cursor = Convert.ToInt32(cursor.Cursor);
current_cursor = next_cursor;
break;
}
}
}
this code works fine , my question is there any other approach to read keys from Redis batch wise in more efficient way ?
Thanks !!

From the documentation
The Keys(...) method deserves special mention: it is unusual in that it does not have an *Async counterpart. The reason for this is that behind the scenes, the system will determine the most appropriate method to use (KEYS vs SCAN, based on the server version), and if possible will use the SCAN approach to hand you back an IEnumerable that does all the paging internally - so you never need to see the implementation details of the cursor operations. If SCAN is not available, it will use KEYS, which can cause blockages at the server.
From first looks, KEYS command should be avoid. However, the library already fix that for you by using SCAN if the command is available. So I think you're good here.
if SCAN command is not available so Keys() will fallback to using KEYS command
SCAN command is available in version 2.8++

Related

Redis StackExchange LuaScripts with parameters

I'm trying to use the following Lua script using C# StackExchange library:
private const string LuaScriptToExecute = #"
local current
current = redis.call(""incr"", KEYS[1])
if current == 1 then
redis.call(""expire"", KEYS[1], KEYS[2])
return 1
else
return current
end
Whenever i'm evaluating the script "as a string", it works properly:
var incrementValue = await Database.ScriptEvaluateAsync(LuaScriptToExecute,
new RedisKey[] { key, ttlInSeconds });
If I understand correctly, each time I invoke the ScriptEvaluateAsync method, the script is transmitted to the redis server which is not very effective.
To overcome this, I tried using the "prepared script" approach, by running:
_setCounterWithExpiryScript = LuaScript.Prepare(LuaScriptToExecute);
...
...
var incrementValue = await Database.ScriptEvaluateAsync(_setCounterWithExpiryScript,
new[] { key, ttlInSeconds });
Whenever I try to use this approach, I receive the following error:
ERR Error running script (call to f_7c891a96328dfc3aca83aa6fb9340674b54c4442): #user_script:3: #user_script: 3: Lua redis() command arguments must be strings or integers
What am I doing wrong?
What is the right approach in using "prepared" LuaScripts that receive dynamic parameters?
If I look in the documentation: no idea.
If I look in the unit test on github it looks really easy.
(by the way, is your ttlInSeconds really RedisKey and not RedisValue? You are accessing it thru KEYS[2] - shouldnt that be ARGV[1]? Anyway...)
It looks like you should rewrite your script to use named parameters and not arguments:
private const string LuaScriptToExecute = #"
local current
current = redis.call(""incr"", #myKey)
if current == 1 then
redis.call(""expire"", #myKey, #ttl)
return 1
else
return current
end";
// We should load scripts to whole redis cluster. Even when we dont have any.
// In that case, there will be only one EndPoint, one iteration etc...
_myScripts = _redisMultiplexer.GetEndPoints()
.Select(endpoint => _redisMultiplexer.GetServer(endpoint))
.Where(server => server != null)
.Select(server => lua.Load(server))
.ToArray();
Then just execute it with anonymous class as parameter:
for(var setCounterWithExpiryScript in _myScripts)
{
var incrementValue = await Database.ScriptEvaluateAsync(
setCounterWithExpiryScript,
new {
myKey: (RedisKey)key, // or new RedisKey(key) or idk
ttl: (RedisKey)ttlInSeconds
}
)// .ConfigureAwait(false); // ? ;-)
// when ttlInSeconds is value and not key, just dont cast it to RedisKey
/*
var incrementValue = await
Database.ScriptEvaluateAsync(
setCounterWithExpiryScript,
new {
myKey: (RedisKey)key,
ttl: ttlInSeconds
}
).ConfigureAwait(false);*/
}
Warning:
Please note that Redis is in full-stop mode when executing scripts. Your script looks super-easy (you sometimes save one trip to redis (when current != 1) so i have a feeling that this script will be counter productive in greater-then-trivial scale. Just do one or two calls from c# and dont bother with this script.
First of all, Jan's comment above is correct.
The script line that updated the key's TTL should be redis.call(""expire"", KEYS[1], ARGV[1]).
Regarding the issue itself, after searching for similar issues in RedisStackExchange's Github, I found that Lua scripts do not work really well in cluster mode.
Fortunately, it seems that "loading the scripts" isn't really necessary.
The ScriptEvaluateAsync method works properly in cluster mode and is sufficient (caching-wise).
More details can be found in the following Github issue.
So at the end, using ScriptEvaluateAsync without "preparing the script" did the job.
As a side note about Jan's comment above that this script isn't needed and can be replaced with two C# calls, it is actually quite important since this operation should be atomic as it is a "Rate limiter" pattern.

RavenDB fails with ConcurrencyException when using new transaction

This code always fails with a ConcurrencyException:
[Test]
public void EventOrderingCode_Fails_WithConcurrencyException()
{
Guid id = Guid.NewGuid();
using (var scope1 = new TransactionScope())
using (var session = DataAccess.NewOpenSession)
{
session.Advanced.UseOptimisticConcurrency = true;
session.Advanced.AllowNonAuthoritativeInformation = false;
var ent1 = new CTEntity
{
Id = id,
Name = "George"
};
using (var scope2 = new TransactionScope(TransactionScopeOption.RequiresNew))
{
session.Store(ent1);
session.SaveChanges();
scope2.Complete();
}
var ent2 = session.Load<CTEntity>(id);
ent2.Name = "Gina";
session.SaveChanges();
scope1.Complete();
}
}
It fails at the last session.SaveChanges. Stating that it is using a NonCurrent etag. If I use Required instead of RequiresNew for scope2 - i.e. using the same Transaction. It works.
Now, since I load the entity (ent2) it should be using the newest Etag unless this is some cached value attached to scope1 that I am using (but I have disabled Caching). So I do not understand why this fails.
I really need this setup. In the production code the outer TransactionScope is created by NServiceBus, and the inner is for controlling an aspect of event ordering. It cannot be the same Transaction.
And I need the optimistic concurrency too - if other threads uses the entity at the same time.
BTW: This is using Raven 2.0.3.0
Since no one else have answered, I had better give it a go myself.
It turns out this was a human error. Due to a bad configuration of our IOC container the DataAccess.NewOpenSession gave me the same Session all the time (across other tests). In other words Raven works as expected :)
Before I found out about this I also experimented with using TransactionScopeOption.Suppress instead of RequiresNew. That also worked. Then I just had to make sure that whatever I did in the suppressed scope could not fail. Which was a valid option in my case.

Raven DB: How can I delete all documents of a given type

More specifically in Raven DB, I want to create a generic method with a signature like;
public void Clear<T>() {...
Then have Raven DB clear all documents of the given type.
I understand from other posts by Ayende to similar questions that you'd need an index in place to do this as a batch.
I think this would involve creating an index that maps each document type - this seems like a lot of work.
Does anyone know an efficient way of creating a method like the above that will do a set delete directly in the database?
I assume you want to do this from the .NET client. If so, use the standard DocumentsByEntityName index:
var indexQuery = new IndexQuery { Query = "Tag:" + collectionName };
session.Advanced.DocumentStore.DatabaseCommands.DeleteByIndex(
"Raven/DocumentsByEntityName",
indexQuery,
new BulkOperationOptions { AllowStale = true });
var hilo = session.Advanced.DocumentStore.DatabaseCommands.Get("Raven/H‌​ilo/", collectionName);
if (hilo != null) {
session.Advanced.DocumentStore.DatabaseCommands.Delete(hilo.‌​Key, hilo.Etag);
}
Where collectionName is the actual name of your collection.
The first operation deletes the items. The second deletes the HiLo file.
Also check out the official documentation - How to delete or update documents using index.
After much experimentation I found the answer to be quite simple, although far from obvious;
public void Clear<T>()
{
session.Advanced.DocumentStore.DatabaseCommands.PutIndex(indexName, new IndexDefinitionBuilder<T>
{
Map = documents => documents.Select(entity => new {})
});
session.Advanced.DatabaseCommands.DeleteByIndex(indexName, new IndexQuery());
}
Of course you almost certainly wouldn't define your index and do your delete in one go, I've put this as a single method for the sake of brevity.
My own implementation defines the indexes on application start as recommended by the documentation.
If you wanted to use this approach to actually index a property of T then you would need to constrain T. For example if I have an IEntity that all my document classes inherit from and this class specifies a property Id. Then a 'where T : IEntity' would allow you to use that property in the index.
It's been said in other places, but it's also worth noting that once you define a static index Raven will probably use it, this can cause your queries to seemingly not return data that you've inserted:
RavenDB Saving to disk query
I had this problem as well and this is the solution that worked for me. I'm only working in a test project, so this might be slow for a bigger db, but Ryan's answer didn't work for me.
public static void ClearDocuments<T>(this IDocumentSession session)
{
var objects = session.Query<T>().ToList();
while (objects.Any())
{
foreach (var obj in objects)
{
session.Delete(obj);
}
session.SaveChanges();
objects = session.Query<T>().ToList();
}
}
You can do that using:
http://blog.orangelightning.co.uk/?p=105

Delaying writes to SQL Server

I am working on an app, and need to keep track of how any views a page has. Almost like how SO does it. It is a value used to determine how popular a given page is.
I am concerned that writing to the DB every time a new view needs to be recorded will impact performance. I know this borderline pre-optimization, but I have experienced the problem before. Anyway, the value doesn't need to be real time; it is OK if it is delayed by 10 minutes or so. I was thinking that caching the data, and doing one large write every X minutes should help.
I am running on Windows Azure, so the Appfabric cache is available to me. My original plan was to create some sort of compound key (PostID:UserID), and tag the key with "pageview". Appfabric allows you to get all keys by tag. Thus I could let them build up, and do one bulk insert into my table instead of many small writes. The table looks like this, but is open to change.
int PageID | guid userID | DateTime ViewTimeStamp
The website would still get the value from the database, writes would just be delayed, make sense?
I just read that the Windows Azure Appfabric cache does not support tag based searches, so it pretty much negates my idea.
My question is, how would you accomplish this? I am new to Azure, so I am not sure what my options are. Is there a way to use the cache without tag based searches? I am just looking for advice on how to delay these writes to SQL.
You might want to take a look at http://www.apathybutton.com (and the Cloud Cover episode it links to), which talks about a highly scalable way to count things. (It might be overkill for your needs, but hopefully it gives you some options.)
You could keep a queue in memory and on a timer drain the queue, collapse the queued items by totaling the counts by page and write in one SQL batch/round trip. For example, using a TVP you could write the queued totals with one sproc call.
That of course doesn't guarantee the view counts get written since its in memory and latently written but page counts shouldn't be critical data and crashes should be rare.
You might want to have a look at how the "diagnostics" feature in Azure works. Not because you would use diagnostics for what you are doing at all, but because it is dealing with a similar problem and may provide some inspiration. I am just about to implement a data auditing feature and I want to log that to table storage so also want to delay and bunch the updates together and I have taken a lot of inspiration from diagnostics.
Now, the way Diagnostics in Azure works is that each role starts a little background "transfer" thread. So, whenever you write any traces then that gets stored in a list in local memory and the background thread will (by default) bunch all the requests up and transfer them to table storage every minute.
In your scenario, I would let each role instance keep track of a count of hits and then use a background thread to update the database every minute or so.
I would probably use something like a static ConcurrentDictionary (or one hanging off a singleton) on each webrole with each hit incrementing the counter for the page identifier. You'd need to have some thread handling code to allow multiple request to update the same counter in the list. Alternatively, just allow each "hit" to add a new record to a shared thread-safe list.
Then, have a background thread once per minute increment the database with the number of hits per page since last time and reset the local counter to 0 or empty the shared list if you are going with that approach (again, be careful about the multi threading and locking).
The important thing is to make sure your database update is atomic; If you do a read-current-count from the database, increment it and then write it back then you may have two different web role instances doing this at the same time and thus losing one update.
EDIT:
Here is a quick sample of how you could go about this.
using System.Collections.Concurrent;
using System.Data.SqlClient;
using System.Threading;
using System;
using System.Collections.Generic;
using System.Linq;
class Program
{
static void Main(string[] args)
{
// You would put this in your Application_start for the web role
Thread hitTransfer = new Thread(() => HitCounter.Run(new TimeSpan(0, 0, 1))); // You'd probably want the transfer to happen once a minute rather than once a second
hitTransfer.Start();
//Testing code - this just simulates various web threads being hit and adding hits to the counter
RunTestWorkerThreads(5);
Thread.Sleep(5000);
// You would put the following line in your Application shutdown
HitCounter.StopRunning(); // You could do some cleverer stuff with aborting threads, joining the thread etc but you probably won't need to
Console.WriteLine("Finished...");
Console.ReadKey();
}
private static void RunTestWorkerThreads(int workerCount)
{
Thread[] workerThreads = new Thread[workerCount];
for (int i = 0; i < workerCount; i++)
{
workerThreads[i] = new Thread(
(tagname) =>
{
Random rnd = new Random();
for (int j = 0; j < 300; j++)
{
HitCounter.LogHit(tagname.ToString());
Thread.Sleep(rnd.Next(0, 5));
}
});
workerThreads[i].Start("TAG" + i);
}
foreach (var t in workerThreads)
{
t.Join();
}
Console.WriteLine("All threads finished...");
}
}
public static class HitCounter
{
private static System.Collections.Concurrent.ConcurrentQueue<string> hits;
private static object transferlock = new object();
private static volatile bool stopRunning = false;
static HitCounter()
{
hits = new ConcurrentQueue<string>();
}
public static void LogHit(string tag)
{
hits.Enqueue(tag);
}
public static void Run(TimeSpan transferInterval)
{
while (!stopRunning)
{
Transfer();
Thread.Sleep(transferInterval);
}
}
public static void StopRunning()
{
stopRunning = true;
Transfer();
}
private static void Transfer()
{
lock(transferlock)
{
var tags = GetPendingTags();
var hitCounts = from tag in tags
group tag by tag
into g
select new KeyValuePair<string, int>(g.Key, g.Count());
WriteHits(hitCounts);
}
}
private static void WriteHits(IEnumerable<KeyValuePair<string, int>> hitCounts)
{
// NOTE: I don't usually use sql commands directly and have not tested the below
// The idea is that the update should be atomic so even though you have multiple
// web servers all issuing similar update commands, potentially at the same time,
// they should all commit. I do urge you to test this part as I cannot promise this code
// will work as-is
//using (SqlConnection con = new SqlConnection("xyz"))
//{
// foreach (var hitCount in hitCounts.OrderBy(h => h.Key))
// {
// var cmd = con.CreateCommand();
// cmd.CommandText = "update hits set count = count + #count where tag = #tag";
// cmd.Parameters.AddWithValue("#count", hitCount.Value);
// cmd.Parameters.AddWithValue("#tag", hitCount.Key);
// cmd.ExecuteNonQuery();
// }
//}
Console.WriteLine("Writing....");
foreach (var hitCount in hitCounts.OrderBy(h => h.Key))
{
Console.WriteLine(String.Format("{0}\t{1}", hitCount.Key, hitCount.Value));
}
}
private static IEnumerable<string> GetPendingTags()
{
List<string> hitlist = new List<string>();
var currentCount = hits.Count();
for (int i = 0; i < currentCount; i++)
{
string tag = null;
if (hits.TryDequeue(out tag))
{
hitlist.Add(tag);
}
}
return hitlist;
}
}

given a list of objects using C# push them to ravendb without knowing which ones already exist

Given 1000 documents with a complex data structure. for e.g. a Car class that has three properties, Make and Model and one Id property.
What is the most efficient way in C# to push these documents to raven db (preferably in a batch) without having to query the raven collection individually to find which to update and which to insert. At the moment I have to going like so. Which is totally inefficient.
note : _session is a wrapper on the IDocumentSession where Commit calls SaveChanges and Add calls Store.
private void PublishSalesToRaven(IEnumerable<Sale> sales)
{
var page = 0;
const int total = 30;
do
{
var paged = sales.Skip(page*total).Take(total);
if (!paged.Any()) return;
foreach (var sale in paged)
{
var current = sale;
var existing = _session.Query<Sale>().FirstOrDefault(s => s.Id == current.Id);
if (existing != null)
existing = current;
else
_session.Add(current);
}
_session.Commit();
page++;
} while (true);
}
Your session code doesn't seem to track with the RavenDB api (we don't have Add or Commit).
Here is how you do this in RavenDB
private void PublishSalesToRaven(IEnumerable<Sale> sales)
{
sales.ForEach(session.Store);
session.SaveChanges();
}
Your code sample doesn't work at all. The main problem is that you cannot just switch out the references and expect RavenDB to recognize that:
if (existing != null)
existing = current;
Instead you have to update each property one-by-one:
existing.Model = current.Model;
existing.Make = current.Model;
This is the way you can facilitate change-tracking in RavenDB and many other frameworks (e.g. NHibernate). If you want to avoid writing this uinteresting piece of code I recommend to use AutoMapper:
existing = Mapper.Map<Sale>(current, existing);
Another problem with your code is that you use Session.Query where you should use Session.Load. Remember: If you query for a document by its id, you will always want to use Load!
The main difference is that one uses the local cache and the other not (the same applies to the equivalent NHibernate methods).
Ok, so now I can answer your question:
If I understand you correctly you want to save a bunch of Sale-instances to your database while they should either be added if they didn't exist or updated if they existed. Right?
One way is to correct your sample code with the hints above and let it work. However that will issue one unnecessary request (Session.Load(existingId)) for each iteration. You can easily avoid that if you setup an index that selects all the Ids of all documents inside your Sales-collection. Before you then loop through your items you can load all the existing Ids.
However, I would like to know what you actually want to do. What is your domain/use-case?
This is what works for me right now. Note: The InjectFrom method comes from Omu.ValueInjecter (nuget package)
private void PublishSalesToRaven(IEnumerable<Sale> sales)
{
var ids = sales.Select(i => i.Id);
var existingSales = _ravenSession.Load<Sale>(ids);
existingSales.ForEach(s => s.InjectFrom(sales.Single(i => i.Id == s.Id)));
var existingIds = existingSales.Select(i => i.Id);
var nonExistingSales = sales.Where(i => !existingIds.Any(x => x == i.Id));
nonExistingSales.ForEach(i => _ravenSession.Store(i));
_ravenSession.SaveChanges();
}