Redis secondary index maintenance: add and remove are fine, but what about updates? - redis

All the articles and write-ups I've read only talk about insertion, and I have a problem.
I have implemented a secondary index mechanism using transactions (MULTI) and sets in order to save time when looking up entities. The index is a set named by property name and value.
say we have a Person
Person Jack = new Person() { Id = 1, Name = "Jack", Age = 30 };
Person Jena = new Person() { Id = 2, Name = "Jena", Age = 30 };
When I choose to index the Age property and insert both, I read the Age of entities 1 and 2, prepare the index keys, and add the ids to the corresponding set, updating the index in the same transaction as the entities themselves.
age_30_index holds ids 1,2
When deleting Jack, I prepare Jack's age (30), remove id 1 from age_30_index, and remove Jack's own key, again within one transaction. All great... well, almost.
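Simplified, the insert and delete currently look roughly like this (StackExchange.Redis; db is the IDatabase, key names are from the example, and ToHashEntries is a stand-in for whatever serialization I use):
// Simplified insert: store the entity and add its id to the value index in one transaction.
var tran = db.CreateTransaction();
tran.HashSetAsync("Person:1", ToHashEntries(Jack)); // ToHashEntries: hypothetical entity-to-HashEntry[] serializer
tran.SetAddAsync("age_30_index", Jack.Id);
tran.ExecuteAsync().Wait();

// Simplified delete: remove the id from the index and delete the entity, again atomically.
var del = db.CreateTransaction();
del.SetRemoveAsync("age_30_index", Jack.Id);
del.KeyDeleteAsync("Person:1");
del.ExecuteAsync().Wait();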
The problem starts when I want to change the Age and update the cache. Look at the following scenario:
var p = GetEntity<Person>(id: 1);
p.Age = 31;
UpdateEntity(p);
Now, with the concept above, I will have age_30_index -> 1,2 and age_31_index -> 1.
That is because when updating the entity in the cache I don't know what value of the property is currently stored there, and therefore I can't remove the id from the old index.
Another problem is deleting by Id, or deleting like this:
var p = GetEntity<Person>(id: 1);
p.Age = 31;
DeleteEntity(p);
An easy solution would be to take a distributed lock keyed by entity name, get the entities from the cache, delete the indexes, and continue; but in the tests I ran the performance was lacking.
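For reference, this is roughly the lock-based update I benchmarked (simplified; LockTake/LockRelease are the StackExchange.Redis primitives, p is the updated entity from the scenario above, and the key names are illustrative):
// Lock per entity type, read the cached copy to learn the old Age, then fix both indexes.
var lockKey = "lock:Person";
var token = Guid.NewGuid().ToString();

if (db.LockTake(lockKey, token, TimeSpan.FromSeconds(5)))
{
    try
    {
        var current = GetEntity<Person>(id: p.Id);          // old cached value, gives us the old Age
        db.SetRemove($"age_{current.Age}_index", p.Id);     // drop the stale index entry
        db.SetAdd($"age_{p.Age}_index", p.Id);              // add the new index entry
        db.HashSet($"Person:{p.Id}", ToHashEntries(p));     // overwrite the entity itself
    }
    finally
    {
        db.LockRelease(lockKey, token);
    }
}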
Any other option I thought about is not thread safe because it is not atomic.
Is there any other way to achieve what I'm trying to do?
The project is C# on .NET Framework with Redis on Windows; redisearch.io seems nice, but unfortunately it's out of scope.

The Hard Way
If RediSearch is really something you don't want to get into because you are running Redis on Windows, my big comment is that you are using the wrong data structure for storing age. Numerics should generally be stored in sorted sets so you can query them more easily.
So for age you would have the sorted set Person:Age, and when you add Jack (let's say Jack's Id is 1 and his age is 30, as in your example) you would set your index like so:
ZADD Person:Age 30 1
then when you need to update Jack's age to 31, all you would need to do is update the member in the sorted set:
ZADD Person:Age 31 1
You'd then be able to query all the 31 year olds with a ZRANGEBYSCORE, which in StackExchange.Redis looks like:
db.SortedSetRangeByScore("Person:Age", 31, 31, Exclude.None);
That's it; there's no need to look the value up in an old index and a new one.
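For completeness, the same re-score in StackExchange.Redis would be something along these lines (key and member taken from the example above):
// ZADD overwrites the existing score for member "1", so the old age disappears automatically.
db.SortedSetAdd("Person:Age", "1", 31);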
Your bigger concern about atomicity should be whether you need to scale Redis out. If you do, there's no good atomic way to update a distributed index without RediSearch, because all the keys in a multi-key operation or transaction have to be in the same slot or Redis will not accept the operation. Since you are on Windows, I'm assuming you are not running a clustered multi-shard environment and that you are either stand-alone or using Sentinel.
Assuming you are running stand-alone or with Sentinel, you can run the updates inside a Lua script, which lets you run all of your commands sequentially and atomically: your script would update the entity and then update all the accompanying indexes. So if you have a sorted set for age, as in your Person:Age example, the script to update it would look something like:
-- KEYS[1] = the entity's hash key (e.g. Person:1)
-- ARGV[1] = the entity id, ARGV[2] = the new age
redis.call('HSET', KEYS[1], 'Age', ARGV[2])
redis.call('ZADD', 'Person:Age', ARGV[2], ARGV[1])
Naturally, you will have to adapt this script to whatever your index really looks like, so you'll probably want to do some kind of dynamic script generation. I'd use scripting over MULTI because MULTI in StackExchange.Redis behaves slightly differently from what Redis itself expects, due to the library's architecture.
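Invoking such a script from StackExchange.Redis could look roughly like this; the key/argument layout mirrors the sketch above and is an assumption about your data, not a fixed contract:
// Run the entity update and the index update as a single atomic Lua script.
const string script = @"
    redis.call('HSET', KEYS[1], 'Age', ARGV[2])
    redis.call('ZADD', 'Person:Age', ARGV[2], ARGV[1])";

db.ScriptEvaluate(
    script,
    new RedisKey[] { "Person:1" },     // KEYS[1]: the entity hash
    new RedisValue[] { 1, 31 });       // ARGV[1]: the entity id, ARGV[2]: the new age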
The best way with RediSearch and Redis.OM
The best way to handle this would be to use RediSearch (I'm putting this part second, even though it really is the right answer, because it sounds like something you want to avoid given your environment). With RediSearch + Redis.OM .NET you would just update the item in question and call collection.Save(). For your example, the following does everything you're looking for (insert, retrieve, update, delete):
using Redis.OM;
using TestRedisOmUpdate;

var provider = new RedisConnectionProvider("redis://localhost:6379");
provider.Connection.CreateIndex(typeof(Person));
var collection = provider.RedisCollection<Person>();

// insert
var id = await collection.InsertAsync(new() { Name = "Jack", Age = 30 });

// update
foreach (var person in collection.Where(p => p.Name == "Jack"))
{
    person.Age = 31;
}
await collection.SaveAsync();

// check
foreach (var person in collection.Where(p => p.Name == "Jack"))
{
    Console.WriteLine($"{person.Name} {person.Age}");
}

// delete
provider.Connection.Unlink(id);
that all would work off of the model:
[Document]
public class Person
{
    [Indexed]
    public string Name { get; set; }

    [Indexed]
    public int Age { get; set; }
}
And you don't have the headache or hassle of maintaining the indexes yourself.

Related

Sitefinity - Safely delete orphaned dynamic content records

I've been adding records to a dynamic module via the API, and in the process of my experimentation I added a bunch of records that weren't associated correctly with any valid parent record.
I've checked and so far I can see that Sitefinity stores data about these records in a number of tables:
mydynamiccontenttype_table
sf_dynamic_content
sf_dynmc_cntnt_sf_lnguage_data
sf_dynmc_cntent_sf_permissions
I would like to clean up the database by deleting these records but I want to make sure I don't create more problems in the process.
Anyone know if there are more references to these dynamic content type records or a process to safely delete them?
There are probably other tables, so your safest option would be to delete the items using the Sitefinity API.
Just get the masterId of the item and use code like this:
public static void DeleteDataItemOfType(this DynamicModuleManager manager, string type, Guid id)
{
    Type resolvedType = TypeResolutionService.ResolveType(type);
    using (var region = new ElevatedModeRegion(manager))
    {
        manager.DeleteDataItem(resolvedType, id);
        manager.SaveChanges();
    }
}
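Usage would then look roughly like this (the type string below is a placeholder for your dynamic content type's full CLR name, and masterId is the Guid you looked up):
// masterId is the Guid of the orphaned item; the type string is illustrative.
manager.DeleteDataItemOfType(
    "Telerik.Sitefinity.DynamicTypes.Model.MyModule.MyContentType",
    masterId);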

Existing saga instances after applying the [Unique] attribute to IContainSagaData property

I have a bunch of existing sagas in various states of a long running process.
Recently we decided to make one of the properties on our IContainSagaData implementation unique by using the Saga.UniqueAttribute (about which more here http://docs.particular.net/nservicebus/nservicebus-sagas-and-concurrency).
After deploying the change, we realized that all our old saga instances were not being found, and after further digging (thanks Charlie!) discovered that by adding the unique attribute, we were required to data fix all our existing sagas in Raven.
Now, this is pretty poor, kind of like adding an index to a database column and then finding that all the table data is no longer selectable, but it being what it is, we decided to create a tool for the data fix.
So after creating and running this tool, we've patched up the old sagas so that they resemble the new sagas (sagas created since we went live with the change).
However, despite all the data now looking right we're still not able to find old instances of the saga!
The tool we wrote does two things. For each existing saga, the tool:
Adds a new RavenJToken called "NServiceBus-UniqueValue" to the saga metadata, setting the value to the same value as our unique property for that saga, and
Creates a new document of type NServiceBus.Persistence.Raven.SagaPersister.SagaUniqueIdentity, setting the SagaId, SagaDocId, and UniqueValue fields accordingly.
My questions are:
Is it sufficient to simply make the data look correct or is there something else we need to do?
Another option we have is to revert the change which added the unique attribute. However in this scenario, would those new sagas which have been created since the change went in be OK with this?
Code for adding metadata token:
var policyKey = RavenJToken.FromObject(saga.PolicyKey); // This is the unique field
sagaDataMetadata.Add("NServiceBus-UniqueValue", policyKey);
Code for adding new doc:
var policyKeySagaUniqueId = new SagaUniqueIdentity
{
    Id = "Matlock.Renewals.RenewalSaga.RenewalSagaData/PolicyKey/" + Guid.NewGuid().ToString(),
    SagaId = saga.Id,
    UniqueValue = saga.PolicyKey,
    SagaDocId = "RenewalSaga/" + saga.Id.ToString()
};
session.Store(policyKeySagaUniqueId);
Any help much appreciated.
EDIT
Thanks to David's help on this we have fixed our problem - the key difference was that we used SagaUniqueIdentity.FormatId() to generate our document IDs rather than a new Guid. This was trivial to do since we were already referencing the NServiceBus and NServiceBus.Core assemblies.
The short answer is that it is not enough to make the data resemble the new identity documents. Where you are using Guid.NewGuid().ToString(), that data is important! That's why your solution isn't working right now. I spoke about the concept of identity documents (specifically about the NServiceBus use case) during the last quarter of my talk at RavenConf 2014 - here are the slides and video.
So here is the long answer:
In RavenDB, the only ACID guarantees are on the Load/Store by Id operations. So if two threads are acting on the same Saga concurrently, and one stores the Saga data, the second thread can only expect to get back the correct saga data if it is also loading a document by its Id.
To guarantee this, the Raven saga persister uses an identity document like the one you showed. It contains the SagaId, the UniqueValue (mostly for human comprehension and debugging; the database doesn't technically need it), and the SagaDocId (which is a little duplication, as it's only {SagaTypeName}/{SagaId} and we already have the SagaId).
With the SagaDocId, we can use the Include feature of RavenDB to do a query like this (which is from memory, probably wrong, and should only serve to illustrate the concept as pseudocode)...
var identityDocId = // some value based on the incoming message

var idDoc = RavenSession
    // Look at the identity doc's SagaDocId and pull back that document too!
    .Include<SagaIdentity>(identityDoc => identityDoc.SagaDocId)
    .Load(identityDocId);

var sagaData = RavenSession
    .Load(idDoc.SagaDocId); // Already in-memory, no 2nd round-trip to database!
So the identityDocId is very important because it describes the uniqueness of the value coming from the message; not just any old Guid will do. So what we really need to know is how to calculate it.
For that, the NServiceBus saga persister code is instructive:
void StoreUniqueProperty(IContainSagaData saga)
{
    var uniqueProperty = UniqueAttribute.GetUniqueProperty(saga);
    if (!uniqueProperty.HasValue) return;

    var id = SagaUniqueIdentity.FormatId(saga.GetType(), uniqueProperty.Value);
    var sagaDocId = sessionFactory.Store.Conventions.FindFullDocumentKeyFromNonStringIdentifier(saga.Id, saga.GetType(), false);

    Session.Store(new SagaUniqueIdentity
    {
        Id = id,
        SagaId = saga.Id,
        UniqueValue = uniqueProperty.Value.Value,
        SagaDocId = sagaDocId
    });

    SetUniqueValueMetadata(saga, uniqueProperty.Value);
}
The important part is the SagaUniqueIdentity.FormatId method from the same file.
public static string FormatId(Type sagaType, KeyValuePair<string, object> uniqueProperty)
{
    if (uniqueProperty.Value == null)
    {
        throw new ArgumentNullException("uniqueProperty", string.Format("Property {0} is marked with the [Unique] attribute on {1} but contains a null value. Please make sure that all unique properties are set on your SagaData and/or that you have marked the correct properties as unique.", uniqueProperty.Key, sagaType.Name));
    }

    var value = Utils.DeterministicGuid.Create(uniqueProperty.Value.ToString());
    var id = string.Format("{0}/{1}/{2}", sagaType.FullName.Replace('+', '-'), uniqueProperty.Key, value);

    // raven has a size limit of 255 bytes == 127 unicode chars
    if (id.Length > 127)
    {
        // generate a guid from the hash:
        var key = Utils.DeterministicGuid.Create(sagaType.FullName, uniqueProperty.Key);
        id = string.Format("MoreThan127/{0}/{1}", key, value);
    }

    return id;
}
This relies on Utils.DeterministicGuid.Create(params object[] data) which creates a Guid out of an MD5 hash. (MD5 sucks for actual security but we are only looking for likely uniqueness.)
static class DeterministicGuid
{
    public static Guid Create(params object[] data)
    {
        // use MD5 hash to get a 16-byte hash of the string
        using (var provider = new MD5CryptoServiceProvider())
        {
            var inputBytes = Encoding.Default.GetBytes(String.Concat(data));
            var hashBytes = provider.ComputeHash(inputBytes);

            // generate a guid from the hash:
            return new Guid(hashBytes);
        }
    }
}
That's what you need to replicate to get your utility to work properly.
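Applied to the fix-up code from the question, the change would be along these lines (a sketch, assuming the saga data type is RenewalSagaData and the unique property is PolicyKey, as above):
// Build the identity document Id the same way the persister does, instead of Guid.NewGuid().
var uniqueProperty = new KeyValuePair<string, object>("PolicyKey", saga.PolicyKey);
var id = SagaUniqueIdentity.FormatId(typeof(RenewalSagaData), uniqueProperty);

session.Store(new SagaUniqueIdentity
{
    Id = id,
    SagaId = saga.Id,
    UniqueValue = saga.PolicyKey,
    SagaDocId = "RenewalSaga/" + saga.Id
});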
What's really interesting is that this code made it all the way to production. I'm surprised you didn't run into trouble before this, with messages creating new saga instances when they really shouldn't have, because they couldn't find the existing saga data.
I almost think it might be a good idea if NServiceBus would raise a warning any time you tried to find Saga Data by anything other than a [Unique] marked property, because it's an easy thing to forget to do. I filed this issue on GitHub and submitted this pull request to do just that.

Should I use unique tables for every user?

I'm working on a web app that collects traffic information for websites that use my service. Think Google Analytics, but far more visual. I'm using SQL Server 2012 for the backbone of my app and am considering using MongoDB for the data-gathering, analytics side of the site.
If I have 100 users with an average of 20,000 hits a month on their site, that's 2,000,000 records in a single collection that will be getting queried.
Should I use MongoDB to store this information (I'm new to it and new things are intimidating)?
Should I dynamically create new collections/tables for every new user?
Thanks!
With MongoDB, the collection (the equivalent of a SQL table) can get quite big without much issue; that is largely what it is designed for. The "Mongo" is part of "humongous" (pretty clever, eh). This is a great use case for MongoDB, which is well suited to storing point-in-time information.
Options:
1. New Collection for each Client
Very easy to do; I use a GetCollectionSafe method for this:
public class MongoStuff
{
    private static MongoDatabase GetDatabase()
    {
        var databaseName = "dbName";
        var connectionString = "connStr";
        var client = new MongoClient(connectionString);
        var server = client.GetServer();
        return server.GetDatabase(databaseName);
    }

    public static MongoCollection<T> GetCollection<T>(string collectionName)
    {
        return GetDatabase().GetCollection<T>(collectionName);
    }

    public static MongoCollection<T> GetCollectionSafe<T>(string collectionName)
    {
        var db = GetDatabase();
        if (!db.CollectionExists(collectionName))
        {
            db.CreateCollection(collectionName);
        }
        return db.GetCollection<T>(collectionName);
    }
}
Then you can call it with:
var collection = MongoStuff.GetCollectionSafe<Record>("ClientName");
Running this script
static void Main(string[] args)
{
    var times = new List<long>();
    for (int i = 0; i < 1000; i++)
    {
        Stopwatch watch = new Stopwatch();
        watch.Start();
        MongoStuff.GetCollectionSafe<Person>(String.Format("Mark{0:000}", i));
        watch.Stop();
        Console.WriteLine(watch.ElapsedMilliseconds);
        times.Add(watch.ElapsedMilliseconds);
    }

    Console.WriteLine(String.Format("Max : {0} \nMin : {1} \nAvg : {2}", times.Max(f => f), times.Min(f => f), times.Average(f => f)));
    Console.ReadKey();
}
Gave me (on my laptop)
Max : 180
Min : 1
Avg : 6.635
Benefits:
Ease of splitting out the data if one client needs to go on their own
Might match your mental map of the problem
Cons:
Almost impossible to aggregate data across all collections
Hard to browse the collections in management GUIs (like Robomongo)
2. One Large Collection
Use one collection for it all and access it this way:
var coll = MongoStuff.GetCollection<Record>("Records");
Put an index on the collection (the index will make reads orders of magnitude quicker):
coll.EnsureIndex(new IndexKeysBuilder().Ascending("ClientId"));
This needs to be run only once (per collection, per index).
Benefits:
One simple place to find data
Aggregating over all clients is possible
More traditional MongoDB setup
Cons:
All clients' data is intermingled
May not map as well mentally
Just as a reference, the MongoDB size limits are documented here: http://docs.mongodb.org/manual/reference/limits/
3. Store only aggregated data
If you are never intending to break the data down to an individual record, just save the aggregates themselves.
Page Loads:
#     Page           Total Time    Average Time
15    Default.html   1545          103
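A sketch of what such an aggregate-only document might look like (class and property names are purely illustrative):
// One document per client/page/day instead of one document per hit.
public class PageLoadAggregate
{
    public string Id { get; set; }            // e.g. "clientA:Default.html:2014-05-01"
    public string ClientId { get; set; }
    public string Page { get; set; }
    public DateTime Day { get; set; }
    public int Hits { get; set; }             // the "#" column above
    public long TotalTimeMs { get; set; }
    public double AverageTimeMs
    {
        get { return Hits == 0 ? 0 : (double)TotalTimeMs / Hits; }
    }
}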
I will let someone else tackle the MongoDB side of your question as I don't feel I'm the best person to comment on it, but I would point out that MongoDB is a very different animal and you'll lose a lot of the referential integrity you enjoy in SQL.
In terms of SQL design, I would not use a different schema for each customer. Your database schema and backups could grow uncontrollably, and maintaining a dynamically growing schema will be a nightmare.
I would suggest one of two approaches:
Either you can create a new database for each customer:
This is more secure as users cannot access each other's data (just use different credentials) and users are easier to manage/migrate and separate.
However, many hosting providers charge per database, it will cost more to run and maintain, and should you wish to compare data across users it gets much more challenging.
Your second approach is to simply host all users in a single DB; your tables will grow large (although 2 million rows is not over the top for a well-maintained SQL DB). You would simply use a UserID column to discriminate.
The emphasis will be on you to get the performance you need through proper indexing.
Users' data will exist in the same system, and there is no SQL defense against users accessing each other's data: your code will have to be good!

nHibernate Proxy Load using Natural Key

How would I use NHibernate (configured by Fluent NHibernate, if it makes any difference) to load an entity using a natural/alternate key in some cases, rather than the primary key, when using the Load method on an ISession?
I still need the functionality to allow me to do both; in the majority of cases the entity will be loaded via the PKey, but in some cases (where an external system is involved) I need to select the record using the natural key.
I'd like to keep the performance benefit Load allows, rather than do a query etc.
// Current
int countryID = 1; // from normal input source
Address a = new Address();
a.Country = session.Load<Country>(countryID);
session.SaveOrUpdate(a);
// Required
string countryCode = "usa"; // from external input source
Address a2 = new Address();
a2.Country = session.LoadViaNaturalKeySomehow<Country>(c => c.Code, countryCode); // :)
session.SaveOrUpdate(a2);
AFAIK, it is not possible. As you can see in Ayende's post, there is a query syntax for criteria, the only natural-ID support in the whole NHibernate API as far as I know. This query translates into a "normal" query, except for the second-level cache handling described in that post.
It would be nice if it would at least not flush the session.
One simple performance enhancement you can make is turning off auto flush before querying by the (immutable!) natural ID:
session.FlushMode = FlushMode.Never;
session.CreateQuery(...by natural id ...);
session.FlushMode = FlushMode.Auto;
This can make a big difference, but of course it does not compete with Load.
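For reference, the criteria syntax for natural IDs mentioned above looks roughly like this (assuming Code is mapped as the natural ID of Country; check it against your NHibernate version):
// Query Country by its natural ID; with the second-level cache enabled this can skip the DB.
var country = session.CreateCriteria<Country>()
    .Add(Restrictions.NaturalId().Set("Code", countryCode))
    .SetCacheable(true)
    .UniqueResult<Country>();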
The reason why it doesn't exist is most probably the fact that entities in the session are all identified by the id.
If you had it:
var entity1 = session.Load<Entity>(id);
// does not exist
var entity2 = session.LoadByNaturalKey(naturalId);
How could NH determine that the id and the natural id identify the same object without loading them from the database? The whole session cache would get into trouble.

NHibernate criteria query question

I have 3 related objects (Entry, GamePlay, Prize) and I'm trying to find the best way to query them for what I need using NHibernate. When a request comes in, I need to query the Entries table for a matching entry and, if found, get the latest game play along with the first game play that has a prize attached. Prize is a child of GamePlay, and each Entry object has a GamePlays property (IList).
Currently, I'm working on a method that pulls the matching Entry and eagerly loads all game plays and associated prizes, but it seems wasteful to load all game plays just to find the latest one and any that contain a prize.
Right now, my query looks like this:
var entry = session.CreateCriteria<Entry>()
    .Add(Restrictions.Eq("Phone", phone))
    .AddOrder(Order.Desc("Created"))
    .SetFetchMode("GamePlays", FetchMode.Join)
    .SetMaxResults(1)
    .UniqueResult<Entry>();
Two problems with this:
It loads all game plays up front. With 365 days of data, this could easily balloon to 300k of data per query.
It doesn't eagerly load the Prize child property for each game. Therefore, my code that loops through the GamePlays list looking for a non-null Prize must make a call to load each Prize property I check.
I'm not an NHibernate expert, but I know there has to be a better way to do this. Ideally, I'd like to do the following (pseudocode):
entry = findEntry(phoneNumber)
lastPlay = getLatestGamePlay(entry)
firstWinningPlay = getFirstWinningGamePlay(entry)
The end result of course is that I have the entry details, the latest game play, and the first winning game play. The catch is that I want to do this in as few database calls as possible, otherwise I'd just execute 3 separate queries.
The object definitions look like:
public class Entry
{
    public Guid Id { get; set; }
    public string Phone { get; set; }
    public IList<GamePlay> GamePlays { get; set; }
    // ... other properties
}

public class GamePlay
{
    public Guid Id { get; set; }
    public Entry Entry { get; set; }
    public Prize Prize { get; set; }
    // ... other properties
}

public class Prize
{
    public Guid Id { get; set; }
    // ... other properties
}
The proper NHibernate mappings are in place, so I just need help figuring out how to set up the criteria query (not looking for HQL, don't use it).
Since you are doing this on each request, it might be better to set up two formula properties in your entity.
The first one should fetch the latest GamePlay id, and the other the first GamePlay id with a non-null Prize.
In the XML mapping file of Entry this could look like:
<property name="LatestGameplay" formula="select top(1) gp.Id from Gameplay gp where gp.FK_EntryId = PK_EntryId order by gp.InsertDate desc" />
This leaves you with the GamePlay ids on the Entry entity, but after you fetch it you would still need another round trip to the DB to load those game plays by id.
Alternatively, you could work around this using filters.
Set the collection back to lazy and create these filters:
GamePlay latest = NHibernateSession.CreateFilter(entry.GamePlays, "order by InsertDate desc")
    .SetMaxResults(1).UniqueResult<GamePlay>();
GamePlay winner = NHibernateSession.CreateFilter(entry.GamePlays, "where FK_PrizeId is not null order by InsertDate asc")
    .SetMaxResults(1).UniqueResult<GamePlay>();
And these filters can be combined in a multi query, so you have 2 DB hits: one for the original Entry and one for the multi query.
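A rough sketch of that combination, reusing the filter strings from above (treat it as a starting point, not a drop-in implementation):
// Batch both collection filters into a single round trip (needs System.Collections and System.Linq).
var multi = session.CreateMultiQuery()
    .Add("latest", session.CreateFilter(entry.GamePlays, "order by InsertDate desc").SetMaxResults(1))
    .Add("winner", session.CreateFilter(entry.GamePlays, "where FK_PrizeId is not null order by InsertDate asc").SetMaxResults(1));

var latest = ((IList)multi.GetResult("latest")).Cast<GamePlay>().FirstOrDefault();
var winner = ((IList)multi.GetResult("winner")).Cast<GamePlay>().FirstOrDefault();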
Last but not least, you could define two bags in the Entry entity, one IList<GamePlay> Latest and one IList<GamePlay> Winner, which would be filtered in the Entry mapping file with the appropriate query (although I don't remember now whether you can define TOP clauses in those filters), and set them as non-lazy. Then with a single round trip you can have all the data you want, with the following (ugly) syntax:
Entry entry = findEntry(phoneNumber);
GamePlay winner = entry.Winner[0]; // check this for null first
GamePlay latest = entry.Latest[0]; // ditto
Note that, of all the solutions, the third is the one that also gives you a mechanism for further queries, as the bags can be used in a Criteria/HQL query.