Adding key value to RavenDB document based on another document's values (For all documents in large collection) - ravendb

I am new to RavenDB and could really use some help.
I have a collection of ~20M documents, and I need to add a key to each document. The challenge is that the value of the key needs to be derived from another document.
For instance, given the following document:
{
    "Name" : "001A",
    "Date" : "09-09-2013T00:00:00.0000000",
    "Related" : [
        "002B",
        "003B"
    ]
}
The goal is to add a key that holds the dates for the related documents, i.e. 002B and 003B, by looking up the related documents in the collection and returning their date. E.g.:
{
    "Name" : "001A",
    "Date" : "09-09-2013T00:00:00.0000000",
    "Related" : [
        "002B",
        "003B"
    ],
    "RelatedDates" : [
        "08-10-2013T00:00:00.0000000",
        "08-15-2013T00:00:00.0000000"
    ]
}
I realize that I'm trying to treat the collection somewhat like a relational database, but this is the form that my data is in to begin with. I would prefer not to put everything into a relational dataset first in order to structure the data for RavenDB.
I first tried doing this on the client side, by paging through the collection and updating the records. However, I quickly hit the maximum number of requests per session.
I then tried patching on the server side with JavaScript, but I'm not sure if this is possible.
At this point I would greatly appreciate some strategic guidance on the right way to approach this problem, as well as more tactical guidance on how to implement it.

The recommended way of doing this is with a console application that loops through all your records, similar to what you have already done, but paging the data so you don't hit the maximum number of requests per session.
See this example from the RavenDB source code example application:
You need to do something like this:
using (var store = new DocumentStore { ConnectionStringName = "RavenDB" }.Initialize())
{
    int start = 0;
    while (true)
    {
        using (var session = store.OpenSession())
        {
            var posts = session.Query<Post>()
                .OrderBy(x => x.CreatedAt)
                .Include(x => x.CommentsId)
                .Skip(start)
                .Take(128)
                .ToList();

            if (posts.Count == 0)
                break;

            foreach (var post in posts)
            {
                session.Load<PostComments>(post.CommentsId).Post = new PostComments.PostReference
                {
                    Id = post.Id,
                    PublishAt = post.PublishAt
                };
            }

            session.SaveChanges();
            start += posts.Count;
            Console.WriteLine("Migrated {0}", start);
        }
    }
}
I've done this sort of thing with about ~1.5M records, and the migration wasn't exactly quick. If your records are small you can just Load<> and SaveChanges on each one; in my experience, programmatically patching the documents did not speed things up materially.
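Adapted to your documents, a minimal sketch could look like the following. I'm assuming a hypothetical Item class with Name, Date, Related and RelatedDates properties, and that Related holds the document ids of the related documents; adjust the types and id handling to match your actual model:
using (var store = new DocumentStore { ConnectionStringName = "RavenDB" }.Initialize())
{
    int start = 0;
    while (true)
    {
        using (var session = store.OpenSession())
        {
            var items = session.Query<Item>()
                .OrderBy(x => x.Name)        // stable ordering for paging
                .Include(x => x.Related)     // pull the related documents in the same request
                .Skip(start)
                .Take(128)
                .ToList();

            if (items.Count == 0)
                break;

            foreach (var item in items)
            {
                // These Loads are served from the session cache because of the Include above
                item.RelatedDates = item.Related
                    .Select(id => session.Load<Item>(id))
                    .Where(related => related != null)
                    .Select(related => related.Date)
                    .ToList();
            }

            session.SaveChanges();
            start += items.Count;
            Console.WriteLine("Updated {0}", start);
        }
    }
}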
As a side note, the RavenDB Google group is very active if you want to ask specifically about doing this from the Studio.

Related

Strategy for generating unique identifier for document in RavenDB

Say I have something like a support ticket system (simplified as an example). It has many users and organizations. Each user can be a member of several organizations, but the typical case would be one org => many users, and most of them belong only to this organization. Each organization has a "tag" which is to be used to construct "ticket numbers" for this organization. Let's say we have an org called StackExchange that wants the tag SES.
So if I open the first ticket of today, I want it to be SES140407-01. The next is SES140407-02 and so on. Doesn't have to be two digits after the dash.
How can I make sure this is generated in a way that guarantees it is 100% unique across the organization (no orgs will have the same tag)?
Note: This does not have to be the document ID in the database - that will probably just be a Guid or similar. This is just a ticket reference - kinda like a slug - that will appear in related emails etc. So it has to be unique, and I would prefer if we didn't "waste" the sequential case numbers hilo style.
Is there a practical way to ensure I get a unique ticket number even if two or more people report a new one at almost the same time?
EDIT: Each Organization is a document in RavenDB, and can easily hold a property like LastIssuedTicketId. My challenge is basically to find the best way to read this field, generate a new one, and store this back in a way that is "race condition safe".
Another edit: To be clear - I intend to generate the ticket ID in my own software. What I am looking for is a way to ask RavenDB "what was the last ticket number", and then when I generate the next one after that, "am I the only one using this?" - so that I give my ticket a unique case id, not necessarily related to what RavenDB considers the document id.
For that I use a generic sequence generator written for RavenDB:
public class SequenceGenerator
{
    private static readonly object Lock = new object();
    private readonly IDocumentStore _docStore;

    public SequenceGenerator(IDocumentStore docStore)
    {
        _docStore = docStore;
    }

    public int GetNextSequenceNumber(string sequenceKey)
    {
        lock (Lock)
        {
            using (new TransactionScope(TransactionScopeOption.Suppress))
            {
                while (true)
                {
                    try
                    {
                        var document = GetDocument(sequenceKey);
                        if (document == null)
                        {
                            PutDocument(new JsonDocument
                            {
                                // sending an empty etag means: ensure that the document does NOT exist yet
                                Etag = Etag.Empty,
                                Metadata = new RavenJObject(),
                                DataAsJson = RavenJObject.FromObject(new { Current = 0 }),
                                Key = sequenceKey
                            });
                            return 0;
                        }

                        var current = document.DataAsJson.Value<int>("Current");
                        current++;
                        document.DataAsJson["Current"] = current;
                        PutDocument(document);
                        return current;
                    }
                    catch (ConcurrencyException)
                    {
                        // expected, we need to retry
                    }
                }
            }
        }
    }

    private void PutDocument(JsonDocument document)
    {
        _docStore.DatabaseCommands.Put(
            document.Key,
            document.Etag,
            document.DataAsJson,
            document.Metadata);
    }

    private JsonDocument GetDocument(string key)
    {
        return _docStore.DatabaseCommands.Get(key);
    }
}
It generates an incremental, unique sequence based on sequenceKey. Uniqueness is guaranteed by RavenDB's optimistic concurrency based on the Etag. Each sequence has its own document, which we update when generating a new sequence number. There is also a lock, which reduces extra DB calls if several threads are executing at the same moment in the same process (AppDomain).
For your case you can use it this way:
var sequenceKey = string.Format("{0}{1:yyMMdd}", yourCompanyPrefix, DateTime.Now);
var nextSequenceNumber = new SequenceGenerator(yourDocStore).GetNextSequenceNumber(sequenceKey);
var nextSequenceKey = string.Format("{0}-{1:00}", sequenceKey, nextSequenceNumber);

Updating complex type with ef code first

I have a complex type called account, which contains a list of licenses.
Licenses in turn contain a list of domains (a domain is a simple id + url string).
In my repository I have this code:
public void SaveLicense(int accountId, License item)
{
    Account account = GetById(accountId);
    if (account == null)
    {
        return;
    }

    if (item.Id == 0)
    {
        account.Licenses.Add(item);
    }
    else
    {
        ActiveContext.Entry(item).State = EntityState.Modified;
    }

    ActiveContext.SaveChanges();
}
When I try to save an updated license (with modified domains), the properties belonging directly to the license get updated just fine.
However, no domains get updated.
I should mention that what I have done is allow the user to add and remove domains in the user interface. Any new domains get id=0 and any deleted domains are simply not in the list.
So what I want is:
Any domains that are in the list and the database and NOT changed - nothing happens
Any domains that are in the list and the database, but changed in the list - the database gets updated
Any domains with id=0 should be inserted (added) into the database
Any domains NOT in the list but that are in the database should be removed
I have played with it a bit without success, but I have a sneaking suspicion that I am doing something wrong in the bigger picture, so I would love tips on whether I am misunderstanding something design-wise or have simply missed something.
Unfortunately, updating object graphs - entities together with their related entities - is a rather difficult task, and Entity Framework offers little sophisticated support to make it easy.
The problem is that setting the state of an entity to Modified (or generally to any other state) only influences the entity that you pass into DbContext.Entry and only its scalar properties. It has no effect on its navigation properties and related entities.
You must handle this object graph update manually by loading the entity that is currently stored in the database including the related entities and by merging all changes you have done in the UI into that original graph. Your else case could then look like this:
//...
else
{
    var licenseInDb = ActiveContext.Licenses.Include(l => l.Domains)
        .SingleOrDefault(l => l.Id == item.Id);
    if (licenseInDb != null)
    {
        // Update the license (only its scalar properties)
        ActiveContext.Entry(licenseInDb).CurrentValues.SetValues(item);

        // Delete domains from DB that have been deleted in UI
        foreach (var domainInDb in licenseInDb.Domains.ToList())
            if (!item.Domains.Any(d => d.Id == domainInDb.Id))
                ActiveContext.Domains.Remove(domainInDb);

        foreach (var domain in item.Domains)
        {
            var domainInDb = licenseInDb.Domains
                .SingleOrDefault(d => d.Id == domain.Id);
            if (domainInDb != null)
                // Update existing domains
                ActiveContext.Entry(domainInDb).CurrentValues.SetValues(domain);
            else
                // Insert new domains
                licenseInDb.Domains.Add(domain);
        }
    }
}
ActiveContext.SaveChanges();
//...
You can also try out this project called "GraphDiff" which intends to do this work in a generic way for arbitrary detached object graphs.
The alternative is to track all changes in some custom fields in the UI layer and then evaluate the tracked state changes when the data gets posted back, to set the appropriate entity states. Because you are in a web application, it basically means that you have to track changes in the browser (most likely requiring some JavaScript) while the user changes values, adds new items or deletes items. In my opinion this solution is even more difficult to implement.
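For illustration only, the server-side half of that approach might look roughly like this. Everything here is an assumption on my part (the DomainDto, the TrackedState enum, a Domain entity with a LicenseId foreign key, and the MyContext type); the client-side JavaScript that fills in the state flags is not shown:
// Hypothetical DTO posted back by the UI; State is filled in by client-side tracking code
public enum TrackedState { Unchanged, Added, Modified, Deleted }

public class DomainDto
{
    public int Id { get; set; }
    public string Url { get; set; }
    public TrackedState State { get; set; }
}

// Translate the tracked states into EF entity states without re-loading the graph
public void ApplyDomainChanges(MyContext context, int licenseId, IEnumerable<DomainDto> domains)
{
    foreach (var dto in domains)
    {
        switch (dto.State)
        {
            case TrackedState.Added:
                context.Domains.Add(new Domain { Url = dto.Url, LicenseId = licenseId });
                break;
            case TrackedState.Modified:
                // Attach a stub with the posted values and mark it Modified
                context.Entry(new Domain { Id = dto.Id, Url = dto.Url, LicenseId = licenseId })
                       .State = EntityState.Modified;
                break;
            case TrackedState.Deleted:
                // Attach a stub with only the key and mark it Deleted
                context.Entry(new Domain { Id = dto.Id }).State = EntityState.Deleted;
                break;
            // TrackedState.Unchanged: nothing to do
        }
    }
    context.SaveChanges();
}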
This should be enough to do what you are looking to do. Let me know if you have more questions about the code.
public void SaveLicense(License item)
{
    if (item.Id == 0)
    {
        ActiveContext.Licenses.Add(item);
    }
    else if (item.Id > 0)
    {
        var currentItem = ActiveContext.Licenses
            .Single(t => t.Id == item.Id);
        ActiveContext.Entry(currentItem).CurrentValues.SetValues(item);
    }
    ActiveContext.SaveChanges();
}

Updating objects in an array is often failing RavenDB

Iterating through a result set and updating it often fails to persist the update:
if (actionObject.ActionType == ActionType.TradeComplete)
{
    var results = _session.Query<Model.ActionObject>()
        .Where(x => x.ActionType == ActionType.TradeRequest && x.ActionObjectId == actionObject.ActionObjectId);

    foreach (var result in results)
    {
        result.State = State.Closed;
    }
}

_session.Store(actionObject);
_session.SaveChanges();
Often, the object does not get its State set to State.Closed.
I see the discussion around patch commands, but there's little to no documentation on how to do that with a query that has multiple parameters.
Any idea why it isn't persisting?
Edit: my object does not have an ID; could that be the problem? It seems Raven is supposed to track the objects, and often does...

Raven DB: How can I delete all documents of a given type

More specifically in Raven DB, I want to create a generic method with a signature like:
public void Clear<T>() {...
Then have Raven DB clear all documents of the given type.
I understand from other posts by Ayende to similar questions that you'd need an index in place to do this as a batch.
I think this would involve creating an index that maps each document type - this seems like a lot of work.
Does anyone know an efficient way of creating a method like the above that will do a set delete directly in the database?
I assume you want to do this from the .NET client. If so, use the standard DocumentsByEntityName index:
var indexQuery = new IndexQuery { Query = "Tag:" + collectionName };

session.Advanced.DocumentStore.DatabaseCommands.DeleteByIndex(
    "Raven/DocumentsByEntityName",
    indexQuery,
    new BulkOperationOptions { AllowStale = true });

var hilo = session.Advanced.DocumentStore.DatabaseCommands.Get("Raven/Hilo/" + collectionName);
if (hilo != null)
{
    session.Advanced.DocumentStore.DatabaseCommands.Delete(hilo.Key, hilo.Etag);
}
Where collectionName is the actual name of your collection.
The first operation deletes the items. The second deletes the HiLo file.
Also check out the official documentation - How to delete or update documents using index.
After much experimentation I found the answer to be quite simple, although far from obvious:
public void Clear<T>()
{
    // indexName is whatever name you register the index under
    session.Advanced.DocumentStore.DatabaseCommands.PutIndex(indexName, new IndexDefinitionBuilder<T>
    {
        Map = documents => documents.Select(entity => new {})
    });

    session.Advanced.DocumentStore.DatabaseCommands.DeleteByIndex(indexName, new IndexQuery());
}
Of course you almost certainly wouldn't define your index and do your delete in one go; I've put this in a single method for the sake of brevity.
My own implementation defines the indexes on application start, as recommended by the documentation.
If you wanted to use this approach to actually index a property of T then you would need to constrain T. For example, if I have an IEntity interface that all my document classes implement and it specifies an Id property, then a 'where T : IEntity' constraint would allow you to use that property in the index, as in the sketch below.
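For illustration, a minimal sketch of that constrained version might look like this (the IEntity interface and its Id property are assumptions based on the example above):
// Assumed marker interface that all document classes implement
public interface IEntity
{
    string Id { get; set; }
}

public void Clear<T>() where T : IEntity
{
    session.Advanced.DocumentStore.DatabaseCommands.PutIndex(indexName, new IndexDefinitionBuilder<T>
    {
        // The constraint makes entity.Id available to the index map
        Map = documents => documents.Select(entity => new { entity.Id })
    });

    session.Advanced.DocumentStore.DatabaseCommands.DeleteByIndex(indexName, new IndexQuery());
}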
It's been said in other places, but it's also worth noting that once you define a static index Raven will probably use it, which can cause your queries to seemingly not return data that you've inserted:
RavenDB Saving to disk query
I had this problem as well and this is the solution that worked for me. I'm only working in a test project, so this might be slow for a bigger db, but Ryan's answer didn't work for me.
public static void ClearDocuments<T>(this IDocumentSession session)
{
    var objects = session.Query<T>().ToList();
    while (objects.Any())
    {
        foreach (var obj in objects)
        {
            session.Delete(obj);
        }
        session.SaveChanges();
        objects = session.Query<T>().ToList();
    }
}
You can do that using:
http://blog.orangelightning.co.uk/?p=105

given a list of objects using C# push them to ravendb without knowing which ones already exist

Given 1000 documents with a complex data structure, e.g. a Car class that has three properties: Make, Model and an Id.
What is the most efficient way in C# to push these documents to RavenDB (preferably in a batch) without having to query the Raven collection individually to find which to update and which to insert? At the moment I am going about it like so, which is totally inefficient.
Note: _session is a wrapper on the IDocumentSession where Commit calls SaveChanges and Add calls Store.
private void PublishSalesToRaven(IEnumerable<Sale> sales)
{
    var page = 0;
    const int total = 30;
    do
    {
        var paged = sales.Skip(page * total).Take(total);
        if (!paged.Any()) return;

        foreach (var sale in paged)
        {
            var current = sale;
            var existing = _session.Query<Sale>().FirstOrDefault(s => s.Id == current.Id);
            if (existing != null)
                existing = current;
            else
                _session.Add(current);
        }

        _session.Commit();
        page++;
    } while (true);
}
Your session code doesn't seem to match the RavenDB API (we don't have Add or Commit).
Here is how you do this in RavenDB:
private void PublishSalesToRaven(IEnumerable<Sale> sales)
{
    foreach (var sale in sales)
        session.Store(sale);

    session.SaveChanges();
}
Your code sample doesn't work at all. The main problem is that you cannot just switch out the references and expect RavenDB to recognize that:
if (existing != null)
    existing = current;
Instead you have to update each property one-by-one:
existing.Model = current.Model;
existing.Make = current.Make;
This is the way you facilitate change tracking in RavenDB and many other frameworks (e.g. NHibernate). If you want to avoid writing this uninteresting piece of code, I recommend using AutoMapper:
existing = Mapper.Map<Sale>(current, existing);
Another problem with your code is that you use Session.Query where you should use Session.Load. Remember: If you query for a document by its id, you will always want to use Load!
The main difference is that one uses the local cache and the other does not (the same applies to the equivalent NHibernate methods).
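To make the difference concrete, a small sketch (assuming saleId holds the document id of a Sale):
// Query goes through an index on the server and may return stale results
var queried = session.Query<Sale>().FirstOrDefault(s => s.Id == saleId);

// Load resolves the document directly by id and checks the session's local cache first,
// so loading the same id twice in one session costs only one request
var loaded = session.Load<Sale>(saleId);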
Ok, so now I can answer your question:
If I understand you correctly, you want to save a bunch of Sale instances to your database, where each should either be added if it didn't exist or updated if it did. Right?
One way is to correct your sample code with the hints above and let it work. However, that will issue one unnecessary request (Session.Load(existingId)) for each iteration. You can easily avoid that if you set up an index that selects the Ids of all documents inside your Sales collection. Before you loop through your items, you can then load all the existing Ids, as in the sketch below.
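A sketch of what that could look like; the index class name is mine, and I'm assuming Sale.Id is the string document id:
// Hypothetical static index that projects only the Id of each Sale document
public class Sales_ByIdOnly : AbstractIndexCreationTask<Sale>
{
    public Sales_ByIdOnly()
    {
        Map = sales => from sale in sales
                       select new { sale.Id };
    }
}

// Before the update loop, fetch the ids that already exist
// (page through the results if you have more documents than the query page size)
var existingIds = session.Query<Sale, Sales_ByIdOnly>()
    .Select(s => s.Id)
    .ToList();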
However, I would like to know what you actually want to do. What is your domain/use-case?
This is what works for me right now. Note: the InjectFrom method comes from Omu.ValueInjecter (NuGet package):
private void PublishSalesToRaven(IEnumerable<Sale> sales)
{
    var salesList = sales.ToList();
    var ids = salesList.Select(i => i.Id).ToList();

    // Load returns null entries for ids that are not in the database yet
    var existingSales = _ravenSession.Load<Sale>(ids).Where(s => s != null).ToList();
    foreach (var existing in existingSales)
        existing.InjectFrom(salesList.Single(i => i.Id == existing.Id));

    var existingIds = existingSales.Select(i => i.Id).ToList();
    var nonExistingSales = salesList.Where(i => !existingIds.Contains(i.Id));
    foreach (var sale in nonExistingSales)
        _ravenSession.Store(sale);

    _ravenSession.SaveChanges();
}