Inheriting data from ancestor documents in RavenDB

I'm using RavenDB to store three types of curriculum: Units, Lessons, and Activities. These three types all inherit from CurriculumBase:
public abstract class CurriculumBase
{
    public string Id { get; set; }
    public string Title { get; set; }
    public List<string> SubjectAreaIds { get; set; }
    // non-relevant properties removed
}
These documents have a hierarchical relationship, so I've modeled the hierarchy as a separate single document, as recommended in the article "Modelling Hierarchical Data with RavenDB":
public class CurriculumHierarchy
{
    public class Node
    {
        public string CurriculumId { get; set; }
        public string Title { get; set; }
        public List<Node> Children { get; set; }

        public Node()
        {
            Children = new List<Node>();
        }
    }

    public List<Node> RootCurriculum { get; set; }

    public CurriculumHierarchy()
    {
        RootCurriculum = new List<Node>();
    }
}
I need to be able to do searches across all curriculum documents. For simple properties, that seems easy enough to do with a multi-map index.
However, one of the properties I need to search by (in combination with the other search criteria) is SubjectAreaId. I need to get every curriculum document for which the document itself or any of its ancestors has the specified subject area ID(s). In other words, for search purposes, documents should inherit the SubjectAreaIds of their ancestors.
I've considered denormalizing SubjectAreaIds and storing the full calculated set in each document, but that would require updates whenever the hierarchy itself changes or the SubjectAreaIds of any of a document's ancestors change. I'm hoping this is something I can accomplish with an index, though perhaps an entirely different approach is needed.

You can use LoadDocument to load the parents during indexing.
http://ravendb.net/docs/article-page/3.0/csharp/indexes/indexing-related-documents

The main challenge I encountered was that I had written code in CurriculumHierarchy to get a document's ancestors, but this code isn't executable during indexing.
To solve this, I added a read-only property to CurriculumHierarchy which generates a dictionary of ancestors for each document:
public Dictionary<string, IEnumerable<string>> AncestorLookup
{
    get
    {
        // Not shown: build a dictionary where the key is an
        // ID and the value is a list of the IDs for
        // that item's ancestors
    }
}
This dictionary is serialized by Raven and therefore available for indexing.
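The dictionary construction that is elided above could look roughly like the following. This is only a sketch under assumptions: CurriculumNode stands in for the nested CurriculumHierarchy.Node class, AncestorLookupBuilder and its methods are hypothetical names, and each curriculum ID is assumed to appear at most once in the tree.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for CurriculumHierarchy.Node (Title omitted).
public class CurriculumNode
{
    public string CurriculumId { get; set; }
    public List<CurriculumNode> Children { get; set; } = new List<CurriculumNode>();
}

// Hypothetical helper; not part of RavenDB.
public static class AncestorLookupBuilder
{
    // Maps each curriculum ID to the IDs of its ancestors,
    // ordered from the root down to the immediate parent.
    public static Dictionary<string, IEnumerable<string>> Build(IEnumerable<CurriculumNode> roots)
    {
        var lookup = new Dictionary<string, IEnumerable<string>>();
        foreach (var root in roots)
        {
            Visit(root, new List<string>(), lookup);
        }
        return lookup;
    }

    private static void Visit(
        CurriculumNode node,
        List<string> ancestors,
        Dictionary<string, IEnumerable<string>> lookup)
    {
        // Store a copy so later changes to the working path don't leak in.
        lookup[node.CurriculumId] = ancestors.ToList();

        // Descend with this node appended to the path, then restore the path.
        ancestors.Add(node.CurriculumId);
        foreach (var child in node.Children)
        {
            Visit(child, ancestors, lookup);
        }
        ancestors.RemoveAt(ancestors.Count - 1);
    }
}
```

A root document gets an empty ancestor list, so every ID in the hierarchy has an entry and the indexer lookup by ID does not throw for documents that are present in the tree.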
Then my index ended up looking like this:
public class Curriculum_Search : AbstractMultiMapIndexCreationTask
{
    public Curriculum_Search()
    {
        AddMap<Activity>(
            activities =>
                from activity in activities
                let hierarchy =
                    LoadDocument<CurriculumHierarchy>("curriculum_hierarchy")
                let ancestors =
                    LoadDocument<CurriculumBase>(hierarchy.AncestorLookup[activity.Id])
                select new
                {
                    subjectAreaIds = ancestors.SelectMany(x => x.SubjectAreaIds).Distinct().Union(activity.SubjectAreaIds),
                });
        // Not shown: Similar AddMap statements for Lessons and Units
    }
}
I was a bit concerned about performance, but since there are fewer than 2000 curriculum documents in total, this seems to perform acceptably.

Related

Returning a document and its path in a hierarchy using RavenDB

I am using RavenDB for the first time as the database for a website. I am just starting out and thinking about how to represent the website's page hierarchy in the database. I read the article "Modelling hierarchical data with RavenDB", which shows a really neat way of storing a hierarchy in a document database, so I am running with this design.
So I have my Page document
public class Page
{
public string Id { get; set; }
public string Slug { get; set; }
}
and my PagesHierarchy document.
public class PagesHierarchyTree
{
public class Node
{
public string PageId { get; set; }
public List<Node> Children { get; set; }
}
public List<Node> RootPages { get; set; }
}
The idea is to have the PagesHierarchyTree document represent the tree and hold the IDs of the actual page documents.
So, now to my question. I want to create an index where I can find a document (page) based on its slug but also return the slug path, i.e. a/b/c, based on where the document lives in the tree.
I read about Indexing Hierarchical Data and Indexing Related Documents, but I'm struggling to bring them together.
Can someone help me with this or point me in the right direction?
I got my answer from the RavenDB Google Groups forum, found here.
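Independently of that forum answer, the slug path itself can be computed by walking the tree. The following is only a sketch of that traversal in plain C#, not a RavenDB index: PageNode stands in for PagesHierarchyTree.Node, SlugPaths.Find is a hypothetical helper, and slugsById is assumed to be a lookup from page ID to slug built from the already loaded Page documents.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for PagesHierarchyTree.Node.
public class PageNode
{
    public string PageId { get; set; }
    public List<PageNode> Children { get; set; } = new List<PageNode>();
}

// Hypothetical helper; not part of RavenDB.
public static class SlugPaths
{
    // Depth-first search for pageId. Returns the slash-joined slug path
    // from the root to that page, or null if the page is not in the tree.
    public static string Find(
        IEnumerable<PageNode> nodes,
        IDictionary<string, string> slugsById,
        string pageId,
        string prefix = "")
    {
        foreach (var node in nodes)
        {
            var slug = slugsById[node.PageId];
            var path = prefix.Length == 0 ? slug : prefix + "/" + slug;

            if (node.PageId == pageId)
            {
                return path;
            }

            // Not this node: keep looking among its descendants.
            var found = Find(node.Children, slugsById, pageId, path);
            if (found != null)
            {
                return found;
            }
        }
        return null;
    }
}
```

The same walk could in principle be used to precompute a path per page ID and store it on the hierarchy document, so that an index can read it via LoadDocument instead of running the traversal at indexing time.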

Why do I need to name the properties in my index with underscore?

Given that I have the following structure (unnecessary details stripped out)
public class Product {
public Guid Id { get; set; }
public string Name { get; set; }
public Manufacturer Manufacturer { get; set; }
}
public class Manufacturer {
public Guid Id { get; set; }
public string Name { get; set; }
}
If I have a lot of these kinds of products stored in Raven and I want to index them by manufacturer ID (or maybe by some other things as well), I'd make an index such as this (in real life this index would contain other information too):
public class ProductManufacturerIndex : AbstractIndexCreationTask<Product> {
    public ProductManufacturerIndex() {
        Map = products => from product in products
                          select new {
                              Manufacturer_Id = product.Manufacturer.Id,
                          };
    }
}
My question here is: why do I need to name my field Manufacturer_Id? If I do not name it Manufacturer_Id, I get exceptions when querying my index because the manufacturer ID field is not indexed.
Basically, why can't I do this (which would be my first guess)?
public class ProductManufacturerIndex : AbstractIndexCreationTask<Product> {
    public ProductManufacturerIndex() {
        Map = products => from product in products
                          select new {
                              product.Manufacturer.Id,
                          };
    }
}
There is a naming convention that RavenDB uses. If you aren't naming your fields properly, it doesn't know how to map things.
In this case, the second index you use has a property of Id, but RavenDB has no way of knowing that you mapped the Manufacturer's id, and not the root id.
That is why we have this convention. You can change it if you really want to, but it is generally not recommended.

Easier way to store list of id's in a parent document using RavenDB

If I have a parent document that can have multiple children, such as a Store that can have multiple Products, is there an easier way to store a list of Product.Id values in the Store document?
Currently I am just storing the Product objects first and then looping through them to get the Product.Id for the Store.ProductIds property.
Store
class Store
{
public string Id { get; set; }
public string Name { get; set; }
public IList<string> ProductIds { get; set; }
[JsonIgnore]
public IList<Product> Products { get; set; }
}
Product
class Product
{
public string Id { get; set; }
public string Name { get; set; }
}
Current Working Save Method
var store = new Store()
{
    Name = "Walmart",
    Products = new List<Product>
    {
        new Product {Name = "Ball"},
        new Product {Name = "Bat"}
    }
};
using (var session = DocumentStore.OpenSession())
{
    foreach (var product in store.Products)
    {
        session.Store(product);
    }
    session.SaveChanges();
    var list = new List<string>();
    foreach (var product in store.Products)
    {
        list.Add(product.Id);
    }
    store.ProductIds = list;
    session.Store(store);
    session.SaveChanges();
}
To answer your specific question - there are two things you can simplify with the code:
You can eliminate the first session.SaveChanges(); The product ids will be created when you call .Store() on the product.
You can gather the product ids with some linq to collapse it to one line:
store.ProductIds = store.Products.Select(x=> x.Id).ToList();
You still have the same general approach though - it's just simplifying the code a bit.
This approach will work, but do realize that you are just putting the Products in the Store for convenience. [JsonIgnore] is ok here, but it only helps with serialization - not deserialization. In other words, when loading a store, only the ProductIds list will be populated. You would have to load them separately (possibly using .Include())
Personally, I would take the Products list out of the Store object altogether. Object-relational mappers like Entity Framework and NHibernate use virtual members and proxy classes for this purpose, but that has little meaning in RavenDB. A consumer of your class won't know that the property is ignored, so they might misunderstand its usage. When I see the Products property, my expectation is that each Product is fully embedded in the document, in which case you wouldn't need the separate ProductIds list. Having both, with one ignored, just causes confusion.
Another argument against your proposed design would be that it implicitly makes every product in every store unique. This is because you are creating the products with the store, and then adding each one separately. If it is indeed the case that all products are unique (not just "ball", but "this specific ball"), then you could just embed the product without the [JsonIgnore] or the ProductIds list, and there would be no need for Product to exist as a separate document. In the more likely scenario that products are not unique (multiple stores can sell bats and balls), then you have two options:
class Store
{
public string Id { get; set; }
public string Name { get; set; }
public IList<string> ProductIds { get; set; }
}
class Store
{
public string Id { get; set; }
public string Name { get; set; }
public IList<Product> Products { get; set; }
}
Product would still be in its own document with either case - the second scenario would be used as a denormalized reference so you can get at the product name without loading the product. This is acceptable, but if product names can change, then you have lots of patching to do.
Sorry if there's no clean "do it this way" answer. Raven has lots of options, and sometimes one way or another works better depending on all the different ways you might use it. Personally, I would just keep the list of ProductIds. You can always index the related document to pull in the product name for querying purposes.

RavenDB Persisting chain of relationships

I'm working on a collaborative document editing tool that's going to use RavenDB for persistence. In my domain I have a document class that looks like this.
public class Document
{
public string Id { get; private set; }
public string Name { get; set; }
public IRevision CurrentRevision {get; private set; }
public string Contents {get { return CurrentRevision.GenerateEditedContent(); }}
}
As you can see that document then has a CurrentRevision property that points to an IRevision object that looks like this.
public interface IRevision
{
IRevision PreviousRevisionAppliedTo { get; }
IRevision NextRevisionApplied { get; set; }
Guid Id { get; }
string GenerateEditedContent();
}
So the basic idea is that the document's contents are generated on the fly by checking out the current revision, which in turn checks its parent revision, and so on.
Out of the box, RavenDB doesn't seem to handle persisting this chain of object references the way I need it to. I've been trying to persist it by just calling Session.Store(document) and hoping that the list of associated revisions would get stored as well. I've looked into some pieces of the RavenDB framework, like custom serializers, but I can't figure out a clear path that would allow me to deserialize and reserialize the data as I would like. What's a good way to handle this situation?

RavenDB Modeling - a single document vs multiple documents?

Given a simple example as follows, I'd like some guidance on whether to store as a single document vs multiple documents.
class User
{
public string Id;
public string UserName;
public List<Post> Posts;
}
class Post
{
public string Id;
public string Content;
}
Once the data is stored, there are times when I will want all the posts for a given user. Sometimes I might want posts across multiple users that meet particular criteria.
Should I store each User as a document (with Posts embedded), or does it make more sense to store Users and Posts as separate documents, and have some sort of ID in my Post to link it back to a User?
Now, what if each user belonged to an Organization (there will be hundreds of organizations in my application)?
class Organization
{
public string Id;
public List<User> users;
}
Should I then stay with the single-document approach? In that case I would store one giant document for each organization, containing embedded users, which in turn contain embedded posts.
You should keep them as separate documents. User, Organization, and Post are great examples of aggregate entities, and in Raven, each aggregate is usually its own document.
Only entities which are not aggregates should be nested in the same document. For example, in Post you might have a List<Comment>. Comment and Post are both entities, but only Post is an aggregate.
You should instead model them with references:
public class User
{
public string Id { get; set; }
public string Name { get; set; }
public List<string> PostIds { get; set; }
}
public class Post
{
public string Id { get; set; }
public string Content { get; set; }
}
public class Organization
{
public string Id { get; set; }
public List<string> UserIds { get; set; }
}
Optionally, you can denormalize some of the data into your references where appropriate:
public class UserRef
{
public string Id { get; set; }
public string Name { get; set; }
}
public class Organization
{
public string Id { get; set; }
public List<UserRef> Users { get; set; }
}
Denormalizing the user's name into the organization document has the benefit of not needing to fetch each user document when displaying the organization. However, it has the drawback of having to update the organization document any time a user's name is changed. You should weigh the pros and cons of this each time you consider a relationship. There is no one right answer for all cases.
Also, you should consider how the data will really be used. In practice, you will probably find that your Organization class does not need a user list at all. Instead, you could put a string OrganizationId property on the User class. That would be easier to maintain, and if you wanted a list of users in an organization, you could query for that information using an index.
You should read more in the RavenDB documentation on Document Structure Design and Handling Document Relationships.
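As a rough illustration of that last suggestion, here is a sketch with the reference moved onto User. It filters an in-memory collection so that it runs on its own; UserQueries.InOrganization is a hypothetical helper. With RavenDB, the same predicate would normally be applied to session.Query<User>() so that the server answers it from an index rather than in memory.

```csharp
using System.Collections.Generic;
using System.Linq;

public class User
{
    public string Id { get; set; }
    public string Name { get; set; }
    // Reference back to the owning organization, replacing Organization.UserIds.
    public string OrganizationId { get; set; }
}

// Hypothetical helper; an in-memory stand-in for an indexed query.
public static class UserQueries
{
    // Returns the users whose OrganizationId matches the given ID.
    public static List<User> InOrganization(IEnumerable<User> users, string organizationId)
    {
        return users.Where(u => u.OrganizationId == organizationId).ToList();
    }
}
```

With this shape, moving a user to another organization touches only that user's document, and no organization document has to be rewritten.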