RavenDB update denormalized reference and stale indexes - indexing

I have a RavenDB with some collections and about 30 indexes.
I'm trying to perform some mass updates in a specific collection (Profiles) via DatabaseCommands.UpdateByIndex and a PatchRequest, actually my code is something like this:
db.DatabaseCommands.UpdateByIndex("Profiles/ByFinder", new
Raven.Abstractions.Data.IndexQuery { }, new [] { new PatchRequest {
Type = PatchCommandType.Unset, Name = "CreatedById" } });
Where "Profiles/ByFinder" is an index that works on this specific collection.
The strange thing is that ALL the indexes in the DB go in stale state as I perform this command, even the indexes that don't work with the Profiles collection in any way.
Is that the default behaviour, and if so, there's a way to avoid it?

That is by design, whenever you modify a document, all documents are stale until they can verify that this document isn't related to them.

Related

KeystoneJS `filter` vs `Item` list access control

I am trying to understand more in depth the difference between filter and item access control.
Basically I understand that Item access control is, sort of, higher order check and will run before the GraphQL filter.
My question is, if I am doing a filter on a specific field while updating, for instance a groupID or something like this, do I need to do the same check in Item Access Control?
This will cause an extra database query that will be part of the filter.
Any thoughts on that?
The TL;DR answer...
if I am doing a filter on a specific field [..] do I need to do the same check in Item Access Control?
No, you only need to apply the restriction in one place or the other.
Generally speaking, if you can describe the restriction using filter access control (ie. as a graphQL-style filter, with the args provided) then that's the best place to do it. But, if your access control needs to behave differently based on values in the current item or the specific changes being made, item access control may be required.
Background
Access control in Keystone can be a little hard to get your head around but it's actually very powerful and the design has good reasons behind it. Let me attempt to clarify:
Filter access control is applied by adding conditions to the queries run against the database.
Imagine a content system with lists for users and posts. Users can author a post but some posts are also editable by everyone. The Post list config might have something like this:
// ..
access: {
filter: {
update: () => ({ isEditable: { equals: true } }),
}
},
// ..
What that's effectively doing is adding a condition to all update queries run for this list. So if you update a post like this:
mutation {
updatePost(where: { id: "123"}, data: { title: "Best Pizza" }) {
id name
}
}
The SQL that runs might look like this:
update "Post"
set title = 'Best Pizza'
where id = 234 and "isEditable" = true;
Note the isEditable condition that's automatically added by the update filter. This is pretty powerful in some ways but also has its limits – filter access control functions can only return GraphQL-style filters which prevents them from operating on things like virtual fields, which can't be filtered on (as they don't exist in the database). They also can't apply different filters depending on the item's current values or the specific updates being performed.
Filter access control functions can access the current session, so can do things like this:
filter: {
// If the current user is an admin don't apply the usual filter for editability
update: (session) => {
return session.isAdmin ? {} : { isEditable: { equals: true } };
},
}
But you couldn't do something like this, referencing the current item data:
filter: {
// ⚠️ this is broken; filter access control functions don't receive the current item ⚠️
// The current user can update any post they authored, regardless of the isEditable flag
update: (session, item) => {
return item.author === session.itemId ? {} : { isEditable: { equals: true } };
},
}
The benefit of filter access control is it doesn't force Keystone to read an item before an operation occurs; the filter is effectively added to the operation itself. This can makes them more efficient for the DB but does limit them somewhat. Note that things like hooks may also cause an item to be read before an operation is performed so this performance difference isn't always evident.
Item access control is applied in the application layer, by evaluating the JS function supplied against the existing item and/or the new data supplied.
This makes them a lot more powerful in some respects. You can, for example, implement the previous use case, where authors are allowed to update their own posts, like this:
item: {
// The current user can update any post they authored, regardless of the isEditable flag
update: (session, item) => {
return item.author === session.itemId || item.isEditable;
},
}
Or add further restrictions based on the specific updates being made, by referencing the inputData argument.
So item access control is arguably more powerful but they can have significant performance implications – not so much for mutations which are likely to be performed in small quantities, but definitely for read operations. In fact, Keystone won't let you define item access control for read operations. If you stop and think about this, you might see why – doing so would require reading all items in the list out of the DB and running the access control function against each one, every time a list was read. As such, the items accessible can only be restricted using filter access control.
Tip: If you think you need item access control for reads, consider putting the relevant business logic in a resolveInput hook that flattens stores the relevant values as fields, then referencing those fields using filter access control.
Hope that helps

How to write a RavenDB index as a prefiltered list of documents, not searchable

I want to get a list of users filtered by a property (string) being null or empty.
I've created an index for this but I'm not sure if my way of implementing it is the right way to do it.
public class Users_Contributors : AbstractIndexCreationTask<User>
{
public Users_Contributors()
{
Map = users => from user in users
where user.ContributionDescription != null && user.ContributionDescription != ""
select new {};
}
}
So I just want raven to "prepare" the list of users for me. I'm just gonna get all user objects out of that index with no additional filtering/where criterias at query time.
Is the above code the way to go or can I achieve the same in a better way? Im feeling Im missing something here. Thanks!
This will work just fine. The result would be that the index only contains users that have something in their ContributionDescription field.
If you want to make it slightly easier to read, you can use string.IsNullOrEmpty, but that won't have any impact on performance.
Map = users => from user in users
where !string.IsNullOrEmpty(user.ContributionDescription)
select new {};
It probably feels strange because of the empty object at the end, but that just defines the index entries. If you aren't sorting or filtering by any other field, then using an empty object is just fine. Keep in mind that the __document_id entry gets created regardless of what fields you map.

RavenDB Index created incorrectly

I have a document in RavenDB that looks looks like:
{
"ItemId": 1,
"Title": "Villa
}
With the following metadata:
Raven-Clr-Type: MyNamespace.Item, MyNamespace
Raven-Entity-Name: Doelkaarten
So I serialized with a type MyNamespace.Item, but gave it my own Raven-Entity-Name, so it get its own collection.
In my code I define an index:
public class DoelkaartenIndex : AbstractIndexCreationTask<Item>
{
public DoelkaartenIndex()
{
// MetadataFor(doc)["Raven-Entity-Name"].ToString() == "Doelkaarten"
Map = items => from item in items
where MetadataFor(item)["Raven-Entity-Name"].ToString() == "Doelkaarten"
select new {Id = item.ItemId, Name = item.Title};
}
}
In the Index it is translated in the "Maps" field to:
docs.Items
.Where(item => item["#metadata"]["Raven-Entity-Name"].ToString() == "Doelkaarten")
.Select(item => new {Id = item.ItemId, Name = item.Title})
A query on the index never gives results.
If the Maps field is manually changed to the code below it works...
from doc in docs
where doc["#metadata"]["Raven-Entity-Name"] == "Doelkaarten"
select new { Id = doc.ItemId, Name=doc.Title };
How is it possible to define in code the index that gives the required result?
RavenDB used: RavenHQ, Build #961
UPDATE:
What I'm doing is the following: I want to use SharePoint as a CMS, and use RavenDB as a ready-only replication of the SharePoint list data. I created a tool to sync from SharePoint lists to RavenDB. I have a generic type Item that I create from a SharePoint list item and that I serialize into RavenDB. So all my docs are of type Item. But they come from different lists with different properties, so I want to be able to differentiate. You propose to differentiate on an additional property, this would perfectly work. But then I will see all list items from all lists in one big Items collection... What would you think to be the best approach to this problem? Or just live with it? I want to use the indexes to create projections from all data in an Item to the actual data that I need.
You can't easily change the name of a collection this way. The server-side will use the Raven-Entity-Name metadata, but the client side will determine the collection name via the conventions registered with the document store. The default convention being to use the type name of the entity.
You can provide your own custom convention by assigning a new function to DocumentStore.Conventions.FindTypeTagName - but it would probably be cumbersome to do that for every entity. You could create a custom attribute to apply to your entities and then write the function to look for and understand that attribute.
Really the simplest way is just to call your entity Doelkaarten instead of Item.
Regarding why the change in indexing works - it's not because of the switch in linq syntax. It's because you said from doc in docs instead of from doc in docs.Items. You probably could have done from doc in docs.Doelkaartens instead of using the where clause. They are equivalent. See this page in the docs for further examples.

Check if property exists in RavenDB

I want to add property to existing document (using clues form http://ravendb.net/docs/client-api/partial-document-updates). But before adding want to check if that property already exists in my database.
Is any "special,proper ravendB way" to achieve that?
Or just load document and check if this property is null or not?
You can do this using a set based database update. You carry it out using JavaScript, which fortunately is similar enough to C# to make it a pretty painless process for anybody. Here's an example of an update I just ran.
Note: You have to be very careful doing this because errors in your script may have undesired results. For example, in my code CustomId contains something like '1234-1'. In my first iteration of writing the script, I had:
product.Order = parseInt(product.CustomId.split('-'));
Notice I forgot the indexer after split. The result? An error, right? Nope. Order had the value of 12341! It is supposed to be 1. So be careful and be sure to test it thoroughly.
Example:
Job has a Products property (a collection) and I'm adding the new Order property to existing Products.
ravenSession.Advanced.DocumentStore.DatabaseCommands.UpdateByIndex(
"Raven/DocumentsByEntityName",
new IndexQuery { Query = "Tag:Jobs" },
new ScriptedPatchRequest { Script =
#"
this.Products.Map(function(product) {
if(product.Order == undefined)
{
product.Order = parseInt(product.CustomId.split('-')[1]);
}
return product;
});"
}
);
I referenced these pages to build it:
set based ops
partial document updates (in particular the Map section)

Persist a top-level collection?

NHibernate allows me to query a database and get an IList of objects in return. Suppose I get a list of a couple of dozen objects and modify a half-dozen or so. Does NHibernate have a way to persist changes to the collection, or do I have to persist each object as I change it?
Here's an example. Suppose I run the following code:
var hql = "from Project";
var query = session.CreateQuery(hql);
var myProjectList = query.List<Project>();
I will get back an IList that contains all projects. Now suppose I execute the following code:
var myNewProject = new Project("My New Project");
myProjectList .Add(myNewProject);
And let's say I do this several times, adding several new projects to the list. Now I'm ready to persist the changes to the collection.
I'd like to persist the changes by simply passing myProjectList to the current ISession for updating. But ISession.SaveOrUpdate() appears to take only individual objects, not collections like myProjectList. Is there a way that I can persist changes to myProjectList, or do I have to persist each new object as I create it? Thanks for your help.
David Veeneman
Foresight Systems
If you load objects like in your example - then yes you have to persist them one by one.
However, if you make a small design change, and load something like : Account that has an IList<Project> - if you specify cascade "what_cascade_you_need" in the mapping , then when you change the projects on Account , you only have to save Account and everything will get saved.