Raven DB: How can I delete all documents of a given type

More specifically, in Raven DB I want to create a generic method with a signature like:
public void Clear<T>() {...
Then have Raven DB clear all documents of the given type.
I understand from other posts by Ayende to similar questions that you'd need an index in place to do this as a batch.
I think this would involve creating an index that maps each document type - this seems like a lot of work.
Does anyone know an efficient way of creating a method like the above that will do a set delete directly in the database?

I assume you want to do this from the .NET client. If so, use the standard DocumentsByEntityName index:
var indexQuery = new IndexQuery { Query = "Tag:" + collectionName };
session.Advanced.DocumentStore.DatabaseCommands.DeleteByIndex(
"Raven/DocumentsByEntityName",
indexQuery,
new BulkOperationOptions { AllowStale = true });
var hilo = session.Advanced.DocumentStore.DatabaseCommands.Get("Raven/Hilo/" + collectionName);
if (hilo != null) {
session.Advanced.DocumentStore.DatabaseCommands.Delete(hilo.Key, hilo.Etag);
}
Where collectionName is the actual name of your collection.
The first operation deletes the items. The second deletes the collection's HiLo document, which RavenDB uses to hand out IDs for that collection.
Also check out the official documentation - How to delete or update documents using index.

After much experimentation I found the answer to be quite simple, although far from obvious:
public void Clear<T>()
{
session.Advanced.DocumentStore.DatabaseCommands.PutIndex(indexName, new IndexDefinitionBuilder<T>
{
Map = documents => documents.Select(entity => new {})
});
session.Advanced.DatabaseCommands.DeleteByIndex(indexName, new IndexQuery());
}
Of course you almost certainly wouldn't define your index and do your delete in one go; I've put this in a single method for the sake of brevity.
My own implementation defines the indexes on application start as recommended by the documentation.
If you wanted to use this approach to actually index a property of T then you would need to constrain T. For example, if I have an IEntity type that all my document classes inherit from, and it specifies a property Id, then a 'where T : IEntity' constraint would allow you to use that property in the index.
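To make the constraint point concrete, here is a minimal sketch that needs no RavenDB at all. It assumes IEntity is an interface with a string Id; the Car class and the SelectIds helper are hypothetical names used only for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

// Assumption: IEntity is an interface exposing a string Id.
public interface IEntity
{
    string Id { get; }
}

// Hypothetical document class used only for this example.
public class Car : IEntity
{
    public string Id { get; set; }
    public string Make { get; set; }
}

public static class EntityQueries
{
    // Without "where T : IEntity", entity.Id would not compile for an arbitrary T.
    public static List<string> SelectIds<T>(IEnumerable<T> documents) where T : IEntity
    {
        return documents.Select(entity => entity.Id).ToList();
    }
}
```

The same constraint on Clear<T>() is what would let a Map expression refer to entity.Id instead of projecting an empty object.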
It's been said in other places, but it's also worth noting that once you define a static index Raven will probably use it. This can cause your queries to seemingly not return data that you've inserted:
RavenDB Saving to disk query

I had this problem as well and this is the solution that worked for me. I'm only working in a test project, so this might be slow for a bigger db, but Ryan's answer didn't work for me.
public static void ClearDocuments<T>(this IDocumentSession session)
{
var objects = session.Query<T>().ToList();
while (objects.Any())
{
foreach (var obj in objects)
{
session.Delete(obj);
}
session.SaveChanges();
objects = session.Query<T>().ToList();
}
}

You can do that using:
http://blog.orangelightning.co.uk/?p=105

Related

given a list of objects using C# push them to ravendb without knowing which ones already exist

Given 1000 documents with a complex data structure, e.g. a Car class that has three properties: Make, Model and Id.
What is the most efficient way in C# to push these documents to RavenDB (preferably in a batch) without having to query the collection individually to find which to update and which to insert? At the moment I am doing it like so, which is totally inefficient.
Note: _session is a wrapper around the IDocumentSession where Commit calls SaveChanges and Add calls Store.
private void PublishSalesToRaven(IEnumerable<Sale> sales)
{
var page = 0;
const int total = 30;
do
{
var paged = sales.Skip(page*total).Take(total);
if (!paged.Any()) return;
foreach (var sale in paged)
{
var current = sale;
var existing = _session.Query<Sale>().FirstOrDefault(s => s.Id == current.Id);
if (existing != null)
existing = current;
else
_session.Add(current);
}
_session.Commit();
page++;
} while (true);
}
Your session code doesn't seem to track with the RavenDB api (we don't have Add or Commit).
Here is how you do this in RavenDB
private void PublishSalesToRaven(IEnumerable<Sale> sales)
{
// Store each sale; the session sends them all as one batch on SaveChanges.
foreach (var sale in sales)
session.Store(sale);
session.SaveChanges();
}
Your code sample doesn't work at all. The main problem is that you cannot just switch out the references and expect RavenDB to recognize that:
if (existing != null)
existing = current;
Instead you have to update each property one-by-one:
existing.Model = current.Model;
existing.Make = current.Make;
This is the way you can facilitate change-tracking in RavenDB and many other frameworks (e.g. NHibernate). If you want to avoid writing this uninteresting piece of code I recommend using AutoMapper:
Mapper.Map(current, existing); // copies the values of current onto the tracked existing instance
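The difference between swapping a reference and copying values can be seen without any database. The following is only a sketch in plain C#: the Sale class and both helper methods are hypothetical, and the "tracked" object simply stands in for an instance that a session already holds.

```csharp
using System;

// Hypothetical document class for illustration.
public class Sale
{
    public string Id { get; set; }
    public string Make { get; set; }
    public string Model { get; set; }
}

public static class ChangeTrackingDemo
{
    // Rebinding the name only changes what this local variable points at;
    // the object the caller (or a session) holds is left untouched.
    public static void ReplaceReference(Sale existing, Sale current)
    {
        existing = current;
    }

    // Copying values mutates the instance that is already being tracked.
    public static void CopyProperties(Sale existing, Sale current)
    {
        existing.Make = current.Make;
        existing.Model = current.Model;
    }

    public static void Main()
    {
        var tracked = new Sale { Id = "sales/1", Make = "Ford", Model = "Focus" };
        var incoming = new Sale { Id = "sales/1", Make = "Ford", Model = "Fiesta" };

        ReplaceReference(tracked, incoming);
        Console.WriteLine(tracked.Model); // still the old value

        CopyProperties(tracked, incoming);
        Console.WriteLine(tracked.Model); // now the incoming value
    }
}
```

A change tracker can only notice the second kind of change, because it compares the state of the instance it handed out, not whatever your variable points to afterwards.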
Another problem with your code is that you use Session.Query where you should use Session.Load. Remember: if you fetch a document by its id, you will always want to use Load!
The main difference is that Load checks the session's local cache first and only goes to the server when the document is not there, while Query does not use that cache (the same applies to the equivalent NHibernate methods).
Ok, so now I can answer your question:
If I understand you correctly you want to save a bunch of Sale-instances to your database while they should either be added if they didn't exist or updated if they existed. Right?
One way is to correct your sample code with the hints above and let it work. However, that will issue one unnecessary request (Session.Load(existingId)) for each iteration. You can easily avoid that if you set up an index that selects the Ids of all documents inside your Sales collection. Before you loop through your items you can then load all the existing Ids.
However, I would like to know what you actually want to do. What is your domain/use-case?
This is what works for me right now. Note: The InjectFrom method comes from Omu.ValueInjecter (nuget package)
private void PublishSalesToRaven(IEnumerable<Sale> sales)
{
var ids = sales.Select(i => i.Id);
// The result of a multi-id Load can contain null entries for ids that are not stored yet, so filter them out.
var existingSales = _ravenSession.Load<Sale>(ids).Where(s => s != null).ToList();
existingSales.ForEach(s => s.InjectFrom(sales.Single(i => i.Id == s.Id)));
var existingIds = existingSales.Select(i => i.Id);
var nonExistingSales = sales.Where(i => !existingIds.Any(x => x == i.Id)).ToList();
nonExistingSales.ForEach(i => _ravenSession.Store(i));
_ravenSession.SaveChanges();
}
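For larger batches, the update-or-insert split above can be done with one dictionary lookup per item instead of nested Single/Any scans. This is a rough, database-free sketch: the dictionary stands in for the documents already loaded from the session, and Sale, UpsertPlanner and Merge are hypothetical names, not part of the RavenDB API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical document class for illustration.
public class Sale
{
    public string Id { get; set; }
    public string Make { get; set; }
    public string Model { get; set; }
}

public static class UpsertPlanner
{
    // Splits incoming items into updates (id already stored) and inserts (id unknown).
    // "stored" stands in for the documents that were loaded up front, keyed by id.
    public static (List<Sale> Updated, List<Sale> Inserted) Merge(
        IDictionary<string, Sale> stored, IEnumerable<Sale> incoming)
    {
        var updated = new List<Sale>();
        var inserted = new List<Sale>();
        foreach (var sale in incoming)
        {
            if (stored.TryGetValue(sale.Id, out var existing))
            {
                // Copy values onto the already-loaded instance so changes are tracked.
                existing.Make = sale.Make;
                existing.Model = sale.Model;
                updated.Add(existing);
            }
            else
            {
                stored[sale.Id] = sale;
                inserted.Add(sale);
            }
        }
        return (updated, inserted);
    }
}
```

In the real method, the inserted items would be passed to Store and a single SaveChanges call would then persist both groups.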

How to work around NHibernate caching?

I'm new to NHibernate and was assigned to a task where I have to change a value of an entity property and then compare if this new value (cached) is different from the actual value stored on the DB. However, every attempt to retrieve this value from the DB resulted in the cached value. As I said, I'm new to NHibernate, maybe this is something easy to do and obviously could be done with plain ADO.NET, but the client demands that we use NHibernate for every access to the DB. In order to make things clearer, those were my "successful" attempts (ie, no errors):
1
DetachedCriteria criteria = DetachedCriteria.For<User>()
.SetProjection(Projections.Distinct(Projections.Property(UserField.JobLoad)))
.Add(Expression.Eq(UserField.Id, userid));
return GetByDetachedCriteria(criteria)[0].Id; //this is the value I want
2
var JobLoadId = DetachedCriteria.For<User>()
.SetProjection(Projections.Distinct(Projections.Property(UserField.JobLoad)))
.Add(Expression.Eq(UserField.Id, userid));
ICriteria criteria = JobLoadId.GetExecutableCriteria(NHibernateSession);
var ids = criteria.List();
return ((JobLoad)ids[0]).Id;
I hope I made myself clear; sometimes it is hard to explain a problem when you don't quite understand the underlying framework yourself.
Edit: Of course, this is a method body.
Edit 2: I found out that it doesn't work properly because the method call is inside a transaction context. If I remove the transaction, it works fine, but I need it to be in this context.
I do that by opening a new stateless session to get the actual object from the database:
User databaseuser;
using (IStatelessSession session = SessionFactory.OpenStatelessSession())
{
databaseuser = session.Get<User>(id); // id: primary key of the user to re-read
}
//do your checks
Within a session, NHibernate will return the same object from its Level-1 Cache (aka Identity Map). If you need to see the current value in the database, you can open a new session and load the object in that session.
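The behaviour described above can be imitated with a toy identity map. This is not NHibernate code: FakeDatabase and FakeSession are hypothetical stand-ins written only to show why a second session sees the stored value while the first one keeps returning the edited instance.

```csharp
using System;
using System.Collections.Generic;

public class User
{
    public int Id { get; set; }
    public string Name { get; set; }
}

// Hypothetical stand-in for a database table: every read builds a fresh object.
public class FakeDatabase
{
    private readonly Dictionary<int, string> rows = new Dictionary<int, string>();

    public void Write(int id, string name) { rows[id] = name; }

    public User Read(int id) { return new User { Id = id, Name = rows[id] }; }
}

// Hypothetical session with a first-level cache (identity map) keyed by id.
public class FakeSession
{
    private readonly FakeDatabase db;
    private readonly Dictionary<int, User> identityMap = new Dictionary<int, User>();

    public FakeSession(FakeDatabase db) { this.db = db; }

    public User Get(int id)
    {
        // Only hit the database when this session has not seen the id yet.
        if (!identityMap.TryGetValue(id, out var user))
        {
            user = db.Read(id);
            identityMap[id] = user;
        }
        return user;
    }
}

public static class IdentityMapDemo
{
    public static void Main()
    {
        var db = new FakeDatabase();
        db.Write(1, "stored");

        var session = new FakeSession(db);
        session.Get(1).Name = "edited in memory";

        // Same session: the cached, edited instance comes back.
        Console.WriteLine(session.Get(1).Name);
        // New session: the value is read from the database again.
        Console.WriteLine(new FakeSession(db).Get(1).Name);
    }
}
```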
I would do it like this:
public class MyObject : Entity
{
private string myField; // not readonly: the setter below assigns it
public string MyProperty
{
get { return myField; }
set
{
if (value != myField)
{
myField = value;
DoWhateverYouNeedToDoWhenItIsChanged();
}
}
}
}
googles nhforge
http://nhibernate.info/doc/howto/various/finding-dirty-properties-in-nhibernate.html
This may be able to help you.

Get existing entity if it exists or create a new one

I'm importing data that may or may not exist already in my database. I'd like NHibernate to associate any entities with the existing db one if it exists (probably just setting the primary key/id), or create a new one if it doesn't. I'm using S#arp architecture for my framework (MVC 2, NHibernate, Fluent).
I've added the [HasUniqueDomainSignature] attribute to the class, and a [DomainSignature] attribute to the properties I want to use for comparison. The only way I can think to do it (which is not an acceptable solution and may not even work) is the following (pseudo C#):
foreach (Book importedBook in importedBooks){
foreach (Author author in importedBook.Authors){
if (!author.IsValid()){ // NHibernate Validator will check DomainSignatures
author = _authorRepository.GetByExample(author); // This would be to get the db object with the same signature,
//but I don't think I could even update this as I iterate through it.
}
}
}
As you can see, this is both messy and nonsensical. Add to that the fact that I've got a half dozen associations on the Book (subject, format, etc.), and it doesn't make any sense. There's got to be an easy way to do this that I'm missing. I'm not a novice with NHibernate, but I'm definitely not an expert.
I might not be understanding the problem, but how can the data "may or may not exist in the database"? For example, if a Book has 2 Authors, how is the relationship stored at the database level if the Author doesn't exist?
It seems as if you're trying to use NHibernate to import your data (or create an entity if it doesn't exist) which doesn't seem correct.
Most database implementations support a conditional UPDATE-or-INSERT syntax. Oracle, for example, has a MERGE command. In combination with a Hibernate <sql-insert> block in your mapping you should be able to work something out. I don't know Fluent but I assume it supports this too.
I just realized I never gave an answer or approved another's answer. I ended up just writing a new SaveOrUpdate which takes a parameter to check for an existing entity before persisting. I also added an attribute to my domain models to mark properties to overwrite when saving/updating (although in retrospect it's only on updating that it'd be overwriting).
Here's the code if it can help anyone else in this dilemma:
public TEntity SaveOrUpdate<TEntity>(TEntity entity, bool checkForExistingEntity)
{
IRepository<TEntity> repository = new Repository<TEntity>();
if (checkForExistingEntity) {
if (entity is Entity) {
IEnumerable<PropertyInfo> props = (entity as Entity).GetSignatureProperties();
Dictionary<string, object> parameters =
props.ToDictionary(propertyInfo => propertyInfo.Name, propertyInfo => propertyInfo.GetValue(entity, null));
TEntity duplicateEntity = repository.FindOne(parameters);
if (duplicateEntity != null) {
// Update any properties with the OverwriteOnSaveUpdate attribute
foreach (var property in RepositoryHelper.GetUpdatableProperties(typeof(TEntity)))
{
object initialValue = property.GetValue(entity, null);
property.SetValue(duplicateEntity, initialValue, null);
}
// Fill in any blank properties on db version
foreach (var property in typeof(TEntity).GetProperties())
{
if (property.GetValue(duplicateEntity, null) == null) {
object initialValue = property.GetValue(entity, null);
property.SetValue(duplicateEntity, initialValue, null);
}
}
return duplicateEntity;
}
}
}
return SaveOrUpdate(entity);
}

NHibernate: Creating a criteria which applies for all queries on a table

Using Castle ActiveRecord / NHibernate: Is there a way you can force an ICriterion on all queries on a table?
For example, a good number of my tables have a "UserId" column. I might want to ensure that I am always selecting rows for the logged-in user. I can easily create an ICriterion object, but I am forced to supply it for different methods: FindAll(), FindFirst(), FindLast() etc.
Is there a way to force a WHERE clause on all queries to a Castle ActiveRecord?
I finally found a great solution (using filters). Since Castle AR does not have any native API for mapping to NHibernate filters, this part was pretty much undocumented. So here goes.
This example filter will make sure you never get news more than a year old, no matter what kind of query you use on the ActiveRecord. You can probably think of more practical applications for this.
First, create an ActiveRecord "News".
Use the following code before you initialize ActiveRecordStarter.
ActiveRecordStarter.MappingRegisteredInConfiguration += MappingRegisteredInConfiguration;
Castle.ActiveRecord.Framework.InterceptorFactory.Create = () => { return new EnableFiltersInterceptor(); };
Then, add the missing function and class:
void MappingRegisteredInConfiguration(Castle.ActiveRecord.Framework.ISessionFactoryHolder holder)
{
var cfg = holder.GetConfiguration(typeof (ActiveRecordBase));
var typeParameters = new Dictionary<string, IType>
{
{"AsOfDate", NHibernateUtil.DateTime}
};
cfg.AddFilterDefinition(new FilterDefinition("Latest", "", typeParameters));
var mappings = cfg.CreateMappings(Dialect.GetDialect(cfg.Properties));
var newsMapping = cfg.GetClassMapping(typeof (News));
newsMapping.AddFilter("Latest", ":AsOfDate <= Date");
}
public class EnableFiltersInterceptor : EmptyInterceptor
{
public override void SetSession(ISession session)
{
session.EnableFilter("Latest").SetParameter("AsOfDate", DateTime.Now.AddYears(-1));
}
}
And voila! Queries on News, e.g. FindAll(), DeleteAll(), FindOne(), Exists(), etc. will never touch entries more than a year old.
The closest thing would be using filters. See http://ayende.com/Blog/archive/2009/05/04/nhibernate-filters.aspx

Encapsulating common logic (domain driven design, best practices)

Updated: 09/02/2009 - Revised question, provided better examples, added bounty.
Hi,
I'm building a PHP application using the data mapper pattern between the database and the entities (domain objects). My question is:
What is the best way to encapsulate a commonly performed task?
For example, one common task is retrieving one or more site entities from the site mapper, and their associated (home) page entities from the page mapper. At present, I would do that like this:
$siteMapper = new Site_Mapper();
$site = $siteMapper->findByid(1);
$pageMapper = new Page_Mapper();
$site->addPage($pageMapper->findHome($site->getId()));
Now that's a fairly trivial example, but it gets more complicated in reality, as each site also has an associated locale, and the page actually has multiple revisions (although for the purposes of this task I'd only be interested in the most recent one).
I'm going to need to do this (get the site and associated home page, locale etc.) in multiple places within my application, and I can't think of the best way/place to encapsulate this task, so that I don't have to repeat it all over the place. Ideally I'd like to end up with something like this:
$someObject = new SomeClass();
$site = $someObject->someMethod(1); // or
$sites = $someObject->someOtherMethod();
Where the resulting site entities already have their associated entities created and ready for use.
The same problem occurs when saving these objects back. Say I have a site entity and associated home page entity, and they've both been modified, I have to do something like this:
$siteMapper->save($site);
$pageMapper->save($site->getHomePage());
Again, trivial, but this example is simplified. Duplication of code still applies.
In my mind it makes sense to have some sort of central object that could take care of:
Retrieving a site (or sites) and all necessary associated entities
Creating new site entities with new associated entities
Taking a site (or sites) and saving it and all associated entities (if they've changed)
So back to my question, what should this object be?
The existing mapper object?
Something based on the repository pattern?*
Something based on the unit of work pattern?*
Something else?
* I don't fully understand either of these, as you can probably guess.
Is there a standard way to approach this problem, and could someone provide a short description of how they'd implement it? I'm not looking for anyone to provide a fully working implementation, just the theory.
Thanks,
Jack
Using the repository/service pattern, your Repository classes would provide a simple CRUD interface for each of your entities, then the Service classes would be an additional layer that performs additional logic like attaching entity dependencies. The rest of your app then only utilizes the Services. Your example might look like this:
$site = $siteService->getSiteById(1); // or
$sites = $siteService->getAllSites();
Then inside the SiteService class you would have something like this:
function getSiteById($id) {
// The repositories are assumed to be properties of the service, hence $this->
$site = $this->siteRepository->getSiteById($id);
foreach ($this->pageRepository->getPagesBySiteId($site->id) as $page)
{
$site->pages[] = $page;
}
return $site;
}
I don't know PHP that well so please excuse if there is something wrong syntactically.
[Edit: this entry attempts to address the fact that it is oftentimes easier to write custom code to directly deal with a situation than it is to try to fit the problem into a pattern.]
Patterns are nice in concept, but they don't always "map". After years of high end PHP development, we have settled on a very direct way of handling such matters. Consider this:
File: Site.php
class Site
{
public static function Select($ID)
{
//Ensure current user has access to ID
//Lookup and return data
}
public static function Insert($aData)
{
//Validate $aData
//In the event of errors, raise a ValidationError($ErrorList)
//Do whatever it is you are doing
//Return new ID
}
public static function Update($ID, $aData)
{
//Validate $aData
//In the event of errors, raise a ValidationError($ErrorList)
//Update necessary fields
}
}
Then, in order to call it (from anywhere), just run:
$aData = Site::Select(123);
Site::Update(123, array('FirstName' => 'New First Name'));
$ID = Site::Insert(array(...));
One thing to keep in mind about OO programming and PHP... PHP does not keep "state" between requests, so creating an object instance just to have it immediately destroyed does not often make sense.
I'd probably start by extracting the common task to a helper method somewhere, then waiting to see what the design calls for. It feels like it's too early to tell.
What would you name this method? The name usually hints at where the method belongs.
class Page {
public $id, $title, $url;
public function __construct($id=false) {
$this->id = $id;
}
public function save() {
// ...
}
}
class Site {
public $id = '';
public $pages = array();
function __construct($id) {
$this->id = $id;
foreach ($this->getPages() as $page_id) {
$this->pages[] = new Page($page_id);
}
}
private function getPages() {
// ...
}
public function addPage($url) {
$page = ($this->pages[] = new Page());
$page->url = $url;
return $page;
}
public function save() {
foreach ($this->pages as $page) {
$page->save();
}
// ..
}
}
$site = new Site($id);
$page = $site->addPage('/');
$page->title = 'Home';
$site->save();
Make your Site object an Aggregate Root to encapsulate the complex association and ensure consistency.
Then create a SiteRepository that has the responsibility of retrieving the Site aggregate and populating its children (including all Pages).
You will not need a separate PageRepository (assuming that you don't make Page a separate Aggregate Root), and your SiteRepository should have the responsibility of retrieving the Page objects as well (in your case by using your existing Mappers).
So:
$siteRepository = new SiteRepository($myDbConfig);
$site = $siteRepository->findById(1); // will have Page children attached
And then the findById method would be responsible for also finding all Page children of the Site. This will have a similar structure to the answer CodeMonkey1 gave, however I believe you will benefit more by using the Aggregate and Repository patterns, rather than creating a specific Service for this task. Any other retrieval/querying/updating of the Site aggregate, including any of its child objects, would be done through the same SiteRepository.
Edit: Here's a short DDD Guide to help you with the terminology, although I'd really recommend reading Evans if you want the whole picture.