NHibernate insert/lookup performance - nhibernate

I have several XML files, each containing data for ‘root objects’, which I parse using LINQ to XML and then turn into actual root objects that I persist using NHibernate and the S#arp Architecture repository. I have started to optimise the data insert and can now add 30,000 objects to the database in about 1 hour and 40 minutes. However, this is still too slow.
I think one bottleneck is the lookup of objects in the database, which requires IO. Objects have to be looked up for reuse.
The root object has several authors:
public virtual IList<Author> Authors { get; set; }
Authors have this structure:
public class Author : Entity
{
    public virtual Initials Initials { get; set; }
    public virtual ForeName ForeName { get; set; }
    public virtual LastName LastName { get; set; }
}
I have achieved a great speed-up by using a typed Id (something I wouldn't normally do):
public class LastName : EntityWithTypedId<string>, IHasAssignedId<string>
{
    public LastName()
    {
    }

    public LastName(string Id)
    {
        SetAssignedIdTo(Id);
    }

    public virtual void SetAssignedIdTo(string assignedId)
    {
        Id = assignedId;
    }
}
Which I look up (and potentially create) like this:
LastName LastName = LastNameRepository.Get(TLastName);
if (LastName == null)
{
    LastName = LastNameRepository.Save(new LastName(TLastName));
    LastNameRepository.DbContext.CommitChanges();
}
Author.LastName = LastName;
I am looking authors up like this:
propertyValues = new Dictionary<string, object>();
propertyValues.Add("Initials", Author.Initials);
propertyValues.Add("ForeName", Author.ForeName);
propertyValues.Add("LastName", Author.LastName);
Author TAuthor = AuthorRepository.FindOne(propertyValues);
if (TAuthor == null)
{
    AuthorRepository.SaveOrUpdate(Author);
    AuthorRepository.DbContext.CommitChanges();
    Root.Authors.Add(Author);
}
else
{
    Root.Authors.Add(TAuthor);
}
Can I improve this? Should I use stored procedures, HQL, plain SQL or ICriteria to perform the lookup instead? Could I use some form of caching to speed up the lookup and reduce IO? The CommitChanges call seems to be necessary, or should I wrap everything in a transaction instead?
I already flush my session etc. every 10 root objects.
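For context, the kind of batching I mean is roughly this (a simplified sketch, not my actual code; the real thing goes through the S#arp repositories and the names here are made up):

// Simplified sketch of flushing/clearing the session every 10 root objects.
int count = 0;
foreach (var root in rootObjects)
{
    session.SaveOrUpdate(root);      // plus the author/name lookups shown above
    if (++count % 10 == 0)
    {
        session.Flush();
        session.Clear();             // drop the first-level cache between batches
    }
}
// Should all of this sit inside a single ITransaction instead of the per-save CommitChanges calls?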
Any feedback would be very much welcome. Many thanks in advance.
Best wishes,
Christian

In all honesty I would say that you shouldn't even be using SA/NHibernate for something like this. It's a bulk data import from XML - an ETL tool like SSIS would be a better choice. Even a hand-cranked process on the DB server would work better - step 1, load XML to a table, step 2, do the UPSERT. Incidentally, SQL 2008 introduced the MERGE command for UPSERT operations, which might be of use.
I would also agree with Dan's comment - is it really necessary to treat initials, forename and surname as separate entities? Treating them as simple strings would boost performance. What in your domain model specifies that they are entities in their own right?
If you really must continue using SA/NHibernate, have a read of this:
http://www.lostechies.com/blogs/jimmy_bogard/archive/2010/06/24/bulk-processing-with-nhibernate.aspx
The suggestion in Jimmy's blog about batching SELECTs should help quite a lot. If you plan to process a batch of 250 records at once, do all the SELECTs as a single NH command, process all the data, then do all the updates as another single batch (which I believe your use of EntityWithTypedId and the adonet.batch_size config setting will help achieve).
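As a rough illustration of the batched SELECT idea (untested, and assuming you can get at the underlying ISession; 'batch' and 'GetSurnamesInBatch' are hypothetical stand-ins for your batch of 250 records and the distinct surname strings in it), something like this replaces one round trip per author with one per batch:

// Sketch only: fetch every LastName needed for the batch in one query and cache it.
// Requires: using NHibernate.Criterion;
IList<string> surnames = GetSurnamesInBatch(batch);   // hypothetical helper

var lastNames = session.CreateCriteria<LastName>()
                       .Add(Restrictions.In("Id", surnames.ToArray()))
                       .List<LastName>()
                       .ToDictionary(l => l.Id);

// When wiring up each author:
LastName name;
if (!lastNames.TryGetValue(tLastName, out name))
{
    name = new LastName(tLastName);
    session.Save(name);
    lastNames[tLastName] = name;   // repeated surnames in the batch hit the dictionary, not the DB
}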
Finally - regarding the statement "which I parse using LINQ to XML" - is that really the best way of doing it? I'm guessing that it might be, given the size of your input file, but are you aware of the approach of simply deserializing the XML file into an object graph? SO won't let me post the link to a page describing this, because I haven't earned enough reputation yet - but if you want to read up on it, Google "don't parse that xml" and the first article will explain it.
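If you do want to try the deserialization route, the standard XmlSerializer pattern looks roughly like this (RootCollection/RootObject and the element names are assumptions - they have to match your actual XML):

// Sketch: deserialize the whole file into a typed object graph instead of hand-parsing it.
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

[XmlRoot("Roots")]
public class RootCollection
{
    [XmlElement("Root")]
    public List<RootObject> Roots { get; set; }
}

// ...

var serializer = new XmlSerializer(typeof(RootCollection));
RootCollection data;
using (var stream = File.OpenRead("input.xml"))
{
    data = (RootCollection)serializer.Deserialize(stream);
}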
Hope this helps.
Jon

The first thing I would do is simplify the Author entity, as I don't think you need the Initials, ForeName, and LastName objects as separate entities. I think using plain strings would be more efficient:
public class Author : Entity
{
    public virtual string Initials { get; set; }
    public virtual string ForeName { get; set; }
    public virtual string LastName { get; set; }
}

Related

EF Core generates too many queries for nested data

I have a simple class to represent a tree structure, defined like this:
public class LicenceCategory
{
    [Key]
    [Column("LicenceCategoryID")]
    public Guid ID { get; set; }
    public string Name { get; set; }
    public Guid? ParentLicenceCategoryID { get; set; }
    [ForeignKey("ParentLicenceCategoryID")]
    public virtual List<LicenceCategory> Categories { get; set; }
}
Then, from an ASP.NET Core controller, I simply return myContext.LicenceCategory, which is about as simple as it gets.
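For completeness, the action is roughly this shape (simplified here; the context and controller names are assumed):

// Roughly the controller described above (names assumed, details trimmed).
[ApiController]
[Route("api/[controller]")]
public class LicenceCategoriesController : ControllerBase
{
    private readonly MyDbContext myContext;

    public LicenceCategoriesController(MyDbContext myContext)
    {
        this.myContext = myContext;
    }

    [HttpGet]
    public IEnumerable<LicenceCategory> Get() => myContext.LicenceCategory;
}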
Right now, there are five records on the database: the parent (null ParentLicenceCategoryID), and four children for that one parent. So no massive volumes and no very deep nesting. This is the SQL that gets generated, and is as I expect:
SELECT [obj].[LicenceCategoryID], [obj].[Name], [obj].[ParentLicenceCategoryID]
FROM [LicenceCategory] AS [obj]
However, it also generates this, five times:
SELECT [e].[LicenceCategoryID], [e].[Name], [e].[ParentLicenceCategoryID]
FROM [LicenceCategory] AS [e]
WHERE [e].[ParentLicenceCategoryID] = @__get_Item_0
Notice how the first statement already contains every field you need to build the tree structure client-side. Why on earth even do the extra select statements?
I noticed that if I Include navigation properties, things get much worse: for three navigation properties, I wound up with 21 select statements, most of which are the same statement executed again and again (perhaps with different parameters, but it is hard to imagine a less efficient way to do this). And these are five records - what will EF do when I throw our millions of transaction records its way?
Is there a way to prevent this kind of code generation, or is EF Core simply a non-starter?

ValueObject Persistence in NHibernate / Fluent NHibernate

I'm a total newbie with ORMs and DDD, so please be patient with me. Also, I'm not a native speaker, so the domain lingo will be a little hard to express in English.
I'm developing a system to control lawsuits.
My domain has an Entity called Case.
public class Case
{
    public virtual int Id { get; set; }
    public virtual List<Clients> Clients { get; set; }
    public virtual LawsuitType LawsuitType { get; set; }
}
The LawsuitType is, from what I gathered, a Value Object. It's a simple type; it has only the case type description, for example "Divorce", "Child Support", etc. It is only the description that interests me, but I don't want it to be a free-text descriptor: I want to control the options presented to the user, and also do some reports.
So I was thinking of mapping this to the database with a "LawsuitTypes" table. The table would have an int Id and a string descriptor.
Can I accomplish that using ComponentMap? Or have I got things wrong, and LawsuitType is an Entity?
Thanks, Luiz Angelo.
Edit:
Using an enum was suggested. But that wouldn't work because it would mean that the LawsuitTypes are set by the developer, and not the user. Some users have the power to add/remove LawsuitTypes, while others don't.
IMHO you should treat LawsuitType as an entity in its own right. Keep in mind that you may want to extend LawsuitType with additional information some day (requirements sometimes change very fast). What comes to my mind is a "default" property or something like that... This means additional work, of course, but this way you are more flexible for future needs.
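If you go down that road, the Fluent NHibernate mapping is straightforward - something along these lines (property and column names are assumed):

// Sketch: LawsuitType mapped as its own entity, referenced from Case (names assumed).
public class LawsuitTypeMap : ClassMap<LawsuitType>
{
    public LawsuitTypeMap()
    {
        Table("LawsuitTypes");
        Id(x => x.Id);
        Map(x => x.Description);
    }
}

public class CaseMap : ClassMap<Case>
{
    public CaseMap()
    {
        Id(x => x.Id);
        References(x => x.LawsuitType);   // many-to-one to the LawsuitTypes table
        HasMany(x => x.Clients);
    }
}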
If I understand your question correctly, the Description("") attribute and a simple enum should work. More on that here.
public enum LawsuitTypes
{
    Divorce,
    [Description("Child Support")]
    ChildSupport,
    [Description("Some Other Element")]
    SomeOtherElement
}
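To show those descriptions in the UI, the usual trick is a small extension method that reads the attribute back off the enum value via reflection - roughly:

// Sketch: get the [Description] text for an enum value, falling back to the member name.
using System;
using System.ComponentModel;
using System.Reflection;

public static class EnumExtensions
{
    public static string GetDescription(this Enum value)
    {
        FieldInfo field = value.GetType().GetField(value.ToString());
        var attribute = (DescriptionAttribute)Attribute.GetCustomAttribute(
            field, typeof(DescriptionAttribute));
        return attribute != null ? attribute.Description : value.ToString();
    }
}

// Usage: LawsuitTypes.ChildSupport.GetDescription() returns "Child Support".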

Repository pattern, ViewModel and ORMs

With the Repository pattern and ViewModels, how do you build queries against the database if you don't want the raw database objects to leak outside the repository? How do I actually create queries without loading ALL of the database into memory and using LINQ to Objects? I can't expose IQueryable to the rest of the app.
For example, with EF I have a bunch of POCOs with several properties that match db fields, but also some stuff to work around enums not being directly supported (for now), as well as foreign key IDs to prevent N+1 issues, make querying easier, and so on. I don't want them to leak out to the rest of the application; I want the application to just see a normal object graph.
public class DbUser
{
    public int Id { get; set; }
    public string Name { get; set; }
    public int GroupId { get; set; }
    public DbGroup Group { get; set; }
    public ICollection<DbComment> Comments { get; set; }
}
public class User
{
    public int Id { get; set; }
    public string Name { get; set; }
    public Group Group { get; set; }
    public ICollection<Comment> Comments { get; set; }
}
The problem here is my repository will internally use EF for the querying (and in-memory stuff when unit testing). But how do I implement IQueryable<User> FindAll()? I can't just do return dbContext.Users.Select(u => new User(u)), as in that case I lose all possible query ability; it'll just load the whole user collection in memory, convert all the types to User from DbUser and then build LINQ queries on the in-memory collection - that is horribly inefficient.
I can't just build queries in the repository. On some pages I have queries that select a few fields but also calculate some complex stuff from other related objects and filter on the result (for example, the count of comments with a positive score), and I need that result back in the application. I could select all the objects used to compute the complex stuff and return them to the application (but not as db entities), but that would mean selecting a LOT of data.
Basically how do I prevent the database entities from polluting the rest of the application with their cruft and hacks, while still maintaining the ability to build queries outside of the repository?
CQRS (Command Query Responsibility Segregation) solves this problem. You have the 'real' model, the Domain model, with all the business rules and all that, and a 'query-only' model, basically a simple POCO (which can be used directly by Views), returned by a specialised query-only repository.
The persistence model (EF entities) is used only to 'talk' to the db; the repositories always return or deal with domain/application objects. Basically, you have to map the EF entities to the Domain ones (and vice versa when saving). This way you have separate models, each with its own purpose.
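A minimal sketch of the query side (all names made up): the projection into the read model happens inside the query class, before anything is materialised, so EF can still translate it to SQL and only the summary data comes back:

// Sketch: a query-only class projecting EF entities straight into a read model.
public class UserSummary                  // read model, not an EF entity
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string GroupName { get; set; }
    public int PositiveComments { get; set; }
}

public class UserQueries
{
    private readonly MyDbContext db;      // assumed context exposing DbSet<DbUser> Users

    public UserQueries(MyDbContext db) { this.db = db; }

    public List<UserSummary> GetSummaries()
    {
        return db.Users
            .Select(u => new UserSummary
            {
                Id = u.Id,
                Name = u.Name,
                GroupName = u.Group.Name,
                PositiveComments = u.Comments.Count(c => c.Score > 0)   // Score assumed on DbComment
            })
            .ToList();                    // the filtering/counting runs as SQL, not in memory
    }
}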

NHibernate Attributes Mapping List

I'm a new NHibernate developer. I'm using attributes rather than mapping files, and I have configured the application to create the tables automatically.
I have two classes, Group and User.
Within the Group class I have a list of users:
public class Group
{
    [NHibernate.Mapping.Attributes.Id(Name = "GroupId")]
    [NHibernate.Mapping.Attributes.Generator(Class = "guid")]
    public virtual Guid GroupId { get; set; }

    // What Attributes do I place here
    public virtual List<User> Users { get; set; }
}
I can't find the right attributes to end up with two tables that have a one-to-many relation.
Can anyone help?
Thanks,
Ronny
[ManyToMany], [OneToMany] or [ManyToOne] (those linked docs are fairly useless though), depending on how you want it set up. Probably [OneToMany] here, and then the corresponding attribute on User.
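Something like this should give you the one-to-many (syntax from memory, so treat it as a rough, untested sketch; note that NHibernate wants the collection typed as IList<User> rather than List<User> so it can substitute its own implementation):

// Rough sketch of a one-to-many bag using NHibernate.Mapping.Attributes (untested).
[NHibernate.Mapping.Attributes.Bag(0, Name = "Users", Cascade = "all")]
[NHibernate.Mapping.Attributes.Key(1, Column = "GroupId")]
[NHibernate.Mapping.Attributes.OneToMany(2, ClassType = typeof(User))]
public virtual IList<User> Users { get; set; }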
You could avoid the pain by using the Fluent NHibernate library instead, if you haven't already tried it.
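With Fluent NHibernate the same mapping is roughly:

// Rough Fluent NHibernate equivalent (key column name assumed).
public class GroupMap : ClassMap<Group>
{
    public GroupMap()
    {
        Id(x => x.GroupId).GeneratedBy.Guid();
        HasMany(x => x.Users)
            .KeyColumn("GroupId")
            .Cascade.All();
    }
}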

NHibernate Domain Model - Adding Child Objects to Collections

I'm still learning here and have a question about child collections. I have an aggregate root called Audio, which has a collection of AudioDownloads.
The downloads are records of each IP address that downloads the audio; I don't want duplicate records of the same IP for each Audio.
In my domain I have the following function:
public virtual void Add(AudioDownload download)
{
    if (!AudioDownloads.Contains(download))
    {
        TotalDownloads++;
        AudioDownloads.Add(download);
    }
}
And this is how I am calling the Add function:
var download = new AudioDownload();
audio.Add(download);
This is returning all downloads from the database for this Audio (which could be thousands!), and it's still adding the download even though one already exists.
I'm using S#arp with the DomainSignature approach for comparing my entities.
Here is my Domain:
public class AudioDownload : Entity, ITenantSpecific
{
    public AudioDownload() { DateAdded = DateTime.Now; }

    [DomainSignature]
    public virtual Audio Audio { get; set; }
    [DomainSignature]
    public virtual string Ip { get; set; }
    public virtual DateTime DateAdded { get; set; }
}
My question is: even if I can get AudioDownloads not to add duplicate entries, should I be doing it this way at all?
Thank you very much!
Paul
I expect that most ways to do this will always lead you to query all downloads from the database, which is probably not what you want.
Another approach that might be cheaper is just to have a unique key in the database defined based on AudioId and Ip. If you then insert a record that duplicates these you will get an exception from NHibernate telling you a unique key was violated: handle that exception gracefully (i.e. don't show it as an error, load the existing AudioDownload and use that in future) and you will have achieved your goal, I believe.
When you use this approach do not check whether the download is already contained in the collection, since that would still trigger loading of all records.
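A rough sketch of what that could look like (exception type and details will vary by driver and setup, so treat this as illustrative only):

// Sketch: rely on a unique DB constraint on (AudioId, Ip) and recover when it fires.
var download = new AudioDownload { Audio = audio, Ip = ipAddress };
try
{
    using (var tx = session.BeginTransaction())
    {
        session.Save(download);   // no need to touch audio.AudioDownloads at all
        tx.Commit();
    }
}
catch (GenericADOException)       // the (AudioId, Ip) pair already exists
{
    // Not an error: fetch the existing record and carry on with it.
    // (In real code use a fresh session here - a session that has thrown should be discarded.)
    var existing = session.CreateCriteria<AudioDownload>()
        .Add(Restrictions.Eq("Audio", audio))
        .Add(Restrictions.Eq("Ip", download.Ip))
        .UniqueResult<AudioDownload>();
}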
On the other hand: would it not be interesting to see that something was downloaded from the same Ip multiple times?