EF Core generates too many queries for nested data

I have a simple class to represent a tree structure, defined like this:
public class LicenceCategory
{
    [Key]
    [Column("LicenceCategoryID")]
    public Guid ID { get; set; }
    public string Name { get; set; }
    public Guid? ParentLicenceCategoryID { get; set; }
    [ForeignKey("ParentLicenceCategoryID")]
    public virtual List<LicenceCategory> Categories { get; set; }
}
Then, from an ASP.NET Core controller, I simply return myContext.LicenceCategory, which is about as simple as it gets.
Right now, there are five records in the database: the parent (null ParentLicenceCategoryID) and four children of that one parent. So no massive volumes and no very deep nesting. This is the SQL that gets generated, and it is what I expect:
SELECT [obj].[LicenceCategoryID], [obj].[Name], [obj].[ParentLicenceCategoryID]
FROM [LicenceCategory] AS [obj]
However, it also generates this, five times:
SELECT [e].[LicenceCategoryID], [e].[Name], [e].[ParentLicenceCategoryID]
FROM [LicenceCategory] AS [e]
WHERE [e].[ParentLicenceCategoryID] = @__get_Item_0
Notice how the first statement already contains every field you need to build the tree structure client-side. Why on earth even do the extra select statements?
I noticed that if I Include navigation properties, things get much worse: For three navigation properties, I wound up with 21 select statements! Most of which are just the same statement executed again and again and again. It may do so with different parameters perhaps, but there is hardly any way to make a program any less efficient. And these are five records - what will EF do when I throw our millions of transaction records its way?
Is there a way to prevent this kind of code generation, or is EF Core simply a non-starter?
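The client-side assembly mentioned in the question can be sketched as follows. This is only an illustration, assuming the flat rows from the first SELECT are already in memory; it does not explain or change what EF Core generates. `CategoryNode` and `CategoryTree.Build` are hypothetical names that are not part of the original model.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Plain in-memory shape of one row; mirrors the three columns in the first SELECT.
public class CategoryNode
{
    public Guid ID { get; set; }
    public string Name { get; set; }
    public Guid? ParentID { get; set; }
    public List<CategoryNode> Children { get; } = new List<CategoryNode>();
}

public static class CategoryTree
{
    // Indexes the rows by ID, then links each row to its parent.
    // Returns the rows that have no parent in the result set (the roots).
    public static List<CategoryNode> Build(IEnumerable<CategoryNode> rows)
    {
        var byId = rows.ToDictionary(r => r.ID);
        var roots = new List<CategoryNode>();

        foreach (var node in byId.Values)
        {
            if (node.ParentID.HasValue &&
                byId.TryGetValue(node.ParentID.Value, out var parent))
            {
                parent.Children.Add(node);
            }
            else
            {
                roots.Add(node); // no parent, or parent not loaded
            }
        }

        return roots;
    }
}
```

With the five records described above, `Build` would return a single root whose `Children` list holds the other four rows, using only the data from the one flat query.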

Related

Repository pattern, ViewModel and ORMs

With Repository pattern and ViewModels, how do you build queries against the database if you don't want the raw database objects to leak outside the repository? How do I actually create queries without loading ALL the database in memory and using LINQ to Objects? I can't expose IQueryable to the rest of the app.
For example, with EF I have a bunch of POCOs with several properties that match db fields, but also some stuff to work around enums not being directly supported (for now), as well as foreign key IDs to prevent N+1 queries and make querying easier, and so on. I don't want them to leak out to the rest of the application; I want the application to just see a normal object graph.
public class DbUser
{
    public int Id { get; set; }
    public string Name { get; set; }
    public int GroupId { get; set; }
    public DbGroup Group { get; set; }
    public ICollection<DbComment> Comments { get; set; }
}
public class User
{
    public int Id { get; set; }
    public string Name { get; set; }
    public Group Group { get; set; }
    public ICollection<Comment> Comments { get; set; }
}
The problem here is my repository will internally use EF for the querying (and in-memory stuff when unit testing). But how do I implement IQueryable<User> FindAll()? I can't just do return dbContext.Users.Select(u => new User(u)), as in that case I lose all possible query ability; it'll just load the whole user collection in memory, convert all the types to User from DbUser and then build LINQ queries on the in-memory collection - that is horribly inefficient.
I can't just build queries in the repository. On some pages I have queries that select a few fields, but also calculate some complex stuff from other related objects and filter on the result (for example, the count of comments with a positive score), and I need that result back in the application as well. I could select all the objects used to compute the complex stuff and return them to the application (but not as db entities), but that would mean selecting a LOT of data.
Basically how do I prevent the database entities from polluting the rest of the application with their cruft and hacks, while still maintaining the ability to build queries outside of the repository?
CQRS (Command Query Responsibility Segregation) solves this problem. You have the 'real' model, the Domain model, with all the business rules, and a 'query-only' model, which is basically a simple POCO (which can be used directly by Views) that is returned by a specialised query-only repository.
The persistence model (the EF entities) is used only to 'talk' to the db; the repositories always return or deal with domain/application objects. Basically, you have to map the EF entities to the Domain ones (and vice versa when saving). This way, you'll have separate models, each with its own purpose.
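A minimal sketch of what such a query-only repository might look like, under some loud assumptions: `UserSummary`, `UserQueries` and `WithPositiveComments` are hypothetical names, the entities are trimmed versions of the ones in the question, and the `IQueryable<DbUser>` passed in stands in for whatever the ORM exposes (for example a `DbSet`). Whether a given LINQ provider translates the whole projection to SQL is also an assumption to verify.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Persistence-side entities (trimmed versions of the classes in the question).
public class DbComment
{
    public int Score { get; set; }
}

public class DbUser
{
    public int Id { get; set; }
    public string Name { get; set; }
    public ICollection<DbComment> Comments { get; set; } = new List<DbComment>();
}

// Query-only model: a flat POCO shaped for one view.
public class UserSummary
{
    public int Id { get; set; }
    public string Name { get; set; }
    public int PositiveComments { get; set; }
}

// Query-only repository: callers pass criteria and get read models back.
// The IQueryable<DbUser> never leaves this class.
public class UserQueries
{
    private readonly IQueryable<DbUser> _users;

    public UserQueries(IQueryable<DbUser> users)
    {
        _users = users;
    }

    public List<UserSummary> WithPositiveComments(int minimum)
    {
        return _users
            .Select(u => new UserSummary
            {
                Id = u.Id,
                Name = u.Name,
                PositiveComments = u.Comments.Count(c => c.Score > 0)
            })
            .Where(s => s.PositiveComments >= minimum)
            .OrderBy(s => s.Name)
            .ToList();
    }
}
```

The projection and the filter are composed before `ToList()` runs, so the query is built inside the repository while the rest of the application only ever sees `UserSummary`. In a unit test, `new List<DbUser>().AsQueryable()` can be passed in place of the real source.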

Programmatically ignore children with nHibernate

I'm just trying to get my head around nHibernate and have a question. When setting up the mappings file (with Fluent or regular .hbm.xml files) you specify relationships (bags, one-to-many, etc.) and sub-types, the idea being (I believe) that when you fetch an object it also fetches any matching data. My question is: can I programmatically tell my query to ignore that relationship?
So, below, there is a Foo class with a list of Bar objects. Within the mappings file this would be a one-to-many relationship and sometimes I want to retrieve a Foo with all Bars BUT sometimes I want to just retrieve the Foo object without the Bar, for performance reasons. How can I do this?
public class Foo { public int Id { get; set; } public List<Bar> Bars { get; set; } }
public class Bar { public int Id { get; set; } }
Cheers
The relationship shouldn't be loaded automatically unless you turn off Lazy Loading or specify it to be eager loaded in the query.
Edit:
To answer your questions in the comment below.
1) It's done as part of the query. A basic example using QueryOver in NHibernate 3.0 would look something like:
// Product is a placeholder: the entity type argument is missing from the original snippet
var result = Session.QueryOver<Product>()
    .Fetch(x => x.Category).Eager
    .Where(x => x.Price > 10)
    .List();
I think with ICriteria it's "SetFetchMode("Category", FetchMode.Eager)"
2) If you turn off lazy-loading on the mapping for an object, it will effectively always be eager loaded. Though I suggest you eager load on a query-by-query basis, to avoid the possibility of having a massive chain of data loaded, or loading data you don't actually need.

NHibernate insert/lookup performance

I have several XML files, and each file contains data for 'root objects', which I parse using Linq to XML. I then create the actual root objects and persist them using NHibernate and the sharp architecture repository. I have started to optimise the data insert and managed to add 30000 objects to the database in about 1 hour and 40 minutes. However, this is still too slow.
I think one bottleneck is the lookup of objects in the database, which requires IO. Objects have to be looked up for reuse.
The root object has several authors:
public virtual IList<Author> Authors { get; set; }
Authors have this structure:
public class Author : Entity
{
    public virtual Initials Initials { get; set; }
    public virtual ForeName ForeName { get; set; }
    public virtual LastName LastName { get; set; }
}
I have achieved a great speed up by using a typed Id (something I wouldn't normally do):
public class LastName : EntityWithTypedId<string>, IHasAssignedId<string>
{
    public LastName()
    {
    }
    public LastName(string Id)
    {
        SetAssignedIdTo(Id);
    }
    public virtual void SetAssignedIdTo(string assignedId)
    {
        Id = assignedId;
    }
}
Which I look up (and potentially create) like this:
LastName LastName = LastNameRepository.Get(TLastName);
if (LastName == null)
{
    LastName = LastNameRepository.Save(new LastName(TLastName));
    LastNameRepository.DbContext.CommitChanges();
}
Author.LastName = LastName;
I am looking authors up like this:
propertyValues = new Dictionary<string, object>();
propertyValues.Add("Initials", Author.Initials);
propertyValues.Add("ForeName", Author.ForeName);
propertyValues.Add("LastName", Author.LastName);
Author TAuthor = AuthorRepository.FindOne(propertyValues);
if (TAuthor == null)
{
    AuthorRepository.SaveOrUpdate(Author);
    AuthorRepository.DbContext.CommitChanges();
    Root.Authors.Add(Author);
}
else
{
    Root.Authors.Add(TAuthor);
}
Can I improve this? Should I use stored procedures/HQL/pure SQL/ICriteria instead to perform the lookup? Could I use some form of caching to speed up the lookup and reduce IO? The CommitChanges seems to be necessary or should I wrap everything into a transaction?
I already flush my session etc. every 10 root objects.
Any feedback would be very much welcome. Many thanks in advance.
Best wishes,
Christian
In all honesty I would say that you shouldn't even be using SA/NHibernate for something like this. It's a bulk data import from XML - an ETL tool like SSIS would be a better choice. Even a hand-cranked process on the DB server would work better - step 1, load XML to a table, step 2, do the UPSERT. Incidentally, SQL 2008 introduced the MERGE command for UPSERT operations, which might be of use.
I would also agree with Dan's comment - is it really necessary to treat initials, forename and surname as separate entities? Treating them as simple strings would boost performance. What in your domain model specifies that they are entities in their own right?
If you really must continue using SA/NHibernate, have a read of this:
http://www.lostechies.com/blogs/jimmy_bogard/archive/2010/06/24/bulk-processing-with-nhibernate.aspx
The suggestion in Jimmy's blog about batching SELECTs should help quite a lot. If you plan to process a batch of 250 records at once, do all the SELECTs as a single NH command, process all the data, then do all the updates as another single batch (which I believe your use of EntityWithTypedId and the adonet.batch_size config setting will help achieve).
Finally - regarding the statement "which I parse using Linq to XML" - is that really the best way of doing it? I'm guessing that it might be, given the size of your input file, but are you aware of the approach of simply deserializing the XML file into an object graph? SO won't let me post the link to a page describing this, because I haven't earned enough reputation yet - but if you want to read up on it, Google "don't parse that xml" and the first article will explain it.
Hope this helps.
Jon
The first thing I would do is simplify the Authors entity as I don't think you need the Initials, ForeName, and LastName objects as separate entities. I think using plain strings would be more efficient:
public class Author : Entity
{
    public virtual string Initials { get; set; }
    public virtual string ForeName { get; set; }
    public virtual string LastName { get; set; }
}
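On the caching part of the question, one possible shape is an in-memory lookup keyed on the three name parts, so that an author already seen during the import does not cost another database round trip. The sketch below assumes the plain-string `Author` suggested above (without the `Entity` base class, to keep it self-contained). `AuthorCache`, `Resolve` and the `findOrSave` delegate are hypothetical names; the delegate stands in for the existing repository lookup-then-save code.

```csharp
using System;
using System.Collections.Generic;

public class Author
{
    public string Initials { get; set; }
    public string ForeName { get; set; }
    public string LastName { get; set; }
}

// Remembers authors already resolved in this import run so that
// repeated names do not trigger another database lookup.
public class AuthorCache
{
    private readonly Dictionary<(string, string, string), Author> _seen =
        new Dictionary<(string, string, string), Author>();

    // Stand-in for the repository call that finds an existing author
    // or saves the candidate and returns the persisted instance.
    private readonly Func<Author, Author> _findOrSave;

    // Number of times the delegate (i.e. the database) was actually hit.
    public int DatabaseCalls { get; private set; }

    public AuthorCache(Func<Author, Author> findOrSave)
    {
        _findOrSave = findOrSave;
    }

    public Author Resolve(Author candidate)
    {
        var key = (candidate.Initials, candidate.ForeName, candidate.LastName);

        if (_seen.TryGetValue(key, out var known))
        {
            return known; // cache hit: no IO
        }

        DatabaseCalls++;
        var persisted = _findOrSave(candidate);
        _seen[key] = persisted;
        return persisted;
    }
}
```

One caveat to check before relying on this: if the session is cleared between batches, the cached instances are no longer attached to it, so caching identifiers rather than entity instances may be the safer variant.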

NHibernate Attributes Mapping List

I'm a new NHibernate developer. I'm using attributes and not map files and I have configured the application to create the tables automatically.
I have two classes, Group and User.
Within the Group class I have a list of users:
public class Group
{
    [NHibernate.Mapping.Attributes.Id(Name = "GroupId")]
    [NHibernate.Mapping.Attributes.Generator(Class = "guid")]
    public virtual Guid GroupId { get; set; }
    // What Attributes do I place here
    public virtual List<User> Users { get; set; }
}
I can't find the right attributes so that there will be two tables that have one to many relation.
Can anyone help?
Thanks,
Ronny
[ManyToMany], [OneToMany] or [ManyToOne] (those linked docs are fairly useless though), depending on how you want it set up. Probably [OneToMany], and then the same on a User.
You could avoid the pain by using the Fluent NHibernate library instead, if you haven't already tried it.

Fluent NHibernate Architecture Question

I have a question that I may be overthinking at this point, but here goes...
I have 2 classes, Users and Groups. Users and groups have a many-to-many relationship, and I was thinking of giving the join table group_users an IsAuthorized property (because some groups are private -- users will need authorization).
Would you recommend creating a class for the join table as well as the User and Groups table? Currently my classes look like this.
public class Groups
{
    public Groups()
    {
        members = new List<Person>();
    }
    ...
    public virtual IList<Person> members { get; set; }
}
public class User
{
    public User()
    {
        groups = new List<Groups>();
    }
    ...
    public virtual IList<Groups> groups { get; set; }
}
My mapping is like the following in both classes (I'm only showing the one in the users mapping but they are very similar):
HasManyToMany<Groups>(x => x.groups)
    .WithTableName("GroupMembers")
    .WithParentKeyColumn("UserID")
    .WithChildKeyColumn("GroupID")
    .Cascade.SaveUpdate();
Should I write a class for the join table that looks like this?
public class GroupMembers
{
    public virtual string GroupID { get; set; }
    public virtual string PersonID { get; set; }
    public virtual bool WaitingForAccept { get; set; }
}
I would really like to be able to adjust the group membership status and I guess I'm trying to think of the best way to go about this.
I generally only like to create classes that represent actual business entities. In this case I don't think 'groupmembers' represents anything of value in your code. To me the ORM should map the database to your business objects. This means that your classes don't have to exactly mirror the database layout.
Also, I suspect that by implementing GroupMembers you will end up with some nasty collections in both your user and group classes, i.e. the group class will have the list of users and also a list of groupmembers, each of which references a user, and vice versa for the user class. To me this isn't that clean, and it will make it harder to maintain and to propagate changes to the tables.
I would suggest keeping the join table in the database as you have suggested, and adding a List of groups called waitingtoaccept in users and (if it makes sense to) a List of users called waitingtoaccept in groups.
These would then pull their values from your join-table in the database based on the waitingtoaccept flag.
Yes, sure you need another class like UserGroupBridge. Another good side-effect is that you can modify user membership and group members without loading potentially heavy User/Group objects to NHibernate session.
Cheers.
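For comparison, the bridge-class approach from the last answer might look roughly like the sketch below. It shows only plain classes and leaves the NHibernate mapping out entirely. `GroupMembership`, `RequestToJoin` and `AuthorizedMembers` are hypothetical names, and `Group` and `User` are simplified stand-ins for the classes in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// One row of the join table, promoted to a class so it can carry state.
public class GroupMembership
{
    public virtual User User { get; set; }
    public virtual Group Group { get; set; }
    public virtual bool IsAuthorized { get; set; }
}

public class User
{
    public virtual string Name { get; set; }
    public virtual IList<GroupMembership> Memberships { get; protected set; }
        = new List<GroupMembership>();
}

public class Group
{
    public virtual string Name { get; set; }
    public virtual IList<GroupMembership> Memberships { get; protected set; }
        = new List<GroupMembership>();

    // Adds a pending membership to both sides of the relationship.
    // It only counts once IsAuthorized is set to true.
    public virtual GroupMembership RequestToJoin(User user)
    {
        var membership = new GroupMembership
        {
            User = user,
            Group = this,
            IsAuthorized = false
        };
        Memberships.Add(membership);
        user.Memberships.Add(membership);
        return membership;
    }

    // Members whose membership has been approved.
    public virtual IEnumerable<User> AuthorizedMembers =>
        Memberships.Where(m => m.IsAuthorized).Select(m => m.User);
}
```

The membership status can then be changed by setting `IsAuthorized` on a single `GroupMembership` object, which is the adjustment the question asks about.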