NHibernate - Performance Tuning with Native SQL - sql

I am trying to use NHibernate to map a graph built from the following entities to a relational database (the code is incomplete and for demonstration only, but should be straightforward). The Node and Edge classes may have subclasses, and a number of subclasses inheriting from Node are already defined. All inheritance relationships in this model are mapped as joined-subclass (table-per-subclass):
class GraphObject { ... }
class Node : GraphObject {
List<Edge> IngoingEdges;
List<Edge> OutgoingEdges;
}
class Edge : GraphObject {
Node StartNode { get; set; }
Node EndNode { get; set; }
}
For connections between nodes and edges a dual many-to-one mapping is used as follows,
many-to-one from Edge.StartNode to nodes(.OutgoingEdges);
many-to-one from Edge.EndNode to nodes(.IngoingEdges)
Our project needs to work with huge volumes of data (millions of nodes and edges), and we would like to keep the benefits NHibernate provides while minimizing the performance problems. Unfortunately, it takes nearly an hour to save or load such a model. I am currently trying to find a way to finish loading in one statement and see how long that takes. I made a few attempts and used NHibernate Profiler to track the SQL statements NHibernate generates when loading the entire graph from the database, but so far I have not managed to eliminate the huge number of individual queries, apparently issued to determine the start and end nodes of specific edges, which look like
select ...StartNode as .., ..Id as .., ... from Link link where link.StartNode=10 (a number indicating node id)
which means I am kind of suffering from the so-called N+1 issues.
So, has anyone come across a similar problem who can give me some ideas, either using native SQL or improving performance for this particular case by other means? I would really appreciate that. Questions about any unclear points are also welcome.

Some optimisations come to mind:
1) Disable lazy loading of StartNode and EndNode (this gets an Edge in 1 query instead of 3).
2) Eager-load the edge collections of a node (http://ayende.com/blog/4367/eagerly-loading-entity-associations-efficiently-with-nhibernate).
This would give something along the lines of
// initialize both collections of the first node in one round trip
session.QueryOver<Node>()
    .Where(n => n.Id == nodeId)
    .Fetch(n => n.IngoingEdges).Eager
    .Future();
var firstNode = session.QueryOver<Node>()
    .Where(n => n.Id == nodeId)
    .Fetch(n => n.OutgoingEdges).Eager
    .FutureValue().Value;
// collect the ids of the nodes at either end of the ingoing edges
var nodeIds = firstNode.IngoingEdges
    .SelectMany(edge => new[] { edge.StartNode.Id, edge.EndNode.Id })
    .Distinct()
    .ToList();
EagerLoadNode(nodeIds);
void EagerLoadNode(IEnumerable<int> nodeIds)
{
    var ids = nodeIds.ToArray();
    // initialize the collections efficiently
    session.QueryOver<Node>()
        .WhereRestrictionOn(n => n.Id).IsIn(ids)
        .Fetch(n => n.IngoingEdges).Eager
        .Future();
    session.QueryOver<Node>()
        .WhereRestrictionOn(n => n.Id).IsIn(ids)
        .Fetch(n => n.OutgoingEdges).Eager
        .Future()
        .ToList(); // enumerating one future executes the whole batch
}
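Since the question also asks about loading everything in a fixed number of statements, here is a minimal sketch of one possible alternative: read the nodes and the edges as two flat result sets (for example via native SQL) and wire the graph in memory. The Node and Edge classes below are simplified stand-ins for the mapped entities, and GraphAssembler, the tuple shape and the column order are assumptions made for illustration, not part of NHibernate.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified stand-ins for the mapped entities in the question.
public sealed class Node
{
    public Node(int id) { Id = id; }
    public int Id { get; }
    public List<Edge> IngoingEdges { get; } = new List<Edge>();
    public List<Edge> OutgoingEdges { get; } = new List<Edge>();
}

public sealed class Edge
{
    public Edge(int id, Node startNode, Node endNode)
    {
        Id = id;
        StartNode = startNode;
        EndNode = endNode;
    }
    public int Id { get; }
    public Node StartNode { get; }
    public Node EndNode { get; }
}

// Hypothetical helper: builds the graph from two flat result sets, so the
// number of database round trips does not grow with the size of the graph.
public static class GraphAssembler
{
    public static Dictionary<int, Node> Build(
        IEnumerable<int> nodeIds,
        IEnumerable<(int Id, int StartNodeId, int EndNodeId)> edgeRows)
    {
        // One dictionary lookup per edge end replaces one query per node.
        var nodes = nodeIds.ToDictionary(id => id, id => new Node(id));
        foreach (var row in edgeRows)
        {
            var edge = new Edge(row.Id, nodes[row.StartNodeId], nodes[row.EndNodeId]);
            edge.StartNode.OutgoingEdges.Add(edge);
            edge.EndNode.IngoingEdges.Add(edge);
        }
        return nodes;
    }
}
```

The two inputs could come from any source that returns flat rows; subclass-specific columns from the joined-subclass tables would need extra handling that this sketch leaves out.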

Related

How do I wrap an EF 4.1 DbContext in a repository?

All,
I have a requirement to hide my EF implementation behind a Repository. My simple question: Is there a way to execute a 'find' across both a DbSet AND the DbSet.Local without having to deal with them both.
For example - I have standard repository implementation with Add/Update/Remove/FindById. I break the generic pattern by adding a FindByName method (for demo purposes only :). This gives me the following code:
Client App:
ProductCategoryRepository categoryRepository = new ProductCategoryRepository();
categoryRepository.Add(new ProductCategory { Name = "N" });
var category1 = categoryRepository.FindByName("N");
Implementation
public ProductCategory FindByName(string s)
{
// Assume name is unique for demo
return _legoContext.Categories.Where(c => c.Name == s).SingleOrDefault();
}
In this example, category1 is null.
However, if I implement the FindByName method as:
public ProductCategory FindByName(string s)
{
var t = _legoContext.Categories.Local.Where(c => c.Name == s).SingleOrDefault();
if (t == null)
{
t = _legoContext.Categories.Where(c => c.Name == s).SingleOrDefault();
}
return t;
}
In this case, I get what I expect when querying against both a new entry and one that is only in the database. But this presents a few issues that I am confused over:
1) I would assume (as a user of the repository) that cat2 below is not found. But it is found, and the great part is that cat2.Name is "Goober".
ProductCategoryRepository categoryRepository = new ProductCategoryRepository();
var cat = categoryRepository.FindByName("Technic");
cat.Name = "Goober";
var cat2 = categoryRepository.FindByName("Technic");
2) I would like to return a generic IQueryable from my repository.
It just seems like a lot of work to wrap the calls to the DbSet in a repository. Typically, this means that I've screwed something up. I'd appreciate any insight.
With older versions of EF, complicated situations could arise quite quickly because of the required references. In this version I would recommend exposing ICollections or ILists rather than IQueryable. This keeps EF contained in your repository and creates a good separation.
Edit: furthermore, by returning ICollection, IEnumerable or IList you restrict and control the queries sent to the database. This also makes the system easier to fine-tune and maintain. By exposing IQueryable, you expose yourself to side effects that occur when callers add more to the query (.Take(), .Where(), .SelectMany(), ...): EF sees these additions and generates SQL that reflects these uncontrolled queries. Not confining the queries can result in queries being executed from the UI, and in more complicated tests and maintenance issues in the long run.
Since the point of the repository pattern is to be able to swap implementations out at will, the details of DbSets should be completely hidden.
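To make the advice above concrete, here is a minimal sketch of a repository contract that returns materialized results instead of IQueryable. The interface name, the FindAllByPrefix method and the in-memory implementation are assumptions made for illustration; an EF-backed implementation would run its queries against the DbSet inside the same methods.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class ProductCategory
{
    public int Id { get; set; }
    public string Name { get; set; }
}

// Hypothetical contract: callers receive finished results and cannot
// append further query operators that would change the generated SQL.
public interface IProductCategoryRepository
{
    void Add(ProductCategory category);
    ProductCategory FindByName(string name);
    IReadOnlyList<ProductCategory> FindAllByPrefix(string prefix);
}

// In-memory stand-in, shown only to illustrate that an implementation
// can be swapped without the caller noticing.
public sealed class InMemoryProductCategoryRepository : IProductCategoryRepository
{
    private readonly List<ProductCategory> _items = new List<ProductCategory>();

    public void Add(ProductCategory category)
    {
        _items.Add(category);
    }

    // Assumes names are unique, as in the question.
    public ProductCategory FindByName(string name)
    {
        return _items.SingleOrDefault(c => c.Name == name);
    }

    public IReadOnlyList<ProductCategory> FindAllByPrefix(string prefix)
    {
        // ToList() materializes the result before it leaves the repository.
        return _items
            .Where(c => c.Name.StartsWith(prefix, StringComparison.Ordinal))
            .ToList();
    }
}
```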
I think you're on a good path. The only thing I would probably ask myself is:
Is the context long-lived? If not, do not worry about querying Local. An object that has been inserted or deleted should only be accessible once it has been committed.
If this is a long-lived context and you need access to deleted and inserted objects, then querying Local is a good idea, but as you've pointed out, you may run into difficulties at some point.

NHibernate Eager Loading with Queryover API on a complex object graph

I've got a pretty complex object graph that I want to load in one fell swoop.
Samples have Daylogs which have Daylog Tests which have Daylog Results.
Daylog Tests have Testkeys, Daylog Results have Resultkeys, and TestKeys have Resultkeys.
I'm using the QueryOver API and Future to run these all as one query, and all the data that NHibernate should need to instantiate the entire graph IS being returned, verified by NHProf.
public static IList<Daylog> DatablockLoad(ISession sess, ICollection<int> ids)
{
    var daylogQuery = sess.QueryOver<Daylog>()
        .WhereRestrictionOn(dl => dl.DaylogID).IsIn(ids.ToArray())
        .Fetch(dl => dl.Tests).Eager
        .TransformUsing(Transformers.DistinctRootEntity)
        .Future<Daylog>();
    sess.QueryOver<DaylogTest>()
        .WhereRestrictionOn(dlt => dlt.Daylog.DaylogID).IsIn(ids.ToArray())
        .Fetch(dlt => dlt.Results).Eager
        .Inner.JoinQueryOver<TestKey>(dlt => dlt.TestKey)
        .Fetch(dlt => dlt.TestKey).Eager
        .Inner.JoinQueryOver<ResultKey>(tk => tk.Results)
        .Fetch(dlt => dlt.TestKey.Results).Eager
        .Future<DaylogTest>();
    sess.QueryOver<DaylogResult>()
        .Inner.JoinQueryOver(dlr => dlr.DaylogTest)
        .WhereRestrictionOn(dlt => dlt.Daylog.DaylogID).IsIn(ids.ToArray())
        .Fetch(dlr => dlr.ResultKey).Eager
        .Fetch(dlr => dlr.History).Eager
        .Future<DaylogResult>();
    var daylogs = daylogQuery.ToList();
    return daylogs;
}
However, I still end up with proxies to represent the relationship between Testkey and ResultKey, even though I'm specifically loading that relationship.
I think this entire query is probably representative of a poor understanding of the QueryOver API, so I would like any and all advice on it, but primarily, I'd like to understand why I get a proxy and not a list of results when later I try to get daylogresult.resultkey.testkey.results.
Any help?
The answer was to call NHibernateUtil.Initialize on the various objects. Simply pulling the data down does not mean that NHibernate will hydrate all the proxies.
You have to load all your entities in one QueryOver clause to get rid of proxies. But in that case you will have a lot of joins in your query, so I recommend using lazy loading with batching.

Programmatically ignore children with nHibernate

I'm just trying to get my head around nHibernate and have a query. When setting up the mappings file (with Fluent or regular .hbm.xml files) you specify relationships (bags, one-to-many, etc.) and sub-types; the idea (I believe) is that when you fetch an object it also fetches any matching data. My question is: can I programmatically tell my query to ignore that relationship?
So, below, there is a Foo class with a list of Bar objects. Within the mappings file this would be a one-to-many relationship and sometimes I want to retrieve a Foo with all Bars BUT sometimes I want to just retrieve the Foo object without the Bar, for performance reasons. How can I do this?
public class Foo { public int Id { get; set; } public List<Bar> Bars { get; set; } }
public class Bar { public int Id { get; set; } }
Cheers
The relationship shouldn't be loaded automatically unless you turn off Lazy Loading or specify it to be eager loaded in the query.
Edit:
To answer your questions in the comment below.
1) It's done as part of the query. A basic example using QueryOver in NHibernate 3.0 would look something like:
var result = Session.QueryOver()
.Fetch(x => x.Category).Eager
.Where(x => x.Price > 10)
.List();
I think with ICriteria it's "SetFetchMode("Category", FetchMode.Eager)"
2) If you turn off lazy loading on the mapping for an object, it will effectively always be eager loaded. Though I suggest you eager load on a query-by-query basis to avoid the possibility of loading a massive chain of data, or loading data you don't actually need.

nhibernate - sproutcore : How to only retrieve reference ID's and not load the reference/relation?

I use as a front-end sproutcore, and as back-end an nhibernate driven openrasta REST solution.
In sproutcore, references are actually IDs / guids. So an Address entity in the Sproutcore model could be:
// sproutcore code
App.Address = App.Base.extend({
  street: SC.Record.attr(String, { defaultValue: "" }),
  houseNumber: SC.Record.attr(String),
  city: SC.Record.toOne('Funda.City')
});
with test data:
Funda.Address.FIXTURES = [
{ guid: "1",
street: "MyHomeStreet",
houseNumber: "34",
city: "6"
}
]
Here you see that the reference city has a value of 6. When, at some point in your program, you want to use that reference, it is done by:
myAddress.Get("city").MyCityName
So, Sproutcore automatically uses the supplied ID in a REST GET and retrieves the needed record. If the record is already available in the local memory of the client (previously loaded), no round trip is made to the server; otherwise an HTTP GET is done for that ID: "http://servername/city/6". Very nice.
Nhibernate (mapped using fluent-nhibernate):
public AddressMap()
{
Schema(Config.ConfigElement("nh_default_schema", "Funda"));
Not.LazyLoad();
//Cache.ReadWrite();
Id(x => x.guid).Unique().GeneratedBy.Identity();
Table("Address");
Map(x => x.street);
Map(x => x.houseNumber);
References(x => x.city,
"cityID").LazyLoad().ForeignKey("fk_Address_cityID_City_guid");
}
Here I specified the foreign key and mapped "cityID" to the database table column. It works OK.
BUT (and these are my questions for the gurus):
You can specify whether to lazy load or eager load a reference (city). Of course you do not want to eager load all your references, so generally you're tied to lazy loading.
But when OpenRasta (or WCF or ...) serializes such an object, it iterates the properties, which fires all the property getters, which causes all of the references to be lazy loaded.
So if your entity has 5 references, 1 query for the base object and 5 for the references will be executed. You might be better off with eager loading then...
This sucks... Or am I wrong?
As I showed with how the model inside sproutcore works, I only want the IDs of the references. So I want neither eager loading nor lazy loading,
just a "Get * from Address where ID = %" that gets mapped to my Address entity.
Then I also have the IDs of the references, which pleases Sproutcore and me (no loading of unneeded references). But... can NHibernate map only the IDs of the references?
And can I later tell NHibernate to fully load the reference?
One approach (though not a nice one) could be to load all references EAGER (with a join) (what a waste of resources, I know) and, in my server-side Address entity:
// Note: NOT marked as DataMember, so it is NOT serialized!
public virtual City city { get; set; }
Int32 _cityID;
[DataMember]
public virtual Int32 cityID
{
    get
    {
        if (city != null)
            return city.guid;
        else
            return _cityID;
    }
    set
    {
        if (city != null && city.guid != value)
        {
            city = null;
            _cityID = value;
        }
        else if (city == null)
        {
            _cityID = value;
        }
    }
}
So i get my ID property for Sproutcore, but on the downside all references are loaded.
A better idea for me???
nHibernate-to-linq
3a. I want to get my address without their references (but preferably with their id's)
Dao myDao = new Dao();
from p in myDao.All()
select p;
If cities are lazy loading in my mapping, how can i specify in the linq query that i want it also to include my city id only?
3b.
I want to get addresses with my cities loaded in 1 query: (which are mapped as lazyloaded)
Dao myDao = new Dao();
from p in myDao.All()
join p.city ???????
select p;
My Main Question:
As argued earlier, with lazy loading, all references are lazy loaded when serializing entities. How can I prevent this, and only get ID's of references in a more efficient way?
Thank you very much for reading, and hopefully you can help me and others with the same questions. Kind regards.
As a note, you wrote that you do this
myAddress.Get("city").MyCityName
when it should be
myAddress.get("city").get("MyCityName")
or
myAddress.getPath("city.MyCityName")
With that out of the way, I think your question is "How do I not load the city object until I want to?".
Assuming you are using datasources, you need to manage in your datasource when you request the city object. So in retrieveRecord in your datasource simply don't fire the request, and call dataSourceDidComplete with the appropriate arguments (look in the datasource.js file) so the city record is not in the BUSY state. You are basically telling the store the record was loaded, but you pass an empty hash, so the record has no data.
Of course the problem with this is at some point you will need to retrieve the record. You could define a global like App.WANTS_CITY and in retrieveRecords only do the retrieve when you want the city. You need to manage the value of that trigger; statecharts are a good place to do this.
Another part of your question was "How do I load a bunch of records at once, instead of one request for each record?"
Note on the datasource there is a method retrieveRecords. You can define your own implementation to this method, which would allow you to fetch any records you want -- that avoids N requests for N child records -- you can do them all in one request.
Finally, personally, I tend to write an API layer with methods like
getAddress
and
getCity
and invoke my API appropriately, when I actually want the objects. Part of this approach is I have a very light datasource -- I basically bail out of all the create/update/fetch methods depending on what my API layer handles. I use the pushRetrieve and related methods to update the store.
I do this because the store uses datasources in a very rigid way. I like more flexibility; not all server APIs work in the same way.
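On the server side, one option for the "IDs only" part of the question is to serialize a small DTO instead of the entity, so the serializer never walks the entity's reference properties. The sketch below is an assumption-laden illustration: AddressDto and AddressDtoMapper are hypothetical names, and the entity classes are simplified stand-ins. As far as I know, reading only the identifier of a lazily loaded NHibernate reference typically does not initialize the proxy, but that is worth verifying with a profiler against your own mapping.

```csharp
using System;

// Simplified stand-ins for the mapped entities in the question.
public class City
{
    public virtual int guid { get; set; }
    public virtual string MyCityName { get; set; }
}

public class Address
{
    public virtual int guid { get; set; }
    public virtual string street { get; set; }
    public virtual string houseNumber { get; set; }
    public virtual City city { get; set; }
}

// Hypothetical DTO: this is what gets serialized instead of the entity.
public sealed class AddressDto
{
    public int guid { get; set; }
    public string street { get; set; }
    public string houseNumber { get; set; }
    public int? city { get; set; } // only the id, as SproutCore expects
}

public static class AddressDtoMapper
{
    public static AddressDto ToDto(Address address)
    {
        if (address == null) throw new ArgumentNullException(nameof(address));
        return new AddressDto
        {
            guid = address.guid,
            street = address.street,
            houseNumber = address.houseNumber,
            // Only the identifier is read; no other City property is touched.
            city = address.city?.guid
        };
    }
}
```

The full City would then be fetched only when the client requests "http://servername/city/6", which matches how SproutCore resolves the reference.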

AutoMapper limit depth of mapping or map lazily

AutoMapper is great and saves a lot of time, but when I started looking at the performance of my application I found that AutoMapper is responsible for a performance loss.
I'm using lazy loading with NHibernate. Most of the time I need the parent entity without needing to access child entities at all. What actually happens is that AutoMapper tries to map as many relationships as possible, causing NHibernate to lazy load all the child entities (I'm seeing SELECT N+1 happening all the time).
Is there a way to limit how deep AutoMapper goes, or is it possible for AutoMapper to map child entities lazily?
You could use the ignore method for associations you don't need to have loaded.
Mapper.CreateMap<User, UserDto>()
    .ForMember(dest => dest.LazyCollection, opt => opt.Ignore())
    .ForMember(dest => dest.AnotherLazyCollection, opt => opt.Ignore());
Mapper.CreateMap<UserProperty, UserPropertyDto>()
    .ForMember(dest => dest.PropertyLazyReference, opt => opt.Ignore());
return Mapper.Map<User, UserDto>(user);
For associations you know you will need in your dto, you should look at ways of fetching these more efficiently with the initial query, but that is a whole new problem.
Perhaps you should consider using two different dtos; one that includes the child entities, and one that doesn't. You can then return the proper dto from your service layer depending upon the context.
I'm using pre conditions to prevent data from being mapped.
CreateMap<Team, TeamDto>()
.ForMember(dto => dto.Users, options =>
{
options.PreCondition(ctx => !ctx.Items.ContainsKey(AutoMapperItemKeys.SKIP_TEAM_USERS));
options.MapFrom(t => t.TeamUsers.Where(tu => tu.IsDeleted == false));
})
.ReverseMap();
When Map() is called I feed the Items dictionary with skip keys for the properties I don't want mapped.
this.mapper.Map<IEnumerable<Team>, IEnumerable<TeamDto>>(teams, opts =>
{
opts.Items.Add(AutoMapperItemKeys.SKIP_TEAM_USERS, true);
});
Advantages:
you can control at a fine-grained level which properties are not mapped
prevents mapping too deep into nested objects
no need for duplicate DTOs
no duplicate mapping profiles