I needed to eagerly fetch an entity with collections of collections, etc. so that I could process the data concurrently (a Thing entity with many YELLOWEntities that each have many SubYellowEntities). The key code needed was:
NHibernateUtil.Initialize(fetchedThings);
(See accepted answer below.)
There are several problems with the current approach. Let me give you a fresh look at the problem.
You are, I understand, looking to retrieve Things, which have many-to-one BlueEntities, and a collection of YellowEntities, which in turn have a collection of SubYellowEntities.
Instead of using join fetch, transformers and loops, you should look into batch fetching of entities and collections.
With a good batch size set for BlueEntity, Thing.YellowEntities and YellowEntity.SubYellowEntities, you should not need TPL stuff at all.
If it turns out you do, you won't be able to return safe-to-use entities, because they have to be associated with a single session to be useful, and sessions are not thread-safe.
Solution (if you can refine this, go ahead and post the revision as the answer): Runs in 17 seconds (down from 97) on 8 processors with correct results. Please remember that the vast majority of time was spent in calculations, not in waiting for Nhibernate to return.
var fetchedThings = new List<Thing>();
var sessFT = Session.SessionFactory.OpenSession();
Task getThingsTask = new Task(() =>
{
fetchedThings = sessFT.CreateQuery(#" from Thing t " +
" inner join fetch t.BlueEntity " )
.SetResultTransformer(Transformers.DistinctRootEntity)
.List<Thing>();
NHibernateUtil.Initialize(fetchedThings);
});
getThingsTask.Start();
var fetchedYellows = new List<YELLOWEntity>();
var sessFY = Session.SessionFactory.OpenSession();
Task getYellowsTask = new Task(() =>
{
fetchedYellows = sessFY.CreateQuery(#" from YELLOWEntity y " +
" left join fetch y.SubYellowEntities " + ... )
.SetResultTransformer(Transformers.DistinctRootEntity)
.List<YELLOWEntity>();
NHibernateUtil.Initialize(fetchedThings);
});
getYellowsTask.Start();
getThingsTask.Wait();
getYellowsTask.Wait();
Parallel.ForEach(fetchedThings, thing =>
{
thing.YELLOWEntities = fetchedYellows.Where(y=>y.Thing.Id == thing.Id).ToList();
...
//...inner foreach() loop through 'YELLOWthings'... doing threadsafe stuff
...
});
//dispose the temp sessions
Related
I have a system with two database servers I am working with:
One of them is database first - a database managed by a legacy enterprise application and I don't have full control over changing the database structure.
The second is code first and I have full control in the code first database to make changes.
Security policies prevent me from making a view that joins tables from the two database servers in the code first DB which might be a way to make this better according to what i've seen on SO posts.
I have one context for each database since they are separate.
The data and structure in the code first tables is designed to be able to join to the non-code first database as if they were all in one database.
I CAN get what I need working using this set of queries:
// Set up EF tables
var person = await _context1.Person.ToListAsync();
var labor = await _context1.Labor.ToListAsync();
var laborCraftRate = await context1.LaborCraftRate.ToListAsync();
var webUsers = await context2.WebUsers.ToListAsync();
var workOrders = await _context1.Workorder
.Where(r => r.Status == "LAPPR" || r.Status == "APPR" || r.Status == "REC")
.ToListAsync();
var specialRequests = await _context1.SwSpecialRequest
.Where(r => r.Requestdate > DateTime.Now)
.ToListAsync();
var distributionListQuery = (
from l in labor
from p in person.Where(p => p.Personid == l.Laborcode).DefaultIfEmpty()
from wu in webUsers.Where(wu => wu.Laborcode == l.Laborcode).DefaultIfEmpty()
from lcr in laborCraftRate.Where(lcr => lcr.Laborcode == l.Laborcode).DefaultIfEmpty()
select new
{
Laborcode = l.Laborcode,
Displayname = p.Displayname,
Craft = lcr.Craft,
Crew = l.Crewid,
Active = wu.Active,
Admin = wu.FrIsAdmin,
FrDistLocation = wu.FrDistLocation,
}).Where(r => r.Active == "Y" && (r.FrDistLocation == "IPC" || r.FrDistLocation == "IPC2" || r.FrDistLocation == "both"))
.OrderBy(r => r.Craft)
.ThenBy(r => r.Displayname);
// Build a subquery for the next query to use
var ptoSubQuery =
from webUser in webUsers
join workOrder in workOrders on webUser.Laborcode equals workOrder.Wolablnk
join specialRequest in specialRequests on workOrder.Wonum equals specialRequest.Wonum
select new
{
workOrder.Wonum,
Laborcode = workOrder.Wolablnk,
specialRequest.Requestdate
};
// Build the PTO query to join with the distribution list
var ptoQuery =
from a in ptoSubQuery
group a by a.Wonum into g
select new
{
Wonum = g.Key,
StartDate = g.Min(x => x.Requestdate),
EndDate = g.Max(x => x.Requestdate),
Laborcode = g.Min(x => x.Laborcode)
};
// Join the distribution list and the object list to return
// list items with PTO information
var joinedQuery = from dl in distributionListQuery
join fl in ptoQuery on dl.Laborcode equals fl.Laborcode
select new
{
dl.Laborcode,
dl.Displayname,
dl.Craft,
dl.Crew,
dl.Active,
dl.Admin,
dl.FrDistLocation,
fl.StartDate,
fl.EndDate
};
// There are multiple records that result from the join,
// strip out all but the first instance of PTO for all users
var distributionList = joinedQuery.GroupBy(r => r.Laborcode)
.Select(r => r.FirstOrDefault()).OrderByDescending(r => r.Laborcode).ToList();
Again, this works and gets my data back in a reasonable but clearly not optimal timeframe that I can work with in my UI that needs this by preloading the data before it is needed. Not the best, but works.
If I change the variable declarations to not be async which I was told I should do in another SO post, this turns into a cross db query and netcore says no:
// Set up EF tables
var person = _context1.Person;
var labor = _context1.Labor;
var laborCraftRate = context1.LaborCraftRate;
var webUsers = context2.WebUsers;
var workOrders = _context1.Workorder
.Where(r => r.Status == "LAPPR" || r.Status == "APPR" || r.Status == "REC");
var specialRequests = _context1.SwSpecialRequest
.Where(r => r.Requestdate > DateTime.Now);
Adding ToListAsync() is what allows the join functionality I need to work.
Q - Can anyone elaborate on possible downsides and problems with what I am doing?
Thank you for helping me understand!
It's not that calling ToList() "doesn't work." The problem is that it materializes (I think that's the right word) the query and returns a potentially larger than intended amount of data to the client. Any further LINQ operations are done on the client side. This can increase the load on the database and network. In your case, it works because you're bringing all that data to the client side. At that point, it no longer matters that it was a cross-database query.
This was a frequent concern during the transition from .NET Core 2.x to 3.x. If an operation could not be performed server side, .NET Core 2.x would silently insert something like ToList(). (Well, not completely silently. I think it was logged somewhere. But many developers weren't aware of it.) 3.x stopped doing that and would give you an error. When people tried to upgrade to 3.x, they often found it difficult to convert the queries into something that could run server side. And people resisted throwing in an explicit ToList() because muh performance. But remember, that's what it was always doing. If there wasn't a performance issue before, there isn't one now. And at least now you're aware of what it's actually doing, and can fix it if you really need to.
I am currently updating a BackEnd project to .NET Core and having performance issues with my Linq queries.
Main Queries:
var queryTexts = from text in _repositoryContext.Text
where text.KeyName.StartsWith("ApplicationSettings.")
where text.Sprache.Equals("*")
select text;
var queryDescriptions = from text in queryTexts
where text.KeyName.EndsWith("$Descr")
select text;
var queryNames = from text in queryTexts
where !(text.KeyName.EndsWith("$Descr"))
select text;
var queryDefaults = from defaults in _repositoryContext.ApplicationSettingsDefaults
where defaults.Value != "*"
select defaults;
After getting these IQueryables I run a foreach loop in another context to build my DTO model:
foreach (ApplicationSettings appl in _repositoryContext.ApplicationSettings)
{
var applDefaults = queryDefaults.Where(c => c.KeyName.Equals(appl.KeyName)).ToArray();
description = queryDescriptions.Where(d => d.KeyName.Equals("ApplicationSettings." + appl.KeyName + ".$Descr"))
.FirstOrDefault()?
.Text1 ?? "";
var name = queryNames.Where(n => n.KeyName.Equals("ApplicationSettings." + appl.KeyName)).FirstOrDefault()?.Text1 ?? "";
// Do some stuff with data and return DTO Model
}
In my old Project, this part had an execution from about 0,45 sec, by now I have about 5-6 sec..
I thought about using compiled queries but I recognized these don't support returning IEnumerable yet. Also I tried to avoid Contains() method. But it didn't improve performance anyway.
Could you take short look on my queries and maybe refactor or give some hints how to make one of the queries faster?
It is to note that _repositoryContext.Text has compared to other contexts the most entries (about 50 000), because of translations.
queryNames, queryDefaults, and queryDescriptions are all queries not collections. And you are running them in a loop. Try loading them outside of the loop.
eg: load queryNames to a dictionary:
var queryNames = from text in queryTexts
where !(text.KeyName.EndsWith("$Descr"))
select text;
var queryNamesByName = queryName.ToDictionary(n => n.KeyName);
one can write queries like below
var Profile="developer";
var LstUserName = alreadyUsed.Where(x => x.Profile==Profile).ToList();
you can also use "foreach" like below
lstUserNames.ForEach(x=>
{
//do your stuff
});
I know variants of this question have been asked before (even by me), but I still don't understand a thing or two about this...
It was my understanding that one could retrieve more documents than the 128 default setting by doing this:
session.Advanced.MaxNumberOfRequestsPerSession = int.MaxValue;
And I've learned that a WHERE clause should be an ExpressionTree instead of a Func, so that it's treated as Queryable instead of Enumerable. So I thought this should work:
public static List<T> GetObjectList<T>(Expression<Func<T, bool>> whereClause)
{
using (IDocumentSession session = GetRavenSession())
{
return session.Query<T>().Where(whereClause).ToList();
}
}
However, that only returns 128 documents. Why?
Note, here is the code that calls the above method:
RavenDataAccessComponent.GetObjectList<Ccm>(x => x.TimeStamp > lastReadTime);
If I add Take(n), then I can get as many documents as I like. For example, this returns 200 documents:
return session.Query<T>().Where(whereClause).Take(200).ToList();
Based on all of this, it would seem that the appropriate way to retrieve thousands of documents is to set MaxNumberOfRequestsPerSession and use Take() in the query. Is that right? If not, how should it be done?
For my app, I need to retrieve thousands of documents (that have very little data in them). We keep these documents in memory and used as the data source for charts.
** EDIT **
I tried using int.MaxValue in my Take():
return session.Query<T>().Where(whereClause).Take(int.MaxValue).ToList();
And that returns 1024. Argh. How do I get more than 1024?
** EDIT 2 - Sample document showing data **
{
"Header_ID": 3525880,
"Sub_ID": "120403261139",
"TimeStamp": "2012-04-05T15:14:13.9870000",
"Equipment_ID": "PBG11A-CCM",
"AverageAbsorber1": "284.451",
"AverageAbsorber2": "108.442",
"AverageAbsorber3": "886.523",
"AverageAbsorber4": "176.773"
}
It is worth noting that since version 2.5, RavenDB has an "unbounded results API" to allow streaming. The example from the docs shows how to use this:
var query = session.Query<User>("Users/ByActive").Where(x => x.Active);
using (var enumerator = session.Advanced.Stream(query))
{
while (enumerator.MoveNext())
{
User activeUser = enumerator.Current.Document;
}
}
There is support for standard RavenDB queries, Lucence queries and there is also async support.
The documentation can be found here. Ayende's introductory blog article can be found here.
The Take(n) function will only give you up to 1024 by default. However, you can change this default in Raven.Server.exe.config:
<add key="Raven/MaxPageSize" value="5000"/>
For more info, see: http://ravendb.net/docs/intro/safe-by-default
The Take(n) function will only give you up to 1024 by default. However, you can use it in pair with Skip(n) to get all
var points = new List<T>();
var nextGroupOfPoints = new List<T>();
const int ElementTakeCount = 1024;
int i = 0;
int skipResults = 0;
do
{
nextGroupOfPoints = session.Query<T>().Statistics(out stats).Where(whereClause).Skip(i * ElementTakeCount + skipResults).Take(ElementTakeCount).ToList();
i++;
skipResults += stats.SkippedResults;
points = points.Concat(nextGroupOfPoints).ToList();
}
while (nextGroupOfPoints.Count == ElementTakeCount);
return points;
RavenDB Paging
Number of request per session is a separate concept then number of documents retrieved per call. Sessions are short lived and are expected to have few calls issued over them.
If you are getting more then 10 of anything from the store (even less then default 128) for human consumption then something is wrong or your problem is requiring different thinking then truck load of documents coming from the data store.
RavenDB indexing is quite sophisticated. Good article about indexing here and facets here.
If you have need to perform data aggregation, create map/reduce index which results in aggregated data e.g.:
Index:
from post in docs.Posts
select new { post.Author, Count = 1 }
from result in results
group result by result.Author into g
select new
{
Author = g.Key,
Count = g.Sum(x=>x.Count)
}
Query:
session.Query<AuthorPostStats>("Posts/ByUser/Count")(x=>x.Author)();
You can also use a predefined index with the Stream method. You may use a Where clause on indexed fields.
var query = session.Query<User, MyUserIndex>();
var query = session.Query<User, MyUserIndex>().Where(x => !x.IsDeleted);
using (var enumerator = session.Advanced.Stream<User>(query))
{
while (enumerator.MoveNext())
{
var user = enumerator.Current.Document;
// do something
}
}
Example index:
public class MyUserIndex: AbstractIndexCreationTask<User>
{
public MyUserIndex()
{
this.Map = users =>
from u in users
select new
{
u.IsDeleted,
u.Username,
};
}
}
Documentation: What are indexes?
Session : Querying : How to stream query results?
Important note: the Stream method will NOT track objects. If you change objects obtained from this method, SaveChanges() will not be aware of any change.
Other note: you may get the following exception if you do not specify the index to use.
InvalidOperationException: StreamQuery does not support querying dynamic indexes. It is designed to be used with large data-sets and is unlikely to return all data-set after 15 sec of indexing, like Query() does.
Is there a way to get at runtime the number of calls NHibernate is making to the database?
I know I can use the NHibernate profiler (NHProf, http://nhprof.com/) to manually view this count while debugging - but what I'd like to do is at runtime to get the actual number of calls to the database NHibernate is making so I can throw an exception if it's a ridiculous number like 50 or 300 (exact number to be determined).
This would be an indication to the developer that he/she needs to use 'Eager' and 'Future' and tweak the NHibernate code so hundreds of calls are not being made to the database.
Below I have an example where I'm seeing 284 calls being made to the database -
We can fix this code - so I'm not looking for a solution on how to make this NHibernate code better. I would like instead to implement a system that would notify developers they need to tweak the Nhibernate code.
Example - suppose we have the following model/db -
customer
customeraddress
order
orderstate
orderDetail
The code below makes one select call for each order detail to each of the related tables, etc... With only 72 order details in the db, we end up with 284 calls made to the database.
var query = QueryOver.Of<OrderDetail>()
.Where(c => c.UpdateTime >= updateTime);
using (var context = _coreSessionFactory.OpenSession())
{
var count = query.GetExecutableQueryOver(context)
.ToRowCountQuery()
.FutureValue<Int32>();
var items = query
.OrderBy(a => a.UpdateTime).Desc
.Skip(index ?? 0)
.Take(itemsPerPage ?? 20)
.GetExecutableQueryOver(context)
.Fetch(c => c.Order.OrderState).Eager
.Fetch(c => c.Order.Customer.CustomerAddress).Eager
.Future();
return new
{
Typename = "PagedCollectionOfContract",
Index = index ?? 0,
ItemsPerPage = itemsPerPage ?? 20,
TotalCount = count.Value,
Items = items.ToList().Select(c => new
{
Typename = "OrderDetail",
c.Id,
OrderId = c.Order.Id,
OrderStateAffiliates = c.Order.OrderStates.First(n => n.Name == StateNames.California).AffiliateCount,
CustomerZipCode = c.Order.Customer.CustomerAddresses.First(n => n.StateName == StateNames.California).ZipCode,
CustomerName = c.Order.Customer.Name,
c.DateTimeApproved,
c.Status
})
.ToArray()
};
}
It's not important for this order/customer model to be understood and to improve it - it's just an example so we get the idea on why I need to get the number of calls NHibernate makes to the db.
The SessionFactory can be configured to collect statistics. E.g. the number of successful transactions or the number of sessions that were opened.
This Article at JavaLobby gives some details.
You can use log4net to gather that info.
Logging NHibernate with log4net
I want to fetch Hierarchical/Tree data something like below from a Table which has following definiton.
Tree Table:
"""""""""""
Id |ParentId
"""""""""""
Work1|null
Work2|Work1
Work3|Work2
...
Required Query result Data (no need to be tabbed)- If I Pick 'Work1' I should complete Ids which are under its root something like below. If I pick 'Work2' then also I should complete Ids above and below its root.
> Work1
----------
> Work2
----------
> Work3
---------
What is the best way in NHibernate to fetch data in the above scenario in optimized manner.
To find out what the "best way" is, more information regarding the actual scenario would be needed. What kind of "optimization" are you looking for? Minimal amount of data (only the rows you are really going to need) or minimal number of SQL queries (preferably one roundtrip to the database) or any other?
Scenario 1: Menu or tree structure that is loaded once and kept in memory for longer periods of time (not a list that updates every few seconds). Small number of rows in the table (small is relative but I'd say anything below 200).
In this case I would just get the whole table with one query like this:
var items = session.Query<Work>()
.Fetch(c => c.ParentWork)
.Fetch(c => c.ChildWorks).ToList();
var item = session.Get<Work>(id);
This will result in a single SQL query which simply loads all the rows from the table. item will contain the complete tree (parents, grandparents, children, etc.).
Scenario 2: Large number of rows and only a fraction of rows needed. Only few levels in the hierarchy are to be expected.
In this case, just load the item and let NHibernate to the rest with lazy loading or force it to load everything by writing a recursive method to traverse parents and children. This will cause a N+1 select, which may or may not be slower than scenario 1 (depending on your data).
Here is a quick hack demonstrating this:
var item = session.Get<Work>(id);
Work parent = item.ParentWork;
Work root = item;
// find the root item
while (parent != null)
{
root = parent;
parent = parent.ParentWork;
}
// scan the whole tree
this.ScanChildren(root);
// -----
private void ScanChildren(Work item)
{
if (item == null)
{
return;
}
foreach (Work child in item.ChildWorks)
{
string name = child.Name;
this.ScanChildren(child);
}
}
Edit:
Scenario 3: Huge amount of data. Minimal number of queries and minimal amount of data.
In this case, I would think not of a tree structure but of having layers of data that we load one after another.
var work = repo.Session.Get<Work>(id);
// get root of that Work
Work parent = work.ParentWork;
Work root = work;
while (parent != null)
{
root = parent;
parent = parent.ParentWork;
}
// Get all the Works for each level
IList<Work> worksAll = new List<Work>() { root };
IList<Work> worksPerLevel = new List<Work>() { root };
// get each level until we don't have any more Works in the next level
int count = worksPerLevel.Count;
while (count > 0)
{
worksPerLevel = this.GetChildren(session, worksPerLevel);
// add the Works to our list of all Works
worksPerLevel.ForEach(c => worksAll.Add(c));
count = worksPerLevel.Count;
}
// here you can get the names of the Works or whatever
foreach (Work c in worksAll)
{
string s = c.Name;
}
// this methods gets the Works in the next level and returns them
private IList<Work> GetChildren(ISession session, IList<Work> worksPerLevel)
{
IList<Work> result = new List<Work>();
// get the IDs for the works in this level
IList<int> ids = worksPerLevel.Select(c => c.Id).ToList();
// use a WHERE IN clause do get the Works
// with the ParentId of Works in the current level
result = session.QueryOver<Work>()
.Where(
NHibernate.Criterion.Restrictions.InG<int>(
NHibernate.Criterion.Projections.Property<Work>(
c => c.ParentWork.Id),
ids)
)
.Fetch(c => c.ChildWorks).Eager // this will prevent the N+1 problem
.List();
return result;
}
This solution will not cause a N+1 problem, because we use an eager load for the children, so NHibernate will know the state of the child lists and not hit the DB again. You will only get x+y selects, where x is the number of selects to find the root Work and y is the number of levels (max depth of he tree).