Is there a way to get at runtime the number of calls NHibernate is making to the database?
I know I can use NHibernate Profiler (NHProf, http://nhprof.com/) to view this count manually while debugging - but what I'd like is to get, at runtime, the actual number of calls NHibernate is making to the database, so I can throw an exception if it's a ridiculous number like 50 or 300 (exact threshold to be determined).
This would be an indication to the developer that they need to use 'Eager' and 'Future' and tweak the NHibernate code so hundreds of calls are not being made to the database.
Below I have an example where I'm seeing 284 calls being made to the database -
We can fix this code - so I'm not looking for a solution on how to make this NHibernate code better. What I would like instead is to implement a system that notifies developers when they need to tweak the NHibernate code.
Example - suppose we have the following model/db -
customer
customeraddress
order
orderstate
orderDetail
The code below makes one SELECT call per order detail to each of the related tables. With only 72 order details in the db, we end up with 284 calls made to the database.
var query = QueryOver.Of<OrderDetail>()
    .Where(c => c.UpdateTime >= updateTime);

using (var context = _coreSessionFactory.OpenSession())
{
    var count = query.GetExecutableQueryOver(context)
        .ToRowCountQuery()
        .FutureValue<Int32>();

    var items = query
        .OrderBy(a => a.UpdateTime).Desc
        .Skip(index ?? 0)
        .Take(itemsPerPage ?? 20)
        .GetExecutableQueryOver(context)
        .Fetch(c => c.Order.OrderState).Eager
        .Fetch(c => c.Order.Customer.CustomerAddress).Eager
        .Future();

    return new
    {
        Typename = "PagedCollectionOfContract",
        Index = index ?? 0,
        ItemsPerPage = itemsPerPage ?? 20,
        TotalCount = count.Value,
        Items = items.ToList().Select(c => new
        {
            Typename = "OrderDetail",
            c.Id,
            OrderId = c.Order.Id,
            OrderStateAffiliates = c.Order.OrderStates.First(n => n.Name == StateNames.California).AffiliateCount,
            CustomerZipCode = c.Order.Customer.CustomerAddresses.First(n => n.StateName == StateNames.California).ZipCode,
            CustomerName = c.Order.Customer.Name,
            c.DateTimeApproved,
            c.Status
        })
        .ToArray()
    };
}
It's not important to understand or improve this order/customer model - it's just an example so we get the idea of why I need the number of calls NHibernate makes to the db.
The SessionFactory can be configured to collect statistics. E.g. the number of successful transactions or the number of sessions that were opened.
This Article at JavaLobby gives some details.
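A minimal sketch of how that could look, assuming the factory-wide statistics API and a hypothetical threshold; note that the counters are shared across all sessions of the factory, so concurrent work would skew the numbers:

// Hedged sketch: enable statistics at configuration time, then compare
// statement counts around the code under test. MaxAllowedStatements and
// RunQueriesUnderTest are hypothetical, not from the question.
var cfg = new NHibernate.Cfg.Configuration().Configure();
cfg.SetProperty(NHibernate.Cfg.Environment.GenerateStatistics, "true");
var sessionFactory = cfg.BuildSessionFactory();

const long MaxAllowedStatements = 50;

long before = sessionFactory.Statistics.PrepareStatementCount;
RunQueriesUnderTest(sessionFactory);
long executed = sessionFactory.Statistics.PrepareStatementCount - before;

if (executed > MaxAllowedStatements)
    throw new InvalidOperationException(
        $"NHibernate issued {executed} statements; tune the query with Fetch/Future.");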
You can use log4net to gather that info.
Logging NHibernate with log4net
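If you go the log4net route, one way to turn the logged SQL into a runtime count is a small counting appender attached to the "NHibernate.SQL" logger (the logger NHibernate writes each SQL statement to). A hedged sketch:

// Counts every message NHibernate writes to its SQL logger.
public class SqlCountingAppender : log4net.Appender.AppenderSkeleton
{
    public int Count { get; private set; }

    protected override void Append(log4net.Core.LoggingEvent loggingEvent) => Count++;
}

public static class SqlCounter
{
    // Wiring; assumes log4net itself is already configured elsewhere.
    public static SqlCountingAppender Attach()
    {
        var appender = new SqlCountingAppender();
        var sqlLogger = (log4net.Repository.Hierarchy.Logger)
            log4net.LogManager.GetLogger("NHibernate.SQL").Logger;
        sqlLogger.Level = log4net.Core.Level.Debug;
        sqlLogger.AddAppender(appender);
        return appender;   // inspect appender.Count after running the code under test
    }
}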
I have a system with two database servers I am working with:
One of them is database first - a database managed by a legacy enterprise application and I don't have full control over changing the database structure.
The second is code first and I have full control in the code first database to make changes.
Security policies prevent me from making a view in the code-first DB that joins tables from the two database servers, which might be a way to make this better according to what I've seen in SO posts.
I have one context for each database since they are separate.
The data and structure in the code-first tables are designed to join to the non-code-first database as if they were all in one database.
I CAN get what I need working using this set of queries:
// Set up EF tables
var person = await _context1.Person.ToListAsync();
var labor = await _context1.Labor.ToListAsync();
var laborCraftRate = await _context1.LaborCraftRate.ToListAsync();
var webUsers = await _context2.WebUsers.ToListAsync();
var workOrders = await _context1.Workorder
    .Where(r => r.Status == "LAPPR" || r.Status == "APPR" || r.Status == "REC")
    .ToListAsync();
var specialRequests = await _context1.SwSpecialRequest
    .Where(r => r.Requestdate > DateTime.Now)
    .ToListAsync();
var distributionListQuery = (
    from l in labor
    from p in person.Where(p => p.Personid == l.Laborcode).DefaultIfEmpty()
    from wu in webUsers.Where(wu => wu.Laborcode == l.Laborcode).DefaultIfEmpty()
    from lcr in laborCraftRate.Where(lcr => lcr.Laborcode == l.Laborcode).DefaultIfEmpty()
    select new
    {
        Laborcode = l.Laborcode,
        Displayname = p.Displayname,
        Craft = lcr.Craft,
        Crew = l.Crewid,
        Active = wu.Active,
        Admin = wu.FrIsAdmin,
        FrDistLocation = wu.FrDistLocation,
    })
    .Where(r => r.Active == "Y" && (r.FrDistLocation == "IPC" || r.FrDistLocation == "IPC2" || r.FrDistLocation == "both"))
    .OrderBy(r => r.Craft)
    .ThenBy(r => r.Displayname);
// Build a subquery for the next query to use
var ptoSubQuery =
    from webUser in webUsers
    join workOrder in workOrders on webUser.Laborcode equals workOrder.Wolablnk
    join specialRequest in specialRequests on workOrder.Wonum equals specialRequest.Wonum
    select new
    {
        workOrder.Wonum,
        Laborcode = workOrder.Wolablnk,
        specialRequest.Requestdate
    };

// Build the PTO query to join with the distribution list
var ptoQuery =
    from a in ptoSubQuery
    group a by a.Wonum into g
    select new
    {
        Wonum = g.Key,
        StartDate = g.Min(x => x.Requestdate),
        EndDate = g.Max(x => x.Requestdate),
        Laborcode = g.Min(x => x.Laborcode)
    };
// Join the distribution list and the object list to return
// list items with PTO information
var joinedQuery = from dl in distributionListQuery
                  join fl in ptoQuery on dl.Laborcode equals fl.Laborcode
                  select new
                  {
                      dl.Laborcode,
                      dl.Displayname,
                      dl.Craft,
                      dl.Crew,
                      dl.Active,
                      dl.Admin,
                      dl.FrDistLocation,
                      fl.StartDate,
                      fl.EndDate
                  };
// There are multiple records that result from the join,
// strip out all but the first instance of PTO for all users
var distributionList = joinedQuery.GroupBy(r => r.Laborcode)
    .Select(r => r.FirstOrDefault())
    .OrderByDescending(r => r.Laborcode)
    .ToList();
Again, this works and gets my data back in a reasonable, though clearly not optimal, timeframe. I can work with it in my UI, which needs this data, by preloading it before it is needed. Not the best, but it works.
If I change the variable declarations so the tables are not materialized with ToListAsync() - which I was told I should do in another SO post - this turns into a cross-db query, and .NET Core says no:
// Set up EF tables
var person = _context1.Person;
var labor = _context1.Labor;
var laborCraftRate = _context1.LaborCraftRate;
var webUsers = _context2.WebUsers;
var workOrders = _context1.Workorder
    .Where(r => r.Status == "LAPPR" || r.Status == "APPR" || r.Status == "REC");
var specialRequests = _context1.SwSpecialRequest
    .Where(r => r.Requestdate > DateTime.Now);
Adding ToListAsync() is what allows the join functionality I need to work.
Q - Can anyone elaborate on possible downsides and problems with what I am doing?
Thank you for helping me understand!
It's not that calling ToList() "doesn't work." The problem is that it materializes (I think that's the right word) the query and returns a potentially larger than intended amount of data to the client. Any further LINQ operations are done on the client side. This can increase the load on the database and network. In your case, it works because you're bringing all that data to the client side. At that point, it no longer matters that it was a cross-database query.
This was a frequent concern during the transition from EF Core 2.x to 3.x. If an operation could not be performed server side, EF Core 2.x would silently insert something like ToList(). (Well, not completely silently - I think it was logged somewhere - but many developers weren't aware of it.) 3.x stopped doing that and gives you an error instead. When people tried to upgrade to 3.x, they often found it difficult to convert the queries into something that could run server side, and they resisted throwing in an explicit ToList() out of performance worries. But remember, that's what it was always doing. If there wasn't a performance issue before, there isn't one now. And at least now you're aware of what it's actually doing, and can fix it if you really need to.
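To make that intent explicit, here is a hedged sketch using the question's own contexts: filter each side on the server first, materialize, and only then join on the client, so the amount of data pulled across is bounded deliberately:

// Sketch only - entity and property names come from the question's code.
// Each Where runs server-side; ToListAsync() materializes the filtered
// rows; the join below then happens in memory, which is what makes the
// cross-database combination possible.
var webUsers = await _context2.WebUsers
    .Where(u => u.Active == "Y")
    .ToListAsync();

var workOrders = await _context1.Workorder
    .Where(r => r.Status == "LAPPR" || r.Status == "APPR" || r.Status == "REC")
    .ToListAsync();

var joined = from wu in webUsers
             join wo in workOrders on wu.Laborcode equals wo.Wolablnk
             select new { wu.Laborcode, wo.Wonum };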
I am currently updating a backend project to .NET Core and am having performance issues with my LINQ queries.
Main Queries:
var queryTexts = from text in _repositoryContext.Text
                 where text.KeyName.StartsWith("ApplicationSettings.")
                 where text.Sprache.Equals("*")
                 select text;

var queryDescriptions = from text in queryTexts
                        where text.KeyName.EndsWith("$Descr")
                        select text;

var queryNames = from text in queryTexts
                 where !text.KeyName.EndsWith("$Descr")
                 select text;

var queryDefaults = from defaults in _repositoryContext.ApplicationSettingsDefaults
                    where defaults.Value != "*"
                    select defaults;
After getting these IQueryables I run a foreach loop in another context to build my DTO model:
foreach (ApplicationSettings appl in _repositoryContext.ApplicationSettings)
{
    var applDefaults = queryDefaults.Where(c => c.KeyName.Equals(appl.KeyName)).ToArray();
    description = queryDescriptions
        .Where(d => d.KeyName.Equals("ApplicationSettings." + appl.KeyName + ".$Descr"))
        .FirstOrDefault()?
        .Text1 ?? "";
    var name = queryNames
        .Where(n => n.KeyName.Equals("ApplicationSettings." + appl.KeyName))
        .FirstOrDefault()?
        .Text1 ?? "";

    // Do some stuff with data and return DTO Model
}
In my old project, this part executed in about 0.45 sec; now it takes about 5-6 sec.
I thought about using compiled queries, but I learned these don't support returning IEnumerable yet. I also tried to avoid the Contains() method, but that didn't improve performance either.
Could you take a short look at my queries and maybe refactor them or give some hints on how to make them faster?
Note that _repositoryContext.Text has by far the most entries (about 50,000) compared to the other sets, because of translations.
queryNames, queryDefaults, and queryDescriptions are all queries, not collections, and you are running them in a loop. Try loading them outside of the loop.
E.g. load queryNames into a dictionary:
var queryNames = from text in queryTexts
                 where !text.KeyName.EndsWith("$Descr")
                 select text;

var queryNamesByName = queryNames.ToDictionary(n => n.KeyName);
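Extended to all three lookups, a hedged sketch of the rewritten loop - one round trip per query up front, then in-memory lookups (key formats copied from the question):

// Sketch: materialize each query once, then look up in memory.
var namesByKey = queryNames.ToDictionary(n => n.KeyName);
var descriptionsByKey = queryDescriptions.ToDictionary(d => d.KeyName);
var defaultsByKey = queryDefaults.ToLookup(c => c.KeyName);   // a key may map to several defaults

foreach (ApplicationSettings appl in _repositoryContext.ApplicationSettings)
{
    var applDefaults = defaultsByKey[appl.KeyName].ToArray();

    descriptionsByKey.TryGetValue("ApplicationSettings." + appl.KeyName + ".$Descr", out var descrText);
    var description = descrText?.Text1 ?? "";

    namesByKey.TryGetValue("ApplicationSettings." + appl.KeyName, out var nameText);
    var name = nameText?.Text1 ?? "";

    // Do some stuff with data and return DTO model
}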
You can also write queries like the one below:
var profile = "developer";
var lstUserNames = alreadyUsed.Where(x => x.Profile == profile).ToList();
And you can use ForEach like below:
lstUserNames.ForEach(x =>
{
//do your stuff
});
Using Entity Framework, I am updating about 300 rows and 9 columns about every 30 seconds. Below is how I am currently doing it. My question is, how can I make the code more efficient?
Every once in a while, I can feel my database take the hit, and I just want to make this as efficient as possible.
// FOREACH OF MY 300 ROWS
var original = db.MarketDatas.FirstOrDefault(x => x.BBSymbol == targetBBsymbol);
if (original != null)
{
    //if (original.BBSymbol.ToUpper() == "NOH7 INDEX")
    //{
    //    var x1 = 1;
    //}
    original.last_price = marketDataItem.last_price;
    original.bid = marketDataItem.bid;
    original.ask = marketDataItem.ask;
    if (marketDataItem.px_settle_last_dt_rt != null)
    {
        original.px_settle_last_dt_rt = marketDataItem.px_settle_last_dt_rt;
    }
    if (marketDataItem.px_settle_actual_rt != 0)
    {
        original.px_settle_actual_rt = marketDataItem.px_settle_actual_rt;
    }
    original.chg_on_day = marketDataItem.chg_on_day;
    if (marketDataItem.prev_close_value_realtime != 0)
    {
        original.prev_close_value_realtime = marketDataItem.prev_close_value_realtime;
    }
    if (marketDataItem.px_settle_last_dt_rt != null)
    {
        DateTime d2 = (DateTime)marketDataItem.px_settle_last_dt_rt;
        if (d1.Day == d2.Day)
        {
            //market has settled
            original.settled = "yes";
        }
        else
        {
            //market has NOT settled
            original.settled = "no";
        }
    }
    if (marketDataItem.updateTime.Year != 1)
    {
        original.updateTime = marketDataItem.updateTime;
    }
    db.SaveChanges();
}
Watching what is being hit in the debugger...
SELECT TOP (1)
[Extent1].[MarketDataID] AS [MarketDataID],
[Extent1].[BBSymbol] AS [BBSymbol],
[Extent1].[Name] AS [Name],
[Extent1].[fut_Val_Pt] AS [fut_Val_Pt],
[Extent1].[crncy] AS [crncy],
[Extent1].[fut_tick_size] AS [fut_tick_size],
[Extent1].[fut_tick_val] AS [fut_tick_val],
[Extent1].[fut_init_spec_ml] AS [fut_init_spec_ml],
[Extent1].[last_price] AS [last_price],
[Extent1].[bid] AS [bid],
[Extent1].[ask] AS [ask],
[Extent1].[px_settle_last_dt_rt] AS [px_settle_last_dt_rt],
[Extent1].[px_settle_actual_rt] AS [px_settle_actual_rt],
[Extent1].[settled] AS [settled],
[Extent1].[chg_on_day] AS [chg_on_day],
[Extent1].[prev_close_value_realtime] AS [prev_close_value_realtime],
[Extent1].[last_tradeable_dt] AS [last_tradeable_dt],
[Extent1].[fut_notice_first] AS [fut_notice_first],
[Extent1].[updateTime] AS [updateTime]
FROM [dbo].[MarketDatas] AS [Extent1]
WHERE ([Extent1].[BBSymbol] = @p__linq__0) OR (([Extent1].[BBSymbol] IS NULL) AND (@p__linq__0 IS NULL))
It seems it updates the same thing multiple times, if I am understanding it correctly.
UPDATE [dbo].[MarketDatas]
SET [last_price] = @0, [chg_on_day] = @1, [updateTime] = @2
WHERE ([MarketDataID] = @3)

UPDATE [dbo].[MarketDatas]
SET [last_price] = @0, [chg_on_day] = @1, [updateTime] = @2
WHERE ([MarketDataID] = @3)
You can reduce this to 2 round trips.
Don't call SaveChanges() inside the loop. Move it outside and call it after you are done processing everything.
Write the select in such a way that it retrieves all the originals in one go and pushes them to a memory collection, then retrieve from that for each item you are updating/inserting.
Code:
// use this as your source
// to retrieve an item later use TryGetValue
var originals = db.MarketDatas
    .Where(x => arrayOftargetBBsymbol.Contains(x.BBSymbol))
    .ToDictionary(x => x.BBSymbol, y => y);

// iterate over changes you want to make
foreach (var change in changes)
{
    MarketData original = null;
    // is there an existing entity?
    if (originals.TryGetValue(change.targetBBsymbol, out original))
    {
        // update your original
    }
}

// save changes all at once
db.SaveChanges();
You could execute db.SaveChanges only after your foreach loop. I think that would do exactly what you are asking for.
It seems it updates the same thing multiple times, if I am
understanding it correctly.
Entity Framework performs a database round-trip for every entity to update.
Just check the parameter values - they will be different for each statement.
how can I make the code more efficient
The major problem is that your current solution is not scalable. It works well when you only have a few entities to update, but it will get worse and worse as the number of items in a batch increases.
It's often better to put this kind of logic entirely in the database, but perhaps you cannot do that.
Disclaimer: I'm the owner of the project Entity Framework Extensions
This library can make your code more efficient by allowing you to save multiples entities at once. All bulk operations are supported:
BulkSaveChanges
BulkInsert
BulkUpdate
BulkDelete
BulkMerge
BulkSynchronize
Example:
// Easy to use
context.BulkSaveChanges();

// Easy to customize
context.BulkSaveChanges(bulk => bulk.BatchSize = 100);

// Perform Bulk Operations
context.BulkDelete(customers);
context.BulkInsert(customers);
context.BulkUpdate(customers);

// Customize Primary Key
context.BulkMerge(customers, operation => {
    operation.ColumnPrimaryKeyExpression =
        customer => customer.Code;
});
I needed to eagerly fetch an entity with collections of collections, etc. so that I could process the data concurrently (a Thing entity with many YELLOWEntities that each have many SubYellowEntities). The key code needed was:
NHibernateUtil.Initialize(fetchedThings);
(See accepted answer below.)
There are several problems with the current approach. Let me give you a fresh look at the problem.
You are, I understand, looking to retrieve Things, which have many-to-one BlueEntities, and a collection of YellowEntities, which in turn have a collection of SubYellowEntities.
Instead of using join fetch, transformers and loops, you should look into batch fetching of entities and collections.
With a good batch size set for BlueEntity, Thing.YellowEntities and YellowEntity.SubYellowEntities, you should not need TPL stuff at all.
If it turns out you do, you won't be able to return safe-to-use entities, because they have to be associated with a single session to be useful, and sessions are not thread-safe.
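For illustration, a hedged mapping-by-code sketch of what a collection batch size could look like (class, property, and column names follow this question's Thing/YellowEntity example and are otherwise assumptions); the default_batch_fetch_size configuration property can set a factory-wide default instead:

// Sketch only: with BatchSize(25), NHibernate initializes the pending
// YELLOWEntities collections of up to 25 Things in one SELECT instead
// of issuing one SELECT per Thing.
public class ThingMap : NHibernate.Mapping.ByCode.Conformist.ClassMapping<Thing>
{
    public ThingMap()
    {
        Id(x => x.Id, m => m.Generator(NHibernate.Mapping.ByCode.Generators.Native));
        ManyToOne(x => x.BlueEntity, m => m.Column("BlueEntityId"));   // assumed column
        Bag(x => x.YELLOWEntities,
            m =>
            {
                m.Key(k => k.Column("ThingId"));   // assumed FK column
                m.BatchSize(25);
                m.Inverse(true);
            },
            r => r.OneToMany());
    }
}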
Solution (if you can refine this, go ahead and post the revision as the answer): runs in 17 seconds (down from 97) on 8 processors with correct results. Please remember that the vast majority of time was spent in calculations, not in waiting for NHibernate to return.
var fetchedThings = new List<Thing>();
var sessFT = Session.SessionFactory.OpenSession();
Task getThingsTask = new Task(() =>
{
    fetchedThings = sessFT.CreateQuery(@" from Thing t " +
                                        " inner join fetch t.BlueEntity ")
        .SetResultTransformer(Transformers.DistinctRootEntity)
        .List<Thing>();
    NHibernateUtil.Initialize(fetchedThings);
});
getThingsTask.Start();

var fetchedYellows = new List<YELLOWEntity>();
var sessFY = Session.SessionFactory.OpenSession();
Task getYellowsTask = new Task(() =>
{
    fetchedYellows = sessFY.CreateQuery(@" from YELLOWEntity y " +
                                         " left join fetch y.SubYellowEntities " + ... )
        .SetResultTransformer(Transformers.DistinctRootEntity)
        .List<YELLOWEntity>();
    NHibernateUtil.Initialize(fetchedYellows);
});
getYellowsTask.Start();

getThingsTask.Wait();
getYellowsTask.Wait();

Parallel.ForEach(fetchedThings, thing =>
{
    thing.YELLOWEntities = fetchedYellows.Where(y => y.Thing.Id == thing.Id).ToList();
    ...
    //...inner foreach() loop through 'YELLOWthings'... doing threadsafe stuff
    ...
});
//dispose the temp sessions
sessFT.Dispose();
sessFY.Dispose();
I know variants of this question have been asked before (even by me), but I still don't understand a thing or two about this...
It was my understanding that one could retrieve more documents than the 128 default setting by doing this:
session.Advanced.MaxNumberOfRequestsPerSession = int.MaxValue;
And I've learned that a WHERE clause should be an ExpressionTree instead of a Func, so that it's treated as Queryable instead of Enumerable. So I thought this should work:
public static List<T> GetObjectList<T>(Expression<Func<T, bool>> whereClause)
{
    using (IDocumentSession session = GetRavenSession())
    {
        return session.Query<T>().Where(whereClause).ToList();
    }
}
However, that only returns 128 documents. Why?
Note, here is the code that calls the above method:
RavenDataAccessComponent.GetObjectList<Ccm>(x => x.TimeStamp > lastReadTime);
If I add Take(n), then I can get as many documents as I like. For example, this returns 200 documents:
return session.Query<T>().Where(whereClause).Take(200).ToList();
Based on all of this, it would seem that the appropriate way to retrieve thousands of documents is to set MaxNumberOfRequestsPerSession and use Take() in the query. Is that right? If not, how should it be done?
For my app, I need to retrieve thousands of documents (that have very little data in them). We keep these documents in memory and use them as the data source for charts.
** EDIT **
I tried using int.MaxValue in my Take():
return session.Query<T>().Where(whereClause).Take(int.MaxValue).ToList();
And that returns 1024. Argh. How do I get more than 1024?
** EDIT 2 - Sample document showing data **
{
    "Header_ID": 3525880,
    "Sub_ID": "120403261139",
    "TimeStamp": "2012-04-05T15:14:13.9870000",
    "Equipment_ID": "PBG11A-CCM",
    "AverageAbsorber1": "284.451",
    "AverageAbsorber2": "108.442",
    "AverageAbsorber3": "886.523",
    "AverageAbsorber4": "176.773"
}
It is worth noting that since version 2.5, RavenDB has an "unbounded results API" to allow streaming. The example from the docs shows how to use this:
var query = session.Query<User>("Users/ByActive").Where(x => x.Active);

using (var enumerator = session.Advanced.Stream(query))
{
    while (enumerator.MoveNext())
    {
        User activeUser = enumerator.Current.Document;
    }
}
There is support for standard RavenDB queries and Lucene queries, and there is also async support.
The documentation can be found here. Ayende's introductory blog article can be found here.
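For the async case, the shape is roughly as follows (a hedged sketch; the exact enumerator API differs slightly between client versions):

// Sketch of the async streaming variant mentioned above.
using (var session = store.OpenAsyncSession())
{
    var query = session.Query<User>("Users/ByActive").Where(x => x.Active);
    using (var enumerator = await session.Advanced.StreamAsync(query))
    {
        while (await enumerator.MoveNextAsync())
        {
            User activeUser = enumerator.Current.Document;
        }
    }
}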
The Take(n) function will only give you up to 1024 by default. However, you can change this default in Raven.Server.exe.config:
<add key="Raven/MaxPageSize" value="5000"/>
For more info, see: http://ravendb.net/docs/intro/safe-by-default
The Take(n) function will only give you up to 1024 by default. However, you can pair it with Skip(n) to get them all:
var points = new List<T>();
var nextGroupOfPoints = new List<T>();
const int ElementTakeCount = 1024;
int i = 0;
int skipResults = 0;
RavenQueryStatistics stats;

do
{
    nextGroupOfPoints = session.Query<T>()
        .Statistics(out stats)
        .Where(whereClause)
        .Skip(i * ElementTakeCount + skipResults)
        .Take(ElementTakeCount)
        .ToList();
    i++;
    skipResults += stats.SkippedResults;
    points = points.Concat(nextGroupOfPoints).ToList();
}
while (nextGroupOfPoints.Count == ElementTakeCount);

return points;
RavenDB Paging
The number of requests per session is a separate concept from the number of documents retrieved per call. Sessions are short-lived and are expected to have only a few calls issued over them.
If you are getting more than 10 of anything from the store (even less than the default 128) for human consumption, then something is wrong, or your problem requires different thinking than hauling a truckload of documents out of the data store.
RavenDB indexing is quite sophisticated. Good article about indexing here and facets here.
If you have need to perform data aggregation, create map/reduce index which results in aggregated data e.g.:
Index:
// Map
from post in docs.Posts
select new { post.Author, Count = 1 }

// Reduce
from result in results
group result by result.Author into g
select new
{
    Author = g.Key,
    Count = g.Sum(x => x.Count)
}
Query:
session.Query<AuthorPostStats>("Posts/ByUser/Count").Where(x => x.Author == author).ToList();
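The same index can also be defined from code; a hedged sketch (the Post document class and AuthorPostStats DTO are assumed from the snippet above):

// Sketch: code-first definition of the Posts/ByUser/Count map/reduce index.
public class Posts_ByUser_Count : AbstractIndexCreationTask<Post, AuthorPostStats>
{
    public Posts_ByUser_Count()
    {
        Map = posts => from post in posts
                       select new { post.Author, Count = 1 };

        Reduce = results => from result in results
                            group result by result.Author into g
                            select new
                            {
                                Author = g.Key,
                                Count = g.Sum(x => x.Count)
                            };
    }
}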
You can also use a predefined index with the Stream method. You may use a Where clause on indexed fields.
var query = session.Query<User, MyUserIndex>();
// or, filtering on indexed fields:
query = session.Query<User, MyUserIndex>().Where(x => !x.IsDeleted);

using (var enumerator = session.Advanced.Stream<User>(query))
{
    while (enumerator.MoveNext())
    {
        var user = enumerator.Current.Document;
        // do something
    }
}
Example index:
public class MyUserIndex : AbstractIndexCreationTask<User>
{
    public MyUserIndex()
    {
        this.Map = users =>
            from u in users
            select new
            {
                u.IsDeleted,
                u.Username,
            };
    }
}
Documentation: What are indexes?
Session : Querying : How to stream query results?
Important note: the Stream method will NOT track objects. If you change objects obtained from this method, SaveChanges() will not be aware of any change.
Other note: you may get the following exception if you do not specify the index to use.
InvalidOperationException: StreamQuery does not support querying dynamic indexes. It is designed to be used with large data-sets and is unlikely to return all data-set after 15 sec of indexing, like Query() does.