Statistical query in SQL - is this possible with NHibernate LINQ?

I have an application that uses a few data warehousing principles such as dimensional modeling to do reporting on a fairly simple database.
An example (simplified) entity named Call looks like this:
public virtual long Id { get; set; }
public virtual string OriginatorNumber { get; set; }
public virtual string DestinationNumber { get; set; }
public virtual DateDimension DateDimension { get; set; }
A few of the properties of the real model have been removed as they are irrelevant. The simplified DateDimension looks like this:
public virtual long Id { get; set; }
public virtual DateTime Date { get; set; }
public virtual int DayOfMonth { get; set; }
public virtual int Weekday { get; set; }
There are a LOT more columns like this - they are prepopulated for the current decade by application setup. So each date in the entire decade has a row in this table, and each Call has a link to the date on which it occurred. This is all mapped in Fluent NHibernate and working fine.
If I want to do some reporting, I can do this easily with the improved NHibernate LINQ provider in 3.0. We would like to use LINQ for the improved maintainability it gives us, but if we really MUST, we'll consider HQL, ICriteria or even plain SQL.
So say I want to build a report that shows the number of calls from a certain number, divided by the day of the week they occur. I can do that easily this way:
var query = Calls
.Where(c => c.OriginatorNumber == "402")
.GroupBy(c => c.DateDimension.Weekday)
.Select(g => new { Day = g.Key, Calls = g.Count() } );
In this example, "Calls" is basically an IQueryable returned from NHibernate's LINQ provider (Query) through a repository interface. The query above gives me the correct results, and NHibernate Profiler shows me that the SQL is pretty optimal, so all is well.
However, if I want to do something slightly more advanced, I get stuck. Say I want the average number of calls per weekday. Not too far from the above, right? I just need to figure out the number of unique dates each weekday has in the result set, divide the total number of calls by it, and we're all set - right? Well, no, this is where I start to hit the limitations of the NHibernate LINQ provider. With LINQ to objects I could construct a query to do it - something along the lines of
.Select(g => g.Count() / g.GroupBy(c => c.DateDimension.Date).Count());
However, this does not convert into the correct query when using it in NHibernate. Rather, it turns both .Count() calls in the above to the same count(*) of call records, so the result is always 1.
I COULD of course just query for each call, weekday and date as a new anonymous object, then do the math on the application side, but according to conventional wisdom, That's Just Wrong (tm). I could end up doing it in desperation, though, even though it means pain when the table grows to a million++ calls.
The below is an SQL query that gives me the result I am looking for.
select ss.Weekday, AVG(cast(ss.Count as decimal))
from
(
select dd.Weekday, dd.Date, COUNT(*) as Count
from Call c
left outer join DateDimension dd
on c.DateDimension_id = dd.Id
where c.OriginatorNumber = '402'
group by dd.Weekday, dd.Date
) ss
group by ss.Weekday
order by ss.Weekday
Is it possible to do this with the NHibernate LINQ provider? Or, if that is not possible, how close can I get before I have to let the application fetch the intermediary result and do the rest?

There are a lot of things you can't do with the LINQ provider. Using HQL or CreateCriteria is just something you'll have to accept with NHibernate.
I haven't tried it, but it looks like you should be able to do what you want to do using HQL or CreateCriteria (with DetachedCriteria).
If you are desperate you can also fall back to plain SQL using CreateSQLQuery.
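If the inner grouping (one row per weekday and date) can be pushed to the database through any of these APIs, only the final averaging is left for the application. The following is a minimal sketch of that last step using LINQ to Objects. The names DailyCallCount and WeekdayReport are hypothetical, and whether the NHibernate LINQ provider can produce the per-date rows from a composite GroupBy key is an assumption that has not been verified here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row type: one row per (weekday, date), i.e. the shape
// returned by the inner SELECT of the SQL in the question.
public class DailyCallCount
{
    public int Weekday { get; set; }
    public DateTime Date { get; set; }
    public int Calls { get; set; }
}

public static class WeekdayReport
{
    // Averages the per-date call counts within each weekday,
    // mirroring the outer SELECT ... AVG(...) GROUP BY Weekday.
    public static List<KeyValuePair<int, decimal>> AverageCallsPerWeekday(
        IEnumerable<DailyCallCount> dailyCounts)
    {
        return dailyCounts
            .GroupBy(d => d.Weekday)
            .OrderBy(g => g.Key)
            .Select(g => new KeyValuePair<int, decimal>(
                g.Key, g.Average(d => (decimal)d.Calls)))
            .ToList();
    }
}
```

The intermediate result has at most one row per distinct date, so it stays small even when the Call table grows to millions of rows.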

Related

EF Core generates too many queries for nested data

I have a simple class to represent a tree structure, defined like this:
public class LicenceCategory
{
[Key]
[Column("LicenceCategoryID")]
public Guid ID { get; set; }
public string Name { get; set; }
public Guid? ParentLicenceCategoryID { get; set; }
[ForeignKey("ParentLicenceCategoryID")]
public virtual List<LicenceCategory> Categories { get; set; }
}
Then, from an ASP.NET Core controller, I simply return myContext.LicenceCategory, which is about as simple as it gets.
Right now, there are five records on the database: the parent (null ParentLicenceCategoryID), and four children for that one parent. So no massive volumes and no very deep nesting. This is the SQL that gets generated, and is as I expect:
SELECT [obj].[LicenceCategoryID], [obj].[Name], [obj].[ParentLicenceCategoryID]
FROM [LicenceCategory] AS [obj]
However, it also generates this, five times:
SELECT [e].[LicenceCategoryID], [e].[Name], [e].[ParentLicenceCategoryID]
FROM [LicenceCategory] AS [e]
WHERE [e].[ParentLicenceCategoryID] = @__get_Item_0
Notice how the first statement already contains every field you need to build the tree structure client-side. Why on earth even do the extra select statements?
I noticed that if I Include navigation properties, things get much worse: For three navigation properties, I wound up with 21 select statements! Most of which are just the same statement executed again and again and again. It may do so with different parameters perhaps, but there is hardly any way to make a program any less efficient. And these are five records - what will EF do when I throw our millions of transaction records its way?
Is there a way to prevent this kind of code generation, or is EF Core simply a non-starter?
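To illustrate the point that the first, flat SELECT already carries everything needed to assemble the tree client-side, here is a minimal sketch that links rows by their parent ID in memory. CategoryNode and CategoryTree are hypothetical stand-ins for illustration, not EF Core types, and this says nothing about how EF Core itself materializes the navigation property.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical plain row type holding the three selected columns.
public class CategoryNode
{
    public Guid ID { get; set; }
    public string Name { get; set; }
    public Guid? ParentID { get; set; }
    public List<CategoryNode> Children { get; } = new List<CategoryNode>();
}

public static class CategoryTree
{
    // Links every row to its parent in one pass and returns the root nodes.
    public static List<CategoryNode> Build(IEnumerable<CategoryNode> flatRows)
    {
        var all = flatRows.ToList();
        var byId = all.ToDictionary(n => n.ID);
        var roots = new List<CategoryNode>();

        foreach (var node in all)
        {
            CategoryNode parent;
            if (node.ParentID.HasValue && byId.TryGetValue(node.ParentID.Value, out parent))
                parent.Children.Add(node);
            else
                roots.Add(node); // no parent, or parent not in the result set
        }
        return roots;
    }
}
```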

Repository pattern, ViewModel and ORMs

With Repository pattern and ViewModels, how do you build queries against the database if you don't want the raw database objects to leak outside the repository? How do I actually create queries without loading ALL the database in memory and using LINQ to Objects? I can't expose IQueryable to the rest of the app.
For example, with EF I have a bunch of POCOs with several properties that match db fields, but also some stuff to work around enums not being directly supported (for now), as well as foreign key IDs to prevent N+1 and allow easier querying, and so on. I don't want them to leak out to the rest of the application; I want the application to just see a normal object graph.
public class DbUser
{
public int Id { get; set; }
public string Name { get; set; }
public int GroupId { get; set; }
public DbGroup Group { get; set; }
public ICollection<DbComment> Comments { get; set; }
}
public class User
{
public int Id { get; set; }
public string Name { get; set; }
public Group Group { get; set; }
public ICollection<Comment> Comments { get; set; }
}
The problem here is my repository will internally use EF for the querying (and in-memory stuff when unit testing). But how do I implement IQueryable<User> FindAll()? I can't just do return dbContext.Users.Select(u => new User(u)), as in that case I lose all possible query ability; it'll just load the whole user collection in memory, convert all the types to User from DbUser and then build LINQ queries on the in-memory collection - that is horribly inefficient.
I can't just build queries in the repository. On some pages I have queries that select a few fields, but also calculate some complex stuff from other related objects, filter them based on the result (for example count of comments with positive score), but I also need that back in the application. I could select all objects used to get the complex stuff and return them to the application (but not as db entities) but that would mean select a LOT of data.
Basically how do I prevent the database entities from polluting the rest of the application with their cruft and hacks, while still maintaining the ability to build queries outside of the repository?
CQRS (Command Query Responsibility Segregation) solves this problem. You have the 'real' model, the Domain model, with all the business rules, and a 'query-only' model, which is basically a simple POCO (one that can be used directly by Views) returned by a specialised query-only repository.
The persistence model (the EF entities) is used only to 'talk' with the db; the repos always return or deal with domain/application objects. Basically, you have to map the EF entities to the Domain ones (and vice versa when saving). In this way, you'll have separate models, each with its own purpose.
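A query-only repository along these lines might look like the sketch below. The names UserSummary and UserQueries, and the Score property on DbComment, are hypothetical. The code runs against any IQueryable<DbUser>, here an in-memory one; whether a given EF version translates the whole projection and filter to SQL is an assumption to check against the generated queries.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Persistence-side entities (simplified from the question).
public class DbGroup { public int Id { get; set; } public string Name { get; set; } }
public class DbComment { public int Score { get; set; } }
public class DbUser
{
    public int Id { get; set; }
    public string Name { get; set; }
    public DbGroup Group { get; set; }
    public ICollection<DbComment> Comments { get; set; }
}

// Hypothetical query-only model: a flat POCO shaped for one view.
public class UserSummary
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string GroupName { get; set; }
    public int PositiveComments { get; set; }
}

// Hypothetical query-only repository: the query is composed here,
// and callers only ever see UserSummary, never DbUser or IQueryable.
public class UserQueries
{
    private readonly IQueryable<DbUser> _users;

    public UserQueries(IQueryable<DbUser> users) { _users = users; }

    public List<UserSummary> WithPositiveComments(int minimum)
    {
        return _users
            .Select(u => new UserSummary
            {
                Id = u.Id,
                Name = u.Name,
                GroupName = u.Group.Name,
                PositiveComments = u.Comments.Count(c => c.Score > 0)
            })
            .Where(s => s.PositiveComments >= minimum)
            .OrderBy(s => s.Id)
            .ToList();
    }
}
```

Each page that needs a different shape gets its own query method and read model, so the complex calculations stay inside the repository while the rest of the application receives plain objects.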

What's the difference in how Nhibernate treats .FirstOrDefault vs .SingleOrDefault? possible bug?

Just starting out with NHibernate and using Nhib 3.0's (3.0.0.2001) Linq with following models
public class Request
{
public virtual Guid Id { get; set; }
public virtual State State { get; set; }
}
public class State
{
public virtual Guid Id {get;set;}
}
So I'm just trying to retrieve a Request based on its State Id.
_session.Query<Request>().Where(x => x.State.Id==someGuidValue).FirstOrDefault();
Seems pretty straightforward, but this gets a SQL error based on the generated SQL, where it looks like the @p0 parameter is missing, though I'm not sure why it would be included here.
{"Line 1: Incorrect syntax near '('."}
select TOP (@p0) requ0_.Id as Id0_
, requ0_.State_id as State8_0_
from [Request] requ0_ where requ0_.State_id=@p1 ]
Name:p1 - Value:a2e63925-6628-4786-a621-9e5200d5ab71
However, using SingleOrDefault works, just fine.
_session.Query<Request>().Where(x => x.State.Id==someGuidValue).SingleOrDefault();
Any insight would be appreciated. Thanks
I believe SingleOrDefault will throw an error when more than one record meets your select criteria, whereas FirstOrDefault will just pull the first regardless of how many records are returned.
That's how it works in LINQ-To-SQL
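The difference described here can be seen in plain LINQ to Objects, as in the minimal sketch below (FirstVersusSingle is a hypothetical helper name). It only illustrates the semantics of the two operators; it does not explain the TOP (@p0) syntax error in the generated SQL.

```csharp
using System;
using System.Linq;

public static class FirstVersusSingle
{
    // Reports what each operator does for a given set of matches.
    public static string Describe(int[] matches)
    {
        // FirstOrDefault: first element, or default(int) when empty.
        int first = matches.FirstOrDefault();
        try
        {
            // SingleOrDefault: the only element, default(int) when empty,
            // InvalidOperationException when there is more than one.
            int single = matches.SingleOrDefault();
            return "first=" + first + ", single=" + single;
        }
        catch (InvalidOperationException)
        {
            return "first=" + first + ", single threw";
        }
    }
}
```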

How do I express this LINQ query using the NHibernate ICriteria API?

My current project is using NHibernate 3.0b1 and the NHibernate.Linq.Query<T>() API. I'm pretty fluent in LINQ, but I have absolutely no experience with HQL or the ICriteria API. One of my queries isn't supported by the IQueryable API, so I presume I need to use one of the previous APIs -- but I have no idea where to start.
I've tried searching the web for a good "getting started" guide to ICriteria, but the only examples I've found are either far too simplistic to apply here or far too advanced for me to understand. If anyone has some good learning materials to pass along, it would be greatly appreciated.
In any case, the object model I'm querying against looks like this (greatly simplified, non-relevant properties omitted):
class Ticket {
IEnumerable<TicketAction> Actions { get; set; }
}
abstract class TicketAction {
Person TakenBy { get; set; }
DateTime Timestamp { get; set; }
}
class CreateAction : TicketAction {}
class Person {
string Name { get; set; }
}
A Ticket has a collection of TicketAction describing its history. TicketAction subtypes include CreateAction, ReassignAction, CloseAction, etc. All tickets have a CreateAction added to this collection when created.
This LINQ query is searching for tickets created by someone with the given name.
var createdByName = "john".ToUpper();
var tickets = _session.Query<Ticket>()
.Where(t => t.Actions
.OfType<CreateAction>()
.Any(a => a.TakenBy.Name.ToUpper().Contains(createdByName)));
The OfType<T>() method causes a NotSupportedException to be thrown. Can I do this using ICriteria instead?
Try something like this. It's uncompiled, but it should work as long as IEnumerable<TicketAction> Actions and Person TakenBy are never null. If you set Actions to an empty list in the Ticket constructor, that will solve the problem with nulls.
If you add a reference to the Ticket object in the TicketAction, you could do something like this:
ICriteria criteria = _session.CreateCriteria(typeof(CreateAction))
.CreateAlias("TakenBy", "person") // alias the association so its Name property can be filtered on
.Add(Expression.Eq("person.Name", createdByName));
var actions = criteria.List<CreateAction>();
var results = from a in actions
select a.Ticket;
In my experience, NHibernate has trouble with criteria when it comes to lists when the list is on the object side, as in your case. When it is a list of values on the input side, you can use Expression.Eq. I've always had to find ways around this limitation through LINQ, where I get an initial result set filtered down as best as I can, then filter again with LINQ to get what I need.
OfType is supported. I'm not sure ToUpper is, though, but since string comparison in SQL is case-insensitive under many default collations, it should not matter (as long as you are not also running the query in memory...). Here is a working unit test from the NHibernate.Linq project:
var animals = (from animal in session.Linq<Animal>()
where animal.Children.OfType<Mammal>().Any(m => m.Pregnant)
select animal).ToArray();
Assert.AreEqual("789", animals.Single().SerialNumber);
Perhaps your query should look more like the following:
var animals = (from ticket in session.Linq<Ticket>()
where ticket.Actions.OfType<CreateAction>().Any(m => m.TakenBy.Name.Contains("john"))
select ticket).ToArray();

NHibernate insert/lookup performance

I have several XML files and each file contains data of ‘root objects’ which I parse using Linq to XML and then create actual root objects which I persist using NHibernate and the sharp architecture repository. I have started to optimise the data insert and manage to add 30000 objects in about 1 hour and 40 minutes to the database. However, this is still too slow.
I think one bottleneck is the lookup of objects in the database, which requires IO. Objects have to be looked up for reuse.
The root object has several authors:
public virtual IList<Author> Authors { get; set; }
Authors have this structure:
public class Author : Entity
{
public virtual Initials Initials { get; set; }
public virtual ForeName ForeName { get; set; }
public virtual LastName LastName { get; set; }
}
I have achieved a great speed up by using a typed Id (something I wouldn't normally do):
public class LastName : EntityWithTypedId<string>, IHasAssignedId<string>
{
public LastName()
{
}
public LastName(string Id)
{
SetAssignedIdTo(Id);
}
public virtual void SetAssignedIdTo(string assignedId)
{
Id = assignedId;
}
}
Which I look up (and potentially create) like this:
LastName LastName = LastNameRepository.Get(TLastName);
if (LastName == null)
{
LastName = LastNameRepository.Save(new LastName(TLastName));
LastNameRepository.DbContext.CommitChanges();
}
Author.LastName = LastName;
I am looking authors up like this:
propertyValues = new Dictionary<string, object>();
propertyValues.Add("Initials", Author.Initials);
propertyValues.Add("ForeName", Author.ForeName);
propertyValues.Add("LastName", Author.LastName);
Author TAuthor = AuthorRepository.FindOne(propertyValues);
if (TAuthor == null)
{
AuthorRepository.SaveOrUpdate(Author);
AuthorRepository.DbContext.CommitChanges();
Root.Authors.Add(Author);
}
else
{
Root.Authors.Add(TAuthor);
}
Can I improve this? Should I use stored procedures/HQL/pure SQL/ICriteria instead to perform the lookup? Could I use some form of caching to speed up the lookup and reduce IO? The CommitChanges seems to be necessary or should I wrap everything into a transaction?
I already flush my session etc. every 10 root objects.
Any feedback would be very much welcome. Many thanks in advance.
Best wishes,
Christian
In all honesty I would say that you shouldn't even be using SA/NHibernate for something like this. It's a bulk data import from XML - an ETL tool like SSIS would be a better choice. Even a hand-cranked process on the DB server would work better - step 1, load XML to a table, step 2, do the UPSERT. Incidentally, SQL Server 2008 introduced the MERGE command for UPSERT operations, which might be of use.
I would also agree with Dan's comment - is it really necessary to treat initials, forename and surname as separate entities? Treating them as simple strings would boost performance. What in your domain model specifies that they are entities in their own right?
If you really must continue using SA/NHibernate, have a read of this:
http://www.lostechies.com/blogs/jimmy_bogard/archive/2010/06/24/bulk-processing-with-nhibernate.aspx
The suggestion in Jimmy's blog about batching SELECTs should help quite a lot. If you plan to process a batch of 250 records at once, do all the SELECTs as a single NH command, process all the data, then do all the updates as another single batch (which I believe your use of EntityWithTypedId and the adonet.batch_size config setting will help achieve)
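The batching idea can be sketched independently of NHibernate, as below. BatchLookup, fetchExisting and create are hypothetical names: fetchExisting stands in for ONE database query that returns all already-stored entities for a set of keys (for example an "IN (...)" query), and the collected toInsert list would then be saved as one batch. This is a rough outline under those assumptions, not the approach from the linked article.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BatchLookup
{
    // Resolves every key in the batch to an entity using a single lookup call,
    // creating (but not yet saving) entities for keys that were not found.
    public static IDictionary<string, T> Resolve<T>(
        IEnumerable<string> keys,
        Func<ICollection<string>, IDictionary<string, T>> fetchExisting, // hypothetical: one SELECT
        Func<string, T> create,                                          // hypothetical: builds a new entity
        ICollection<T> toInsert)                                         // collects entities to save as one batch
    {
        var distinctKeys = keys.Distinct().ToList();
        var resolved = new Dictionary<string, T>(fetchExisting(distinctKeys));

        foreach (var key in distinctKeys)
        {
            if (!resolved.ContainsKey(key))
            {
                var created = create(key);
                resolved[key] = created;
                toInsert.Add(created);
            }
        }
        return resolved;
    }
}
```

With a batch of 250 root objects this replaces one lookup and one commit per name with one lookup and one insert batch per batch of names.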
Finally - regarding the statement "which I parse using Linq to XML" - is that really the best way of doing it? I'm guessing that it might be, given the size of your input file, but are you aware of the approach of simply deserializing the XML file into an object graph? SO won't let me post the link to a page describing this, because I haven't earned enough reputation yet - but if you want to read up on it, Google "don't parse that xml" and the first article will explain it.
Hope this helps.
Jon
The first thing I would do is simplify the Author entity, as I don't think you need the Initials, ForeName, and LastName objects as separate entities. I think using plain strings would be more efficient:
public class Author : Entity
{
public virtual string Initials { get; set; }
public virtual string ForeName { get; set; }
public virtual string LastName { get; set; }
}