SQL to Entity Framework Count Group-By

I need to translate this SQL statement to a LINQ to Entities query:
SELECT name, count(name) FROM people
GROUP by name

Query syntax
var query = from p in context.People
            group p by p.name into g
            select new
            {
                name = g.Key,
                count = g.Count()
            };
Method syntax
var query = context.People
    .GroupBy(p => p.name)
    .Select(g => new { name = g.Key, count = g.Count() });
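To see what either query returns without a database, the same shape can be run with LINQ to Objects. This is only a minimal sketch: the Person class and the sample names are hypothetical stand-ins for the People entity, and it runs in memory rather than being translated to SQL by EF.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupCountSketch
{
    // Hypothetical stand-in for the People entity; only the name column matters here.
    public sealed class Person
    {
        public string name { get; set; } = "";
    }

    // Same shape as the method-syntax query, but over an in-memory sequence.
    public static List<KeyValuePair<string, int>> CountByName(IEnumerable<Person> people)
    {
        return people
            .GroupBy(p => p.name)
            .Select(g => new KeyValuePair<string, int>(g.Key, g.Count()))
            .ToList();
    }

    public static void Main()
    {
        var people = new List<Person>
        {
            new Person { name = "Ann" },
            new Person { name = "Bob" },
            new Person { name = "Ann" },
        };

        // One line per distinct name, with the number of rows that share it.
        foreach (var row in CountByName(people))
        {
            Console.WriteLine($"{row.Key}: {row.Value}");
        }
    }
}
```

With the sample data this gives one row per distinct name, the equivalent of SELECT name, count(name) ... GROUP BY name.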

Edit: EF Core 2.1 finally supports GroupBy
But always look out in the console / log for messages. If you see a notification that your query could not be converted to SQL and will be evaluated locally then you may need to rewrite it.
Entity Framework 7 (since renamed Entity Framework Core 1.0 / 2.0) does not yet translate GroupBy() to GROUP BY in the generated SQL, not even in the final 1.0 release. Any grouping logic runs on the client side, which can cause a lot of data to be loaded.
Eventually code written like this will automagically start using GROUP BY, but for now you need to be very cautious if loading your whole un-grouped dataset into memory will cause performance issues.
For scenarios where this is a deal-breaker you will have to write the SQL by hand and execute it through EF.
If in doubt, fire up SQL Profiler and see what is generated, which you should probably be doing anyway.
https://blogs.msdn.microsoft.com/dotnet/2016/05/16/announcing-entity-framework-core-rc2

A useful extension is to collect the results in a Dictionary for fast lookup (e.g. in a loop):
var resultDict = _dbContext.Projects
    .Where(p => p.Status == ProjectStatus.Active)
    .GroupBy(f => f.Country)
    .Select(g => new { country = g.Key, count = g.Count() })
    .ToDictionary(k => k.country, i => i.count);
Originally found here:
http://www.snippetsource.net/Snippet/140/groupby-and-count-with-ef-in-c
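One thing to keep in mind when reading from such a dictionary in a loop: a country with no active projects has no entry at all, so the indexer would throw. A minimal in-memory sketch of a safe lookup follows; the Project class, the ProjectStatus values, and the CountFor helper are assumptions made for illustration, not part of the original snippet.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CountLookupSketch
{
    // Hypothetical stand-ins for the entity and enum used in the query above.
    public enum ProjectStatus { Active, Closed }

    public sealed class Project
    {
        public string Country { get; set; } = "";
        public ProjectStatus Status { get; set; }
    }

    // Builds the country -> count dictionary from an in-memory sequence.
    public static Dictionary<string, int> ActiveCountsByCountry(IEnumerable<Project> projects)
    {
        return projects
            .Where(p => p.Status == ProjectStatus.Active)
            .GroupBy(p => p.Country)
            .ToDictionary(g => g.Key, g => g.Count());
    }

    // Returns 0 for countries that have no entry instead of throwing KeyNotFoundException.
    public static int CountFor(Dictionary<string, int> counts, string country)
    {
        return counts.TryGetValue(country, out var count) ? count : 0;
    }

    public static void Main()
    {
        var projects = new List<Project>
        {
            new Project { Country = "DE", Status = ProjectStatus.Active },
            new Project { Country = "DE", Status = ProjectStatus.Active },
            new Project { Country = "FR", Status = ProjectStatus.Closed },
        };

        var counts = ActiveCountsByCountry(projects);
        foreach (var country in new[] { "DE", "FR", "IT" })
        {
            Console.WriteLine($"{country}: {CountFor(counts, country)}");
        }
    }
}
```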

Here are simple examples of group-by in .NET Core 2.1:
var query = this.DbContext.Notifications
    .Where(n => n.Sent == false)
    .GroupBy(n => new { n.AppUserId })
    .Select(g => new { AppUserId = g.Key.AppUserId, Count = g.Count() });
var query2 = from n in this.DbContext.Notifications
             where n.Sent == false
             group n by n.AppUserId into g
             select new { id = g.Key, Count = g.Count() };
Both of these translate to:
SELECT [n].[AppUserId], COUNT(*) AS [Count]
FROM [Notifications] AS [n]
WHERE [n].[Sent] = 0
GROUP BY [n].[AppUserId]

With EF 6.2, this worked for me:
var query = context.People
    .GroupBy(p => new { p.name })
    .Select(g => new { name = g.Key.name, count = g.Count() });
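Grouping by an anonymous type makes g.Key an object with one property per grouped column, which is why the projection reads g.Key.name rather than g.Key. The same pattern extends to more than one column. Below is an in-memory sketch under assumed names: the Person class and its city property are hypothetical, and it shows LINQ to Objects behavior only, not what EF generates.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CompositeKeySketch
{
    // Hypothetical entity; the city column is invented for this illustration.
    public sealed class Person
    {
        public string name { get; set; } = "";
        public string city { get; set; } = "";
    }

    // Groups by two columns; anonymous-type keys compare by value, property by property.
    public static List<(string Name, string City, int Count)> CountByNameAndCity(IEnumerable<Person> people)
    {
        return people
            .GroupBy(p => new { p.name, p.city })
            .Select(g => (Name: g.Key.name, City: g.Key.city, Count: g.Count()))
            .ToList();
    }

    public static void Main()
    {
        var people = new List<Person>
        {
            new Person { name = "Ann", city = "Oslo" },
            new Person { name = "Ann", city = "Oslo" },
            new Person { name = "Ann", city = "Rome" },
        };

        foreach (var row in CountByNameAndCity(people))
        {
            Console.WriteLine($"{row.Name} / {row.City}: {row.Count}");
        }
    }
}
```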

Related

Selecting an object from the GroupBy key

I'm accustomed to GroupBy() being more of an art than a science, but maybe someone can help me with a very specific problem:
Given the following code
var results = session.Query<MyClass>()
    .GroupBy(c => c.OtherPersistentObject)
    .Select(group => new
    {
        key = group.Key,
        count = group.Count()
    })
    .ToList();
The generated query comes out like this:
/* [expression] */select
otherclass_.ID as col_0_0_,
cast(count(*) as INT) as col_1_0_,
otherclass_.ID as id1_1_,
otherclass_.START_DATE as start2_1_,
otherclass_.END_DATE as end3_1_,
otherclass_.Zone as zone9_1_
from
mytable mytable0_
left outer join
otherclass otherclass_
on mytable0_.otherID=otherclass_.ID
group by
mytable0_.otherID
which gives me the SQL error "Column 'otherclass_.ID' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause".
Is there a way to get the Select to do what I want?
TIA
It's a known NHibernate issue NH-3027.
As a workaround you can use the last approach described in this answer (rewrite the GroupBy part as a sub-query). Your query can then be rewritten to something like:
var results = session.Query<MyClass>()
    .Where(c => c == session.Query<MyClass>().First(cs => cs.OtherPersistentObject == c.OtherPersistentObject))
    .Select(x => new
    {
        key = x.OtherPersistentObject,
        count = session.Query<MyClass>().Count(cs => cs.OtherPersistentObject == x.OtherPersistentObject)
    }).ToList();
Try this:
var results = session
    .Query<MyClass>()
    .GroupBy(c => c.OtherPersistentObject)
    .Select(group => new
    {
        key = group.Key.Id,
        count = group.Count()
    })
    .ToList();
Here you can find the reason for the error.

Combine two queries in linq

I have two LINQ queries as follows:
GroupNamesWithCorrespondingEffects
    = new ObservableCollection<GroupNameWithCorrespondingEffect>(
        from g in db.Groups
        select new GroupNameWithCorrespondingEffect
        {
            GroupID = g.GroupID,
            GroupName = g.GroupName,
            CorrespondingEffect = g.Master_Effects.Effect
        }
    );
GroupNamesWithCorrespondingEffects
    = new ObservableCollection<GroupNameWithCorrespondingEffect>(
        GroupNamesWithCorrespondingEffects
            .Where(u => !GetAllChildren(25)
                .Select(x => x.GroupID)
                .Contains(u.GroupID))
            .ToList());
How can I combine these two queries?
You can pass this directly to the constructor of the ObservableCollection:
from g in groups
let e = new GroupNameWithCorrespondingEffect
{
    GroupID = g.GroupID,
    GroupName = g.GroupName,
    CorrespondingEffect = g.Master_Effects.Effect
}
where !GetAllChildren(25)
    .Select(x => x.GroupID)
    .Contains(g.GroupID)
select e
I'm not sure whether EF is able to compose the first and the second part (I can't remember off the top of my head whether Contains is resolved to an IN clause; my EF is a bit rusty), but you were not doing that anyway, so the effect is the same as yours. If it is able to compose them, then this way you get a more efficient execution.
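Whatever the provider does with Contains, the predicate as written calls GetAllChildren(25) once per row when it runs in memory. One option is to evaluate the excluded IDs a single time and filter against a set. The sketch below is in-memory only and rests on assumptions: GroupRow is a hypothetical simplified row type, and the excluded IDs are passed in directly because the original does not show what GetAllChildren returns.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ExcludeChildrenSketch
{
    // Hypothetical, simplified version of GroupNameWithCorrespondingEffect.
    public sealed class GroupRow
    {
        public int GroupID { get; set; }
        public string GroupName { get; set; } = "";
    }

    // Keeps only the rows whose GroupID is not among excludedIds.
    // The IDs are materialized once into a HashSet, so each row costs one lookup.
    public static List<GroupRow> ExcludeByIds(IEnumerable<GroupRow> rows, IEnumerable<int> excludedIds)
    {
        var excluded = new HashSet<int>(excludedIds);
        return rows.Where(r => !excluded.Contains(r.GroupID)).ToList();
    }

    public static void Main()
    {
        var rows = new List<GroupRow>
        {
            new GroupRow { GroupID = 1, GroupName = "Root" },
            new GroupRow { GroupID = 2, GroupName = "Child A" },
            new GroupRow { GroupID = 3, GroupName = "Unrelated" },
        };

        // Stand-in for GetAllChildren(25).Select(x => x.GroupID).
        var childIds = new[] { 2 };

        foreach (var row in ExcludeByIds(rows, childIds))
        {
            Console.WriteLine($"{row.GroupID}: {row.GroupName}");
        }
    }
}
```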
If you don't mind mixing query syntax and extension method syntax, you can do this:
GroupNamesWithCorrespondingEffects
    = new ObservableCollection<GroupNameWithCorrespondingEffect>(
        (from g in groups
         select new GroupNameWithCorrespondingEffect
         {
             GroupID = g.GroupID,
             GroupName = g.GroupName,
             CorrespondingEffect = g.Master_Effects.Effect
         })
        .Where(u => !GetAllChildren(25)
            .Select(x => x.GroupID)
            .Contains(u.GroupID))
        .ToList());

Nhibernate query<T> / queryover<T> orderby a subquery

I am having issues getting NHibernate 3.3.2.4000 to generate the correct subquery for the ORDER BY clause shown below:
select *
from dbo.Person p inner join dbo.Task t on p.Task_FK = t.TaskId
order by (select p.CustomerNumber where p.IsMain=1) desc
We have two entities: Task and Person.
One task can have N persons related to it, i.e. Task has an IList<Person> property.
How can I make NHibernate generate the correct subquery? I have gotten as far as something like this with the Query API:
query = query.OrderBy(x => x.Persons.Single(t => t.CustomerNumber));
but I am unsure how to generate the where clause shown in the original SQL query. Is this perhaps easier to do with the QueryOver API?
Any advice or guidance is most welcome.
Task task = null;
Person person = null;
// Correlated subquery: the main person's customer number for the current task.
var subquery = QueryOver.Of<Task>()
    .Where(t => t.Id == task.Id)
    .JoinQueryOver(t => t.Persons, () => person)
    .Where(p => p.IsMain)
    .Select(() => person.CustomerNumber);
// Order by the subquery, descending as in the SQL above.
var query = session.QueryOver(() => task)
    .OrderBy(Projections.SubQuery(subquery)).Desc
    .FetchMany(x => x.Persons);
return query.List();

Duplicated and unnecessary joins when using Linq in NHibernate

I ran into the same LINQ provider problem described in linq-to-nhibernate-produces-unnecessary-joins:
List<Competitions> dtoCompetitions;
dtoCompetitions = (from compset in session.Query<FWBCompetitionSet>()
                   where compset.HeadLine == true
                      && compset.A.B.CurrentSeason == true
                   select (new Competitions
                   {
                       CompetitionSetID = compset.CompetitionSetID,
                       Name = compset.Name,
                       Description = compset.Description,
                       Area = compset.Area,
                       Type = compset.Type,
                       CurrentSeason = compset.A.B.CurrentSeason,
                       StartDate = compset.StartDate
                   }
                   )).ToList();
This leads to a duplicated join in the generated SQL:
SELECT fwbcompeti0_.competitionsetid AS col_0_0_,
       fwbcompeti0_.name AS col_1_0_,
       fwbcompeti0_.DESCRIPTION AS col_2_0_,
       fwbcompeti0_.area AS col_3_0_,
       fwbcompeti0_.TYPE AS col_4_0_,
       fwbseason3_.currentseason AS col_5_0_,
       fwbcompeti0_.startdate AS col_6_0_
FROM fwbcompetitionset fwbcompeti0_
INNER JOIN A fwbcompeti1_
    ON fwbcompeti0_.competitionseasonid = fwbcompeti1_.competitionseasonid
INNER JOIN A fwbcompeti2_
    ON fwbcompeti0_.competitionseasonid = fwbcompeti2_.competitionseasonid
INNER JOIN B fwbseason3_
    ON fwbcompeti2_.seasonid = fwbseason3_.seasonid
WHERE fwbcompeti0_.headline = @p0
  AND fwbseason3_.currentseason = @p1
Notice these two joins, which are exact duplicates and also hurt my SQL Server's performance:
INNER JOIN A fwbcompeti1_
    ON fwbcompeti0_.competitionseasonid = fwbcompeti1_.competitionseasonid
INNER JOIN A fwbcompeti2_
    ON fwbcompeti0_.competitionseasonid = fwbcompeti2_.competitionseasonid
Update1
In NHibernate 3.2 this LINQ bug is still present, and I could not find a simple and reasonable LINQ solution.
So I used QueryOver + JoinAlias + TransformUsing to get the job done, which works perfectly for me.
FWBCompetitionSet compset = null;
FWBCompetitionSeason compseason = null;
FWBSeason season = null;
IList<Competitions> dtoCompetitions;
dtoCompetitions = session.QueryOver<FWBCompetitionSet>(() => compset)
    .JoinAlias(() => compset.FWBCompetitionSeason, () => compseason)
    .JoinAlias(() => compseason.FWBSeason, () => season)
    .Where(() => compset.HeadLine == true)
    .And(() => season.CurrentSeason == true)
    .SelectList(
        list => list
            .Select(c => c.CompetitionSetID).WithAlias(() => compset.CompetitionSetID)
            .Select(c => c.Name).WithAlias(() => compset.Name)
            .Select(c => c.Description).WithAlias(() => compset.Description)
            .Select(c => c.Area).WithAlias(() => compset.Area)
            .Select(c => c.Type).WithAlias(() => compset.Type)
            .Select(c => season.CurrentSeason).WithAlias(() => season.CurrentSeason)
            .Select(c => c.StartDate).WithAlias(() => compset.StartDate)
    )
    .TransformUsing(Transformers.AliasToBean<Competitions>())
    .List<Competitions>();
Yet Another Edit:
I think I finally found out what's going on. It seems that the LINQ to NHibernate provider has trouble navigating associations from the target to the source table and generates a separate join each time it encounters such an association.
Since you don't provide your mapping, I used the mapping from linq-to-nhibernate-produces-unnecessary-joins. This model has a Document with one Job and many TranslationUnits. Each TranslationUnit has many Translation entities.
When you try to find a Translation based on a Job, you are traversing the associations in the reverse order and the LINQ provider generates multiple joins: one for Translation -> TranslationUnit and one for TranslationUnit to Document.
This query will generate redundant joins:
session.Query<TmTranslation>()
    .Where(x => x.TranslationUnit.Document.Job == job)
    .OrderBy(x => x.Id)
    .ToList();
If you reverse the navigation order to Document -> TranslationUnit -> Translation, you get a query that doesn't produce any redundant joins:
var items = (from doc in session.Query<Document>()
             from tu in doc.TranslationUnits
             from translation in tu.Translations
             where doc.Job == job
             orderby translation.Id
             select translation).ToList();
Given this quirkiness, QueryOver seems like a better option.
Previous Edit:
I suspect the culprit is compset.A.B.CurrentSeason. The first joined table (fwbcompeti1_) returns A, while the next two (fwbcompeti2_ and fwbseason3_) are used to return A.B. The LINQ to NHibernate provider doesn't seem to notice that the first A is not used anywhere else and fails to remove it from the generated statement.
Try to help the optimizer a little by replacing CurrentSeason = compset.A.B.CurrentSeason with CurrentSeason = true in the select, since your where clause returns only items with CurrentSeason == true.
EDIT: What I mean is to change the query like this:
List<Competitions> dtoCompetitions;
dtoCompetitions = (from compset in session.Query<FWBCompetitionSet>()
                   where compset.HeadLine == true
                      && compset.A.B.CurrentSeason == true
                   select (new Competitions
                   {
                       CompetitionSetID = compset.CompetitionSetID,
                       Name = compset.Name,
                       Description = compset.Description,
                       Area = compset.Area,
                       Type = compset.Type,
                       CurrentSeason = true,
                       StartDate = compset.StartDate
                   }
                   )).ToList();
I simply replaced the value compset.A.B.CurrentSeason with true.

NHibernate uses value from the first query

So, I have a query like this:
public static IEnumerable<Archive> GetArchivesRecursive(this ISession session, Page rootPage)
{
    var archives = session.Query<Page>().Where(p => p != rootPage && p.Path.StartsWith(rootPage.Path))
        .GroupBy(p => new { Year = p.Published.Year, Month = p.Published.Month })
        .Select(g => new Archive
        {
            ContextPageId = rootPage.Id,
            Year = g.Key.Year,
            Month = g.Key.Month,
            TotalPageCount = g.Count(),
            PublicPageCount = g.Count(p => p.State == PageState.Public && p.Published <= DateTime.UtcNow)
        })
        .ToList();
    // ContextPageId has old value (id of the first rootPage used since app start)
    // Why do I have to do this?
    archives.ForEach(a => a.ContextPageId = rootPage.Id);
    return archives;
}
For some reason the ContextPageId property gets the value of the first rootPage parameter that was used.
Interesting. My NH 3.2 actually fails with a MismatchedTreeNodeException for even simpler queries when I try to use an input value in a Select inside the query. Which version are you using?
Anyway, it looks like you just can't use values from outside the query in a projection (Select), and this is probably a limitation of NHibernate's LINQ provider. Your version seems to cache the compiled expression from Select, ignoring the fact that it depends on a variable. The DateTime value is the same for all calls too, isn't it?
A bit cleaner workaround could go like this:
.Select(g => new
{
    Year = g.Key.Year,
    Month = g.Key.Month,
    TotalPageCount = g.Count(),
    PublicPageCount = g.Count(p => p.State == PageState.Public && p.Published <= DateTime.UtcNow)
})
.AsEnumerable()
.Select(g => new Archive
{
    ContextPageId = rootPage.Id,
    Year = g.Year,
    Month = g.Month,
    TotalPageCount = g.TotalPageCount,
    PublicPageCount = g.PublicPageCount
})
.ToList();
EDIT: I've looked a bit more carefully and this is indeed an NHibernate bug, already known. See this blog post and this JIRA bug entry.