NHibernate SetFirstResult causes duplicate results - sql

We are having trouble with the way NHibernate (version 4.0.0.4000 AND 4.0.4.4000 tested) returns duplicate results. In the sample below, I get 566 results (the correct number of results), but only 549 are unique, meaning there are 17 duplicates.
#region Get Record IDs
public IList<string> GetRecordIds(string user, string agency, DateTime utcFrom, DateTime utcTo, SearchDateRangeType dateRangeType, IEnumerable<string> status, IEnumerable<string> billingStatus, IEnumerable<string> qaStatus, IEnumerable<string> transmissionStatus, IEnumerable<string> scheduledTransmissions, int pageSize = -1, int pageNumber = -1)
{
using (ISession session = NHibernateHelper.OpenSession())
{
ICriteria crit = session.CreateCriteria<Metadata>();
var dateDisjunction = Restrictions.Disjunction();
dateDisjunction.Add(Restrictions.Between("IncidentDate", utcFrom, utcTo));
crit.Add(dateDisjunction);
if (string.IsNullOrEmpty(agency) == false)
{
crit.CreateAlias("Ownership._entities.AsIList", "entities");
crit.Add(Restrictions.Eq("entities._entityName._value", agency));
crit.Add(Restrictions.Eq("entities._isDeleted._value", false) || Restrictions.IsNull("entities._isDeleted._value"));
}
crit.AddOrder(Order.Asc(Projections.Property("RecordId")));
crit.SetProjection(Projections.Property("RecordId"));
if (pageSize > 0 && pageNumber > 0)
{
crit.SetFirstResult(pageSize * (pageNumber - 1)).SetMaxResults(pageSize);
}
var ret = crit.List<string>();
return ret;
}
}
#endregion
SQL Sample 1 is the generated first iteration code from NHibernate. Subsequent pages (second page onward) use ROW_NUMBER() OVER. SQL Sample 2 is a manually-created first page, which uses ROW_NUMBER() OVER as if it was a subsequent page. NHibernate has apparently "optimized" away the ROW_NUMBER() OVER for the first page and that seems(?) to be the cause of our issues.
SQL Sample 1: Generated by NHibernate. Causes duplicates.
SELECT
TOP (100) this_.RecordId as y0_
FROM
PcrMetadata this_
inner join
PcrEntities entities1_
on this_.Id=entities1_.ListKey
WHERE
(
this_.IncidentDate between '0001-01-01 00:00:00.0000000' and '9999-01-01 00:00:00.0000000'
)
and entities1_.Name = 'ClientIDNumber'
and (
entities1_.Entities_IsDeleted = 0
or entities1_.Entities_IsDeleted is null
)
SQL Sample 2: Manually created based on NHibernate second page on. Does not cause duplicates.
SELECT
TOP (100) query.y0_
FROM
(SELECT
this_.RecordId as y0_,
ROW_NUMBER() OVER(
ORDER BY
CURRENT_TIMESTAMP) as __hibernate_sort_row
FROM
PcrMetadata this_
inner join
PcrEntities entities1_
on this_.Id=entities1_.ListKey
WHERE
(
this_.IncidentDate between '0001-01-01 00:00:00.0000000' and '9999-01-01 00:00:00.0000000'
)
and entities1_.Name = 'ClientIDNumber'
and (
entities1_.Entities_IsDeleted = 0
or entities1_.Entities_IsDeleted is null
)) as query
WHERE
query.__hibernate_sort_row > 0 -- CHANGE THIS NUMBER
Am I doing something wrong? Or is there anything I can do to force NHibernate to use ROW_NUMBER?
Thanks in advance for any help!

We cannot JOIN a collection and then apply paging: the join produces a Cartesian product (one row per root entity per matching collection item), and it is that product which gets paged. That is the behavior described above.
The solution I would suggest is to never join a collection (that is my own rule). To get equivalent results, we should:
use a subquery to apply the WHERE condition
use batch fetching to load the collection items later without the 1 + N issue
There is a detailed answer about this issue.
see also:
How to Eager Load Associations without duplication in NHibernate?
What is the solution for the N+1 issue in hibernate?
There is more to read about making results distinct, but that would not help here:
Criteria.DISTINCT_ROOT_ENTITY vs Projections.distinct
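The effect of paging over a joined collection, and of filtering with a subquery instead, can be reproduced outside NHibernate. The following is only a minimal sketch using Python's built-in sqlite3 module as a stand-in database: the table names are borrowed from the question, but the columns, the data, and the LIMIT/OFFSET paging (in place of TOP / ROW_NUMBER) are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PcrMetadata (Id INTEGER PRIMARY KEY, RecordId TEXT);
    CREATE TABLE PcrEntities (ListKey INTEGER, Name TEXT);
    INSERT INTO PcrMetadata VALUES (1, 'R1'), (2, 'R2'), (3, 'R3');
    -- Record R1 owns two matching entity rows, so a join repeats it.
    INSERT INTO PcrEntities VALUES (1, 'A'), (1, 'A'), (2, 'A'), (3, 'B');
""")

# Paging over the join: the page is cut from the Cartesian product.
joined = [r[0] for r in conn.execute("""
    SELECT m.RecordId
    FROM PcrMetadata m
    JOIN PcrEntities e ON e.ListKey = m.Id
    WHERE e.Name = 'A'
    ORDER BY m.RecordId
    LIMIT 2 OFFSET 0
""")]

# Paging over the root table only: the collection is checked in a subquery.
filtered = [r[0] for r in conn.execute("""
    SELECT m.RecordId
    FROM PcrMetadata m
    WHERE EXISTS (SELECT 1 FROM PcrEntities e
                  WHERE e.ListKey = m.Id AND e.Name = 'A')
    ORDER BY m.RecordId
    LIMIT 2 OFFSET 0
""")]

print(joined)    # the same record can appear twice on one page
print(filtered)  # each record appears at most once
```

In NHibernate terms the second query roughly corresponds to restricting the root criteria with a detached subquery rather than calling CreateAlias on the collection, but the exact mapping depends on your model.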

Related

Why is this query too slow? Getting list of A with one associated foreign-keyed B for each item

I have a Website(Id) table, and each record may have multiple associated CheckLog(FK WebsiteId) entries. CheckLog also has a compound index on [WebsiteId, CreatedTime]. Website has only around 20 records, but CheckLog grows over time and had about 3 million entries when I ran into this problem. (See the EF Core schema at the end of the question.)
A frequent query I have is to query the list of all Websites, along with zero/one latest CheckLog record:
return await this.ctx.Websites.AsNoTracking()
.Select(q => new WebsiteListItem()
{
Website = q,
LatestCheckLog = q.CheckLogs
.OrderByDescending(q => q.CreatedTime)
.FirstOrDefault(),
})
.ToListAsync();
I believe the [WebsiteId, CreatedTime] index should help. However, the query takes around 11s to execute. Here's the translated query, along with EXPLAIN QUERY PLAN:
SELECT "w"."Id", "t0"."Id", "t0"."CreatedTime", "t0"."WebsiteId"
FROM "Websites" AS "w"
LEFT JOIN (
SELECT "t"."Id", "t"."CreatedTime", "t"."WebsiteId"
FROM (
SELECT "c"."Id", "c"."CreatedTime", "c"."WebsiteId", ROW_NUMBER() OVER(PARTITION BY "c"."WebsiteId" ORDER BY "c"."CreatedTime" DESC) AS "row"
FROM "CheckLogs" AS "c"
) AS "t"
WHERE "t"."row" <= 1
) AS "t0" ON "w"."Id" = "t0"."WebsiteId"
MATERIALIZE 1
CO-ROUTINE 4
SCAN TABLE CheckLogs AS c USING INDEX IX_CheckLogs_WebsiteId_CreatedTime
USE TEMP B-TREE FOR RIGHT PART OF ORDER BY
SCAN SUBQUERY 4
SCAN TABLE Websites AS w
SEARCH SUBQUERY 1 AS t USING AUTOMATIC COVERING INDEX (WebsiteId=?)
Is this fixable with an index? If not, is there an efficient way to query it without creating N+1 queries? I tried to do it with 2 queries, but I can't think of a better translation than the one EF Core produces.
I also believe this is a very common problem, but I don't know what keywords to search for. I am okay with a general solution for this kind of problem (e.g. get the latest Product for each Category in a list). Thank you.
I use EF Core for DB Schema:
public class Website
{
public int Id { get; set; }
// Other properties
public ICollection<CheckLog> CheckLogs { get; set; }
}
[Index(nameof(CreatedTime))]
[Index(nameof(WebsiteId), nameof(CreatedTime))]
public class CheckLog
{
public int Id { get; set; }
public DateTime CreatedTime { get; set; }
public int WebsiteId { get; set; }
public Website Website { get; set; }
// Other properties
}
If what you want is to get the row with the latest CreatedTime for each WebsiteId then there is no need for any join.
Just aggregate and set the condition:
HAVING MAX(CreatedTime)
This is not standard SQL, but utilizes SQLite's bare columns:
SELECT *
FROM CheckLogs
GROUP BY WebsiteId
HAVING MAX(CreatedTime);
If you want to join it to Websites:
SELECT w.Id, t.Id, t.CreatedTime, t.WebsiteId
FROM Websites AS w
LEFT JOIN (
SELECT *
FROM CheckLogs
GROUP BY WebsiteId
HAVING MAX(CreatedTime)
) AS t ON w.Id = t.WebsiteId;
Thanks to this answer, I found out how to rewrite the query:
SELECT L.*
FROM CheckLogs L
INNER JOIN
(SELECT WebsiteId, Max(CreatedTime) AS CreatedTime
FROM CheckLogs
GROUP BY WebsiteId) L2
ON L.WebsiteId = L2.WebsiteId AND L.CreatedTime = L2.CreatedTime
Since EF Core cannot translate this JOIN query, I rewrote the code as follows, using 2 round trips to the database server. Note that because the WebsiteIds are numbers, it's safe to concatenate them into the SQL this way; if you put in string values, you need to sanitize or parameterize them.
var websites = await query.ToListAsync();
var websiteIds = websites.Select(q => q.Id).ToList();
var websiteIdsString = string.Join(",", websiteIds);
var logQuery = this.ctx.CheckLogs.FromSqlRaw(@$"
SELECT L.*
FROM CheckLogs L
INNER JOIN
(SELECT WebsiteId, Max(CreatedTime) AS CreatedTime
FROM CheckLogs
GROUP BY WebsiteId) L2
ON L.WebsiteId = L2.WebsiteId AND L.CreatedTime = L2.CreatedTime
WHERE L.WebsiteId IN ({websiteIdsString})");
var logs = await logQuery.AsNoTracking().ToListAsync();
var logDict = logs.ToLookup(q => q.WebsiteId);
return websites.Select(q => new WebsiteListItem()
{
Website = q,
LatestCheckLog = logDict[q.Id].FirstOrDefault(),
}).ToList();
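For anyone worried about the string concatenation in the IN list, here is a rough sketch of the same two-round-trip idea with bound parameters instead. It uses Python's sqlite3 module purely as a runnable stand-in for EF Core; the sample rows and the integer CreatedTime values are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Websites (Id INTEGER PRIMARY KEY);
    CREATE TABLE CheckLogs (Id INTEGER PRIMARY KEY, WebsiteId INTEGER, CreatedTime INTEGER);
    CREATE INDEX IX_CheckLogs_WebsiteId_CreatedTime ON CheckLogs (WebsiteId, CreatedTime);
    INSERT INTO Websites VALUES (1), (2), (3);
    INSERT INTO CheckLogs VALUES (1, 1, 100), (2, 1, 300), (3, 1, 200), (4, 2, 50);
""")

# Round trip 1: load the websites.
website_ids = [row[0] for row in conn.execute("SELECT Id FROM Websites ORDER BY Id")]

# Round trip 2: latest log per website. Only the "?" markers are interpolated
# into the SQL text; the ids themselves travel as bound parameters.
placeholders = ",".join("?" for _ in website_ids)
sql = f"""
    SELECT L.Id, L.WebsiteId, L.CreatedTime
    FROM CheckLogs L
    INNER JOIN (SELECT WebsiteId, MAX(CreatedTime) AS CreatedTime
                FROM CheckLogs
                GROUP BY WebsiteId) L2
        ON L.WebsiteId = L2.WebsiteId AND L.CreatedTime = L2.CreatedTime
    WHERE L.WebsiteId IN ({placeholders})
"""
latest_by_site = {row[1]: row for row in conn.execute(sql, website_ids)}

# Stitch the two result sets together; websites without logs get None.
result = [(site_id, latest_by_site.get(site_id)) for site_id in website_ids]
print(result)
```

If two logs of one website share the same maximum CreatedTime, the join returns both; the dictionary here simply keeps whichever comes last.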
First try this query:
SELECT MAX("c"."CreatedTime"), "c"."WebsiteId"
FROM "CheckLogs" AS "c"
GROUP BY "c"."WebsiteId"
It will be hard to optimize your query to be faster than the above.
If it is fast, then there is hope.
You could try:
with logs as (
select max("c"."Id") "Id", max("c"."CreatedTime") "CreatedTime", "c"."WebsiteId"
from "CheckLogs" AS "c"
group by "c"."WebsiteId"
having count(*) = 1)
select "w"."Id", "l"."Id", "l"."CreatedTime", "l"."WebsiteId"
from "Websites" AS "w"
join logs as "l" on "w"."Id" = "l"."WebsiteId"
union all
select "w"."Id", null, null, null
from "Websites" AS "w"
where not exists (
select 'x'
from "CheckLogs" AS "l"
where "l"."WebsiteId" = "w"."Id")
You could also try the two halves of union all above to see how fast they are.
If it is still slow, you can add a new table holding the result you are looking for and create a trigger on "CheckLogs" to populate it, so the data is ready when you query.
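A minimal sketch of that trigger idea, run through Python's sqlite3 module. The LatestCheckLogs table and the trigger name are hypothetical names invented for this example, and only INSERTs are handled; updates and deletes on "CheckLogs" would need their own triggers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CheckLogs (Id INTEGER PRIMARY KEY, WebsiteId INTEGER, CreatedTime INTEGER);

    -- Hypothetical summary table: one row per website, holding its newest log.
    CREATE TABLE LatestCheckLogs (
        WebsiteId INTEGER PRIMARY KEY,
        CheckLogId INTEGER,
        CreatedTime INTEGER
    );

    -- Replace the summary row unless it already points at a newer log.
    CREATE TRIGGER TR_CheckLogs_TrackLatest AFTER INSERT ON CheckLogs
    BEGIN
        INSERT OR REPLACE INTO LatestCheckLogs (WebsiteId, CheckLogId, CreatedTime)
        SELECT NEW.WebsiteId, NEW.Id, NEW.CreatedTime
        WHERE NOT EXISTS (
            SELECT 1 FROM LatestCheckLogs
            WHERE WebsiteId = NEW.WebsiteId AND CreatedTime > NEW.CreatedTime
        );
    END;
""")

# Logs arrive out of order for website 1; the summary should keep the newest.
conn.executemany(
    "INSERT INTO CheckLogs (Id, WebsiteId, CreatedTime) VALUES (?, ?, ?)",
    [(1, 1, 10), (2, 1, 30), (3, 1, 20), (4, 2, 5)],
)

latest = conn.execute(
    "SELECT WebsiteId, CheckLogId, CreatedTime FROM LatestCheckLogs ORDER BY WebsiteId"
).fetchall()
print(latest)
```

Reading the latest log per website then becomes a plain join against a table with about as many rows as Websites, at the cost of a little extra work on every insert.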

Force LINQ to SQL to use RowNumber() instead of Top n When Using .Skip(0)

Is there a way to force LINQ to SQL to avoid using TOP X when using Skip(0)? I have a query that runs just fine for every paged result...except for page 1. I've profiled the query and the introduction of a TOP clause just kills it. I'm perplexed on why that is, but it just does. However, using RowNumber Between 1 AND 10 works just fine.
The culprit seems to be an EXISTS condition in my WHERE clause. The produced SQL is below. In SQL Manager, this query runs fine and returns 14 results...however it times out once I add a TOP 10 (as LINQ would do). However, if I comment the EXISTS in my where clause, then the problem goes away.
SELECT
t0.ProtectiveOrderID,
t3.DocketID,
t3.DocketNumber AS CaseNumber,
t3.PartySuffix AS CaseNumberSuffix,
t5.FirstName AS RespondentNameFirst,
t5.MiddleName AS RespondentNameMiddle,
t5.LastName AS RespondentNameLast,
t5.NameSuffix AS RespondentNameSuffix,
t4.FirstName AS ProtectedNameFirst,
t4.MiddleName AS ProtectedNameMiddle,
t4.LastName AS ProtectedNameLast,
t4.NameSuffix AS ProtectedNameSuffix,
t3.ChildNextFriendFirstName AS ChildNextFriendNameFirst,
t3.ChildNextFriendMiddleName AS ChildNextFriendNameMiddle,
t3.ChildNextFriendLastName AS ChildNextFriendNameLast,
t3.ChildNextFriendNameSuffix
FROM dbo.ProtectiveOrder AS t0
INNER JOIN (
SELECT MAX(t1.ProtectiveOrderID) AS value
FROM dbo.ProtectiveOrder AS t1
GROUP BY t1.DocketID
) AS t2 ON t0.ProtectiveOrderID = t2.value
LEFT OUTER JOIN dbo.Docket AS t3 ON t3.DocketID = t0.DocketID
LEFT OUTER JOIN dbo.Directory AS t4 ON t4.DirectoryID = t3.ProtectedPartyID
LEFT OUTER JOIN dbo.Directory AS t5 ON t5.DirectoryID = t3.SubjectID
WHERE
(
((t4.LastName LIKE 'smith%') AND (t4.FirstName LIKE 'jane%'))
OR ((t5.LastName LIKE 'smith%') AND (t5.FirstName LIKE 'jane%'))
OR ((t3.ChildNextFriendLastName LIKE 'smith%') AND (t3.ChildNextFriendFirstName LIKE 'jane%'))
OR (
-- ***************
-- THIS GUY KILLS THE QUERY WHEN A TOP IS INTRODUCED IN THE TOP-LEVEL SELECT
-- ***************
EXISTS(
SELECT NULL AS EMPTY
FROM dbo.Child AS t6
WHERE (t6.LastName LIKE 'smith%') AND (t6.FirstName LIKE 'jane%') AND (t6.DocketID = t3.DocketID)
)
)
)
ORDER BY t3.DocketNumber
Override the Skip method and check the input for zero. For any value other than zero, call the original Skip method; for zero, don't.
So if you modify the Skip provided in dynamic.cs, you could do:
public static IQueryable Skip(this IQueryable source, int count)
{
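// Validate the argument first so a null source is never returned silently.
if (source == null) throw new ArgumentNullException("source");
// Skip(0) is a no-op, so return the query unchanged instead of emitting a Skip call.
if (count == 0)
{
return source;
}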
return source.Provider.CreateQuery(
Expression.Call(
typeof(Queryable), "Skip",
new Type[] { source.ElementType },
source.Expression, Expression.Constant(count)));
}

NHibernate 3 paging and determining the total number of rows

I have read somewhere (can't remember where and how) that NHibernate 3 allows the determination of the total number of records whilst performing a paged query (in one database query). Is this right?
I have this code:
public IEnumerable<X> GetOrganisms(int PageSize, int Page, out int total)
{
var query = (from e in Session.Query<X>() select e).AsQueryable();
return query.Skip((Page - 1) * PageSize).Take(PageSize).ToList();
}
and would like to initialise 'total' as efficiently as possible.
Thanks.
Christian
PS:
Potential 'solution'?:
Total = (int) Session.CreateCriteria<X>()
.SetProjection(Projections.RowCount())
.FutureValue<Int32>().Value;
var query = (from e in Session.Query<X>() select e).AsQueryable();
return query.Skip((Page - 1) * PageSize).Take(PageSize).ToList();
Your potential solution will be handled in one transaction, but will be two db calls. If you must have only one db call, you should use a multiquery/future query as peer suggested. For more information on the future syntax, check out this post: http://ayende.com/blog/3979/nhibernate-futures.
Here are a few ways to accomplish your scenario...
QueryOver (2 db calls):
var query = session.QueryOver<Organism>();
var result = query
.Skip((Page - 1) * PageSize)
.Take(PageSize)
.List();
var rowcount = query.RowCount();
With a sample set of 100 Organisms, and querying for the organisms 11-20, here are the two queries sent to the db:
SELECT TOP (@p0) Id0_0_, Title0_0_ FROM (SELECT this_.Id as Id0_0_, this_.Title as Title0_0_, ROW_NUMBER() OVER(ORDER BY CURRENT_TIMESTAMP) as __hibernate_sort_row FROM Organism this_) as query WHERE query.__hibernate_sort_row > @p1 ORDER BY query.__hibernate_sort_row;@p0 = 10 [Type: Int32 (0)], @p1 = 10 [Type: Int32 (0)]
SELECT count(*) as y0_ FROM Organism this_
QueryOver (1 db call with Future):
var query = session.QueryOver<Organism>()
.Skip((Page - 1) * PageSize)
.Take(PageSize)
.Future<Organism>();
var result = query.ToList();
var rowcount = session.QueryOver<Organism>()
.Select(Projections.Count(Projections.Id()))
.FutureValue<int>().Value;
Querying for the same set of data as before, this is the query that is generated:
SELECT TOP (@p0) Id0_0_, Title0_0_ FROM (SELECT this_.Id as Id0_0_, this_.Title as Title0_0_, ROW_NUMBER() OVER(ORDER BY CURRENT_TIMESTAMP) as __hibernate_sort_row FROM Organism this_) as query WHERE query.__hibernate_sort_row > @p1 ORDER BY query.__hibernate_sort_row;SELECT count(this_.Id) as y0_ FROM Organism this_;;@p0 = 10 [Type: Int32 (0)], @p1 = 10 [Type: Int32 (0)]
Criteria(1 db call with Future):
var criteria = session.CreateCriteria<Organism>()
.SetFirstResult((Page - 1) * PageSize)
.SetMaxResults(PageSize)
.Future<Organism>();
var countCriteria = session.CreateCriteria<Organism>()
.SetProjection(Projections.Count(Projections.Id()))
.FutureValue<int>().Value;
Again, querying for the same set of data, criteria with future results in the same query:
SELECT TOP (@p0) Id0_0_, Title0_0_ FROM (SELECT this_.Id as Id0_0_, this_.Title as Title0_0_, ROW_NUMBER() OVER(ORDER BY CURRENT_TIMESTAMP) as __hibernate_sort_row FROM Organism this_) as query WHERE query.__hibernate_sort_row > @p1 ORDER BY query.__hibernate_sort_row;SELECT count(this_.Id) as y0_ FROM Organism this_;;@p0 = 10 [Type: Int32 (0)], @p1 = 10 [Type: Int32 (0)]
Notice that all three query styles result in the same exact queries. The future syntax simply allows NHibernate to make one database call rather than two.
If you are using NHibernate 3, I think the most elegant way to handle this is using the new QueryOver syntax. (You used the old NHibernate.Linq syntax in your proposed solution. You would be better off learning the QueryOver syntax instead.)
I don't have enough reputation to comment on CodeProgression's solution above...but the proper ONE DB call using QueryOver w/ Future<> is:
var query = session.QueryOver<Organism>()
.Skip((Page - 1) * PageSize)
.Take(PageSize)
.Future<Organism>();
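// Do not enumerate the first future yet; register the row-count future first.
var rowcount = session.QueryOver<Organism>()
.Select(Projections.Count(Projections.Id()))
.FutureValue<int>();
// Both futures are now registered, so the first enumeration sends them in one batch.
var result = query.ToList();
int iRowCount = rowcount.Value;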
Once you execute .ToList(), it will hit the database, so you'd have to hit the database again to get the row count, which defeats the purpose of Future<>. Do your ToList() AFTER you've registered all your .Future<> queries.
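To make the ordering point concrete, here is a toy model of deferred queries written in Python. It is not NHibernate's API or implementation: FutureBatch, future, and fake_db are all hypothetical names invented for this sketch, and the only behavior modelled is that every query registered so far is sent together the first time any result is read.

```python
class FutureBatch:
    """Toy model of deferred queries: nothing runs until a result is first read."""

    def __init__(self, run_batch):
        self._run_batch = run_batch  # callable: list of queries -> list of results
        self._pending = []
        self._results = {}
        self.round_trips = 0

    def future(self, query):
        """Register a query and return a callable that yields its result."""
        self._pending.append(query)
        return lambda: self._resolve(query)

    def _resolve(self, query):
        if query not in self._results:
            # First read: send everything registered so far in one round trip.
            batch, self._pending = self._pending, []
            self.round_trips += 1
            for q, r in zip(batch, self._run_batch(batch)):
                self._results[q] = r
        return self._results[query]


def fake_db(queries):
    # Hypothetical stand-in for the database.
    data = {"page": ["organism 11", "organism 12"], "count": 100}
    return [data[q] for q in queries]


# Register both futures first, then read: one round trip.
good = FutureBatch(fake_db)
page, count = good.future("page"), good.future("count")
rows, total = page(), count()

# Read the page before registering the count: two round trips.
bad = FutureBatch(fake_db)
page = bad.future("page")
rows_bad = page()
count = bad.future("count")
total_bad = count()

print(good.round_trips, bad.round_trips)
```

Both variants return the same data; the only difference is how many times the fake database is called.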
I do not think NHibernate 'understands' the meaning of the queries it performs, so the total number of rows is not determined as part of a standard query.
The most efficient way to get the row count is with futures or an IMultiQuery (to get all results in one round trip to the database).
nhibernate-futures

Pagination with total row count in NHibernate

I am trying to paginate a simple query using HQL, and retrieve the total row count as part of the same query.
My query is simple enough...
var members = UnitOfWork.CurrentSession.CreateQuery(@"
select m
from ListMember as m
join fetch m.Individual as i")
.SetFirstResult(pageIndex*pageSize)
.SetMaxResults(pageSize)
.List<ListMember>();
The Individual is mapped as a many-to-one on the ListMember class. This works great. The pagination works as expected and generates the following Sql...
SELECT TOP ( 10 /* @p0 */ ) DirPeerG1_1_0_,
Director1_0_1_,
Director2_1_0_,
Forename2_0_1_,
Surname0_1_
FROM (SELECT listmember0_.DirPeerGrpMemberID as DirPeerG1_1_0_,
listmember1_.DirectorKeyID as Director1_0_1_,
listmember0_.DirectorKeyId as Director2_1_0_,
listmember1_.Forename1 as Forename2_0_1_,
listmember1_.Surname as Surname0_1_,
ROW_NUMBER()
OVER(ORDER BY CURRENT_TIMESTAMP) as __hibernate_sort_row
FROM tblMembers listmember0_
inner join tblIndividuals listmember1_
on listmember0_.DirectorKeyId = listmember1_.DirectorKeyID) as query
WHERE query.__hibernate_sort_row > 10 /* @p1 */
ORDER BY query.__hibernate_sort_row
I read this article posted by Ayende called Paged data + Count(*) with NHibernate: The really easy way!, so I tried to implement it in my query.
I followed the steps in the article to add the custom HQL function called rowcount(), and changed my query to this...
var members = UnitOfWork.CurrentSession.CreateQuery(@"
select m, rowcount()
from ListMember as m
join fetch m.Individual as i")
.SetFirstResult(pageIndex*pageSize)
.SetMaxResults(pageSize)
.List<ListMember>();
The Sql that is generated is almost correct, however it includes one of the columns twice resulting in this error...
System.Data.SqlClient.SqlException:
The column '...' was specified
multiple times for 'query'.
The Sql it generates looks like this...
SELECT TOP ( 10 /* @p0 */ )
col_0_0_,
col_1_0_,
Director1_0_1_,
DirPeerG1_1_0_,
Director1_0_1_,
Director2_1_0_,
Forename2_0_1_,
Surname0_1_
FROM (SELECT
listmember0_.DirPeerGrpMemberID as col_0_0_,
count(*) over() as col_1_0_,
listmember1_.DirectorKeyID as Director1_0_1_,
listmember0_.DirPeerGrpMemberID as DirPeerG1_1_0_,
listmember1_.DirectorKeyID as Director1_0_1_,
listmember0_.DirectorKeyId as Director2_1_0_,
listmember1_.Forename1 as Forename2_0_1_,
listmember1_.Surname as Surname0_1_,
ROW_NUMBER()
OVER(ORDER BY CURRENT_TIMESTAMP) as __hibernate_sort_row
FROM RCMUser.dbo.tblDirPeerGrpMembers listmember0_
inner join RCMAlpha.dbo.tblDirectorProfileDetails listmember1_
on listmember0_.DirectorKeyId = listmember1_.DirectorKeyID) as query
WHERE query.__hibernate_sort_row > 10 /* @p1 */
ORDER BY query.__hibernate_sort_row
For some reason it includes the Director1_0_1_ column twice in the projection, which causes this error. This Sql is frustratingly close to what I would like, and I’m hoping an NHibernate expert out there can help explain why this would happen.
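For reference, the shape of query being aimed for (one page of rows plus the total count in a single statement) can be tried in isolation. This is a minimal sketch in Python with sqlite3 rather than NHibernate and SQL Server: the ListMember table here is a made-up two-column stand-in, LIMIT/OFFSET replaces TOP / ROW_NUMBER, and it assumes an SQLite build with window functions (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ListMember (Id INTEGER PRIMARY KEY, Surname TEXT)")
conn.executemany(
    "INSERT INTO ListMember (Surname) VALUES (?)",
    [(f"Member{i:02d}",) for i in range(1, 26)],  # 25 rows, ids 1..25
)

page_index, page_size = 1, 10  # zero-based page index, as in the question

# COUNT(*) OVER () is evaluated over the whole filtered result before
# LIMIT/OFFSET is applied, so every returned row carries the full total.
rows = conn.execute(
    """
    SELECT Id, Surname, COUNT(*) OVER () AS TotalRows
    FROM ListMember
    ORDER BY Id
    LIMIT ? OFFSET ?
    """,
    (page_size, page_index * page_size),
).fetchall()

page_ids = [r[0] for r in rows]
total = rows[0][2] if rows else 0
print(page_ids, total)
```

One caveat of this approach is that an empty page (past the end of the data) returns no rows at all, so the total is not available in that case.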
Suggestions Tried
Thanks to the suggestion from @Jason, I tried it with the non-generic version of the .List() method to execute the query, but this unfortunately also produced the same Sql with the duplicate column...
var members = UnitOfWork.CurrentSession.CreateQuery(@"
select m, rowcount()
from ListMember as m
join fetch m.Individual as i")
.SetFirstResult(pageIndex * pageSize)
.SetMaxResults(pageSize)
.List()
.Cast<Tuple<ListMember, int>>()
.Select(x => x.First);
Update
It doesn't look like this is going to be possible without getting into the NH source code. My solution requirements have changed and I am no longer going to pursue the answer.
In summary, the solution would be to either...
Use Futures or MultiQuery to execute two statements in a single command - one to retrieve the page of data and one the total row count.
Modify your pagination solution to do without a total result count - Continuous scrolling for example.
Hmm, one issue is that you're using the ListMember-typed List<T> method. In the example on the page you linked, he uses List(), which returns one tuple per row: the first item of each tuple would be a ListMember and the second would be the row count. That List<ListMember> call might affect your query and would probably throw an exception even if the query did return.
Try using:
var tuples = UnitOfWork.CurrentSession.CreateQuery(@"
select m, rowcount()
from ListMember as m
join fetch m.Individual as i")
.SetFirstResult(pageIndex*pageSize)
.SetMaxResults(pageSize)
.List();
// Each row comes back as an object[]: element 0 is the ListMember, element 1 the row count.
var members = tuples.Cast<object[]>().Select(x => (ListMember)x[0]);
but I kinda agree with @dotjoe. A MultiQuery might be easier. It's what I use. Here's a good link about it from the same author you linked to before (Ayende).

LINQ to SQL Every Nth Row From Table

Anybody know how to write a LINQ to SQL statement to return every nth row from a table? I need to get the title of the item at the top of each page in a paged data grid for fast user scanning. So if I wanted the first record, then every 3rd one after that, from the following names:
Amy, Eric, Jason, Joe, John, Josh, Maribel, Paul, Steve, Tom
I'd get Amy, Joe, Maribel, and Tom.
I suspect this can be done... LINQ to SQL statements already invoke the ROW_NUMBER() SQL function in conjunction with sorting and paging. I just don't know how to get back every nth item. The SQL Statement would be something like WHERE ROW_NUMBER MOD 3 = 0, but I don't know the LINQ statement to use to get the right SQL.
Sometimes, TSQL is the way to go. I would use ExecuteQuery<T> here:
var data = db.ExecuteQuery<SomeObjectType>(@"
SELECT * FROM
(SELECT *, ROW_NUMBER() OVER (ORDER BY id) AS [__row]
FROM [YourTable]) x WHERE (x.__row % 25) = 1");
You could also swap out the n:
// ExecuteQuery binds {0} to n as a SQL parameter.
var data = db.ExecuteQuery<SomeObjectType>(@"
DECLARE @n int = {0}
SELECT * FROM
(SELECT *, ROW_NUMBER() OVER (ORDER BY id) AS [__row]
FROM [YourTable]) x WHERE (x.__row % @n) = 1", n);
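The ROW_NUMBER approach can be checked against the names from the question. This is a small sketch using Python's sqlite3 module instead of LINQ to SQL and SQL Server; the People table is made up for the example, and it assumes an SQLite build with window functions (3.25 or later). It filters on (rn - 1) % n = 0 rather than rn % n = 1 so that n = 1 also works.

```python
import sqlite3

names = ["Amy", "Eric", "Jason", "Joe", "John", "Josh", "Maribel", "Paul", "Steve", "Tom"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE People (Name TEXT)")
conn.executemany("INSERT INTO People VALUES (?)", [(name,) for name in names])

n = 3  # keep the first row, then every 3rd row after it

every_nth = [
    row[0]
    for row in conn.execute(
        """
        SELECT Name
        FROM (SELECT Name, ROW_NUMBER() OVER (ORDER BY Name) AS rn
              FROM People) AS x
        WHERE (x.rn - 1) % ? = 0
        ORDER BY x.rn
        """,
        (n,),
    )
]
print(every_nth)
```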
Once upon a time, there was no such thing as Row_Number, and yet such queries were possible. Behold!
var query =
from c in db.Customers
let i = (
from c2 in db.Customers
where c2.ID < c.ID
select c2).Count()
where i%3 == 0
select c;
This generates the following Sql
SELECT [t2].[ID], [t2]. --(more fields)
FROM (
SELECT [t0].[ID], [t0]. --(more fields)
(
SELECT COUNT(*)
FROM [dbo].[Customer] AS [t1]
WHERE [t1].[ID] < [t0].[ID]
) AS [value]
FROM [dbo].[Customer] AS [t0]
) AS [t2]
WHERE ([t2].[value] % @p0) = @p1
Here's an option that works, but it might be worth checking that it doesn't have any performance issues in practice:
var nth = 3;
var ids = Table
.Select(x => x.Id)
.ToArray()
.Where((x, n) => n % nth == 0)
.ToArray();
var nthRecords = Table
.Where(x => ids.Contains(x.Id));
Just googling around a bit I haven't found (or experienced) an option for Linq to SQL to directly support this.
The only option I can offer is that you write a stored procedure with the appropriate SQL query written out and then calling the sproc via Linq to SQL. Not the best solution, especially if you have any kind of complex filtering going on.
There really doesn't seem to be an easy way to do this:
How do I add ROW_NUMBER to a LINQ query or Entity?
How to find the ROW_NUMBER() of a row with Linq to SQL
But there's always:
peopleToFilter.AsEnumerable().Where((x,i) => i % AmountToSkipBy == 0)
NOTE: This still doesn't execute on the database side of things!
This will do the trick, but it isn't the most efficient query in the world:
var count = query.Count();
var pageSize = 10;
var pageTops = query.Take(1);
for(int i = pageSize; i < count; i += pageSize)
{
pageTops = pageTops.Concat(query.Skip(i - (i % pageSize)).Take(1));
}
return pageTops;
It dynamically constructs a query to pull the (nth, 2*nth, 3*nth, etc.) value from the given query. If you use this technique, you'll probably want to limit it to maybe ten or twenty names, similar to how Google paginates its results (1-10, and Next), in order to avoid building an expression so large that the database refuses to parse it.
If you need better performance, you'll probably have to use a stored procedure or a view to represent your query, and include the row number as part of the stored proc results or the view's fields.