LINQ thinks I need an extra INNER JOIN, but why? - sql

I have a LINQ query that, for some reason, generates an extra/duplicate INNER JOIN. This causes the query not to return the expected output. If I manually comment out that extra JOIN in the generated SQL, I get seemingly correct output.
Can you spot what I might have done in this LINQ to cause the extra JOIN?
Thanks.
Here is my approximate LINQ:
predicate=predicate.And(condition1);
predicate1=predicate1.And(condition2);
predicate1=predicate1.And(condition3);
predicate2=predicate2.Or(predicate1);
predicate=predicate.And(predicate2);
var ids = context.Code.Where(predicate);
var rs = from r in ids
group r by r.PersonID into g
let matchcount=g.Select(p => p.phonenumbers.PhoneNum).Distinct().Count()
where matchcount ==2
select new
{
personid = g.Key
};
and here is the generated SQL (the duplicate join is [t7])
Declare @p1 VarChar(10)='Home'
Declare @p2 VarChar(10)='111'
Declare @p3 VarChar(10)='Office'
Declare @p4 VarChar(10)='222'
Declare @p5 int=2
SELECT [t9].[PersonID] AS [pid]
FROM (
SELECT [t3].[PersonID], (
SELECT COUNT(*)
FROM (
SELECT DISTINCT [t7].[PhoneValue]
FROM [dbo].[Person] AS [t4]
INNER JOIN [dbo].[PersonPhoneNumber] AS [t5] ON [t5].[PersonID] = [t4].[PersonID]
INNER JOIN [dbo].[CodeMaster] AS [t6] ON [t6].[Code] = [t5].[PhoneType]
INNER JOIN [dbo].[PersonPhoneNumber] AS [t7] ON [t7].[PersonID] = [t4].[PersonID]
WHERE ([t3].[PersonID] = [t4].[PersonID]) AND ([t6].[Enumeration] = @p0) AND ((([t6].[CodeDescription] = @p1) AND ([t5].[PhoneValue] = @p2)) OR (([t6].[CodeDescription] = @p3) AND ([t5].[PhoneValue] = @p4)))
) AS [t8]
) AS [value]
FROM (
SELECT [t0].[PersonID]
FROM [dbo].[Person] AS [t0]
INNER JOIN [dbo].[PersonPhoneNumber] AS [t1] ON [t1].[PersonID] = [t0].[PersonID]
INNER JOIN [dbo].[CodeMaster] AS [t2] ON [t2].[Code] = [t1].[PhoneType]
WHERE ([t2].[Enumeration] = @p0) AND ((([t2].[CodeDescription] = @p1) AND ([t1].[PhoneValue] = @p2)) OR (([t2].[CodeDescription] = @p3) AND ([t1].[PhoneValue] = @p4)))
GROUP BY [t0].[PersonID]
) AS [t3]
) AS [t9]
WHERE [t9].[value] = @p5

They aren't being duplicated. You are asking for two different values from the data source.
let matchcount=g.Select(p => p.phonenumbers.PhoneNum).Distinct().Count()
is causing
SELECT COUNT(*)
FROM (
SELECT DISTINCT [t7].[PhoneValue]
FROM [dbo].[Person] AS [t4]
INNER JOIN [dbo].[PersonPhoneNumber] AS [t5] ON [t5].[PersonID] = [t4].[PersonID]
INNER JOIN [dbo].[CodeMaster] AS [t6] ON [t6].[Code] = [t5].[PhoneType]
INNER JOIN [dbo].[PersonPhoneNumber] AS [t7] ON [t7].[PersonID] = [t4].[PersonID]
WHERE ([t3].[PersonID] = [t4].[PersonID]) AND ([t6].[Enumeration] = @p0) AND ((([t6].[CodeDescription] = @p1) AND ([t5].[PhoneValue] = @p2)) OR (([t6].[CodeDescription] = @p3) AND ([t5].[PhoneValue] = @p4)))
) AS [t8]
and
from r in ids
group r by r.PersonID into g
is causing
SELECT [t0].[PersonID]
FROM [dbo].[Person] AS [t0]
INNER JOIN [dbo].[PersonPhoneNumber] AS [t1] ON [t1].[PersonID] = [t0].[PersonID]
INNER JOIN [dbo].[CodeMaster] AS [t2] ON [t2].[Code] = [t1].[PhoneType]
WHERE ([t2].[Enumeration] = @p0) AND ((([t2].[CodeDescription] = @p1) AND ([t1].[PhoneValue] = @p2)) OR (([t2].[CodeDescription] = @p3) AND ([t1].[PhoneValue] = @p4)))
GROUP BY [t0].[PersonID]
) AS [t3]
As for the INNER JOINs, you are getting them because of the relationship between those tables. For instance, Person is 1..1 (or 1..*) with PersonPhoneNumber. In either case I assume PersonID on PersonPhoneNumber is both an FK and a PK value. So the data source has to go out to that related table to check whether a value for the PersonPhoneNumber navigation property actually exists. It does this by performing an INNER JOIN on that table.
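To see why the second, unfiltered join changes the result, here is a minimal sketch. It assumes a simplified schema (no CodeMaster table, with PhoneType holding the description directly) and runs on SQLite through Python rather than on SQL Server, so treat it as an illustration of the effect, not of the exact generated query. The alias f plays the role of the filtered [t5], and n plays the role of the extra [t7].

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person (PersonID INTEGER PRIMARY KEY);
CREATE TABLE PersonPhoneNumber (PersonID INTEGER, PhoneType TEXT, PhoneValue TEXT);
INSERT INTO Person VALUES (1);
-- Person 1 matches only the Home/111 condition but also has an unrelated Office number.
INSERT INTO PersonPhoneNumber VALUES (1, 'Home', '111'), (1, 'Office', '999');
""")

# Same shape as the predicate in the question, applied to alias f only.
FILTER = (
    "((f.PhoneType = 'Home' AND f.PhoneValue = '111') "
    "OR (f.PhoneType = 'Office' AND f.PhoneValue = '222'))"
)

# Distinct count taken from the filtered alias: only matching numbers are counted.
filtered_count = conn.execute(f"""
    SELECT COUNT(DISTINCT f.PhoneValue)
    FROM Person p
    JOIN PersonPhoneNumber f ON f.PersonID = p.PersonID
    WHERE {FILTER}
""").fetchone()[0]

# Distinct count taken from a second, unfiltered alias: every number of the
# person is counted as soon as one filtered row exists.
extra_join_count = conn.execute(f"""
    SELECT COUNT(DISTINCT n.PhoneValue)
    FROM Person p
    JOIN PersonPhoneNumber f ON f.PersonID = p.PersonID
    JOIN PersonPhoneNumber n ON n.PersonID = p.PersonID
    WHERE {FILTER}
""").fetchone()[0]

print(filtered_count, extra_join_count)  # prints: 1 2
```

With a "count == 2" check on top, person 1 would pass in the second form even though only one of the two requested numbers matches.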

My gut feeling is that the .Distinct().Count() is treated separately by the LINQ to SQL translation.
I'd also wager that the SQL execution plan just threw out the dupe.

Try rewriting the query with explicit conditions instead of that abstract "predicate" construction. From what I see in the SQL, that composition might look odd to a parser in isolation, and the join [t5], which you just called a dupe :-), is there to serve that condition.
Also, tell us what you really want to find with that query, and try to write plain SQL that does what you want. I'm supposed to be human :-) and it looks weird to me as well :-))
Technically speaking, you forced the double join by using a condition in two separate queries (every var assignment is technically a separate query).
Also, grouping by a column without doing any aggregation is not always equivalent to SELECT DISTINCT. In particular, SELECT DISTINCT on a join is allowed to take precedence over the join: queries are declarative (they can undergo reordering) and you were trying to force this one to be procedural. So LINQ gave you exactly the procedural form :-) and then SQL reordered it according to SQL rules :-))
So just write normal SQL first, and if you can't LINQ-ize it, put it into a sproc. It's going to make it faster anyway :-)

Related

The generated query by EF Core runs very slow in some conditions

We are using EF Core in our application.
In one of the services, EF Core generates and runs the T-SQL below, but it is very slow!
exec sp_executesql N'SELECT [di].[Id] AS [Key], [di].[Code], [di].[DocumentTypeRef], [di].[IsActive], [di].[IsVisible], [di].[IsGlobal], [di].[IsPrintable], [di].[RecordPrefix], [di].[Owner_PersonRef], [owner].[FullName] AS [DocumentInfoOwnerFullName], [dil].[_Title] AS [ThisIsA_Title], [dil].[_Description] AS [_Description0], [docw].[WorkbenchRef], [t].[_Title] AS [_WorkbenchTitle], [t].[_Description] AS [_WorkbenchDescription], [docc].[CompanyRef], [cl].[_Title] AS [_CompanyTitle], [dt].[IsRecordable] AS [DocumentTypeIsRecordable], [dt].[ReviewCycle] AS [DocumentTypeReviewCycle], [dtl].[_Title] AS [_DocumentTypeTitle], [dtl].[_Description] AS [_DocumentTypeDescription], [t0].[Id] AS [DocumentVersionKey], [t0].[Creator_PersonRef] AS [DocumentVersionCreator_PersonRef], [t0].[EffectiveDate] AS [DocumentVersionEffectiveDate], [t0].[ExpireDate] AS [DocumentVersionExpireDate], [t0].[IsActive] AS [DocumentVersionIsActive], [t0].[PublishDate] AS [DocumentVersionPublishDate], [t0].[VersionNo] AS [DocumentVersionVersionNo], [t0].[ReviewDate] AS [DocumentVersionReviewDate], [t0].[ItemRowRef_DocVersionState], [t1].[FullName] AS [DocumentVersionCreatorFullName], [t2].[_Title] AS [_DocVersionStateTitle]
FROM [QMS].[DocumentInfo] AS [di]
INNER JOIN [QMS].[DocumentInfoLanguage] AS [dil] ON [di].[Id] = [dil].[DocumentInfoRef]
INNER JOIN [QMS].[DocumentCompany] AS [docc] ON [di].[Id] = [docc].[DocumentInfoRef]
INNER JOIN [HRM].[CompanyLanguage] AS [cl] ON [docc].[CompanyRef] = [cl].[CompanyRef]
INNER JOIN [QMS].[DocumentType] AS [dt] ON [di].[DocumentTypeRef] = [dt].[Id]
INNER JOIN [QMS].[DocumentTypeLanguage] AS [dtl] ON [dt].[Id] = [dtl].[DocumentTypeRef]
INNER JOIN [BAS].[PersonLanguage] AS [owner] ON [di].[Owner_PersonRef] = [owner].[PersonRef]
LEFT JOIN [QMS].[DocumentWorkbench] AS [docw] ON [di].[Id] = [docw].[DocumentInfoRef]
LEFT JOIN (
SELECT [x].*
FROM [QMS].[WorkbenchLanguage] AS [x]
WHERE [x].[LanguageRef] = @__languageRef_0
) AS [t] ON [docw].[WorkbenchRef] = [t].[WorkbenchRef]
LEFT JOIN (
SELECT [x0].*
FROM [QMS].[DocumentVersion] AS [x0]
WHERE ([x0].[ExpireDate] IS NULL OR ([x0].[ExpireDate] > GETDATE())) AND ([x0].[EffectiveDate] < GETDATE())
) AS [t0] ON [di].[Id] = [t0].[DocumentInfoRef]
LEFT JOIN (
SELECT [x1].*
FROM [BAS].[PersonLanguage] AS [x1]
WHERE [x1].[LanguageRef] = @__languageRef_1
) AS [t1] ON [t0].[Creator_PersonRef] = [t1].[PersonRef]
LEFT JOIN (
SELECT [x2].*
FROM [BAS].[ItemRowLanguage] AS [x2]
WHERE [x2].[LanguageRef] = @__languageRef_2
) AS [t2] ON [t0].[ItemRowRef_DocVersionState] = [t2].[ItemRowRef]
WHERE (((((([dil].[LanguageRef] = @__languageRef_3) AND ([cl].[LanguageRef] = @__languageRef_4)) AND ([dtl].[LanguageRef] = @__languageRef_5)) AND ([owner].[LanguageRef] = @__languageRef_6))
AND (CHARINDEX(N''we'', [dil].[_Title]) > 0))
AND [docc].[CompanyRef] IN (CAST(3 AS smallint))) AND ([di].[IsVisible] = 1)
ORDER BY (SELECT 1)
OFFSET @__p_8 ROWS FETCH NEXT @__p_9 ROWS ONLY',N'@__languageRef_0 int,@__languageRef_1 int,@__languageRef_2 int,@__languageRef_3 int,@__languageRef_4 int,@__languageRef_5 int,@__languageRef_6 int,@__p_8 int,@__p_9 int',@__languageRef_0=1,@__languageRef_1=1,@__languageRef_2=1,@__languageRef_3=1,@__languageRef_4=1,@__languageRef_5=1,@__languageRef_6=1,@__p_8=0,@__p_9=25
I got this query from SQL Profiler and tried to run it in SSMS.
It ran, but again it was slow!
I tried to find the problem. After a while I realized that when I removed some portion of the query, it ran fast! For example, when I deleted some of the JOINs, everything ran perfectly, and when I deleted the left-hand side of the WHERE condition, everything was OK again. Even when I replaced the CHARINDEX with LIKE, the query ran fast.
I finally realized that the query only runs slowly when these pieces are combined, which is very strange.
I may be wrong, but no matter how hard I tried, I could not understand the reason for this behavior.
Can anyone help me understand this problem and find a solution for it?

Why do I get this unexpected SQL performance gain?

This is more a quiz question rather than me panicking over a deadline, however understanding how/why would no doubt let me scratch my head a little less!
So I have this UPDATE statement:
/*** @Table is a TABLE Variable ***/
UPDATE O
SET O.PPTime = T.PPTime
FROM @Table AS [O]
INNER JOIN
(SELECT O.OSID, O.STID, DATEDIFF(SECOND, O.StartDateTime, O.EndDateTime) AS [PPTime]
FROM tblO AS [O]
INNER JOIN tblS AS [S] ON O.OSID = S.OSID
INNER JOIN tblE AS [E] ON S.EID = E.EID
INNER JOIN tblEF AS [EF] ON E.EFID = EF.EFID
GROUP BY O.OSID, O.STID, O.StartDateTime, O.EndDateTime) AS [T]
ON O.OSID = T.OSID
WHERE O.PPTime IS NULL
The execution time is approximately 12 seconds.
Below I have added a small WHERE clause, which has no impact on how many rows of data are returned to the user:
/*** @Table is a TABLE Variable ***/
UPDATE O
SET O.PPTime = T.PPTime
FROM @Table AS [O]
INNER JOIN
(SELECT O.OSID, O.STID, DATEDIFF(SECOND, O.StartDateTime, O.EndDateTime) AS [PPTime]
FROM tblO AS [O]
INNER JOIN tblS AS [S] ON O.OSID = S.OSID
INNER JOIN tblE AS [E] ON S.EID = E.EID
INNER JOIN tblEF AS [EF] ON E.EFID = EF.EFID
WHERE O.OSID >= 0 /*** Somehow fixes performance slow down! ***/
GROUP BY O.OSID, O.STID, O.StartDateTime, O.EndDateTime) AS [T]
ON O.OSID = T.OSID
WHERE O.PPTime IS NULL
The execution time is now less than a second. If I run both SELECT statements individually, they execute in the same time and return the same data.
Why do I get such a performance gain?
After reviewing the code, I noticed that adding a primary key and/or an index to the table variable did the trick! One for me to remember!

Strange performance issue with SELECT (SUBQUERY)

I have a stored procedure that has been having some issues lately and I finally narrowed it down to 1 SELECT. The problem is I cannot figure out exactly what is happening to kill the performance of this one query. I re-wrote it, but I am not sure the re-write is the exact same data.
Original Query:
SELECT
@userId, p.job, p.charge_code, p.code
, (SELECT SUM(b.total) FROM dbo.[backorder w/total] b WHERE b.ponumber = p.ponumber AND b.code = p.code)
, ISNULL(jm.markup, 0)
, (SELECT SUM(b.TOTAL_TAX) FROM dbo.[backorder w/total] b WHERE b.ponumber = p.ponumber AND b.code = p.code)
, p.ponumber
, p.billable
, p.[date]
FROM dbo.PO p
INNER JOIN dbo.JobCostFilter jcf
ON p.job = jcf.jobno AND p.charge_code = jcf.chargecode AND jcf.userno = @userId
LEFT JOIN dbo.JobMarkup jm
ON jm.jobno = p.job
AND jm.code = p.code
LEFT JOIN dbo.[Working Codes] wc
ON p.code = wc.code
INNER JOIN dbo.JOBFILE j
ON j.JOB_NO = p.job
WHERE (wc.brcode <> 4 OR @BmtDb = 0)
GROUP BY p.job, p.charge_code, p.code, p.ponumber, p.billable, p.[date], jm.markup, wc.brcode
This query will practically never finish running. It actually times out for some larger jobs we have.
And if I change the 2 subqueries in the select to read like joins instead:
SELECT
@userid, p.job, p.charge_code, p.code
, (SELECT SUM(b.TOTAL))
, ISNULL(jm.markup, 0)
, (SELECT SUM(b.TOTAL_TAX))
, p.ponumber, p.billable, p.[date]
FROM dbo.PO p
INNER JOIN dbo.JobCostFilter jcf
ON p.job = jcf.jobno AND p.charge_code = jcf.chargecode AND jcf.userno = 11190030
INNER JOIN [BACKORDER W/TOTAL] b
ON P.PONUMBER = b.ponumber AND P.code = b.code
LEFT JOIN dbo.JobMarkup jm
ON jm.jobno = p.job
AND jm.code = p.code
LEFT JOIN dbo.[Working Codes] wc
ON p.code = wc.code
INNER JOIN dbo.JOBFILE j
ON j.JOB_NO = p.job
WHERE (wc.brcode <> 4 OR @BmtDb = 0)
GROUP BY p.job, p.charge_code, p.code, p.ponumber, p.billable, p.[date], jm.markup, wc.brcode
The data comes out looking very nearly identical to me (though there are thousands of lines overall so I could be wrong), and it runs very quickly.
Any ideas appreciated..
Performance
In the second query you have fewer logical reads because the table [BACKORDER W/TOTAL] is scanned only once. In the first query the two subqueries are processed independently, so the table is scanned twice even though both subqueries have the same predicates.
Correctness
If you want to check whether two queries return the same result set, you can use the EXCEPT operator.
If both statements:
First SELECT Query...
EXCEPT
Second SELECT Query...
and
Second SELECT Query...
EXCEPT
First SELECT Query...
return an empty set, the result sets are identical.
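The two-way EXCEPT check can be tried out on toy data. This is a minimal sketch using SQLite through Python; the tables a and b and the helper except_rows are made up for illustration and stand in for the two real SELECT statements. One caveat worth keeping in mind: EXCEPT works on distinct rows, so it will not notice if one query returns the same row more times than the other.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (id INTEGER, total INTEGER);
CREATE TABLE b (id INTEGER, total INTEGER);
INSERT INTO a VALUES (1, 10), (2, 20), (3, NULL);
INSERT INTO b VALUES (1, 10), (2, 20);
""")

def except_rows(first, second):
    """Return the rows produced by `first` that `second` does not produce."""
    return conn.execute(f"{first} EXCEPT {second}").fetchall()

q1 = "SELECT id, total FROM a"  # stands in for the first query
q2 = "SELECT id, total FROM b"  # stands in for the second query

only_in_q1 = except_rows(q1, q2)
only_in_q2 = except_rows(q2, q1)

# Both lists empty would mean the (distinct) result sets are identical.
print(only_in_q1, only_in_q2)  # prints: [(3, None)] []
```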
In terms of correctness, you are inner joining [BACKORDER W/TOTAL] in the second query, so if the subqueries in the first query return NULL for some rows, those rows will be missing from the second query.
For performance, the optimizer is heuristic: it will sometimes use spectacularly bad query plans, and even minimal changes can sometimes lead to a completely different query plan. Your best chance is to compare the query plans and see what causes the difference.
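The correctness point about the inner join is easy to reproduce. The sketch below is an assumption-laden miniature (made-up po and backorder tables, SQLite through Python instead of SQL Server): a PO with no backorder rows survives the correlated subquery with a NULL sum, disappears with INNER JOIN, and comes back with LEFT JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE po (ponumber INTEGER, code TEXT);
CREATE TABLE backorder (ponumber INTEGER, code TEXT, total INTEGER);
INSERT INTO po VALUES (1, 'A'), (2, 'B');
-- PO 2 deliberately has no backorder rows.
INSERT INTO backorder VALUES (1, 'A', 10), (1, 'A', 5);
""")

# Correlated subquery: every PO row is kept, the sum is NULL when nothing matches.
subquery_rows = conn.execute("""
    SELECT p.ponumber,
           (SELECT SUM(b.total) FROM backorder b
            WHERE b.ponumber = p.ponumber AND b.code = p.code)
    FROM po p
    ORDER BY p.ponumber
""").fetchall()

# INNER JOIN: POs without a matching backorder row are dropped.
inner_join_rows = conn.execute("""
    SELECT p.ponumber, SUM(b.total)
    FROM po p
    INNER JOIN backorder b ON b.ponumber = p.ponumber AND b.code = p.code
    GROUP BY p.ponumber
    ORDER BY p.ponumber
""").fetchall()

# LEFT JOIN: keeps the unmatched PO, matching the subquery version here.
left_join_rows = conn.execute("""
    SELECT p.ponumber, SUM(b.total)
    FROM po p
    LEFT JOIN backorder b ON b.ponumber = p.ponumber AND b.code = p.code
    GROUP BY p.ponumber
    ORDER BY p.ponumber
""").fetchall()

print(subquery_rows)    # prints: [(1, 15), (2, None)]
print(inner_join_rows)  # prints: [(1, 15)]
print(left_join_rows)   # prints: [(1, 15), (2, None)]
```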

EF 4.1 code-first: difference between EF generated sql and custom sql

I have a question about the SQL generated by Entity Framework versus hand-written SQL. In my project I have some entities (they aren't really important for this question). As a simple example, when I use this code:
var query = context.Employees.Select(e => new {
PersonalCode = e.PersonelCode,
Fname = e.Person.Fname,
Family = e.Person.Family,
Email = e.Person.Emails
});
the generated sql is something like this:
SELECT
[Project1].[EmployeeID] AS [EmployeeID],
[Project1].[EmployeeID1] AS [EmployeeID1],
[Project1].[PersonID] AS [PersonID],
[Project1].[EmployeeID2] AS [EmployeeID2],
[Project1].[PersonID1] AS [PersonID1],
[Project1].[PersonelCode] AS [PersonelCode],
[Project1].[Fname] AS [Fname],
[Project1].[Family] AS [Family],
[Project1].[C1] AS [C1],
[Project1].[EmailID] AS [EmailID],
[Project1].[Mail] AS [Mail]
FROM ( SELECT
[Extent1].[EmployeeID] AS [EmployeeID],
[Extent1].[PersonelCode] AS [PersonelCode],
[Join1].[PersonID] AS [PersonID],
[Join1].[Fname] AS [Fname],
[Join1].[EmployeeID] AS [EmployeeID1],
[Join3].[PersonID] AS [PersonID1],
[Join3].[Family] AS [Family],
[Join3].[EmployeeID] AS [EmployeeID2],
[Join5].[EmailID1] AS [EmailID],
[Join5].[Mail] AS [Mail],
CASE WHEN ([Join5].[EmailID2] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C1]
FROM [dbo].[Employees] AS [Extent1]
LEFT OUTER JOIN (SELECT [Extent2].[PersonID] AS [PersonID], [Extent2].[Fname] AS [Fname], [Extent3].[EmployeeID] AS [EmployeeID]
FROM [dbo].[Persons] AS [Extent2]
LEFT OUTER JOIN [dbo].[Employees] AS [Extent3] ON [Extent2].[PersonID] = [Extent3].[EmployeeID] ) AS [Join1] ON [Extent1].[EmployeeID] = [Join1].[PersonID]
LEFT OUTER JOIN (SELECT [Extent4].[PersonID] AS [PersonID], [Extent4].[Family] AS [Family], [Extent5].[EmployeeID] AS [EmployeeID]
FROM [dbo].[Persons] AS [Extent4]
LEFT OUTER JOIN [dbo].[Employees] AS [Extent5] ON [Extent4].[PersonID] = [Extent5].[EmployeeID] ) AS [Join3] ON [Extent1].[EmployeeID] = [Join3].[PersonID]
LEFT OUTER JOIN (SELECT [Extent6].[EmailID] AS [EmailID2], [Extent6].[PersonID] AS [PersonID], [Extent7].[EmailID] AS [EmailID1], [Extent7].[Mail] AS [Mail]
FROM [dbo].[EmailsForPersons] AS [Extent6]
INNER JOIN [dbo].[Emails] AS [Extent7] ON [Extent6].[EmailID] = [Extent7].[EmailID] ) AS [Join5] ON [Join5].[PersonID] = [Extent1].[EmployeeID]
) AS [Project1]
ORDER BY [Project1].[EmployeeID] ASC, [Project1].[EmployeeID1] ASC, [Project1].[PersonID] ASC, [Project1].[EmployeeID2] ASC, [Project1].[PersonID1] ASC, [Project1].[C1] ASC
but by this code:
SELECT Employees.PersonelCode, Persons.Fname, Persons.Family, Emails.Mail
FROM Employees
LEFT OUTER JOIN -- or: INNER JOIN
Persons ON Employees.EmployeeID = Persons.PersonID
LEFT OUTER JOIN
EmailsForPersons ON Persons.PersonID = EmailsForPersons.PersonID
LEFT OUTER JOIN
Emails ON EmailsForPersons.EmailID = Emails.EmailID
I get the same result! What is the difference between these two queries? Which one has better performance and speed?
You can profile and compare the two queries to see which performs better.
See also How to clean & optimise code generated by WCF OData service?
The SQL generated by EF is very generic and needs to work in a variety of situations. For whatever reason, it is very verbose. It often has a SELECT [Col1] FROM (SELECT [Col1] ...) nested structure, and lots of CAST statements for comparisons.
Whether this is done to ensure maximum compatibility and minimum chance of someone's tricky query not being able to be translated, or whether it's done because the code that generates the SQL is much clearer and simpler, we can only guess. It's a design decision made within the Entity Framework team.
Frankly I wouldn't worry about this at all unless you test the two queries side-by-side for performance using query analyser. I would expect very minimal difference between the two.
If the performance is worse for the generated query then the simplest pattern is to write the logic inside a stored procedure and have EF call the stored procedure. This takes all the control away from EF and puts it in your hands.

LEFT OUTER JOIN in Linq - How to Force

I have a LEFT OUTER JOIN in LINQ whose filter is being combined into the outer join condition, and it is not giving the desired results. It is basically limiting my LEFT side result with this combination. Here is the LINQ and the resulting SQL. What I'd like is for "AND ([t2].[EligEnd] = @p0)" in the generated SQL not to be part of the join condition, but rather a subquery that filters results BEFORE the join.
Thanks in advance (samples pulled from LINQPad) -
Doug
(from l in Users
join mr in (from mri in vwMETRemotes where mri.EligEnd == Convert.ToDateTime("2009-10-31") select mri) on l.Mahcpid equals mr.Mahcpid into lo
from g in lo.DefaultIfEmpty()
orderby l.LastName, l.FirstName
where l.LastName.StartsWith("smith") && l.DeletedDate == null
select g)
Here is the resulting SQL
-- Region Parameters
DECLARE @p0 DateTime = '2009-10-31 00:00:00.000'
DECLARE @p1 NVarChar(6) = 'smith%'
-- EndRegion
SELECT [t2].[test], [t2].[MAHCPID] AS [Mahcpid], [t2].[FirstName], [t2].[LastName], [t2].[Gender], [t2].[Address1], [t2].[Address2], [t2].[City], [t2].[State] AS [State], [t2].[ZipCode], [t2].[Email], [t2].[EligStart], [t2].[EligEnd], [t2].[Dependent], [t2].[DateOfBirth], [t2].[ID], [t2].[MiddleInit], [t2].[Age], [t2].[SSN] AS [Ssn], [t2].[County], [t2].[HomePhone], [t2].[EmpGroupID], [t2].[PopulationIdentifier]
FROM [dbo].[User] AS [t0]
LEFT OUTER JOIN (
SELECT 1 AS [test], [t1].[MAHCPID], [t1].[FirstName], [t1].[LastName], [t1].[Gender], [t1].[Address1], [t1].[Address2], [t1].[City], [t1].[State], [t1].[ZipCode], [t1].[Email], [t1].[EligStart], [t1].[EligEnd], [t1].[Dependent], [t1].[DateOfBirth], [t1].[ID], [t1].[MiddleInit], [t1].[Age], [t1].[SSN], [t1].[County], [t1].[HomePhone], [t1].[EmpGroupID], [t1].[PopulationIdentifier]
FROM [dbo].[vwMETRemote] AS [t1]
) AS [t2] ON ([t0].[MAHCPID] = [t2].[MAHCPID]) AND ([t2].[EligEnd] = @p0)
WHERE ([t0].[LastName] LIKE #p1) AND ([t0].[DeletedDate] IS NULL)
ORDER BY [t0].[LastName], [t0].[FirstName]
I'm not sure whether having "AND ([t2].[EligEnd] = @p0)" as part of the subquery rather than the join condition will change the result set. One thing I like to do with complex queries might help you here: I break them into smaller queries before combining them. LINQ's deferred execution lets us write multiple statements that result in one eventual call to the database. Something like this:
var elig = from mri in vwMETRemotes
where mri.EligEnd == Convert.ToDateTime("2009-10-31")
select mri;
var users = from l in Users
where l.LastName.StartsWith("smith")
where l.DeletedDate == null
select l;
var result = from l in users
join mr in elig on l.Mahcpid equals mr.Mahcpid into lo
from g in lo.DefaultIfEmpty()
orderby l.LastName, l.FirstName
select g;
Breaking it down like that can make it easier to debug, and perhaps it can tell LINQ better what you intend.
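On the question of whether the placement of the EligEnd filter matters, a small experiment helps. This is a minimal sketch on made-up users and remote tables, run on SQLite through Python rather than SQL Server. For a filter that only touches the right-hand table of a LEFT JOIN, putting it in the ON clause and pre-filtering in a subquery give the same rows; it is moving the filter to WHERE that drops the unmatched left rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (mahcpid INTEGER, last_name TEXT);
CREATE TABLE remote (mahcpid INTEGER, elig_end TEXT);
INSERT INTO users VALUES (1, 'Smith'), (2, 'Smithers');
INSERT INTO remote VALUES (1, '2009-10-31'), (2, '2008-01-01');
""")

# Filter inside the ON clause (the shape LINQ generated).
on_filter = conn.execute("""
    SELECT u.mahcpid, r.elig_end
    FROM users u
    LEFT JOIN remote r ON u.mahcpid = r.mahcpid AND r.elig_end = '2009-10-31'
    ORDER BY u.mahcpid
""").fetchall()

# Filter applied in a subquery before the join (the shape the question asks for).
prefiltered = conn.execute("""
    SELECT u.mahcpid, r.elig_end
    FROM users u
    LEFT JOIN (SELECT * FROM remote WHERE elig_end = '2009-10-31') r
           ON u.mahcpid = r.mahcpid
    ORDER BY u.mahcpid
""").fetchall()

# Filter in WHERE: unmatched users are removed, so the join acts like an inner join.
where_filter = conn.execute("""
    SELECT u.mahcpid, r.elig_end
    FROM users u
    LEFT JOIN remote r ON u.mahcpid = r.mahcpid
    WHERE r.elig_end = '2009-10-31'
    ORDER BY u.mahcpid
""").fetchall()

print(on_filter)     # prints: [(1, '2009-10-31'), (2, None)]
print(prefiltered)   # prints: [(1, '2009-10-31'), (2, None)]
print(where_filter)  # prints: [(1, '2009-10-31')]
```

If the left side still looks limited in the real query, the cause is more likely elsewhere, for example the final projection selecting only g, which is NULL for every unmatched user.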
The code ended up looking like this. RecodePopulation and RecodeRegistration are just methods that translate values from the query.
var elig = from mri in db.MetRemote
where mri.EligEnd == Convert.ToDateTime(ConfigurationManager.AppSettings["EligibilityDate"])
orderby mri.EligEnd
select mri;
var users = from l in db.Users
where l.LastName.StartsWith(filter)
where l.DeletedDate == null
select l;
var results = (from l in users
join m in elig on l.MahcpId equals m.MAHCPID into lo
from g in lo.DefaultIfEmpty()
orderby l.LastName, l.FirstName
select new UserManage()
{
Username = l.Username,
FirstName = l.FirstName,
LastName = l.LastName,
DateOfBirth = l.DOB,
Gender = l.Gender,
Status = RecodePopulation(g.Population, l.CreatedDate),
UserId = l.Id,
WellAwardsRegistered = RecodeRegistration(l.Id, 1)
}).Distinct().OrderBy(a => a.LastName).ThenBy(n => n.FirstName).Skip((currentPage - 1) * resultsPerPage).Take(resultsPerPage);