Get DISTINCT record using INNER JOIN - sql

I have the follwong Query on multi tables
SELECT DISTINCT b.BoxBarcode as [Box Barcode], (select case when b.ImagesCount IS NULL
then 0
else b.ImagesCount end) as [Total Images], s.StageName as [Current Stage] ,d.DocuementTypeName as [Document Type],
u.UserName as [Start User],uu.UserName as [Finished User]
FROM [dbo].[Operations] o
inner join dbo.LKUP_Stages s on
o.stageid=s.id
inner join dbo.LKUP_Users u on
u.id=o.startuserid
left join dbo.LKUP_Users uu on
uu.id=o.FinishedUserID
inner join boxes b on
b.id=o.boxid
inner join LKUP_DocumentTypes d on
d.ID = b.DocTypeID
where b.IsExportFinished = 0
when i select count from the Boxes table where IsExportFinished = 0
i got the Count 42 records, when i run the above qoury i got 71 records,
i want just the 42 records in Boxes table to retrived.

You are doing a one-to-many join, i.e. at least one of the tables have multiple rows that match the join criteria.
Step one is to find which table(s) that give the "duplicates".
Once you have done that you may be able to fix the problem by adding additional criteria to the join. I'm taking a guess that the same boxid occurs several times in the Operations table. If that is the case you need to decide which Operation row you want to select and then update the SQL accordingly.

Try this one -
SELECT
Box_Barcode = b.BoxBarcode
, Total_Images = ISNULL(b.ImagesCount, 0)
, Current_Stage = s.StageName
, Document_Type = d.DocuementTypeName
, Start_User = u.UserName
, Finished_User = uu.UserName
FROM (
SELECT DISTINCT
o.stageid
, o.boxid
, o.startuserid
, o.FinishedUserID
FROM dbo.[Operations]
) o
JOIN dbo.LKUP_Stages s ON o.stageid = s.id
JOIN dbo.boxes b ON b.id = o.boxid
JOIN dbo.LKUP_DocumentTypes d ON d.id = b.DocTypeID
JOIN dbo.LKUP_Users u ON u.id = o.startuserid
LEFT JOIN dbo.LKUP_Users uu ON uu.id = o.FinishedUserID
WHERE b.IsExportFinished = 0

I guess that if you change your LEFT JOIN into INNER JOIN you will get 42 records as requested.

Related

Select an ID where there is only one row and that row is a specific value

I have this query. There's a lot of joins because I am checking if an ID is linked to any of those tables.
Currently, this query shows me any ID's that are not linked to any of those tables. I would like to add to it so that it also shows any IDs that are linked to the d table, but only if there is only 1 row in the D table and the type in the D field is 'member'.
SELECT
c.ID,
c.location,
c.pb,
c.name,
c.surname
FROM c
LEFT JOIN l on c.rowno = l.rowno
LEFT JOIN d on c.rowno = d.rowno
LEFT JOIN t on c.rowno = t.rowno
LEFT JOIN cj ON (c.rowno = cj.rowno OR c.rowno = cj.rowno2)
LEFT JOIN dj ON c.rowno = d.rowno
LEFT JOIN lg ON c.rowno = lg.rowno
LEFT JOIN tj ON c.rowno = tj.rowno
WHERE
c.status != 'closed'
AND l.rowno IS NULL
AND d.rowno IS NULL
AND t.rowno IS NULL
AND cj.rowno IS NULL
AND dj.rowno IS NULL
AND lg.rowno IS NULL
AND tj.rowno IS NULL
My first thought is to just add
WHERE D.type = 'member'
But that gives me all IDs that have a row with D.type = member (they could have 10 rows with all different types, but as long as 1 of those has type = member it shows up). I want to see ID's that ONLY have d.type = member
I'm sorry if I'm wording this badly, I'm having trouble getting this straight in my head. Any help is appreciated!
I would use exists for all conditions except the one on the D table:
SELECT c.*
FROM c JOIN
(SELECT d.rownum, COUNT(*) as cnt,
SUM(CASE WHEN d.type = 'Member' THEN 1 ELSE 0 END) as num_members
FROM t
GROUP BY d.rownum
) d
ON c.rownum = d.rownum
WHERE c.status <> 'closed' AND
NOT EXISTS (SELECT 1 FROM t WHERE c.rowno = t.rowno) AND
NOT EXISTS (SELECT 1 FROM l WHERE c.rowno = l.rowno) AND
. . .
I find NOT EXISTS is easier to follow logically. I don't think there is a big performance difference between the two methods in SQL Server.

Join on TOP 1 from subquery while referencing outer tables

I am starting with this query, which works fine:
SELECT
C.ContactSys
, ... a bunch of other rows...
FROM Users U
INNER JOIN Contacts C ON U.ContactSys = C.ContactSys
LEFT JOIN UserWatchList UW ON U.UserSys = UW.UserSys
LEFT JOIN Accounts A ON C.AccountSys = A.AccountSys
WHERE
C.OrganizationSys = 1012
AND U.UserTypeSys = 2
AND C.FirstName = 'steve'
Now, I've been given this requirement:
For every visitor returned by the Visitor Search, take ContactSys, get the most recent entry in the GuestLog table for that contact, then return the columns ABC and XYZ from the GuestLog table.
I'm having trouble with that. I need something like this (I think)...
SELECT
C.ContactSys
, GL.ABC
, GL.XYZ
, ... a bunch of other rows...
FROM Users U
INNER JOIN Contacts C ON U.ContactSys = C.ContactSys
LEFT JOIN UserWatchList UW ON U.UserSys = UW.UserSys
LEFT JOIN Accounts A ON C.AccountSys = A.AccountSys
LEFT JOIN (SELECT TOP 1 * FROM GuestLog GU WHERE GU.ContactSys = ????? ORDER BY GuestLogSys DESC) GL ON GL.ContactSys = C.ContactSys
WHERE
C.OrganizationSys = 1012
AND U.UserTypeSys = 2
AND C.FirstName = 'steve'
Only that's not it because that subquery on the JOIN doesn't know anything about the outer tables.
I've been looking at these posts and their answers, but I'm having a hard time translating them to my needs:
SQL: Turn a subquery into a join: How to refer to outside table in nested join where clause?
Reference to outer query in subquery JOIN
Referencing outer query in subquery
Referencing outer query's tables in a subquery
If that is the logic you want, you can use OUTER APPLY:
SELECT C.ContactSys, GL.ABC, GL.XYZ,
... a bunch of other columns ...
FROM Users U JOIN
Contacts C
ON U.ContactSys = C.ContactSys LEFT JOIN
UserWatchList UW
ON U.UserSys = UW.UserSys LEFT JOIN
Accounts A
ON C.AccountSys = A.AccountSys OUTER APPLY
(SELECT TOP 1 gl.*
FROM GuestLog gl
WHERE gl.ContactSys = C.ContactSys
ORDER BY gl.GuestLogSys DESC
) GL
WHERE C.OrganizationSys = 1012 AND
U.UserTypeSys = 2 AND
C.FirstName = 'steve'

SQL: Linking a Count to a specific value through multiple tables

I'm trying to link a COUNT to a specific value across several tables in a SQL Server Database. In this case the tables only share values through correlation. I am returning the values I want but the COUNT is counting everything in a given project not just the ones linked to their work items.
SELECT
[d].[Id]
,COUNT([t].[ItemId]) AS ItemCount
,[d].[ItemName]
FROM
[dbo].[Project_Map] [rm] WITH (NOLOCK)
INNER JOIN
[dbo].[WorkProjects] [r] WITH (NOLOCK)
ON [r].[DomainId] = [rm].[DomainId]
AND [r].[ProjectId] = [rm].[ProjectId]
AND [r].[ReleaseId] = [rm].[ReleaseId]
INNER JOIN
[dbo].[Items] [d] WITH (NOLOCK)
ON [d].[DomainId] = [r].[DomainId]
AND [d].[ProjectId] = [r].[ProjectId]
AND [d].[ReleaseId] = [r].[ReleaseId]
INNER JOIN [dbo].[Projects] [p] with (NOLOCK)
ON r.DomainId = p.DomainId
AND r.ProjectId = p.ProjectId
INNER JOIN [dbo].[Tests] [t] with (NOLOCK)
ON p.DomainId = t.DomainId
AND p.ProjectId = t.ProjectId
INNER JOIN
(
SELECT [Id], MAX([LastModifiedDate]) AS MostRecent
FROM Items
Group By [Id]
) AS updatedItem
ON updatedItem.Id = d.Id
INNER JOIN
[dbo].[WorkItemStates] [ds] WITH (NOLOCK)
ON [ds].[ItemStateName] = [d].[ItemStatus]
WHERE
d.Id = 111111
AND d.UserCategory Like 'SOMESTRING'
GROUP BY d.Id, d.ItemName
RETURNS: In this case the count should be 1 but it returns the count for the entire project.
ID COUNT ITEMNAME
86 5169 SOME NAME
173 5169 SOME NAME
170 5169 SOME NAME
Am I missing a join somewhere?
Currently, your counts are counting all JOIN instances and not just distinct Item level records. Consider turning your Item unit level join into an aggregate query join and include the count field in outer grouping:
Specifically, change:
INNER JOIN [dbo].[Tests] [t] with (NOLOCK)
ON p.DomainId = t.DomainId
AND p.ProjectId = t.ProjectId
Into:
INNER JOIN
(SELECT t.DomaindId, t.ProjectId, Count(*) As ItemCount
FROM [dbo].[Tests] t
GROUP BY t.DomaindId, t.ProjectId) agg
ON p.DomainId = agg.DomainId
AND p.ProjectId = agg.ProjectId
And then the outer query structure becomes:
SELECT
[d].[Id]
,agg.ItemCount
,[d].[ItemName]
FROM
...
GROUP BY
[d].[Id]
,agg.ItemCount
,[d].[ItemName]
Interestingly, you already do such an aggregate query join but never use that derived table updateItem or the field MostRecent.

How to use group by only for some columns in sql Query?

The following query returns 550 records, which I am then grouping by some columns in the controller via linq. However, how can I achieve the "group by" logic in the SQL query itself? Additionally, post-grouping, I need to show only 150 results to the user.
Current SQL query:
SELECT DISTINCT
l.Id AS LoadId
, l.LoadTrackingNumber AS LoadDisplayId
, planningType.Text AS PlanningType
, loadStatus.Id AS StatusId
, loadWorkRequest.Id AS LoadRequestId
, loadStatus.Text AS Status
, routeIds.RouteIdentifier AS RouteName
, planRequest.Id AS PlanId
, originPartyRole.Id AS OriginId
, originParty.Id AS OriginPartyId
, originParty.LegalName AS Origin
, destinationPartyRole.Id AS DestinationId
, destinationParty.Id AS DestinationPartyId
, destinationParty.LegalName AS Destination
, COALESCE(firstSegmentLocation.Window_Start, originLocation.Window_Start) AS StartDate
, COALESCE(firstSegmentLocation.Window_Start, originLocation.Window_Start) AS BeginDate
, destLocation.Window_Finish AS EndDate
AS Number
FROM Domain.Loads (NOLOCK) AS l
INNER JOIN dbo.Lists (NOLOCK) AS loadStatus ON l.LoadStatusId = loadStatus.Id
INNER JOIN Domain.Routes (NOLOCK) AS routeIds ON routeIds.Id = l.RouteId
INNER JOIN Domain.BaseRequests (NOLOCK) AS loadWorkRequest ON loadWorkRequest.LoadId = l.Id
INNER JOIN Domain.BaseRequests (NOLOCK) AS planRequest ON planRequest.Id = loadWorkRequest.ParentWorkRequestId
INNER JOIN Domain.Schedules AS planSchedule ON planSchedule.Id = planRequest.ScheduleId
INNER JOIN Domain.Segments (NOLOCK) os on os.RouteId = routeIds.Id AND os.[Order] = 0
INNER JOIN Domain.LocationDetails (NOLOCK) AS originLocation ON originLocation.Id = os.DestinationId
INNER JOIN dbo.EntityRoles (NOLOCK) AS originPartyRole ON originPartyRole.Id = originLocation.DockRoleId
INNER JOIN dbo.Entities (NOLOCK) AS originParty ON originParty.Id = originPartyRole.PartyId
INNER JOIN Domain.LocationDetails (NOLOCK) AS destLocation ON destLocation.Id = routeIds.DestinationFacilityLocationId
INNER JOIN dbo.EntityRoles (NOLOCK) AS destinationPartyRole ON destinationPartyRole.Id = destLocation.DockRoleId
INNER JOIN dbo.Entities (NOLOCK) AS destinationParty ON destinationParty.Id = destinationPartyRole.PartyId
INNER JOIN dbo.TransportationModes (NOLOCK) lictm on lictm.Id = l.LoadInstanceCarrierModeId
INNER JOIN dbo.EntityRoles (NOLOCK) AS carrierPartyRole ON lictm.CarrierId = carrierPartyRole.Id
INNER JOIN dbo.Entities (NOLOCK) AS carrier ON carrierPartyRole.PartyId = carrier.Id
INNER JOIN dbo.EntityRoles (NOLOCK) AS respPartyRole ON l.ResponsiblePartyId = respPartyRole.Id
INNER JOIN dbo.Entities (NOLOCK) AS respParty ON respPartyRole.PartyId = respParty.Id
INNER JOIN Domain.LoadOrders (NOLOCK) lo ON lo.LoadInstanceId = l.Id
INNER JOIN Domain.Orders (NOLOCK) AS o ON lo.OrderInstanceId = o.Id
INNER JOIN Domain.BaseRequests (NOLOCK) AS loadRequest ON loadRequest.LoadId = l.Id
--Load Start Date
LEFT JOIN Domain.Segments (NOLOCK) AS segment ON segment.RouteId = l.RouteId AND segment.[Order] = 0
LEFT JOIN Domain.LocationDetails (NOLOCK) AS firstSegmentLocation ON firstSegmentLocation.Id = segment.DestinationId
LEFT JOIN dbo.Lists (NOLOCK) AS planningType ON l.PlanningTypeId = planningType.Id
LEFT JOIN dbo.EntityRoles (NOLOCK) AS billToRole ON o.BillToId = billToRole.Id
LEFT JOIN dbo.Entities (NOLOCK) AS billTo ON billToRole.PartyId = billTo.Id
WHERE o.CustomerId in (34236) AND originLocation.Window_Start >= '07/19/2015 00:00:00' AND originLocation.Window_Start < '07/25/2015 23:59:59' AND l.IsHistoricalLoad = 0
AND loadStatus.Id in (285, 286,289,611,290)
AND loadWorkRequest.ParentWorkRequestId IS NOT NULL
AND routeIds.RouteIdentifier IS NOT NULL
AND (planSchedule.EndDate IS NULL OR (planSchedule.EndDate is not null and CAST(CONVERT(varchar(10), planSchedule.EndDate,101) as datetime) > CAST(CONVERT(varchar(10),GETDATE(),101) as datetime))) ORDER BY l.Id DESC
linq:
//Get custom grouped data
var loadRequest = (from lq in returnList
let loadDisplayId = lq.LoadDisplayId
let origin = lq.OriginId //get this origin for route
let destination = lq.DestinationId // get this destination for route
group lq by new
{
RouteId = lq.RouteName,
PlanId = lq.PlanId,
Origin = lq.OriginId,
Destination = lq.DestinationId
}
into grp
select new
{
RouteId = grp.Key.RouteId,
PlanId = grp.Key.PlanId,
Origin = grp.Key.Origin,
Destination = grp.Key.Destination,
Loads = (from l in grp select l)
}).OrderBy(x => x.Origin).ToList();
I'm guessing you want to Group By column 1 but include columns 2 and 3 in your Select. Using a Group By you cannot do this. However, you can do this using a T-SQL Windowing function using the OVER() operator. Since you don't say how you want to aggregate, I cannot provide an example. But look at T-SQL Windowing functions. This article might help you get started.
One important thing you need to understand about GROUP BY is that you must assume that there are multiple values in every column outside of the GROUP BY list. In your case, you must assume that for each value of Column1 there would be multiple values of Column2 and Column3, all considered as a single group.
If you want your query to process any of these columns, you must specify what to do about these multiple values.
Here are some choices you have:
Pick the smallest or the largest value for a column in a group - use MIN(...) or MAX(...) aggregator for that
Count non-NULL items in a group - use COUNT(...)
Produce an average of non-NULL values in a group - use AVG(...)
For example, if you would like to find the smallest Column2 and an average of Column3 for each value of Column1, your query would look like this:
select
Column1, MIN(Column2), AVG(Column3)
from
TableName
group by
Column1

Join questionn - Select must return only one record

I have 4 tables in my SQL Server 2008 database :
CONTACT
CONTACT_DETAILS
PLANS
PLANS_DETAILS
Every record is recorded in CONTACT and CONTACT_DATAILS but a CONTACT can have 0, 1, 2 or more records in PLANS, Active or Cancelled.
So I did this:
SELECT *
from CONTACT as c
left join PLANS as pp on pp.PKEY = c.PKEY
left join PLANS_DETAILS as pd on pd.PDKEY = p.PDKEY
inner join CONTACT_DETAILS as cd on cd.DKEY = c.DKEY
WHERE c.KEY = '267110' and PP.STATUS = 'Active'
"267110" have 1 active PLAN so it shows me 1 line, everything I need.
But if I put
WHERE c.KEY = '100003' and PP.STATUS = 'Active'
"100003" have 2 cancelled plans, so the result is empty. If I remove PP.STATUS = 'Active' , it returns me 2 identical results, but I need just one.
In resume: I need a select that returns me 1 row only. If there is an active plan, return the columns, if not, return the columns null. If someone have 1 cancelled and 1 active plan, return me only the active plan columns.
The answer to your question is to move the condition on pp to the on clause.
SELECT *
from CONTACT c inner join
CONTACT_DETAILS cd
on cd.DKEY = c.DKEY left join
PLANS pp
on pp.PKEY = c.PKEY AND PP.STATUS = 'Active' left join
PLANS_DETAILS pd
on pd.PDKEY = p.PDKEY
WHERE c.KEY = '267110' ;
In addition, when you have a series of inner and left joins, I recommend putting all the inner joins first, followed by the outer joins. That makes it clear which joins are used for keeping records and which for filtering.
Just add an ORDER BY PP.STATUS DESC and a TOP 1 clause, and delete the and PP.STATUS = 'Active', like this
SELECT TOP 1 * from CONTACT as c
left join PLANS as pp on pp.PKEY = c.PKEY
left join PLANS_DETAILS as pd on pd.PDKEY = p.PDKEY
inner join CONTACT_DETAILS as cd on cd.DKEY = c.DKEY
WHERE c.KEY = '100003'
ORDER BY PP.STATUS DESC
SELECT *
from CONTACT c
left join
(
select * from
(select
[whatever you need from this table]
,row_number over(partition by [keys in this table] order by status asc) rnk
from PLANS)
where rnk = 1
) pp
on pp.PKEY = c.PKEY
left join PLANS_DETAILS pd
on pd.PDKEY = p.PDKEY
inner join CONTACT_DETAILS cd
on cd.DKEY = c.DKEY
WHERE c.KEY = '100003'