Counting records in a SQL subquery

I'm having difficulty with a subquery. In plain English, I'm trying to pick a random UserID from the QCUsers table that has fewer than 20 records in the QCTier1_Assignments table. The problem is that my query below only picks users that appear in the inner query's results, when I need it to pick any user from the QCUsers table, even one with no records at all in the QCTier1_Assignments table. I need something like this:
AND (Sub.QCCount < 20 OR Sub.QCCount = 0 )
DECLARE @ReviewPeriodMonth varchar(10) = '10'
DECLARE @ReviewPeriodYear varchar(10) = '2015'
SELECT TOP 1
E1.UserID
,Sub.QCCount --Drawn from the subquery
FROM QCUsers E1
JOIN (SELECT
QCA.UserID,
COUNT(*) AS QCCount
FROM QCTier1_Assignments QCA
WHERE QCA.ReviewPeriodMonth = @ReviewPeriodMonth
AND QCA.ReviewPeriodYear = @ReviewPeriodYear
GROUP BY QCA.UserID
) Sub
ON E1.UserID = Sub.UserID
WHERE Active = 1
AND Grade = 12
AND Sub.QCCount < 20
ORDER BY NEWID()
I also tried it this way, with no luck:
DECLARE @ReviewPeriodMonth varchar(10) = '10'
DECLARE @ReviewPeriodYear varchar(10) = '2015'
SELECT TOP 1
E1.UserID
,Sub.QCCount --Drawn from the subquery
FROM QCUsers E1
RIGHT JOIN (SELECT
QCA.UserID,
ReviewPeriodMonth,
ReviewPeriodYear,
COUNT(*) AS QCCount
FROM QCTier1_Assignments QCA
GROUP BY
QCA.UserID,
ReviewPeriodMonth,
ReviewPeriodYear
) Sub
ON E1.UserID = Sub.UserID
WHERE Active = 1
AND Grade = 12
AND Sub.QCCount < 20
AND Sub.ReviewPeriodMonth = @ReviewPeriodMonth
AND Sub.ReviewPeriodYear = @ReviewPeriodYear
ORDER BY NEWID()

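Try using your second query, but change the WHERE clause to use COALESCE(Sub.QCCount, 0) instead of just Sub.QCCount.
If the subquery returns no rows for a user, an outer join that preserves QCUsers still returns that user (that means a LEFT JOIN from QCUsers; your RIGHT JOIN preserves the subquery's rows instead), but QCCount will be NULL. Comparing NULL to anything yields UNKNOWN, which the WHERE clause treats as false. The same applies to the filters on Sub.ReviewPeriodMonth and Sub.ReviewPeriodYear, so those are safer inside the subquery, as in your first query.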
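To make the NULL behaviour concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for SQL Server. The sample rows, the threshold of 3 and the trimmed-down table definitions are assumptions for illustration only; the NULL comparison rules shown here are the same in T-SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE QCUsers (UserID INTEGER PRIMARY KEY);
    CREATE TABLE QCTier1_Assignments (ReviewID INTEGER PRIMARY KEY, UserID INTEGER);
    INSERT INTO QCUsers VALUES (1), (2), (3);
    -- user 1: 3 assignments, user 2: 1 assignment, user 3: none
    INSERT INTO QCTier1_Assignments (UserID) VALUES (1), (1), (1), (2);
""")

# The outer join preserves every QCUsers row; only the predicate differs.
QUERY = """
    SELECT U.UserID, Sub.QCCount
    FROM QCUsers U
    LEFT JOIN (SELECT UserID, COUNT(*) AS QCCount
               FROM QCTier1_Assignments
               GROUP BY UserID) Sub
      ON Sub.UserID = U.UserID
    WHERE {predicate}
    ORDER BY U.UserID
"""

# NULL < 3 is UNKNOWN, so user 3 (no assignments) is filtered out.
without_coalesce = conn.execute(QUERY.format(predicate="Sub.QCCount < 3")).fetchall()

# COALESCE turns the missing count into 0, so user 3 qualifies.
with_coalesce = conn.execute(
    QUERY.format(predicate="COALESCE(Sub.QCCount, 0) < 3")
).fetchall()

print(without_coalesce)  # [(2, 1)]
print(with_coalesce)     # [(2, 1), (3, None)]
```

In SQL Server the random pick would still be TOP 1 ... ORDER BY NEWID() on top of this; that part is left out here because SQLite spells it differently.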
Also, you should look into the HAVING clause. It might allow you to do this without a subquery at all.
Here's an example with the HAVING clause. If it doesn't give the correct results please let me know as I'm not able to test this.
DECLARE
@ReviewPeriodMonth VARCHAR(10) = '10',
@ReviewPeriodYear VARCHAR(10) = '2015'
SELECT TOP 1
E1.UserID,
COUNT(QCA.UserID) AS QCCount
FROM
QCUsers E1
LEFT OUTER JOIN QCTier1_Assignments QCA ON
QCA.UserID = E1.UserID AND
QCA.ReviewPeriodMonth = @ReviewPeriodMonth AND
QCA.ReviewPeriodYear = @ReviewPeriodYear
WHERE
E1.Active = 1 AND
E1.Grade = 12
GROUP BY
E1.UserID
HAVING
COUNT(QCA.UserID) < 20 -- counts only matched assignment rows, so users with none get 0
ORDER BY
NEWID()

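You should use LEFT JOIN instead of JOIN (which is an INNER JOIN). You could keep the count predicate in the outer query as you did, and I recommend the following approach. Note that the period filters on QCA belong in the ON clause and the count should be taken on a QCA column; otherwise users with no assignments are either filtered out or counted as 1:
SELECT TOP 1 ABC.UserID, ABC.QCCount
FROM
(
SELECT E1.UserID, COUNT(QCA.UserID) AS QCCount -- COUNT(*) would report 1 for users with no assignments
FROM QCUsers AS E1
LEFT JOIN QCTier1_Assignments AS QCA
ON QCA.UserID = E1.UserID
AND QCA.ReviewPeriodMonth = @ReviewPeriodMonth -- filters on QCA stay in ON so unmatched users survive
AND QCA.ReviewPeriodYear = @ReviewPeriodYear
WHERE E1.Active = 1
AND E1.Grade = 12
GROUP BY E1.UserID
) AS ABC
WHERE ABC.QCCount < 20
ORDER BY NEWID()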

I was able to work it out through a combination of the responses here:
DECLARE @ReviewPeriodMonth varchar(10) = '10'
DECLARE @ReviewPeriodYear varchar(10) = '2015'
SELECT TOP 1
QCUsers.UserID,
COUNT(QCTier1_Assignments.ReviewID) AS ReviewCount
FROM
QCTier1_Assignments RIGHT OUTER JOIN
QCUsers ON QCTier1_Assignments.UserID = QCUsers.UserID
WHERE
QCUsers.Active = 1
AND QCUsers.Grade = '12'
AND (ReviewPeriodMonth = @ReviewPeriodMonth OR ReviewPeriodMonth IS NULL)
AND (ReviewPeriodYear = @ReviewPeriodYear OR ReviewPeriodYear IS NULL)
GROUP BY
QCUsers.UserID
HAVING
(COALESCE(COUNT(QCTier1_Assignments.ReviewID),0) < 4)
ORDER BY NEWID()
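One thing that may be worth checking in a query shaped like the one above: filtering the outer-joined table with "column = value OR column IS NULL" in the WHERE clause keeps users with no assignments at all, but it drops users whose only assignments fall in a different period. Putting the period filter in the ON clause keeps them with a count of 0. The sketch below shows the difference using Python's sqlite3 as a stand-in for SQL Server; the sample rows and the single ReviewPeriodMonth column are simplifying assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE QCUsers (UserID INTEGER PRIMARY KEY);
    CREATE TABLE QCTier1_Assignments (
        ReviewID INTEGER PRIMARY KEY, UserID INTEGER, ReviewPeriodMonth TEXT);
    INSERT INTO QCUsers VALUES (1), (2), (3);
    -- user 1: two reviews in month '10'
    -- user 2: one review, but only in month '09'
    -- user 3: no reviews at all
    INSERT INTO QCTier1_Assignments (UserID, ReviewPeriodMonth)
    VALUES (1, '10'), (1, '10'), (2, '09');
""")

# Period filter in WHERE: user 2's only joined row has month '09',
# which is neither '10' nor NULL, so user 2 disappears entirely.
filter_in_where = conn.execute("""
    SELECT U.UserID, COUNT(A.ReviewID)
    FROM QCUsers U
    LEFT JOIN QCTier1_Assignments A ON A.UserID = U.UserID
    WHERE A.ReviewPeriodMonth = ? OR A.ReviewPeriodMonth IS NULL
    GROUP BY U.UserID
    HAVING COUNT(A.ReviewID) < 20
    ORDER BY U.UserID
""", ("10",)).fetchall()

# Period filter in ON: user 2 simply has no matching row and counts as 0.
filter_in_on = conn.execute("""
    SELECT U.UserID, COUNT(A.ReviewID)
    FROM QCUsers U
    LEFT JOIN QCTier1_Assignments A
      ON A.UserID = U.UserID AND A.ReviewPeriodMonth = ?
    GROUP BY U.UserID
    HAVING COUNT(A.ReviewID) < 20
    ORDER BY U.UserID
""", ("10",)).fetchall()

print(filter_in_where)  # [(1, 2), (3, 0)]
print(filter_in_on)     # [(1, 2), (2, 0), (3, 0)]
```

Whether that matters depends on the data: if every user either has rows in the current period or has no rows at all, both forms return the same result.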

Related

How do you properly query the result of a complex join statement in SQL?

New to advanced SQL!
I'm trying to write a query that returns the COUNT(*) and SUM of the resulting columns from this query:
DECLARE @Id INT = 1000;
SELECT
*,
CASE
WHEN Id1 >= 6 THEN 1
ELSE 0
END AS Tier1,
CASE
WHEN Id1 >= 4 THEN 1
ELSE 0
END AS Tier2,
CASE
WHEN Id1 >= 2 THEN 1
ELSE 0
END AS Tier3
FROM (
SELECT
Org.OrgID,
App.AppID,
App.FirstName,
App.LastName,
MAX(AppSubmitU_Level.Id1) AS Id1
FROM Org
INNER JOIN AppEmployment
ON AppEmployment.OrgID = Org.OrgID
INNER JOIN App
ON App.AppID = AppEmployment.AppID
INNER JOIN AppSubmit
ON App.AppID = AppSubmit.AppID
INNER JOIN AppSubmitU_Level
ON AppSubmit.LevelID = AppSubmitU_Level.Id1
INNER JOIN AppEmpU_VerifyStatus
ON AppEmpU_VerifyStatus.VerifyStatusID = AppEmployment.VerifyStatusID
WHERE AppSubmitU_Level.SubmitTypeID = 1 -- Career
AND AppEmpU_VerifyStatus.StatusIsVerified = 1
AND AppSubmit.[ExpireDate] IS NOT NULL
AND AppSubmit.[ExpireDate] > GETDATE()
AND Org.OrgID = @Id
GROUP BY
Org.OrgID,
App.AppID,
App.FirstName,
App.LastName
) employees
I've tried to do so by moving the @Id declaration outside the original query and adding a COUNT(*) and three SUMs to the top, like so:
DECLARE @OrgID INT = 1000;
SELECT COUNT(*), SUM(employees.Tier1), SUM(employees.Tier2), SUM(employees.Tier3)
FROM
(SELECT *,
...
) AS employees
);
When I run the query, however, I'm getting the errors:
The multi-part identifier employees.Tier1 could not be bound
The same errors appear for the other identifiers in my SUM statements.
I'm assuming this has to do with the fact that the Tier1, Tier2, and Tier3 columns are being returned by the inner join query in my FROM(), and aren't values set by the existing tables that I'm querying. But I can't figure out how to rewrite it to initialize properly.
Thanks in advance for the help!
This is a scope problem: the alias employees is defined only inside the subquery, so it is not available in the outer scope. You basically need to give the outer derived table an alias as well:
DECLARE @OrgID INT = 1000;
SELECT COUNT(*), SUM(employees.Tier1) TotalTier1, SUM(employees.Tier2) TotalTier2, SUM(employees.Tier3) TotalTier3
FROM (
SELECT *,
...
) AS employees
) AS employees;
--^ here
Note that I added column aliases to the outer query, which is a good practice in SQL.
It might be easier to understand what is going on if you use another alias for the outer query:
SELECT COUNT(*), SUM(e.Tier1), SUM(e.Tier2), SUM(e.Tier3)
FROM (
SELECT *,
...
) AS employees
) AS e;
Note that you don't actually need to qualify the column names in the outer query, since the column names are unambiguous anyway.
And finally: you don't actually need the extra level of nesting. You could compute the totals directly over the grouped subquery:
SELECT
COUNT(*) AS TotalEmployees,
SUM(CASE WHEN Id1 >= 6 THEN 1 ELSE 0 END) AS TotalTier1,
SUM(CASE WHEN Id1 >= 4 THEN 1 ELSE 0 END) AS TotalTier2,
SUM(CASE WHEN Id1 >= 2 THEN 1 ELSE 0 END) AS TotalTier3
FROM (
SELECT
Org.OrgID,
App.AppID,
App.FirstName,
App.LastName,
MAX(AppSubmitU_Level.Id1) AS Id1
FROM Org
INNER JOIN AppEmployment
ON AppEmployment.OrgID = Org.OrgID
INNER JOIN App
ON App.AppID = AppEmployment.AppID
INNER JOIN AppSubmit
ON App.AppID = AppSubmit.AppID
INNER JOIN AppSubmitU_Level
ON AppSubmit.LevelID = AppSubmitU_Level.Id1
INNER JOIN AppEmpU_VerifyStatus
ON AppEmpU_VerifyStatus.VerifyStatusID = AppEmployment.VerifyStatusID
WHERE AppSubmitU_Level.SubmitTypeID = 1 -- Career
AND AppEmpU_VerifyStatus.StatusIsVerified = 1
AND AppSubmit.[ExpireDate] IS NOT NULL
AND AppSubmit.[ExpireDate] > GETDATE()
AND Org.OrgID = @Id
GROUP BY
Org.OrgID,
App.AppID,
App.FirstName,
App.LastName
) employees
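As a small check that the nested form and the flattened form give the same totals, here is a sketch using Python's sqlite3. The employees table below is a hypothetical stand-in for the grouped derived table (one row per applicant with their MAX Id1), and the four sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Stand-in for the grouped subquery: one row per applicant.
    CREATE TABLE employees (AppID INTEGER PRIMARY KEY, Id1 INTEGER);
    INSERT INTO employees VALUES (1, 6), (2, 5), (3, 2), (4, 1);
""")

# Nested form: compute the tier flags first, then sum them.
# The outer derived table needs its own alias (e).
nested = conn.execute("""
    SELECT COUNT(*), SUM(e.Tier1), SUM(e.Tier2), SUM(e.Tier3)
    FROM (
        SELECT CASE WHEN Id1 >= 6 THEN 1 ELSE 0 END AS Tier1,
               CASE WHEN Id1 >= 4 THEN 1 ELSE 0 END AS Tier2,
               CASE WHEN Id1 >= 2 THEN 1 ELSE 0 END AS Tier3
        FROM employees
    ) AS e
""").fetchone()

# Flattened form: conditional aggregation, no intermediate flag columns.
flat = conn.execute("""
    SELECT COUNT(*),
           SUM(CASE WHEN Id1 >= 6 THEN 1 ELSE 0 END),
           SUM(CASE WHEN Id1 >= 4 THEN 1 ELSE 0 END),
           SUM(CASE WHEN Id1 >= 2 THEN 1 ELSE 0 END)
    FROM employees
""").fetchone()

print(nested)  # (4, 1, 2, 3)
print(flat)    # (4, 1, 2, 3)
```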

How to ignore duplicate records in CTE Select statement?

I am trying to ignore duplicate records in a CTE, but I am not able to. It seems that a SELECT statement inside a CTE does not allow the ROW_NUMBER() alias numrows to be used as a condition in the WHERE clause: it shows an Invalid column name 'numrows' error when I try.
SQL Query:
DECLARE @BatchID uniqueidentifier = NEWID();
DECLARE @ClusterID SMALLINT = 1;
DECLARE @BatchSize integer = 20000;
DECLARE @myTableVariable TABLE(EventID BIGINT,HotelID int, BatchStatus varchar(50),BatchID uniqueidentifier);
WITH PendingExtResSvcEventsData_Batch
AS(
SELECT TOP (@BatchSize) t.EventID, t.HotelID, t.BatchStatus, t.BatchID, ROW_NUMBER() OVER (PARTITION BY t.EventID ORDER BY t.EventID) numrows
FROM ExtResSvcPendingMsg t WITH (NOLOCK)
WHERE t.ClusterID = @ClusterID AND numrows = 1 AND NOT EXISTS -- not allowed to use WHERE numrows = 1 here showing *Invalid Column Name*
(select 1 from ExtResSvcPendingMsg t2 where t2.BatchStatus = 'Batched'
and t2.EventID = t.EventID and t2.HotelID = t.HotelID)
)
UPDATE PendingExtResSvcEventsData_Batch
SET BatchStatus='Batched',
BatchID = @BatchID
-- WHERE numrows = 1 (not allowed to use WHERE here because of OUTPUT Clause)
OUTPUT INSERTED.* INTO @myTableVariable
SELECT e.ExtResSvcEventID,e.HotelID,e.ID1,e.ID2,e.ExtResSvcEventType,e.HostID,e.StatusCode,e.ChannelID,e.RequestAtTime,e.ProcessTime,e.DateBegin,e.DateEnd,
e.StatusMsg,em.MsgBodyOut,em.MsgBodyIn,e.ChannelResID
FROM ExtResSvcEvent e WITH (NOLOCK)
INNER JOIN @myTableVariable t ON e.ExtResSvcEventID = t.EventID
INNER JOIN ExtResSvcEventXML em with (nolock) on t.EventID = em.ExtResSvcEventID
ORDER BY e.ExtResSvcEventID
I have also tried to use numrows in the final SELECT, like INNER JOIN @myTableVariable t ON e.ExtResSvcEventID = t.EventID AND t.numrows = 1, but this gives me an error: The column reference "inserted.numrows" is not allowed because it refers to a base table that is not being modified in this statement.
How do I ignore the duplicate records while using SELECT in CTE?
You can't refer to the numrows column in the WHERE clause of the CTE because WHERE is logically evaluated before the SELECT list, so the ROW_NUMBER() alias does not exist yet at that point. You need to add a second CTE that selects from the first, where you can refer to the numrows column:
WITH Base AS (
SELECT TOP (@BatchSize) t.EventID, t.HotelID, t.BatchStatus, t.BatchID, ROW_NUMBER() OVER (PARTITION BY t.EventID ORDER BY t.EventID) numrows
FROM ExtResSvcPendingMsg t WITH (NOLOCK)
WHERE t.ClusterID = @ClusterID
AND NOT EXISTS (select 1 from ExtResSvcPendingMsg t2 where t2.BatchStatus = 'Batched' and t2.EventID = t.EventID and t2.HotelID = t.HotelID)
), PendingExtResSvcEventsData_Batch AS (
SELECT EventID,
HotelID,
BatchStatus,
BatchID
FROM Base
WHERE numrows = 1
)
UPDATE...
I can't vouch for the UPDATE statement working as you expect, but PendingExtResSvcEventsData_Batch should now have one row per EventID.
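The two-step pattern (number the rows in one CTE, filter on the number in the next SELECT) can be tried in isolation. This is a minimal sketch using Python's sqlite3, which needs SQLite 3.25 or later for window functions. The PendingMsg table, its RowID column and the sample rows are hypothetical, and the UPDATE ... OUTPUT part is T-SQL specific and not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PendingMsg (
        RowID INTEGER PRIMARY KEY, EventID INTEGER, HotelID INTEGER);
    -- EventID 100 and 300 are duplicated; 200 appears once.
    INSERT INTO PendingMsg (EventID, HotelID)
    VALUES (100, 1), (100, 1), (200, 2), (300, 3), (300, 3);
""")

deduped = conn.execute("""
    WITH Base AS (
        -- Step 1: number the rows within each EventID.
        SELECT EventID, HotelID,
               ROW_NUMBER() OVER (PARTITION BY EventID ORDER BY RowID) AS numrows
        FROM PendingMsg
    )
    -- Step 2: numrows is an ordinary column of Base here, so WHERE can use it.
    SELECT EventID, HotelID
    FROM Base
    WHERE numrows = 1
    ORDER BY EventID
""").fetchall()

print(deduped)  # [(100, 1), (200, 2), (300, 3)]
```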

SQL Server - UNION ALL

I'm new to SQL development and I need to do a UNION on two SELECT statements. Below is a sample query. The joined tables and conditions, the WHERE criteria, the column names and everything else are the same in both SELECT statements, except for the primary tables after the FROM clause. I just wanted to know if there is a way to have a single static SELECT query, instead of repeating the same query twice for the UNION (without going for a dynamic query).
SELECT Sum(ABC.Intakes) As TotalIntakes, Sum(ABC.ClientTarget) as TotalClientTarget
FROM(
SELECT Sum(tt.IntakesReceived) As Intakes, Sum(tt.ClientTarget) As ClientTarget,
tt.ProgramId
FROM
(SELECT Count(DISTINCT ClientID) As IntakesReceived,
DATEDIFF(MONTH, L.AwardStartDate, L.AwardEndDate)*L.MonthlyClientTarget As ClientTarget,
L.AwardId, L.ProgramId
FROM IntakeCoverageLegacy As L
LEFT JOIN UserRoleEntity URE ON URE.EntityId = L.AwardId
LEFT JOIN CDPUserRole UR ON URE.UserRoleId = UR.Id AND UR.CDPUserId = @UserId
WHERE (@Program IS NULL OR L.ProgramId IN (SELECT ProgramID FROM @ProgramIDList))
AND (ufn_IsInternalUser(@UserId) = 1
OR (ufn_IsInternalUser(@UserId) = 0 AND UR.CDPUserId = @UserId ))
GROUP BY L.AwardId, L.ProgramId) As tt
GROUP BY tt.ProgramId, tt.ProgramName
UNION ALL
SELECT Sum(tt.IntakesReceived) As Intakes, Sum(tt.ClientTarget) As ClientTarget,
tt.ProgramId
FROM
(SELECT Count(DISTINCT C.ClientID) As IntakesReceived,
DATEDIFF(MONTH, C.AwardStartDate, C.AwardEndDate)*C.MonthlyClientTarget As ClientTarget,
C.AwardId, C.ProgramId
FROM IntakeCoverageCDP As C
LEFT JOIN UserRoleEntity URE ON URE.EntityId = C.AwardId
LEFT JOIN CDPUserRole UR ON URE.UserRoleId = UR.Id AND UR.CDPUserId = @UserId
WHERE (@Program IS NULL OR C.ProgramId IN (SELECT ProgramID FROM @ProgramIDList))
AND (ufn_IsInternalUser(@UserId) = 1
OR (ufn_IsInternalUser(@UserId) = 0 AND UR.CDPUserId = @UserId ))
GROUP BY C.AwardId, C.ProgramId) As tt
GROUP BY tt.ProgramId, tt.ProgramName
) As ABC
GROUP BY ABC.ProgramId
OK... What I posted earlier was a sample query and I've updated the sample to my actual query to make it more clear. It's just the primary tables that are different. My requirement is that - after doing UNION ALL, I need to sum the aggregate columns in the final result, grouping by ProgramId.
I would probably first use UNION for the Client and LegacyClient tables as a derived table and then perform the JOINs:
SELECT C.AwardId,
C.ProgramName,
COUNT(ClientId) AS Intakes
FROM ( SELECT AwardId,
ProgramName,
Id
FROM Client
WHERE Id = @ClientId
UNION
SELECT AwardId,
ProgramName,
Id
FROM LegacyClient
WHERE Id = @ClientId) C
LEFT JOIN UserRoleEntity URE
ON C.AwardId = URE.EntityId
LEFT JOIN UserRole UR
ON URE.UserRoleId = UR.Id AND UR.CDPUserId = @UserId
WHERE (testFunction(@UserId) = 0
OR (testFunction(@UserId) <> 0 AND UR.CDPUserId = @UserId))
GROUP BY C.AwardId,
C.ProgramName;
SELECT C.AwardId, C.ProgramName, Count(ClientId) as Intakes
FROM
(
SELECT Id, AwardId, ProgramName, ClientId FROM Client UNION ALL
SELECT Id, AwardId, ProgramName, ClientId FROM LegacyClient
) C
LEFT OUTER JOIN UserRoleEntity URE ON C.AwardId = URE.EntityId
LEFT OUTER JOIN UserRole UR ON URE.UserRoleId = UR.Id AND UR.CDPUserId = @UserId
WHERE
C.Id = @ClientId
AND (testFunction(@UserId) = 0 OR UR.CDPUserId = @UserId)
GROUP BY C.AwardId, C.ProgramName
Using testFunction() twice isn't really necessary (unless NULL is one of the outputs).
You might also prefer to filter on ClientId outside of the union. I'm guessing your purpose in rewriting it is to avoid the duplicated logic. You might still want to see which one is better handled by the optimizer.
Also, I used a UNION ALL. I'm assuming you expect only one result from one of the two tables. As you originally wrote it, that count column is going to factor into the union.
Counting on ClientId seems odd. So does having a parameter named @ClientId that doesn't seem to match up with the ClientId column.
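The general shape both answers rely on (UNION ALL the two source tables into one derived table, then join and aggregate once) can be sketched as follows with Python's sqlite3. The table names come from the question, but the two-column layout and the sample rows are assumptions, and the user-role joins are left out to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE IntakeCoverageLegacy (ClientID INTEGER, ProgramId INTEGER);
    CREATE TABLE IntakeCoverageCDP (ClientID INTEGER, ProgramId INTEGER);
    INSERT INTO IntakeCoverageLegacy VALUES (1, 10), (2, 10), (3, 20);
    INSERT INTO IntakeCoverageCDP VALUES (4, 10), (5, 20), (6, 20);
""")

# The joins, filters and GROUP BY are written once, against the combined rows,
# instead of being repeated on each side of the UNION ALL.
totals = conn.execute("""
    SELECT src.ProgramId, COUNT(DISTINCT src.ClientID) AS Intakes
    FROM (
        SELECT ClientID, ProgramId FROM IntakeCoverageLegacy
        UNION ALL
        SELECT ClientID, ProgramId FROM IntakeCoverageCDP
    ) AS src
    GROUP BY src.ProgramId
    ORDER BY src.ProgramId
""").fetchall()

print(totals)  # [(10, 3), (20, 3)]
```

One caveat: if the same ClientID can appear in both tables, COUNT(DISTINCT ...) over the combined rows counts it once, whereas summing two per-table counts would count it twice, so the two formulations are only equivalent when the tables do not overlap.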

How to select minimum non duplicated value in a column?

Can you help me with SQL statements to find minimum non duplicated value?
This is my sql statement
DECLARE @currentDate DATETIME = CONVERT(VARCHAR(10), Getdate(), 120)
UPDATE Dinfo
SET WinnerID = result.CustomerID
FROM Daily_Info Dinfo
JOIN (SELECT CO.DailyInfoID,
CO.CustomerID
FROM Customer_Offer CO
WHERE CO.OfferDate = @currentDate
GROUP BY CO.DailyInfoID,
CO.CustomerID
HAVING ( Count(CO.OfferPrice) = 1 )) result
ON Dinfo.DailyID = result.DailyInfoID
I want to set the winner to the customer who made the lowest unique offer. How can I select it?
If you want to find data, then I would expect a select. I think the following query might do what you want:
select min(offerprice)
from (select co.*, count(*) over (partition by co.offerprice) as cnt
from Customer_Offer co
where CO.OfferDate = @currentDate
) co
where cnt = 1;
If you want to update information based on this, then use join:
update dinfo
set winnerId = c.CustomerId
from Daily_Info dinfo cross join
(select top 1 co.*
from (select co.*, count(*) over (partition by co.offerprice) as cnt
from Customer_Offer co
where CO.OfferDate = @currentDate
) co
where cnt = 1
order by offerprice
) c
This follows the structure of your query, but it is going to update all rows in dinfo. You might want some other conditions so that only one row is updated.
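To see the windowed count at work, here is a minimal sketch using Python's sqlite3 (SQLite 3.25 or later for window functions). The sample offers are made up, the OfferDate filter is omitted, and LIMIT 1 plays the role of TOP 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer_Offer (CustomerID INTEGER, OfferPrice REAL);
    -- 5.0 and 9.0 are each offered twice; 7.0 and 8.0 are unique.
    INSERT INTO Customer_Offer VALUES
        (1, 5.0), (2, 5.0), (3, 7.0), (4, 9.0), (5, 9.0), (6, 8.0);
""")

winner = conn.execute("""
    SELECT CustomerID, OfferPrice
    FROM (
        -- cnt = how many offers share this row's price
        SELECT CustomerID, OfferPrice,
               COUNT(*) OVER (PARTITION BY OfferPrice) AS cnt
        FROM Customer_Offer
    ) AS co
    WHERE cnt = 1          -- keep only prices offered exactly once
    ORDER BY OfferPrice    -- lowest unique price first
    LIMIT 1
""").fetchone()

print(winner)  # (3, 7.0)
```

The lowest price overall (5.0) is skipped because two customers offered it, so customer 3 wins with 7.0.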

Querying for Consecutive Rows with Certain Characteristics

I've got a table with the following columns:
id int(10)
user int(10)
winner int(10)
profit double
created datetime
The winner column can be either 0 or 1. I'd like to create a query that returns the maximum number of consecutive winners as ordered by the created datetime column along with the first and last created date as well as the sum of the profit column from that period of consecutive winners.
Here's a possible solution that looks at winning streaks per userid.
select head.userid, head.id, sum(profit), count(*)
from #bingo b
inner join (
select cur.userid, cur.id
from #bingo cur
left join #bingo prev
on cur.userid = prev.userid
and prev.id < cur.id
and not exists(
select *
from #bingo inbetween
where prev.userid = inbetween.userid
and prev.id < inbetween.id
and inbetween.id < cur.id)
where cur.winner = 1
and IsNull(prev.winner,0) = 0
) head
on head.userid = b.userid
and head.id <= b.id
left join (
select cur.userid, cur.id
from #bingo cur
left join #bingo prev
on cur.userid = prev.userid
and prev.id < cur.id
and not exists(
select *
from #bingo inbetween
where prev.userid = inbetween.userid
and prev.id < inbetween.id
and inbetween.id < cur.id)
where cur.winner = 1
and IsNull(prev.winner,0) = 0
) nexthead
on nexthead.userid = b.userid
and head.id < nexthead.id
and nexthead.id <= b.id
where nexthead.id is null
and b.winner = 1
group by head.userid, head.id
The two "heads" subqueries are identical; you could put them in a view or a WITH clause where those are supported. The "heads" subquery searches for each head of a winning streak; that is, the first win or a win that's preceded by a loss. I'm assuming your ids increase over time, so I'm not using the created column.
The rest of the query then finds the corresponding head for every row. A head's id must be smaller than or equal to the current row's id, and there must be no other head in between.
After that it's a simple matter of grouping on the head, and summing the profits and counting the rows.
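Where window functions are available, the same grouping can be reached with a different technique than the self-joins above: the difference between two ROW_NUMBER() sequences is constant within each run of consecutive winners (often called "gaps and islands"). The sketch below uses Python's sqlite3 (SQLite 3.25 or later). The results table name and the sample rows are assumptions, and it looks at one overall sequence ordered by created rather than a streak per user.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE results (
        id INTEGER PRIMARY KEY, winner INTEGER, profit REAL, created TEXT);
    INSERT INTO results (winner, profit, created) VALUES
        (1, 10.0, '2020-01-01'),
        (0, -5.0, '2020-01-02'),
        (1,  3.0, '2020-01-03'),
        (1,  4.0, '2020-01-04'),
        (1,  5.0, '2020-01-05'),
        (0, -2.0, '2020-01-06'),
        (1,  1.0, '2020-01-07');
""")

longest = conn.execute("""
    WITH numbered AS (
        -- Overall position minus position among rows with the same winner
        -- value stays constant for as long as a streak continues.
        SELECT created, winner, profit,
               ROW_NUMBER() OVER (ORDER BY created)
             - ROW_NUMBER() OVER (PARTITION BY winner ORDER BY created) AS grp
        FROM results
    )
    SELECT COUNT(*) AS streak, MIN(created), MAX(created), SUM(profit)
    FROM numbered
    WHERE winner = 1
    GROUP BY grp
    ORDER BY streak DESC
    LIMIT 1
""").fetchone()

print(longest)  # (3, '2020-01-03', '2020-01-05', 12.0)
```

If two streaks tie for the longest, LIMIT 1 returns only one of them, so an extra ORDER BY key would be needed to make the choice deterministic.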
I haven't tested it but maybe this will work.
select first_winner.created, last_winner.created, sum(mid_winner.profit)
from T first_winner
join T last_winner
on first_winner.created <= last_winner.created
and first_winner.winner = 1
and last_winner.winner = 1
and not exists -- no losers in between first_winner and last_winner
(
select * from T loser
where loser.winner = 0
and first_winner.created <= loser.created
and loser.created <= last_winner.created
)
join T mid_winner
on first_winner.created <= mid_winner.created
and mid_winner.created <= last_winner.created
and mid_winner.winner = 1
left join T bef_first_winner -- winner before first winner with no losers in between
on bef_first_winner.winner = 1
and bef_first_winner.created < first_winner.created
and not exists
(
select * from T b_loser
where b_loser.winner = 0
and bef_first_winner.created <= b_loser.created
and b_loser.created <= first_winner.created
)
left join T after_last_winner -- winner after last winner with no losers in between
on after_last_winner.winner = 1
and last_winner.created < after_last_winner.created
and not exists
(
select * from T a_loser
where a_loser.winner = 0
and last_winner.created <= a_loser.created
and a_loser.created <= after_last_winner.created
)
where bef_first_winner.id is null
and after_last_winner.id is null
group by first_winner.created, last_winner.created