Querying for Consecutive Rows with Certain Characteristics - sql

I've got a table with the following columns:
id int(10)
user int(10)
winner int(10)
profit double
created datetime
The winner column can be either 0 or 1. I'd like to create a query that returns the maximum number of consecutive winners, ordered by the created datetime column, along with the first and last created dates and the sum of the profit column over that run of consecutive winners.

Here's a possible solution that looks at winning streaks per userid.
select head.userid, head.id, sum(profit), count(*)
from #bingo b
inner join (
select cur.userid, cur.id
from #bingo cur
left join #bingo prev
on cur.userid = prev.userid
and prev.id < cur.id
and not exists(
select *
from #bingo inbetween
where prev.userid = inbetween.userid
and prev.id < inbetween.id
and inbetween.id < cur.id)
where cur.winner = 1
and IsNull(prev.winner,0) = 0
) head
on head.userid = b.userid
and head.id <= b.id
left join (
select cur.userid, cur.id
from #bingo cur
left join #bingo prev
on cur.userid = prev.userid
and prev.id < cur.id
and not exists(
select *
from #bingo inbetween
where prev.userid = inbetween.userid
and prev.id < inbetween.id
and inbetween.id < cur.id)
where cur.winner = 1
and IsNull(prev.winner,0) = 0
) nexthead
on nexthead.userid = b.userid
and head.id < nexthead.id
and nexthead.id <= b.id
where nexthead.id is null
and b.winner = 1
group by head.userid, head.id
The two "heads" subqueries are identical, so you could put them in a view, or in a WITH clause where that is supported. The "heads" subquery searches for the head of each winning streak, that is, the first win or a win that's preceded by a loss. I'm assuming your ids increase over time, so I'm not using the Created column.
The outer query then searches for the corresponding head of every row: a head's id must be smaller than or equal to the current row's id, and there must be no other head in between.
After that it's a simple matter of grouping on the head, and summing the profits and counting the rows.
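To sanity-check the "heads" logic, here is a minimal sketch that runs the same query against an in-memory SQLite database from Python's standard sqlite3 module. It is a port, not the original T-SQL: the temp table #bingo is assumed to be an ordinary table named bingo, IsNull becomes SQLite's IFNULL, the repeated heads subquery is written once as a WITH clause, an ORDER BY is added for stable output, and the rows are made-up sample data.

```python
import sqlite3

# Port of the "heads" query to SQLite: #bingo -> bingo, IsNull -> IFNULL,
# and the duplicated heads subquery is written once as a CTE.
STREAKS_BY_HEAD = """
WITH heads AS (
    -- a head is a win whose immediately preceding row (same user) is a loss or absent
    SELECT cur.userid, cur.id
    FROM bingo cur
    LEFT JOIN bingo prev
      ON cur.userid = prev.userid
     AND prev.id < cur.id
     AND NOT EXISTS (SELECT 1 FROM bingo inbetween
                     WHERE prev.userid = inbetween.userid
                       AND prev.id < inbetween.id
                       AND inbetween.id < cur.id)
    WHERE cur.winner = 1
      AND IFNULL(prev.winner, 0) = 0
)
SELECT head.userid, head.id, SUM(b.profit), COUNT(*)
FROM bingo b
JOIN heads head
  ON head.userid = b.userid AND head.id <= b.id
LEFT JOIN heads nexthead          -- a later head at or before b means b is in a later streak
  ON nexthead.userid = b.userid
 AND head.id < nexthead.id
 AND nexthead.id <= b.id
WHERE nexthead.id IS NULL
  AND b.winner = 1
GROUP BY head.userid, head.id
ORDER BY head.userid, head.id
"""

def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE bingo (id INTEGER, userid INTEGER, winner INTEGER, profit REAL)")
    # hypothetical sample data for one user: win, win, loss, win, win, win, loss
    rows = [(1, 1, 1, 10.0), (2, 1, 1, 5.0), (3, 1, 0, -7.0),
            (4, 1, 1, 2.0), (5, 1, 1, 3.0), (6, 1, 1, 4.0), (7, 1, 0, -1.0)]
    conn.executemany("INSERT INTO bingo VALUES (?, ?, ?, ?)", rows)
    return conn.execute(STREAKS_BY_HEAD).fetchall()

print(demo())  # [(1, 1, 15.0, 2), (1, 4, 9.0, 3)]
```

Each result row is one streak (user, id of its first win, total profit, length). To get only the longest one, sort by the count descending and keep the first row.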

I haven't tested it but maybe this will work.
select first_winner.created, last_winner.created, sum(mid_winner.profit)
from T first_winner
join T last_winner
on first_winner.created <= last_winner.created
and first_winner.winner = 1
and last_winner.winner = 1
and not exists -- no losers in between first_winner and last_winner
(
select * from T loser
where loser.winner = 0
and first_winner.created <= loser.created
and loser.created <= last_winner.created
)
join T mid_winner
on first_winner.created <= mid_winner.created
and mid_winner.created <= last_winner.created
and mid_winner.winner = 1
left join T bef_first_winner -- winner before first winner with no losers in between
on bef_first_winner.winner = 1
and bef_first_winner.created < first_winner.created
and not exists
(
select * from T b_loser
where b_loser.winner = 0
and bef_first_winner.created <= b_loser.created
and b_loser.created <= first_winner.created
)
left join T after_last_winner -- winner after last winner with no losers in between
on after_last_winner.winner = 1
and last_winner.created < after_last_winner.created
and not exists
(
select * from T a_loser
where a_loser.winner = 0
and last_winner.created <= a_loser.created
and a_loser.created <= after_last_winner.created
)
where bef_first_winner.id is null
and after_last_winner.id is null
group by first_winner.created, last_winner.created
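Both answers use self-joins and return every streak rather than only the longest one. As an alternative (a different technique from either answer, usually called "gaps and islands"), databases that support ROW_NUMBER() can label each run in a single pass: within one user's rows ordered by created, the overall row number minus the row number among rows with the same winner value stays constant for the whole run. A minimal sketch in SQLite via Python's sqlite3, assuming SQLite 3.25 or later (for window functions), a hypothetical table bets in which user is renamed to user_id (user is a reserved word in several databases), created stored as ISO-8601 text, unique created values per user, and made-up rows:

```python
import sqlite3

# Gaps and islands: within one user's history ordered by `created`, the
# overall row number minus the row number among rows with the same `winner`
# value is constant for every row of an unbroken run, so it labels the run.
LONGEST_STREAK = """
WITH numbered AS (
    SELECT user_id, winner, profit, created,
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created)
         - ROW_NUMBER() OVER (PARTITION BY user_id, winner ORDER BY created) AS grp
    FROM bets
)
SELECT user_id,
       COUNT(*)     AS streak_len,
       MIN(created) AS first_created,
       MAX(created) AS last_created,
       SUM(profit)  AS total_profit
FROM numbered
WHERE winner = 1          -- keep only the winning runs
GROUP BY user_id, grp
ORDER BY streak_len DESC, first_created
"""

def streaks():
    conn = sqlite3.connect(":memory:")
    # hypothetical table: `user` renamed to user_id, `created` stored as ISO-8601 text
    conn.execute("CREATE TABLE bets (id INTEGER, user_id INTEGER, winner INTEGER,"
                 " profit REAL, created TEXT)")
    rows = [
        (1, 1, 1, 10.0, "2024-01-01"), (2, 1, 1, 5.0, "2024-01-02"),
        (3, 1, 0, -7.0, "2024-01-03"), (4, 1, 1, 2.0, "2024-01-04"),
        (5, 1, 1, 3.0, "2024-01-05"), (6, 1, 1, 4.0, "2024-01-06"),
        (7, 1, 0, -1.0, "2024-01-07"),
        (8, 2, 0, -2.0, "2024-01-01"), (9, 2, 1, 6.0, "2024-01-02"),
    ]
    conn.executemany("INSERT INTO bets VALUES (?, ?, ?, ?, ?)", rows)
    return conn.execute(LONGEST_STREAK).fetchall()

all_streaks = streaks()
print(all_streaks[0])  # (1, 3, '2024-01-04', '2024-01-06', 9.0)
```

The first row is the longest streak with its first and last created dates and its total profit. The SQL itself only uses standard window functions, so it should carry over to SQL Server with small changes (for example TOP 1 instead of reading the first row), though that is untested here.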

Related

How do you properly query the result of a complex join statement in SQL?

New to advanced SQL!
I'm trying to write a query that returns the COUNT(*) and SUM of the resulting columns from this query:
DECLARE @Id INT = 1000;
SELECT
*,
CASE
WHEN Id1 >= 6 THEN 1
ELSE 0
END AS Tier1,
CASE
WHEN Id1 >= 4 THEN 1
ELSE 0
END AS Tier2,
CASE
WHEN Id1 >= 2 THEN 1
ELSE 0
END AS Tier3
FROM (
SELECT
Org.OrgID,
App.AppID,
App.FirstName,
App.LastName,
MAX(AppSubmitU_Level.Id1) AS Id1
FROM Org
INNER JOIN AppEmployment
ON AppEmployment.OrgID = Org.OrgID
INNER JOIN App
ON App.AppID = AppEmployment.AppID
INNER JOIN AppSubmit
ON App.AppID = AppSubmit.AppID
INNER JOIN AppSubmitU_Level
ON AppSubmit.LevelID = AppSubmitU_Level.Id1
INNER JOIN AppEmpU_VerifyStatus
ON AppEmpU_VerifyStatus.VerifyStatusID = AppEmployment.VerifyStatusID
WHERE AppSubmitU_Level.SubmitTypeID = 1 -- Career
AND AppEmpU_VerifyStatus.StatusIsVerified = 1
AND AppSubmit.[ExpireDate] IS NOT NULL
AND AppSubmit.[ExpireDate] > GETDATE()
AND Org.OrgID = @Id
GROUP BY
Org.OrgID,
App.AppID,
App.FirstName,
App.LastName
) employees
I've tried to do so by moving the variable declaration outside the original query and adding a COUNT(*) and three SUMs to the top, like so:
DECLARE @OrgID INT = 1000;
SELECT COUNT(*), SUM(employees.Tier1), SUM(employees.Tier2), SUM(employees.Tier3)
FROM
(SELECT *,
...
) AS employees
);
When I run the query, however, I'm getting the errors:
The multi-part identifier employees.Tier1 could not be bound
The same errors appear for the other identifiers in my SUM statements.
I'm assuming this has to do with the fact that the Tier1, Tier2, and Tier3 columns are being returned by the inner join query in my FROM(), and aren't values set by the existing tables that I'm querying. But I can't figure out how to rewrite it to initialize properly.
Thanks in advance for the help!
This is a scope problem: employees is defined only inside the subquery, so it is not available in the outer scope. You basically need to give the outer derived table an alias as well:
DECLARE @OrgID INT = 1000;
SELECT COUNT(*), SUM(employees.Tier1) TotalTier1, SUM(employees.Tier2) TotalTier2, SUM(employees.Tier3) TotalTier3
FROM (
SELECT *,
...
) AS employees
) AS employees;
--^ here
Note that I added column aliases to the outer query, which is a good practice in SQL.
It might be easier to understand what is going on if you use another alias for the outer query:
SELECT COUNT(*), SUM(e.Tier1), SUM(e.Tier2), SUM(e.Tier3)
FROM (
SELECT *,
...
) AS employees
) AS e;
Note that you don't actually need to qualify the column names in the outer query, since the column names are unambiguous anyway.
And finally: you don't actually need the extra level of nesting. The CASE expressions can go directly inside the SUMs:
SELECT
COUNT(*) AS TotalEmployees,
SUM(CASE WHEN Id1 >= 6 THEN 1 ELSE 0 END) AS TotalTier1,
SUM(CASE WHEN Id1 >= 4 THEN 1 ELSE 0 END) AS TotalTier2,
SUM(CASE WHEN Id1 >= 2 THEN 1 ELSE 0 END) AS TotalTier3
FROM (
SELECT
Org.OrgID,
App.AppID,
App.FirstName,
App.LastName,
MAX(AppSubmitU_Level.Id1) AS Id1
FROM Org
INNER JOIN AppEmployment
ON AppEmployment.OrgID = Org.OrgID
INNER JOIN App
ON App.AppID = AppEmployment.AppID
INNER JOIN AppSubmit
ON App.AppID = AppSubmit.AppID
INNER JOIN AppSubmitU_Level
ON AppSubmit.LevelID = AppSubmitU_Level.Id1
INNER JOIN AppEmpU_VerifyStatus
ON AppEmpU_VerifyStatus.VerifyStatusID = AppEmployment.VerifyStatusID
WHERE AppSubmitU_Level.SubmitTypeID = 1 -- Career
AND AppEmpU_VerifyStatus.StatusIsVerified = 1
AND AppSubmit.[ExpireDate] IS NOT NULL
AND AppSubmit.[ExpireDate] > GETDATE()
AND Org.OrgID = @Id
GROUP BY
Org.OrgID,
App.AppID,
App.FirstName,
App.LastName
) employees
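A small sketch of the two shapes discussed above, run in SQLite through Python's sqlite3 so the totals can be checked. Assumptions: the five-table join is replaced by a hypothetical submissions table holding (AppID, Id1) pairs, and the data is made up. SQLite, unlike SQL Server, does not insist on an alias for every derived table, so this only shows that the nested and the flat versions agree, not the "could not be bound" error itself.

```python
import sqlite3

def tier_totals():
    conn = sqlite3.connect(":memory:")
    # hypothetical stand-in for the joined tables: one row per (AppID, level) submission
    conn.execute("CREATE TABLE submissions (AppID INTEGER, Id1 INTEGER)")
    conn.executemany("INSERT INTO submissions VALUES (?, ?)",
                     [(1, 7), (1, 3), (2, 4), (3, 2), (4, 1)])
    # plays the role of the grouped "employees" query: highest level per applicant
    inner = "SELECT AppID, MAX(Id1) AS Id1 FROM submissions GROUP BY AppID"

    # Version 1: tier flags in a derived table, summed by an outer query.
    # The outer derived table gets its own alias (e), as SQL Server requires.
    nested = f"""
        SELECT COUNT(*), SUM(e.Tier1), SUM(e.Tier2), SUM(e.Tier3)
        FROM (
            SELECT *,
                   CASE WHEN Id1 >= 6 THEN 1 ELSE 0 END AS Tier1,
                   CASE WHEN Id1 >= 4 THEN 1 ELSE 0 END AS Tier2,
                   CASE WHEN Id1 >= 2 THEN 1 ELSE 0 END AS Tier3
            FROM ({inner}) AS employees
        ) AS e"""

    # Version 2: the CASE expressions go straight into the aggregates.
    flat = f"""
        SELECT COUNT(*),
               SUM(CASE WHEN Id1 >= 6 THEN 1 ELSE 0 END),
               SUM(CASE WHEN Id1 >= 4 THEN 1 ELSE 0 END),
               SUM(CASE WHEN Id1 >= 2 THEN 1 ELSE 0 END)
        FROM ({inner}) AS employees"""

    return conn.execute(nested).fetchone(), conn.execute(flat).fetchone()

print(tier_totals())  # ((4, 1, 2, 3), (4, 1, 2, 3))
```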

Rewrite a query with GROUP BY ALL

Microsoft has deprecated GROUP BY ALL, and while the query still works now, I'd like to future-proof it for future SQL Server upgrades.
Currently, my query is:
SELECT qt.QueueName AS [Queue] ,
COUNT ( qt.QueueName ) AS [#ofUnprocessedEnvelopes] ,
COUNT ( CASE WHEN dq.AssignedToUserID = 0 THEN 1
ELSE NULL
END
) AS [#ofUnassignedEnvelopes] ,
MIN ( dq.DocumentDate ) AS [OldestEnvelope]
FROM dbo.VehicleReg_Documents_QueueTypes AS [qt]
LEFT OUTER JOIN dbo.VehicleReg_Documents_Queue AS [dq] ON dq.QueueID = qt.QueueTypeID
WHERE dq.IsProcessed = 0
AND dq.PageNumber = 1
GROUP BY ALL qt.QueueName
ORDER BY qt.QueueName ASC;
And the resulting data set:
Queue | #ofUnprocessedEnvelopes | #ofUnassignedEnvelopes | OldestEnvelope
Cancellations | 0 | 0 | NULL
Dealer | 26 | 17 | 2018-04-06
Matched to Registration | 93 | 82 | 2018-04-04
New Registration | 166 | 140 | 2018-03-21
Remaining Documents | 2 | 2 | 2018-04-04
Renewals | 217 | 0 | 2018-04-03
Transfers | 296 | 245 | 2018-03-30
Writebacks | 53 | 46 | 2018-04-09
I've tried various versions using CTEs and UNIONs, but I cannot get the result set to generate correctly: either the records that have no counts are not displayed, or I get duplicate records.
Any suggestions on how to make this work without the GROUP BY ALL?
Below is one attempt where I tried a CTE with a UNION:
;WITH QueueTypes ( QueueTypeID, QueueName )
AS ( SELECT QueueTypeID ,
QueueName
FROM dbo.VehicleReg_Documents_QueueTypes )
SELECT qt.QueueName AS [Queue] ,
COUNT ( qt.QueueName ) AS [#ofUnprocessedEnvelopes] ,
COUNT ( CASE WHEN dq.AssignedToUserID = 0 THEN 1
ELSE NULL
END
) AS [#ofUnassignedEnvelopes] ,
CONVERT ( VARCHAR (8), MIN ( dq.DocumentDate ), 1 ) AS [OldestEnvelope]
FROM QueueTypes AS qt
LEFT OUTER JOIN dbo.VehicleReg_Documents_Queue AS dq ON dq.QueueID = qt.QueueTypeID
WHERE dq.IsProcessed = 0
AND dq.PageNumber = 1
GROUP BY qt.QueueName
UNION ALL
SELECT qt.QueueName AS [Queue] ,
COUNT ( qt.QueueName ) AS [#ofUnprocessedEnvelopes] ,
COUNT ( CASE WHEN dq.AssignedToUserID = 0 THEN 1
ELSE NULL
END
) AS [#ofUnassignedEnvelopes] ,
CONVERT ( VARCHAR (8), MIN ( dq.DocumentDate ), 1 ) AS [OldestEnvelope]
FROM QueueTypes AS qt
LEFT OUTER JOIN dbo.VehicleReg_Documents_Queue AS dq ON dq.QueueID = qt.QueueTypeID
GROUP BY qt.QueueName
But the results are not close to being correct.
Your current query doesn't do what it seems to do. Although you outer join table VehicleReg_Documents_Queue, the WHERE clause dismisses all outer-joined rows (their dq columns are NULL, so dq.IsProcessed = 0 is never true for them), which leaves you where you would have been with a mere inner join. Consider either moving your criteria into the ON clause or making this an inner join right away.
It is also weird that you join queue type and queue not on the queue ID or the queue type ID, but on dq.QueueID = qt.QueueTypeID. That's like joining employees and addresses on employee number matching the house number. At least that's what it looks like.
(Then why does your queue type table have a queue name? Shouldn't the queue table contain the queue name instead? But this is not about your query, but about your data model.)
GROUP BY ALL means: "Please give us all QueueNames, even when the WHERE clause dismisses them." I see two possibilities for your query:
1. You do want an outer join. Then the conditions move into the ON clause, there is no WHERE clause, and you can simply write GROUP BY qt.QueueName.
2. You don't want an outer join. Then we still want one row per QueueName in the table, which we won't get by simply changing GROUP BY ALL qt.QueueName to GROUP BY qt.QueueName.
In that second case we want all QueueNames first and outer join your query:
SELECT
qn.QueueName AS [Queue],
COALESCE(q.[#ofUnprocessedEnvelopes], 0) AS [#ofUnprocessedEnvelopes],
COALESCE(q.[#ofUnassignedEnvelopes], 0) AS [#ofUnassignedEnvelopes],
q.[OldestEnvelope]
FROM (SELECT DISTINCT QueueName FROM dbo.VehicleReg_Documents_QueueTypes) qn
LEFT JOIN
(
SELECT qt.QueueName,
COUNT ( qt.QueueName ) AS [#ofUnprocessedEnvelopes] ,
COUNT ( CASE WHEN dq.AssignedToUserID = 0 THEN 1
ELSE NULL
END
) AS [#ofUnassignedEnvelopes] ,
MIN ( dq.DocumentDate ) AS [OldestEnvelope]
FROM dbo.VehicleReg_Documents_QueueTypes AS [qt]
JOIN dbo.VehicleReg_Documents_Queue AS [dq] ON dq.QueueID = qt.QueueTypeID
WHERE dq.IsProcessed = 0
AND dq.PageNumber = 1
GROUP BY qt.QueueName -- one aggregated row per queue name
) q ON q.QueueName = qn.QueueName
ORDER BY qn.QueueName ASC;
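A minimal sketch of this pattern (every queue name first, then a LEFT JOIN to counts that are computed with an inner join and grouped by QueueName), alongside the WHERE-filtered outer join from the question, run in SQLite via Python's sqlite3. The two tables are hypothetical, heavily simplified stand-ins with made-up rows, and COALESCE turns the missing counts into 0 to mimic the GROUP BY ALL output:

```python
import sqlite3

def make_db():
    conn = sqlite3.connect(":memory:")
    # hypothetical, heavily simplified versions of the two tables
    conn.execute("CREATE TABLE QueueTypes (QueueTypeID INTEGER, QueueName TEXT)")
    conn.execute("CREATE TABLE Queue (QueueID INTEGER, AssignedToUserID INTEGER,"
                 " IsProcessed INTEGER, PageNumber INTEGER, DocumentDate TEXT)")
    conn.executemany("INSERT INTO QueueTypes VALUES (?, ?)",
                     [(1, "Cancellations"), (2, "Dealer"), (3, "Renewals")])
    conn.executemany("INSERT INTO Queue VALUES (?, ?, ?, ?, ?)", [
        (2, 0, 0, 1, "2018-04-06"),
        (2, 5, 0, 1, "2018-04-07"),
        (2, 0, 1, 1, "2018-04-01"),  # already processed: must not be counted
        (3, 7, 0, 1, "2018-04-03"),
        (3, 0, 0, 2, "2018-04-02"),  # page 2: must not be counted
    ])
    return conn

# The WHERE clause filters on dq columns, so queue types without matches vanish.
NAIVE = """
SELECT qt.QueueName, COUNT(qt.QueueName)
FROM QueueTypes qt
LEFT JOIN Queue dq ON dq.QueueID = qt.QueueTypeID
WHERE dq.IsProcessed = 0 AND dq.PageNumber = 1
GROUP BY qt.QueueName
ORDER BY qt.QueueName
"""

# Every queue name first, then LEFT JOIN the aggregated (inner-joined) counts.
ALL_NAMES = """
SELECT qn.QueueName,
       COALESCE(q.Unprocessed, 0),
       COALESCE(q.Unassigned, 0),
       q.OldestEnvelope
FROM (SELECT DISTINCT QueueName FROM QueueTypes) qn
LEFT JOIN (
    SELECT qt.QueueName,
           COUNT(qt.QueueName) AS Unprocessed,
           COUNT(CASE WHEN dq.AssignedToUserID = 0 THEN 1 END) AS Unassigned,
           MIN(dq.DocumentDate) AS OldestEnvelope
    FROM QueueTypes qt
    JOIN Queue dq ON dq.QueueID = qt.QueueTypeID
    WHERE dq.IsProcessed = 0 AND dq.PageNumber = 1
    GROUP BY qt.QueueName
) q ON q.QueueName = qn.QueueName
ORDER BY qn.QueueName
"""

conn = make_db()
print(conn.execute(NAIVE).fetchall())      # [('Dealer', 2), ('Renewals', 1)]
print(conn.execute(ALL_NAMES).fetchall())
# [('Cancellations', 0, 0, None), ('Dealer', 2, 1, '2018-04-06'), ('Renewals', 1, 0, '2018-04-03')]
```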
I think the closest equivalent of GROUP BY ALL in more ANSI-compliant SQL is a CASE expression inside each aggregate. Without knowing your data it's hard to say for sure whether this is 1:1, but I'm betting it's in the ballpark.
SELECT qt.QueueName AS [Queue]
,COUNT(CASE
WHEN dq.IsProcessed = 0
AND dq.PageNumber = 1
THEN qt.QueueName
END) AS [#ofUnprocessedEnvelopes]
,COUNT(CASE
WHEN dq.AssignedToUserID = 0
AND dq.IsProcessed = 0
AND dq.PageNumber = 1
THEN 1
ELSE NULL
END) AS [#ofUnassignedEnvelopes]
,MIN(CASE
WHEN dq.IsProcessed = 0
AND dq.PageNumber = 1
THEN dq.DocumentDate
END) AS [OldestEnvelope]
FROM dbo.VehicleReg_Documents_QueueTypes AS [qt]
LEFT OUTER JOIN dbo.VehicleReg_Documents_Queue AS [dq] ON dq.QueueID = qt.QueueTypeID
GROUP BY qt.QueueName
ORDER BY qt.QueueName ASC;
That's a bit uglier, because every aggregate has to repeat the WHERE conditions inside a CASE expression, but at least you are future-proof.
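The CASE-inside-the-aggregate version can be checked the same way in SQLite via Python's sqlite3. This is a sketch against hypothetical, simplified tables with made-up rows, not the poster's real schema; it shows that a plain GROUP BY keeps the empty Cancellations queue because nothing filters rows before grouping:

```python
import sqlite3

def make_db():
    conn = sqlite3.connect(":memory:")
    # hypothetical, heavily simplified versions of the two tables
    conn.execute("CREATE TABLE QueueTypes (QueueTypeID INTEGER, QueueName TEXT)")
    conn.execute("CREATE TABLE Queue (QueueID INTEGER, AssignedToUserID INTEGER,"
                 " IsProcessed INTEGER, PageNumber INTEGER, DocumentDate TEXT)")
    conn.executemany("INSERT INTO QueueTypes VALUES (?, ?)",
                     [(1, "Cancellations"), (2, "Dealer"), (3, "Renewals")])
    conn.executemany("INSERT INTO Queue VALUES (?, ?, ?, ?, ?)", [
        (2, 0, 0, 1, "2018-04-06"),
        (2, 5, 0, 1, "2018-04-07"),
        (2, 0, 1, 1, "2018-04-01"),  # already processed: ignored by every CASE
        (3, 7, 0, 1, "2018-04-03"),
        (3, 0, 0, 2, "2018-04-02"),  # page 2: ignored by every CASE
    ])
    return conn

# No WHERE clause: the filter lives inside each aggregate, so the LEFT JOIN
# keeps every queue type and a plain GROUP BY replaces GROUP BY ALL.
CONDITIONAL = """
SELECT qt.QueueName,
       COUNT(CASE WHEN dq.IsProcessed = 0 AND dq.PageNumber = 1
                  THEN qt.QueueName END),
       COUNT(CASE WHEN dq.AssignedToUserID = 0 AND dq.IsProcessed = 0
                   AND dq.PageNumber = 1 THEN 1 END),
       MIN(CASE WHEN dq.IsProcessed = 0 AND dq.PageNumber = 1
                THEN dq.DocumentDate END)
FROM QueueTypes qt
LEFT JOIN Queue dq ON dq.QueueID = qt.QueueTypeID
GROUP BY qt.QueueName
ORDER BY qt.QueueName
"""

conn = make_db()
print(conn.execute(CONDITIONAL).fetchall())
# [('Cancellations', 0, 0, None), ('Dealer', 2, 1, '2018-04-06'), ('Renewals', 1, 0, '2018-04-03')]
```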

Counting records in a SQL subquery

I'm having difficulty with a subquery. In plain English, I'm trying to pick a random UserID from the QCUsers table that has fewer than 20 records in the QCTier1_Assignments table. The problem is that my query below only picks users that have rows matching the inner query, when I need it to pick from all users in QCUsers, including users that have no records at all in QCTier1_Assignments. I need something like this:
AND (Sub.QCCount < 20 OR Sub.QCCount = 0 )
DECLARE @ReviewPeriodMonth varchar(10) = '10'
DECLARE @ReviewPeriodYear varchar(10) = '2015'
SELECT TOP 1
E1.UserID
,Sub.QCCount --Drawn from the subquery
FROM QCUsers E1
JOIN (SELECT
QCA.UserID,
COUNT(*) AS QCCount
FROM QCTier1_Assignments QCA
WHERE QCA.ReviewPeriodMonth = @ReviewPeriodMonth
AND QCA.ReviewPeriodYear = @ReviewPeriodYear
GROUP BY QCA.UserID
) Sub
ON E1.UserID = Sub.UserID
WHERE Active = 1
AND Grade = 12
AND Sub.QCCount < 20
ORDER BY NEWID()
I also tried it this way with no luck
DECLARE @ReviewPeriodMonth varchar(10) = '10'
DECLARE @ReviewPeriodYear varchar(10) = '2015'
SELECT TOP 1
E1.UserID
,Sub.QCCount --Drawn from the subquery
FROM QCUsers E1
RIGHT JOIN (SELECT
QCA.UserID,
ReviewPeriodMonth,
ReviewPeriodYear,
COUNT(*) AS QCCount
FROM QCTier1_Assignments QCA
GROUP BY
QCA.UserID,
ReviewPeriodMonth,
ReviewPeriodYear
) Sub
ON E1.UserID = Sub.UserID
WHERE Active = 1
AND Grade = 12
AND Sub.QCCount < 20
AND Sub.ReviewPeriodMonth = @ReviewPeriodMonth
AND Sub.ReviewPeriodYear = @ReviewPeriodYear
ORDER BY NEWID()
Try using your second query, but change the WHERE clause to use COALESCE(Sub.QCCount, 0) instead of just Sub.QCCount.
If a user has no rows in the subquery, an outer join that keeps every QCUsers row (QCUsers LEFT JOIN Sub; your RIGHT JOIN keeps the Sub rows instead) still returns that user, but QCCount will be NULL, and a comparison with NULL never evaluates to true, so the row is effectively filtered out.
Also, you should look into the HAVING clause. It might allow you to do this without a subquery at all.
Here's an example with the HAVING clause. If it doesn't give the correct results please let me know as I'm not able to test this.
DECLARE
@ReviewPeriodMonth VARCHAR(10) = '10',
@ReviewPeriodYear VARCHAR(10) = '2015'
SELECT TOP 1
E1.UserID,
COUNT(QCA.UserID) AS QCCount -- counts only matching assignments, so 0 for users without any
FROM
QCUsers E1
LEFT OUTER JOIN QCTier1_Assignments QCA ON
QCA.UserID = E1.UserID AND
QCA.ReviewPeriodMonth = @ReviewPeriodMonth AND
QCA.ReviewPeriodYear = @ReviewPeriodYear
WHERE
E1.Active = 1 AND
E1.Grade = 12
GROUP BY
E1.UserID
HAVING
COUNT(QCA.UserID) < 20
ORDER BY
NEWID()
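A minimal sketch of the LEFT JOIN + GROUP BY + HAVING approach in SQLite via Python's sqlite3, with made-up users and assignments. Assumptions: SQLite's ORDER BY RANDOM() LIMIT 1 stands in for SQL Server's TOP 1 ... ORDER BY NEWID(), the period filter sits in the ON clause so it cannot discard users without assignments, and the limit of 20 is passed as a query parameter:

```python
import sqlite3

# LEFT JOIN keeps users with no assignments; the period filter sits in the ON
# clause so it cannot discard those unmatched rows, and COUNT(column) counts
# only real matches (0 for users without assignments).
ELIGIBLE = """
SELECT E1.UserID, COUNT(QCA.UserID) AS QCCount
FROM QCUsers E1
LEFT JOIN QCTier1_Assignments QCA
  ON QCA.UserID = E1.UserID
 AND QCA.ReviewPeriodMonth = :month
 AND QCA.ReviewPeriodYear = :year
WHERE E1.Active = 1 AND E1.Grade = 12
GROUP BY E1.UserID
HAVING COUNT(QCA.UserID) < :max_count
"""

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE QCUsers (UserID INTEGER, Active INTEGER, Grade INTEGER)")
    conn.execute("CREATE TABLE QCTier1_Assignments (UserID INTEGER,"
                 " ReviewPeriodMonth TEXT, ReviewPeriodYear TEXT)")
    # user 3 is inactive, user 4 is grade 11, user 5 has no assignments at all
    conn.executemany("INSERT INTO QCUsers VALUES (?, ?, ?)",
                     [(1, 1, 12), (2, 1, 12), (3, 0, 12), (4, 1, 11), (5, 1, 12)])
    assignments = ([(1, "10", "2015")] * 20      # user 1 is already at the limit
                   + [(2, "10", "2015")] * 3     # user 2 has 3 in this period...
                   + [(2, "9", "2015")] * 30)    # ...and 30 in another period
    conn.executemany("INSERT INTO QCTier1_Assignments VALUES (?, ?, ?)", assignments)
    return conn

conn = make_db()
params = {"month": "10", "year": "2015", "max_count": 20}
eligible = conn.execute(ELIGIBLE + " ORDER BY E1.UserID", params).fetchall()
print(eligible)  # [(2, 3), (5, 0)]

# Random pick: SQLite's RANDOM() plays the role of SQL Server's NEWID() with TOP 1.
pick = conn.execute(ELIGIBLE + " ORDER BY RANDOM() LIMIT 1", params).fetchone()
print(pick in eligible)  # True
```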
You should use LEFT JOIN instead of JOIN (INNER JOIN), and, following your current approach, keep the count predicate in the outer query. I recommend the following:
SELECT TOP 1 ABC.UserID, ABC.QCCount
FROM
(
SELECT E1.UserID, COUNT(QCA.UserID) AS QCCount
FROM QCUsers AS E1
LEFT JOIN QCTier1_Assignments AS QCA
ON QCA.UserID = E1.UserID
AND QCA.ReviewPeriodMonth = @ReviewPeriodMonth -- in ON, so users without assignments survive
AND QCA.ReviewPeriodYear = @ReviewPeriodYear
WHERE E1.Active = 1
AND E1.Grade = 12
GROUP BY E1.UserID
) AS ABC
WHERE ABC.QCCount < 20
ORDER BY NEWID()
I was able to work it out through a combination of the responses here:
DECLARE @ReviewPeriodMonth varchar(10) = '10'
DECLARE @ReviewPeriodYear varchar(10) = '2015'
SELECT TOP 1
QCUsers.UserID,
COUNT(QCTier1_Assignments.ReviewID) AS ReviewCount
FROM
QCTier1_Assignments RIGHT OUTER JOIN
QCUsers ON QCTier1_Assignments.UserID = QCUsers.UserID
WHERE
QCUsers.Active = 1
AND QCUsers.Grade = '12'
AND (ReviewPeriodMonth = @ReviewPeriodMonth OR ReviewPeriodMonth IS NULL)
AND (ReviewPeriodYear = @ReviewPeriodYear OR ReviewPeriodYear IS NULL)
GROUP BY
QCUsers.UserID
HAVING
(COALESCE(COUNT(QCTier1_Assignments.ReviewID),0) < 4)
ORDER BY NEWID()

SQL getting only one max id field on a LEFT OUTER JOIN

How can I join each row of the user table to only one notification, the one with the maximum value, instead of to all of that user's notifications?
SELECT "vkId" AS "id"
FROM "user" AS "user"
LEFT OUTER JOIN "notification" ON "notification"."userId" = "user"."vkId"
WHERE (
"user"."vipEnd" < 1469714507
AND "user"."vipEnd" > 1469710907
) AND (
(
"notification"."type" = 4
AND
"notification"."expiresVipDate" < 1469710907
)
OR "notification"."id" IS NULL
)
LIMIT '100';
You may use DISTINCT ON (a PostgreSQL extension) for that, sorting by expiresVipDate DESC so that the row kept for each user is the one with the latest date:
SELECT DISTINCT ON (u."vkId") u."vkId" AS id, n.*
FROM "user" AS u
LEFT JOIN notification n ON (n."userId" = u."vkId" AND
n.type = 4 AND
n."expiresVipDate" < 1469710907)
WHERE u."vipEnd" < 1469714507 AND u."vipEnd" > 1469710907
ORDER BY u."vkId", n."expiresVipDate" DESC
LIMIT 100;
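DISTINCT ON is PostgreSQL-specific. Where it is not available, a commonly used equivalent is ROW_NUMBER() partitioned by user and ordered by expiresVipDate descending, keeping only row 1. The sketch below runs that variant (a swapped-in technique, not the answer's query) in SQLite 3.25 or later through Python's sqlite3; the table is named users instead of "user" and the rows are made up:

```python
import sqlite3

# Portable alternative to PostgreSQL's DISTINCT ON: number each user's joined
# notifications newest-first and keep only row number 1.
LATEST_PER_USER = """
SELECT vkId, notif_id, expiresVipDate
FROM (
    SELECT u.vkId, n.id AS notif_id, n.expiresVipDate,
           ROW_NUMBER() OVER (PARTITION BY u.vkId
                              ORDER BY n.expiresVipDate DESC) AS rn
    FROM users u
    LEFT JOIN notification n
      ON n.userId = u.vkId
     AND n.type = 4
     AND n.expiresVipDate < 1469710907
    WHERE u.vipEnd < 1469714507 AND u.vipEnd > 1469710907
) AS ranked
WHERE rn = 1
ORDER BY vkId
LIMIT 100
"""

conn = sqlite3.connect(":memory:")
# hypothetical tables and data; `user` is renamed to users here
conn.execute("CREATE TABLE users (vkId INTEGER, vipEnd INTEGER)")
conn.execute("CREATE TABLE notification (id INTEGER, userId INTEGER,"
             " type INTEGER, expiresVipDate INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(100, 1469712000), (200, 1469712000), (300, 1469700000)])
conn.executemany("INSERT INTO notification VALUES (?, ?, ?, ?)", [
    (1, 100, 4, 1469700000),
    (2, 100, 4, 1469705000),   # newest qualifying notification for user 100
    (3, 100, 4, 1469720000),   # too late: fails the expiresVipDate filter
    (4, 100, 3, 1469709000),   # wrong type
])
print(conn.execute(LATEST_PER_USER).fetchall())
# [(100, 2, 1469705000), (200, None, None)]
```

User 200 has no qualifying notification, so the LEFT JOIN still returns it once with NULLs, just as the DISTINCT ON query would.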

How to find rows that have one equal value and one different value from the table

I have the following table:
ID Number Revision
x y 0
x y 1
z w 0
a w 0
a w 1
b m 0
b m 0
I need to return rows where, for the same Number, there is more than one ID with the same Revision. Number can be NULL, and I don't need those rows.
The output should be:
z w 0
a w 0
I have tried the following query:
SELECT a.id,a.number,a.revision,
FROM table a INNER JOIN
(SELECT id, number, revision FROM table where number > '0'
GROUP BY number HAVING COUNT(*) > 1
) b ON a.revision = b.revision AND a.id != b.id
A small addition: my table has rows with identical Number, ID and Revision, and I don't want those rows displayed by my query!
It is not working! Please help me to figure out how to fix it.
Thanks.
Select t.Id,s.number,t.revision
from (Select number,count(*) 'c'
from table t1
where revision=0
group by number
having count(*) > 1
) s join table t on t.number= s.number
where revision = 0
Another simple approach:
SELECT DISTINCT b.id, b.Number, b.Revision
FROM tbl a
INNER JOIN tbl b
ON a.ID != b.ID AND a.Number = b.Number AND a.Revision = b.Revision;
This is tested in MySQL 5; the syntax might differ slightly in other databases.
You are not that far away with your query:
SELECT a.id,a.number,a.revision
FROM table a
JOIN (
-- multiple id for the same number and revision
SELECT number, revision
FROM table
GROUP BY number, revision
HAVING COUNT(*) > 1
) b
ON a.revision = b.revision
AND a.number = b.number
Untested, but you should get the idea. If your SQL Server is a recent version, you can solve this with OLAP (window) functions as well.
To filter out rows where the whole row is duplicated we can select only unique rows via group by and having:
SELECT a.id,a.number,a.revision
FROM table a
JOIN (
-- multiple id for the same number and revision
SELECT number, revision
FROM table
GROUP BY number, revision
HAVING COUNT(*) > 1
) b
ON a.revision = b.revision
AND a.number = b.number
GROUP BY a.id,a.number,a.revision
HAVING COUNT(1) = 1
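A small check of two of the approaches above in SQLite via Python's sqlite3, using the question's rows plus two hypothetical rows with a NULL Number. The grouped query is a variant rather than any answer verbatim: it uses COUNT(DISTINCT ID), an assumption about the intent, so that fully duplicated rows such as (b, m, 0) can never make a group qualify, and it filters out NULL Numbers explicitly. The self-join is the MySQL-tested answer with an ORDER BY added:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# the question's table (named tbl here), plus two hypothetical NULL-Number rows
conn.execute("CREATE TABLE tbl (ID TEXT, Number TEXT, Revision INTEGER)")
conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)", [
    ("x", "y", 0), ("x", "y", 1), ("z", "w", 0), ("a", "w", 0), ("a", "w", 1),
    ("b", "m", 0), ("b", "m", 0), ("c", None, 0), ("d", None, 0),
])

# Variant: count *distinct* IDs per (Number, Revision), so exact duplicate rows
# such as (b, m, 0) cannot make a group qualify; NULL Numbers are skipped.
GROUPED = """
SELECT DISTINCT t.ID, t.Number, t.Revision
FROM tbl t
JOIN (
    SELECT Number, Revision
    FROM tbl
    WHERE Number IS NOT NULL
    GROUP BY Number, Revision
    HAVING COUNT(DISTINCT ID) > 1
) g ON g.Number = t.Number AND g.Revision = t.Revision
ORDER BY t.ID
"""

# The self-join answer: another row with the same Number and Revision but a
# different ID. NULL = NULL is never true, so NULL Numbers drop out on their own.
SELF_JOIN = """
SELECT DISTINCT b.ID, b.Number, b.Revision
FROM tbl a
JOIN tbl b
  ON a.ID != b.ID AND a.Number = b.Number AND a.Revision = b.Revision
ORDER BY b.ID
"""

print(conn.execute(GROUPED).fetchall())    # [('a', 'w', 0), ('z', 'w', 0)]
print(conn.execute(SELF_JOIN).fetchall())  # [('a', 'w', 0), ('z', 'w', 0)]
```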