CTE and Lag query - sql

I am trying to achieve a running deductive total, but the problem occurs when I have a zero row. I am using SQL Server 2012.
Here's a batch of my current results.
SuppressionDescription                     SuppressionPriority  SuppressionPriorityOrder  TotalRecords  RecordsLost  RunningTotal
Deceased_Bln                               1                    1                         1376          2            1374
Pivotal Postcode Exclusions                9                    2                         1376          0            1374
Pivotal 3 Month Decline                    11                   3                         1376          24           1352
Postcode exclusions (Complaints)           12                   4                         1376          0            1352
Gone Away (from Barcode on returned mail)  15                   5                         1376          30           1346
Pivotal prospects with a Do Not Mail flag  16                   6                         1376          234          1112
Email Suppression File                     17                   7                         1376          7            1135
Opt outs & undeliverables from SMS system  18                   8                         1376          7            1362
Generic Phone Number Suppression           19                   9                         1376          245          1124
Exclude if not MR, MRS, MISS, MS, NULL     23                   10                        1376          0            1131
Total Prospects                            9999                 11                        1376          0            1376
With my query, the first two rows calculate. A total of 1,376 less 2 leaves 1,374 and the second row has no RecordsLost, so the RunningTotal remains the same. So far so good.
But because Row 2 has a 0 count, it means row 3 (whilst it has a RecordsLost of 24) gets out of sync.
I've tried adding various CASE expressions to get totals for the different scenarios, but it never quite works.
Here is my statement:
;WITH CTE AS (SELECT ID, SuppressionTypeID, ContactMethodType, SupplierName, SuppressionDescription
, SuppressionCount, TotalRecords, RecordsLost, SuppressionPriority, SuppressionPriorityOrder
, CASE WHEN SuppressionPriority = 9999
THEN TotalRecords
ELSE TotalRecords - RecordsLost END AS RowDiff
, CASE WHEN RecordsLost = 0
THEN LAG(RecordsLost, 1) OVER (PARTITION BY SupplierName, ContactMethodType ORDER BY SuppressionPriorityOrder)
ELSE RecordsLost END AS RecordsLostRoll
FROM #tmpSup
WHERE SupplierName = 'Freeman Grattan Holdings'
AND ContactMethodType = 'A'
)
SELECT SuppressionTypeID, ContactMethodType, SupplierName, SuppressionDescription
, SuppressionPriority, SuppressionPriorityOrder
, TotalRecords
, RecordsLost
, CASE WHEN SuppressionPriorityOrder = 1
THEN RowDiff
ELSE (LAG(RowDiff, 1, RowDiff)
OVER (PARTITION BY SupplierName, ContactMethodType ORDER BY SuppressionPriorityOrder)) - RecordsLost
END AS RunningTotal
FROM CTE
ORDER BY SupplierName
, ContactMethodType
, SuppressionPriority
I'd like to see the RunningTotal as:
1374
1374
1350
1350
1320
1086
1079
1072
827
827
1376 (for total Prospects)
Any suggestions and thanks in advance.

You can use a 'window frame' in the windowed function to achieve this:
SELECT
SuppressionPriorityOrder,
TotalRecords,
RecordsLost,
TotalRecords - SUM(RecordsLost) OVER
(
PARTITION BY SupplierName, ContactMethodType
ORDER BY SuppressionPriorityOrder ROWS UNBOUNDED PRECEDING -- this tells the SUM() to add up all preceding records (including the current row)
) RunningTotal
FROM
(
SELECT 'Freeman Grattan Holdings' AS SupplierName, 'A' AS ContactMethodType, 1 AS SuppressionPriorityOrder, 1376 AS TotalRecords, 2 AS RecordsLost UNION ALL
SELECT 'Freeman Grattan Holdings' AS SupplierName, 'A' AS ContactMethodType, 2 AS SuppressionPriorityOrder, 1376 AS TotalRecords, 0 AS RecordsLost UNION ALL
SELECT 'Freeman Grattan Holdings' AS SupplierName, 'A' AS ContactMethodType, 3 AS SuppressionPriorityOrder, 1376 AS TotalRecords, 24 AS RecordsLost UNION ALL
SELECT 'Freeman Grattan Holdings' AS SupplierName, 'A' AS ContactMethodType, 4 AS SuppressionPriorityOrder, 1376 AS TotalRecords, 0 AS RecordsLost UNION ALL
SELECT 'Freeman Grattan Holdings' AS SupplierName, 'A' AS ContactMethodType, 5 AS SuppressionPriorityOrder, 1376 AS TotalRecords, 30 AS RecordsLost
) x
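As a quick check of the window-frame idea against all eleven rows from the question, here is a minimal sketch that runs the same SUM() OVER logic in an in-memory SQLite database through Python's sqlite3 module (used here only as a runnable stand-in for SQL Server; it needs SQLite 3.25 or later for window functions). The table name tmpSup and the CASE that pins the 9999 "Total Prospects" row to TotalRecords are assumptions added for the illustration, not part of the answer above.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server; tmpSup is an illustrative table name.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE tmpSup (
    SupplierName TEXT, ContactMethodType TEXT,
    SuppressionPriority INTEGER, SuppressionPriorityOrder INTEGER,
    TotalRecords INTEGER, RecordsLost INTEGER)""")

# SuppressionPriority and RecordsLost values copied from the question's table.
priority = [1, 9, 11, 12, 15, 16, 17, 18, 19, 23, 9999]
lost = [2, 0, 24, 0, 30, 234, 7, 7, 245, 0, 0]
conn.executemany(
    "INSERT INTO tmpSup VALUES ('Freeman Grattan Holdings', 'A', ?, ?, 1376, ?)",
    [(p, i + 1, l) for i, (p, l) in enumerate(zip(priority, lost))])

# Running deduction: total minus the cumulative sum of RecordsLost so far.
# The CASE for priority 9999 is an assumption so the summary row shows the full total.
running_totals = [row[0] for row in conn.execute("""
    SELECT CASE WHEN SuppressionPriority = 9999
                THEN TotalRecords
                ELSE TotalRecords - SUM(RecordsLost) OVER (
                         PARTITION BY SupplierName, ContactMethodType
                         ORDER BY SuppressionPriorityOrder
                         ROWS UNBOUNDED PRECEDING)
           END AS RunningTotal
    FROM tmpSup
    ORDER BY SuppressionPriorityOrder""")]
print(running_totals)
```

The printed list should match the RunningTotal column the question asks for, including 1376 on the final row.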

I simulated your data as best I could. Let me know if the SQL Fiddle needs any changes.
I use another CTE to calculate the cumulative sum of lost records:
WITH
LOST AS (
SELECT
SuppressionPriorityOrder,
(SELECT SUM(RecordsLost)
FROM tmpSup T2
WHERE T2.SuppressionPriorityOrder <= T1.SuppressionPriorityOrder) as TotalLost
FROM tmpSup T1
)
SELECT
T.*,
L.TotalLost,
CASE WHEN SuppressionPriority = 9999
THEN T.TotalRecords
ELSE (T.TotalRecords - L.TotalLost) END AS RunningTotal
FROM
tmpSup T INNER JOIN
LOST L ON T.SuppressionPriorityOrder = L.SuppressionPriorityOrder
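One thing to watch with the correlated-subquery approach: once the table holds more than one supplier or contact method, the subquery probably needs to be scoped to the same SupplierName and ContactMethodType, otherwise losses from other groups leak into the sum. A minimal sketch of that, run in SQLite through Python's sqlite3 as a stand-in for SQL Server; the "Other Supplier" rows are hypothetical data added only to show the scoping.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE tmpSup (
    SupplierName TEXT, ContactMethodType TEXT,
    SuppressionPriorityOrder INTEGER, TotalRecords INTEGER, RecordsLost INTEGER)""")
conn.executemany("INSERT INTO tmpSup VALUES (?, ?, ?, ?, ?)", [
    ("Freeman Grattan Holdings", "A", 1, 1376, 2),
    ("Freeman Grattan Holdings", "A", 2, 1376, 0),
    ("Freeman Grattan Holdings", "A", 3, 1376, 24),
    ("Other Supplier", "A", 1, 500, 10),  # hypothetical second supplier
    ("Other Supplier", "A", 2, 500, 5),   # hypothetical second supplier
])

# The extra predicates on SupplierName and ContactMethodType keep each
# running total inside its own group, like PARTITION BY would.
rows = conn.execute("""
    SELECT T1.SupplierName, T1.SuppressionPriorityOrder,
           T1.TotalRecords - (SELECT SUM(T2.RecordsLost)
                              FROM tmpSup T2
                              WHERE T2.SupplierName = T1.SupplierName
                                AND T2.ContactMethodType = T1.ContactMethodType
                                AND T2.SuppressionPriorityOrder <= T1.SuppressionPriorityOrder
                             ) AS RunningTotal
    FROM tmpSup T1
    ORDER BY T1.SupplierName, T1.SuppressionPriorityOrder""").fetchall()
for row in rows:
    print(row)
```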

Related

SQL Grouping and dense rank concept

I have a data set that looks like:
cust city hotel_id amount
-------------------------------
A 1 252 3160
B 1 256 1893
C 2 105 2188
D 2 105 3054
E 3 370 6107
F 2 110 3160
G 2 150 1893
H 3 310 2188
I 1 252 3160
J 1 250 4000
K 3 370 5000
L 3 311 1095
Query to display the top 3 hotels by revenue (Sum of amount) for each city?
Since the same hotel can be booked by several customers in the same city, the amounts need to be summed per hotel to find its total revenue.
Expected output:
city hotel_id amount
---------------------------
1 252 6320
1 250 4000
1 256 1893
2 105 5242
2 110 3160
2 150 1893
3 370 11107
3 310 2188
3 311 1095
SELECT
t.city, t.hotel_id, t.amount
FROM
(
SELECT city, hotel_id, SUM(amount) AS amount,
ROW_NUMBER() OVER (PARTITION BY city ORDER BY SUM(amount) DESC) AS rn
FROM yourTable
GROUP BY city, hotel_id
) t
WHERE t.rn <= 3
ORDER BY t.city, t.amount DESC;
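To try the ROW_NUMBER approach locally, here is a minimal sketch using SQLite through Python's sqlite3 (a stand-in for SQL Server; window functions need SQLite 3.25 or later). The aggregation is split into its own subquery for readability, which should be equivalent to the single-level query above. The row for customer M is hypothetical, added so that city 1 has a fourth hotel and the rn <= 3 filter actually removes something.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourTable (cust TEXT, city INTEGER, hotel_id INTEGER, amount INTEGER)")
conn.executemany("INSERT INTO yourTable VALUES (?, ?, ?, ?)", [
    ("A", 1, 252, 3160), ("B", 1, 256, 1893), ("C", 2, 105, 2188),
    ("D", 2, 105, 3054), ("E", 3, 370, 6107), ("F", 2, 110, 3160),
    ("G", 2, 150, 1893), ("H", 3, 310, 2188), ("I", 1, 252, 3160),
    ("J", 1, 250, 4000), ("K", 3, 370, 5000), ("L", 3, 311, 1095),
    ("M", 1, 260, 100),  # hypothetical extra hotel, not in the question's data
])

top3 = conn.execute("""
    SELECT city, hotel_id, revenue
    FROM (
        SELECT city, hotel_id, revenue,
               ROW_NUMBER() OVER (PARTITION BY city ORDER BY revenue DESC) AS rn
        FROM (
            -- one row per (city, hotel) with its summed revenue
            SELECT city, hotel_id, SUM(amount) AS revenue
            FROM yourTable
            GROUP BY city, hotel_id
        ) AS totals
    ) AS ranked
    WHERE rn <= 3
    ORDER BY city, revenue DESC""").fetchall()
for row in top3:
    print(row)
```

If two hotels in a city tie on revenue, ROW_NUMBER picks one arbitrarily; RANK or DENSE_RANK would keep both, at the cost of possibly returning more than three rows.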
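To get the total for each hotel_id, group by hotel_id and by city (city has to be in the GROUP BY because it is selected). The #tmp table then holds all of the data you need, so you just select the top 3 entries for each city from it. TOP only returns the highest totals when it is paired with ORDER BY total DESC, and because SQL Server only allows ORDER BY at the end of a top-level UNION, each branch is wrapped in a derived table:
SELECT city, hotel_id, SUM(amount) AS total INTO #tmp
FROM [table]
GROUP BY hotel_id, city
SELECT * FROM
(SELECT TOP 3 *
FROM #tmp
WHERE city = 1
ORDER BY total DESC) c1
UNION ALL
SELECT * FROM
(SELECT TOP 3 *
FROM #tmp
WHERE city = 2
ORDER BY total DESC) c2
UNION ALL
SELECT * FROM
(SELECT TOP 3 *
FROM #tmp
WHERE city = 3
ORDER BY total DESC) c3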

SQL select & Group if at least one exists

I need to select and group by Id, returning an Id only when every Pid under that Id has at least one row with IsExists=1.
Id Pid Opt IsExists
27 2 107 1
27 2 108 0
27 5 96 1
51 9 17 1
51 9 18 0
51 10 112 0
758 25 96 0
758 97 954 1
758 194 2902 1
758 194 2903 1
The result should be:
Id IsExists
27 1
Id 27 is returned because both [id=27 | pid=2] and [id=27 | pid=5] have at least one row with IsExists=1.
Is it possible?
One method uses two levels of aggregation:
select id
from (select id, pid, max(isexists) as max_isexists
from t
group by id, pid
) t
group by id
having count(*) = sum(max_isexists);
This assumes that isexists takes on the values 0 and 1.
An alternative only uses one level of aggregation but is a bit trickier, using count(distinct):
select id
from t
group by id
having count(distinct pid) = count(distinct case when isexists = 1 then pid end);
You need a nested aggregation:
select Id
from
(
select Id, Pid,
-- returns 1 when value exists
max(IsExists) as maxExists
from tab
group by Id, Pid
) as dt
group by Id
-- check if all Pid got a 1
having min(maxExists) = 1
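Both the nested aggregation and the single-level count(distinct) variant can be checked against the full sample from the question. This is a minimal sketch using SQLite through Python's sqlite3 as a stand-in; the table name t is simply the placeholder used in the answers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (Id INTEGER, Pid INTEGER, Opt INTEGER, IsExists INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", [
    (27, 2, 107, 1), (27, 2, 108, 0), (27, 5, 96, 1),
    (51, 9, 17, 1), (51, 9, 18, 0), (51, 10, 112, 0),
    (758, 25, 96, 0), (758, 97, 954, 1), (758, 194, 2902, 1), (758, 194, 2903, 1),
])

# Two levels: per (Id, Pid) take the max flag, then require every Pid to have 1.
nested = conn.execute("""
    SELECT Id
    FROM (SELECT Id, Pid, MAX(IsExists) AS maxExists
          FROM t
          GROUP BY Id, Pid) AS dt
    GROUP BY Id
    HAVING MIN(maxExists) = 1""").fetchall()

# One level: the number of distinct Pids must equal the number of distinct
# Pids that have at least one IsExists = 1 row.
single = conn.execute("""
    SELECT Id
    FROM t
    GROUP BY Id
    HAVING COUNT(DISTINCT Pid) = COUNT(DISTINCT CASE WHEN IsExists = 1 THEN Pid END)
    """).fetchall()

print(nested, single)
```

Id 51 drops out because Pid 10 has no IsExists=1 row, and Id 758 drops out because of Pid 25, which leaves only Id 27.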
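Try this. The inner GROUP BY keeps the (ID, PID) pairs that have at least one row with IsExists = 1, and the outer query keeps the IDs that have two or more such pairs. Note that this matches the sample below only because Id 27 has exactly two Pids; it does not check that every Pid qualifies, so an Id such as 758 (Pids 97 and 194 qualify, Pid 25 does not) would also be returned.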
SELECT ID, 1 as IsExists FROM
(
select ID, PID , Count(Distinct IsExists) as IsExists
FROM
(
Select 27 as ID , 2 as PID , 1 as IsExists UNION ALL
Select 27 as ID , 2 as PID , 0 as IsExists UNION ALL
Select 27 as ID , 5 as PID , 1 as IsExists UNION ALL
Select 51 as ID , 9 as PID , 1 as IsExists UNION ALL
Select 51 as ID , 9 as PID , 0 as IsExists UNION ALL
Select 51 as ID , 10 as PID , 0 as IsExists
) a
WHERE IsExists = 1
Group by ID, PID
) B
GROUP BY B.ID
Having Count(*) >= 2

SQL: alternatives and substitutions for GROUPING SETS and PIVOT

I've got code like this:
SELECT id, YEAR(datek) AS YEAR, COUNT(*) AS NUM
FROM Orders
GROUP BY GROUPING SETS
(
(id, YEAR(datek)),
id,
YEAR(datek),
()
);
It gives me this output:
1 NULL 4
2 NULL 11
3 NULL 6
NULL NULL 21
1 2006 36
2 2006 56
3 2006 51
NULL 2006 143
1 2007 130
2 2007 143
3 2007 125
NULL 2007 398
1 2008 79
2 2008 116
3 2008 73
NULL 2008 268
NULL NULL 830
1 NULL 249
2 NULL 326
3 NULL 255
What I need to do is write it without GROUPING SETS (and without CUBE or ROLLUP) but with the same result. I thought about writing three different queries and joining them with UNION. I tried something like NULL in the GROUP BY, but it does not work:
SELECT id, YEAR(datek) AS rok, COUNT(*) AS NUM
FROM Orders
GROUP BY id, YEAR(datek)
UNION
SELECT id, YEAR(datek) AS rok, COUNT(*) AS NUM
FROM Orders
GROUP BY id, null
order by id, YEAR(datek)
I also have a question about "PIVOT". What kind of syntax can replace query with "PIVOT"?
Thanks for your time and all the answers!
You are right in that you need separate queries, although you actually need 4, and rather than GROUP BY NULL, just group by the columns in the corresponding grouping set, and replace the column in the SELECT with NULL:
SELECT id, YEAR(datek) AS rok, COUNT(*) AS NUM
FROM Orders
GROUP BY id, YEAR(datek)
UNION ALL
SELECT id, NULL, COUNT(*) AS NUM
FROM Orders
GROUP BY id
UNION ALL
SELECT NULL, YEAR(datek), COUNT(*) AS NUM
FROM Orders
GROUP BY YEAR(datek)
UNION ALL
SELECT NULL, NULL, COUNT(*) AS NUM
FROM Orders
ORDER BY ID, Rok
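Here is a small runnable sketch of the four-query UNION ALL pattern, using SQLite through Python's sqlite3 as a stand-in for SQL Server. SQLite has no YEAR() function, so strftime('%Y', ...) is swapped in for it, and the five Orders rows are made-up sample data, not the question's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (id INTEGER, datek TEXT)")
conn.executemany("INSERT INTO Orders VALUES (?, ?)", [
    (1, "2006-03-01"), (1, "2007-05-10"),                      # made-up rows
    (2, "2006-07-04"), (2, "2006-11-30"), (2, "2007-01-15"),
])

# One SELECT per grouping set; columns absent from a set are returned as NULL.
rows = conn.execute("""
    SELECT id, CAST(strftime('%Y', datek) AS INTEGER) AS rok, COUNT(*) AS num
    FROM Orders
    GROUP BY id, strftime('%Y', datek)          -- (id, year)
    UNION ALL
    SELECT id, NULL, COUNT(*)
    FROM Orders
    GROUP BY id                                 -- (id)
    UNION ALL
    SELECT NULL, CAST(strftime('%Y', datek) AS INTEGER), COUNT(*)
    FROM Orders
    GROUP BY strftime('%Y', datek)              -- (year)
    UNION ALL
    SELECT NULL, NULL, COUNT(*)
    FROM Orders                                 -- () grand total
    """).fetchall()
for row in rows:
    print(row)
```

UNION ALL is used rather than UNION so that rows from different grouping sets are never merged as duplicates.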
With regard to a replacement for PIVOT I think the best alternative is to use a conditional aggregate, e.g. instead of:
SELECT pvt.SomeGroup,
pvt.[A],
pvt.[B],
pvt.[C]
FROM T
PIVOT (SUM(Val) FOR Col IN ([A], [B], [C])) AS pvt;
You would use:
SELECT T.SomeGroup,
[A] = SUM(CASE WHEN T.Col = 'A' THEN T.Val ELSE 0 END),
[B] = SUM(CASE WHEN T.Col = 'B' THEN T.Val ELSE 0 END),
[C] = SUM(CASE WHEN T.Col = 'C' THEN T.Val ELSE 0 END)
FROM T
GROUP BY T.SomeGroup;
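The conditional aggregate is plain standard SQL, so it can be tried anywhere. A minimal sketch in SQLite through Python's sqlite3, with made-up rows for the placeholder table T:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (SomeGroup TEXT, Col TEXT, Val INTEGER)")
conn.executemany("INSERT INTO T VALUES (?, ?, ?)", [
    ("g1", "A", 10), ("g1", "A", 5), ("g1", "B", 7),   # made-up sample rows
    ("g2", "C", 3), ("g2", "A", 1),
])

# Each output column sums only the rows whose Col matches its label.
pivoted = conn.execute("""
    SELECT SomeGroup,
           SUM(CASE WHEN Col = 'A' THEN Val ELSE 0 END) AS A,
           SUM(CASE WHEN Col = 'B' THEN Val ELSE 0 END) AS B,
           SUM(CASE WHEN Col = 'C' THEN Val ELSE 0 END) AS C
    FROM T
    GROUP BY SomeGroup
    ORDER BY SomeGroup""").fetchall()
for row in pivoted:
    print(row)
```

One difference to be aware of: with ELSE 0, a group that has no rows for a column shows 0, whereas PIVOT returns NULL in that cell. Dropping the ELSE branch gives NULL instead, if that matters for your report.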

SQL - Number of entries needed to reach given value

I need to find how many records it took to reach a given value. I have a table in the below format:
ID Name Time Time 2
1 Campaign 1 7 100
2 Campaign 3 5 165
3 Campaign 1 3 321
4 Campaign 2 610 952
5 Campaign 2 15 13
6 Campaign 2 310 5
7 Campaign 3 0 3
8 Campaign 1 0 610
9 Campaign 1 1 15
10 Campaign 1 54 310
11 Campaign 3 4 0
12 Campaign 2 23 0
13 Campaign 2 8 1
14 Campaign 3 23 1
15 Campaign 3 7 0
16 Campaign 3 5 5
17 Campaign 3 2 66
18 Campaign 3 100 7
19 Campaign 1 165 3
20 Campaign 1 321 13
21 Campaign 1 952 5
22 Campaign 1 13 3
23 Campaign 2 15 610
24 Campaign 2 0 15
25 Campaign 1 100 310
26 Campaign 2 165 0
27 Campaign 3 321 0
28 Campaign 3 952 1
29 Campaign 3 0 1
30 Campaign 3 5 0
I'd like to find out how many entries of 'Campaign 1' there were before the total of Time1 + Time2 was equal to or greater than a given number.
As an example, the result for Campaign 1 to reach 1400 should be 5.
Apologies if I haven't explained this clearly enough - the concept is still a little muddy at the moment.
Thanks
In SQL Server 2012, you can get the row using:
select t.*
from (select t.*, sum(time1 + time2) over (partition by name order by id) as cumsum
from table t
) t
where cumsum >= @VALUE and (cumsum - (time1 + time2)) < @VALUE;
You can get the count using:
select name, count(*)
from (select t.*, sum(time1 + time2) over (partition by name order by id) as cumsum
from table t
) t
where (cumsum - (time1 + time2)) < @VALUE
group by name;
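To see the counting logic in action, here is a minimal sketch that runs the same query in SQLite through Python's sqlite3 (a stand-in for SQL Server; window functions need SQLite 3.25 or later). It loads only the Campaign 1 and Campaign 2 rows from the question, and the table name campaigns and the :value parameter are assumptions standing in for the placeholder table and @VALUE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE campaigns (id INTEGER, name TEXT, time1 INTEGER, time2 INTEGER)")
conn.executemany("INSERT INTO campaigns VALUES (?, ?, ?, ?)", [
    # Campaign 1 rows from the question
    (1, "Campaign 1", 7, 100), (3, "Campaign 1", 3, 321), (8, "Campaign 1", 0, 610),
    (9, "Campaign 1", 1, 15), (10, "Campaign 1", 54, 310), (19, "Campaign 1", 165, 3),
    (20, "Campaign 1", 321, 13), (21, "Campaign 1", 952, 5), (22, "Campaign 1", 13, 3),
    (25, "Campaign 1", 100, 310),
    # Campaign 2 rows from the question
    (4, "Campaign 2", 610, 952), (5, "Campaign 2", 15, 13), (6, "Campaign 2", 310, 5),
    (12, "Campaign 2", 23, 0), (13, "Campaign 2", 8, 1), (23, "Campaign 2", 15, 610),
    (24, "Campaign 2", 0, 15), (26, "Campaign 2", 165, 0),
])

# A row is counted while the running total *before* it is still below the
# target, so the row that crosses the target is included in the count.
counts = dict(conn.execute("""
    SELECT name, COUNT(*)
    FROM (SELECT t.*,
                 SUM(time1 + time2) OVER (PARTITION BY name ORDER BY id) AS cumsum
          FROM campaigns t) t
    WHERE (cumsum - (time1 + time2)) < :value
    GROUP BY name""", {"value": 1400}).fetchall())
print(counts)
```

Campaign 1 reaches 1421 on its fifth row (id 10), which agrees with the expected answer of 5; Campaign 2 passes 1400 on its very first row. If a campaign never reaches the target, this query simply counts all of its rows.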
If you are not using SQL Server 2012, you can do the cumulative sum with a correlated subquery:
select name, count(*)
from (select t.*,
(select sum(time1 + time2)
from table t2
where t2.name = t.name and
t2.id <= t.id
) as cumsum
from table t
) t
where (cumsum - (time1 + time2)) < @VALUE
group by name;
A CTE that computes a running total with a windowed SUM should work (no recursion is needed):
;WITH CTE AS
(
SELECT id,
name,
SUM([time]+[time 2])
OVER (ORDER BY id ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
AS RunningTotal
FROM Table1
WHERE name = 'Campaign 1'
)
SELECT count(*)+1 AS [Count]
FROM CTE
WHERE RunningTotal < 1400
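Note that I added 1 to the count because the query counts the rows whose running total is still below 1400; the next row is the one that pushes the total to 1400 or above. This assumes such a row exists: if the campaign never reaches 1400, the result is one more than the number of rows.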

Pivot SQL with Rank

Basically, I have the following query, and I am trying to keep only the row with the highest rank for each claim:
WITH numbered_rows
as (
SELECT Claim,
reserve,
time,
RANK() OVER (PARTITION BY Claim ORDER BY time asc) as 'Rank'
FROM (
SELECT cc.Claim,
MAX(csd.time) as time,
csd.reserve
FROM ClaimData csd WITH (NOLOCK)
JOIN Core cc WITH (NOLOCK)
on cc.ClaimID = csd.ClaimID
GROUP BY cc.Claim, csd.Reserve
) as t
)
select *
from numbered_rows cur, numbered_rows prev
where cur.Claim= prev.Claim
and cur.Rank = prev.Rank -1
The results set I get is the following:
Claim reserve Time Rank Claim reserve Time Rank
--------------------------------------------------------------------
11 0 12/10/2012 1 11 15000 5/30/2013 2
34 2000 1/21/2013 1 34 750 1/31/2013 2
34 750 1/31/2013 2 34 0 3/31/2013 3
07 800000 5/9/2013 1 07 0 5/10/2013 2
But I only want to see the following (with the Claim 34 row that ends at Rank 2 removed, because it is not the highest rank):
Claim reserve Time Rank Claim reserve Time Rank
--------------------------------------------------------------------
11 0 12/10/2012 1 11 15000 5/30/2013 2
34 750 1/31/2013 2 34 0 3/31/2013 3
07 800000 5/9/2013 1 07 0 5/10/2013 2
I think you can do this by reversing your logic: order by time DESC, switch cur and prev in your final select, change -1 to +1, and then limit prev.Rank to 1. That ensures you only include the latest 2 results for each claim:
WITH numbered_rows AS
( SELECT Claim,
reserve,
time,
[Rank] = RANK() OVER (PARTITION BY Claim ORDER BY time DESC)
FROM ( SELECT cc.Claim,
[Time] = MAX(csd.time),
csd.reserve
FROM ClaimData AS csd WITH (NOLOCK)
INNER JOIN Core AS cc WITH (NOLOCK)
ON cc.ClaimID = csd.ClaimID
GROUP BY cc.Claim, csd.Reserve
) t
)
SELECT *
FROM numbered_rows AS prev
INNER JOIN numbered_rows AS cur
ON cur.Claim= prev.Claim
AND cur.Rank = prev.Rank + 1
WHERE prev.Rank = 1;
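The reversed-rank self-join can be tried on the rows from the question's result set. This is a minimal sketch in SQLite through Python's sqlite3 (a stand-in for SQL Server; window functions need SQLite 3.25 or later). The flat claims table replaces the ClaimData/Core join, the rank column is called rnk, and the aliases latest and earlier are renamed from prev and cur for readability; all of those names are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Flattened stand-in for the grouped ClaimData/Core join; ISO dates sort correctly as text.
conn.execute("CREATE TABLE claims (Claim TEXT, reserve INTEGER, time TEXT)")
conn.executemany("INSERT INTO claims VALUES (?, ?, ?)", [
    ("11", 0, "2012-12-10"), ("11", 15000, "2013-05-30"),
    ("34", 2000, "2013-01-21"), ("34", 750, "2013-01-31"), ("34", 0, "2013-03-31"),
    ("07", 800000, "2013-05-09"), ("07", 0, "2013-05-10"),
])

# Rank newest-first, then pair the newest row (rnk = 1) with the one before it (rnk = 2).
pairs = conn.execute("""
    WITH numbered_rows AS (
        SELECT Claim, reserve, time,
               RANK() OVER (PARTITION BY Claim ORDER BY time DESC) AS rnk
        FROM claims
    )
    SELECT earlier.Claim, earlier.reserve, earlier.time, latest.reserve, latest.time
    FROM numbered_rows AS latest
    INNER JOIN numbered_rows AS earlier
            ON earlier.Claim = latest.Claim
           AND earlier.rnk = latest.rnk + 1
    WHERE latest.rnk = 1
    ORDER BY earlier.Claim""").fetchall()
for row in pairs:
    print(row)
```

Each claim yields exactly one pair, so the intermediate Claim 34 transition (2000 to 750) no longer appears.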