Total Sum and Partial Sum - sql

I am currently using SSMS. I am pulling data, and trying to get two different columns that sum prices. The two columns 'ChangeSpend' and 'TotalSpend' both reference the same column and this is where I am running into problems.
I want ChangeSpend to return the sum of all the codes per receipt that start with V.Ch% (so they exclude all the others) and the TotalSpend to sum all of the codes for each receipt.
Here is my current code:
SELECT
Receipt
,ReceiptCode
,ReceiptAmount
,sum(ReceiptAmount) over (Partition by Receipt) as TotalSpend
,(CASE WHEN ReceiptCode = 'V.Ch%' then sum(ReceiptAmount)
over (Partition by Receipt)
ELSE 0
END) as ChangeSpend
FROM tableA
LEFT OUTER JOIN tableB
on A.Receipt = B.Receipt
WHERE ReceiptCode LIKE 'V.%'
ORDER BY Receipt
However, my query currently prints this:
Receipt ReceiptCode ReceiptAmount TotalSpend ChangeSpend
1 v.cha 5 20 0
1 v.rt 2 20 0
1 v.chb 6 20 0
1 v.abc 7 20 0
2 v.cha 20 21 0
2 v.abc 1 21 0
3 v.cha 4 14 0
3 v.chb 1 14 0
3 v.tye 7 14 0
3 v.chs 2 14 0
And I would like it to print this:
Receipt ReceiptCode ReceiptAmount TotalSpend ChangeSpend
1 v.cha 5 20 11
1 v.rt 2 20 11
1 v.chb 6 20 11
1 v.abc 7 20 11
2 v.cha 20 21 20
2 v.abc 1 21 20
3 v.cha 4 14 7
3 v.chb 1 14 7
3 v.tye 7 14 7
3 v.chs 2 14 7
Thanks for any help

Try
,SUM(CASE WHEN ReceiptCode LIKE 'V.Ch%' THEN ReceiptAmount ELSE 0 END)
OVER (Partition by Receipt)
AS ChangeSpend

You have to put the SUM outside the CASE, not the other way around:
SUM(CASE WHEN SomeCondition=true THEN MyColumn ELSE 0 END)

This may help:
Create Table Payment(
Receipt Int,
ReceiptCode VARCHAR(10),
ReceiptAmount decimal)
Insert Into Payment
Values
(1, 'v.cha', 5),
(1, 'v.rt', 2),
(1, 'v.chb', 6),
(1, 'v.abc', 7),
(2, 'v.cha', 20),
(2, 'v.abc', 1),
(3, 'v.cha', 4),
(3, 'v.chb', 1),
(3, 'v.tye', 7),
(3, 'v.chs', 2);
SELECT * ,
SUM(ReceiptAmount) OVER ( PARTITION BY Receipt ) AS TotalSpend ,
SUM(IIF(ReceiptCode LIKE 'v.ch%',ReceiptAmount,0)) OVER ( PARTITION
BY Receipt ) AS ChangeSpend
FROM payment;

SUM(
CASE WHEN ReceiptCode like 'V.Ch%' then ReceiptAmount ELSE 0 END) OVER (PARTITION BY Receipt) as ChangeSpend

Related

Incremental Sum across different groups

I am trying to work out how to count each product at each date so that the count accumulates over time for every product.
This is a dummy table for illustration; the real data has millions of records and thousands of different products.
I have not been able to write a query that returns, for each date, the running count per product along with the miles for that date.
CREATE TABLE Dummy_tab (
empid int,
date1_start date,
name_emp varchar(255),
product varchar(255),
miles varchar(20)
);
INSERT INTO Dummy_tab VALUES
(1, '2018-08-27', 'Eric', 'a',10),
(1, '2018-08-28', 'Eric','b',10),
(1, '2018-08-28', 'Eric','a',20),
(2, '2020-01-8', 'Jack','d',10),
(2, '2020-02-8', 'Jack','b',20),
(2, '2020-12-28', 'Jack','b',20),
(2, '2020-12-28', 'Jack','d',20),
(2,'2021-10-28', 'Jack','c',20),
(2, '2022-12-28', 'Jack','d',20),
(3, '2018-12-31', 'Jane','',10),
(3, '2018-12-31', 'Jane','',15);
My desired output is:
Id Date a b c d empty miles
1 2018-08-27 1 0 0 0 0 10
1 2018-08-28 2 1 0 0 0 20
2 2020-01-08 0 0 0 1 0 10
2 2020-02-08 0 1 0 1 0 20
2 2020-12-28 0 2 0 2 0 20
2 2021-10-28 0 2 1 2 0 20
2 2022-12-28 0 2 1 3 0 20
3 2018-12-31 0 0 0 0 1 10
3 2019-12-31 0 0 0 0 2 15
For example:
Eric (ID = 1) has three entries: product a on 2018-08-27, product b on 2018-08-28, and product a on 2018-08-28.
So after the first entry, a = 1 for ID = 1. The second date is the sum of the previous and current counts, so a = 2 for ID = 1, and b = 1 because there was no earlier entry for b.
Miles needs to be the maximum miles for that date from past dates.
You need to first (conditionally) aggregate your values here, and then you can do a cumulative SUM:
WITH Aggregates AS(
SELECT empid AS Id,
date1_start AS [Date],
COUNT(CASE product WHEN 'a' THEN 1 END) AS a,
COUNT(CASE product WHEN 'b' THEN 1 END) AS b,
COUNT(CASE product WHEN 'c' THEN 1 END) AS c,
COUNT(CASE product WHEN 'd' THEN 1 END) AS d,
COUNT(CASE product WHEN '' THEN 1 END) AS empty,
MAX(miles) AS miles
FROM dbo.Dummy_tab
GROUP BY empid, date1_start)
SELECT Id,
[Date],
SUM(a) OVER (PARTITION BY Id ORDER BY [Date]) AS a,
SUM(b) OVER (PARTITION BY Id ORDER BY [Date]) AS b,
SUM(c) OVER (PARTITION BY Id ORDER BY [Date]) AS c,
SUM(d) OVER (PARTITION BY Id ORDER BY [Date]) AS d,
SUM(empty) OVER (PARTITION BY Id ORDER BY [Date]) AS empty,
miles
FROM Aggregates
ORDER BY ID,
[Date];
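One reading of the miles requirement is a running maximum per Id rather than a per-date maximum. The sketch below assumes that reading, and also assumes miles always holds integer text: the column is declared varchar(20), so a plain MAX would compare strings character by character, and TRY_CAST (SQL Server 2012 and later) is used here to compare numerically.

```sql
WITH Aggregates AS (
    SELECT empid AS Id,
           date1_start AS [Date],
           -- assumption: miles contains integer text; TRY_CAST returns NULL otherwise
           MAX(TRY_CAST(miles AS int)) AS miles
    FROM dbo.Dummy_tab
    GROUP BY empid, date1_start
)
SELECT Id,
       [Date],
       -- running maximum over this date and all earlier dates for the same Id
       MAX(miles) OVER (PARTITION BY Id ORDER BY [Date]
                        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS miles
FROM Aggregates
ORDER BY Id, [Date];
```

The same window expression could replace the plain miles column in the query above if the running maximum is what is wanted.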

skip consecutive rows after specific value

Note: I have a working query, but am looking for optimisations to use it on large tables.
Suppose I have a table like this:
id session_id value
1 5 7
2 5 1
3 5 1
4 5 12
5 5 1
6 5 1
7 5 1
8 6 7
9 6 1
10 6 3
11 6 1
12 7 7
13 8 1
14 8 2
15 8 3
I want the id's of all rows with value 1 with one exception:
skip groups with value 1 that directly follow a value 7 within the same session_id.
Basically I would look for groups of value 1 that directly follow a value 7, limited by the session_id, and ignore those groups. I then show all the remaining value 1 rows.
The desired output showing the id's:
5
6
7
11
13
I took some inspiration from this post and ended up with this code:
create table #req_data (
id int primary key identity,
session_id int,
value int
)
insert into #req_data(session_id, value) values (5, 7)
insert into #req_data(session_id, value) values (5, 1) -- preceded by value 7 in same session, should be ignored
insert into #req_data(session_id, value) values (5, 1) -- ignore this one too
insert into #req_data(session_id, value) values (5, 12)
insert into #req_data(session_id, value) values (5, 1) -- preceded by value != 7, show this
insert into #req_data(session_id, value) values (5, 1) -- show this too
insert into #req_data(session_id, value) values (5, 1) -- show this too
insert into #req_data(session_id, value) values (6, 7)
insert into #req_data(session_id, value) values (6, 1) -- preceded by value 7 in same session, should be ignored
insert into #req_data(session_id, value) values (6, 3)
insert into #req_data(session_id, value) values (6, 1) -- preceded by value != 7, show this
insert into #req_data(session_id, value) values (7, 7)
insert into #req_data(session_id, value) values (8, 1) -- new session_id, show this
insert into #req_data(session_id, value) values (8, 2)
insert into #req_data(session_id, value) values (8, 3)
select id
from (
select session_id, id, max(skip) over (partition by grp) as 'skip'
from (
select tWithGroups.*,
( row_number() over (partition by session_id order by id) - row_number() over (partition by value order by id) ) as grp
from (
select session_id, id, value,
case
when lag(value) over (partition by session_id order by id) = 7
then 1
else 0
end as 'skip'
from #req_data
) as tWithGroups
) as tWithSkipField
where tWithSkipField.value = 1
) as tYetAnotherOutput
where skip != 1
order by id
This gives the desired result, but with 4 select blocks I think it's way too inefficient to use on large tables.
Is there a cleaner, faster way to do this?
The following should work well for this.
WITH
cte_ControlValue AS (
SELECT
rd.id, rd.session_id, rd.value,
ControlValue = ISNULL(CAST(SUBSTRING(MAX(bv.BinVal) OVER (PARTITION BY rd.session_id ORDER BY rd.id), 5, 4) AS INT), 999)
FROM
#req_data rd
CROSS APPLY ( VALUES (CAST(rd.id AS BINARY(4)) + CAST(NULLIF(rd.value, 1) AS BINARY(4))) ) bv (BinVal)
)
SELECT
cv.id, cv.session_id, cv.value
FROM
cte_ControlValue cv
WHERE
cv.value = 1
AND cv.ControlValue <> 7;
Results...
id session_id value
----------- ----------- -----------
5 5 1
6 5 1
7 5 1
11 6 1
13 8 1
Edit: How and why it works...
The basic premise is taken from Itzik Ben-Gan's "The Last non NULL Puzzle".
Essentially, we are relying on two different behaviors that most people don't usually think about...
1) NULL + anything = NULL.
2) You can CAST or CONVERT an INT into a fixed length BINARY data type and it will continue to sort as an INT (as opposed to sorting like a text string).
This is easier to see when the intermediate steps are added to the query in the CTE...
SELECT
rd.id, rd.session_id, rd.value,
bv.BinVal,
SmearedBinVal = MAX(bv.BinVal) OVER (PARTITION BY rd.session_id ORDER BY rd.id),
SecondHalfAsINT = CAST(SUBSTRING(MAX(bv.BinVal) OVER (PARTITION BY rd.session_id ORDER BY rd.id), 5, 4) AS INT),
ControlValue = ISNULL(CAST(SUBSTRING(MAX(bv.BinVal) OVER (PARTITION BY rd.session_id ORDER BY rd.id), 5, 4) AS INT), 999)
FROM
#req_data rd
CROSS APPLY ( VALUES (CAST(rd.id AS BINARY(4)) + CAST(NULLIF(rd.value, 1) AS BINARY(4))) ) bv (BinVal)
Results...
id session_id value BinVal SmearedBinVal SecondHalfAsINT ControlValue
----------- ----------- ----------- ------------------ ------------------ --------------- ------------
1 5 7 0x0000000100000007 0x0000000100000007 7 7
2 5 1 NULL 0x0000000100000007 7 7
3 5 1 NULL 0x0000000100000007 7 7
4 5 12 0x000000040000000C 0x000000040000000C 12 12
5 5 1 NULL 0x000000040000000C 12 12
6 5 1 NULL 0x000000040000000C 12 12
7 5 1 NULL 0x000000040000000C 12 12
8 6 7 0x0000000800000007 0x0000000800000007 7 7
9 6 1 NULL 0x0000000800000007 7 7
10 6 3 0x0000000A00000003 0x0000000A00000003 3 3
11 6 1 NULL 0x0000000A00000003 3 3
12 7 7 0x0000000C00000007 0x0000000C00000007 7 7
13 8 1 NULL NULL NULL 999
14 8 2 0x0000000E00000002 0x0000000E00000002 2 2
15 8 3 0x0000000F00000003 0x0000000F00000003 3 3
Looking at the BinVal column, we see an 8-byte hex value for every row where [value] is not 1, and NULLs where [value] = 1. The first 4 bytes are the id (used for ordering) and the second 4 bytes are [value] (used to set the "previous non-1 value", or to set the whole thing to NULL).
The 2nd step is to "smear" the non-NULL values into the NULLs using the window framed MAX function, partitioned by session_id and ordered by id.
The 3rd step is to parse out the last 4 bytes and convert them back to an INT data type (SecondHalfAsINT) and deal with any nulls that result from not having any non-1 preceding value (ControlValue).
Since we can't reference a windowed function in the WHERE clause, we have to throw the query into a CTE (a derived table would work just as well) so that we can use the new ControlValue in the where clause.
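The two behaviors the explanation relies on can be checked in isolation. This is a minimal sketch with literal values; the column aliases are made up for illustration. As far as I know, the ordering trick only holds for non-negative integers, because a negative INT casts to a binary value with the high bit set and would sort above the positive ones.

```sql
-- 1) NULL + anything = NULL: NULLIF(1, 1) is NULL, so the whole concatenation is NULL
SELECT CAST(2 AS BINARY(4)) + CAST(NULLIF(1, 1) AS BINARY(4)) AS BinValForValue1;

-- 2) A fixed-length BINARY sorts like the non-negative INT it came from:
--    0x00000002 < 0x0000000A < 0x00000064, i.e. 2, 10, 100
SELECT v.n, CAST(v.n AS BINARY(4)) AS AsBinary
FROM (VALUES (100), (2), (10)) AS v (n)
ORDER BY CAST(v.n AS BINARY(4));
```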
SELECT CRow.id
FROM #req_data AS CRow
CROSS APPLY (SELECT MAX(id) AS id FROM #req_data PRev WHERE PRev.Id < CRow.id AND PRev.session_id = CRow.session_id AND PRev.value <> 1 ) MaxPRow
LEFT JOIN #req_data AS PRow ON MaxPRow.id = PRow.id
WHERE CRow.value = 1 AND ISNULL(PRow.value,1) <> 7
You can use the following query:
select id, session_id, value,
coalesce(sum(case when value <> 1 then 1 end)
over (partition by session_id order by id), 0) as grp
from #req_data
to get:
id session_id value grp
----------------------------
1 5 7 1
2 5 1 1
3 5 1 1
4 5 12 2
5 5 1 2
6 5 1 2
7 5 1 2
8 6 7 1
9 6 1 1
10 6 3 2
11 6 1 2
12 7 7 1
13 8 1 0
14 8 2 1
15 8 3 2
So, this query detects islands of consecutive 1 records that belong to the same group, as specified by the first preceding row with value <> 1.
You can use a window function once more to detect all 7 islands. If you wrap this in a second cte, then you can finally get the desired result by filtering out all 7 islands:
;with session_islands as (
select id, session_id, value,
coalesce(sum(case when value <> 1 then 1 end)
over (partition by session_id order by id), 0) as grp
from #req_data
), islands_with_7 as (
select id, grp, value,
count(case when value = 7 then 1 end)
over (partition by session_id, grp) as cnt_7
from session_islands
)
select id
from islands_with_7
where cnt_7 = 0 and value = 1

Different select criteria in odd and even events

I have a table which looks like this ( 10 billion rows)
AID BID CID
1 2 1
1 6 9
0 1 4
1 3 2
1 100 2
0 4 2
0 0 1
The AID could only be 0 or 1. BID and CID could be anything.
Now I want to select events first with AID=1 and then AID=0, and again AID=1 and then AID=0.
The idea is to select equal numbers of AID=1 and AID=0 event.
How can I achieve that?
The expected result is
AID BID CID
1 2 1
0 1 4
1 6 9
0 4 2
1 3 2
0 0 1
;WITH cte AS (
select *
FROM (VALUES
(1, 2, 1),
(1, 6, 9),
(0, 1, 4),
(1, 3, 2),
(1, 100, 2),
(0, 4, 2),
(0, 0, 1)
) as t(AID, BID, CID)
),
withrow AS (
SELECT ROW_NUMBER() OVER (PARTITION BY AID ORDER BY AID) as RN, *
FROM cte)
SELECT AID,BID,CID
FROM withrow
ORDER BY RN asc , aid desc
Output:
AID BID CID
----------- ----------- -----------
1 100 2
0 4 2
1 3 2
0 1 4
1 6 9
0 0 1
1 2 1
(7 row(s) affected)
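The output above still has seven rows, because AID = 1 has one more row than AID = 0. Below is a sketch of one way to keep only complete 1/0 pairs. It assumes rows are numbered by BID within each AID: the sample has no column recording the original order, so that ordering, like the CTE names, is an assumption.

```sql
WITH src AS (
    SELECT *
    FROM (VALUES
        (1, 2, 1), (1, 6, 9), (0, 1, 4), (1, 3, 2),
        (1, 100, 2), (0, 4, 2), (0, 0, 1)
    ) AS t (AID, BID, CID)
),
numbered AS (
    -- assumption: BID gives a deterministic order within each AID
    SELECT AID, BID, CID,
           ROW_NUMBER() OVER (PARTITION BY AID ORDER BY BID) AS RN
    FROM src
),
paired AS (
    -- AID is only 0 or 1, so a row number present on both sides has a count of 2
    SELECT AID, BID, CID, RN,
           COUNT(*) OVER (PARTITION BY RN) AS sides
    FROM numbered
)
SELECT AID, BID, CID
FROM paired
WHERE sides = 2          -- drop the unmatched surplus rows
ORDER BY RN, AID DESC;   -- 1 first, then 0, for each pair
```

On a table of the size mentioned in the question, an index that supports the ROW_NUMBER ordering would matter far more than the exact shape of the query.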

SQL Calculations based by field type and group by the type

Database includes FamID, TicketType and Amt
I want to get a calculation for total amount for each tickettype for each family and sort by family high to low based on total for all tickettypes.
Database values are:
FamID TicketType Amt
1 1 10
1 1 10
1 2 20
1 3 30
2 2 20
2 1 10
2 1 10
2 1 10
2 3 30
3 3 30
3 3 30
3 3 30
Would like results to be
Family Type 1 Type 2 Type 3 Total
3 0 0 90 90
2 30 20 30 80
1 20 20 30 70
Am I trying to do too much?
You never specified your RDBMS, but the following pivot query should work across most major ones with little modification:
SELECT t.Family, t.`Type 1`, t.`Type 2`, t.`Type 3`,
(t.`Type 1` + t.`Type 2` + t.`Type 3`) AS Total
FROM
(
SELECT FamID AS Family,
SUM(CASE WHEN TicketType = 1 THEN Amt ELSE 0 END) AS `Type 1`,
SUM(CASE WHEN TicketType = 2 THEN Amt ELSE 0 END) AS `Type 2`,
SUM(CASE WHEN TicketType = 3 THEN Amt ELSE 0 END) AS `Type 3`
FROM Tickets
GROUP BY FamID
) t
ORDER BY Total DESC

Week based count

I have a requirement to retrieve the data in the below fashion
Weeks delay_count
0 6
1 0
2 3
3 4
4 0
5 1
6 0
7 0
8 0
9 0
10 2
11 0
12 0
13 0
14 0
15 3
Here weeks is the hard coded column from 0 to 15 and delay_count is the derived column. I have a column delay_weeks. Based on the values in this column I need to populate the values in the delay_count column (derived column)
delay_weeks column values are below.
blank
blank
blank
2
10
5
blank
3
2
10
2
3
3
3
0
0
15
22
29
Conditions:
When delay_weeks is blank or 0 then in the delay_count column the count should be 1 under week 0
When delay_weeks is 3 then in the delay_count column the count should be 1 under week 3
When delay_weeks is 10 then in the delay_count column the count should be 1 under week 10
When delay_weeks is greater than or equal to 15 then in the delay_count column the count should be 1 under week 15.
I wrote code like below
SELECT "Weeks", a."delay_count"
FROM (SELECT LEVEL AS "Weeks"
FROM DUAL
CONNECT BY LEVEL <= 15) m,
(SELECT VALUE, COUNT (VALUE) AS "delay_numbers"
FROM (SELECT CASE
WHEN attr11.VALUE >= 15
THEN '15'
ELSE attr11.VALUE
END
VALUE
FROM docs,
(SELECT object_id, VALUE, attribute_type_id
FROM ATTRIBUTES
WHERE attribute_type_id =
(SELECT attribute_type_id
FROM attribute_types
WHERE name_display_code =
'ATTRIBUTE_TYPE.DELAY IN WEEKS')) attr11
WHERE docs.obj_id = attr11.object_id(+)
GROUP BY VALUE) a
WHERE m."Weeks" = a.VALUE(+)
select
weeks,
nvl(cnt, 0) as delay_count
from
(select level-1 as weeks from dual connect by level < 17)
left join (
select
nvl(least(attr11.value, 15), 0) as weeks,
count(0) as cnt
from
DOCS
left join (
ATTRIBUTES attr11
join ATTRIBUTE_TYPES atr_tp using(attribute_type_id)
)
on atr_tp.name_display_code = 'ATTRIBUTE_TYPE.DELAY IN WEEKS'
and docs.obj_id = attr11.object_id
group by nvl(least(attr11.value, 15), 0)
) using(weeks)
order by 1
Reverse-engineering the relevant parts of the table definitions, I think this gives you what you want:
select t.weeks, count(delay) as delay_count
from (select level - 1 as weeks from dual connect by level <= 16) t
left join (
select case when a.value is null then 0
when to_number(a.value) > 15 then 15
else to_number(a.value) end as delay
from docs d
left join (
select a.object_id, a.value
from attributes a
join attribute_types at on at.attribute_type_id = a.attribute_type_id
where at.name_display_code = 'ATTRIBUTE_TYPE.DELAY IN WEEKS'
) a on a.object_id = d.obj_id
) delays on delays.delay = t.weeks
group by t.weeks
order by t.weeks;
With what I think is matching data I get:
WEEKS DELAY_COUNT
---------- -----------
0 6
1 0
2 3
3 4
4 0
5 1
6 0
7 0
8 0
9 0
10 2
11 0
12 0
13 0
14 0
15 3
But obviously since you haven't given the real table structures I'm guessing a bit on the relationships. Obligatory SQL Fiddle.