SQL running total grouped by ID

Using this query, I need to populate the NULL column with a running total for each row: the year-to-date sum of paid_amount within the calendar year covered by the current table. The running total should be computed per member_id.
SELECT id=identity(int,1,1), cast(null as numeric(22,3)) as max_running_total, *
INTO #temp
FROM Customer_DB..Sales_Table
ORDER BY Date_Column asc
UPDATE #temp
SET max_running_total = (SELECT SUM(paid_amount)
FROM #temp
WHERE id <= id
GROUP BY member_id)

Since you have not given the schema, I have used a sample schema and computed a rolling sum. You can use the same SQL window functions to achieve your result.
CREATE TABLE amt
(
id INT,
paid_amount DECIMAL,
running_total DECIMAL
)
insert INTO amt VALUES (1, 100, NULL), (2, 50, NULL), (3, 50, NULL)
SELECT id, paid_amount,
SUM(paid_amount) over(ORDER BY id ROWS BETWEEN unbounded preceding AND CURRENT ROW) AS running_total
FROM amt
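The sample above does not partition by member, which the question asks for. A minimal sketch of the per-member, per-calendar-year variant follows, run through Python's sqlite3 module so it is self-contained. The table name sales and the column paid_date are assumptions made for illustration (the question only shows Sales_Table, Date_Column, member_id and paid_amount), and strftime('%Y', ...) is SQLite's way of extracting the year; on SQL Server you would likely use YEAR(Date_Column) instead.

```python
import sqlite3

# Hypothetical sample data; the real table and column names are not given in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (member_id INTEGER, paid_date TEXT, paid_amount INTEGER);
INSERT INTO sales VALUES
  (1, '2020-01-10', 100),
  (1, '2020-03-05', 50),
  (1, '2021-02-01', 70),
  (2, '2020-02-20', 30),
  (2, '2020-06-15', 20);
""")

# Running total restarts for every member and every calendar year.
running_totals = conn.execute("""
SELECT member_id, paid_date, paid_amount,
       SUM(paid_amount) OVER (
         PARTITION BY member_id, strftime('%Y', paid_date)
         ORDER BY paid_date
         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS running_total
FROM sales
ORDER BY member_id, paid_date
""").fetchall()

for row in running_totals:
    print(row)
```

Partitioning on both the member and the year is what makes the total "year to date"; dropping the year from PARTITION BY would give a lifetime running total per member instead.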

Related

How to select columns that aren't part of an aggregate query using HAVING SUM() in the WHERE and selecting only certain rows on db2

Using AS400 db2 for this.
I have a table of orders. From that table I have to:
1. Get all orders from a specified list of order IDs and type
2. Group by the user_id on those orders
3. Check to make sure the total order amount on the group is greater than $100
4. Return all orders that matched the group, ungrouped, including order_id, which is not part of the group
I got a bit stuck because the AS400 did not like that I was asking to select a field that wasn't part of the group, which I need.
I came up with this query, but it's slow.
-- Create a common temp table we can use in both places
WITH wantedOrders AS (
SELECT order_id FROM orders
WHERE
-- Only orders from the web
order_type = 'web'
-- And only orders that we want to get at this time
AND order_id IN
(
50,
20,
30
)
)
-- Our main select that gets all order information, even the non-grouped stuff
SELECT
t1.order_id,
t1.user_id,
t1.amount,
t2.total_amount,
t2.count
FROM orders AS t1
-- Join in the group data where we can do our query
JOIN (
SELECT
user_id,
SUM(amount) as total_amount,
COUNT(*) AS count
FROM
orders
-- Re use the temp table to get the order numbers
WHERE order_id IN (SELECT order_id FROM wantedOrders)
GROUP BY
user_id
HAVING SUM(amount)>100
) AS t2 ON t2.user_id=t1.user_id
-- Make sure we only use the order numbers
WHERE order_id IN (SELECT order_id FROM wantedOrders)
ORDER BY t1.user_id ASC;
What's the better way to write this query?
Try this:
WITH
wantedOrders (order_id) AS
(
VALUES 1, 2
)
, orders (order_id, user_id, amount) AS
(
VALUES
(1, 1, 50)
, (2, 1, 50)
, (1, 2, 60)
, (2, 2, 60)
, (3, 3, 200)
, (4, 3, 200)
)
-- Our main select that gets all order information, even the non-grouped stuff
SELECT *
FROM
(
SELECT
order_id,
user_id,
amount,
SUM (amount) OVER (PARTITION BY user_id) AS total_amount,
COUNT (*) OVER (PARTITION BY user_id) AS count
FROM orders t
WHERE EXISTS
(
SELECT 1
FROM wantedOrders w
WHERE w.order_id = t.order_id
)
) A
WHERE total_amount > 100
ORDER BY user_id ASC
ORDER_ID  USER_ID  AMOUNT  TOTAL_AMOUNT  COUNT
1         2        60      120           2
2         2        60      120           2
If order_id is the PK of the table, then just add the columns you need to the wantedOrders query and use it as your "base" (instead of using orders and refiltering it). You should end up joining wantedOrders with itself.
You can do:
select t.*
from orders t
join (
select user_id
from orders t
where order_id in (50, 20, 30)
group by user_id
having sum(amount) > 100
) s on s.user_id = t.user_id
The first table orders as t will produce the data you want. It will be filtered by the second "table expression" s that preselects the groups according to your logic.
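One thing to watch with the join approach: the outer query has to repeat the order filter, otherwise every order of a qualifying user comes back, not only the wanted ones (the question's own query does this in its final WHERE). A minimal sketch, run on SQLite through Python so it is self-contained; the sample rows are invented for illustration and the column names follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, user_id INTEGER,
                     amount INTEGER, order_type TEXT);
-- Invented sample rows: order 40 belongs to user 1 but is not in the wanted list.
INSERT INTO orders VALUES
  (20, 1, 60,  'web'),
  (30, 1, 70,  'web'),
  (40, 1, 500, 'web'),
  (50, 2, 40,  'web'),
  (60, 2, 200, 'store');
""")

matched_orders = conn.execute("""
SELECT t.order_id, t.user_id, t.amount, s.total_amount
FROM orders AS t
JOIN (
  SELECT user_id, SUM(amount) AS total_amount
  FROM orders
  WHERE order_type = 'web' AND order_id IN (20, 30, 50)
  GROUP BY user_id
  HAVING SUM(amount) > 100
) AS s ON s.user_id = t.user_id
-- Same filter again, so that order 40 is not returned for user 1.
WHERE t.order_type = 'web' AND t.order_id IN (20, 30, 50)
ORDER BY t.user_id, t.order_id
""").fetchall()

print(matched_orders)
```

User 1 qualifies with 60 + 70 = 130, user 2 does not (only 40 from wanted web orders), and the repeated filter keeps order 40 out of the output.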

Ranking and obtaining data across moving window

I have the following table -
create table iphone_defects(
product string
,defect string
,qty int64
,fwkyr int64
,fwenddate date
);
insert into iphone_defects values ('iPhone','Glass breakage',100,202112,'2020-09-20');
insert into iphone_defects values ('iPhone','No sound',30,202111,'2020-09-30');
insert into iphone_defects values ('iPhone','Glass breakage',25,202110,'2020-09-06');
insert into iphone_defects values ('iPhone','Audio problem',20,202109,'2020-08-30');
insert into iphone_defects values ('iPhone','No sound',60,202108,'2020-08-23');
insert into iphone_defects values ('iPhone','Empty boxes',30,202107,'2020-08-16');
insert into iphone_defects values ('iPhone','Audio problem',25,202106,'2020-08-09');
I am expecting the following result -
fwkyr refers to the financial week of the year. I have added an additional column, fwenddate, which holds the latest date in that financial week.
The ask is to obtain the defect with the largest quantity in a 4-week window counted from the current week. For example, for fwkyr 202112 the top defect is 'Glass breakage' and the total quantity is 100.
This example uses a fixed 4-week window. My actual use case needs 52 weeks.
Without the moving window, I know that I can rank and get the data, but I am not sure how to even approach this problem. Any help?
Per the updated question, my updated solution gets much longer and changes quite a bit.
I am still not sure whether the user selects the week from which the 52 weeks are counted, or whether you want this calculation from the start (week 1) of every year.
I also assume that there is a typo in one of your insert statements, compared with your desired output table, so I changed it to fit your output table.
1. Create table
create table table.defects(
product string
,defect string
,qty int64
,fwkyr int64
,fwenddate date
);
2. Insert data (adjusted last insert to match your output table)
insert into table.defects values ('iPhone','Glass breakage',100,202112,'2020-09-20');
insert into table.defects values ('iPhone','No sound',30,202111,'2020-09-30');
insert into table.defects values ('iPhone','Glass breakage',25,202110,'2020-09-06');
insert into table.defects values ('iPhone','Audio problem',20,202109,'2020-08-30');
insert into table.defects values ('iPhone','No sound',60,202108,'2020-08-23');
insert into table.defects values ('iPhone','Empty boxes',30,202107,'2020-08-16');
insert into table.defects values ('iPhone','Audio problem',55,202106,'2020-08-09');
3. Query for results
###############################################################################
### start count of weeks since selected first week and
### get number of weeks by desired range
###############################################################################
WITH
get_weeks AS (
SELECT
*,
ROW_NUMBER() OVER(PARTITION BY product ORDER BY fwkyr DESC) AS week_numbering,
SPLIT(CAST(ROW_NUMBER() OVER(PARTITION BY product ORDER BY fwkyr)/4 AS string), '.')[OFFSET(0)] AS week_id_0,
FROM
table.defects
ORDER BY
fwkyr DESC
),
###############################################################################
### produce filter column for each window period by offsetting
###############################################################################
get_weeks_consequtive AS (
SELECT
*,
LAG(week_id_0,1) OVER(PARTITION BY product ORDER BY fwkyr DESC) AS week_id_1,
LAG(week_id_0,2) OVER(PARTITION BY product ORDER BY fwkyr DESC) AS week_id_2,
LAG(week_id_0,3) OVER(PARTITION BY product ORDER BY fwkyr DESC) AS week_id_3
FROM
get_weeks ),
###############################################################################
### create tables and calculations per window using filter column where you group by for qty and keep top qty only
###############################################################################
week_id_0 AS (
SELECT
SUM(qty) AS qty,
product,
defect,
week_id
FROM (
SELECT
* EXCEPT(week_id_0,
week_id_1,
week_id_2,
week_id_3),
MAX(fwkyr) OVER() AS week_id
FROM
get_weeks_consequtive
WHERE
week_id_0 = '1' )
GROUP BY
2,
3,
4
ORDER BY
1 DESC
LIMIT
1),
week_id_1 AS (
SELECT
SUM(qty) AS qty,
product,
defect,
week_id
FROM (
SELECT
* EXCEPT(week_id_0,
week_id_1,
week_id_2,
week_id_3),
MAX(fwkyr) OVER() AS week_id
FROM
get_weeks_consequtive
WHERE
week_id_1 = '1' )
GROUP BY
2,
3,
4
ORDER BY
1 DESC
LIMIT
1),
week_id_2 AS (
SELECT
SUM(qty) AS qty,
product,
defect,
week_id
FROM (
SELECT
* EXCEPT(week_id_0,
week_id_1,
week_id_2,
week_id_3),
MAX(fwkyr) OVER() AS week_id
FROM
get_weeks_consequtive
WHERE
week_id_2 = '1' )
GROUP BY
2,
3,
4
ORDER BY
1 DESC
LIMIT
1),
week_id_3 AS (
SELECT
SUM(qty) AS qty,
product,
defect,
week_id
FROM (
SELECT
* EXCEPT(week_id_0,
week_id_1,
week_id_2,
week_id_3),
MAX(fwkyr) OVER() AS week_id
FROM
get_weeks_consequtive
WHERE
week_id_3 = '1' )
GROUP BY
2,
3,
4
ORDER BY
1 DESC
LIMIT
1)
###############################################################################
### union all selected windows
###############################################################################
SELECT
*
FROM
week_id_0
UNION ALL
SELECT
*
FROM
week_id_1
UNION ALL
SELECT
*
FROM
week_id_2
UNION ALL
SELECT
*
FROM
week_id_3
ORDER BY
week_id DESC
[Screenshots of the get_weeks, get_weeks_consequtive, week_id_1 and final result outputs]
PS ---
I brainstormed this quickly per your update; perhaps there is a better way, and I would be interested in seeing it.
Anyhow, with such lengthy queries I typically write a Python script with text templates for the repetitive parts, and use a loop to expand them to the desired length by incrementing the changing values and inserting them with so-called f-strings.
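A minimal sketch of that templating idea, assuming the CTE shape used in the query above. The helper names window_cte and build_query_tail are hypothetical, and only the repeated week_id_N blocks plus the final UNION ALL are generated; for 52 windows the LAG columns in get_weeks_consequtive would need to be expanded the same way.

```python
def window_cte(i: int, n_windows: int) -> str:
    """Return the text of the week_id_<i> CTE, mirroring the hand-written ones."""
    except_cols = ", ".join(f"week_id_{k}" for k in range(n_windows))
    return (
        f"week_id_{i} AS (\n"
        f"  SELECT SUM(qty) AS qty, product, defect, week_id\n"
        f"  FROM (\n"
        f"    SELECT * EXCEPT({except_cols}),\n"
        f"           MAX(fwkyr) OVER() AS week_id\n"
        f"    FROM get_weeks_consequtive\n"
        f"    WHERE week_id_{i} = '1')\n"
        f"  GROUP BY 2, 3, 4\n"
        f"  ORDER BY 1 DESC\n"
        f"  LIMIT 1)"
    )


def build_query_tail(n_windows: int) -> str:
    """Join the generated CTEs and the UNION ALL of their results."""
    ctes = ",\n".join(window_cte(i, n_windows) for i in range(n_windows))
    unions = "\nUNION ALL\n".join(
        f"SELECT * FROM week_id_{i}" for i in range(n_windows)
    )
    return f"{ctes}\n{unions}\nORDER BY week_id DESC"


# Four windows reproduce the repeated part of the query above.
generated_sql = build_query_tail(4)
print(generated_sql)
```

The generated text would be appended after the get_weeks and get_weeks_consequtive CTEs; changing the argument to 52 expands the same pattern without any copy and paste.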

postgresql: group by columns/windows function/min-max and complex query

Imagine I have invoices from two branches. I need to select the min and max invoice dates and, for each of those dates, show the branch id. If several branches have invoices on the min/max date, choose any.
CREATE TEMP TABLE invoice (
id int not null,
branch_id int not null,
c_date date not null,
PRIMARY KEY (id)
);
insert into invoice (id, branch_id, c_date) values
(1, 1, '2020-01-01')
,(2, 2, '2020-01-01')
,(3, 1, '2020-01-02')
,(4, 2, '2020-01-02')
,(5, 2, '2020-01-03');
The straightforward solution is as follows (I skip the max part so as not to overcomplicate the query).
select i2.branch_id, i2.c_date from (
select min(i1.id) minid
from (select min(i.c_date) mind, max(i.c_date) maxd
from invoice i
)a
join invoice i1
on a.mind=i1.c_date) b
join invoice i2 on b.minid=i2.id
The window function solution is a bit simpler but awkward too. Please keep in mind that the actual query is more complex, and I provide only the core part.
select * from (
select a.branch_id, a.c_date from(
select *, rank() over (order by c_date) r from invoice i
) a where a.r=1
limit 1
) mn,
(select a.branch_id, a.c_date from(
select *, rank() over (order by c_date desc) r from invoice i
) a where a.r=1
limit 1
) mx
Any guesses on how to write the query more elegantly?
One method is a trick using arrays:
select min(c_date),
(array_agg(branch_id order by c_date))[1] as first_branch,
max(c_date),
(array_agg(branch_id order by c_date desc))[1] as last_branch
from invoice;
This does aggregate all values into an array, so you wouldn't want to use this if there are too many values in each result row.
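If building the arrays is a concern, one possible alternative is FIRST_VALUE() over two differently ordered windows. This is a swapped-in technique, not the array trick itself. The sketch below runs it on SQLite through Python so it is self-contained; the same statement should also be accepted by PostgreSQL, though that is untested here. The extra id in each ORDER BY is an assumption added only to make the "choose any" tie-break deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE invoice (
  id INTEGER NOT NULL PRIMARY KEY,
  branch_id INTEGER NOT NULL,
  c_date TEXT NOT NULL
);
INSERT INTO invoice (id, branch_id, c_date) VALUES
  (1, 1, '2020-01-01'),
  (2, 2, '2020-01-01'),
  (3, 1, '2020-01-02'),
  (4, 2, '2020-01-02'),
  (5, 2, '2020-01-03');
""")

# Every row carries the same four values, so DISTINCT collapses them to one row.
min_max_branches = conn.execute("""
SELECT DISTINCT
  MIN(c_date) OVER () AS min_date,
  FIRST_VALUE(branch_id) OVER (ORDER BY c_date, id) AS first_branch,
  MAX(c_date) OVER () AS max_date,
  FIRST_VALUE(branch_id) OVER (ORDER BY c_date DESC, id) AS last_branch
FROM invoice
""").fetchall()

print(min_max_branches)
```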

query in Server Management Studio + table in ssrs reports+-using select with correlated sub-query

I have a table consisting of fruit groups and sizes, e.g.:
fruit-package   size   send start-date   send end-date
-------------------------------------------------------
apple           s      2.2.16            5.2.16
apple           s      7.2.16            **10.2.16**
apple           s      **20.2.16**       21.2.16
-------------------------------------------------------
apple           l      1.2.16            **5.2.16**
apple           l      **25.2.16**       26.2.16
apple           l      26.2.16           27.2.16
-------------------------------------------------------
orange          m      1.1.16            2.1.16
orange          m      3.1.16            **4.1.16**
orange          m      **24.1.16**       25.1.16
-------------------------------------------------------
For each specific group of fruit-package and size (like apple + small), I need to find the maximum number of days that passed between one package's send end-date and the following package's send start-date within the group.
I then want to select that send end-date and the following start-date, calculate the maximum difference between these two values, and put them in the result table for that specific group, doing this for each group.
So the result table would be:
fruit-package   size   send start-date   send end-date
-------------------------------------------------------
apple           s      20.2.16           10.2.16
-------------------------------------------------------
apple           l      25.2.16           5.2.16
-------------------------------------------------------
orange          m      24.1.16           4.1.16
-------------------------------------------------------
I tried to do this in parts.
First part: for each group of fruit, find all combinations of
(fruit-package) + (size) + (current end_date) and the start_date of the following package,
like this:
select P.fruit
,P.size
,P.end_date
,(SELECT top 1 (pa.start_date)
FROM packages as pa
WHERE pa.start_date >= pa.end_date
and p.fruit=pa.fruit and p.size=pa.size
order by pa.start_date desc ) as start
into #temp
from packages p
group by p.fruit
, P.size
,p.end_date
The second step would simply be to find the row with the largest day difference in each group.
But the first part I wrote does not work: I get a NULL value as the start date, or a single end_date rather than one for each group from the inner select.
Why, and how can I correct it?
Please help.
Thanks
This should work for you if you have SQL Server 2012 or later (LEAD was introduced in that version).
Create Table #Tbl (Name Varchar(8000), Size Char(1), StartDate Date, EndDate Date)
Insert #Tbl Values ('apple', 's', '2.2.16', '2.5.16')
Insert #Tbl Values ('apple', 's', '2.7.16', '2.10.16')
Insert #Tbl Values ('apple', 's', '2.20.16', '2.21.16')
Insert #Tbl Values ('apple', 'l', '2.1.16', '2.5.16')
Insert #Tbl Values ('apple', 'l', '2.25.16', '2.26.16')
Insert #Tbl Values ('apple', 'l', '2.26.16', '2.27.16')
Insert #Tbl Values ('orange', 'm', '1.1.16', '1.2.16')
Insert #Tbl Values ('orange', 'm', '1.3.16', '1.4.16')
Insert #Tbl Values ('orange', 'm', '1.24.16', '1.25.16')
;With cteQry As
(
Select *,
Lead(StartDate) Over (Partition By Name, Size Order By StartDate) NextStartDate,
DateDiff(d, EndDate, Lead(StartDate) Over (Partition By Name, Size Order By StartDate)) Days
From #Tbl
)
Select *
From
(
Select *,
Row_Number() Over (Partition By Name, Size Order By Days Desc) SortOrder
From cteQry
) A
Where SortOrder = 1
EDIT: Without lead function.
;With cteQry2 As
(
Select *,
DateDiff(d, EndDate,
(Select Top 1 StartDate
From #Tbl
Where Name = T1.Name
And Size = T1.Size
And StartDate > T1.StartDate
Order By StartDate)) Days
From #Tbl T1
)
Select *
From
(
Select *,
Row_Number() Over (Partition By Name, Size Order By Days Desc) SortOrder
From cteQry2
) A
Where SortOrder = 1
Order By Name, Size, StartDate
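As for why the original attempt returned NULL: its subquery compares pa.start_date with pa.end_date, both from the inner alias, so it never looks at the outer row, and ordering by start_date desc picks the latest start rather than the next one. A minimal sketch of the corrected correlated subquery follows, run on SQLite through Python so it is self-contained. The dates are rewritten in ISO format, LIMIT 1 and julianday() are SQLite stand-ins for TOP 1 and DATEDIFF, and the CTE names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE packages (fruit TEXT, size TEXT, start_date TEXT, end_date TEXT);
INSERT INTO packages VALUES
  ('apple',  's', '2016-02-02', '2016-02-05'),
  ('apple',  's', '2016-02-07', '2016-02-10'),
  ('apple',  's', '2016-02-20', '2016-02-21'),
  ('apple',  'l', '2016-02-01', '2016-02-05'),
  ('apple',  'l', '2016-02-25', '2016-02-26'),
  ('apple',  'l', '2016-02-26', '2016-02-27'),
  ('orange', 'm', '2016-01-01', '2016-01-02'),
  ('orange', 'm', '2016-01-03', '2016-01-04'),
  ('orange', 'm', '2016-01-24', '2016-01-25');
""")

largest_gaps = conn.execute("""
WITH next_starts AS (
  SELECT p.fruit, p.size, p.end_date,
         (SELECT pa.start_date
          FROM packages AS pa
          WHERE pa.fruit = p.fruit AND pa.size = p.size
            AND pa.start_date >= p.end_date  -- compare with the OUTER row
          ORDER BY pa.start_date             -- earliest following start
          LIMIT 1) AS next_start
  FROM packages AS p
),
gaps AS (
  SELECT *, CAST(julianday(next_start) - julianday(end_date) AS INTEGER) AS gap_days
  FROM next_starts
  WHERE next_start IS NOT NULL               -- last package of a group has no successor
),
ranked AS (
  SELECT *, ROW_NUMBER() OVER (PARTITION BY fruit, size ORDER BY gap_days DESC) AS rn
  FROM gaps
)
SELECT fruit, size, next_start, end_date, gap_days
FROM ranked
WHERE rn = 1
ORDER BY fruit, size
""").fetchall()

for row in largest_gaps:
    print(row)
```

The three rows match the pairs highlighted in the question: 10.2.16 to 20.2.16 for apple/s, 5.2.16 to 25.2.16 for apple/l, and 4.1.16 to 24.1.16 for orange/m.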

SQL Group BY SUM one column and select of first row of grouped items

I have a part table with 5 fields. I want to sum the QTY per MFGPN while showing the first returned row for the other 3 fields (Manufacturer, DateCode, Description). I initially thought of using the MIN function as follows, but that doesn't really help, since the data is not of an int data type. How would I go about doing this? Right now I'm stuck at the following query:
SELECT SUM([QTY]) AS QTY
,[MFGPN]
,MIN([MANUFACTURER]) AS MANUFACTURER
,MIN([DATECODE]) AS DateCode
,MIN([DESCRIPTION]) AS DESCRIPTION
INTO part
GROUP BY MFGPN, MANUFACTURER, DATECODE, description
ORDER BY mfgpn ASC
Would CROSS APPLY work for you?
SELECT
SUM(a.[QTY]) AS QTY
,a.[MFGPN]
,c.[MANUFACTURER]
,c.[DATECODE]
,c.[DESCRIPTION]
FROM part a
CROSS APPLY (SELECT TOP 1 * FROM part b WHERE a.[MFGPN] = b.[MFGPN]) c
GROUP BY
a.[MFGPN]
,c.[MANUFACTURER]
,c.[DATECODE]
,c.[DESCRIPTION]
Tested with the following:
CREATE TABLE #T1 (
[QTY] int
,[MFGPN] NVARCHAR(50)
,[MANUFACTURER] NVARCHAR(50)
,[DATECODE] DATE
,[DESCRIPTION] NVARCHAR(50));
INSERT #T1 VALUES
(2, 'MFGPN-1', 'MANUFACTURER-A', '20120101', 'A-1'),
(4, 'MFGPN-1', 'MANUFACTURER-B', '20120102', 'B-1'),
(3, 'MFGPN-1', 'MANUFACTURER-C', '20120103', 'C-1'),
(1, 'MFGPN-2', 'MANUFACTURER-A', '20120101', 'A-2'),
(5, 'MFGPN-2', 'MANUFACTURER-B', '20120101', 'B-2')
SELECT
SUM(a.[QTY]) AS QTY
,a.[MFGPN]
,c.[MANUFACTURER]
,c.[DATECODE]
,c.[DESCRIPTION]
FROM #T1 a
CROSS APPLY (SELECT TOP 1 * FROM #T1 b WHERE a.[MFGPN] = b.[MFGPN]) c
GROUP BY
a.[MFGPN]
,c.[MANUFACTURER]
,c.[DATECODE]
,c.[DESCRIPTION]
Produces
QTY MFGPN MANUFACTURER DATECODE DESCRIPTION
9 MFGPN-1 MANUFACTURER-A 2012-01-01 A-1
6 MFGPN-2 MANUFACTURER-A 2012-01-01 A-2
This can be easily managed with a windowed SUM():
WITH summed_and_ranked AS (
SELECT
MFGPN,
MANUFACTURER,
DATECODE,
DESCRIPTION,
QTY = SUM(QTY) OVER (PARTITION BY MFGPN),
RNK = ROW_NUMBER() OVER (
PARTITION BY MFGPN
ORDER BY DATECODE -- or which column should define the order?
)
FROM atable
)
SELECT
MFGPN,
MANUFACTURER,
DATECODE,
DESCRIPTION,
QTY
INTO parts
FROM summed_and_ranked
WHERE RNK = 1
;
For every row, the total group quantity and the ranking within the group is calculated. When actually getting rows for inserting into the new table (the main SELECT), only rows with RNK values of 1 are pulled. Thus you get a result set containing group totals as well as details of certain rows.
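A minimal sketch of the same windowed SUM() plus ROW_NUMBER() pattern, run on SQLite through Python so it is self-contained, using the test rows from the CROSS APPLY answer. The table name part comes from the question; the second sort key (MANUFACTURER) is an assumption added to break the DATECODE tie in MFGPN-2, since "first row" is otherwise undefined there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE part (qty INTEGER, mfgpn TEXT, manufacturer TEXT,
                   datecode TEXT, description TEXT);
INSERT INTO part VALUES
  (2, 'MFGPN-1', 'MANUFACTURER-A', '2012-01-01', 'A-1'),
  (4, 'MFGPN-1', 'MANUFACTURER-B', '2012-01-02', 'B-1'),
  (3, 'MFGPN-1', 'MANUFACTURER-C', '2012-01-03', 'C-1'),
  (1, 'MFGPN-2', 'MANUFACTURER-A', '2012-01-01', 'A-2'),
  (5, 'MFGPN-2', 'MANUFACTURER-B', '2012-01-01', 'B-2');
""")

first_rows_with_totals = conn.execute("""
WITH summed_and_ranked AS (
  SELECT mfgpn, manufacturer, datecode, description,
         SUM(qty) OVER (PARTITION BY mfgpn) AS total_qty,   -- group total on every row
         ROW_NUMBER() OVER (
           PARTITION BY mfgpn
           ORDER BY datecode, manufacturer                  -- assumed definition of "first"
         ) AS rnk
  FROM part
)
SELECT mfgpn, manufacturer, datecode, description, total_qty
FROM summed_and_ranked
WHERE rnk = 1
ORDER BY mfgpn
""").fetchall()

for row in first_rows_with_totals:
    print(row)
```

With this ordering the totals and detail rows agree with the CROSS APPLY output shown earlier (9 for MFGPN-1, 6 for MFGPN-2), but here the choice of "first" row is explicit rather than left to the engine.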