Making a running total over groups is easy today with SUM() OVER (PARTITION BY ...).
I need to reset the total within the same partition based on a condition: if a field in a row is true, the sum should be reset and start again from that row.
In application code this is easy: loop over the rows and check the condition. But how can we achieve this in SQL?
Here is a sample. It contains a table with four fields and a query that computes the running amounts. The sum should be reset whenever the ResetSum field is true.
CREATE TABLE dbo.Table_1
(
PersonID int NOT NULL,
Amount money NOT NULL,
PayDate date NOT NULL,
ResetSum bit NOT NULL
)
INSERT INTO Table_1 (PersonID, Amount, PayDate, ResetSum)
VALUES (1, 100, '2015-1-1', 0)
,(1,200,'2015-1-2',0)
,(1,180,'2015-1-3',0)
,(1,200,'2015-1-4',1)
,(1,200,'2015-1-5',0)
,(1,360,'2015-1-6',0)
SELECT *,SUM(Amount) over(PARTITION BY PersonID ORDER BY PayDate) as SumAmount
FROM Table_1
The desired running total for the last row is 760 (200 + 200 + 360), not 1240.
The records cannot simply be partitioned by the ResetSum field: once a row has ResetSum = 1, every row after it belongs to the new total, even though ResetSum is 0 in those rows.
Here is a sample of my .NET code:
' Running total that restarts from the current row whenever ResetSum is True
Public Function SumTest(lst As List(Of TestRecords)) As Decimal
Dim sum As Decimal = 0
For Each tst As TestRecords In lst
If tst.ResetSum Then
' Reset: start a new total from this row's amount
sum = tst.Amount
Else
sum += tst.Amount
End If
Next
Return sum
End Function
Do a running total on ResetSum in a derived table and use that as a partition column in the running total on Amount.
select T.PersonID,
T.Amount,
T.PayDate,
sum(T.Amount) over(partition by T.PersonID, T.ResetSum
order by T.PayDate rows unbounded preceding) as SumAmount
from (
select T1.PersonID,
T1.Amount,
T1.PayDate,
sum(case T1.ResetSum
when 1 then 1
else 0
end) over(partition by T1.PersonID
order by T1.PayDate rows unbounded preceding) as ResetSum
from dbo.Table_1 as T1
) as T;
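As a sanity check, the derived-table query can be run almost unchanged on SQLite, which has supported window functions since version 3.25. The minimal sketch below loads the question's sample rows into an in-memory database (dates zero-padded so that text ordering matches date ordering) and compares the result with a Python port of the VB.NET loop. The helper names are hypothetical, and the loop port assumes a single PersonID, like the original loop.

```python
import sqlite3

# Sample rows from the question: (PersonID, Amount, PayDate, ResetSum)
ROWS = [
    (1, 100, "2015-01-01", 0),
    (1, 200, "2015-01-02", 0),
    (1, 180, "2015-01-03", 0),
    (1, 200, "2015-01-04", 1),
    (1, 200, "2015-01-05", 0),
    (1, 360, "2015-01-06", 0),
]

# Same shape as the answer: the inner running count of resets acts as a segment id
QUERY = """
SELECT T.PersonID, T.Amount, T.PayDate,
       SUM(T.Amount) OVER (PARTITION BY T.PersonID, T.ResetSum
                           ORDER BY T.PayDate ROWS UNBOUNDED PRECEDING) AS SumAmount
FROM (
    SELECT T1.PersonID, T1.Amount, T1.PayDate,
           SUM(CASE T1.ResetSum WHEN 1 THEN 1 ELSE 0 END)
               OVER (PARTITION BY T1.PersonID
                     ORDER BY T1.PayDate ROWS UNBOUNDED PRECEDING) AS ResetSum
    FROM Table_1 AS T1
) AS T
ORDER BY T.PersonID, T.PayDate
"""


def sql_running_totals(rows):
    """Run the reset-aware running total in an in-memory SQLite database."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE Table_1 (PersonID INTEGER, Amount INTEGER, PayDate TEXT, ResetSum INTEGER)"
    )
    con.executemany("INSERT INTO Table_1 VALUES (?, ?, ?, ?)", rows)
    totals = [row[3] for row in con.execute(QUERY)]
    con.close()
    return totals


def loop_running_totals(rows):
    """Port of the VB.NET loop that records the total after each row (single PersonID assumed)."""
    totals, running = [], 0
    for _, amount, _, reset in sorted(rows, key=lambda r: r[2]):
        running = amount if reset else running + amount
        totals.append(running)
    return totals


print(sql_running_totals(ROWS))  # [100, 300, 480, 200, 400, 760]
```

Both functions produce the same list, and the last value is the 760 the question asks for.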
I have a table A which contains id, report_day and other columns. I also have a table B which contains id, report_day and subscribers. I want to create a VIEW with the columns id, report_day and subscribers, so it's a simple join:
select a.id, a.report_day, b.subscribers from schema.a
left join schema.b on a.id = b.id
and a.report_day = b.report_day
Now I want to add a column subscribers_increment based on subscribers. But for some days I don't have stats, so subscribers is NULL. subscribers_increment is simply subscribers(current_day) - subscribers(prev_day).
After reading some articles, I added the following expression:
case WHEN row_number() OVER (PARTITION BY b.id ORDER BY b.report_day ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) = 1 THEN b.subscribers
else b.subscribers - COALESCE(last_value(b.subscribers) OVER (PARTITION BY b.id ORDER BY b.report_day ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0::bigint::numeric)
END::integer AS subscribers_increment
This gives the following result:
NULL is still NULL.
For example, the increment for 2021-04-07 is incorrect: it is actually the increment over 2 days. Can I divide this value from 2021-04-08 by the number of days (here 2) and write the same value for both 2021-04-07 and 2021-04-08 (or at least for 2021-04-07, where it was NULL)? And apply the same logic to every day where subscribers is NULL?
So I need to follow these rules:
If subscribers is NULL, go forward to the next (future) day with a non-NULL value and take that value. Subtract from it the last non-NULL value before the gap (looking back in date order). Divide the difference by the number of days it covers and write the result into subscribers_increment for each of those rows.
Is this possible?
UPDATE:
For my data it should look like this:
UPDATE v2
After applying the script:
UPDATE v3
In this case, our increment for 25.03-27.03 is still NULL.
The basic idea is:
1) Use lag() to get the previous subscribers and dates before joining. This assumes that the left join is the cause of all the NULL values.
2) Use a cumulative count in reverse to assign a grouping, so that each NULL is combined with the next non-NULL value in one group.
3) As a result of (2), the number of rows in a group is the denominator.
4) As a result of (1), the difference between subscribers and prev_subscribers is the numerator.
The actual calculation requires more window functions and case logic.
So the query looks like this:
with t as (
select a.id, a.report_day, b.subscribers, b.prev_report_day, b.prev_subscribers,
count(b.subscribers) over (partition by a.id order by a.report_day desc) as grp
from first_table a left join
(select b.*,
lag(b.report_day) over (partition by id order by report_day) as prev_report_day,
lag(b.subscribers) over (partition by id order by report_day) as prev_subscribers
from second_table b
) b
on a.id = b.id and a.report_day = b.report_day
)
select t.*,
(case when t.subscribers is not null and t.prev_report_day = t.report_day - interval '1 day'
then t.subscribers - t.prev_subscribers
when t.subscribers is not null
then (t.subscribers - t.prev_subscribers) / count(*) over (partition by id, grp)
when t.subscribers is null
then (max(t.subscribers) over (partition by id, grp) - max(t.prev_subscribers) over (partition by id, grp)
) / count(*) over (partition by id, grp)
end) as subscribers_increment
from t;
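A minimal sketch of the same grouping idea on SQLite (3.25 or later), using hypothetical tables a and b. It assumes a has exactly one row per id and calendar day, and b contains only the days that have stats. Under that assumption each reverse-count group holds one non-NULL value plus the NULL days just before it, so the three branches of the case expression reduce to a single expression: the difference from the previous known value divided by the group size. Multiplying by 1.0 avoids integer division. The first known value has no predecessor, so its increment stays NULL here.

```python
import sqlite3

# Hypothetical sample: table a has every day, table b only the days with stats
DAYS = [(1, f"2021-04-{d:02d}") for d in range(5, 10)]  # 2021-04-05 .. 2021-04-09
STATS = [
    (1, "2021-04-05", 100),
    (1, "2021-04-06", 110),
    (1, "2021-04-08", 130),  # 2021-04-07 is missing, so this value covers two days
    (1, "2021-04-09", 131),
]

QUERY = """
WITH t AS (
    SELECT a.id, a.report_day, b.subscribers, b.prev_subscribers,
           -- reverse running count: NULL days share a group with the next known day
           COUNT(b.subscribers) OVER (PARTITION BY a.id ORDER BY a.report_day DESC) AS grp
    FROM a
    LEFT JOIN (
        SELECT b.*,
               LAG(b.subscribers) OVER (PARTITION BY b.id ORDER BY b.report_day) AS prev_subscribers
        FROM b
    ) AS b ON a.id = b.id AND a.report_day = b.report_day
)
SELECT id, report_day,
       -- (known value - previous known value) spread evenly over the days in the group
       (MAX(subscribers) OVER (PARTITION BY id, grp)
        - MAX(prev_subscribers) OVER (PARTITION BY id, grp)) * 1.0
       / COUNT(*) OVER (PARTITION BY id, grp) AS subscribers_increment
FROM t
ORDER BY id, report_day
"""


def spread_increments(days, stats):
    """Return (id, report_day, subscribers_increment) rows for the sample tables."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE a (id INTEGER, report_day TEXT)")
    con.execute("CREATE TABLE b (id INTEGER, report_day TEXT, subscribers INTEGER)")
    con.executemany("INSERT INTO a VALUES (?, ?)", days)
    con.executemany("INSERT INTO b VALUES (?, ?, ?)", stats)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


for row in spread_increments(DAYS, STATS):
    print(row)
```

The gap between 110 and 130 is split into 10 for both 2021-04-07 and 2021-04-08.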
In my dataset, each user_id has several rows numbered 1 to n, and each row has a specific revenue. I want to select the maximum revenue for each user_id together with the row number it belongs to, i.e. a query that returns the highlighted rows.
One method is a correlated subquery:
select t.*
from t
where t.revenue = (select max(t2.revenue) from t t2 where t2.user_id = t.user_id);
If there are ties for the maximum, this returns all rows with the highest value.
Alternatively, flag the maximum rows with a window function:
select *,
case when revenue = max(revenue) over (partition by user_id) then 1 else 0 end as highlight
from T
Or join to the per-user maximum:
select tt.*
from #tbl tt
join (select user_Id, max(revenue) as revenue from #tbl group by user_Id) tm on tt.user_Id = tm.user_Id and tt.revenue = tm.revenue
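The correlated subquery and a window-function filter can be checked against each other with a small in-memory SQLite sketch (3.25 or later). The table t, the row_num column and the sample data are hypothetical; user 1 has a tie at the maximum, so both of its top rows come back.

```python
import sqlite3

# Hypothetical sample: (user_id, row_num, revenue); user 1 ties at 80
DATA = [(1, 1, 50), (1, 2, 80), (1, 3, 80), (2, 1, 30), (2, 2, 10)]

CORRELATED = """
SELECT t.user_id, t.row_num, t.revenue
FROM t
WHERE t.revenue = (SELECT MAX(t2.revenue) FROM t AS t2 WHERE t2.user_id = t.user_id)
ORDER BY t.user_id, t.row_num
"""

# Compute the per-user maximum once per row, then keep only the matching rows
WINDOWED = """
SELECT user_id, row_num, revenue
FROM (
    SELECT t.*, MAX(revenue) OVER (PARTITION BY user_id) AS max_revenue
    FROM t
) AS x
WHERE revenue = max_revenue
ORDER BY user_id, row_num
"""


def top_rows(data, query):
    """Load the sample into an in-memory table t and run the given query."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (user_id INTEGER, row_num INTEGER, revenue INTEGER)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?)", data)
    rows = con.execute(query).fetchall()
    con.close()
    return rows


print(top_rows(DATA, WINDOWED))  # [(1, 2, 80), (1, 3, 80), (2, 1, 30)]
```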
Select only the latest amount; if it is NULL, take the previous one.
table a
customer|amount|date
001|2 |20201101
001|null|20201102
001|3 |20201103
002|8.9 |20201101
002|7 |20201008
002|null|20201106
Result
001|null|20201101
001|null|20201102
001|3 |20201103
002|null|20201101
002|null|20201008
002|7 |20201106
The amount should be taken from the latest date, and the other records should be NULL. If the amount for the latest date is NULL, the previous non-NULL value should be used.
My current attempt:
select top 1 [amount]
from table
where [amount] is not null
order by date desc
If you want to set all but the most recent value to NULL:
select customer, date,
(case when seqnum = 1 then amount end) as amount
from (select t.*,
row_number() over (partition by customer
order by case when amount is null then 1 else 0 end, date desc) as seqnum
from your_table t
) t
where customer = '001'
order by date desc
Probably what you are looking for is a window function:
SELECT *
FROM (SELECT *,
row_number() over
(partition by customer
order by date desc) as rn
FROM your_table
WHERE amount is not null) t
WHERE rn = 1
You can use row_number or dense_rank depending on your needs (dense_rank also returns rows that tie on the latest date).
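A sketch of the corrected row_number approach applied to every customer, on in-memory SQLite (3.25 or later) with a hypothetical table name table_a and the dates stored as YYYYMMDD text. Sorting NULL amounts last through a case expression works in both SQL Server and SQLite. Note that by date, the latest non-NULL amount for customer 002 is 8.9 (20201101), so this query keeps 8.9 rather than the 7 shown in the expected result.

```python
import sqlite3

# Sample from the question: (customer, amount, date); dates kept as YYYYMMDD text
DATA = [
    ("001", 2, "20201101"),
    ("001", None, "20201102"),
    ("001", 3, "20201103"),
    ("002", 8.9, "20201101"),
    ("002", 7, "20201008"),
    ("002", None, "20201106"),
]

QUERY = """
SELECT customer, date,
       CASE WHEN seqnum = 1 THEN amount END AS amount
FROM (
    SELECT t.*,
           -- non-NULL amounts first, then the most recent date
           ROW_NUMBER() OVER (PARTITION BY customer
                              ORDER BY CASE WHEN amount IS NULL THEN 1 ELSE 0 END,
                                       date DESC) AS seqnum
    FROM table_a AS t
) AS t
ORDER BY customer, date
"""


def latest_amounts(data):
    """Keep the latest non-NULL amount per customer and blank out the rest."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE table_a (customer TEXT, amount REAL, date TEXT)")
    con.executemany("INSERT INTO table_a VALUES (?, ?, ?)", data)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


for row in latest_amounts(DATA):
    print(row)
```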
Create a view that returns all inserted values in descending order. Then select the first or second row according to the condition.
Find whether a specific column in a group is all NULLs, then populate the target accordingly. I have the record set below, and I need to populate the output column "total" as follows:
1) Within a group (partition), if the "trans_dt" column is NULL in all rows, populate "total" in the output with zero.
2) If any of the records has a valid value in trans_dt, populate "total" with the maximum "items" value for that group, and trans_dt with the MAX trans_dt for that group.
custid|transact_dt|items
------------------------
1234|05/01/2019|3
1234|10/02/2019|4
1234|Null|3
5678|Null|5
5678|Null|3
5678|Null|1
5678|Null|2
In the above record set, custid "1234" has valid trans_dt values in 2 rows, hence the output column "total" should be populated with 4. However, for custid "5678", all trans_dt values are NULL, hence "total" should be 0.
Expected output:
custid|transact_dt|items
------------------------
1234|10/02/2019|4
5678|31/12/9999|0
select custid, max_trans_dt,
CASE WHEN max_trans_dt IS NULL then 0
ELSE total
END as total
from
( select custid, MAX(trans_dt) OVER (PARTITION BY custid) as max_trans_dt, MAX(items) OVER (PARTITION BY custid) as total,
ROW_NUMBER() OVER (PARTITION BY custid order by trans_dt desc, items desc) as rn
from t ) tmp
WHERE tmp.rn = 1
Is there a smarter and cleaner solution to the above requirement?
Just use conditional aggregation:
select custid,
max(trans_dt) as max_trans_dt,
max(case when trans_dt is not null then items else 0 end) as total
from t
group by custid;
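A sketch of the conditional aggregation on in-memory SQLite, using a hypothetical table t with the sample data and the dates converted to ISO text (reading 05/01/2019 as day/month/year, which the 31/12/9999 placeholder suggests). The COALESCE to '9999-12-31' is an assumption added only to reproduce the placeholder date in the expected output.

```python
import sqlite3

# Sample from the question: (custid, trans_dt, items); dates converted to ISO text
DATA = [
    (1234, "2019-01-05", 3),
    (1234, "2019-02-10", 4),
    (1234, None, 3),
    (5678, None, 5),
    (5678, None, 3),
    (5678, None, 1),
    (5678, None, 2),
]

QUERY = """
SELECT custid,
       -- assumption: fall back to the 9999-12-31 placeholder when every date is NULL
       COALESCE(MAX(trans_dt), '9999-12-31') AS max_trans_dt,
       -- items only count for rows that have a date; other rows contribute 0
       MAX(CASE WHEN trans_dt IS NOT NULL THEN items ELSE 0 END) AS total
FROM t
GROUP BY custid
ORDER BY custid
"""


def totals(data):
    """One row per custid with the latest date and the conditional maximum of items."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (custid INTEGER, trans_dt TEXT, items INTEGER)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?)", data)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


print(totals(DATA))  # [(1234, '2019-02-10', 4), (5678, '9999-12-31', 0)]
```

A single GROUP BY replaces the window functions and the ROW_NUMBER filter in the original attempt.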
Using this query, I need to populate the NULL column with a running total for each row: the year-to-date sum of paid_amount within the calendar year. The running total should be computed per member_id.
SELECT id=identity(int,1,1), cast(null as numeric(22,3)) as max_running_total, *
INTO #temp
FROM Customer_DB..Sales_Table
ORDER BY Date_Column asc
UPDATE #temp
SET max_running_total = (SELECT SUM(paid_amount)
FROM #temp
WHERE id <= id
GROUP BY member_id)
Since you have not given the schema, I have taken a sample schema and computed a rolling sum. You can use the same SQL window functions to achieve your result.
CREATE TABLE amt
(
id INT,
paid_amount DECIMAL,
running_total DECIMAL
)
insert INTO amt VALUES (1, 100, NULL), (2, 50, NULL), (3, 50, NULL)
SELECT id, paid_amount,
SUM(paid_amount) over(ORDER BY id ROWS BETWEEN unbounded preceding AND CURRENT ROW) AS running_total
FROM amt
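The question asks for a year-to-date total per member, which the same window function can handle by adding member_id and the calendar year to the PARTITION BY clause. A minimal sketch on in-memory SQLite, with a hypothetical sales table (id, member_id, pay_date, paid_amount). strftime('%Y', ...) is SQLite's way to extract the year; SQL Server would use YEAR(pay_date) instead.

```python
import sqlite3

# Hypothetical sample: (id, member_id, pay_date, paid_amount)
SALES = [
    (1, 10, "2020-12-30", 100),
    (2, 10, "2021-01-05", 50),
    (3, 20, "2021-01-06", 70),
    (4, 10, "2021-02-01", 25),
    (5, 20, "2021-03-01", 30),
]

QUERY = """
SELECT id,
       SUM(paid_amount) OVER (
           -- restart the total for each member and each calendar year
           PARTITION BY member_id, strftime('%Y', pay_date)
           ORDER BY pay_date, id
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS running_total
FROM sales
ORDER BY id
"""


def ytd_totals(sales):
    """Return (id, running_total) pairs for the sample sales table."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE sales (id INTEGER, member_id INTEGER, pay_date TEXT, paid_amount INTEGER)"
    )
    con.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", sales)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


print(ytd_totals(SALES))  # [(1, 100), (2, 50), (3, 70), (4, 75), (5, 100)]
```

Member 10's total restarts at 50 on 2021-01-05 because that row starts a new calendar year.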