Cumulated sum based on condition in other column - sql

I would like to create a view based on data in the following structure:
CREATE TABLE my_table (
date date,
daily_cumulative_precip float4
);
INSERT INTO my_table (date, daily_cumulative_precip)
VALUES
('2016-07-28', 3.048)
, ('2016-08-04', 2.286)
, ('2016-08-11', 5.334)
, ('2016-08-12', 0.254)
, ('2016-08-13', 2.794)
, ('2016-08-14', 2.286)
, ('2016-08-15', 3.302)
, ('2016-08-17', 3.81)
, ('2016-08-19', 15.746)
, ('2016-08-20', 46.739998);
I would like to accumulate the precipitation for consecutive days only.
Below is the desired result for a different test case - except that days without rain should be omitted:
I have tried window functions with OVER(PARTITION BY date, rain_on_day) but they do not yield the desired result.
How could I solve this?

SELECT date
, dense_rank() OVER (ORDER BY grp) AS consecutive_group_nr -- optional
, daily_cumulative_precip
, sum(daily_cumulative_precip) OVER (PARTITION BY grp ORDER BY date) AS cum_precipitation_mm
FROM (
SELECT date, t.daily_cumulative_precip
, row_number() OVER (ORDER BY date) - t.rn AS grp
FROM (
SELECT generate_series (min(date), max(date), interval '1 day')::date AS date
FROM my_table
) d
LEFT JOIN (SELECT *, row_number() OVER (ORDER BY date) AS rn FROM my_table) t USING (date)
) x
WHERE daily_cumulative_precip > 0
ORDER BY date;
Returns all rainy days with cumulative sums for consecutive days (and a running group number).
Basics:
Select longest continuous sequence

Here's a way to calculate cumulative precipitation without having to explicitly enumerate all dates:
SELECT date, daily_cumulative_precip,
       sum(daily_cumulative_precip) OVER (PARTITION BY group_num ORDER BY date) AS cum_precip
FROM (
    SELECT date, daily_cumulative_precip,
           sum(start_group) OVER (ORDER BY date) AS group_num
    FROM (
        SELECT date, daily_cumulative_precip,
               CASE WHEN (date != prev_date + 1) THEN 1 ELSE 0 END AS start_group
        FROM (
            SELECT date, daily_cumulative_precip,
                   lag(date, 1, '-infinity'::date) OVER (ORDER BY date) AS prev_date
            FROM my_table
        ) t1
    ) t2
) t3
yields
| date | daily_cumulative_precip | cum_precip |
|------------+-------------------------+------------|
| 2016-07-28 | 3.048 | 3.048 |
| 2016-08-04 | 2.286 | 2.286 |
| 2016-08-11 | 5.334 | 5.334 |
| 2016-08-12 | 0.254 | 5.588 |
| 2016-08-13 | 2.794 | 8.382 |
| 2016-08-14 | 2.286 | 10.668 |
| 2016-08-15 | 3.302 | 13.97 |
| 2016-08-17 | 3.81 | 3.81 |
| 2016-08-19 | 15.746 | 15.746 |
| 2016-08-20 | 46.74 | 62.486 |
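The same "start a new group when the previous row is not the day before" logic can be sketched outside SQL. This is only an illustration in Python, not part of the answer above: the function name `cumulative_by_streak` is made up for this sketch, and the last value is entered as 46.74 (the rounded form shown in the result table).

```python
from datetime import date, timedelta

# Sample rows from the question: (date, daily_cumulative_precip).
rows = [
    (date(2016, 7, 28), 3.048),
    (date(2016, 8, 4), 2.286),
    (date(2016, 8, 11), 5.334),
    (date(2016, 8, 12), 0.254),
    (date(2016, 8, 13), 2.794),
    (date(2016, 8, 14), 2.286),
    (date(2016, 8, 15), 3.302),
    (date(2016, 8, 17), 3.81),
    (date(2016, 8, 19), 15.746),
    (date(2016, 8, 20), 46.74),
]


def cumulative_by_streak(rows):
    """Running sum that restarts whenever the previous row is not the day before."""
    result = []
    prev_day = None
    running = 0.0
    for day, precip in sorted(rows):
        # Same role as the lag() comparison that produces start_group in the SQL.
        starts_group = prev_day is None or day != prev_day + timedelta(days=1)
        if starts_group:
            running = 0.0
        running += precip
        # Round only for display, to hide floating-point noise.
        result.append((day, precip, round(running, 3)))
        prev_day = day
    return result


streaks = cumulative_by_streak(rows)
for day, precip, cum in streaks:
    print(day, precip, cum)
```

Because the reset is decided row by row, no calendar of all dates has to be generated, which mirrors the point of the lag()-based query.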


How to get first_value from previous window partition

I want to display the BalanceEndOfYesterday value from the day before in a query, as shown below.
| Date | Amount | BalanceEndOfDay | BalanceEndOfYesterday |
|------------|--------|-----------------|-----------------------|
| 2020-04-30 | 10 | 130 | 80 |
| 2020-04-30 | 20 | 130 | 80 |
| 2020-04-30 | 30 | 130 | 80 |
| 2020-04-30 | -10 | 130 | 80 |
| 2020-04-29 | 50 | 80 | 0 |
| 2020-04-29 | -10 | 80 | 0 |
| 2020-04-29 | 40 | 80 | 0 |
My query is
SELECT
BalanceEndOfDay ,
first_value(BalanceEndOfDay) OVER (ORDER BY Date DESC) -- here is some sort of window needed
FROM AccountTransactions
You can use APPLY:
SELECT at.*, COALESCE(at1.BalanceEndOfDay, 0) AS BalanceEndOfYesterday
FROM AccountTransactions at OUTER APPLY
( SELECT TOP (1) at1.BalanceEndOfDay
FROM AccountTransactions at1
WHERE at1.Date < at.Date
ORDER BY at1.Date DESC
) at1;
EDIT: If you want the balance from exactly yesterday only, you can use dateadd():
SELECT DISTINCT at.*, COALESCE(at1.balanceendofday, 0) AS BalanceEndOfYesterday
FROM AccountTransactions at LEFT JOIN
AccountTransactions at1
ON at1.date = dateadd(day, -1, at.date);
We could use LAG here, after first aggregating by date to obtain a single end of day balance for each date. Then, we can join your table to this result to pull in the end of day balance from yesterday.
WITH cte AS (
SELECT Date, MAX(BalanceEndOfDay) AS BalanceEndOfDay,
LAG(MAX(BalanceEndOfDay), 1, 0) OVER (ORDER BY Date) As BalanceEndOfYesterday
FROM AccountTransactions
GROUP BY Date
)
SELECT
a1.Date,
a1.Amount,
a1.BalanceEndOfDay,
a2.BalanceEndOfYesterday
FROM AccountTransactions a1
INNER JOIN cte a2
ON a1.Date = a2.Date
ORDER BY
a1.Date DESC;
If you want to do this using only window functions, you can use:
select at.*,
max(case when prev_date = dateadd(day, -1, date) then prev_BalanceEndOfDay end) over (partition by date) as prev_BalanceEndOfDay
from (select at.*,
lag(BalanceEndOfDay) over (order by date) as prev_BalanceEndOfDay,
lag(date) over (order by date) as prev_date
from accounttransactions at
) at;
Note: This interprets "the day before" as being exactly one day before. If it means "the day before in the data", then the comparison should instead be max(case when prev_date <> date . . . ).
Note that in databases that fully support the range window specification, this can be done directly with logic like this:
max(BalanceEndOfDay) over (order by datediff(day, '2000-01-01', date)
range between 1 preceding and 1 preceding
)
Alas, SQL Server does not support this (standard) functionality.
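As a rough illustration of the "exactly one day before" interpretation, the lookup can be sketched in Python: first reduce the data to one end-of-day balance per date, then look up the date minus one day. The names `balance_by_day` and `with_previous_balance` are invented for this sketch, and the default of 0 follows the expected output in the question.

```python
from datetime import date, timedelta

# Sample rows from the question: (Date, Amount, BalanceEndOfDay).
transactions = [
    (date(2020, 4, 30), 10, 130),
    (date(2020, 4, 30), 20, 130),
    (date(2020, 4, 30), 30, 130),
    (date(2020, 4, 30), -10, 130),
    (date(2020, 4, 29), 50, 80),
    (date(2020, 4, 29), -10, 80),
    (date(2020, 4, 29), 40, 80),
]

# One end-of-day balance per date (same role as GROUP BY Date with MAX).
balance_by_day = {}
for day, _amount, balance in transactions:
    balance_by_day[day] = max(balance, balance_by_day.get(day, balance))


def with_previous_balance(transactions, balance_by_day, default=0):
    """Append the balance of the calendar day before, or `default` if that day is missing."""
    out = []
    for day, amount, balance in transactions:
        yesterday = day - timedelta(days=1)
        out.append((day, amount, balance, balance_by_day.get(yesterday, default)))
    return out


enriched = with_previous_balance(transactions, balance_by_day)
for row in enriched:
    print(row)
```

If "the day before" should instead mean the previous date present in the data, the dictionary lookup would have to be replaced by a search for the latest earlier date.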

Getting distinct rows for overlapping timestamp in SQL Server

I have the following result set which I get from SQL Server:
employeeNumber | start_date | start_time | end_date | end_time
---------------+------------+------------+--------------+----------
123 | 10-03-2020 | 18:13:55 | 10-03-2020 | 22:59:46
123 | 10-03-2020 | 18:24:22 | 10-03-2020 | 22:59:51
123 | 10-03-2020 | 23:24:22 | 10-03-2020 | 23:59:51
123 | 11-03-2020 | 18:25:25 | 11-03-2020 | 20:59:51
123 | 12-03-2020 | 18:40:22 | 12-03-2020 | 22:59:52
In some cases I have multiple rows for the same overlapping period (rows 1 and 2 above), with start and end times that differ by seconds or minutes.
My query is a simple SELECT that fetches the data from the source table. What can I add to the WHERE clause to return only distinct rows when timestamps overlap like this? For the data above, I would want the result set to be:
employeeNumber | start_date | start_time | end_date | end_time
---------------+------------+------------+--------------+----------
123 | 10-03-2020 | 18:13:55 | 10-03-2020 | 22:59:46
123 | 10-03-2020 | 23:24:22 | 10-03-2020 | 23:59:51
123 | 11-03-2020 | 18:25:25 | 11-03-2020 | 20:59:51
123 | 12-03-2020 | 18:40:22 | 12-03-2020 | 22:59:52
Below is my query :
select
employeeNumber, start_date, start_time, end_date, end_time
from
emp_data
where
employeeNumber = 123
order by
employeeNumber;
I could probably just fetch the first record of each overlapping set, but what would the WHERE clause be?
Any help is appreciated, as I am not very familiar with SQL Server.
This is complicated. You need to keep track of "starts" and "ends". I am going to assume that your columns are datetimes or something similar that can be combined into a single column:
with e as (
select e.employeeNumber, v.dt, sum(v.inc) as inc,
sum(sum(v.inc)) over (partition by e.employeeNumber order by v.dt) as in_outs
from emp_data e cross apply
(values (start_date + start_time, 1),
(end_date + end_time, -1)
) v(dt, inc)
group by e.employeeNumber, v.dt
)
select employeeNumber, min(dt) as start_datetime, max(dt) as end_datetime
from (select e.*,
sum(case when in_outs = 0 then 1 else 0 end) over (partition by employeeNumber order by dt) as grp
from e
) e
where in_outs <> 0
group by employeeNumber, grp;
What is this doing?
First, the date and time columns are combined into single datetime values.
Then the columns are unpivoted and identified as starts and ends, with +1 or -1 to indicate whether the employee is "entering" or "exiting" at that time.
These increments are accumulated into a running count.
Now you have a gaps-and-islands problem, where you want to find continuous periods of "in"s. The "islands" are identified using a cumulative count of the rows where the running count drops back to 0.
Then these are aggregated.
EDIT:
You can replace the cumulative sum with:
from (select e.*,
(select sum(case when e2.in_outs = 0 then 1 else 0 end)
from e e2
where e2.employeeNumber = e.employeeNumber
and e2.dt <= e.dt
) as grp
from e
) e
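The start/end counting idea described above can be sketched in Python as a sweep over +1/-1 events. This is an illustration under assumptions, not the answer's code: the function name `merge_overlaps` is made up, and the dates in the question are read as day-month-year. Note that this merges overlapping rows into one period (so the first period ends at 22:59:51), rather than keeping only the first row of each overlapping set.

```python
from datetime import datetime

# Sample rows from the question, combined into (start, end) datetimes.
# Assumption: "10-03-2020" means 10 March 2020.
intervals = [
    (datetime(2020, 3, 10, 18, 13, 55), datetime(2020, 3, 10, 22, 59, 46)),
    (datetime(2020, 3, 10, 18, 24, 22), datetime(2020, 3, 10, 22, 59, 51)),
    (datetime(2020, 3, 10, 23, 24, 22), datetime(2020, 3, 10, 23, 59, 51)),
    (datetime(2020, 3, 11, 18, 25, 25), datetime(2020, 3, 11, 20, 59, 51)),
    (datetime(2020, 3, 12, 18, 40, 22), datetime(2020, 3, 12, 22, 59, 52)),
]


def merge_overlaps(intervals):
    """Merge overlapping intervals by tracking how many are open at each moment."""
    events = []
    for start, end in intervals:
        events.append((start, 1))   # employee "enters"
        events.append((end, -1))    # employee "exits"
    # Sort by time; at equal timestamps, handle starts before ends so
    # intervals that merely touch are treated as one period.
    events.sort(key=lambda e: (e[0], -e[1]))

    merged = []
    open_count = 0
    island_start = None
    for moment, delta in events:
        if open_count == 0:
            island_start = moment          # a new island begins
        open_count += delta
        if open_count == 0:
            merged.append((island_start, moment))  # island closes
    return merged


merged = merge_overlaps(intervals)
for start, end in merged:
    print(start, end)
```

For several employees, the same function would be applied once per employeeNumber, which corresponds to the PARTITION BY in the query.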

Get Max And Min dates for consecutive values in T-SQL

I have a log table like the one below and want to simplify it by getting the min start date and max end date for consecutive Status values for each Id. I tried many window function combinations, but had no luck.
This is what I have:
This is what I want to see:
This is a typical gaps-and-islands problem. You want to aggregate groups of consecutive records that have the same Id and Status.
No need for recursion, here is one way to solve it using window functions:
select
Id,
Status,
min(StartDate) StartDate,
max(EndDate) EndDate
from (
select
t.*,
row_number() over(partition by id order by StartDate) rn1,
row_number() over(partition by id, status order by StartDate) rn2
from mytable t
) t
group by
Id,
Status,
rn1 - rn2
order by Id, min(StartDate)
The query works by ranking records over two different partitions (by Id, and by Id and Status). The difference between the ranks gives you the group each record belongs to. You can run the subquery independently to see what it returns and understand the logic.
Result of the query:
Id | Status | StartDate | EndDate
-: | :----- | :------------------ | :------------------
1 | B | 07/02/2019 00:00:00 | 18/02/2019 00:00:00
1 | C | 18/02/2019 00:00:00 | 10/03/2019 00:00:00
1 | B | 10/03/2019 00:00:00 | 01/04/2019 00:00:00
2 | A | 05/02/2019 00:00:00 | 22/04/2019 00:00:00
2 | D | 22/04/2019 00:00:00 | 05/05/2019 00:00:00
2 | A | 05/05/2019 00:00:00 | 30/06/2019 00:00:00
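To see why the difference of the two row numbers identifies a group, here is a small Python sketch of the same bookkeeping. The input rows are hypothetical (the original log table is not shown in the question), and `collapse_islands` is a name invented for this illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical log rows: (Id, Status, StartDate, EndDate).
log = [
    (1, "B", date(2019, 2, 7), date(2019, 2, 10)),
    (1, "B", date(2019, 2, 10), date(2019, 2, 18)),
    (1, "C", date(2019, 2, 18), date(2019, 3, 10)),
    (1, "B", date(2019, 3, 10), date(2019, 4, 1)),
]


def collapse_islands(log):
    """Group consecutive rows with the same Id and Status via rn1 - rn2."""
    rn_by_id = defaultdict(int)         # row_number() partitioned by Id
    rn_by_id_status = defaultdict(int)  # row_number() partitioned by Id, Status
    islands = {}
    for id_, status, start, end in sorted(log, key=lambda r: (r[0], r[2])):
        rn_by_id[id_] += 1
        rn_by_id_status[(id_, status)] += 1
        # The difference stays constant while the status does not change.
        key = (id_, status, rn_by_id[id_] - rn_by_id_status[(id_, status)])
        if key in islands:
            first, last = islands[key]
            islands[key] = (min(first, start), max(last, end))
        else:
            islands[key] = (start, end)
    rows = [(k[0], k[1], v[0], v[1]) for k, v in islands.items()]
    return sorted(rows, key=lambda r: (r[0], r[2]))


collapsed = collapse_islands(log)
for row in collapsed:
    print(row)
```

In this sample the two leading "B" rows share the difference 0, the "C" row gets 2, and the last "B" row gets 1, so the two "B" periods are not merged across the "C" period.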
Try the following query. First, order the data by StartDate and generate a sequence (rid). Then use the recursive CTE to take the first row (rid=1) for each group (id, status), and recursively fetch the next row and compare the start/end dates.
;WITH cte_r(id,[Status],StartDate,EndDate,rid)
AS
(
SELECT id,[Status],StartDate,EndDate, ROW_NUMBER() OVER(PARTITION BY Id,[Status] ORDER BY StartDate) AS rid
FROM log_table
),
cte_range(id,[Status],StartDate,EndDate,rid)
AS
(
SELECT id,[Status],StartDate,EndDate,rid
FROM cte_r
WHERE rid=1
UNION ALL
SELECT p.id, p.[Status], CASE WHEN c.StartDate<p.EndDate THEN p.StartDate ELSE c.StartDate END AS StartDate, c.EndDate,c.rid
FROM cte_range p
INNER JOIN cte_r c
ON p.id=c.id
AND p.[Status]=c.[Status]
AND p.rid+1=c.rid
)
SELECT id,[Status],StartDate,MAX(EndDate) AS EndDate FROM cte_range GROUP BY id,[Status],StartDate ;

Count values checking if consecutive

This is my table:
Event Order Timestamp
delFailed 281475031393706 2018-07-24T15:48:08.000Z
reopen 281475031393706 2018-07-24T15:54:36.000Z
reopen 281475031393706 2018-07-24T15:54:51.000Z
I need to count the 'delFailed' and 'reopen' events to calculate #delFailed - #reopen.
The difficulty is that two identical consecutive events must be counted only once, so in this case the result should be "0", not "-1".
This is what I have so far (it is wrong: it gives -1 instead of 0 because there are two consecutive "reopen" events):
with
events as (
select
event as events,
orders,
"timestamp"
from main_source_execevent
where orders = '281475031393706'
and event in ('reopen', 'delFailed')
order by "timestamp"
),
count_events as (
select
count(events) as CEvents,
events,
orders
from events
group by orders, events
)
select (
(select cevents from count_events where events = 'delFailed') - (select cevents from count_events where events = 'reopen')
) as nAttempts,
orders
from count_events
group by orders
How can I count only once when there are two identical consecutive events?
This is a gaps-and-islands problem. You can use two row numbers to detect rows that belong to the same run of consecutive identical events.
Explanation:
One row number is computed over all rows, ordered by timestamp.
The other row number is computed per Event value (partitioned by the Event column).
SELECT *
FROM (
SELECT *
,ROW_NUMBER() OVER(ORDER BY Timestamp) grp
,ROW_NUMBER() OVER(PARTITION BY Event ORDER BY Timestamp) rn
FROM T
) t1
| event | Order | timestamp | grp | rn |
|-----------|-----------------|----------------------|-----|----|
| delFailed | 281475031393706 | 2018-07-24T15:48:08Z | 1 | 1 |
| reopen | 281475031393706 | 2018-07-24T15:54:36Z | 2 | 1 |
| reopen | 281475031393706 | 2018-07-24T15:54:51Z | 3 | 2 |
With those two row numbers you get the result above. Then use grp - rn to determine whether rows belong to the same consecutive run.
SELECT *,grp-rn
FROM (
SELECT *
,ROW_NUMBER() OVER(ORDER BY Timestamp) grp
,ROW_NUMBER() OVER(PARTITION BY Event ORDER BY Timestamp) rn
FROM T
) t1
| event | Order | timestamp | grp | rn | grp-rn |
|-----------|-----------------|----------------------|-----|----|----------|
| delFailed | 281475031393706 | 2018-07-24T15:48:08Z | 1 | 1 | 0 |
| reopen | 281475031393706 | 2018-07-24T15:54:36Z | 2 | 1 | 1 |
| reopen | 281475031393706 | 2018-07-24T15:54:51Z | 3 | 2 | 1 |
As you can see, consecutive identical events share the same grp-rn value, so we can group by grp-rn (together with Event) and count each group once.
Schema setup and final query:
CREATE TABLE T(
Event VARCHAR(50),
"Order" VARCHAR(50),
Timestamp Timestamp
);
INSERT INTO T VALUES ('delFailed',281475031393706,'2018-07-24T15:48:08.000Z');
INSERT INTO T VALUES ('reopen',281475031393706,'2018-07-24T15:54:36.000Z');
INSERT INTO T VALUES ('reopen',281475031393706,'2018-07-24T15:54:51.000Z');
Query 1:
SELECT
SUM(CASE WHEN event = 'delFailed' THEN 1 END) -
SUM(CASE WHEN event = 'reopen' THEN 1 END) result
FROM (
SELECT Event,COUNT(distinct Event)
FROM (
SELECT *
,ROW_NUMBER() OVER(ORDER BY Timestamp) grp
,ROW_NUMBER() OVER(PARTITION BY Event ORDER BY Timestamp) rn
FROM T
) t1
group by grp - rn,Event
)t1
Results:
| result |
|--------|
| 0 |
I would just use lag() to get the first event in any sequence of similar values. Then do the calculation:
select sum( (event = 'reopen')::int ) as num_reopens,
sum( (event = 'delFailed')::int ) as num_delFailed
from (select mse.*,
lag(event) over (partition by orders order by "timestamp") as prev_event
from main_source_execevent mse
where orders = '281475031393706' and
event in ('reopen', 'delFailed')
) e
where prev_event <> event or prev_event is null;
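The lag()-based filter amounts to counting only the first event of each run of identical values. A minimal Python sketch of that idea, using the three sample rows from the question (the function name `count_run_starts` is made up for this illustration):

```python
# Sample rows from the question: (event, timestamp). ISO-8601 strings in the
# same format sort chronologically, so plain string ordering is enough here.
events = [
    ("delFailed", "2018-07-24T15:48:08.000Z"),
    ("reopen", "2018-07-24T15:54:36.000Z"),
    ("reopen", "2018-07-24T15:54:51.000Z"),
]


def count_run_starts(events):
    """Count each event only when it differs from the previous one."""
    counts = {}
    prev = None
    for name, _ts in sorted(events, key=lambda e: e[1]):
        # Same role as: where prev_event <> event or prev_event is null
        if name != prev:
            counts[name] = counts.get(name, 0) + 1
        prev = name
    return counts


counts = count_run_starts(events)
n_attempts = counts.get("delFailed", 0) - counts.get("reopen", 0)
print(counts, n_attempts)
```

The second "reopen" is skipped because it follows another "reopen", so the difference comes out as 0 rather than -1.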

get calculation result from two columns in sql server

Please refer to this table below.
|RefNbr | DocDate | OrigAmt | AdjAmt | Balances |
|INV001 | 2016-03-15 | 5,000.00 | 250.00 | 4,750.00 |
|INV002 | 2016-03-16 | 5,000.00 | 750.00 | 4,000.00 |
|INV003 | 2016-03-17 | 5,000.00 | 1,000.00 | 3,000.00 |
|INV004 | 2016-03-19 | 5,000.00 | 500.00 | 2,500.00 |
How can I write a query to get the Balances values?
(For the first row, Balances = OrigAmt - AdjAmt. For the second row, Balances = previous Balances (from the first row) - AdjAmt, and so on.)
Here is one way, using a windowed aggregate function:
select OrigAmt - sum(AdjAmt) over(order by DocDate asc) as Balances
From yourtable
For versions older than SQL Server 2012, use this:
SELECT OrigAmt - cum_sum AS Balances
FROM yourtable a
CROSS apply (SELECT Sum(AdjAmt)
FROM yourtable b
WHERE b.DocDate <= a.DocDate) cs( cum_sum)
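Both queries compute OrigAmt minus the running total of AdjAmt in DocDate order. As a quick check of that arithmetic against the table in the question, here is a Python sketch (the variable names are chosen for this illustration only):

```python
from decimal import Decimal
from itertools import accumulate

# Sample rows from the question: (RefNbr, DocDate, OrigAmt, AdjAmt).
invoices = [
    ("INV001", "2016-03-15", Decimal("5000.00"), Decimal("250.00")),
    ("INV002", "2016-03-16", Decimal("5000.00"), Decimal("750.00")),
    ("INV003", "2016-03-17", Decimal("5000.00"), Decimal("1000.00")),
    ("INV004", "2016-03-19", Decimal("5000.00"), Decimal("500.00")),
]

# Order by DocDate, then take the cumulative sum of AdjAmt
# (same role as sum(AdjAmt) over (order by DocDate)).
ordered = sorted(invoices, key=lambda r: r[1])
running_adj = list(accumulate(r[3] for r in ordered))

# Balances = OrigAmt - running total of AdjAmt.
balances = [r[2] - adj for r, adj in zip(ordered, running_adj)]

for row, balance in zip(ordered, balances):
    print(row[0], balance)
```

This reproduces the Balances column of the sample table only because OrigAmt is the same on every row; with varying OrigAmt values the rule from the question would need the first row's OrigAmt as the fixed starting point.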
Try the code below; it may help.
CREATE TABLE #TAB(REFNBR VARCHAR(MAX),DOCDATE DATETIME ,ORIGAMT DECIMAL(18,2),ADJAMT DECIMAL(18,2))
INSERT INTO #TAB VALUES ('INV001','2016-03-15',5000.00,250.00),('INV002','2016-03-16',5000.00,750.00),
('INV003','2016-03-17',5000.00,1000.00),('INV004','2016-03-19',5000.00,500.00)
;WITH CTE AS (
SELECT REFNBR,
DOCDATE,
ORIGAMT,
ADJAMT,
ORIGAMT-ADJAMT AS BALANCE,
ROW_NUMBER() OVER ( ORDER BY DOCDATE) AS RN
FROM #TAB)
SELECT a.REFNBR,
a.DOCDATE,
a.ORIGAMT,
a.ADJAMT,
CASE WHEN ISNULL(LAG(a.BALANCE + ISNULL(x.ADDS,0)) OVER (ORDER BY a.RN),0) + a.ORIGAMT - a.ADJAMT < 0
THEN 0
ELSE a.BALANCE + ISNULL(x.ADDS,0)
END AS FINAL_BALANCE
FROM CTE a
CROSS APPLY (SELECT SUM(BALANCE) AS ADDS
FROM CTE f
WHERE f.REFNBR = a.REFNBR AND f.RN < a.RN
) x
The code above is for SQL Server 2014. For earlier versions, try the code below:
SELECT REFNBR,
DocDate,
OrigAmt,
AdjAmt,
CASE
WHEN RNO > 1 THEN Sum(OrigAmt - ADJAMT)
OVER(
PARTITION BY REFNBR
ORDER BY RNO)
ELSE Iif(( OrigAmt - ADJAMT ) < 0, 0, OrigAmt - ADJAMT)
END
FROM (SELECT *,
Row_number()
OVER(
PARTITION BY REFNBR
ORDER BY DocDate) AS RNO
FROM #TAB) A