Use PRECEDING in calculation - SQL

I need to calculate column E using columns B, C, D and the previous row of E. I have the sample statement and calculation for reference. Note that prev(E) is the preceding value of E, which I need to use in the calculation but am unable to.
+---------------------------------------------------------------------------------------------------------------------------------------+
| TransactionDt | total_allotment(B) | invchange(C) | roomssold_flag(D) | available(E) | samplestatement | calculation |
+---------------------------------------------------------------------------------------------------------------------------------------+
| 1/1/16 | 5 | 0 | null | 5 | E=case when D=null then B | 5 |
| 1/2/16 | 5 | 0 | 1 | 4 | E=case when C=0 then prev(E)-D | E=(5-1) |
| 1/3/16 | 5 | 0 | 0 | 4 | E=case when C=0 then prev(E)-D | E=(4-0) |
| 1/4/16 | 6 | 1 | 1 | 5 | E=case when C=1 then B-D | E=(6-1) |
| 1/5/16 | 6 | 0 | 0 | 5 | E=case when C=0 then prev(E)-D | E=(5-0) |
| 1/6/16 | 7 | 1 | 1 | 6 | E=case when C=1 then B-D | E=(7-1) |
+---------------------------------------------------------------------------------------------------------------------------------------+

You can use the first_value() function with a preceding frame clause to get the previous value:
set dateformat dmy;
create table #t (TransactionDt smalldatetime, b int, c int, d int, e int);
insert into #t (TransactionDt, b, c, d, e) values
(cast('01.01.2016' as date), 5, 0, null, 5),
(cast('02.01.2016' as date), 5, 0, 1, 4),
(cast('03.01.2016' as date), 5, 0, 0, 4),
(cast('04.01.2016' as date), 6, 1, 1, 5),
(cast('05.01.2016' as date), 6, 0, 0, 5),
(cast('06.01.2016' as date), 7, 1, 1, 6);
select
t.*
,first_value(t.e) over(order by t.TransactionDt asc rows 1 preceding) [prevE]
,case t.c
when 0 then
first_value(t.e)
over(order by t.TransactionDt asc rows 1 preceding)
- t.d
when 1 then
t.b - t.d
end [calculation]
from
#t t
order by
t.TransactionDt
;
Tested on MS SQL 2012.
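As a cross-check of the prev(E) logic, here is a minimal sketch using Python's sqlite3 module (an assumption — any engine with window functions works the same way); it uses LAG(), which is a more direct way to fetch the previous row's value than first_value() over a one-row frame:

```python
import sqlite3

# Sketch of the prev(E) lookup with LAG() (needs SQLite >= 3.25);
# the CASE mirrors the sample statements from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
create table t (dt text, b int, c int, d int, e int);
insert into t values
  ('2016-01-01', 5, 0, null, 5),
  ('2016-01-02', 5, 0, 1,    4),
  ('2016-01-03', 5, 0, 0,    4),
  ('2016-01-04', 6, 1, 1,    5),
  ('2016-01-05', 6, 0, 0,    5),
  ('2016-01-06', 7, 1, 1,    6);
""")
rows = conn.execute("""
select dt,
       case c
         when 0 then lag(e) over (order by dt) - d   -- prev(E) - D
         when 1 then b - d                           -- B - D
       end as calculation
from t
order by dt
""").fetchall()
# The first row has no previous E (and D is null), so its calculation is NULL.
```

The computed column reproduces E for every row after the first, matching the question's table.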
I'm not a big fan of Teradata, but this should work:
select
t.e
,sum(t.e)
over(order by t.TransactionDt asc rows between 1 preceding and 1 preceding) ePrev
,case t.c
when 0 then
sum(t.e)
over(order by t.TransactionDt asc rows between 1 preceding and 1 preceding)
- t.d
when 1 then
t.b - t.d
end calculation
from
(
select cast('01.01.2016' as date format 'dd.mm.yyyy') TransactionDt, 5 b, 0 c, null d, 5 e from (select 1 x) x
union all
select cast('02.01.2016' as date format 'dd.mm.yyyy') TransactionDt, 5 b, 0 c, 1 d, 4 e from (select 1 x) x
union all
select cast('03.01.2016' as date format 'dd.mm.yyyy'), 5, 0, 0, 4 from (select 1 x) x
union all
select cast('04.01.2016' as date format 'dd.mm.yyyy'), 6, 1, 1, 5 from (select 1 x) x
union all
select cast('05.01.2016' as date format 'dd.mm.yyyy'), 6, 0, 0, 5 from (select 1 x) x
union all
select cast('06.01.2016' as date format 'dd.mm.yyyy'), 7, 1, 1, 6 from (select 1 x) x
) t
order by
t.TransactionDt
;

When you need to restart the calculation whenever invchange=1 you have to create a group for partitioning using
sum(invchange)
over (order by TransactionDt
rows unbounded preceding) as grp
invchange seems to be based on a previous-row query, so you need to nest its calculation in a Derived Table.
Then it's simply the total_allotment value minus a cumulative sum over roomssold_flag:
select t.*,
b - sum(coalesce(D,0))
over (partition by grp
order by TransactionDt
rows unbounded preceding)
from
(
select TransactionDt,b,c,d,
sum(c) over (order by TransactionDt rows unbounded preceding) as grp
from t
) as t
Btw, using a 0/1 flag to get dynamic partitioning is similar to RESET WHEN
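A runnable sketch of this grouping approach, using sqlite3 as a stand-in engine (an assumption — the Teradata version differs only in syntax): a running SUM over the 0/1 flag builds the group key, then available is total_allotment minus a cumulative sum of roomssold_flag within the group.

```python
import sqlite3

# Running sum of the 0/1 flag c builds grp; the outer query then does a
# cumulative sum of d within each grp, subtracted from b.
conn = sqlite3.connect(":memory:")
conn.executescript("""
create table t (dt text, b int, c int, d int);
insert into t values
  ('2016-01-01', 5, 0, null),
  ('2016-01-02', 5, 0, 1),
  ('2016-01-03', 5, 0, 0),
  ('2016-01-04', 6, 1, 1),
  ('2016-01-05', 6, 0, 0),
  ('2016-01-06', 7, 1, 1);
""")
rows = conn.execute("""
select dt,
       b - sum(coalesce(d, 0)) over (partition by grp order by dt
                                     rows unbounded preceding) as available
from (
  select dt, b, c, d,
         sum(c) over (order by dt rows unbounded preceding) as grp
  from t
)
order by dt
""").fetchall()
```

available comes out as 5, 4, 4, 5, 5, 6 — the E column from the question.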


Get Count for Each Column values

Input
Create Table #t1 (CaseId Int, NewValue char(2),Attribute char(2),TimeStamp datetime)
insert into #t1 values
(1, 'A', 'X' , '2020-01-01 13:01'),
(1, 'Au', 'WB' , '2020-01-01 13:02'),
(1 , 'C' , 'P' , '2020-01-01 13:03'),
(1 , 'Ma', 'WB' , '2020-01-01 13:04'),
(1 , 'C' , 'D', '2020-01-01 13:05'),
(1, 'D' , 'E', '2020-01-01 13:04'),
(2 , 'M' , 'P' , '2020-05-01 15:20'),
(2 , 'X' , 'WB' , '2020-05-01 15:26'),
(2 , 'Y' , 'WB', '2020-05-01 15:29'),
(2 , 'X' , 'P' , '2020-05-01 15:31')
I need output like the following.
CaseId NewValue Attribute TimeStamp NewColumn Count
1 A X 01:00.0 NULL 0
1 Au WB 02:00.0 Au-WB 2
1 C P 03:00.0 Au-WB 2
1 Ma WB 04:00.0 Ma-WB 3
1 C D 05:00.0 Ma-WB 3
1 D E 04:00.0 Ma-WB 3
2 M P 20:00.0 NULL 0
2 X WB 26:00.0 X -WB 1
2 Y WB 29:00.0 Y -WB 2
2 X P 31:00.0 Y -WB 2
Squirrel helped to get everything minus count. The query is as follows. Does anyone know how to get that count?
select *, wb.NewColumn
from #t1 t
outer apply
(
select top 1 x.NewValue + '-' + x.Attribute as NewColumn
from #t1 x
where x.CaseId = t.CaseId
and x.TimeStamp <= t.TimeStamp
and x.Attribute = 'WB'
order by x.TimeStamp desc
) wb
This looks like a gaps-and-islands problem, where a new island starts every time a record with Attribute 'WB' is encountered.
If so, here is one way to solve it using window functions:
select
caseId,
newValue,
attribute,
timeStamp,
case when grp > 0
then first_value(newValue) over(partition by caseId, grp order by timeStamp)
+ '-'
+ first_value(attribute) over(partition by caseId, grp order by timeStamp)
end newValue,
case when grp > 0
then count(*) over(partition by caseId, grp)
else 0
end cnt
from (
select
t.*,
sum(case when attribute = 'WB' then 1 else 0 end)
over(partition by caseId order by timeStamp) grp
from #t1 t
) t
order by caseId, timeStamp
The inner query does a window sum() to define the groups: every time attribute 'WB' is met for a given caseId, a new group starts. Then the outer query uses first_value() to recover the first value in the group, and performs a window count() to compute the number of records per group. This is wrapped in conditional logic so the additional columns are not filled before the first 'WB' attribute is met.
Demo on DB Fiddle:
caseId | newValue | attribute | timeStamp | newValue | cnt
-----: | :------- | :-------- | :---------------------- | :------- | --:
1 | A | X | 2020-01-01 13:01:00.000 | null | 0
1 | Au | WB | 2020-01-01 13:02:00.000 | Au-WB | 2
1 | C | P | 2020-01-01 13:03:00.000 | Au-WB | 2
1 | Ma | WB | 2020-01-01 13:04:00.000 | Ma-WB | 3
1 | D | E | 2020-01-01 13:04:00.000 | Ma-WB | 3
1 | C | D | 2020-01-01 13:05:00.000 | Ma-WB | 3
2 | M | P | 2020-05-01 15:20:00.000 | null | 0
2 | X | WB | 2020-05-01 15:26:00.000 | X -WB | 1
2 | Y | WB | 2020-05-01 15:29:00.000 | Y -WB | 2
2 | X | P | 2020-05-01 15:31:00.000 | Y -WB | 2
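For a quick experiment with the island logic, here is a sqlite3 sketch (an assumption — the window-function syntax is the same in SQL Server). Only caseId 1 is loaded, and the two 13:04 timestamps are separated so the ordering stays deterministic:

```python
import sqlite3

# grp = running count of 'WB' rows; first_value()/count(*) per grp rebuild
# the island label and its size.
conn = sqlite3.connect(":memory:")
conn.executescript("""
create table t1 (caseId int, newValue text, attribute text, ts text);
insert into t1 values
  (1, 'A',  'X',  '13:01'),
  (1, 'Au', 'WB', '13:02'),
  (1, 'C',  'P',  '13:03'),
  (1, 'Ma', 'WB', '13:04'),
  (1, 'C',  'D',  '13:05'),
  (1, 'D',  'E',  '13:06');
""")
rows = conn.execute("""
select ts, newValue,
       case when grp > 0 then
         first_value(newValue)  over (partition by grp order by ts)
         || '-' ||
         first_value(attribute) over (partition by grp order by ts)
       end as island,
       case when grp > 0
            then count(*) over (partition by grp) else 0 end as cnt
from (
  select t.*,
         sum(case when attribute = 'WB' then 1 else 0 end)
           over (order by ts) as grp
  from t1 t
)
order by ts
""").fetchall()
```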
Using your query output, create a CTE and perform the count using a window function partitioned on caseid, newcolumn, as follows:
with data
as (
select *, wb.NewColumn
from #t1 t
outer apply
(
select top 1 x.NewValue + '-' + x.Attribute as NewColumn
from #t1 x
where x.CaseId = t.CaseId
and x.TimeStamp <= t.TimeStamp
and x.Attribute = 'WB'
order by x.TimeStamp desc
) wb
)
select *,count(*) over(partition by caseid,newcolumn) as cnt
from data
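A sqlite3 sketch of the same shape (an assumption: SQLite has no OUTER APPLY, so a correlated subquery with LIMIT 1 stands in for it). Note count(newColumn) rather than count(*), so rows before the first 'WB' report 0:

```python
import sqlite3

# Correlated subquery = latest 'WB' row at or before each ts; the window
# COUNT is partitioned on (caseId, newColumn).
conn = sqlite3.connect(":memory:")
conn.executescript("""
create table t1 (caseId int, newValue text, attribute text, ts text);
insert into t1 values
  (1, 'A',  'X',  '13:01'),
  (1, 'Au', 'WB', '13:02'),
  (1, 'C',  'P',  '13:03');
""")
rows = conn.execute("""
with data as (
  select t.*,
         (select x.newValue || '-' || x.attribute
          from t1 x
          where x.caseId = t.caseId
            and x.ts <= t.ts
            and x.attribute = 'WB'
          order by x.ts desc
          limit 1) as newColumn
  from t1 t
)
select ts, newColumn,
       count(newColumn) over (partition by caseId, newColumn) as cnt
from data
order by ts
""").fetchall()
```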

How to repeat values in a table in SQL Server?

I have a table in Microsoft SQL Server that logs some values via data-change triggers. Now, in order to display some graphs, I would like to get (or repeat) a value per 10 minutes from each column (for example).
I would try to avoid, if possible, an INSERT command modifying the table itself.
Original table:
Time Stamp---- | A | B | C |
---------------+---+---+---+
01-01-19 10:20 | 1 | 0 | 0 |
01-01-19 15:30 | 0 | 0 | 1 |
01-01-19 22:50 | 0 | 1 | 0 |
02-01-19 01:40 | 1 | 0 | 0 |
...
Result I would like to achieve:
Time Stamp---- | A | B | C |
---------------+---+---+---+
01-01-19 10:20 | 1 | 0 | 0 |
01-01-19 10:30 | 1 | 0 | 0 |
01-01-19 10:40 | 1 | 0 | 0 |
01-01-19 10:50 | 1 | 0 | 0 |
...
01-01-19 15:30 | 0 | 0 | 1 |
01-01-19 15:40 | 0 | 0 | 1 |
01-01-19 15:50 | 0 | 0 | 1 |
01-01-19 16:00 | 0 | 0 | 1 |
...
Assuming your dates are mm-dd-yy and times are hh:mm...
create table #Original (
[Time Stamp----] datetime2,
A int,
B int,
C int
)
insert #Original
values ({ts '2019-01-01 10:20:00.000'}, 1, 0, 0)
, ({ts '2019-01-01 15:30:00.000'}, 0, 0, 1)
, ({ts '2019-01-01 22:50:00.000'}, 0, 1, 0)
, ({ts '2019-01-02 01:40:00.000'}, 1, 0, 0)
;
with
boundaries as (
select min(o.[Time Stamp----]) as s
, dateadd(minute, 10, max(o.[Time Stamp----])) as e
from #Original o
),
timeslist as (
select 1 as i
, (select s from boundaries) as s
, (select s from boundaries) as d
union all
select t.i + 1
, t.s
, dateadd(minute, 10, d)
from timeslist t
where d < (select e from boundaries)
),
result as (
select
right('0' + cast(MONTH(t.d) as varchar(2)), 2) + '-' +
right('0' + cast(DAY(t.d) as varchar(2)), 2) + '-' +
right('0' + cast(year(t.d) % 100 as varchar(2)), 2) + ' ' +
right('0' + cast(datepart(hour, t.d) as varchar(2)), 2) + ':' +
right('0' + cast(datepart(minute, t.d) as varchar(2)), 2) as 'Time Stamp----'
, o2.A
, o2.B
, o2.C
from timeslist t
inner join (
select o.[Time Stamp----]
, o.A
, o.B
, o.C
, lead (o.[Time Stamp----], 1, dateadd(minute, 10, o.[Time Stamp----])) over (order by o.[Time Stamp----]) as OldTs
from #Original o
) o2 on o2.[Time Stamp----] <= t.d and o2.OldTs > t.d
)
select *
from result
order by [Time Stamp----]
drop table #Original
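The core of the answer — generate 10-minute slots, then join each slot to the source row whose validity interval (its own timestamp up to the next row's, via LEAD) contains it — can be sketched compactly with sqlite3 (an assumption; two sample rows for brevity):

```python
import sqlite3

# Recursive CTE generates the slot list; LEAD() bounds each row's interval.
conn = sqlite3.connect(":memory:")
conn.executescript("""
create table o (ts text, a int, b int, c int);
insert into o values
  ('2019-01-01 10:20:00', 1, 0, 0),
  ('2019-01-01 10:50:00', 0, 0, 1);
""")
rows = conn.execute("""
with recursive bounds as (
  select min(ts) s, datetime(max(ts), '+10 minutes') e from o
),
slots(t) as (
  select s from bounds
  union all
  select datetime(t, '+10 minutes') from slots
  where t < (select e from bounds)
),
iv as (
  select ts, a, b, c,
         lead(ts, 1, datetime(ts, '+10 minutes')) over (order by ts) as nxt
  from o
)
select slots.t, iv.a, iv.b, iv.c
from slots
join iv on iv.ts <= slots.t and slots.t < iv.nxt
order by slots.t
""").fetchall()
```

The first row is repeated for 10:20, 10:30 and 10:40; the second row takes over at 10:50.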
To select records with manufactured duplicates, try
SELECT Dateadd(mi, DQ.T,TimeStamp) as 'TimeStamp', A, B, C From YourTable
CROSS JOIN (Select 0 T UNION ALL
Select 10 T UNION ALL
Select 20 T UNION ALL
Select 30 T) DQ
or to insert duplicates, try
INSERT YourTable
SELECT Dateadd(mi, DQ.T,TimeStamp) as 'TimeStamp', A, B, C From YourTable
CROSS JOIN (
Select 10 T UNION ALL
Select 20 T UNION ALL
Select 30 T) DQ
Personally I recommend making a "Time Table", but I do this on the fly here using a Tally. Anyway, I think this is what you're after?
USE Sandbox;
GO
CREATE TABLE dbo.YourTable ([timestamp] datetime2(0), --This is a bad name for a column, as timestamp means something else in SQL Server
A bit,
B bit,
C bit);
INSERT INTO dbo.YourTable ([timestamp],
A,
B,
C)
VALUES ('2019-01-01T10:20:00',1,0,0),
('2019-01-01T15:30:00',0,0,1),
('2019-01-01T22:50:00',0,1,0),
('2019-01-02T01:40:00',1,0,0);
GO
WITH N AS
(SELECT N
FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL))N(N)),
Tally AS(
SELECT TOP(144) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) -1 AS I
FROM N N1, N N2, N N3),
Times AS(
SELECT DATEADD(MINUTE,T.I * 10,CONVERT(time(0),'00:00:00')) AS TimeSlot
FROM Tally T),
DateTimes AS(
SELECT DISTINCT
CONVERT(datetime,CONVERT(date,YT.[timestamp])) + CONVERT(datetime,T.TimeSlot) AS DateTimeSlot
FROM dbo.YourTable YT
CROSS JOIN Times T),
Groups AS(
SELECT DT.DateTimeSlot,
CONVERT(tinyint,YT.A) AS A, --Can't aggregate Bits
CONVERT(tinyint,YT.B) AS B,
CONVERT(tinyint,YT.C) AS C,
COUNT(YT.A) OVER (ORDER BY DT.DateTimeSlot ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Grp
FROM DateTimes DT
LEFT JOIN dbo.YourTable YT ON DT.DateTimeSlot = YT.[timestamp])
SELECT G.DateTimeSlot,
MAX(G.A) OVER (PARTITION BY G.Grp) AS A,
MAX(G.B) OVER (PARTITION BY G.Grp) AS B,
MAX(G.C) OVER (PARTITION BY G.Grp) AS C
FROM Groups G
ORDER BY G.DateTimeSlot;
GO
DROP TABLE dbo.YourTable;
You can use SQL recursion and a CROSS JOIN.
SQL Fiddle demo:
create table #mytable (timestamp datetime, A int, B int, C int)
insert into #mytable values
('01-01-19 10:20',1,0,0),('01-01-19 15:30',0,0,1),
('01-01-19 22:50',0,1,0),('02-01-19 01:40',1,0,0)
;with cte as(
select 0 n
union all
select n+10 from cte where n+10 <40)
select dateadd(mi,n,timestamp)[TIMESTAMP],t1.A,t1.B,T1.C
from #mytable t1 cross join cte
order by timestamp
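Both the CROSS JOIN and the recursive-CTE versions simply replay each row at fixed offsets (+0, +10, +20, +30 minutes) without looking at the next record; a sqlite3 sketch of that behaviour (an assumption, one sample row):

```python
import sqlite3

# cte yields offsets 0, 10, 20, 30; CROSS JOIN repeats every source row once
# per offset. Note this does NOT stop at the next record's timestamp.
conn = sqlite3.connect(":memory:")
conn.execute("create table m (ts text, a int)")
conn.execute("insert into m values ('2019-01-01 10:20:00', 1)")
rows = conn.execute("""
with recursive cte(n) as (
  select 0
  union all
  select n + 10 from cte where n + 10 < 40
)
select datetime(ts, '+' || n || ' minutes') as slot, a
from m cross join cte
order by slot
""").fetchall()
```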

Do calculations in bigquery within row and median - is this possible?

My problem: I get a raw set of sensor data that needs some processing before I can use it. Loading the data to the client and doing the processing there is pretty slow, so I am looking for the possibility to offload this logic to BigQuery.
Imagine I have some constants for a set of sensors. They can change, but I have them when I want to run the query:
A: 1, B: 2, C: 3, D: 2, E: 1, F: 2
Sensors are connected; I know which sensors are connected to each other. This matters below.
A: BC
D: EF
This is a table with measurements per timestamp per sensor. Imagine thousands of rows.
TS A | B | C | D | E | F
01 10 | 20 | 20 | 10 | 15 | 10
02 11 | 10 | 20 | 20 | 10 | 10
03 12 | 20 | 10 | 10 | 12 | 11
04 13 | 10 | 10 | 20 | 15 | 15
05 11 | 20 | 10 | 15 | 14 | 14
06 10 | 20 | 10 | 10 | 15 | 12
I want to query ts 01 to ts 06 (in real it can be 1000's of rows again). I don't want it to return this raw data, but have it do some calculations:
First, for each row, I need to subtract the constants, so row 01 would look like:
01 9 | 18 | 17 | 8 | 14 | 8
Then, B and C need to have A subtracted, and E and F need to have D subtracted:
01 9 | 9 | 8 | 8 | 6 | 0
Last step: when I have all rows, I want to return rows where each sensor has the median value of the preceding X rows for that sensor. So
TS A | B |
01 10 | 1 |
02 11 | 2 |
03 12 | 2 |
04 13 | 1 |
05 11 | 2 |
06 10 | 3 |
07 10 | 4 |
08 11 | 2 |
09 12 | 2 |
10 13 | 10 |
11 11 | 20 |
12 10 | 20 |
returns (for X is 4)
TS A | B |
//first 3 needed for median for 4th value
04 11.5 | etc | //median 10, 11, 12, 13
05 11.5 | etc | //median 11, 12, 13, 11
06 11.5 | etc | //median 12, 13, 11, 10
07 etc | etc |
08 etc | etc |
09 etc | etc |
10 etc | etc |
11 etc | etc |
12 etc | etc |
Getting the data to my server and doing the calculation there is very slow. I am really wondering if I can run these amounts of data through BigQuery, so I can get a quickly calculated set with my settings of choice!
I can do this in Node.js... but in BigQuery SQL I am lost.
Below is for BigQuery Standard SQL
If you were looking for AVG values, this would be as "simple" as below
#standardSQL
WITH constants AS (
SELECT 1 val_a, 2 val_b, 3 val_c, 2 val_d, 1 val_e, 2 val_f
), temp AS (
SELECT ts,
a - val_a AS a,
b - val_b - a + val_a AS b,
c - val_c - a + val_a AS c,
d - val_d AS d,
e - val_e - d + val_d AS e,
f - val_f - d + val_d AS f
FROM `project.dataset.measurements`, constants
)
SELECT ts,
AVG(a) OVER(ORDER BY ts ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING) a,
AVG(b) OVER(ORDER BY ts ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING) b,
AVG(c) OVER(ORDER BY ts ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING) c,
AVG(d) OVER(ORDER BY ts ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING) d,
AVG(e) OVER(ORDER BY ts ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING) e,
AVG(f) OVER(ORDER BY ts ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING) f
FROM temp
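The moving average over a `ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING` frame is standard SQL, so it can be tried locally; a single-column sqlite3 sketch (an assumption — raw a values, without the constant subtraction):

```python
import sqlite3

# AVG over the 4 rows preceding the current one; the frame is empty on the
# first row, so the result there is NULL.
conn = sqlite3.connect(":memory:")
conn.execute("create table m (ts int, a int)")
conn.executemany("insert into m values (?, ?)",
                 [(1, 10), (2, 11), (3, 12), (4, 13), (5, 11), (6, 10)])
rows = conn.execute("""
select ts,
       avg(a) over (order by ts
                    rows between 4 preceding and 1 preceding) as a_avg
from m
order by ts
""").fetchall()
```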
For MEDIAN you need to add a little extra - like in the example below
#standardSQL
WITH constants AS (
SELECT 1 val_a, 2 val_b, 3 val_c, 2 val_d, 1 val_e, 2 val_f
), temp AS (
SELECT ts,
a - val_a AS a,
b - val_b - a + val_a AS b,
c - val_c - a + val_a AS c,
d - val_d AS d,
e - val_e - d + val_d AS e,
f - val_f - d + val_d AS f
FROM `project.dataset.measurements`, constants
)
SELECT ts,
(SELECT PERCENTILE_CONT(a, 0.5) OVER() FROM UNNEST(a) a LIMIT 1) a,
(SELECT PERCENTILE_CONT(b, 0.5) OVER() FROM UNNEST(b) b LIMIT 1) b,
(SELECT PERCENTILE_CONT(c, 0.5) OVER() FROM UNNEST(c) c LIMIT 1) c,
(SELECT PERCENTILE_CONT(d, 0.5) OVER() FROM UNNEST(d) d LIMIT 1) d,
(SELECT PERCENTILE_CONT(e, 0.5) OVER() FROM UNNEST(e) e LIMIT 1) e,
(SELECT PERCENTILE_CONT(f, 0.5) OVER() FROM UNNEST(f) f LIMIT 1) f
FROM (
SELECT ts,
ARRAY_AGG(a) OVER(ORDER BY ts ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING) a,
ARRAY_AGG(b) OVER(ORDER BY ts ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING) b,
ARRAY_AGG(c) OVER(ORDER BY ts ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING) c,
ARRAY_AGG(d) OVER(ORDER BY ts ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING) d,
ARRAY_AGG(e) OVER(ORDER BY ts ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING) e,
ARRAY_AGG(f) OVER(ORDER BY ts ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING) f
FROM temp
)
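SQLite has no PERCENTILE_CONT, so as a sanity check the "median of the preceding 4 rows" frame can be reproduced in plain Python with statistics.median:

```python
from statistics import median

def rolling_median(values, width=4):
    """Median of the `width` rows preceding each row (current row excluded),
    mirroring ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING; None when the frame
    is empty."""
    out = []
    for i in range(len(values)):
        frame = values[max(0, i - width):i]
        out.append(median(frame) if frame else None)
    return out
```

rolling_median([10, 11, 12, 13, 11, 10]) yields [None, 10, 10.5, 11, 11.5, 11.5], the same frame the ARRAY_AGG query computes.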
You can test and play with the above using the sample data from your question, as in the example below
#standardSQL
WITH `project.dataset.measurements` AS (
SELECT 01 ts, 10 a, 20 b, 20 c, 10 d, 15 e, 10 f UNION ALL
SELECT 02, 11, 10, 20, 20, 10, 10 UNION ALL
SELECT 03, 12, 20, 10, 10, 12, 11 UNION ALL
SELECT 04, 13, 10, 10, 20, 15, 15 UNION ALL
SELECT 05, 11, 20, 10, 15, 14, 14 UNION ALL
SELECT 06, 10, 20, 10, 10, 15, 12
), constants AS (
SELECT 1 val_a, 2 val_b, 3 val_c, 2 val_d, 1 val_e, 2 val_f
), temp AS (
SELECT ts,
a - val_a AS a,
b - val_b - a + val_a AS b,
c - val_c - a + val_a AS c,
d - val_d AS d,
e - val_e - d + val_d AS e,
f - val_f - d + val_d AS f
FROM `project.dataset.measurements`, constants
)
SELECT ts,
(SELECT PERCENTILE_CONT(a, 0.5) OVER() FROM UNNEST(a) a LIMIT 1) a,
(SELECT PERCENTILE_CONT(b, 0.5) OVER() FROM UNNEST(b) b LIMIT 1) b,
(SELECT PERCENTILE_CONT(c, 0.5) OVER() FROM UNNEST(c) c LIMIT 1) c,
(SELECT PERCENTILE_CONT(d, 0.5) OVER() FROM UNNEST(d) d LIMIT 1) d,
(SELECT PERCENTILE_CONT(e, 0.5) OVER() FROM UNNEST(e) e LIMIT 1) e,
(SELECT PERCENTILE_CONT(f, 0.5) OVER() FROM UNNEST(f) f LIMIT 1) f
FROM (
SELECT ts,
ARRAY_AGG(a) OVER(ORDER BY ts ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING) a,
ARRAY_AGG(b) OVER(ORDER BY ts ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING) b,
ARRAY_AGG(c) OVER(ORDER BY ts ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING) c,
ARRAY_AGG(d) OVER(ORDER BY ts ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING) d,
ARRAY_AGG(e) OVER(ORDER BY ts ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING) e,
ARRAY_AGG(f) OVER(ORDER BY ts ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING) f
FROM temp
)
-- ORDER BY ts
with this result:
Row ts a b c d e f
1 1 null null null null null null
2 2 9.0 9.0 8.0 8.0 6.0 0.0
3 3 9.5 3.5 7.5 13.0 -1.5 -5.0
4 4 10.0 7.0 7.0 8.0 3.0 0.0
5 5 10.5 2.5 1.5 13.0 -0.5 -2.5
6 6 10.5 2.5 -3.5 15.5 -2.0 -3.0

Cumulative count over dates, discarding some values

I have an Answers table with the following schema:
CREATE TABLE Answers
([id] int, [analyst_id] int, [date] date);
I have to "accumulate-count" how many answers an analyst has per month, discarding any answer given less than 3 months after the last counted answer. Given the following:
INSERT INTO Answers
([id], [analyst_id], [date])
VALUES
(1, 1, '2017/01/01'),
(2, 1, '2017/02/01'), -- should be discarded
(3, 1, '2017/03/01'), -- should be discarded
(4, 1, '2017/05/01'),
(5, 1, '2017/06/01'), -- should be discarded
(6, 1, '2017/07/01'), -- should be discarded
(7, 1, '2017/08/01'),
(8, 2, '2017/01/01'),
(9, 2, '2017/04/01'),
(10, 1, '2018/02/01'),
(11, 2, '2018/03/01');
The expected result is:
analyst_id | month-year | count
-------------------------------
1 | 01/2017 | 1
1 | 02/2017 | 1
1 | 03/2017 | 1
1 | 04/2017 | 1
1 | 05/2017 | 2
1 | 06/2017 | 2
1 | 07/2017 | 2
1 | 08/2017 | 3
1 | 09/2017 | 3
1 | 10/2017 | 3
1 | 11/2017 | 3
1 | 12/2017 | 3
2 | 01/2017 | 1
2 | 02/2017 | 1
2 | 03/2017 | 1
2 | 04/2017 | 2
2 | 05/2017 | 2
2 | 06/2017 | 2
2 | 07/2017 | 2
2 | 08/2017 | 2
2 | 09/2017 | 2
2 | 10/2017 | 2
2 | 11/2017 | 2
2 | 12/2017 | 2
1 | 01/2018 | 0
1 | 02/2018 | 1
1 | 03/2018 | 1
2 | 01/2018 | 0
2 | 02/2018 | 0
2 | 03/2018 | 1
DBMS is a SQL Server 2012.
EDIT
I wrote this fiddle with my current half-solution: http://sqlfiddle.com/#!6/c2e82e/5
Each year, the count needs to be reset.
EDIT:
OK, for the updated question: you essentially need to make a "dates" table (here a CTE called "D") that contains all the dates between the minimum and maximum date in your Answers table. Then you can left join your results to that and use a DENSE_RANK() window function to determine the count.
CREATE TABLE #Answers (ID INT, Analyst_ID INT, [Date] DATE);
INSERT #Answers (ID, Analyst_ID, [Date])
VALUES
(1, 1, '2017/01/01'),
(2, 1, '2017/02/01'),
(3, 1, '2017/03/01'),
(4, 1, '2017/05/01'),
(5, 1, '2017/06/01'),
(6, 1, '2017/07/01'),
(7, 1, '2017/08/01'),
(8, 2, '2017/01/01'),
(9, 2, '2017/04/01'),
(10, 1, '2018/02/01'),
(11, 2, '2018/03/01');
WITH CTE AS
(
SELECT A.Analyst_ID, [Date] = MIN(A.[Date])
FROM #Answers AS A
GROUP BY A.Analyst_ID
UNION ALL
SELECT A.Analyst_ID, A.[Date]
FROM
(
SELECT A.Analyst_ID, A.[Date], RN = ROW_NUMBER() OVER (PARTITION BY A.Analyst_ID ORDER BY A.ID)
FROM #Answers AS A
JOIN CTE
ON CTE.Analyst_ID = A.Analyst_ID
AND DATEADD(MONTH, 3, CTE.[Date]) <= A.[Date]
) AS A
WHERE A.RN = 1
),
D AS -- List of dates between minimum and maximum date in table for each analyst ID.
(
SELECT [Date] = DATEADD(MONTH, RN, (SELECT MIN([Date]) FROM #Answers)),
A.Analyst_ID
FROM (SELECT RN = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1 FROM sys.objects) AS O
CROSS JOIN (SELECT DISTINCT Analyst_ID FROM #Answers) AS A
WHERE RN <= (SELECT DATEDIFF(MONTH, MIN([Date]), MAX([Date])) FROM #Answers)
)
SELECT D.Analyst_ID,
[Month-Year] = FORMAT(D.[Date], 'MM/yyyy'),
[Count] = CASE WHEN A.[Date] IS NULL THEN 0 ELSE DENSE_RANK() OVER (PARTITION BY D.Analyst_ID, DATEPART(YEAR, A.[Date]) ORDER BY A.[Date]) END
FROM D
OUTER APPLY (SELECT TOP 1 * FROM CTE WHERE CTE.[Date] <= D.[Date] AND DATEDIFF(YEAR, CTE.[Date], D.[Date]) = 0 AND CTE.Analyst_ID = D.Analyst_ID ORDER BY CTE.[Date] DESC) AS A
ORDER BY D.Analyst_ID, D.[Date];
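The recursive CTE at the top implements the discard rule: an answer counts only when it falls at least 3 months after the previously counted one. A plain-Python sketch of just that rule, checked against analyst 1's data (the helper names are illustrative, not from the answer):

```python
from datetime import date

def months_apart(d1, d2):
    # Whole-month difference; for first-of-month dates this matches the
    # DATEADD(MONTH, 3, ...) <= [Date] condition in the CTE.
    return (d2.year - d1.year) * 12 + (d2.month - d1.month)

def counted_answers(dates):
    """Keep an answer only if it is >= 3 months after the last kept one."""
    kept = []
    for d in sorted(dates):
        if not kept or months_apart(kept[-1], d) >= 3:
            kept.append(d)
    return kept

analyst1 = [date(2017, m, 1) for m in (1, 2, 3, 5, 6, 7, 8)] + [date(2018, 2, 1)]
```

counted_answers(analyst1) keeps 2017-01, 2017-05, 2017-08 and 2018-02 — exactly the non-discarded rows from the question.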

SQL Server 2008 Cumulative Sum that resets value

I want the last column to be a running total based on ROW_ID that resets every time ROW_ID starts again at '1'.
Initially my table doesn't have the ROW_ID; it was created using a partition so that at least I can segregate my records.
It should add Amt + CumulativeSum (except for the first record) all the way down and reset every time ROW_ID = 1.
I have tried several queries but they don't give me the desired result. I have been reading answers from several forums but to no avail.
Can someone advise the best approach to do this?
For the sake of representation, I made the sample table as straightforward as possible.
ID ROW-ID Amt RunningTotal(Amt)
1 1 2 2
2 2 4 6
3 3 6 12
4 1 2 2
5 2 4 6
6 3 6 12
7 4 8 20
8 5 10 30
9 1 2 2
10 2 4 6
11 3 6 12
12 4 8 20
Try this:
create table #tb (ID int, [ROW-ID] int, Amt money)
insert into #tb(ID, [ROW-ID], Amt) values
(1,1,2),
(2,2,4),
(3,3,6),
(4,1,2),
(5,2,4),
(6,3,6),
(7,4,8),
(8,5,10),
(9,1,2),
(10,2,4),
(11,3,6),
(12,4,8)
select *,sum(amt) over(partition by ([id]-[row-id]) order by id,[row-id]) AS cum from #tb
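The (id - [row-id]) trick works because within a run of consecutive ids whose row-id also steps by 1, the difference is constant, so it serves as a partition key (it does assume id has no gaps inside a run). A sqlite3 sketch:

```python
import sqlite3

# id - row_id is constant within each run that restarts at row_id = 1,
# so partitioning on it resets the running total.
conn = sqlite3.connect(":memory:")
conn.execute("create table t (id int, row_id int, amt int)")
conn.executemany("insert into t values (?, ?, ?)",
                 [(1, 1, 2), (2, 2, 4), (3, 3, 6), (4, 1, 2),
                  (5, 2, 4), (6, 3, 6), (7, 4, 8), (8, 5, 10)])
rows = conn.execute("""
select id,
       sum(amt) over (partition by id - row_id order by id) as running
from t
order by id
""").fetchall()
```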
Another version:
select *,(select sum(amt) from #tb t where
(t.id-t.[row-id])=(t1.id-t1.[ROW-ID]) and (t.id<=t1.id) ) as cum
from #tb t1 order by t1.id,t1.[row-id]
Try this
SELECT distinct (T1.ID),
T1.ROW_ID,
T1.Amt,
CumulativeSum =
CASE
WHEN T1.RoW_ID=1 THEN T1.Amt
ELSE T1.Amt+ T2.Amt
END
FROM TestSum T1, TestSum T2
WHERE T1.ID = T2.ID+1
http://sqlfiddle.com/#!6/8b2a2/2
The idea is to create partitions from the R column. First, output 1 where R = 1, else 0. Then take a cumulative sum of that column. Once you have the partitions, you can finally calculate cumulative sums of the S column within those partitions:
--- --- ---
| 1 | | 1 | | 1 |
| 2 | | 0 | | 1 | --prev 1 + 0
| 3 | | 0 | | 1 | --prev 1 + 0
| 1 | | 1 | | 2 | --prev 1 + 1
| 2 | => | 0 | => | 2 | --prev 2 + 0
| 3 | | 0 | | 2 | --prev 2 + 0
| 4 | | 0 | | 2 | --prev 2 + 0
| 5 | | 0 | | 2 | --prev 2 + 0
| 1 | | 1 | | 3 | --prev 2 + 1
| 2 | | 0 | | 3 | --prev 3 + 0
--- --- ---
CREATE TABLE #t ( ID INT, R INT, S INT )
INSERT INTO #t
VALUES ( 1, 1, 2 ),
( 2, 2, 4 ),
( 3, 3, 6 ),
( 4, 1, 2 ),
( 5, 2, 4 ),
( 6, 3, 6 ),
( 7, 4, 8 ),
( 8, 5, 10 ),
( 9, 1, 2 ),
( 10, 2, 4 ),
( 11, 3, 6 ),
( 12, 4, 8 );
For MSSQL 2008:
WITH cte1
AS ( SELECT ID ,
CASE WHEN R = 1 THEN 1
ELSE 0
END AS R ,
S
FROM #t
),
cte2
AS ( SELECT ID ,
( SELECT SUM(R)
FROM cte1 ci
WHERE ci.ID <= co.ID
) AS R ,
S
FROM cte1 co
)
SELECT * ,
( SELECT SUM(S)
FROM cte2 ci
WHERE ci.R = co.R
AND ci.ID <= co.ID
)
FROM cte2 co
For MSSQL 2012:
WITH cte
AS ( SELECT ID ,
SUM(CASE WHEN R = 1 THEN 1
ELSE 0
END) OVER ( ORDER BY ID ) AS R ,
S
FROM #t
)
SELECT * ,
SUM(s) OVER ( PARTITION BY R ORDER BY ID ) AS T
FROM cte
Output:
ID R S T
1 1 2 2
2 1 4 6
3 1 6 12
4 2 2 2
5 2 4 6
6 2 6 12
7 2 8 20
8 2 10 30
9 3 2 2
10 3 4 6
11 3 6 12
12 3 8 20
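The 2012-style version (flag → running SUM builds R, then SUM(S) partitioned by R) ports directly to any engine with window functions; a sqlite3 sketch reproducing the output above:

```python
import sqlite3

# Step 1: running sum of the "r = 1" flag numbers the groups.
# Step 2: running sum of s within each group gives the resetting total.
conn = sqlite3.connect(":memory:")
conn.execute("create table t (id int, r int, s int)")
conn.executemany("insert into t values (?, ?, ?)",
                 [(1, 1, 2), (2, 2, 4), (3, 3, 6), (4, 1, 2),
                  (5, 2, 4), (6, 3, 6), (7, 4, 8), (8, 5, 10),
                  (9, 1, 2), (10, 2, 4), (11, 3, 6), (12, 4, 8)])
rows = conn.execute("""
with cte as (
  select id, s,
         sum(case when r = 1 then 1 else 0 end) over (order by id) as grp
  from t
)
select id, sum(s) over (partition by grp order by id) as t_total
from cte
order by id
""").fetchall()
```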
EDIT:
One more way. This looks way better by execution plan than the first example:
SELECT * ,
CASE WHEN R = 1 THEN S
ELSE ( SELECT SUM(S)
FROM #t it
WHERE it.ID <= ot.ID
AND it.ID >= ( SELECT MAX(ID)
FROM #t iit
WHERE iit.ID < ot.ID
AND iit.R = 1
)
)
END
FROM #t ot