SQL sum by month with the previous values - sql

I have the following data:
cohort activity counter
-----------------------------
2010-12 0 470
2010-12 1 2
2010-12 2 1
2010-12 3 1
2010-12 6 1
2011-01 0 550
2011-01 1 1
2011-01 6 1
I want a running sum of counter across the activities within each month, so that the final table looks like:
cohort activity counter sumResult
-------------------------------------------
2010-12 0 470 470
2010-12 1 2 472
2010-12 2 1 473
2010-12 3 1 474
2010-12 6 1 475
2011-01 0 550 550
2011-01 1 1 551
2011-01 6 1 552
I've tried to do it like this:
select
a.activity, a.counter, a.cohort,
(
select sum(b.counter)
from data_table as b
where b.cohort = a.cohort and b.counter >= a.counter
) as sumResult
from data_table as a;
GO;
but it gave me strange results:
cohort activity counter sumResult
-------------------------------------------
2010-12 0 470 470
2010-12 1 2 472
2010-12 2 1 475
2010-12 3 1 475
2010-12 6 1 475
2011-01 0 550 550
2011-01 1 1 552
2011-01 6 1 552
What could be the problem?

It depends on your RDBMS. Some of them (SQL Server, Oracle, PostgreSQL) accept SUM() OVER():
SELECT t.*,
SUM(t.counter) OVER(PARTITION BY t.cohort ORDER BY t.activity) as sumResult
FROM YourTable t
If you are using a different RDBMS, it is a bit more complicated and can be handled with joins.

The normal way to do this uses the ANSI standard cumulative sum function:
select dt.*,
sum(dt.counter) over (partition by dt.cohort order by dt.counter desc)
from data_table dt
order by cohort, counter desc;
If you want to use a subquery, then you need a stable sort, and activity can give you one. You can use this in the cumulative sum syntax:
select dt.*,
sum(dt.counter) over (partition by dt.cohort order by dt.counter desc, dt.activity)
from data_table dt
order by cohort, counter desc, activity;
Or using a subquery:
select dt.*,
(select sum(dt2.counter)
from data_table dt2
where dt2.cohort = dt.cohort and
(dt2.counter > dt.counter or
-- <= rather than < so that the current row is included in its own sum
(dt2.counter = dt.counter and dt2.activity <= dt.activity))
)
from data_table dt
order by cohort, counter desc, activity;
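To see why the original query returned 475 for three rows, here is a minimal sketch that loads the question's sample data into an in-memory SQLite database (an assumption made for illustration; the question does not name its RDBMS) and compares the `b.counter >= a.counter` condition with one that compares on activity. Rows that share the same counter value all include each other in the sum, which is where the repeated totals come from.

```python
import sqlite3

# Sample rows from the question: (cohort, activity, counter).
rows = [
    ("2010-12", 0, 470), ("2010-12", 1, 2), ("2010-12", 2, 1),
    ("2010-12", 3, 1), ("2010-12", 6, 1),
    ("2011-01", 0, 550), ("2011-01", 1, 1), ("2011-01", 6, 1),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data_table (cohort TEXT, activity INTEGER, counter INTEGER)")
conn.executemany("INSERT INTO data_table VALUES (?, ?, ?)", rows)

# The question's condition: rows with equal counter values include each other.
tied = [r[0] for r in conn.execute("""
    SELECT (SELECT SUM(b.counter) FROM data_table b
            WHERE b.cohort = a.cohort AND b.counter >= a.counter)
    FROM data_table a
    ORDER BY a.cohort, a.activity
""")]

# Comparing on activity: each row has a unique position within its cohort.
fixed = [r[0] for r in conn.execute("""
    SELECT (SELECT SUM(b.counter) FROM data_table b
            WHERE b.cohort = a.cohort AND b.activity <= a.activity)
    FROM data_table a
    ORDER BY a.cohort, a.activity
""")]

print(tied)   # [470, 472, 475, 475, 475, 550, 552, 552]
print(fixed)  # [470, 472, 473, 474, 475, 550, 551, 552]
```

The second list matches the desired sumResult column. This only works because activity is unique within a cohort in the sample data.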

Related

MSSQL - Running sum with reset after gap

I have been trying to solve a problem for a few days now, but I just can't get it solved. Hence my question today.
I would like to calculate the running sum in the following table. My result so far looks like this:
PersonID  Visit_date  Medication_intake  Previous_date  Date_diff  Running_sum
------------------------------------------------------------------------------
1         2012-04-26  1
1         2012-11-16  1                  2012-04-26     204        204
1         2013-04-11  0
1         2013-07-19  1
1         2013-12-05  1                  2013-07-19     139        343
1         2014-03-18  1                  2013-12-05     103        585
1         2014-06-24  0
2         2014-12-01  1
2         2015-03-09  1                  2014-12-01     98         98
2         2015-09-28  0
This is my desired result. So only the running sum over contiguous blocks (Medication_intake=1) should be calculated.
PersonID  Visit_date  Medication_intake  Previous_date  Date_diff  Running_sum
------------------------------------------------------------------------------
1         2012-04-26  1
1         2012-11-16  1                  2012-04-26     204        204
1         2013-04-11  0
1         2013-07-19  1
1         2013-12-05  1                  2013-07-19     139        139
1         2014-03-18  1                  2013-12-05     103        242
1         2014-06-24  0
2         2014-12-01  1
2         2015-03-09  1                  2014-12-01     98         98
2         2015-09-28  0
I work with Microsoft SQL Server 2019 Express.
Thank you very much for your tips!
This is a gaps and islands problem, and one approach uses the difference in row numbers method:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY PersonID
ORDER BY Visit_date) rn1,
ROW_NUMBER() OVER (PARTITION BY PersonId, Medication_intake
ORDER BY Visit_date) rn2
FROM yourTable
)
SELECT PersonID, Visit_date, Medication_intake, Previous_date, Date_diff,
CASE WHEN Date_diff IS NOT NULL AND Medication_intake = 1
THEN SUM(Date_diff) OVER (PARTITION BY PersonID, rn1 - rn2
ORDER BY Visit_date) END AS Running_sum
FROM cte
ORDER BY PersonID, Visit_date;
The CASE expression in the outer query computes the rolling sum of Date_diff along islands of records having a Medication_intake value of 1. For other records, or for records where Date_diff is null, the generated value is simply null.
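As a rough check of the difference-in-row-numbers idea, the sketch below runs essentially the same query against the question's sample rows in an in-memory SQLite database. SQLite (3.25 or later, for window functions) stands in for SQL Server here, and the table name `visits` and the extra `island` column are assumptions added for illustration; Previous_date is left out because the query does not need it.

```python
import sqlite3

# Rows from the question: (PersonID, Visit_date, Medication_intake, Date_diff).
rows = [
    (1, "2012-04-26", 1, None), (1, "2012-11-16", 1, 204),
    (1, "2013-04-11", 0, None), (1, "2013-07-19", 1, None),
    (1, "2013-12-05", 1, 139),  (1, "2014-03-18", 1, 103),
    (1, "2014-06-24", 0, None), (2, "2014-12-01", 1, None),
    (2, "2015-03-09", 1, 98),   (2, "2015-09-28", 0, None),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE visits (PersonID INT, Visit_date TEXT, "
    "Medication_intake INT, Date_diff INT)"
)
conn.executemany("INSERT INTO visits VALUES (?, ?, ?, ?)", rows)

query = """
WITH cte AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY PersonID
                              ORDER BY Visit_date) AS rn1,
           ROW_NUMBER() OVER (PARTITION BY PersonID, Medication_intake
                              ORDER BY Visit_date) AS rn2
    FROM visits
)
SELECT PersonID, Visit_date,
       rn1 - rn2 AS island,  -- constant within each contiguous block
       CASE WHEN Date_diff IS NOT NULL AND Medication_intake = 1
            THEN SUM(Date_diff) OVER (PARTITION BY PersonID, rn1 - rn2
                                      ORDER BY Visit_date)
       END AS Running_sum
FROM cte
ORDER BY PersonID, Visit_date
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)

running_sums = [r[3] for r in result]
# [None, 204, None, None, 139, 242, None, None, 98, None]
```

The value of rn1 - rn2 stays the same for consecutive rows with the same Medication_intake and changes after every gap, so partitioning by it restarts the sum for each block.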

Joins and/or Sub queries or Ranking functions

I have a table as follows:
Order_ID  Ship_num  Item_code  Qty_to_pick  Qty_picked  Pick_date
-----------------------------------------------------------------
1111      1         1          3000         0           Null
1111      1         2          2995         1965        2021-05-12
1111      2         1          3000         3000        2021-06-24
1111      2         2          1030         0           Null
1111      3         2          1030         1030        2021-08-23
2222      1         3          270          62          2021-03-18
2222      1         4          432          0           Null
2222      2         3          208          0           Null
2222      2         4          432          200         2021-05-21
2222      3         3          208          208         2021-08-23
2222      3         4          232          200         2021-08-25
From this table, I only want to show the rows that have the latest ship_num for each order, not the latest pick_date. (I was directed to a question about returning the rows with the latest entry time; that is not what I am looking for.) I want the result to look like this:
Order_ID  Ship_num  Item_code  Qty_to_pick  Qty_picked  Pick_date
-----------------------------------------------------------------
1111      3         2          1030         1030        2021-08-23
2222      3         3          208          208         2021-08-23
2222      3         4          232          200         2021-08-25
I tried the following query,
select order_id, max(ship_num), item_code, qty_to_pick, qty_picked, pick_date
from table1
group by order_id, item_code, qty_to_pick, qty_picked, pick_date
Any help would be appreciated.
Thanks in advance.
Using max(ship_num) is a good idea, but you should use the analytic version (with an OVER clause).
select *
from
(
select t1.*, max(ship_num) over (partition by order_id) as orders_max_ship_num
from table1 t1
) with_max
where ship_num = orders_max_ship_num
order by order_id, item_code;
You can get this using DENSE_RANK():
;with cte as (
select rnk = dense_rank()
over (Partition by order_id order by ship_num desc)
, *
from table_name
)
Select *
from cte
Where rnk =1;
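The analytic MAX approach can be tried against the question's sample rows with the sketch below. It uses an in-memory SQLite database (3.25 or later, for window functions) as a stand-in for whatever RDBMS the asker runs, which is an assumption; the table and column names follow the question.

```python
import sqlite3

# Rows from the question:
# (order_id, ship_num, item_code, qty_to_pick, qty_picked, pick_date)
rows = [
    (1111, 1, 1, 3000, 0, None),
    (1111, 1, 2, 2995, 1965, "2021-05-12"),
    (1111, 2, 1, 3000, 3000, "2021-06-24"),
    (1111, 2, 2, 1030, 0, None),
    (1111, 3, 2, 1030, 1030, "2021-08-23"),
    (2222, 1, 3, 270, 62, "2021-03-18"),
    (2222, 1, 4, 432, 0, None),
    (2222, 2, 3, 208, 0, None),
    (2222, 2, 4, 432, 200, "2021-05-21"),
    (2222, 3, 3, 208, 208, "2021-08-23"),
    (2222, 3, 4, 232, 200, "2021-08-25"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (order_id INT, ship_num INT, item_code INT, "
    "qty_to_pick INT, qty_picked INT, pick_date TEXT)"
)
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?, ?, ?)", rows)

# Attach each order's highest ship_num to every row, then keep the rows
# whose own ship_num equals that maximum.
latest = conn.execute("""
    SELECT order_id, ship_num, item_code, qty_to_pick, qty_picked, pick_date
    FROM (
        SELECT t1.*,
               MAX(ship_num) OVER (PARTITION BY order_id) AS orders_max_ship_num
        FROM table1 t1
    ) AS with_max
    WHERE ship_num = orders_max_ship_num
    ORDER BY order_id, item_code
""").fetchall()

for row in latest:
    print(row)
# (1111, 3, 2, 1030, 1030, '2021-08-23')
# (2222, 3, 3, 208, 208, '2021-08-23')
# (2222, 3, 4, 232, 200, '2021-08-25')
```

Unlike the GROUP BY attempt in the question, the window function keeps every column of the original row, so all items of the last shipment are returned.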

How to get latest records based on two columns of max

I have a table called Inventory with the below columns
item warehouse date sequence number value
111 100 2019-09-25 12:29:41.000 1 10
111 100 2019-09-26 12:29:41.000 1 20
222 200 2019-09-21 16:07:10.000 1 5
222 200 2019-09-21 16:07:10.000 2 10
333 300 2020-01-19 12:05:23.000 1 4
333 300 2020-01-20 12:05:23.000 1 5
Expected Output:
item warehouse date sequence number value
111 100 2019-09-26 12:29:41.000 1 20
222 200 2019-09-21 16:07:10.000 2 10
333 300 2020-01-20 12:05:23.000 1 5
For each item and warehouse, I need the row with the latest date and, within that date, the latest sequence number.
I tried the code below:
select item,warehouse,sequencenumber,sum(value),max(date) as date1
from Inventory t1
where
t1.date IN (select max(date) from Inventory t2
where t1.warehouse=t2.warehouse
and t1.item = t2.item
group by t2.item,t2.warehouse)
group by t1.item,t1.warehouse,t1.sequencenumber
It works for the latest date but not for the latest sequence number.
Can you please suggest how to write a query that gets my expected output?
You can use row_number() for this:
select *
from (
select
t.*,
row_number() over(
partition by item, warehouse
order by date desc, sequencenumber desc, value desc
) rn
from Inventory t
) t
where rn = 1
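A minimal sketch of the row_number() approach on the question's sample data follows. It assumes an in-memory SQLite database (3.25 or later) in place of the asker's RDBMS, and it assumes the column is called `sequencenumber`, as in the asker's own query.

```python
import sqlite3

# Rows from the question: (item, warehouse, date, sequencenumber, value).
rows = [
    (111, 100, "2019-09-25 12:29:41.000", 1, 10),
    (111, 100, "2019-09-26 12:29:41.000", 1, 20),
    (222, 200, "2019-09-21 16:07:10.000", 1, 5),
    (222, 200, "2019-09-21 16:07:10.000", 2, 10),
    (333, 300, "2020-01-19 12:05:23.000", 1, 4),
    (333, 300, "2020-01-20 12:05:23.000", 1, 5),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Inventory (item INT, warehouse INT, date TEXT, "
    "sequencenumber INT, value INT)"
)
conn.executemany("INSERT INTO Inventory VALUES (?, ?, ?, ?, ?)", rows)

# Rank rows within each (item, warehouse) group, newest date first and,
# for equal dates, highest sequence number first; keep only rank 1.
newest = conn.execute("""
    SELECT item, warehouse, date, sequencenumber, value
    FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (
                   PARTITION BY item, warehouse
                   ORDER BY date DESC, sequencenumber DESC
               ) AS rn
        FROM Inventory t
    ) AS ranked
    WHERE rn = 1
    ORDER BY item
""").fetchall()

for row in newest:
    print(row)
# (111, 100, '2019-09-26 12:29:41.000', 1, 20)
# (222, 200, '2019-09-21 16:07:10.000', 2, 10)
# (333, 300, '2020-01-20 12:05:23.000', 1, 5)
```

The second sort key is what distinguishes the two rows for item 222, which share the same timestamp.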

Computing rolling average and standard deviation by dates

I have the table below and need to compute the rolling average and standard deviation by date. The current table and the expected results are listed below. I am trying to compute the rolling average for an id based on date; rollAvgA is computed from metricA. For the first occurrence of an id, the result should be zero because it has no preceding values. How can this be accomplished?
Current Table :
Date id metricA
8/1/2019 100 2
8/2/2019 100 3
8/3/2019 100 2
8/1/2019 101 2
8/2/2019 101 3
8/3/2019 101 2
8/4/2019 101 2
Expected Table :
Date id metricA rollAvgA
8/1/2019 100 2 0
8/2/2019 100 3 2.5
8/3/2019 100 2 2.3
8/1/2019 101 2 0
8/2/2019 101 3 2.5
8/3/2019 101 2 2.3
8/4/2019 101 2 2.25
You seem to want a cumulative average. This is basically:
select t.*,
avg(metricA * 1.0) over (partition by id order by date) as rollingavg
from t;
The only caveat is that the first value is an average of one value, whereas the expected output shows 0 there. To handle this, use a case expression:
select t.*,
(case when row_number() over (partition by id order by date) > 1
then avg(metricA * 1.0) over (partition by id order by date)
else 0
end) as rollingavg
from t;
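The cumulative average with the first row forced to zero can be checked with the sketch below. It assumes an in-memory SQLite database (3.25 or later) rather than the asker's system, and it stores the dates in ISO format (an assumption, so that text ordering matches date order).

```python
import sqlite3

# Rows from the question: (date, id, metricA), with dates rewritten as ISO text.
rows = [
    ("2019-08-01", 100, 2), ("2019-08-02", 100, 3), ("2019-08-03", 100, 2),
    ("2019-08-01", 101, 2), ("2019-08-02", 101, 3), ("2019-08-03", 101, 2),
    ("2019-08-04", 101, 2),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (date TEXT, id INT, metricA INT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)

# With ORDER BY in the window, AVG covers all rows of the id up to the
# current date; the CASE replaces the first row's value with 0.
result = conn.execute("""
    SELECT date, id, metricA,
           CASE WHEN ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) > 1
                THEN AVG(metricA * 1.0) OVER (PARTITION BY id ORDER BY date)
                ELSE 0
           END AS rollingavg
    FROM t
    ORDER BY id, date
""").fetchall()

rounded = [round(r[3], 2) for r in result]
print(rounded)  # [0, 2.5, 2.33, 0, 2.5, 2.33, 2.25]
```

These values agree with the rollAvgA column in the question, which shows the thirds rounded to 2.3.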

List the last two records for each id

Good afternoon!
I'm having trouble listing the last two records for each idmicro.
Example:
idhist idmicro idother room unit Dtmov
100 1102 0 8 coa 2009-10-23 10:40:00.000
101 1102 0 1 coa 2009-10-28 10:40:00.000
102 1102 0 2 dib 2008-10-24 10:40:00.000
103 1201 0 6 diraf 2008-10-23 10:40:00.000
104 1201 0 7 diraf 2009-10-21 10:40:00.000
105 1201 0 4 dimel 2008-10-22 10:40:00.000
The result would look like this:
idhist idmicro idother room unit Dtmov
101 1102 0 1 coa 2009-10-28 10:40:00.000
102 1102 0 2 dib 2008-10-24 10:40:00.000
103 1201 0 6 diraf 2008-10-22 10:40:00.000
104 1201 0 7 diraf 2009-10-21 10:40:00.000
I'm just starting to learn SQL and am having trouble finding a solution.
Sorry, and thank you.
EDIT: I am using SQL Server, and I have not written a query yet.
Yes, it is based on the date and time.
You can do the same thing with a nested SELECT statement.
SELECT *
FROM (
SELECT row_number() OVER (
-- DESC so that ind = 1 is the most recent row in each idmicro group
PARTITION BY idmicro ORDER BY idhist DESC
) AS ind
,*
FROM data
) AS initialResultSet
WHERE initialResultSet.ind < 3
WITH etc
AS (
SELECT *
,row_number() OVER (
PARTITION BY idmicro ORDER BY idhist
) AS r
-- total rows per idmicro; no ORDER BY here, otherwise it becomes a running count
,count(*) OVER (
PARTITION BY idmicro
) AS c
FROM data
)
SELECT *
FROM etc
WHERE r > c - 2
Use row_number() with a PARTITION BY clause:
SELECT *
FROM (
SELECT *, row_number() OVER (PARTITION BY idmicro ORDER BY idhist desc) AS rownum
FROM data
) AS initialResultSet
WHERE initialResultSet.rownum<=2
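Which rows count as "the last two" depends on the sort column. The sketch below loads the question's sample rows into an in-memory SQLite database (3.25 or later, standing in for SQL Server as an assumption) and runs the row_number() query twice: once ordered by idhist, as in the answers, and once ordered by Dtmov, as the asker's edit suggests. The helper `last_two` is hypothetical and exists only for this illustration.

```python
import sqlite3

# Rows from the question: (idhist, idmicro, idother, room, unit, Dtmov).
rows = [
    (100, 1102, 0, 8, "coa",   "2009-10-23 10:40:00.000"),
    (101, 1102, 0, 1, "coa",   "2009-10-28 10:40:00.000"),
    (102, 1102, 0, 2, "dib",   "2008-10-24 10:40:00.000"),
    (103, 1201, 0, 6, "diraf", "2008-10-23 10:40:00.000"),
    (104, 1201, 0, 7, "diraf", "2009-10-21 10:40:00.000"),
    (105, 1201, 0, 4, "dimel", "2008-10-22 10:40:00.000"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data (idhist INT, idmicro INT, idother INT, "
    "room INT, unit TEXT, Dtmov TEXT)"
)
conn.executemany("INSERT INTO data VALUES (?, ?, ?, ?, ?, ?)", rows)


def last_two(order_column):
    """Return the idhist values of the two highest-ranked rows per idmicro."""
    # order_column is one of the two fixed names used below, never user input.
    sql = f"""
        SELECT idhist
        FROM (
            SELECT idhist, idmicro,
                   ROW_NUMBER() OVER (
                       PARTITION BY idmicro ORDER BY {order_column} DESC
                   ) AS rownum
            FROM data
        ) AS ranked
        WHERE rownum <= 2
        ORDER BY idmicro, idhist
    """
    return [r[0] for r in conn.execute(sql)]


by_idhist = last_two("idhist")
by_dtmov = last_two("Dtmov")
print(by_idhist)  # [101, 102, 104, 105]
print(by_dtmov)   # [100, 101, 103, 104]
```

The two orderings return different rows for this data, so it is worth confirming which column defines "last" before choosing the ORDER BY.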