SQL sum by month with the previous values

SQL sum by month with the previous values - sql

I have following data:
cohort activity counter
-----------------------------
2010-12 0 470
2010-12 1 2
2010-12 2 1
2010-12 3 1
2010-12 6 1
2011-01 0 550
2011-01 1 1
2011-01 6 1
I want to sum counter of different activities by month, so the final table looks like:
cohort activity counter sumResult
-------------------------------------------
2010-12 0 470 470
2010-12 1 2 472
2010-12 2 1 473
2010-12 3 1 474
2010-12 6 1 475
2011-01 0 550 550
2011-01 1 1 551
2011-01 6 1 552
I've tried to do it like this:
select
a.activity, a.counter, a.cohort,
(
select sum(b.counter)
from data_table as b
where b.cohort = a.cohort and b.counter >= a.counter
) as sumResult
from data_table as a;
GO;
but it gave me strange results as:
cohort activity counter sumResult
-------------------------------------------
2010-12 0 470 470
2010-12 1 2 472
2010-12 2 1 475
2010-12 3 1 475
2010-12 6 1 475
2011-01 0 550 550
2011-01 1 1 552
2011-01 6 1 552
What could be a problem?

Depends on your RDBMS , some(SQL Server,Oracle,Postgresql) of them will accept SUM() OVER() :
SELECT t.*,
SUM(t.counter) OVER(PARTITION BY t.cohort ORDER BY t.activity) as sumResult
FROM YourTable t
If it's another, that's a bit more complicated and can be dealt with JOINS

The normal way to do this uses the ANSI standard cumulative sum function:
select dt.*,
sum(dt.counter) over (partition by dt.cohort order by dt.counter desc)
from data_table dt
order by cohort, counter desc;
If you want to use a subquery, the you need a stable sort, and activity can give you one. You can use this in the cumulative sum syntax:
select dt.*,
sum(dt.counter) over (partition by dt.cohort order by dt.counter desc, dt.activity)
from data_table dt
order by cohort, counter desc, activity;
Or using a subquery:
select dt.*,
(select sum(dt2.counter)
from data_table dt2
where dt2.cohort = dt.cohort and
(dt2.counter > dt.counter or
dt2.counter = dt.counter and dt2.activity < dt.activity)
)
from data_table dt
order by cohort, counter desc, activity;

Related

MSSQL - Running sum with reset after gap

I have been trying to solve a problem for a few days now, but I just can't get it solved. Hence my question today.
I would like to calculate the running sum in the following table. My result so far looks like this:
PersonID
Visit_date
Medication_intake
Previous_date
Date_diff
Running_sum
1
2012-04-26
1
1
2012-11-16
1
2012-04-26
204
204
1
2013-04-11
0
1
2013-07-19
1
1
2013-12-05
1
2013-07-19
139
343
1
2014-03-18
1
2013-12-05
103
585
1
2014-06-24
0
2
2014-12-01
1
2
2015-03-09
1
2014-12-01
98
98
2
2015-09-28
0
This is my desired result. So only the running sum over contiguous blocks (Medication_intake=1) should be calculated.
PersonID
Visit_date
Medication_intake
Previous_date
Date_diff
Running_sum
1
2012-04-26
1
1
2012-11-16
1
2012-04-26
204
204
1
2013-04-11
0
1
2013-07-19
1
1
2013-12-05
1
2013-07-19
139
139
1
2014-03-18
1
2013-12-05
103
242
1
2014-06-24
0
2
2014-12-01
1
2
2015-03-09
1
2014-12-01
98
98
2
2015-09-28
0
I work with Microsoft SQL Server 2019 Express.
Thank you very much for your tips!

This is a gaps and islands problem, and one approach uses the difference in row numbers method:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY PersonID
ORDER BY Visit_date) rn1,
ROW_NUMBER() OVER (PARTITION BY PersonId, Medication_intake
ORDER BY Visit_date) rn2
FROM yourTable
)
SELECT PersonID, Visit_date, Medication_intake, Previous_date, Date_diff,
CASE WHEN Date_diff IS NOT NULL AND Medication_intake = 1
THEN SUM(Date_diff) OVER (PARTITION BY PersonID, rn1 - rn2
ORDER BY Visit_date) END AS Running_sum
FROM cte
ORDER BY PersonID, Visit_date;
Demo
The CASE expression in the outer query computes the rolling sum for date diff along islands of records having a medication intake value of 1. For other records, or for records where date diff be null, the value generated is simply null.

Joins and/or Sub queries or Ranking functions

I have a table as follows:
Order_ID
Ship_num
Item_code
Qty_to_pick
Qty_picked
Pick_date
1111
1
1
3000
0
Null
1111
1
2
2995
1965
2021-05-12
1111
2
1
3000
3000
2021-06-24
1111
2
2
1030
0
Null
1111
3
2
1030
1030
2021-08-23
2222
1
3
270
62
2021-03-18
2222
1
4
432
0
Null
2222
2
3
208
0
Null
2222
2
4
432
200
2021-05-21
2222
3
3
208
208
2021-08-23
2222
3
4
232
200
2021-08-25
From this table,
I only want to show the rows that has the latest ship_num information, not the latest pick_date information (I was directed to a question like this that needed to return the rows with the latest entry time, I am not looking for that) for an order i.e., I want it as follows
Order_ID
Ship_num
Item_code
Qty_to_pick
Qty_picked
Pick_date
1111
3
2
1030
1030
2021-08-23
2222
3
3
208
208
2021-08-23
2222
3
4
232
200
2021-08-25
I tried the following query,
select order_id, max(ship_num), item_code, qty_to_pick, qty_picked, pick_date
from table1
group by order_id, item_code, qty_to_pick, qty_picked, pick_date
Any help would be appreciated.
Thanks in advance.

Using max(ship_num) is a good idea, but you should use the analytic version (with an OVER clause).
select *
from
(
select t.*, max(ship_num) over (partition by order_id) as orders_max_ship_num
from table1 t1
) with_max
where ship_num = orders_max_ship_num
order by order_id, item_code;

You can get this using the DENSE_RANK().
Query
;with cte as (
select rnk = dense_rank()
over (Partition by order_id order by ship_num desc)
, *
from table_name
)
Select *
from cte
Where rnk =1;

How to get latest records based on two columns of max

I have a table called Inventory with the below columns
item warehouse date sequence number value
111 100 2019-09-25 12:29:41.000 1 10
111 100 2019-09-26 12:29:41.000 1 20
222 200 2019-09-21 16:07:10.000 1 5
222 200 2019-09-21 16:07:10.000 2 10
333 300 2020-01-19 12:05:23.000 1 4
333 300 2020-01-20 12:05:23.000 1 5
Expected Output:
item warehouse date sequence number value
111 100 2019-09-26 12:29:41.000 1 20
222 200 2019-09-21 16:07:10.000 2 10
333 300 2020-01-20 12:05:23.000 1 5
Based on item and warehouse, i need to pick latest date and latest sequence number of value.
I tried with below code
select item,warehouse,sequencenumber,sum(value),max(date) as date1
from Inventory t1
where
t1.date IN (select max(date) from Inventory t2
where t1.warehouse=t2.warehouse
and t1.item = t2.item
group by t2.item,t2.warehouse)
group by t1.item,t1.warehouse,t1.sequencenumber
Its working for latest date but not for latest sequence number.
Can you please suggest how to write a query to get my expected output.

You can use row_number() for this:
select *
from (
select
t.*,
row_number() over(
partition by item, warehouse
order by date desc, sequence_number desc, value desc
) rn
from mytable t
) t
where rn = 1

Computing rolling average and standard deviation by dates

I have the below table where I will need to compute the rolling average and standard deviation based on the dates. I have listed below the tables and expected results. I am trying to compute the rolling average for an id based on date. rollAvgA is computed based on metricA. For example, for the first occurrence of id for a particular date the result should return zero as it does not have any preceding values. Please let me know how this can be accomplished?
Current Table :
Date id metricA
8/1/2019 100 2
8/2/2019 100 3
8/3/2019 100 2
8/1/2019 101 2
8/2/2019 101 3
8/3/2019 101 2
8/4/2019 101 2
Expected Table :
Date id metricA rollAvgA
8/1/2019 100 2 0
8/2/2019 100 3 2.5
8/3/2019 100 2 2.3
8/1/2019 101 2 0
8/2/2019 101 3 2.5
8/3/2019 101 2 2.3
8/4/2019 101 2 2.25

You seem to want a cumulative average. This is basically:
select t.*,
avg(metricA * 1.0) over (partition by id order by date) as rollingavg
from t;
The only caveat is that the first value is an average of one value. To handle this, use a case expression:
select t.*,
(case when row_number() over (partition by id order by date) > 1
then avg(metricA * 1.0) over (partition by id order by date)
else 0
end) as rollingavg
from t;

List the last two records for each id

Good Afternoon!
I'm having trouble list the last two records each idmicro
Ex:
idhist idmicro idother room unit Dtmov
100 1102 0 8 coa 2009-10-23 10:40:00.000
101 1102 0 1 coa 2009-10-28 10:40:00.000
102 1102 0 2 dib 2008-10-24 10:40:00.000
103 1201 0 6 diraf 2008-10-23 10:40:00.000
104 1201 0 7 diraf 2009-10-21 10:40:00.000
105 1201 0 4 dimel 2008-10-22 10:40:00.000
Would look like this:
ex:
result
idhist idmicro idoutros room unit Dtmov
101 1102 0 1 coa 2009-10-28 10:40:00.000
102 1102 0 2 dib 2008-10-24 10:40:00.000
103 1201 0 6 diraf 2008-10-22 10:40:00.000
104 1201 0 7 diraf 2009-10-21 10:40:00.000
I'm starting to delve into SQL and am having trouble finding this solution
Sorry
Thank you.
EDIT: I am using SQL server, and I made no query.
Yes! is based on the date and time

You can do the same thing with an imbricated SELECT statement.
SELECT *
FROM (
SELECT row_number() OVER (
PARTITION BY idmicro ORDER BY idhist
) AS ind
,*
FROM data
) AS initialResultSet
WHERE initialResultSet.ind < 3
Here is a sample SQLFiddle with how this query works.

WITH etc
AS (
SELECT *
,row_number() OVER (
PARTITION BY idmicro ORDER BY idhist
) AS r
,count() OVER (
PARTITION BY idmicro ORDER BY idhist
) cfrom TABLE
)
SELECT *
FROM etc
WHERE r > c - 2

Use row_number and over partition
SELECT *
FROM (
SELECT *, row_number() OVER (PARTITION BY idmicro ORDER BY idhist desc) AS rownum
FROM data
) AS initialResultSet
WHERE initialResultSet.rownum<=2

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

SQL sum by month with the previous values - sql

Depends on your RDBMS , some(SQL Server,Oracle,Postgresql) of them will accept SUM() OVER() : SELECT t.*, SUM(t.counter) OVER(PARTITION BY t.cohort ORDER BY t.activity) as sumResult FROM YourTable t If it's another, that's a bit more complicated and can be dealt with JOINS

Related

MSSQL - Running sum with reset after gap

Joins and/or Sub queries or Ranking functions

How to get latest records based on two columns of max

Computing rolling average and standard deviation by dates

List the last two records for each id

Categories

Resources