I have a table with 3 columns (Year, Month, Value) like this in Sql Server :
Year
Month
Value
ValueOfLastTwelveMonths
2021
1
30
30
2021
2
24
54 (30 + 24)
2021
5
26
80 (54+26)
2021
11
12
92 (80+12)
2022
1
25
87 (SUM of values from 1 2022 TO 2 2021)
2022
2
40
103 (SUM of values from 2 2022 TO 3 2021)
2022
4
20
123 (SUM of values from 4 2022 TO 5 2021)
I need a SQL request to calculate ValueOfLastTwelveMonths.
SELECT Year,
Month,
Value,
SUM (Value) OVER (PARTITION BY Year, Month)
FROM MyTable
This is much easier if you have a row for each month and year, and then (if needed) you can filter the NULL rows out. The reason it's easier is because then you know how many rows you need to look back at: 11.
If you make a dataset of the years and months, you can then LEFT JOIN to your data, aggregate, and then finally filter the data out:
SELECT *
INTO dbo.YourTable
FROM (VALUES(2021,1,30),
(2021,2,24),
(2021,5,26),
(2021,11,12),
(2022,1,25),
(2022,2,40),
(2022,4,20))V(Year,Month,Value);
GO
WITH YearMonth AS(
SELECT YT.Year,
V.Month
FROM (SELECT DISTINCT Year
FROM dbo.YourTable) YT
CROSS APPLY (VALUES(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12))V(Month)),
RunningTotal AS(
SELECT YM.Year,
YM.Month,
YT.Value,
SUM(YT.Value) OVER (ORDER BY YM.Year, YM.Month
ROWS BETWEEN 11 PRECEDING AND CURRENT ROW) AS Last12Months
FROM YearMonth YM
LEFT JOIN dbo.YourTable YT ON YM.Year = YT.Year
AND YM.Month = YT.Month)
SELECT Year,
Month,
Value,
Last12Months
FROM RunningTotal
WHERE Value IS NOT NULL;
GO
DROP TABLE dbo.YourTable;
Related
I have a table of records like this:
Item
From
To
A
2018-01-03
2018-03-16
B
2021-05-25
2021-11-10
The output of select should look like:
Item
Month
Year
A
01
2018
A
02
2018
A
03
2018
B
05
2021
B
06
2021
B
07
2021
B
08
2021
Also the range should not exceed the current month. In example above we are asuming current day is 2021-08-01.
I am trying to do something similar to THIS with CONNECT BY LEVEL but as soon as I also select my table next to dual and try to order the records the selection never completes. I also have to join few other tables to the selection but I don't think that would make a difference.
I would very much appreciate your help.
Row generator it is, but not as you did it; most probably you're missing lines #11 - 16 in my query (or their alternative).
SQL> with test (item, date_from, date_to) as
2 -- sample data
3 (select 'A', date '2018-01-03', date '2018-03-16' from dual union all
4 select 'B', date '2021-05-25', date '2021-11-10' from dual
5 )
6 -- query that returns desired result
7 select item,
8 extract(month from (add_months(date_from, column_value - 1))) month,
9 extract(year from (add_months(date_from, column_value - 1))) year
10 from test cross join
11 table(cast(multiset
12 (select level
13 from dual
14 connect by level <=
15 months_between(trunc(least(sysdate, date_to), 'mm'), trunc(date_from, 'mm')) + 1
16 ) as sys.odcinumberlist))
17 order by item, year, month;
ITEM MONTH YEAR
----- ---------- ----------
A 1 2018
A 2 2018
A 3 2018
B 5 2021
B 6 2021
B 7 2021
B 8 2021
7 rows selected.
SQL>
Recursive CTEs are the standard SQL approach to this type of problem. In Oracle, this looks like:
with cte(item, fromd, tod) as (
select item, fromd, tod
from t
union all
select item, add_months(fromd, 1), tod
from cte
where add_months(fromd, 1) < last_day(tod)
)
select item, extract(year from fromd) as year, extract(month from fromd) as month
from cte
order by item, fromd;
Here is a db<>fiddle.
input:
item loc month year qty
A DEL 5 2020 12
A DEL 6 2020 14
A DEL 8 2020 16
A DEL 9 2020 17
output:
item loc month year qty
A DEL 5 2020 12
A DEL 6 2020 14
A DEL 7 2020 26
A DEL 8 2020 16
A DEL 9 2020 17
A DEL 10 2020 33
description:
I don't have month 7 in my input. So for calculating month 7 i do sum of previous two months quantity.
for example for month 7 output will be 12(from month 5)+14(from month 6)=26
So its like whenever any month will be missing i should fill that month with this logic.
I have written a script which is two step process but it only considers missing month between the values and not boundary values i.e. it wont assume 10 is missing as it is a boundary value.
1st Step: Insert the misisng month with NULL for all other columns.
INSERT INTO TEST_MISSING(MONTH)
select min_a - 1 + level
from ( select min(MONTH) min_a
, max(MONTH) max_a
from TEST_MISSING
)
connect by level <= max_a - min_a + 1
minus
select MONTH
from TEST_MISSING;
2nd Step: Populate the values of other columns using lag with values from rows about it.
and then using Window function calculate the quantity value.
SELECT NVL(ITEM, NEW_ITEM) ITEM,
NVL(LOC, NEW_LOC) LOC,
MONTH, NVL(YEAR, NEW_YEAR) YEAR,
CASE WHEN QTY IS NULL THEN SUM(NVL(QTY, 0)) OVER(PARTITION BY NEW_ITEM ORDER BY MONTH ROWS BETWEEN 2 PRECEDING AND 1 PRECEDING) ELSE QTY END AS QTY
FROM (
SELECT A.*,
nvl(item,CASE WHEN ITEM IS NULL THEN (LAG(ITEM) OVER(ORDER BY MONTH)) END) NEW_ITEM,
nvl(LOC,CASE WHEN LOC IS NULL THEN (LAG(LOC) OVER(ORDER BY MONTH)) END) NEW_LOC,
nvl(YEAR,CASE WHEN YEAR IS NULL THEN (LAG(YEAR) OVER(ORDER BY MONTH)) END) NEW_YEAR
FROM TEST_MISSING A)
X
ORDER BY MONTH;
I have a dataset with different clients, and their sales count. Over time, some clients get added and deleted from the data. How do I make sure that when I look at the sales counts, that I am only using a selection of the clients that were in the data set all the time? Ie if I have a client that doesn't have a record for 2018-03, then I don't want that client to be part of the entire query. If a clients does not have a record in 2020-03, then I also do not want this client to be part of the entire query.
For example, the following query:
select DATE_PART (y, sold_date)as year, DATE_PART (mm, sold_date) as month, count(distinct(client))
from sales_data
where sold_date > '2018-01-01'
group by year, month
order by year,month
Yields
year month count
2018 1 78
2018 2 83
2018 3 80
2018 4 83
2018 5 84
2018 6 81
2018 7 83
2018 8 90
2018 9 89
2018 10 95
2018 11 94
2018 12 97
2019 1 102
2019 2 103
2019 3 102
2019 4 105
2019 5 103
2019 6 104
2019 7 104
2019 8 106
2019 9 106
2019 10 108
2019 11 109
2019 12 104
2020 1 104
2020 2 102
2020 3 103
2020 4 98
2020 5 97
2020 6 79
So I want to only use the clients that are in all months, they should not be more than 78, because there can not be more users than the minimal month (2018-1).
FYI, I am using Amazon Redshift here but I am OK with a query that's rdbms agnostic or works for SQL-Server/Oracle/MySQL/PostgreSQL, I am just interested in a pattern on how to solve this issue effectively.
If I'm understanding what you want correctly, and if this is just a one-off query, you could use a correlated subquery in the where clause:
SELECT
DATE_PART(y, s.sold_date) AS year,
DATE_PART(mm, s.sold_date) AS month,
COUNT(DISTINCT s.client)
FROM
sales_data AS s
WHERE
EXISTS (
SELECT sd.client FROM sales_data AS sd WHERE DATE_PART(y,
sd.sold_date) = 2018 AND DATE_PART(mm, sd.sold_date) = 1 AND
sd.client = s.client
) AND
s.sold_date > '2018-01-01'
GROUP BY
year,
month
ORDER
DATE_PART(y, s.sold_date),
DATE_PART(mm, s.sold_date)
presence in all months can be done with 2-step aggregation:
group sales data by customer ID having all months
group sales data joined to (1) by year, month
like this (=12 can be a dynamic expression, depending on the amount of history you have)
with
stable_customers as (
select customer_id
from sales_data
group by 1
having count(distinct date_trunc('month' from sold_date)=12
)
select
DATE_PART (y, sold_date) as year
,DATE_PART (mm, sold_date) as month,
,count(1)
from sales_date
join stable_customers
using (customer_id)
where sold_date > '2018-01-01'
group by year, month
order by year,month
Use window functions. Unfortunately, SQL Server does not support count(distinct) as a window function. Fortunately, there is a simple work-around using dense_rank():
select year, month, count(distinct client)
from (select sd.*, year, month,
(dense_rank() over (order by year, month) +
dense_rank() over (order by year desc, month desc)
) as num_months,
(dense_rank() over (partition by client order by year, month) +
dense_rank() over (partition by client order by year desc, month desc)
) as num_months_client
from sales_data sd cross apply
(values (year(sold_date), month(sold_date))) v(year, month)
where sd.sold_date > '2018-01-01'
) sd
where num_months_client = num_months
group by year, month
order by year, month;
Note: This looks at all months that are in the data. If all clients are missing 2019-03, then that months is not considered at all.
I have this table in SQL Server:
Year Month Quantity
----------------------------
2015 January 10
2015 February 20
2015 March 30
2014 November 40
2014 August 50
How can I identify the different years and months adding two more columns that group the same years with a number and then different months in sequential way like the example
Year Month Quantity Group Subgroup
------------------------------------------------
2015 January 10 1 1
2015 February 20 1 2
2015 March 30 1 3
2014 November 40 2 1
2014 August 50 2 2
You can use DENSE_RANK to calculate the groups for you:
SELECT t1.*, DENSE_RANK() OVER (ORDER BY Year DESC) AS [Group],
DENSE_RANK() OVER (PARTITION BY Year ORDER BY DATEPART(month, Month + ' 01 2010')) AS [SubGroup]
FROM t1
ORDER BY 4, 5
See this fiddle.
To associate group and subgroup with a number you can do this:
WITH RankedTable AS (
SELECT year, month, quantity,
ROW_NUMBER() OVER (partition by year order by Month) AS rn
FROM yourtable)
SELECT year, month, quantity,
SUM (CASE WHEN rn = 1 THEN 1 ELSE 0 END) OVER (ORDER BY YEAR) as year_group,
rn AS subgroup
FROM RankedTable
Here ROW_NUMBER() OVER clause calculates rank of a month within a year.
And SUM() ... OVER calculates running SUM for the months with rank 1.
SQL Fiddle
I need to do Rollover sum with grouping in SQL Server with values repeating for empty fortnights
I need to sum of sales within same year for the later fortnights.
Data is like shown below
Year fortnight sales
2011 1 10
2011 2 5
2011 5 30
2012 4 30
2012 5 2
I would need the output like below
Year | fortnight | salestillnowThisYear
2011 1 10
2011 2 15 (which is 10 + 5 from previous fortnight within same year)
2011 3 15 same as previous fortnight
2011 4 15 same as previous fortnight
2011 5 45 ( 10 + 5 + 30 from previous quaters of this year)
2011 6 45 same as previous fortnight
2011 7 45 same as previous fortnight
......and so on for each fortnight till last fortnight
2011 24 45 last fortnight data
2012 4 30 note this is the fortnight which first has value
2012 5 32 (which is 10 + 5 from previous fortnight within same year)
2012 6 32 last fortnight data
2012 7 32 last fortnight data
......and so on for each fortnight till last fortnight
2012 24 32 last fortnight data
Below is the available solution with me.
1 .Create temp table (rpt_fortnight) with 24 rows to represent 24 fortnights
join the data with that rpt_fortnight table to get data for all fortnights as (step2result)
SELECT YEar, fMaste.FortNightNumber as FortNightNumber,
sales as sales
FROM data AS FNI
inner join
rpt_fortNight as fMaste
on fMaste.FortnightNumber >= FNi.FortNightNumber --<--extrapolated data for next fortnights
3.combined data for fortnights from previous fortnights if available
SELECT year,
FortNightNumber,
Sum(sales) as sales
FROM step2result
group by year,FortNightNumber
we shall get data for all next fortnights with sum of previous fortnights in step 3
I wanted to check if there was any better solution
Try this. Use Recursive CTE to generate years with 1 to 24 fortnight.
CREATE TABLE #test
(
years INT,
fortnight INT,
sal INT
)
INSERT #test
VALUES (2011,1,10),
(2011,2,5),
(2011,5,30),
(2012,4,30),
(2012,5,2);
WITH cte
AS (SELECT DISTINCT 1 AS fortnight,
years
FROM #test
UNION ALL
SELECT fortnight + 1,
years
FROM cte
WHERE fortnight < 24),
CTE1
AS (SELECT a.years,
a.fortnight,
Isnull(sal, 0) SAL
FROM cte A
LEFT JOIN #test B
ON a.fortnight = b.fortnight
AND a.years = b.years)
SELECT *,
Sum(sal)
OVER(
partition BY years
ORDER BY fortnight rows UNBOUNDED PRECEDING) AS Running_Total
FROM CTE1 A
ORDER BY years,
fortnight