I have the following data in a SQL table:
ID | YEARS | START_DATE
-----------------------
1  | 5     | 2020-12-01
2  | 8     | 2020-12-01
I am trying to write a SQL query that expands the above data and gives me a start and end date for each year, based on YEARS and START_DATE in the table above. Sample output below:
ID | YEAR | DATE_START | DATE_END
---------------------------------
1  | 1    | 2020-12-01 | 2021-11-30
1  | 2    | 2021-12-01 | 2022-11-30
1  | 3    | 2022-12-01 | 2023-11-30
1  | 4    | 2023-12-01 | 2024-11-30
1  | 5    | 2024-12-01 | 2025-11-30
2  | 1    | 2020-12-01 | 2021-11-30
2  | 2    | 2021-12-01 | 2022-11-30
2  | 3    | 2022-12-01 | 2023-11-30
2  | 4    | 2023-12-01 | 2024-11-30
2  | 5    | 2024-12-01 | 2025-11-30
2  | 6    | 2025-12-01 | 2026-11-30
2  | 7    | 2026-12-01 | 2027-11-30
2  | 8    | 2027-12-01 | 2028-11-30
I would use an inline tally for this, as they are far faster than a recursive CTE solution. Assuming you have low values for Years:
WITH YourTable AS(
SELECT *
FROM (VALUES(1,5,CONVERT(date,'20201201')),
(2,8,CONVERT(date,'20201201')))V(ID,Years, StartDate))
SELECT ID,
V.I + 1 AS [Year],
DATEADD(YEAR, V.I, YT.StartDate) AS StartDate,
DATEADD(DAY, -1, DATEADD(YEAR, V.I+1, YT.StartDate)) AS EndDate
FROM YourTable YT
JOIN (VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10))V(I) ON YT.Years > V.I;
If you have more than about 10 years, you can either create a tally table or build a larger one inline in a CTE. This would start as:
WITH N AS(
SELECT N
FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL))N(N)),
Tally AS(
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) -1 AS I --remove the -1 if you don't want to start from 0
FROM N N1, N N2) --100 rows, add more Ns for more rows
...
Of course, I doubt you have 1,000s of years of data.
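As a rough illustration of the inline tally idea, here is a minimal Python sketch that runs the same kind of query against an in-memory SQLite database. SQLite, the table name t, and the CTE names digits and tally are assumptions made for the demo and are not part of the answer above. The tally is built by cross joining ten digits with themselves (giving 0 to 99) instead of using ROW_NUMBER(), and SQLite's date() modifiers stand in for DATEADD.

```python
import sqlite3

# In-memory database holding the question's two sample rows (table name "t" is assumed).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INTEGER, years INTEGER, start_date TEXT);
    INSERT INTO t VALUES (1, 5, '2020-12-01'), (2, 8, '2020-12-01');
""")

tally_rows = conn.execute("""
    WITH digits(d) AS (
        VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)
    ),
    tally(i) AS (
        -- 10 x 10 cross join: the numbers 0..99
        SELECT tens.d * 10 + ones.d
        FROM digits AS tens CROSS JOIN digits AS ones
    )
    SELECT t.id,
           tally.i + 1 AS year,
           date(t.start_date, '+' || tally.i || ' years') AS date_start,
           date(t.start_date, '+' || (tally.i + 1) || ' years', '-1 day') AS date_end
    FROM t
    JOIN tally ON tally.i < t.years
    ORDER BY t.id, tally.i
""").fetchall()

for row in tally_rows:
    print(row)
```

Each source row is joined to as many tally numbers as it has years, so no recursion is needed; adding another copy of digits to the cross join would extend the range to 0..999.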
You can use a recursive CTE:
with cte as (
select id, 1 as year, start_date,
dateadd(day, -1, dateadd(year, 1, start_date)) as end_date,
years as num_years
from t
union all
select id, year + 1, dateadd(year, 1, start_date),
dateadd(day, -1, dateadd(year, 2, start_date)) as end_date, -- start_date here is the previous row's start, so the new end is two years later minus a day
num_years
from cte
where year < num_years
)
select id, year, start_date, end_date
from cte;
Here is a db<>fiddle.
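To see the recursion step by step, here is a minimal Python sketch that runs an equivalent recursive CTE on an in-memory SQLite database. SQLite and its date() modifiers are assumptions standing in for SQL Server's DATEADD; the table name t follows the query above. Inside the recursive member, start_date still refers to the previous row, so the next end date sits two years minus one day after it.

```python
import sqlite3

# In-memory database with the question's sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INTEGER, years INTEGER, start_date TEXT);
    INSERT INTO t VALUES (1, 5, '2020-12-01'), (2, 8, '2020-12-01');
""")

cte_rows = conn.execute("""
    WITH RECURSIVE cte(id, year, start_date, end_date, num_years) AS (
        -- anchor: year 1 of every source row
        SELECT id, 1, start_date,
               date(start_date, '+1 year', '-1 day'),
               years
        FROM t
        UNION ALL
        -- recursive step: start_date below is the PREVIOUS row's start
        SELECT id, year + 1,
               date(start_date, '+1 year'),
               date(start_date, '+2 years', '-1 day'),
               num_years
        FROM cte
        WHERE year < num_years
    )
    SELECT id, year, start_date, end_date
    FROM cte
    ORDER BY id, year
""").fetchall()

for row in cte_rows:
    print(row)
```

Note that the anchor always emits year 1, so a source row with zero years would still produce one output row in this sketch.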
In a query, you can use the following expression:
DATEADD(DAY, -1, DATEADD(YEAR, 1, DATE_START))
To add this to the table, create the extra column and set it to the value of the expression above, e.g.
UPDATE MyTable
SET DATE_END = DATEADD(DAY, -1, DATEADD(YEAR, 1, DATE_START))
If you are working with SQL Server, you can try CROSS APPLY with the master.dbo.spt_values table to get a list of numbers and generate the dates:
select ID,T.number+1 as YEAR,
--generate date_start using T.number
dateadd(year,T.number,START_DATE)date_start,
--generate end_date: adding 1 year to start date
dateadd(dd,-1,dateadd(year,1,dateadd(year,T.number,START_DATE)))date_end
from YourTable
cross apply
master.dbo.spt_values T
where T.type='P' and T.number<YEARS
I have data like this:
id | start_date | end_date
----------------------------
1 | 16-09-2019 | 22-12-2019
I want to get the following results:
id | month | year | days
------------------------
1 | 09 | 2019 | 15
1 | 10 | 2019 | 31
1 | 11 | 2019 | 30
1 | 12 | 2019 | 22
Is there a way to get that result?
This is what you want to do:
SELECT id, EXTRACT(MONTH FROM start_date) AS month, EXTRACT(YEAR FROM start_date) AS year, DATEDIFF(end_date, start_date) AS days
FROM tbl
You can use the MONTH(), YEAR() and DATEDIFF() functions:
SELECT id, MONTH(start_date) AS month, YEAR(start_date) AS year, DATEDIFF(end_date, start_date) AS days FROM table_name
One way is to create a Calendar table and use that.
select month,year, count(*)
from Calendar
where db_date between '2019-09-16'
and '2019-12-22'
group by month,year
CHECK DEMO HERE
You can also use a recursive CTE to achieve the same.
You can use a recursive CTE and aggregation:
with recursive cte as (
select id, start_date, end_date
from t
union all
select id, start_date + interval 1 day, end_date
from cte
where start_date < end_date
)
select id, year(start_date), month(start_date), count(*) as days
from cte
group by id, year(start_date), month(start_date);
Here is a db<>fiddle.
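For readers who want to check the day-by-day expansion against the expected result in the question, here is a minimal Python sketch using an in-memory SQLite database. SQLite, the CTE name day_list, and the use of strftime() in place of MySQL's YEAR() and MONTH() are assumptions for the demo; the dates are the question's sample row rewritten in ISO format.

```python
import sqlite3

# In-memory database with the question's single sample row (ISO dates).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INTEGER, start_date TEXT, end_date TEXT);
    INSERT INTO t VALUES (1, '2019-09-16', '2019-12-22');
""")

month_rows = conn.execute("""
    WITH RECURSIVE day_list(id, d, end_date) AS (
        -- one row per calendar day between start_date and end_date, inclusive
        SELECT id, start_date, end_date FROM t
        UNION ALL
        SELECT id, date(d, '+1 day'), end_date
        FROM day_list
        WHERE d < end_date
    )
    SELECT id,
           strftime('%m', d) AS month,
           strftime('%Y', d) AS year,
           COUNT(*) AS days
    FROM day_list
    GROUP BY id, strftime('%Y', d), strftime('%m', d)
    ORDER BY id, strftime('%Y', d), strftime('%m', d)
""").fetchall()

for row in month_rows:
    print(row)
```

In MySQL itself, the cte_max_recursion_depth setting defaults to 1000, so ranges longer than about 1000 days would need that limit raised or a calendar table instead.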
My knowledge is pretty basic so your help would be highly appreciated.
I'm trying to return the same row multiple times when it meets the condition (I only have access to select query).
I have a table of more than 500000 records with Customer ID, Start Date and End Date, where end date could be null.
I am trying to add a new column called Week_No and list all rows accordingly. For example if the date range is more than one week, then the row must be returned multiple times with corresponding week number. Also I would like to count overlapping days, which will never be more than 7 (week) per row and then count unavailable days using second table.
Sample data below
t1
ID | Start_Date | End_Date
000001 | 12/12/2017 | 03/01/2018
000002 | 13/01/2018 |
000003 | 02/01/2018 | 11/01/2018
...
t2
ID | Unavailable
000002 | 14/01/2018
000003 | 03/01/2018
000003 | 04/01/2018
000003 | 08/01/2018
...
I cannot get past the stage of adding the week number. I have tried using CASE and UNION ALL but keep getting errors.
declare @week01start datetime = '2018-01-01 00:00:00'
declare @week01end datetime = '2018-01-07 00:00:00'
declare @week02start datetime = '2018-01-08 00:00:00'
declare @week02end datetime = '2018-01-14 00:00:00'
...
SELECT
ID,
'01' as Week_No,
'2018' as YEAR,
Start_Date,
End_Date
FROM t1
WHERE (Start_Date <= @week01end and End_Date >= @week01start)
or (Start_Date <= @week01end and End_Date is null)
UNION ALL
SELECT
ID,
'02' as Week_No,
'2018' as YEAR,
Start_Date,
End_Date
FROM t1
WHERE (Start_Date <= @week02end and End_Date >= @week02start)
or (Start_Date <= @week02end and End_Date is null)
...
The new table should look like this
ID | Week_No | Year | Start_Date | End_Date | Overlap | Unavail_Days
000001 | 01 | 2018 | 12/12/2017 | 03/01/2018 | 3 |
000002 | 02 | 2018 | 13/01/2018 | | 2 | 1
000003 | 01 | 2018 | 02/01/2018 | 11/01/2018 | 6 | 2
000003 | 02 | 2018 | 02/01/2018 | 11/01/2018 | 4 | 1
...
Business-wise, I cannot understand what you are trying to achieve. You can use the following code, though, to calculate your overlapping days etc. I did it the way you asked, but I would recommend a separate table, such as a time dimension, to produce a "cleaner" solution.
/*sample data set in temp table*/
select '000001' as id, '2017-12-12' as start_dt, '2018-01-03' as end_dt into #tmp union
select '000002' as id, '2018-01-13' as start_dt, null as end_dt union
select '000003' as id, '2018-01-02' as start_dt, '2018-01-11' as end_dt
/*calculate week numbers and week diff according to dates*/
select *,
DATEPART(WK,start_dt) as start_weekNumber,
DATEPART(WK,end_dt) as end_weekNumber,
case
when DATEPART(WK,end_dt) - DATEPART(WK,start_dt) >= 0 then (DATEPART(WK,end_dt) - DATEPART(WK,start_dt)) +1 --same-year range, including start and end in the same week
else (52 - DATEPART(WK,start_dt)) + DATEPART(WK,end_dt)
end as WeekDiff
into #tmp1
from
(
SELECT *,DATEADD(DAY, 2 - DATEPART(WEEKDAY, start_dt), CAST(start_dt AS DATE)) [start_dt_Week_Start_Date],
DATEADD(DAY, 8 - DATEPART(WEEKDAY, start_dt), CAST(start_dt AS DATE)) [startdt_Week_End_Date],
DATEADD(DAY, 2 - DATEPART(WEEKDAY, end_dt), CAST(end_dt AS DATE)) [end_dt_Week_Start_Date],
DATEADD(DAY, 8 - DATEPART(WEEKDAY, end_dt), CAST(end_dt AS DATE)) [end_dt_Week_End_Date]
from #tmp
) s
/*cte used to create duplicates when week diff is over 1*/
;with x as
(
SELECT TOP (10) rn = ROW_NUMBER() --modify the max you want
OVER (ORDER BY [object_id])
FROM sys.all_columns
ORDER BY [object_id]
)
/*final query*/
select --*
ID,
start_weekNumber+ (r-1) as Week,
DATEPART(YY,start_dt) as [YEAR],
start_dt,
end_dt,
null as Overlap,
null as unavailable_days
from
(
select *,
ROW_NUMBER() over (partition by id order by id) r
from
(
select d.* from x
CROSS JOIN #tmp1 AS d
WHERE x.rn <= d.WeekDiff
union all
select * from #tmp1
where WeekDiff is null
) a
)a_ext
order by id,start_weekNumber
--drop table #tmp1,#tmp
The above will produce the results you want, except for the Overlap and unavailable_days columns. Instead of just counting weeks, I used the week number within the year based on start_dt, but you can change that if you don't like it:
ID Week YEAR start_dt end_dt Overlap unavailable_days
000001 50 2017 2017-12-12 2018-01-03 NULL NULL
000001 51 2017 2017-12-12 2018-01-03 NULL NULL
000001 52 2017 2017-12-12 2018-01-03 NULL NULL
000002 2 2018 2018-01-13 NULL NULL NULL
000003 1 2018 2018-01-02 2018-01-11 NULL NULL
000003 2 2018 2018-01-02 2018-01-11 NULL NULL
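The two columns left as NULL above come down to clipping each date range to the week window. The following Python sketch is one possible way to express that arithmetic, not the answerer's method: the function names clip, overlap_days and unavailable_days are hypothetical, a missing end date is treated as open-ended, and the week boundaries are the ones declared in the question (2018-01-01 to 2018-01-07 and 2018-01-08 to 2018-01-14).

```python
from datetime import date

def clip(start, end, week_start, week_end):
    """Clip [start, end] to the week; end=None means the range is still open."""
    first = max(start, week_start)
    last = week_end if end is None else min(end, week_end)
    return first, last

def overlap_days(start, end, week_start, week_end):
    """Number of days the range shares with the week (0 if they do not meet)."""
    first, last = clip(start, end, week_start, week_end)
    return max((last - first).days + 1, 0)

def unavailable_days(unavailable, start, end, week_start, week_end):
    """How many of the unavailable dates fall inside the clipped range."""
    first, last = clip(start, end, week_start, week_end)
    return sum(1 for d in unavailable if first <= d <= last)

# Week windows taken from the question's variable declarations.
WEEK1 = (date(2018, 1, 1), date(2018, 1, 7))
WEEK2 = (date(2018, 1, 8), date(2018, 1, 14))

# ID 000003 from the sample data: 02/01/2018 to 11/01/2018.
id3_unavailable = [date(2018, 1, 3), date(2018, 1, 4), date(2018, 1, 8)]
print(overlap_days(date(2018, 1, 2), date(2018, 1, 11), *WEEK1))
print(unavailable_days(id3_unavailable, date(2018, 1, 2), date(2018, 1, 11), *WEEK1))
```

The same expressions translate to SQL as a difference between the later of the two start dates and the earlier of the two end dates, plus one.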
I have a view like this:
Year | Month | Week | Category | Value |
2017 | 1 | 1 | A | 1
2017 | 1 | 1 | B | 2
2017 | 1 | 1 | C | 3
2017 | 1 | 2 | A | 4
2017 | 1 | 2 | B | 5
2017 | 1 | 2 | C | 6
2017 | 1 | 3 | A | 7
2017 | 1 | 3 | B | 8
2017 | 1 | 3 | C | 9
2017 | 1 | 4 | A | 10
2017 | 1 | 4 | B | 11
2017 | 1 | 4 | C | 12
2017 | 2 | 5 | A | 1
2017 | 2 | 5 | B | 2
2017 | 2 | 5 | C | 3
2017 | 2 | 6 | A | 4
2017 | 2 | 6 | B | 5
2017 | 2 | 6 | C | 6
2017 | 2 | 7 | A | 7
2017 | 2 | 7 | B | 8
2017 | 2 | 7 | C | 9
2017 | 2 | 8 | A | 10
2017 | 2 | 8 | B | 11
2017 | 2 | 8 | C | 12
I need to create a new view that shows the average of the value column (let's call it avg_val) and the value from the max week of the month (max_val_of_month). For example, the max week of January is 4, so the value for category A is 10. To be clear, something like this:
Year | Month | Category | avg_val | max_val_of_month
2017 | 1 | A | 5.5 | 10
2017 | 1 | B | 6.5 | 11
2017 | 1 | C | 7.5 | 12
2017 | 2 | A | 5.5 | 10
2017 | 2 | B | 6.5 | 11
2017 | 2 | C | 7.5 | 12
I have used a window function (OVER PARTITION BY year, month, category) to get the average value. But how can I get the value of the max week of each month?
Assuming that you need a monthly average and the value for the max week, not the max value per month:
SELECT year, month, category, avg_val, value max_week_val
FROM (
SELECT *,
AVG(value) OVER (PARTITION BY year, month, category) avg_val,
ROW_NUMBER() OVER (PARTITION BY year, month, category ORDER BY week DESC) rn
FROM view1
) q
WHERE rn = 1
ORDER BY year, month, category
or a more verbose version without window functions:
SELECT q.year, q.month, q.category, q.avg_val, v.value max_week_val
FROM (
SELECT year, month, category, avg(value) avg_val, MAX(week) max_week
FROM view1
GROUP BY year, month, category
) q JOIN view1 v
ON q.year = v.year
AND q.month = v.month
AND q.category = v.category
AND q.max_week = v.week
ORDER BY year, month, category
Here is a dbfiddle demo for both queries
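As a quick check of the join-based version against the expected output in the question, here is a minimal Python sketch that runs it on an in-memory SQLite database. SQLite is an assumption for the demo, and the sample rows are regenerated with a small formula that reproduces the question's table (weeks 1 to 4 in month 1, weeks 5 to 8 in month 2, values 1 to 12 repeating per month).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE view1 (year INTEGER, month INTEGER, week INTEGER,"
    " category TEXT, value INTEGER)"
)

# Rebuild the question's 24 sample rows.
sample = [
    (2017, (week - 1) // 4 + 1, week, cat, ((week - 1) % 4) * 3 + offset)
    for week in range(1, 9)
    for offset, cat in enumerate("ABC", start=1)
]
conn.executemany("INSERT INTO view1 VALUES (?, ?, ?, ?, ?)", sample)

summary = conn.execute("""
    SELECT q.year, q.month, q.category, q.avg_val, v.value AS max_week_val
    FROM (
        -- per group: the average and the number of the last week
        SELECT year, month, category,
               AVG(value) AS avg_val, MAX(week) AS max_week
        FROM view1
        GROUP BY year, month, category
    ) AS q
    JOIN view1 AS v
      ON  q.year = v.year
      AND q.month = v.month
      AND q.category = v.category
      AND q.max_week = v.week      -- pick the row belonging to that last week
    ORDER BY q.year, q.month, q.category
""").fetchall()

for row in summary:
    print(row)
```

The join back to view1 assumes at most one row per (year, month, category, week); duplicates in the last week would produce duplicate output rows.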
And here is my NEW version.
My thanks to @peterm for pointing out that the previous value of val_from_max_week_of_month was wrong. I have corrected it:
SELECT
a.Year,
a.Month,
a.Category,
max(a.Week) AS max_week,
AVG(a.Value) AS avg_val,
(
SELECT b.Value
FROM decades AS b
WHERE
b.Year = a.Year AND
b.Month = a.Month AND
b.Week = max(a.Week) AND
b.Category = a.Category
) AS val_from_max_week_of_month
FROM decades AS a
GROUP BY
a.Year,
a.Month,
a.Category
;
First, check how you handle the first week of January. If 1 January is not a Monday, there are several interpretations, and not every one of them fits the solutions here. You'll need to use either:
the ISO week concept, i.e. the week column holds the ISO week and the year column holds the ISO year (the week-year, rather). Note: in this concept, 1 January sometimes belongs to the previous year;
or your own concept, where the first week of the year is "split" in two if 1 January is not a Monday.
Note: the solutions below will not work if (in your table) the first week of January can be 52 or 53.
avg_val is just a simple aggregation, while max_val_of_month can be calculated with a typical greatest-n-per-group query. There are many possible solutions in PostgreSQL, with varying performance. Fortunately, your query has an easily determined selectivity: you'll always need (approx.) a quarter of your data.
The usual winners (in performance) are the array_agg() and DISTINCT ON variants.
(This is no surprise, as these two should perform better and better the larger the portion of the original data you need.)
array_agg() with order by variant:
select year, month, category, avg(value) avg_val,
(array_agg(value order by week desc))[1] max_val_of_month
from table_name
group by year, month, category;
distinct on variant:
select distinct on (year, month, category) year, month, category,
avg(value) over (partition by year, month, category) avg_val,
value max_val_of_month
from table_name
order by year, month, category, week desc;
The pure window function variant is not that bad either:
row_number() variant:
select year, month, category, avg_val, max_val_of_month
from (select year, month, category, value max_val_of_month,
avg(value) over (partition by year, month, category) avg_val,
row_number() over (partition by year, month, category order by week desc) rn
from table_name) w
where rn = 1;
But the LATERAL variant is only viable with an index:
LATERAL variant:
create index idx_table_name_year_month_category_week_desc
on table_name(year, month, category, week desc);
select year, month, category,
avg(value) avg_val,
max_val_of_month
from table_name t
cross join lateral (select value max_val_of_month
from table_name
where (year, month, category) = (t.year, t.month, t.category)
order by week desc
limit 1) m
group by year, month, category, max_val_of_month;
But most of the solutions above can actually utilize this index, not just this last one.
Without the index: http://rextester.com/WNEL86809
With the index: http://rextester.com/TYUA52054
with data (yr, mnth, wk, cat, val) as
(
-- begin test data
select 2017 , 1 , 1 , 'A' , 1 from dual union all
select 2017 , 1 , 1 , 'B' , 2 from dual union all
select 2017 , 1 , 1 , 'C' , 3 from dual union all
select 2017 , 1 , 2 , 'A' , 4 from dual union all
select 2017 , 1 , 2 , 'B' , 5 from dual union all
select 2017 , 1 , 2 , 'C' , 6 from dual union all
select 2017 , 1 , 3 , 'A' , 7 from dual union all
select 2017 , 1 , 3 , 'B' , 8 from dual union all
select 2017 , 1 , 3 , 'C' , 9 from dual union all
select 2017 , 1 , 4 , 'A' , 10 from dual union all
select 2017 , 1 , 4 , 'B' , 11 from dual union all
select 2017 , 1 , 4 , 'C' , 12 from dual union all
select 2017 , 2 , 5 , 'A' , 1 from dual union all
select 2017 , 2 , 5 , 'B' , 2 from dual union all
select 2017 , 2 , 5 , 'C' , 3 from dual union all
select 2017 , 2 , 6 , 'A' , 4 from dual union all
select 2017 , 2 , 6 , 'B' , 5 from dual union all
select 2017 , 2 , 6 , 'C' , 6 from dual union all
select 2017 , 2 , 7 , 'A' , 7 from dual union all
select 2017 , 2 , 8 , 'A' , 10 from dual union all
select 2017 , 2 , 8 , 'B' , 11 from dual union all
select 2017 , 2 , 7 , 'B' , 8 from dual union all
select 2017 , 2 , 7 , 'C' , 9 from dual union all
select 2018 , 2 , 7 , 'C' , 9 from dual union all
select 2017 , 2 , 8 , 'C' , 12 from dual
-- end test data
)
select * from
(
select
-- data.*: all columns of the data table
data.*,
-- avrg: partition by a combination of year,month and category to work out -
-- the avg for each category in a month of a year
avg(val) over (partition by yr, mnth, cat) avrg,
-- mwk: partition by year and month to work out -
-- the max week of a month in a year
max(wk) over (partition by yr, mnth) mwk
from
data
)
-- as OP's interest is in the max week of each month of a year, -
-- "wk" column value is matched against
-- the derived column "mwk"
where wk = mwk
order by yr,mnth,cat;
I am trying to do a running total for some data, and have seen the easy way to do it. However, I have already grouped some data and this is throwing off my code. I currently have dates and payment types, and the totals that it relates to.
What I have at the moment is:
create table #testdata
(
mdate date,
pmttype varchar(64),
totalpmtamt int
)
insert into #testdata
select getdate()-7, 'DD', 10
union
select getdate() -7, 'SO', 12
union
select getdate()-6, 'DD', 3
union
select getdate()-5, 'DD', 13
union
select getdate()-5, 'SO', 23
union
select getdate()-5, 'PO', 8
What I want to have is:
mdate | paymenttype | totalpmtamt | incrtotal
2016-08-29 | DD | 10 | 10
2016-08-29 | SO | 12 | 22
2016-08-30 | DD | 3 | 25
2016-08-31 | DD | 13 | 38
2016-08-31 | SO | 8 | 46
2016-08-31 | PO | 23 | 69
I've tried adapting other code I've found here into:
select t1.mdate,
t1.pmttype,
t1.totalpmtamt,
SUM(t2.totalpmtamt) as runningsum
from #testdata t1
join #testdata t2 on t1.mdate >= t2.mdate and t1.pmttype >= t2.pmttype
group by t1.mdate, t1.pmttype, t1.totalpmtamt
order by t1.mdate
but all I get is
mdate | paymenttype | totalpmtamt | incrtotal
2016-08-29 | DD | 10 | 10
2016-08-29 | SO | 12 | 22
2016-08-30 | DD | 3 | 13
2016-08-31 | DD | 13 | 26
2016-08-31 | SO | 8 | 34
2016-08-31 | PO | 23 | 69
Can anyone help please?
The ANSI standard way of doing a cumulative sum is:
select t.*, sum(totalpmtamt) over (order by mdate) as runningsum
from #testdata t
order by t.mdate;
Not all databases support this functionality.
If your database doesn't support that functionality, I would go for a correlated subquery:
select t.*,
(select sum(t2.totalpmtamt)
from #testdata t2
where t2.mdate <= t.mdate
) as runningsum
from #testdata t
order by t.mdate;
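One thing to watch with a date-only comparison is that rows sharing the same mdate all receive the same total. The following Python sketch, run on an in-memory SQLite database, adds pmttype as a tie-breaker inside the correlated subquery. SQLite, the plain table name testdata, and the choice of pmttype as the tie-breaker are assumptions for the demo (any column that makes the ordering unique would do); the dates are the fixed ones from the question's expected output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE testdata (mdate TEXT, pmttype TEXT, totalpmtamt INTEGER);
    INSERT INTO testdata VALUES
        ('2016-08-29', 'DD', 10), ('2016-08-29', 'SO', 12),
        ('2016-08-30', 'DD', 3),
        ('2016-08-31', 'DD', 13), ('2016-08-31', 'SO', 23), ('2016-08-31', 'PO', 8);
""")

running = conn.execute("""
    SELECT t.mdate, t.pmttype, t.totalpmtamt,
           (SELECT SUM(t2.totalpmtamt)
            FROM testdata AS t2
            -- every row strictly earlier, plus same-day rows up to this pmttype
            WHERE t2.mdate < t.mdate
               OR (t2.mdate = t.mdate AND t2.pmttype <= t.pmttype)
           ) AS runningsum
    FROM testdata AS t
    ORDER BY t.mdate, t.pmttype
""").fetchall()

for row in running:
    print(row)
```

With the tie-breaker in place, each row gets its own cumulative value and the last row carries the grand total of 69.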
Use the below query for the desired result (for SQL Server).
with cte_1
as
(SELECT *,ROW_NUMBER() OVER(order by mdate ) RNO
FROM #testdata)
SELECT mdate,pmttype,totalpmtamt,(select sum(c2.totalpmtamt)
from cte_1 c2
where c2.RNO <= c1.RNO
) as incrtotal
FROM cte_1 c1
Sounds like SQL Server.
DECLARE @testdata TABLE
(
mdate DATE ,
pmttype VARCHAR(64) ,
totalpmtamt INT
);
INSERT INTO @testdata
( mdate, pmttype, totalpmtamt )
VALUES ( GETDATE() - 7, 'DD', 10 ),
( GETDATE() - 7, 'SO', 12 ),
( GETDATE() - 6, 'DD', 3 ),
( GETDATE() - 5, 'DD', 13 ),
( GETDATE() - 5, 'SO', 23 ),
( GETDATE() - 5, 'PO', 8 );
SELECT *,
SUM(totalpmtamt) OVER ( ORDER BY mdate ROWS UNBOUNDED PRECEDING )
AS RunningTotal
FROM @testdata t;