I have a table with the columns Age, Period and Year. The column Age always starts with 0 and doesn't have a fixed maximum value (I used 'Age' 0 to 30 in this example but the range could also be 0 to 100 etc.), the values Period and Year only appear in certain rows at certain ages.
However at what Age the values for Period and Year appear, changes and the solution should therefore be dynamic. What is the best way to fill in the NULL values with correct Period and Year?
I am using SQL Server.
Age Period Year
-----------------
0 NULL NULL
1 NULL NULL
2 NULL NULL
3 NULL NULL
4 NULL NULL
5 NULL NULL
6 NULL NULL
7 NULL NULL
8 NULL NULL
9 NULL NULL
10 NULL NULL
11 NULL NULL
12 NULL NULL
13 NULL NULL
14 NULL NULL
15 NULL NULL
16 NULL NULL
17 NULL NULL
18 NULL NULL
19 NULL NULL
20 NULL NULL
21 46 2065
22 NULL NULL
23 NULL NULL
24 NULL NULL
25 NULL NULL
26 51 2070
27 NULL NULL
28 NULL NULL
29 NULL NULL
30 NULL NULL
The result should look like this, the numbers for Period and Year should be increased and/or decrease from the last known values for Period and Year.
Age Period Year
-----------------
0 25 2044
1 26 2045
2 27 2046
3 28 2047
4 29 2048
5 30 2049
6 31 2050
7 32 2051
8 33 2052
9 34 2053
10 35 2054
11 36 2055
12 37 2056
13 38 2057
14 39 2058
15 40 2059
16 41 2060
17 42 2061
18 43 2062
19 44 2063
20 45 2064
21 46 2065
22 47 2066
23 48 2067
24 49 2068
25 50 2069
26 51 2070
27 52 2071
28 53 2072
29 54 2073
30 55 2074
Here is an UPDATE to my question as I didn't specify my requirement detailed enough:
The solution should be able to handle different combinations of Age, Period and Year. My start point will always be a known Age, Period and Year combination. However, the combination Age = 21, Period = 46 and Year = 2065 (or 26|51|2070 as the second combination) in my example is not static. The value at Age = 21 could be anything e.g. Period = 2 and Year = 2021. Whatever the combination (Age, Period, Year) is, the solution should fill in the gaps and finish the sequence counting up and down from the known values for Period and Year. If a Period value sequence becomes negative the solutions should return NULL values, if possible.
Seem you have always the same increment for age and year
so
select age, isnull(period,age +25) Period, isnull(year,age+44) year
from yourtable
or the standard function coalesce (as suggested by Gordon Linoff)
select age, coalesce(period,age +25) Period, coalesce(year,age+44) year
from yourtable
Tabel creation code
create table yourtable ( AGE int , Period int, Year int )
insert into yourtable
Select 0 AS AGE , null As Period , null As Year UNION all
Select 1 AS AGE , null As Period , null As Year UNION all
Select 2 AS AGE , null As Period , null As Year UNION all
Select 3 AS AGE , null As Period , null As Year UNION all
Select 4 AS AGE , null As Period , null As Year UNION all
Select 5 AS AGE , null As Period , null As Year UNION all
Select 6 AS AGE , null As Period , null As Year UNION all
Select 7 AS AGE , null As Period , null As Year UNION all
Select 8 AS AGE , null As Period , null As Year UNION all
Select 9 AS AGE , null As Period , null As Year UNION all
Select 10 AS AGE , null As Period , null As Year UNION all
Select 11 AS AGE , null As Period , null As Year UNION all
Select 12 AS AGE , null As Period , null As Year UNION all
Select 13 AS AGE , null As Period , null As Year UNION all
Select 14 AS AGE , null As Period , null As Year UNION all
Select 15 AS AGE , null As Period , null As Year UNION all
Select 16 AS AGE , null As Period , null As Year UNION all
Select 17 AS AGE , null As Period , null As Year UNION all
Select 18 AS AGE , null As Period , null As Year UNION all
Select 19 AS AGE , null As Period , null As Year UNION all
Select 20 AS AGE , null As Period , null As Year UNION all
Select 21 AS AGE ,46 As Period ,2065 As Year UNION all
Select 22 AS AGE , null As Period , null As Year UNION all
Select 23 AS AGE , null As Period , null As Year UNION all
Select 24 AS AGE , null As Period , null As Year UNION all
Select 25 AS AGE , 51 As Period ,2070 As Year UNION all
Select 26 AS AGE , null As Period , null As Year UNION all
Select 27 AS AGE , null As Period , null As Year UNION all
Select 28 AS AGE , null As Period , null As Year UNION all
Select 29 AS AGE , null As Period , null As Year UNION all
Select 30 AS AGE , null As Period , null As Year
**Steps **
We need to get one row with non null value for Period and year.
Using age get first value for both the column .
Now just add respective age column value and fill full table .
Code to fix the serial
;with tmp as
(select top 1 * from yourtable where Period is not null and year is not null)
update yourtable
set Period = (tmp.Period - tmp.age) + yourtable.age
, year = (tmp.year - tmp.age) + yourtable.age
from yourtable , tmp
OR
Declare #age int ,#Year int ,#Period int
select #age = age , #Year = year - (age +1) ,#Period = Period- (AGE +1)
from yourtable where Period is not null and year is not null
update yourtable
set Period =#Period + age
,Year =#year + age
from yourtable
You finally want three sequences with different start values. Then you simply need to calculate an offset and add it to age:
with cte as
(
select age
,max(period - age) over () + age as period -- adjusted period
,max(yr - age) over () + age as yr -- adjusted yr
from #yourtable
)
select age
-- If a Period value sequence becomes negative the solutions should return NULL
,case when period >0 then period end as period
,yr
from cte
See fiddle
-- hope you can manage the syntax error. but some logic like given below should work in this case where we can make period an origin to calculate other missing values. good luck!
declare #knownperiod int;
declare #knownperiodage int;
declare #agetop int;
declare #agebottom int;
#knownperiod = select top 1 period from table1 where period is not null
#knownperiodage = select top 1 age from table1 where period is not null
while(#knownperiodage >= 0)
begin
#knownperiod = #knownperiod -1 ;
#knownperiodage = #knownperiodage -1;
update table1 set period = #knownperiod, year = YEAR(GetDate())+#knownperiod-1 where age = #knownperiodage
end
-- now for bottom age
#knownperiod = select top 1 period from table1 where period is null or year is null
#knownperiodage = select top 1 age from table1 where period is null or year is null
while(#knownperiodage <= (Select max(age) from table1))
begin
#knownperiod = #knownperiod +1 ;
#knownperiodage = #knownperiodage +1;
update table1 set period = #knownperiod, year = YEAR(GetDate())+#knownperiod-1 where age = #knownperiodage
end
Is the process to first calculate the increments (age -> period and age -> year) then simply add those increments to the age values?
This assumes the differences between age and period, and age and year, are consistent across rows (just not filled in sometimes).
As such, you could use the following to first calculate the increments (PeriodInc, YrInc) and then select the values with the increments added (noting that if period goes negative, it gets NULL).
; WITH PeriodInc AS (SELECT TOP 1 Period - Age AS PeriodInc FROM #yourtable WHERE Period IS NOT NULL),
YrInc AS (SELECT TOP 1 Yr - Age AS YrInc FROM #yourtable WHERE Yr IS NOT NULL)
SELECT Age,
CASE WHEN (Age + PeriodInc) >= 0 THEN (Age + PeriodInc) ELSE NULL END AS Period,
Age + YrInc AS Yr
FROM #yourtable
CROSS JOIN PeriodInc
CROSS JOIN YrInc
Here is a DB_Fiddle with the code
This solution takes 4 inputs:
#list_length -- (integer) the number of rows to generate (up to 12^5=248,832)
#start_age -- (integer) beginning age
#start_period -- (integer) beginning period
#start_year -- (integer) beginning year
For any combination of inputs this code generates the requested output. If either the Age or Year is calculated to be negative then it is converted to NULL. The current limit to the list length could be increased to whatever is necessary. The technique of creating a row_number using cross applied rows is known to be very fast when generating large sequences. Above about 500 rows it's always faster than a recursion based CTE. At small row numbers there's little to no performance difference between the two techniques.
Here are the code and output to match the example data.
Inputs
declare
#list_length int=31,
#start_age int=21,
#start_period int=46,
#start_year int=2065;
Code
with
n(n) as (select * from (values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12)) v(n)),
tally_cte(n) as (
select row_number() over (order by (select null))
from n n1 cross join n n2 cross join n n3 cross join n n4 cross join n n5)
select p.Age,
case when p.[Period]<0 then null else p.[Period] end [Period],
case when p.[Year]<0 then null else p.[Year] end [Year]
from tally_cte t
cross apply
(select (t.n-1) [Age], (t.n-1)+(#start_period-#start_age) [Period],
(t.n-1)+(#start_year-#start_age) [Year]) p
where n<=#list_length;
Output
Age Period Year
0 25 2044
1 26 2045
2 27 2046
3 28 2047
4 29 2048
5 30 2049
6 31 2050
7 32 2051
8 33 2052
9 34 2053
10 35 2054
11 36 2055
12 37 2056
13 38 2057
14 39 2058
15 40 2059
16 41 2060
17 42 2061
18 43 2062
19 44 2063
20 45 2064
21 46 2065
22 47 2066
23 48 2067
24 49 2068
25 50 2069
26 51 2070
27 52 2071
28 53 2072
29 54 2073
30 55 2074
Suppose both the Period and the Year are less than the start Age. When the calculated values are negative the value is replaced with a NULL.
Inputs
declare
#list_length int=100,
#start_age int=10,
#start_period int=5,
#start_year int=8;
Output
Age Period Year
0 NULL NULL
1 NULL NULL
2 NULL 0
3 NULL 1
4 NULL 2
5 0 3
6 1 4
7 2 5
8 3 6
9 4 7
10 5 8
11 6 9
12 7 10
...
99 94 97
Imo this is a flexible and efficient way to meet all of the requirements. Please let me know if there are any issues.
This reads like a gaps-and-islands problem, where "empty" rows are the gaps and non-empty rows are the islands.
You want to fill the gaps. Your question is a bit tricky, because you do not clearly describe how to proceed when a gap row has both preceding and following islands - and what to do if they are not consistent.
Let me assume that you want to derive the value from the following island if there is one available, and fall back of the precedng island.
Here is an approach using lateral joins to retrieve the next and preceding non-empty row:
select t.age,
coalesce(t.period, n.period - n.diff, p.period - p.diff) period,
coalesce(t.year, n.year - n.diff, p.year - p.diff) year
from mytable t
outer apply (
select top (1) t1.*, t1.age - t.age diff
from mytable t1
where t1.age > t.age and t1.period is not null and t1.year is not null
order by t1.age
) n
outer apply (
select top (1) t1.*, t1.age - t.age diff
from mytable t1
where t1.age < t.age and t1.period is not null and t1.year is not null
order by t1.age desc
) p
order by t.age
Actually, this would probably be more efficiently performed with window functions. We can implement the very same logic by building groups of records with window counts, then doing the computation within the groups:
select
age,
coalesce(
period,
max(period) over(partition by grp2) - max(age) over(partition by grp2) + age,
max(period) over(partition by grp1) - min(age) over(partition by grp1) + age
) period,
coalesce(
year,
max(year) over(partition by grp2) - max(age) over(partition by grp2) + age,
max(year) over(partition by grp1) - min(age) over(partition by grp1) + age
) year
from (
select t.*,
count(period) over(order by age) grp1,
count(period) over(order by age desc) grp2
from mytable t
) t
order by age
Demo on DB Fiddle - both queries yield:
age | period | year
--: | -----: | ---:
0 | 25 | 2044
1 | 26 | 2045
2 | 27 | 2046
3 | 28 | 2047
4 | 29 | 2048
5 | 30 | 2049
6 | 31 | 2050
7 | 32 | 2051
8 | 33 | 2052
9 | 34 | 2053
10 | 35 | 2054
11 | 36 | 2055
12 | 37 | 2056
13 | 38 | 2057
14 | 39 | 2058
15 | 40 | 2059
16 | 41 | 2060
17 | 42 | 2061
18 | 43 | 2062
19 | 44 | 2063
20 | 45 | 2064
21 | 46 | 2065
22 | 47 | 2066
23 | 48 | 2067
24 | 49 | 2068
25 | 50 | 2069
26 | 51 | 2070
27 | 52 | 2071
28 | 53 | 2072
29 | 54 | 2073
30 | 55 | 2074
Also you can use recursive CTE (it can handle any variation of data in the table except only one that has no populated period and year at all):
WITH cte AS ( -- get any filled period and year
SELECT TOP 1 period - age delta,
[year]-period start_year
FROM tablename
WHERE period is not null and [year] is not null
), seq AS ( --get min and max age values
SELECT MIN(age) as min_age, MAX(age) as max_age
FROM tablename
), go_recursive AS (
SELECT min_age age,
min_age+delta period ,
start_year+min_age+delta year,
max_age
FROM seq
CROSS JOIN cte --That will generate the initial first row
UNION ALL
SELECT age + 1,
period +1,
year + 1,
max_age
FROM go_recursive
WHERE age < max_age --This part increments the data from first row
)
SELECT age,
period,
[year]
FROM go_recursive
OPTION (MAXRECURSION 0)
-- If you know there are some limit of rows in that kind of tables
--use this row count instead 0
Related
I want to count the distinct amount of users over the last 60 days, and then, count the distinct amount of users over the last 59 days, and so on and so forth.
Ideally, the output would look like this (TARGET OUTPUT)
Day Distinct Users
60 200
59 200
58 188
57 185
56 180
[...] [...]
where 60 days is the max total possible distinct users, and then 59 would have a little less and so on and so forth.
my query looks like this.
select
count(distinct (case when datediff(day,DATE,current_date) <= 60 then USER_ID end)) as day_60,
count(distinct (case when datediff(day,DATE,current_date) <= 59 then USER_ID end)) as day_59,
count(distinct (case when datediff(day,DATE,current_date) <= 58 then USER_ID end)) as day_58
FROM Table
The issue with my query is that This outputs the data by column instead of by rows (like shown below) AND, most importantly, I have to write out this logic 60x for each of the 60 days.
Current Output:
Day_60 Day_59 Day_58
209 207 207
Is it possible to write the SQL in a way that creates the target as shown initially above?
Using below data in CTE format -
with data_cte(dates,userid) as
(select * from values
('2022-05-01'::date,'UID1'),
('2022-05-01'::date,'UID2'),
('2022-05-02'::date,'UID1'),
('2022-05-02'::date,'UID2'),
('2022-05-03'::date,'UID1'),
('2022-05-03'::date,'UID2'),
('2022-05-03'::date,'UID3'),
('2022-05-04'::date,'UID1'),
('2022-05-04'::date,'UID1'),
('2022-05-04'::date,'UID2'),
('2022-05-04'::date,'UID3'),
('2022-05-04'::date,'UID4'),
('2022-05-05'::date,'UID1'),
('2022-05-06'::date,'UID1'),
('2022-05-07'::date,'UID1'),
('2022-05-07'::date,'UID2'),
('2022-05-08'::date,'UID1')
)
Query to get all dates and count and distinct counts -
select dates,count(userid) cnt, count(distinct userid) cnt_d
from data_cte
group by dates;
DATES
CNT
CNT_D
2022-05-01
2
2
2022-05-02
2
2
2022-05-03
3
3
2022-05-04
5
4
2022-05-05
1
1
2022-05-06
1
1
2022-05-08
1
1
2022-05-07
2
2
Query to get difference of date from current date
select dates,datediff(day,dates,current_date()) ddiff,
count(userid) cnt,
count(distinct userid) cnt_d
from data_cte
group by dates;
DATES
DDIFF
CNT
CNT_D
2022-05-01
45
2
2
2022-05-02
44
2
2
2022-05-03
43
3
3
2022-05-04
42
5
4
2022-05-05
41
1
1
2022-05-06
40
1
1
2022-05-08
38
1
1
2022-05-07
39
2
2
Get records with date difference beyond a certain range only -
include clause having
select datediff(day,dates,current_date()) ddiff,
count(userid) cnt,
count(distinct userid) cnt_d
from data_cte
group by dates
having ddiff<=43;
DDIFF
CNT
CNT_D
43
3
3
42
5
4
41
1
1
39
2
2
38
1
1
40
1
1
If you need to prefix 'day' to each date diff count, you can
add and outer query to previously fetched data-set and add the needed prefix to the date diff column as following -
I am using CTE syntax, but you may use sub-query given you will select from table -
,cte_1 as (
select datediff(day,dates,current_date()) ddiff,
count(userid) cnt,
count(distinct userid) cnt_d
from data_cte
group by dates
having ddiff<=43)
select 'day_'||to_char(ddiff) days,
cnt,
cnt_d
from cte_1;
DAYS
CNT
CNT_D
day_43
3
3
day_42
5
4
day_41
1
1
day_39
2
2
day_38
1
1
day_40
1
1
Updated the answer to get distinct user count for number of days range.
A clause can be included in the final query to limit to number of days needed.
with data_cte(dates,userid) as
(select * from values
('2022-05-01'::date,'UID1'),
('2022-05-01'::date,'UID2'),
('2022-05-02'::date,'UID1'),
('2022-05-02'::date,'UID2'),
('2022-05-03'::date,'UID5'),
('2022-05-03'::date,'UID2'),
('2022-05-03'::date,'UID3'),
('2022-05-04'::date,'UID1'),
('2022-05-04'::date,'UID6'),
('2022-05-04'::date,'UID2'),
('2022-05-04'::date,'UID3'),
('2022-05-04'::date,'UID4'),
('2022-05-05'::date,'UID7'),
('2022-05-06'::date,'UID1'),
('2022-05-07'::date,'UID8'),
('2022-05-07'::date,'UID2'),
('2022-05-08'::date,'UID9')
),cte_1 as
(select datediff(day,dates,current_date()) ddiff,userid
from data_cte), cte_2 as
(select distinct ddiff from cte_1 )
select cte_2.ddiff,
(select count(distinct userid)
from cte_1 where cte_1.ddiff <= cte_2.ddiff) cnt
from cte_2
order by cte_2.ddiff desc
DDIFF
CNT
47
9
46
9
45
9
44
8
43
5
42
4
41
3
40
1
You can do unpivot after getting your current output.
sample one.
select
*
from (
select
209 Day_60,
207 Day_59,
207 Day_58
)unpivot ( cnt for days in (Day_60,Day_59,Day_58));
I have this data, where I want to generate the last row "on the fly" from the first two:
Group
1yr
2yrs
3yrs
date
code
Port
19
-15
88
1/1/2020
arp
Bench
10
-13
66
1/1/2020
arb
Diff
9
2
22
I am trying to subtract the Port & Bench returns and have the difference on the new row. How can I do this?
Here's my code so far:
Select
date
Group,
Code,
1 yr returnp,
2 yrs returnp,
3yrs return
From timetable
union
Select
date,
Group,
Code,
1 yr returnb,
2 yrs returnb,
3yrs returnb
From timetable
Seems to me that a UNION ALL in concert with a conditional aggregation should do the trick
Note the sum() is wrapped in an abs() to match desired results
Select *
From YourTable
Union All
Select [Group] = 'Diff'
,[1yr] = abs(sum([1yr] * case when [Group]='Bench' then -1 else 1 end))
,[2yrs] = abs(sum([2yrs] * case when [Group]='Bench' then -1 else 1 end))
,[3yrs] = abs(sum([3yrs] * case when [Group]='Bench' then -1 else 1 end))
,[date] = null
,[code] = null
from YourTable
Results
Group 1yr 2yrs 3yrs date code
Port 19 -15 88 2020-01-01 arp
Bench 10 -13 66 2020-01-01 arb
Diff 9 2 22 NULL NULL
If you know there is always 2 rows, something like this would work
SELECT * FROM timetable
UNION ALL
SELECT
MAX(1yr) - MIN(1yr),
MAX(2yrs) - MIN(2yrs),
MAX(3yrs) - MIN(3yrs),
null,
null,
FROM timetable
My database is an amazon redshift.
I have a table that looks like this -
id
nested_id
date
value
1
10
'2021-01-01'
5
1
20
'2021-01-01'
10
1
10
'2021-01-02'
6
1
20
'2021-01-02'
11
1
10
'2021-01-03'
7
1
20
'2021-01-03'
12
2
30
'2021-01-01'
5
2
40
'2021-01-01'
10
2
30
'2021-01-02'
6
2
40
'2021-01-02'
11
2
30
'2021-01-03'
7
2
40
'2021-01-03'
12
So this is basically a table that tracks values by id over time, except for every id there can be a nested_id. And the dates and values are primarily connected to the nested_id.
However, let's say I'm starting with the id field, but for each id I want to only return the points over time for the nested_id that has the greater sum of points.
So right now I'm just grabbing it like this...
select *
from mytable
where id in (1, 2)
except I only want it to return nested_id rows where the maximum value of that nested_id is the greatest.
So here's how I would do this manually.
For id of 1, the maximum value is 12, and the nested_id of that value is 20
For id of 2, the maximum value is 12, and the nested_id of that value is 40
So my return table should be
id
nested_id
date
value
1
20
'2021-01-01'
10
1
20
'2021-01-02'
11
1
20
'2021-01-03'
12
2
40
'2021-01-01'
10
2
40
'2021-01-02'
11
2
40
'2021-01-03'
12
Is there an easy way of performing this query? I'm assuming you have to partition somehow?
You can solve this with row_number window functions
with maxs as (
select id,
nested_id,
value,
row_number() over (partition by id order by value desc) rn
from mytable
)
select mt.*
from mytable mt
left join maxs on mt.id = maxs.id and mt.nested_id = maxs.nested_id
where maxs.rn = 1
I have a simple data
Date Count by english count by chinese
08-Mar-19 12 54
09-Mar-19 15 66
10-Mar-19 45 32
11-Mar-19 21 70
12-Mar-19 57 64
29-Mar-19 43 53
30-Mar-19 67 21
I want to group this data by week and the sum should be cumulative.The date starts from 8 march so the week should be calculated that way only. So the result should be
count by english count by chinese
08-MAR-19-14-MAR-19 150 286
15-MAR-19-22-MAR-19 150 286 (no data so same as above)
23-MAR-19-30-MAR-19 260 360
Tried using cumulative and sum but not able to achieve it
You can generate your week ranges, then use an outer join to see which data fits in each week, and use an analytic sum to get the result you want;
with week_ranges (date_from, date_to) as (
select min_date + ((level - 1) * 7), min_date + (level * 7)
from (
select min(some_date) as min_date, ceil((max(some_date) - min(some_date)) / 7) as weeks
from your_table
)
connect by level <= weeks
)
select distinct wr.date_from, wr.date_to - 1 as date_to,
sum(count_english) over (order by wr.date_from) as count_english,
sum(count_chinese) over (order by wr.date_from) as count_chinese
from week_ranges wr
left join your_table yt
on yt.some_date >= wr.date_from
and yt.some_date < wr.date_to
order by date_from;
which with your sample data gets:
DATE_FROM DATE_TO COUNT_ENGLISH COUNT_CHINESE
---------- ---------- ------------- -------------
2019-03-08 2019-03-14 150 286
2019-03-15 2019-03-21 150 286
2019-03-22 2019-03-28 150 286
2019-03-29 2019-04-04 260 360
Note this is splitting it up into four 7-days weeks, rather than one of 7 days and two of 8 days...
db<>fiddle
Here's one option; note that "my weeks" are different than yours because - your data is somewhat inconsistent as they vary from 6 to 7 days. That's also why the final result is different, but the general idea should be OK.
SQL> alter session set nls_date_format = 'dd.mm.yyyy';
Session altered.
SQL> with test (datum, cbe) as
2 -- sample data
3 (select date '2019-03-08', 12 from dual union all
4 select date '2019-03-09', 15 from dual union all
5 select date '2019-03-10', 45 from dual union all
6 select date '2019-03-11', 21 from dual union all
7 select date '2019-03-12', 57 from dual union all
8 select date '2019-03-29', 43 from dual union all
9 select date '2019-03-30', 67 from dual
10 ),
11 span as
12 -- min and max date value, so that we could create a "calendar"
13 (select min(datum) mindat,
14 max(datum) maxdat
15 from test
16 ),
17 periods as
18 -- "calendar" whose periods are weeks
19 (select s.mindat + (level - 1) * 7 datum_from,
20 (s.mindat + level * 7) - 1 datum_to
21 from span s
22 connect by level <= (s.maxdat - s.mindat) / 7 + 1
23 )
24 -- running sum per weeks
25 select distinct
26 p.datum_from,
27 p.datum_to,
28 sum(t.cbe) over (order by p.datum_from) sum_cbe
29 from test t full outer join periods p on t.datum between p.datum_from and p.datum_to
30 order by p.datum_from;
DATUM_FROM DATUM_TO SUM_CBE
---------- ---------- ----------
08.03.2019 14.03.2019 150
15.03.2019 21.03.2019 150
22.03.2019 28.03.2019 150
29.03.2019 04.04.2019 260
SQL>
My table structure in SQL Server looks as below.
id startdate enddate value
---------------------------------------
1 2019-02-06 2019-02-07 11
1 2019-01-22 2019-02-05 10
1 2019-01-15 2019-01-21 14
1 2018-12-13 2018-01-14 15
1 2018-12-09 2018-12-12 14
1 2018-08-13 2018-12-08 17
1 2018-07-19 2018-08-12 19
1 2018-06-13 2018-07-18 20
Now my query needs to display value from highest start date for that month. Which is fine and I know what needs to be done but Not start just highest date value for that month, if no value is there for that start date, we carry forward value from last month. So basically if you notice on above data, after December 2018 values, there are no values for November, October, September etc but I want to return MM/YYYY values for that month in result but value for those months should be what we found on earlier month which is August values which in this example is 17. Please note that enddate will always be as of one day before new start date begins. Probably that can be used for back filling and carry forwarding missing month values?
So my result should look like below.
id date value
----------------------------
1 2019-02 11
1 2019-01 10
1 2018-12 15
1 2018-11 17
1 2018-10 17
1 2018-09 17
1 2018-08 17
1 2018-07 19
1 2018-06 20
Do you think this can be done without using cursor here?
Alexander Volok's answer is solid, so I won't go into too much extra code. But I thought I'd explain the reasoning. In essence, what you need to do is create a skeleton date table containing all the dates and primary keys you want returned. I'm guessing you have more than one id value in your real data, so probably something like this (whether you choose to persist it or not is up to you)
create table #skelly
(
id int,
_year int,
_month int
primary key (id, _year, _month)
)
You can get much more precise if you need to be, by only including dates which fall between the min and max StartDate per id, but that's an exercise I leave up to you.
From there, it's then just a matter of filling in the values you care about against that skeleton table. You can do this in a number of ways; by joining, cross applying or a correlated subquery (as Alexander Volok used).
DECLARE #start DATE, #end DATE;
SELECT #start = '20180601', #end = GETDATE();
;WITH Months AS
(
SELECT EOMONTH(DATEADD(month, n-1, #start)) AS DateValue FROM (
SELECT TOP (DATEDIFF(MONTH, #start, #end) + 1)
n = ROW_NUMBER() OVER (ORDER BY [object_id])
FROM sys.all_objects
) D
)
, InputData AS
(
SELECT 1 AS id, '2019-02-06' startdate, '2019-02-07' as enddate, 11 AS [value] UNION ALL
SELECT 1, '2019-01-22', '2019-01-25', 10 UNION ALL
SELECT 1, '2019-01-15', '2019-01-17', 14 UNION ALL
SELECT 1, '2018-12-13', '2018-12-19', 15 UNION ALL
SELECT 1, '2018-12-09', '2018-12-10', 14 UNION ALL
SELECT 1, '2018-08-13', '2018-12-08', 17 UNION ALL
SELECT 1, '2018-07-19', '2018-07-25', 19 UNION ALL
SELECT 1, '2018-06-13', '2018-07-18', 20
)
SELECT FORMAT(m.DateValue, 'yyyy-MM') AS [Month]
, (SELECT TOP 1 I.value FROM InputData I WHERE I.startdate < M.DateValue ORDER BY I.startdate DESC ) [Value]
FROM months m
ORDER BY M.DateValue DESC
Results to:
Month Value
2019-02 11
2019-01 10
2018-12 15
2018-11 17
2018-10 17
2018-09 17
2018-08 17
2018-07 19
2018-06 20