Creating counts based on date ranges with inner join - SQL

Here is an illustration of what I'd like to do.
Table A:
user_id | industry | startdate  | enddate    | generation
1       | retail   | 2000-01-01 | 2001-01-01 | Gen X
1       | retail   | 2002-01-01 | 2003-02-01 | Gen X
2       | Tech     | 2001-01-01 | 2002-01-01 | Gen X
2       | Business | 2002-03-01 | 2003-01-01 | Gen X
2       | Tech     | 2003-02-01 | null       | Gen X
...     | ...      | ...        | ...        | ...
35642   | Medicine | 2020-02-01 | 2022-03-01 | Gen Z
Table B:
month
1990-01-01
1990-02-01
...
2022-03-01
Desired Result:
industry | generation | count | month
retail   | Gen X      | 200   | 2002-02-01
retail   | Gen Y      | 250   | 2002-02-01
Tech     | Gen X      | 130   | 2002-02-01
Tech     | Gen Y      | 166   | 2002-02-01
...
For now, I've only got tables A and B. I want to create counts by industry, by month, and by generation, but I'm not sure how I can do this using the two tables that I have.
My (incorrect) approach would be something like select count(*), industry, month, generation where A.startdate < B.month and A.enddate > B.month, but this query obviously doesn't run. Is what I want to do possible with just tables A and B?
Apologies if I'm being unclear, I am admittedly new to SQL queries and am not sure how to approach this problem.

Try this approach:
1. Create a CTE that generates the distinct list of industry/generation combinations by querying table A
2. Create a 2nd CTE that Cartesian-joins the first CTE to table B, giving you a list of all month/industry/generation combinations
3. Left outer join table A to the 2nd CTE and query for the result you want to achieve

Executing the steps of NickW gives me the following. I assume that users whose enddate is null are still in business.
-- Create tables
CREATE TABLE table_a (
    user_id int,
    industry text,
    startdate date,
    enddate date,
    generation text
);
CREATE TABLE table_b (
    month date
);
-- Insert data
WITH series_months AS (
    SELECT date(i)
    FROM generate_series(
        date '1999-01-01',
        date '2012-09-01',
        INTERVAL '1 month'
    ) i
)
INSERT INTO table_b (month)
SELECT * FROM series_months;
INSERT INTO table_a (user_id, industry, startdate, enddate, generation)
VALUES
(1, 'retail', '2000-01-01', '2001-01-01', 'Gen X'),
(1, 'retail', '2002-01-01', '2003-02-01', 'Gen X'),
(2, 'Tech', '2001-01-01', '2002-01-01', 'Gen X'),
(2, 'Business', '2002-03-01', '2003-01-01', 'Gen X'),
(2, 'Tech', '2003-02-01', NULL, 'Gen X');
-- Perform joins
WITH industry_generation AS (
    SELECT DISTINCT industry, generation FROM table_a
), months_industry_generation AS (
    SELECT *
    FROM table_b, industry_generation
), combined_table AS (
    SELECT mig.month, mig.industry, mig.generation, user_id, startdate, enddate
    FROM months_industry_generation mig
    INNER JOIN table_a a
        ON mig.industry = a.industry AND mig.generation = a.generation
    WHERE month >= startdate AND (month <= enddate OR enddate IS NULL)
)
SELECT industry, generation, count(user_id) AS count, month
FROM combined_table
GROUP BY month, industry, generation
ORDER BY 1, 2, 4
;
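Note that the INNER JOIN above drops any month in which an industry/generation combination has no active users. NickW's step 3 called for a left outer join; here is a minimal sketch (untested) of that variant. The date predicates move into the ON clause, since leaving them in WHERE would silently turn the left join back into an inner join, and count(a.user_id) then returns 0 for the empty months:
WITH industry_generation AS (
    SELECT DISTINCT industry, generation FROM table_a
), months_industry_generation AS (
    SELECT * FROM table_b, industry_generation
)
SELECT mig.industry, mig.generation, count(a.user_id) AS count, mig.month
FROM months_industry_generation mig
LEFT JOIN table_a a
    ON mig.industry = a.industry
    AND mig.generation = a.generation
    AND mig.month >= a.startdate
    AND (mig.month <= a.enddate OR a.enddate IS NULL)
GROUP BY mig.month, mig.industry, mig.generation
ORDER BY 1, 2, 4;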

Related

Calculate standard deviation over time

I have information about sales per day. For example:
Date - Product - Amount
01-07-2020 - A - 10
01-03-2020 - A - 20
01-02-2020 - B - 10
Now I would like to know the average sales per day and the standard deviation for the last year. For the average I can just count the number of entries per item, then take 365 minus that count as the number of 0's to include, but I wonder what the best way is to calculate the standard deviation while incorporating the 0's for the days there are no sales.
Use a hierarchical (or recursive) query to generate the daily dates for the year, then use a PARTITION OUTER JOIN to join it to your product data. You can then find the average and standard deviation with the AVG and STDDEV aggregation functions, using COALESCE to fill in NULL values with zeroes:
WITH start_date ( dt ) AS (
    SELECT DATE '2020-01-01' FROM DUAL
),
calendar ( dt ) AS (
    SELECT dt + LEVEL - 1
    FROM start_date
    CONNECT BY dt + LEVEL - 1 < ADD_MONTHS( dt, 12 )
)
SELECT product,
    AVG( COALESCE( amount, 0 ) ) AS average_sales_per_day,
    STDDEV( COALESCE( amount, 0 ) ) AS stddev_sales_per_day
FROM calendar c
LEFT OUTER JOIN (
    SELECT t.*
    FROM test_data t
    INNER JOIN start_date s
        ON (
            s.dt <= t."DATE"
            AND t."DATE" < ADD_MONTHS( s.dt, 12 )
        )
) t
PARTITION BY ( t.product )
ON ( c.dt = t."DATE" )
GROUP BY product
So, for your sample data:
CREATE TABLE test_data ( "DATE", Product, Amount ) AS
SELECT DATE '2020-07-01', 'A', 10 FROM DUAL UNION ALL
SELECT DATE '2020-03-01', 'A', 20 FROM DUAL UNION ALL
SELECT DATE '2020-02-01', 'B', 10 FROM DUAL;
This outputs:
PRODUCT | AVERAGE_SALES_PER_DAY | STDDEV_SALES_PER_DAY
:------ | ----------------------------------------: | ----------------------------------------:
A | .0819672131147540983606557377049180327869 | 1.16752986363678031669548047505759328696
B | .027322404371584699453551912568306010929 | .5227083734893166933219264686616717636897
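As a quick sanity check on those small numbers: product A has two sales (10 and 20) across the 366 days of 2020 (a leap year), so its average is 30 / 366 ≈ 0.0819672, and product B's is 10 / 366 ≈ 0.0273224, matching AVERAGE_SALES_PER_DAY above.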
db<>fiddle here
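The query above is Oracle-specific (DUAL, CONNECT BY, PARTITION OUTER JOIN). For comparison, here is a rough sketch of the same idea in Postgres. It assumes a table test_data(sale_date date, product text, amount numeric) - the Oracle version above names the date column "DATE" instead - and, since Postgres has no PARTITION OUTER JOIN, it cross joins the calendar to the distinct product list before the left join:
-- Sketch: calendar x products stands in for PARTITION OUTER JOIN
WITH calendar AS (
    -- one row per day of 2020
    SELECT generate_series(DATE '2020-01-01',
                           DATE '2020-12-31',
                           INTERVAL '1 day')::date AS dt
), products AS (
    SELECT DISTINCT product FROM test_data
)
SELECT p.product,
       AVG(COALESCE(t.amount, 0)) AS average_sales_per_day,
       STDDEV(COALESCE(t.amount, 0)) AS stddev_sales_per_day
FROM calendar c
CROSS JOIN products p
LEFT JOIN test_data t
    ON t.sale_date = c.dt AND t.product = p.product
GROUP BY p.product;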

SQL Table Join with Weighting

I'm trying to create a table that will calculate the weights of spend for a customer in the months they shop for a period. For example, the following customer (faux data) has this spend profile:
/* Customer spend */
create or replace temp table ts_all_transactions
(
inferred_customer_id varchar(128)
,nw_date date
,spend number(21,2)
);
insert into ts_all_transactions
values
('52f5','2019-06-01',17.35)
,('52f5','2018-11-01',24.85)
,('52f5','2019-12-01',1.40)
,('52f5','2019-01-01',2.45)
,('52f5','2019-03-01',3.90)
,('52f5','2020-01-01',37.55)
,('52f5','2019-10-01',13.20)
,('52f5','2019-09-01',5.70)
;
A calendar containing the months that each period covers, along with a weighting, is then created:
-- Calculate weights for each period of the time series
-- Create a staging table
create or replace temp table period_dimension as
select abs(seq4()-12) as period,
dateadd(month, seq4(), dateadd(month, -23, date_trunc('Month', current_date()))) as start_date,
dateadd(month, 12, start_date) as end_date
from table(generator(rowcount => 12)) -- number of months after reference date in previous line
;
select * from period_dimension;
create or replace temp table my_date_dimension
(
my_date date not null
,year smallint not null
,month smallint not null
,month_name char(3) not null
,day_of_mon smallint not null
,day_of_week varchar(9) not null
,week_of_year smallint not null
,day_of_year smallint not null
)
as
with my_date as (
select
seq4(),
dateadd(month, seq4(), dateadd(month, -23, date_trunc('Month', current_date()))) as my_date
from table(generator(rowcount=>23))
)
select my_date
,year(my_date)
,month(my_date)
,monthname(my_date)
,day(my_date)
,dayofweek(my_date)
,weekofyear(my_date)
,dayofyear(my_date)
from my_date
;
create or replace table weight_lookup as
select
a.period
,b.my_date
,rank() over (partition by a.period order by b.my_date) as weight
from period_dimension a
inner join my_date_dimension b
where b.my_date >= a.start_date
and b.my_date < a.end_date
order by 1,2
;
-- Create a staging table
create or replace temp table period_dimension2 as
select abs(seq4()-12) as period,
dateadd(month, seq4(), dateadd(month, -23, date_trunc('Month', current_date()))) as start_date,
last_day(dateadd(month, 11, start_date)) as end_date
from table(generator(rowcount => 12)) -- number of months after reference date in previous line
;
The above is then used to calculate an average spend based on the months the customer shops in the period; however, I'm not getting the output I expect:
-- For each month of each period, group all together by period here so we have 12 periods
-- so each period represents 12 rolling months with period 12 being the oldest period
create or replace temp table ts_spend_time as
select
a.inferred_customer_id
,b.period
,max(a.nw_date) as max_mnth /* Month in period where most spend was made */
,sum(a.spend * b.weight) / 78 as avg_spend /* Sum of weights 12,11,10...1 to give 78 */
from ts_all_transactions a
inner join weight_lookup b on a.nw_date = b.my_date
inner join period_dimension2 c on b.my_date = c.start_date and b.period = c.period
where b.my_date >= c.start_date
and b.my_date <= c.end_date
group by 1,2
order by 1 desc, 2,3
;
The output I get from the above code is this:
create or replace temp table ts_spend_time_wrong_out
(
inferred_customer_id varchar(128)
,period number(11)
,max_mnth date
,avg_spend number(38,8)
);
insert into ts_spend_time_wrong_out
values
('52f5',3,'2019-03-01',0.05000000)
,('52f5',5,'2019-01-01',0.03141026)
,('52f5',7,'2018-11-01',0.31858974)
;
I would like to get an output like this:
create or replace temp table ts_spend_time_should_be
(
inferred_customer_id varchar(128)
,period number(11)
,max_mnth date
,avg_spend number(38,8)
);
insert into ts_spend_time_should_be
values
('52f5',1,'01JAN2020',6.301923077)
,('52f5',2,'01JAN2020',7.266025641)
,('52f5',3,'01JAN2020',8.280128205)
,('52f5',4,'01JAN2020',9.294230769)
,('52f5',5,'01DEC2019',4.081410256)
,('52f5',6,'01OCT2019',4.412179487)
,('52f5',7,'01OCT2019',5.276923077)
,('52f5',8,'01SEP2019',3.941666667)
,('52f5',9,'01JUN2019',3.687179487)
,('52f5',10,'01JUN2019',4.309615385)
,('52f5',11,'01JUN2019',4.932051282)
,('52f5',12,'01MAR2019',2.662820513)
;
In the correct solution example, the average spend for period 1 is calculated as follows: ((17.35*2)+(5.7*5)+(13.20*6)+(1.4*8)+(37.55*9)) / 78 = 491.55 / 78 ≈ 6.3019
How can I resolve this? TIA
Firstly, you should use row_number() over (order by seq4()), as there can be gaps in a seq().
So, working halfway through your question:
with ts_all_transactions as (
select id, nw_date::date as nw_date, spend from values
('52f5','2019-06-01',17.35)
,('52f5','2018-11-01',24.85)
,('52f5','2019-12-01',1.40)
,('52f5','2019-01-01',2.45)
,('52f5','2019-03-01',3.90)
,('52f5','2020-01-01',37.55)
,('52f5','2019-10-01',13.20)
,('52f5','2019-09-01',5.70)
v(id,nw_date, spend)
), period_dimension as (
select
row_number() over(order by seq4())-1 as rn0,
abs(rn0-12) as period,
dateadd('month', rn0, dateadd(month, -23, date_trunc('Month', current_date()))) as start_date,
dateadd('month', 12, start_date) as end_date
from table(generator(rowcount => 12)) -- number of months after reference date in previous line
), weight_periods as (
select p.period
,p.start_date
,p.end_date
,row_number() over(partition by p.period order by seq4())-1 as rn1
,dateadd('month',-rn1, p.end_date ) as weight_month
,12 - rn1 + 1 as weight
from period_dimension p,
table(generator(rowcount => 12))
), monthly_spends as (
select id
,date_trunc('month', nw_date) as m_date
,sum(spend) as s_spend
from ts_all_transactions
group by 1,2
)
select m.id
,w.period
,w.end_date
,w.weight_month
,m.s_spend
,w.weight
,m.s_spend * w.weight as w_spend
from monthly_spends m
join weight_periods w on m.m_date = w.weight_month
order by 1,2,3,4;
gives:
ID PERIOD END_DATE WEIGHT_MONTH S_SPEND WEIGHT W_SPEND
52f5 1 2020-05-01 2019-06-01 17.35 2 34.70
52f5 1 2020-05-01 2019-09-01 5.70 5 28.50
52f5 1 2020-05-01 2019-10-01 13.20 6 79.20
52f5 1 2020-05-01 2019-12-01 1.40 8 11.20
52f5 1 2020-05-01 2020-01-01 37.55 9 337.95
52f5 2 2020-04-01 2019-06-01 17.35 3 52.05
...
This shows that up to this point we can see the inputs to the values you are expecting for the "weighted average", which can be computed via:
select m.id
,w.period
,sum(m.s_spend * w.weight) as t_w_spend
,round(t_w_spend / 78,3) as weighted_avg_spend
from monthly_spends m
join weight_periods w on m.m_date = w.weight_month
group by 1,2
order by 1,2;
which gives:
ID PERIOD T_W_SPEND WEIGHTED_AVG_SPEND
52f5 1 491.55 6.302
52f5 2 566.75 7.266
52f5 3 641.95 8.230
52f5 4 724.95 9.294
52f5 5 804.05 10.308
52f5 6 362.35 4.646
52f5 7 386.75 4.958
52f5 8 479.05 6.142
52f5 9 361.70 4.637
52f5 10 336.15 4.310
52f5 11 384.70 4.932
52f5 12 433.25 5.554
which starts the same, but diverges as I think your date periods are done "wrong".
The next point is that you have this line:
,max(a.nw_date) as max_mnth /* Month in period where most spend was made */
which does not do what you comment it as doing: it finds the max date value in the aggregate.
To get the month where the most spend was made, you need to go back to the monthly results SQL, put a first_value() into the mix, and then select the results via:
-- reusing the CTEs (ts_all_transactions .. monthly_spends) from the query above
select id, period, max_spend_month, sum(w_spend)/78 as weighted_avg_spend
from (
select m.id
,w.period
,w.end_date
,w.weight_month
,m.s_spend
,w.weight
,m.s_spend * w.weight as w_spend
,first_value(w.weight_month) over (partition by m.id, w.period order by m.s_spend desc) as max_spend_month
from monthly_spends m
join weight_periods w on m.m_date = w.weight_month
)
group by 1,2,3
order by 1,2;
which now matches your expectations:
ID PERIOD MAX_SPEND_MONTH WEIGHTED_AVG_SPEND
52f5 1 2020-01-01 6.30192308
52f5 2 2020-01-01 7.26602564
52f5 3 2020-01-01 8.23012821
52f5 4 2020-01-01 9.29423077
52f5 5 2020-01-01 10.30833333
52f5 6 2019-06-01 4.64551282
52f5 7 2019-06-01 4.95833333
52f5 8 2018-11-01 6.14166667
52f5 9 2018-11-01 4.63717949
52f5 10 2018-11-01 4.30961538
52f5 11 2018-11-01 4.93205128
52f5 12 2018-11-01 5.55448718
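As an aside, on recent Snowflake versions the first_value() trick can be replaced by the MAX_BY aggregate, which returns the value of one column for the row where another column is maximal. A sketch (untested), reusing the same monthly_spends and weight_periods CTEs:
select m.id
    ,w.period
    ,max_by(w.weight_month, m.s_spend) as max_spend_month
    ,sum(m.s_spend * w.weight)/78 as weighted_avg_spend
from monthly_spends m
join weight_periods w on m.m_date = w.weight_month
group by 1,2
order by 1,2;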

Window functions with missing data

Assume that I have a table (MyTable) as follows:
item_id | date
----------------
1 | 2016-06-08
1 | 2016-06-07
1 | 2016-06-05
1 | 2016-06-04
1 | 2016-05-31
...
2 | 2016-06-08
2 | 2016-06-06
2 | 2016-06-04
2 | 2016-05-31
...
3 | 2016-05-31
...
I would like to build a weekly summary table that reports on a running 7 day window. The window would basically say "How many unique item_ids were reported in the preceding 7 days"?
So, in this case, the output table would look something like:
date | weekly_ids
----------------------
2016-05-31| 3 # All 3 were present on the 31st
2016-06-01| 3 # All 3 were present on the 31st which is < 7 days before the 1st
2016-06-02| 3 # Same
2016-06-03| 3 # Same
2016-06-04| 3 # Same
2016-06-05| 3 # Same
2016-06-06| 3 # Same
2016-06-07| 3 # Same
2016-06-08| 2 # item 3 was not present for the entire last week so it does not add to the count.
I've tried:
SELECT
item_id,
date,
MAX(present) OVER (
PARTITION BY item_id
ORDER BY date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS is_present
FROM (
# Inner query
SELECT
item_id,
date,
1 AS present,
FROM MyTable
)
GROUP BY date
ORDER BY date DESC
This feels like it is going in the right direction. But as it is, the window runs over the wrong time-frame when dates aren't present (too many dates) and it also doesn't output records for dates when the item_id wasn't present (even if it was present on the previous date). Is there a simple resolution to this problem?
If it's helpful and necessary
I can hard-code an oldest date
I also can get a table of all of the item_ids in existence.
This query will only be run on BigQuery, so BQ specific functions/syntax are fair game and SQL functions/syntax that doesn't run on BigQuery unfortunately doesn't help me ...
I have created a temp table to hold dates; however, you would probably benefit from adding a permanent calendar table to your database for these joins. Trust me, it will cause fewer headaches.
DECLARE @my_table TABLE
(
    item_id int,
    date DATETIME
)
INSERT @my_table SELECT 1,'2016-06-08'
INSERT @my_table SELECT 1,'2016-06-07'
INSERT @my_table SELECT 1,'2016-06-05'
INSERT @my_table SELECT 1,'2016-06-04'
INSERT @my_table SELECT 1,'2016-05-31'
INSERT @my_table SELECT 2,'2016-06-08'
INSERT @my_table SELECT 2,'2016-06-06'
INSERT @my_table SELECT 2,'2016-06-04'
INSERT @my_table SELECT 2,'2016-05-31'
INSERT @my_table SELECT 3,'2016-05-31'
DECLARE @TrailingDays INT=7
DECLARE @LowDate DATETIME='01/01/2016'
DECLARE @HighDate DATETIME='12/31/2016'
DECLARE @Calendar TABLE(CalendarDate DATETIME)
DECLARE @LoopDate DATETIME=@LowDate
WHILE(@LoopDate<=@HighDate) BEGIN
    INSERT @Calendar SELECT @LoopDate
    SET @LoopDate=DATEADD(DAY,1,@LoopDate)
END
SELECT
    date=HighDate,
    weekly_ids=COUNT(DISTINCT item_id)
FROM
(
    SELECT
        HighDate=C.CalendarDate,
        LowDate=LAG(C.CalendarDate, @TrailingDays,0) OVER (ORDER BY C.CalendarDate)
    FROM
        @Calendar C
    WHERE
        CalendarDate BETWEEN @LowDate AND @HighDate
) AS X
LEFT OUTER JOIN @my_table MT ON MT.date BETWEEN LowDate AND HighDate
GROUP BY
    LowDate,
    HighDate
Try the example below. It can give you a direction to explore.
Purely GBQ - Legacy SQL
SELECT date, items FROM (
SELECT
date, COUNT(DISTINCT item_id) OVER(ORDER BY sec RANGE BETWEEN 60*60*24*2 PRECEDING AND CURRENT ROW) AS items
FROM (
SELECT
item_id, date, timestamp_to_sec(timestamp(date)) AS sec
FROM (
SELECT calendar.day AS date, MyTable.item_id AS item_id
FROM (
SELECT DATE(DATE_ADD(TIMESTAMP('2016-05-28'), pos - 1, "DAY")) AS day
FROM (
SELECT ROW_NUMBER() OVER() AS pos, *
FROM (FLATTEN((
SELECT SPLIT(RPAD('', 1 + DATEDIFF(TIMESTAMP(CURRENT_DATE()), TIMESTAMP('2016-05-28')), '.'),'') AS h
FROM (SELECT NULL)),h
)))
) AS calendar
LEFT JOIN (
SELECT date, item_id
FROM
(SELECT 1 AS item_id, '2016-06-08' AS date),
(SELECT 1 AS item_id, '2016-06-07' AS date),
(SELECT 1 AS item_id, '2016-06-05' AS date),
(SELECT 1 AS item_id, '2016-06-04' AS date),
(SELECT 1 AS item_id, '2016-05-28' AS date),
(SELECT 2 AS item_id, '2016-06-08' AS date),
(SELECT 2 AS item_id, '2016-06-06' AS date),
(SELECT 2 AS item_id, '2016-06-04' AS date),
(SELECT 2 AS item_id, '2016-05-31' AS date),
(SELECT 3 AS item_id, '2016-05-31' AS date),
(SELECT 3 AS item_id, '2016-06-05' AS date)
) AS MyTable
ON calendar.day = MyTable.date
)
)
)
GROUP BY date, items
ORDER BY date
Please note:
the oldest date - 2016-05-28 - is hardcoded in the calendar subquery
the window size is controlled in RANGE BETWEEN 60*60*24*2 PRECEDING AND CURRENT ROW; if you need 7 days, the expression should be 60*60*24*6
keep in mind the specifics of COUNT(DISTINCT) in BigQuery Legacy SQL
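For completeness, a rough sketch of the same rolling count in BigQuery Standard SQL (untested; assumes MyTable.date is a DATE column). GENERATE_DATE_ARRAY replaces the SPLIT/RPAD calendar trick, and the conditional COUNT(DISTINCT ...) keeps days with no qualifying items at 0:
WITH calendar AS (
  SELECT day
  FROM UNNEST(GENERATE_DATE_ARRAY(DATE '2016-05-28', DATE '2016-06-08')) AS day
)
SELECT
  c.day AS date,
  -- count an item only if it appeared within the trailing 7-day window
  COUNT(DISTINCT IF(t.date BETWEEN DATE_SUB(c.day, INTERVAL 6 DAY) AND c.day,
                    t.item_id, NULL)) AS weekly_ids
FROM calendar c
CROSS JOIN MyTable t
GROUP BY c.day
ORDER BY c.day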

Find From/To Dates across multiple rows - SQL Postgres

I want to be able to "book" within a range of dates, but you can't book across gaps of days. So booking across multiple rates is fine as long as they are contiguous.
I am happy to change data structure/index, if there are better ways of storing start/end ranges.
So far I have a "rates" table which contains Start/End Periods of time with a daily rate.
e.g. Rates Table.
ID Price From To
1 75.00 2015-04-12 2016-04-15
2 100.00 2016-04-16 2016-04-17
3 50.00 2016-04-18 2016-04-30
For the above data I would want to return:
From To
2015-04-12 2016-04-30
For simplicity's sake it is safe to assume that dates are safely consecutive. For contiguous ranges, To is always 1 day before the next From.
For the case there is only 1 row, I would want it to return the From/To of that single row.
Also to clarify if I had the following data:
ID Price From To
1 75.00 2015-04-12 2016-04-15
2 100.00 2016-04-17 2016-04-18
3 50.00 2016-04-19 2016-04-30
4 50.00 2016-05-01 2016-05-21
Meaning where there is a gap >= 1 day it would count as a separate range.
In which case I would expect the following:
From To
2015-04-12 2016-04-15
2016-04-17 2016-05-21
Edit 1
After playing around I have come up with the following SQL, which seems to work, although I'm not sure if there are better ways or issues with it:
WITH grouped_rates AS
    (SELECT
        from_date,
        to_date,
        SUM(grp_start) OVER (ORDER BY from_date, to_date) AS grp
    FROM (SELECT
            from_date,
            to_date,
            CASE WHEN (from_date - INTERVAL '1 DAY') = lag(to_date)
                OVER (ORDER BY from_date, to_date)
            THEN 0
            ELSE 1
            END grp_start
        FROM rates
        GROUP BY from_date, to_date) AS start_groups)
SELECT
    min(from_date) from_date,
    max(to_date) to_date
FROM grouped_rates
GROUP BY grp;
This is identifying contiguous overlapping groups in the data. One approach is to find where each group begins and then do a cumulative sum. The following query adds a flag indicating if a row starts a group:
select r.*,
       (case when not exists (select 1
                              from rates r2
                              where (r2."from" < r."from" and r2."to" >= r."to") or
                                    (r2."from" = r."from" and r2.id < r.id)
                             )
            then 1 else 0 end) as StartFlag
from rates r;
The or in the correlation condition is to handle the situation where intervals that define a group overlap on the start date for the interval.
You can then do a cumulative sum on this flag and aggregate by that sum:
with r as (
      select r.*,
             (case when not exists (select 1
                                    from rates r2
                                    where (r2."from" < r."from" and r2."to" >= r."to") or
                                          (r2."from" = r."from" and r2.id < r.id)
                                   )
                   then 1 else 0 end) as StartFlag
      from rates r
     )
select min("from"), max("to")
from (select r.*,
             sum(r.StartFlag) over (order by r."from") as grp
      from r
     ) r
group by grp;
CREATE TABLE prices( id INTEGER NOT NULL PRIMARY KEY
, price MONEY
, date_from DATE NOT NULL
, date_upto DATE NOT NULL
);
-- some data (upper limit is EXCLUSIVE)
INSERT INTO prices(id, price, date_from, date_upto) VALUES
( 1, 75.00, '2015-04-12', '2016-04-16' )
,( 2, 100.00, '2016-04-17', '2016-04-19' )
,( 3, 50.00, '2016-04-19', '2016-05-01' )
,( 4, 50.00, '2016-05-01', '2016-05-22' )
;
-- SELECT * FROM prices;
-- Recursive query to "connect the dots"
WITH RECURSIVE rrr AS (
SELECT date_from, date_upto
, 1 AS nperiod
FROM prices p0
WHERE NOT EXISTS (SELECT * FROM prices nx WHERE nx.date_upto = p0.date_from) -- no preceding segment
UNION ALL
SELECT r.date_from, p1.date_upto
, 1+r.nperiod AS nperiod
FROM prices p1
JOIN rrr r ON p1.date_from = r.date_upto
)
SELECT * FROM rrr r
WHERE NOT EXISTS (SELECT * FROM prices nx WHERE nx.date_from = r.date_upto) -- no following segment
;
Result:
date_from | date_upto | nperiod
------------+------------+---------
2015-04-12 | 2016-04-16 | 1
2016-04-17 | 2016-05-22 | 3
(2 rows)

Find gaps in time not covered by records with start date and end date

I have a table of fee records (f_fee_item) as follows:
Fee_Item_ID int
Fee_Basis_ID int
Start_Date date
End_Date date
(irrelevant columns removed)
Assume that records for the same Fee_Basis_ID won't overlap.
I need to find the Start_Date and End_Date of each gap in the fee records for each Fee_Basis_ID between a supplied @Query_Start_Date and @Query_End_Date. I need this data to calculate fee accruals for all periods where fees have not been charged.
I also need the query to return a record if there are no fee records at all for a given Fee_Basis_ID (Fee_Basis_ID is a foreign key to D_Fee_Basis.Fee_Basis_ID if that helps).
For example:
@Query_Start_Date = '2011-01-01'
@Query_End_Date = '2011-09-30'
D_Fee_Basis:
F_Fee_Item
1
2
3
F_Fee_Item:
Fee_Item_ID Fee_Basis_ID Start_Date End_Date
1 1 2011-01-01 2011-03-31
2 1 2011-04-01 2011-06-30
3 2 2011-01-01 2011-03-31
4 2 2011-05-01 2011-06-30
Required Results:
Fee_Basis_ID Start_Date End_Date
1 2011-07-01 2011-09-30
2 2011-04-01 2011-04-30
2 2011-07-01 2011-09-30
3 2011-01-01 2011-09-30
I've been trying different self-joins for days trying to get this working, but with no luck.
Please help!!
Here is a solution:
declare @Query_Start_Date date = '2011-01-01'
declare @Query_End_Date date = '2011-09-30'
declare @D_Fee_Basis table(F_Fee_Item int)
insert @D_Fee_Basis values(1)
insert @D_Fee_Basis values(2)
insert @D_Fee_Basis values(3)
declare @F_Fee_Item table(Fee_Item_ID int, Fee_Basis_ID int, Start_Date date, End_Date date)
insert @F_Fee_Item values(1,1,'2011-01-01','2011-03-31')
insert @F_Fee_Item values(2,1,'2011-04-01','2011-06-30')
insert @F_Fee_Item values(3,2,'2011-01-01','2011-03-31')
insert @F_Fee_Item values(4,2,'2011-05-01','2011-06-30')
;with a as
(-- find all days between Start_Date and End_Date
select @Query_Start_Date d
union all
select dateadd(day, 1, d)
from a
where d < @Query_End_Date
), b as
(-- find all unused days
select a.d, F_Fee_Item Fee
from a, @D_Fee_Basis Fee
where not exists(select 1 from @F_Fee_Item where a.d between Start_Date and End_Date and Fee.F_Fee_Item = Fee_Basis_ID)
),
c as
(-- find all start dates
select d, Fee, rn = row_number() over (order by fee, d) from b
where not exists (select 1 from b b2 where dateadd(day,1, b2.d) = b.d and b2.Fee = b.Fee)
),
e as
(-- find all end dates
select d, Fee, rn = row_number() over (order by fee, d) from b
where not exists (select 1 from b b2 where dateadd(day,-1, b2.d) = b.d and b2.Fee = b.Fee)
)
-- join start dates with end dates
select c.Fee Fee_Basis_ID, c.d Start_Date, e.d End_Date from c join e on c.Fee = e.Fee and c.rn = e.rn
option (maxrecursion 0)
Link for result:
https://data.stackexchange.com/stackoverflow/q/114193/