aligning tables with different dates - sql

I have two tables, tblDaily and tblWeekly.
tblDaily contains daily data and tblWeekly contains data that is stored every Friday.
Joining the daily table to the weekly table is easy when the date in the daily data is a Friday.
My question is: what is the best way to join when the date is not a Friday? For example, for the date 2018-05-09 (Wednesday) I would like to join on the previous Friday (2018-05-04). What is the optimal way of doing this?
I read about calendar tables. Would that be the correct way to go? I'm not sure how that would work in this case.
tblDaily
date val
2018-04-30 2 'mon
2018-05-01 3 'tues
2018-05-02 3 'wed
2018-05-03 3 'thurs
2018-05-04 3 'fri
2018-05-07 2 'mon
2018-05-08 3 'tues
2018-05-09 3 'wed
2018-05-10 3 'thurs
2018-05-11 3 'fri
2018-05-14 3 'mon
tblWeekly
date val
2018-05-04 2 'fri
2018-05-11 3 'fri

This might work:
SELECT
[dailydate] = D.[date],
[dailyval] = D.[val],
[weeklydate] = W.[date],
[weeklyval] = W.[val]
FROM
[tblDaily] AS D
OUTER APPLY (SELECT TOP (1) _W.*
FROM [tblWeekly] AS _W
WHERE _W.[date] <= D.[date]
ORDER BY _W.[date] DESC) AS W;
This query produces the following results:
dailydate dailyval weeklydate weeklyval
2018-04-30 2 NULL NULL
2018-05-01 3 NULL NULL
2018-05-02 3 NULL NULL
2018-05-03 3 NULL NULL
2018-05-04 3 2018-05-04 2
2018-05-07 2 2018-05-04 2
2018-05-08 3 2018-05-04 2
2018-05-09 3 2018-05-04 2
2018-05-10 3 2018-05-04 2
2018-05-11 3 2018-05-11 3
2018-05-14 3 2018-05-11 3
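The idea behind the OUTER APPLY, picking the latest weekly row dated on or before each daily row, can also be sketched outside SQL. The Python below only illustrates that lookup and is not a replacement for the query; the helper name latest_on_or_before and the reduced sample data are assumptions made for this example.

```python
from bisect import bisect_right
from datetime import date

def latest_on_or_before(sorted_dates, target):
    """Return the latest date in sorted_dates that is <= target, or None."""
    i = bisect_right(sorted_dates, target)
    return sorted_dates[i - 1] if i else None

# Weekly rows keyed by their (Friday) date, as in tblWeekly.
weekly = {date(2018, 5, 4): 2, date(2018, 5, 11): 3}
weekly_dates = sorted(weekly)

# A few of the daily dates from tblDaily.
daily = [date(2018, 4, 30), date(2018, 5, 4), date(2018, 5, 9), date(2018, 5, 14)]

joined = []
for d in daily:
    w = latest_on_or_before(weekly_dates, d)
    joined.append((d, w, weekly.get(w)))  # weekly.get(None) is None

for row in joined:
    print(row)
```

Daily dates earlier than the first weekly row come back with None, which corresponds to the NULL rows in the result above.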

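Try something like this (it assumes tblWeekly has a row for every Friday, so exactly one weekly date falls in the seven days ending on each daily date):
select * from tblDaily a join tblWeekly b on b.[date] between dateadd(day, -6, a.[date]) and a.[date]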

Try this simple join:
select *
from tblDaily [d]
--join each daily date to the Friday on or before it (Fridays match themselves);
--adding @@DATEFIRST makes the weekday arithmetic independent of the session's first-day-of-week setting
left join tblWeekly [w] on [w].[date] = dateadd(day, -((datepart(weekday, [d].[date]) + @@DATEFIRST + 1) % 7), [d].[date])
Here is SQL fiddle.
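For anyone who wants to sanity-check the "Friday on or before" arithmetic outside the database, here is a small Python sketch. It is only an illustration; the function name previous_friday is made up for the example, and it uses Python's weekday numbering (Monday is 0), not DATEPART.

```python
from datetime import date, timedelta

FRIDAY = 4  # date.weekday(): Monday is 0, so Friday is 4

def previous_friday(d):
    """Return d itself if it is a Friday, otherwise the most recent Friday before it."""
    days_since_friday = (d.weekday() - FRIDAY) % 7
    return d - timedelta(days=days_since_friday)

for d in (date(2018, 5, 9), date(2018, 5, 11), date(2018, 5, 14)):
    print(d, "->", previous_friday(d))
```

The modulo keeps the offset in the range 0 to 6, so a Friday maps to itself and every other day maps back to the Friday of the week before or the current week, whichever came last.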

Related

SQL SELECT Difference between two days greater than 1 day

I have table T1
ID SCHEDULESTART SCHEDULEFINISH
1 2018-05-12 14:00:00 2018-05-14 11:00:00
2 2018-05-30 14:00:00 2018-06-01 11:00:00
3 2018-02-28 14:00:00 2018-03-02 11:00:00
4 2018-02-28 14:00:00 2018-03-01 11:00:00
5 2018-05-30 14:00:00 2018-05-31 11:00:00
I want to select all rows where the difference in days (the difference in hours is not important) is greater than 1 day.
If SCHEDULESTART and SCHEDULEFINISH are on the same day, or SCHEDULEFINISH is on the next day, then the row should NOT be selected.
So the result should return the rows with IDs 1, 2 and 3,
because the first row has a difference of two days, as do the second row (1 June is 2 days after 30 May) and the third row (2 March is 2 days after 28 February).
Is this possible somehow?
I know the DAY function, but it returns only the day number within the month.
I must begin my query with
SELECT ID FROM T1 WHERE ...
Thanks in advance
In DB2, this should work:
select t1.*
from t1
where date(schedulestart) < date(schedulefinish) - 1 day;
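The same comparison, dropping the time of day before subtracting, can be checked quickly in Python against the sample rows. This is just a sketch to confirm which IDs qualify; it is not DB2 code.

```python
from datetime import datetime

# (ID, SCHEDULESTART, SCHEDULEFINISH) copied from the sample table.
rows = [
    (1, datetime(2018, 5, 12, 14), datetime(2018, 5, 14, 11)),
    (2, datetime(2018, 5, 30, 14), datetime(2018, 6, 1, 11)),
    (3, datetime(2018, 2, 28, 14), datetime(2018, 3, 2, 11)),
    (4, datetime(2018, 2, 28, 14), datetime(2018, 3, 1, 11)),
    (5, datetime(2018, 5, 30, 14), datetime(2018, 5, 31, 11)),
]

# Compare calendar dates only, so the hours play no part in the difference.
selected = [rid for rid, start, finish in rows
            if (finish.date() - start.date()).days > 1]
print(selected)
```

Subtracting whole dates handles month boundaries automatically, which is exactly what the DAY function cannot do.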

SAS - PROC SQL: two tables: each one column distinct value, left join

I have a table with distinct dates YYYYMMDD from 20000101 until 20001231 and a table with distinct time points (HH:MM:SS) from 09:30:00 until 16:00:00.
I would like to create a (left) join where every day is repeated 391 times, once for each time point. That looks to me like a left join, but I do not have any IDs to join on.
date time
20000101 09:30:00
20000101 09:31:00
20000101 ...
20000101 ...
20000101 15:59:00
20000101 16:00:00
20000102 09:30:00
20000102 ...
20000102 16:00:00
What would the code look like, given that there is no explicit common primary key to join on?
PROC SQL;
SELECT DISTINCT a.date, b.time
FROM table_1 a, table_1 b /* both columns are in the same table */
;
QUIT;
Just as background: there are days that are "shorter" / less than 391 observation points. However, I would like to make sure every day has 391 observation points, just filled up with missing values.
You need a Cartesian product, since you want to generate every combination of date and time. A CROSS JOIN produces that, and it takes no join condition.
Try the query below. DISTINCT removes the duplicate combinations that arise because both columns come from the same table:
PROC SQL;
SELECT DISTINCT a.date, b.time
FROM table_1 a
CROSS JOIN
table_1 b
;
QUIT;
Or:
PROC SQL;
SELECT a.date, b.time
FROM (SELECT DISTINCT date FROM table_1) a
CROSS JOIN
(SELECT DISTINCT time FROM table_1) b
;
QUIT;
For more information on CROSS JOIN, follow the link below:
http://support.sas.com/documentation/cdl/en/fedsqlref/67364/HTML/default/viewer.htm#p1q7agzgxs9ik5n1p7k3sdft0u9u.htm
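To make the size of that Cartesian product concrete, here is a Python sketch (not SAS) that builds the same grid with itertools.product. The helper minute_points is a made-up name for this example, and the date and time ranges are taken from the question.

```python
from datetime import date, datetime, time, timedelta
from itertools import product

def minute_points(start, end):
    """Return every whole-minute time of day from start to end, inclusive."""
    anchor = date(2000, 1, 1)  # arbitrary day, only needed to do time arithmetic
    current = datetime.combine(anchor, start)
    last = datetime.combine(anchor, end)
    points = []
    while current <= last:
        points.append(current.time())
        current += timedelta(minutes=1)
    return points

times = minute_points(time(9, 30), time(16, 0))                      # 09:30:00 .. 16:00:00
dates = [date(2000, 1, 1) + timedelta(days=n) for n in range(366)]  # 2000 is a leap year

# Every (date, time) combination: the equivalent of a CROSS JOIN.
grid = list(product(dates, times))
print(len(times), len(grid))
```

Each date appears once per time point, so the grid has len(dates) * len(times) rows regardless of which combinations were actually observed.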
You can use either a left join or an inner join with the always-true condition 1=1; this creates the Cartesian product for you:
Code:
proc sql;
create table want as
select t1.date, t2.time
from t1 left join t2 on 1=1
order by date, time;
quit;
To show all observed times (over all dates) for each date, as well as maintaining original satellite information I would use a reflexive cross join of the combinatoric columns for the basis of a reflexive left join.
Consider this sample data generator. It simulates the case of data being gathered at different intervals (every 10 or 20 minutes) on different days.
data have;
do i = 1 to 5;
date = '01-apr-2018'd + (i-1);
do j = 0 to 4;
time = '12:00't + (mod(i,2)+1) * 600 * j; * every other day sample at 10 or 20 minute interval;
x = ceil ( 25 * ranuni(123) );
OUTPUT;
end;
end;
format date yymmdd10. time time8.;
keep date time x;
run;
SQL is used to cross join the distinct dates and times, and then the original data is left joined to that cross join.
proc sql;
create table cross_as_left_basis
as
select
cross.date
, cross.time
, have.x
from
( select distinct dates.date, times.time
from have as dates
cross join have as times
) as
cross
left join
have
on
cross.date = have.date
and cross.time = have.time
;
quit;
Have is
date time x
2018-04-01 12:00:00 19
12:20:00 9
12:40:00 5
13:00:00 23
13:20:00 9
2018-04-02 12:00:00 6
12:10:00 20
12:20:00 10
12:30:00 4
12:40:00 5
2018-04-03 12:00:00 20
12:20:00 11
12:40:00 25
13:00:00 7
13:20:00 18
2018-04-04 12:00:00 14
12:10:00 14
12:20:00 22
12:30:00 4
12:40:00 22
2018-04-05 12:00:00 17
12:20:00 20
12:40:00 18
13:00:00 9
13:20:00 14
The join result is
date time x
2018-04-01 12:00:00 19
12:10:00 .
12:20:00 9
12:30:00 .
12:40:00 5
13:00:00 23
13:20:00 9
2018-04-02 12:00:00 6
12:10:00 20
12:20:00 10
12:30:00 4
12:40:00 5
13:00:00 .
13:20:00 .
2018-04-03 12:00:00 20
12:10:00 .
12:20:00 11
12:30:00 .
12:40:00 25
13:00:00 7
13:20:00 18
2018-04-04 12:00:00 14
12:10:00 14
12:20:00 22
12:30:00 4
12:40:00 22
13:00:00 .
13:20:00 .
2018-04-05 12:00:00 17
12:10:00 .
12:20:00 20
12:30:00 .
12:40:00 18
13:00:00 9
13:20:00 14
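The two steps, cross joining the distinct dates and times and then left joining the original rows, map onto a few lines of Python. This is only a sketch of the logic on a small subset of the rows shown in Have; the variable names are chosen for the example.

```python
from itertools import product

# A subset of the Have rows: (date, time) -> x
have = {
    ("2018-04-01", "12:00"): 19,
    ("2018-04-01", "12:20"): 9,
    ("2018-04-02", "12:00"): 6,
    ("2018-04-02", "12:10"): 20,
}

dates = sorted({d for d, _ in have})   # distinct dates
times = sorted({t for _, t in have})   # distinct times over all dates

# Cross join, then look up x; combinations that were never observed get None,
# the counterpart of the missing value "." in the SAS output.
want = [(d, t, have.get((d, t))) for d, t in product(dates, times)]

for row in want:
    print(row)
```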

Skip Holidays in Business day Table

I am using the following script to determine what the business days are for each particular month.
DECLARE @startdate DATETIME
SET @startdate = '20170401'
;
WITH bd AS (
    SELECT
        DATEADD(DAY,
            CASE
                (DATEPART(WEEKDAY, DATEADD(MONTH, DATEDIFF(MONTH, 0, @startdate), 0)) + @@DATEFIRST - 1) % 7
                WHEN 6 THEN 2 -- month starts on a Saturday: move to Monday
                WHEN 0 THEN 1 -- month starts on a Sunday: move to Monday (the modulo yields 0 here, never 7)
                ELSE 0
            END,
            DATEADD(MONTH, DATEDIFF(MONTH, 0, @startdate), 0)
        ) AS bd, 1 AS n
    UNION ALL
    SELECT DATEADD(DAY,
            CASE
                (DATEPART(WEEKDAY, bd.bd) + @@DATEFIRST - 1) % 7
                WHEN 5 THEN 3 -- Friday: next business day is Monday
                WHEN 6 THEN 2 -- Saturday: next business day is Monday
                ELSE 1
            END,
            bd.bd
        ) AS db,
        bd.n+1
    FROM bd WHERE MONTH(bd.bd) = MONTH(@startdate)
)
SELECT * INTO #BD
FROM (
    SELECT 'BD'+ CAST(n AS VARCHAR(5)) AS Expected_Date_Rule, bd AS Expected_Calendar_Date
    from bd
) AS x
The result of this script works fine. bd is the calendar day for the particular month and n is the business day number. The script does its job of not counting weekends as business days.
bd n
----------------------- -----------
2017-04-03 00:00:00.000 1
2017-04-04 00:00:00.000 2
2017-04-05 00:00:00.000 3
2017-04-06 00:00:00.000 4
2017-04-07 00:00:00.000 5
2017-04-10 00:00:00.000 6
2017-04-11 00:00:00.000 7
2017-04-12 00:00:00.000 8
2017-04-13 00:00:00.000 9
2017-04-14 00:00:00.000 10
2017-04-17 00:00:00.000 11
2017-04-18 00:00:00.000 12
2017-04-19 00:00:00.000 13
2017-04-20 00:00:00.000 14
2017-04-21 00:00:00.000 15
2017-04-24 00:00:00.000 16
2017-04-25 00:00:00.000 17
2017-04-26 00:00:00.000 18
2017-04-27 00:00:00.000 19
2017-04-28 00:00:00.000 20
2017-05-01 00:00:00.000 21
But then I noticed a potential issue in July, where the output will count the 4th of July as BD2 when it should be counted as BD3. Someone had created a holiday table that is updated with all the holidays (excuse the bad spelling).
Holiday table
1 2017-01-01 New Year Day
4 2017-01-02 New Year Day-Follow
1 2017-01-16 MArtin Luther King Day
4 2017-01-17 MArtin Luther King Day-Follow
1 2017-02-20 Preseiednt Day
4 2017-02-21 Preseiednt Day-Follow
1 2017-05-29 Memorial Day
4 2017-05-30 Memorial Day-Follow
1 2017-07-04 Independence Day
4 2017-07-05 Independence Day-Follow
1 2017-09-04 Labour Day
4 2017-09-05 Labour Day-Follow
1 2017-10-09 Columbus Day
4 2017-10-10 Columbus Day-Follow
1 2017-11-10 Vetrans Day
4 2017-11-11 Vetrans Day-Follow
1 2017-11-23 ThanksGiving
1 2017-11-24 Day After Thanks Giving
4 2017-11-24 ThanksGiving-Follow
4 2017-11-25 Day After Thanks Giving-Follow
1 2017-12-25 Christmas
4 2017-12-26 Christmas-Follow
I was thinking there may be some way to update my script so that it checks the holiday table, skips each holiday and doesn't count it as a business day. Any tips?
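One way to picture the holiday check, sketched in Python rather than T-SQL: walk the days of the month and only advance the business day counter when the day is neither a weekend nor in the holiday set. The function name business_days is made up for this example, and treating only 2017-07-04 as a holiday is an assumption; whether the "-Follow" rows should also be skipped depends on what that flag means in the holiday table.

```python
from datetime import date, timedelta

def business_days(year, month, holidays):
    """Return [(label, day)] for each weekday of the month that is not a holiday."""
    day = date(year, month, 1)
    n = 0
    result = []
    while day.month == month:
        # weekday() is 0..4 for Monday..Friday
        if day.weekday() < 5 and day not in holidays:
            n += 1
            result.append((f"BD{n}", day))
        day += timedelta(days=1)
    return result

holidays = {date(2017, 7, 4)}  # assumption: only Independence Day itself is skipped
for label, day in business_days(2017, 7, holidays)[:3]:
    print(label, day)
```

In SQL the same effect would typically come from excluding dates that exist in the holiday table before numbering the remaining rows. Unlike the original script, this sketch stops at the end of the month and does not emit the first business day of the following month.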

Group dates by 7 days excluding specific dates

I need a query that groups dates into blocks of 7 days from the beginning of the month. The problem is that I have to exclude some days: holidays and the days before/after holidays. In my DateDay dimension there's a column that indicates which type of day it is. Example calendar for November:
DTD_GID DTD_Date DTD_DayType
20161101 2016-11-01 2 --holiday was on 2016-10-31
20161102 2016-11-02 0
20161103 2016-11-03 0
20161104 2016-11-04 0
20161105 2016-11-05 0
20161106 2016-11-06 0
20161107 2016-11-07 0
20161108 2016-11-08 0
20161109 2016-11-09 0
20161110 2016-11-10 2
20161111 2016-11-11 1--public holiday
20161112 2016-11-12 2
20161113 2016-11-13 0
20161114 2016-11-14 0
20161115 2016-11-15 0
20161116 2016-11-16 0
20161117 2016-11-17 0
20161118 2016-11-18 0
20161119 2016-11-19 0
20161120 2016-11-20 0
20161121 2016-11-21 0
20161122 2016-11-22 0
20161123 2016-11-23 0
20161124 2016-11-24 0
20161125 2016-11-25 0
20161126 2016-11-26 0
20161127 2016-11-27 0
20161128 2016-11-28 0
20161129 2016-11-29 0
20161130 2016-11-30 0
I need to group it like that:
1: 2016-11-02 - 2016-11-08 (inclusive)
2: 2016-11-13 - 2016-11-19
3: 2016-11-20 - 2016-11-26
If a group has fewer than 7 days, it shouldn't be returned by the query.
Let me know if you need more details.
EDIT: I'm not sure if it will help, but I wrote a query that counts the proper days in each week
SELECT
DTD_DTMGID
,CONVERT(VARCHAR(5), DATEADD(WK, Week, 0), 103) + ' - ' + CONVERT(VARCHAR(5), DATEADD(DD, 6, DATEADD(WK, Week, 0)), 103) AS Week
,Cnt
FROM (
SELECT
DTD_DTMGID
, DATEDIFF(WK, 0, DTD_DATE) AS Week
, COUNT(*) AS Cnt
FROM DIM_DateDay
WHERE DTD_DayType = 0
GROUP BY DTD_DTMGID ,DATEDIFF(WK, 0, DTD_DATE)
) AS X
ORDER BY DTD_DTMGID
and result:
DTD_DTMGID Week Cnt
201301 31/12 - 06/01 2
201301 07/01 - 13/01 5
201301 14/01 - 20/01 7
201301 21/01 - 27/01 7
201301 28/01 - 03/02 5
201302 28/01 - 03/02 2
EDIT2: As output I expect the IDs of the days that are in those groups. By IDs I mean the DTD_GID column, which is the primary key of my DateDay dimension.
So for group 1 I'd get the following list:
20161102
20161103
20161104
20161105
20161106
20161107
20161108
Here is one solution that gives you start and end date of each 7-day range:
WITH CTE1 AS (
SELECT DTD_Date, DATEDIFF(DAY, ROW_NUMBER() OVER (ORDER BY DTD_Date), DTD_Date) AS Group1
FROM #Table1
WHERE DTD_DayType = 0
), CTE2 AS (
SELECT DTD_Date, Group1, (ROW_NUMBER() OVER (PARTITION BY Group1 ORDER BY DTD_Date) - 1) / 7 AS Group2
FROM CTE1
)
SELECT MIN(DTD_Date) AS DTD_From, MAX(DTD_Date) AS DTD_Upto, COUNT(DTD_Date) AS C
FROM CTE2
GROUP BY Group1, Group2
-- HAVING COUNT(*) >= 7 -- uncomment to return only full 7-day ranges
ORDER BY DTD_From
Output:
DTD_From | DTD_Upto | C
-----------+------------+--
2016-11-02 | 2016-11-08 | 7
2016-11-09 | 2016-11-09 | 1
2016-11-13 | 2016-11-19 | 7
2016-11-20 | 2016-11-26 | 7
2016-11-27 | 2016-11-30 | 4
Here is how it works:
The first CTE removes holidays and assigns a group number to remaining rows. Consecutive dates get same group number (see this question).
The second CTE assigns another group number to each row in each group. Row number 1-7 get 0, 8-14 get 1, and so on.
Finally you group the results by the group numbers.
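The "date minus row number" trick from the first CTE may be easier to see in a small Python sketch. Consecutive dates yield the same difference, so that difference can serve as the island key; each island is then cut into blocks of seven. The function name seven_day_ranges is made up for this example, and unlike the query output above it returns only full blocks.

```python
from datetime import date
from itertools import groupby

def seven_day_ranges(dates, size=7):
    """Return (first, last) for every full block of `size` consecutive dates."""
    ordered = sorted(dates)
    ranges = []
    # Consecutive dates share the same value of (day ordinal - position).
    island_key = lambda pair: pair[1].toordinal() - pair[0]
    for _, island in groupby(enumerate(ordered), key=island_key):
        run = [d for _, d in island]
        # Step through the island in blocks of `size`, skipping a short tail.
        for start in range(0, len(run) - size + 1, size):
            block = run[start:start + size]
            ranges.append((block[0], block[-1]))
    return ranges

# November 2016 from the question: days 1, 10, 11 and 12 have DTD_DayType <> 0.
excluded = {1, 10, 11, 12}
normal_days = [date(2016, 11, d) for d in range(1, 31) if d not in excluded]

for first, last in seven_day_ranges(normal_days):
    print(first, "-", last)
```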

SQL Date Range Query - Table Comparison

I have two SQL Server tables containing the following information:
Table t_venues:
venue_id is unique
venue_id | start_date | end_date
1 | 01/01/2014 | 02/01/2014
2 | 05/01/2014 | 05/01/2014
3 | 09/01/2014 | 15/01/2014
4 | 20/01/2014 | 30/01/2014
Table t_venueuser:
venue_id is not unique
venue_id | start_date | end_date
1 | 02/01/2014 | 02/01/2014
2 | 05/01/2014 | 05/01/2014
3 | 09/01/2014 | 10/01/2014
4 | 23/01/2014 | 25/01/2014
From these two tables I need to find the dates that haven't been selected for each range, so the output would look like this:
venue_id | start_date | end_date
1 | 01/01/2014 | 01/01/2014
3 | 11/01/2014 | 15/01/2014
4 | 20/01/2014 | 22/01/2014
4 | 26/01/2014 | 30/01/2014
I can compare the two tables and get the date ranges from t_venues to appear in my query using 'except' but I can't get the query to produce the non-selected dates. Any help would be appreciated.
Calendar Table!
Another perfect candidate for a calendar table. If you can't be bothered to search for one, here's one I made earlier.
Setup Data
DECLARE @t_venues table (
   venue_id int
 , start_date date
 , end_date date
);
INSERT INTO @t_venues (venue_id, start_date, end_date)
  VALUES (1, '2014-01-01', '2014-01-02')
       , (2, '2014-01-05', '2014-01-05')
       , (3, '2014-01-09', '2014-01-15')
       , (4, '2014-01-20', '2014-01-30')
;
DECLARE @t_venueuser table (
   venue_id int
 , start_date date
 , end_date date
);
INSERT INTO @t_venueuser (venue_id, start_date, end_date)
  VALUES (1, '2014-01-02', '2014-01-02')
       , (2, '2014-01-05', '2014-01-05')
       , (3, '2014-01-09', '2014-01-10')
       , (4, '2014-01-23', '2014-01-25')
;
The Query
SELECT t_venues.venue_id
     , calendar.the_date
     , CASE WHEN t_venueuser.venue_id IS NULL THEN 1 ELSE 0 END As is_available
FROM dbo.calendar /* see: http://gvee.co.uk/files/sql/dbo.numbers%20&%20dbo.calendar.sql for an example */
 INNER
  JOIN @t_venues As t_venues
    ON t_venues.start_date <= calendar.the_date
   AND t_venues.end_date >= calendar.the_date
 LEFT
  JOIN @t_venueuser As t_venueuser
    ON t_venueuser.venue_id = t_venues.venue_id
   AND t_venueuser.start_date <= calendar.the_date
   AND t_venueuser.end_date >= calendar.the_date
ORDER
    BY t_venues.venue_id
     , calendar.the_date
;
The Result
venue_id the_date is_available
----------- ----------------------- ------------
1 2014-01-01 00:00:00.000 1
1 2014-01-02 00:00:00.000 0
2 2014-01-05 00:00:00.000 0
3 2014-01-09 00:00:00.000 0
3 2014-01-10 00:00:00.000 0
3 2014-01-11 00:00:00.000 1
3 2014-01-12 00:00:00.000 1
3 2014-01-13 00:00:00.000 1
3 2014-01-14 00:00:00.000 1
3 2014-01-15 00:00:00.000 1
4 2014-01-20 00:00:00.000 1
4 2014-01-21 00:00:00.000 1
4 2014-01-22 00:00:00.000 1
4 2014-01-23 00:00:00.000 0
4 2014-01-24 00:00:00.000 0
4 2014-01-25 00:00:00.000 0
4 2014-01-26 00:00:00.000 1
4 2014-01-27 00:00:00.000 1
4 2014-01-28 00:00:00.000 1
4 2014-01-29 00:00:00.000 1
4 2014-01-30 00:00:00.000 1
(21 row(s) affected)
The Explanation
Our calendar table contains an entry for every date.
We join our t_venues (as an aside, if you have the choice, lose the t_ prefix!) to return every day between our start_date and end_date. Example output for venue_id=4 for just this join:
venue_id the_date
----------- -----------------------
4 2014-01-20 00:00:00.000
4 2014-01-21 00:00:00.000
4 2014-01-22 00:00:00.000
4 2014-01-23 00:00:00.000
4 2014-01-24 00:00:00.000
4 2014-01-25 00:00:00.000
4 2014-01-26 00:00:00.000
4 2014-01-27 00:00:00.000
4 2014-01-28 00:00:00.000
4 2014-01-29 00:00:00.000
4 2014-01-30 00:00:00.000
(11 row(s) affected)
Now we have one row per day, we [outer] join our t_venueuser table. We join this in much the same manner as before, but with one added twist: we need to join based on the venue_id too!
Running this for venue_id=4 gives this result:
venue_id the_date t_venueuser_venue_id
----------- ----------------------- --------------------
4 2014-01-20 00:00:00.000 NULL
4 2014-01-21 00:00:00.000 NULL
4 2014-01-22 00:00:00.000 NULL
4 2014-01-23 00:00:00.000 4
4 2014-01-24 00:00:00.000 4
4 2014-01-25 00:00:00.000 4
4 2014-01-26 00:00:00.000 NULL
4 2014-01-27 00:00:00.000 NULL
4 2014-01-28 00:00:00.000 NULL
4 2014-01-29 00:00:00.000 NULL
4 2014-01-30 00:00:00.000 NULL
(11 row(s) affected)
See how we have a NULL value for rows where there is no t_venueuser record. Genius, no? ;-)
So in my first query I gave you a quick CASE statement that shows availability (1=available, 0=not available). This is for illustration only, but could be useful to you.
You can then either wrap the query up and then apply an extra filter on this calculated column or simply add a where clause in: WHERE t_venueuser.venue_id IS NULL and that will do the same trick.
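The query above stops at one row per available day, while the question asks for start/end ranges. As an illustration of the remaining step, here is a Python sketch (not T-SQL) that expands each venue into days, drops the booked ones and collapses what is left into ranges. The function names days and free_ranges are made up for this example.

```python
from datetime import date, timedelta

def days(start, end):
    """Yield every date from start to end, inclusive."""
    for n in range((end - start).days + 1):
        yield start + timedelta(days=n)

def free_ranges(venues, bookings):
    """Return (venue_id, start, end) for each run of unbooked days per venue."""
    result = []
    for venue_id, start, end in venues:
        booked = {d for v, s, e in bookings if v == venue_id for d in days(s, e)}
        run_start = previous = None
        for d in days(start, end):
            if d in booked:
                if run_start is not None:      # a free run just ended
                    result.append((venue_id, run_start, previous))
                    run_start = None
            else:
                if run_start is None:          # a free run begins
                    run_start = d
                previous = d
        if run_start is not None:              # free run reaches the venue's end date
            result.append((venue_id, run_start, previous))
    return result

venues = [(1, date(2014, 1, 1), date(2014, 1, 2)), (2, date(2014, 1, 5), date(2014, 1, 5)),
          (3, date(2014, 1, 9), date(2014, 1, 15)), (4, date(2014, 1, 20), date(2014, 1, 30))]
bookings = [(1, date(2014, 1, 2), date(2014, 1, 2)), (2, date(2014, 1, 5), date(2014, 1, 5)),
            (3, date(2014, 1, 9), date(2014, 1, 10)), (4, date(2014, 1, 23), date(2014, 1, 25))]

for row in free_ranges(venues, bookings):
    print(row)
```

In SQL, the collapsing step is usually done with a gaps-and-islands technique over the rows where is_available = 1.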
This is a complete hack, but it gives the results you require. I've only tested it on the data you provided, so there may well be gotchas with larger sets.
In general, what you are solving here is a variation of the gaps and islands problem. Briefly, this is a sequence where some items are missing: the missing items are referred to as gaps and the existing items as islands. If you would like to understand the problem in general, check a few of these articles:
Simple talk article
blogs.MSDN article
SO answers tagged gaps-and-islands
Code:
;with dates as
(
SELECT vdates.venue_id,
vdates.vdate
FROM ( SELECT DATEADD(d,sv.number,v.start_date) vdate
, v.venue_id
FROM t_venues v
INNER JOIN master..spt_values sv
ON sv.type='P'
AND sv.number BETWEEN 0 AND datediff(d, v.start_date, v.end_date)) vdates
LEFT JOIN t_venueuser vu
ON vdates.vdate >= vu.start_date
AND vdates.vdate <= vu.end_date
AND vdates.venue_id = vu.venue_id
WHERE ISNULL(vu.venue_id,-1) = -1
)
SELECT venue_id, ISNULL([1],[2]) StartDate, [2] EndDate
FROM (SELECT venue_id, rDate, ROW_NUMBER() OVER (PARTITION BY venue_id, DateType ORDER BY rDate) AS rType, DateType as dType
FROM( SELECT d1.venue_id
,d1.vdate AS rDate
,'1' AS DateType
FROM dates AS d1
LEFT JOIN dates AS d0
ON DATEADD(d,-1,d1.vdate) = d0.vdate
LEFT JOIN dates AS d2
ON DATEADD(d,1,d1.vdate) = d2.vdate
WHERE CASE ISNULL(d2.vdate, '01 Jan 1753') WHEN '01 Jan 1753' THEN '2' ELSE '1' END = 1
AND ISNULL(d0.vdate, '01 Jan 1753') = '01 Jan 1753'
UNION
SELECT d1.venue_id
,ISNULL(d2.vdate,d1.vdate)
,'2'
FROM dates AS d1
LEFT JOIN dates AS d2
ON DATEADD(d,1,d1.vdate) = d2.vdate
WHERE CASE ISNULL(d2.vdate, '01 Jan 1753') WHEN '01 Jan 1753' THEN '2' ELSE '1' END = 2
) res
) src
PIVOT (MIN (rDate)
FOR dType IN
( [1], [2] )
) AS pvt
Results:
venue_id StartDate EndDate
1 2014-01-01 2014-01-01
3 2014-01-11 2014-01-15
4 2014-01-20 2014-01-22
4 2014-01-26 2014-01-30