sql how to add missing week to the table - sql

I have a table that has this data: Date when the employees reported and the week start-date(Monday) for that week. Now they did not work all the dates. For example there is no data on week of christmas. Is there a way I can add the missing week.So, I will still have the week start-date for each and every week. But the report-date can be null.
I cannot declare variables
This is what I have
and this is what i want to add the missing week

Query
SQLFIDDLEEXAMPLE:
CREATE TABLE tb
(
d1 date,
d2 date
);
INSERT INTO tb
(d1, d2)
VALUES
('2015-12-10', '2015-12-07'),
('2015-12-15', '2015-12-14'),
('2015-12-29', '2015-12-28'),
('2016-01-05', '2016-01-04');
SET DATEFIRST 1
INSERT INTO tb
( d1, d2 )
select null, DATEADD(day,number,'2015-01-01')
FROM master..spt_values t1
LEFT JOIN tb t2
ON DATEADD(day,number,'2015-01-01') = t2.d2
WHERE type = 'P'
AND DATEADD(day,number,'2015-01-01') >= '2015-12-01'
AND DATEADD(day,number,'2015-01-01') <= '2016-01-04'
AND DATEPART(weekday,DATEADD(day,number,'2015-01-01')) = 1
AND t2.d2 is null
SELECT *
FROM tb
Result:
| d1 | d2 |
|------------|------------|
| 2015-12-10 | 2015-12-07 |
| 2015-12-15 | 2015-12-14 |
| 2015-12-29 | 2015-12-28 |
| 2016-01-05 | 2016-01-04 |
| (null) | 2015-12-21 |

You can create a new Calendar/Weeks table containing all the weeks in the year. This table should be in advance.
You can then make a reference from your data table to this calendar table (by id or week/year).
Your report should be based on the calendar table with an outer join to your data table.
This way your report will contain all weeks even if some weeks don't have any data.
EDIT: You would need a new table like this:
Week:
| Start date | End date |
| 12/07/15 | 12/13/15 |
| 12/14/15 | 12/20/15 |
| 12/21/15 | 12/27/15 |
etc...

Assuming that #weekly_calendar table contains your valid work weeks (i.e., for Dec 2015). By the way, syntax is for MSSQL. You should specify what database you are using.
You can also dynamically create the calendar on run-time. This is just to show the concept in an easy to understand way.
-- week start dates
-- 2015-12-01
-- 2015-12-07
-- 2015-12-14
-- 2015-12-21
-- 2015-12-28
create table #weekly_calendar (
week_start_date datetime,
week_end_date datetime
)
Assuming that #report_date contains the report date of the employee.
-- report dates
-- 2015-12-02
-- 2015-12-15
-- 2015-12-29
create table #report_date (
report_date datetime
)
This is how you display the unreported dates.
select * from #weekly_calendar w
left join #report_date r
on r.report_date between w.week_start_date and w.week_end_date
If you do not have the week_end_date. Again, assuming your work days start from Monday to Friday.
select * from #weekly_calendar w
left join #report_date r
on r.report_date between w.week_start_date and DATEADD(dd, 6-(DATEPART(dw, w.week_end_date)), w.week_end_date)

Related

Add rows between two dates Presto

I have a table that has 3 columns- start, end and emp_num. I want to generate a new table which has all dates between these dates for every employee. Need to use Presto.
I refered this link - inserting dates into a table between a start and end date in Presto
Tried using unnest function by creating sequence but , I don't know how do I create sequence by pulling dates from two columns in another table.
select unnest(seq) as t(days)
from (select sequence(start, end, interval '1' day) as seq
from table1)
Here's table and expected format
Table 1:
start | end | emp_num
2018/01/01 | 2018/01/05 | 1
2019/02/01 | 2019/02/05 | 2
Expected:
start | emp_num
2018/01/01 | 1
2018/01/02 | 1
2018/01/03 | 1
2018/01/04 | 1
2018/01/05 | 1
2019/02/01 | 2
2019/01/02 | 2
2019/02/03 | 2
2019/02/04 | 2
2019/02/05 | 2
Here is a query that might get the job done for your use case.
The logic is to use Presto sequence() function to generate a wide date range (since year 2000 to end of 2018, you can adapt that as needed), that can be joined with the table to generate the output.
select dt.x, emp_num
from
( select x from unnest(sequence(date '2000-01-01', date '2018-01-31')) t(x) ) dt
inner join table1 ta on dt.x >= ta.start and dt.x <= ta.end
However, as commented JNevill, it would be more efficient to create a calendar table rather than generating it on the fly every time the query runs.
It should be a simple as :
create table calendar as
select x from unnest(sequence(date '1970-01-01', date '2099-01-01')) t(x);
And then your query would become :
select dt.x, emp_num
from
calendar dt
inner join table1 ta on dt.x >= ta.start and dt.x <= ta.end
PS : due to the lack of DB Fiddles for Presto in the wild, I could not test the queries (#PiotrFindeisen - if you happen to read this - a Presto fiddle would be nice to have !).

How to write a SQL statement to sum data using group by the same day of every two neighboring months

I have a data table like this:
datetime data
-----------------------
...
2017/8/24 6.0
2017/8/25 5.0
...
2017/9/24 6.0
2017/9/25 6.2
...
2017/10/24 8.1
2017/10/25 8.2
I want to write a SQL statement to sum the data using group by the 24th of every two neighboring months in certain range of time such as : from 2017/7/20 to 2017/10/25 as above.
How to write this SQL statement? I'm using SQL Server 2008 R2.
The expected results table is like this:
datetime_range data_sum
------------------------------------
...
2017/8/24~2017/9/24 100.9
2017/9/24~2017/10/24 120.2
...
One conceptual way to proceed here is to redefine a "month" as ending on the 24th of each normal month. Using the SQL Server month function, we will assign any date occurring after the 24th as belonging to the next month. Then we can aggregate by the year along with this shifted month to obtain the sum of data.
WITH cte AS (
SELECT
data,
YEAR(datetime) AS year,
CASE WHEN DAY(datetime) > 24
THEN MONTH(datetime) + 1 ELSE MONTH(datetime) END AS month
FROM yourTable
)
SELECT
CONVERT(varchar(4), year) + '/' + CONVERT(varchar(2), month) +
'/25~' +
CONVERT(varchar(4), year) + '/' + CONVERT(varchar(2), (month + 1)) +
'/24' AS datetime_range,
SUM(data) AS data_sum
FROM cte
GROUP BY
year, month;
Note that your suggested ranges seem to include the 24th on both ends, which does not make sense from an accounting point of view. I assume that the month includes and ends on the 24th (i.e. the 25th is the first day of the next accounting period.
Demo
I would suggest dynamically building some date range rows so that you can then join you data to those for aggregation, like this example:
+----+---------------------+---------------------+----------------+
| | period_start_dt | period_end_dt | your_data_here |
+----+---------------------+---------------------+----------------+
| 1 | 24.04.2017 00:00:00 | 24.05.2017 00:00:00 | 1 |
| 2 | 24.05.2017 00:00:00 | 24.06.2017 00:00:00 | 1 |
| 3 | 24.06.2017 00:00:00 | 24.07.2017 00:00:00 | 1 |
| 4 | 24.07.2017 00:00:00 | 24.08.2017 00:00:00 | 1 |
| 5 | 24.08.2017 00:00:00 | 24.09.2017 00:00:00 | 1 |
| 6 | 24.09.2017 00:00:00 | 24.10.2017 00:00:00 | 1 |
| 7 | 24.10.2017 00:00:00 | 24.11.2017 00:00:00 | 1 |
| 8 | 24.11.2017 00:00:00 | 24.12.2017 00:00:00 | 1 |
| 9 | 24.12.2017 00:00:00 | 24.01.2018 00:00:00 | 1 |
| 10 | 24.01.2018 00:00:00 | 24.02.2018 00:00:00 | 1 |
| 11 | 24.02.2018 00:00:00 | 24.03.2018 00:00:00 | 1 |
| 12 | 24.03.2018 00:00:00 | 24.04.2018 00:00:00 | 1 |
+----+---------------------+---------------------+----------------+
DEMO
declare #start_dt date;
set #start_dt = '20170424';
select
period_start_dt, period_end_dt, sum(1) as your_data_here
from (
select
dateadd(month,m.n,start_dt) period_start_dt
, dateadd(month,m.n+1,start_dt) period_end_dt
from (
select #start_dt start_dt ) seed
cross join (
select 0 n union all
select 1 union all
select 2 union all
select 3 union all
select 4 union all
select 5 union all
select 6 union all
select 7 union all
select 8 union all
select 9 union all
select 10 union all
select 11
) m
) r
-- LEFT JOIN YOUR DATA
-- ON yourdata.date >= r.period_start_dt and data.date < r.period_end_dt
group by
period_start_dt, period_end_dt
Please don't be tempted to use "between" when it comes to joining to your data. Follow the note above and use yourdata.date >= r.period_start_dt and data.date < r.period_end_dt otherwise you could double count information as between is inclusive of both lower and upper boundaries.
I think the simplest way is to subtract 25 days and aggregate by the month:
select year(dateadd(day, -25, datetime)) as yr,
month(dateadd(day, -25, datetime)) as mon,
sum(data)
from t
group by dateadd(day, -25, datetime);
You can format yr and mon to get the dates for the specific ranges, but this does the aggregation (and the yr/mon columns might be sufficient).
Step 0: Build a calendar table. Every database needs a calendar table eventually to simplify this sort of calculation.
In this table you may have columns such as:
Date (primary key)
Day
Month
Year
Quarter
Half-year (e.g. 1 or 2)
Day of year (1 to 366)
Day of week (numeric or text)
Is weekend (seems redundant now, but is a huge time saver later on)
Fiscal quarter/year (if your company's fiscal year doesn't start on Jan. 1)
Is Holiday
etc.
If your company starts its month on the 24th, then you can add a "Fiscal Month" column that represents that.
Step 1: Join on the calendar table
Step 2: Group by the columns in the calendar table.
Calendar tables sound weird at first, but once you realize that they are in fact tiny even if they span a couple hundred years they quickly become a major asset.
Don't try to cheap out on disk space by using computed columns. You want real columns because they are much faster and can be indexed if necessary. (Though honestly, usually just the PK index is enough for even wide calendar tables.)

How to identify MIN value for records within a rolling date range in SQL

I am trying to calculate a MIN date by Patient_ID for each record in my dataset that dynamically references the last 30 days from the date (Discharge_Dt) on that row. My initial thought was to use a window function, but I opted for a subquery, which is close, but not quite what I need.
Please note, my sample query is also missing logic that limits the MIN Discharge_Dt to the last 30 days, in other words, I do not want a MIN Discharge_Dt that is older than 30 days for any given row.
Sample Query:
SELECT Patient_ID,
Discharge_Dt,
/* Calculating the MIN Discharge_Dt by Patient_ID for the last 30
days based upon the Discharge_Dt for that row */
(SELECT MIN(Discharge_Dt)
FROM admissions_ds AS b
WHERE a.Patient_ID = b.Patient_ID AND
a.Discharge_Dt >= DATEADD('D', -30, GETDATE())) AS MIN_Dt
FROM admissions_ds AS a
Desired Output Table:
Patient_ID | Discharge_Dt | MIN_Dt
10 | 2017-08-15 | 2017-08-15
10 | 2017-08-31 | 2017-08-15
10 | 2017-09-21 | 2017-08-31
15 | 2017-07-01 | 2017-07-01
15 | 2017-07-18 | 2017-07-01
20 | 2017-05-05 | 2017-05-05
25 | 2017-09-24 | 2017-09-24
Here you go,
Just a simple join required.
drop TABLE if EXISTS admissions_ds;
create table admissions_ds (Patient_ID int,Discharge_Dt date);
insert into admissions_ds
values
(10,'2017-08-15'),
(10,'2017-08-31'),
(10,'2017-09-21'),
(15,'2017-07-01'),
(15,'2017-07-18'),
(20,'2017-05-05'),
(25,'2017-09-24');
select t1.Patient_ID,t1.Discharge_Dt,min(t2.Discharge_Dt) as min_dt
from admissions_ds as t1
join admissions_ds as t2 on t1.Patient_ID=t2.Patient_ID and t2.Discharge_Dt > t1.Discharge_Dt - interval '30 days'
group by 1,2
order by 1,2
;

Discard existing dates that are included in the result, SQL Server

In my database I have a Reservation table and it has three columns Initial Day, Last Day and the House Id.
I want to count the total days and omit those who are repeated, for example:
+-------------+------------+------------+
| | Results | |
+-------------+------------+------------+
| House Id | InitialDay | LastDay |
+-------------+------------+------------+
| 1 | 2017-09-18 | 2017-09-20 |
| 1 | 2017-09-18 | 2017-09-22 |
| 19 | 2017-09-18 | 2017-09-22 |
| 20 | 2017-09-18 | 2017-09-22 |
+-------------+------------+------------+
If you noticed the House Id with the number 1 has two rows, and each row has dates but the first row is in the interval of dates of the second row. In total the number of days should be 5 because the first shouldn't be counted as those days already exist in the second.
The reason why this is happening is that each house has two rooms, and different persons can stay in that house on the same dates.
My question is: how can I omit those cases, and only count the real days the house was occupied?
In your are using SQL Server 2012 or higher you can use LAG() to get the previous final date and adjust the initial date:
with ReservationAdjusted as (
select *,
lag(LastDay) over(partition by HouseID order by InitialDay, LastDay) as PreviousLast
from Reservation
)
select HouseId,
sum(case when PreviousLast>LastDay then 0 -- fully contained in the previous reservation
when PreviousLast>=InitialDay then datediff(day,PreviousLast,LastDay) -- overlap
else datediff(day,InitialDay,LastDay)+1 -- no overlap
end) as Days
from ReservationAdjusted
group by HouseId
The cases are:
The reservation is fully included in the previous reservation: we only need to compare end dates because the previous row is obtained ordering by InitialDay, LastDay, so the previous start date is always minor or equal than the current start date.
The current reservation overlaps with the previous: in this case we adjust the start and don't add 1 (the initial day is already counted), this case include when the previous end is equal to the current start (is a one day overlap).
There is no overlap: we just calculate the difference and add 1 to count also the initial day.
Note that we don't need extra condition for the reservation of a HouseID because by default the LAG() function returns NULL when there isn't a previous row, and comparisons with null always are false.
Sample input and output:
| HouseId | InitialDay | LastDay |
|---------|------------|------------|
| 1 | 2017-09-18 | 2017-09-20 |
| 1 | 2017-09-18 | 2017-09-22 |
| 1 | 2017-09-21 | 2017-09-22 |
| 19 | 2017-09-18 | 2017-09-27 |
| 19 | 2017-09-24 | 2017-09-26 |
| 19 | 2017-09-29 | 2017-09-30 |
| 20 | 2017-09-19 | 2017-09-22 |
| 20 | 2017-09-22 | 2017-09-26 |
| 20 | 2017-09-24 | 2017-09-27 |
| HouseId | Days |
|---------|------|
| 1 | 5 |
| 19 | 12 |
| 20 | 9 |
select house_id,min(initialDay),max(LastDay)
group by houseId
If I understood correctly!
Try out and let me know how it works out for you.
Ted.
While thinking through your question I came across the wonder that is the idea of a Calendar table. You'd use this code to create one, with whatever range of dates your want for your calendar. Code is from http://blog.jontav.com/post/9380766884/calendar-tables-are-incredibly-useful-in-sql
declare #start_dt as date = '1/1/2010';
declare #end_dt as date = '1/1/2020';
declare #dates as table (
date_id date primary key,
date_year smallint,
date_month tinyint,
date_day tinyint,
weekday_id tinyint,
weekday_nm varchar(10),
month_nm varchar(10),
day_of_year smallint,
quarter_id tinyint,
first_day_of_month date,
last_day_of_month date,
start_dts datetime,
end_dts datetime
)
while #start_dt < #end_dt
begin
insert into #dates(
date_id, date_year, date_month, date_day,
weekday_id, weekday_nm, month_nm, day_of_year, quarter_id,
first_day_of_month, last_day_of_month,
start_dts, end_dts
)
values(
#start_dt, year(#start_dt), month(#start_dt), day(#start_dt),
datepart(weekday, #start_dt), datename(weekday, #start_dt), datename(month, #start_dt), datepart(dayofyear, #start_dt), datepart(quarter, #start_dt),
dateadd(day,-(day(#start_dt)-1),#start_dt), dateadd(day,-(day(dateadd(month,1,#start_dt))),dateadd(month,1,#start_dt)),
cast(#start_dt as datetime), dateadd(second,-1,cast(dateadd(day, 1, #start_dt) as datetime))
)
set #start_dt = dateadd(day, 1, #start_dt)
end
select *
into Calendar
from #dates
Once you have a calendar table your query is as simple as:
select distinct t.House_id, c.date_id
from Reservation as r
inner join Calendar as c
on
c.date_id >= r.InitialDay
and c.date_id <= r.LastDay
Which gives you a row for each unique day each room was occupied. If you need a sum of how many days each room was occupied it becomes:
select a.House_id, count(a.House_id) as Days_occupied
from
(select distinct t.House_id, c.date_id
from so_test as t
inner join Calendar as c
on
c.date_id >= t.InitialDay
and c.date_id <= t.LastDay) as a
group by a.House_id
Create a table of all the possible dates and then join it to the Reservations table so that you have a list of all days between InitialDay and LastDay. Like this:
DECLARE #i date
DECLARE #last date
CREATE TABLE #temp (Date date)
SELECT #i = MIN(Date) FROM Reservations
SELECT #last = MAX(Date) FROM Reservations
WHILE #i <= #last
BEGIN
INSERT INTO #temp VALUES(#i)
SET #i = DATEADD(day, 1, #i)
END
SELECT HouseID, COUNT(*) FROM
(
SELECT DISTINCT HouseID, Date FROM Reservation
LEFT JOIN #temp
ON Reservation.InitialDay <= #temp.Date
AND Reservation.LastDay >= #temp.Date
) AS a
GROUP BY HouseID
DROP TABLE #temp

Calculate Date Range count for each day SQL Server 2012

I'm having trouble with getting the records for the following
| DATEFROM | DATETO
| 2012-01-02 | 2012-01-03
| 2012-01-11 | 2012-01-16
| 2012-01-08 | 2012-01-22
| 2012-01-29 | 2012-01-30
| 2012-01-08 | 2012-01-11
I'm trying to get count of ranges containing the day for each day from beginning of first range ending last date of last range.
Sample output:
2012-01-02 | 1
2012-01-03 | 1
2012-01-08 | 2
2012-01-09 | 2
2012-01-10 | 2
2012-01-11 | 3
2012-01-12 | 2
2012-01-13 | 2
2012-01-14 | 2
2012-01-15 | 2
2012-01-16 | 2
......
My database contains data from 2008 to nowadays.
In other words I am trying to get how many times a record is found for a specific date.
Not every day is in the TABLE for each month
I found this post Tricky mysql count occurrences of each day within date range but can't convert the code provided to my SQL Server 2012.
You could try here
http://sqlfiddle.com/#!6/0855b/1
Ok, this is one way to do what you want:
DECLARE #MinDate DATE, #MaxDate DATE;
SELECT #MinDate = MIN(DATEFROM),
#MaxDate = MAX(DATETO)
FROM ENTRIES;
WITH Dates AS
(
SELECT DATEADD(DAY,number,#MinDate) [Date]
FROM master.dbo.spt_values
WHERE type = 'P'
AND number > 0
AND DATEADD(DAY,number,#MinDate) <= #MaxDate
)
SELECT A.[Date],
COUNT(*) N
FROM Dates A
LEFT JOIN Entries B
ON A.[Date] >= B.DATEFROM
AND A.[Date] <= B.DATETO
GROUP BY A.[Date]
ORDER BY A.[Date]
If the range dates is over 2047 days, then you'll need to create more values than the ones that are available in master.dbo.spt_values (this is trivial, for instance you can use a CROSS JOIN).
Here is the sqlfiddle for you to try.