Calculate Date Range count for each day SQL Server 2012 - sql

I'm having trouble with getting the records for the following
| DATEFROM | DATETO
| 2012-01-02 | 2012-01-03
| 2012-01-11 | 2012-01-16
| 2012-01-08 | 2012-01-22
| 2012-01-29 | 2012-01-30
| 2012-01-08 | 2012-01-11
I'm trying to get count of ranges containing the day for each day from beginning of first range ending last date of last range.
Sample output:
2012-01-02 | 1
2012-01-03 | 1
2012-01-08 | 2
2012-01-09 | 2
2012-01-10 | 2
2012-01-11 | 3
2012-01-12 | 2
2012-01-13 | 2
2012-01-14 | 2
2012-01-15 | 2
2012-01-16 | 2
......
My database contains data from 2008 to nowadays.
In other words I am trying to get how many times a record is found for a specific date.
Not every day is in the TABLE for each month
I found this post Tricky mysql count occurrences of each day within date range but can't convert the code provided to my SQL Server 2012.
You could try here
http://sqlfiddle.com/#!6/0855b/1

Ok, this is one way to do what you want:
DECLARE #MinDate DATE, #MaxDate DATE;
SELECT #MinDate = MIN(DATEFROM),
#MaxDate = MAX(DATETO)
FROM ENTRIES;
WITH Dates AS
(
SELECT DATEADD(DAY,number,#MinDate) [Date]
FROM master.dbo.spt_values
WHERE type = 'P'
AND number > 0
AND DATEADD(DAY,number,#MinDate) <= #MaxDate
)
SELECT A.[Date],
COUNT(*) N
FROM Dates A
LEFT JOIN Entries B
ON A.[Date] >= B.DATEFROM
AND A.[Date] <= B.DATETO
GROUP BY A.[Date]
ORDER BY A.[Date]
If the range dates is over 2047 days, then you'll need to create more values than the ones that are available in master.dbo.spt_values (this is trivial, for instance you can use a CROSS JOIN).
Here is the sqlfiddle for you to try.

Related

Question: Joining two data sets with date conditions

I'm pretty new with SQL, and I'm struggling to figure out a seemingly simple task.
Here's the situation:
I'm working with two data sets
Data Set A, which is the most accurate but only refreshes every quarter
Data Set B, which has all the date, including the most recent data, but is overall less accurate
My goal is to combine both data sets where I would have Data Set A for all data up to the most recent quarter and Data Set B for anything after (i.e., all recent data not captured in Data Set A)
For example:
Data Set A captures anything from Q1 2020 (January to March)
Let's say we are April 15th
Data Set B captures anything from Q1 2020 to the most current date, April 15th
My goal is to use Data Set A for all data from January to March 2020 (Q1) and then Data Set B for all data from April 1 to 15
Any thoughts or advice on how to do this? Potentially a join function along with a date one?
Any help would be much appreciated.
Thanks in advance for the help.
I hope I got your question right.
I put in some sample data that might match your description: a date and an amount. To keep it simple, one row per any month. You can extract the quarter from a date, and keep that as an additional column, and then filter by that down the line.
WITH
-- some sample data: date and amount ...
indata(dt,amount) AS (
SELECT DATE '2020-01-15', 234.45
UNION ALL SELECT DATE '2020-02-15', 344.45
UNION ALL SELECT DATE '2020-03-15', 345.45
UNION ALL SELECT DATE '2020-04-15', 346.45
UNION ALL SELECT DATE '2020-05-15', 347.45
UNION ALL SELECT DATE '2020-06-15', 348.45
UNION ALL SELECT DATE '2020-07-15', 349.45
UNION ALL SELECT DATE '2020-08-15', 350.45
UNION ALL SELECT DATE '2020-09-15', 351.45
UNION ALL SELECT DATE '2020-10-15', 352.45
UNION ALL SELECT DATE '2020-11-15', 353.45
UNION ALL SELECT DATE '2020-12-15', 354.45
)
-- real query starts here ...
SELECT
EXTRACT(QUARTER FROM dt) AS the_quarter
, CAST(
TIMESTAMPADD(
QUARTER
, CAST(EXTRACT(QUARTER FROM dt) AS INTEGER)-1
, TRUNC(dt,'YEAR')
)
AS DATE
) AS qtr_start
, *
FROM indata;
-- out the_quarter | qtr_start | dt | amount
-- out -------------+------------+------------+--------
-- out 1 | 2020-01-01 | 2020-01-15 | 234.45
-- out 1 | 2020-01-01 | 2020-02-15 | 344.45
-- out 1 | 2020-01-01 | 2020-03-15 | 345.45
-- out 2 | 2020-04-01 | 2020-04-15 | 346.45
-- out 2 | 2020-04-01 | 2020-05-15 | 347.45
-- out 2 | 2020-04-01 | 2020-06-15 | 348.45
-- out 3 | 2020-07-01 | 2020-07-15 | 349.45
-- out 3 | 2020-07-01 | 2020-08-15 | 350.45
-- out 3 | 2020-07-01 | 2020-09-15 | 351.45
-- out 4 | 2020-10-01 | 2020-10-15 | 352.45
-- out 4 | 2020-10-01 | 2020-11-15 | 353.45
-- out 4 | 2020-10-01 | 2020-12-15 | 354.45
If you filter by quarter, you can group your data by that column ...

How to write a SQL statement to sum data using group by the same day of every two neighboring months

I have a data table like this:
datetime data
-----------------------
...
2017/8/24 6.0
2017/8/25 5.0
...
2017/9/24 6.0
2017/9/25 6.2
...
2017/10/24 8.1
2017/10/25 8.2
I want to write a SQL statement to sum the data using group by the 24th of every two neighboring months in certain range of time such as : from 2017/7/20 to 2017/10/25 as above.
How to write this SQL statement? I'm using SQL Server 2008 R2.
The expected results table is like this:
datetime_range data_sum
------------------------------------
...
2017/8/24~2017/9/24 100.9
2017/9/24~2017/10/24 120.2
...
One conceptual way to proceed here is to redefine a "month" as ending on the 24th of each normal month. Using the SQL Server month function, we will assign any date occurring after the 24th as belonging to the next month. Then we can aggregate by the year along with this shifted month to obtain the sum of data.
WITH cte AS (
SELECT
data,
YEAR(datetime) AS year,
CASE WHEN DAY(datetime) > 24
THEN MONTH(datetime) + 1 ELSE MONTH(datetime) END AS month
FROM yourTable
)
SELECT
CONVERT(varchar(4), year) + '/' + CONVERT(varchar(2), month) +
'/25~' +
CONVERT(varchar(4), year) + '/' + CONVERT(varchar(2), (month + 1)) +
'/24' AS datetime_range,
SUM(data) AS data_sum
FROM cte
GROUP BY
year, month;
Note that your suggested ranges seem to include the 24th on both ends, which does not make sense from an accounting point of view. I assume that the month includes and ends on the 24th (i.e. the 25th is the first day of the next accounting period.
Demo
I would suggest dynamically building some date range rows so that you can then join you data to those for aggregation, like this example:
+----+---------------------+---------------------+----------------+
| | period_start_dt | period_end_dt | your_data_here |
+----+---------------------+---------------------+----------------+
| 1 | 24.04.2017 00:00:00 | 24.05.2017 00:00:00 | 1 |
| 2 | 24.05.2017 00:00:00 | 24.06.2017 00:00:00 | 1 |
| 3 | 24.06.2017 00:00:00 | 24.07.2017 00:00:00 | 1 |
| 4 | 24.07.2017 00:00:00 | 24.08.2017 00:00:00 | 1 |
| 5 | 24.08.2017 00:00:00 | 24.09.2017 00:00:00 | 1 |
| 6 | 24.09.2017 00:00:00 | 24.10.2017 00:00:00 | 1 |
| 7 | 24.10.2017 00:00:00 | 24.11.2017 00:00:00 | 1 |
| 8 | 24.11.2017 00:00:00 | 24.12.2017 00:00:00 | 1 |
| 9 | 24.12.2017 00:00:00 | 24.01.2018 00:00:00 | 1 |
| 10 | 24.01.2018 00:00:00 | 24.02.2018 00:00:00 | 1 |
| 11 | 24.02.2018 00:00:00 | 24.03.2018 00:00:00 | 1 |
| 12 | 24.03.2018 00:00:00 | 24.04.2018 00:00:00 | 1 |
+----+---------------------+---------------------+----------------+
DEMO
declare #start_dt date;
set #start_dt = '20170424';
select
period_start_dt, period_end_dt, sum(1) as your_data_here
from (
select
dateadd(month,m.n,start_dt) period_start_dt
, dateadd(month,m.n+1,start_dt) period_end_dt
from (
select #start_dt start_dt ) seed
cross join (
select 0 n union all
select 1 union all
select 2 union all
select 3 union all
select 4 union all
select 5 union all
select 6 union all
select 7 union all
select 8 union all
select 9 union all
select 10 union all
select 11
) m
) r
-- LEFT JOIN YOUR DATA
-- ON yourdata.date >= r.period_start_dt and data.date < r.period_end_dt
group by
period_start_dt, period_end_dt
Please don't be tempted to use "between" when it comes to joining to your data. Follow the note above and use yourdata.date >= r.period_start_dt and data.date < r.period_end_dt otherwise you could double count information as between is inclusive of both lower and upper boundaries.
I think the simplest way is to subtract 25 days and aggregate by the month:
select year(dateadd(day, -25, datetime)) as yr,
month(dateadd(day, -25, datetime)) as mon,
sum(data)
from t
group by dateadd(day, -25, datetime);
You can format yr and mon to get the dates for the specific ranges, but this does the aggregation (and the yr/mon columns might be sufficient).
Step 0: Build a calendar table. Every database needs a calendar table eventually to simplify this sort of calculation.
In this table you may have columns such as:
Date (primary key)
Day
Month
Year
Quarter
Half-year (e.g. 1 or 2)
Day of year (1 to 366)
Day of week (numeric or text)
Is weekend (seems redundant now, but is a huge time saver later on)
Fiscal quarter/year (if your company's fiscal year doesn't start on Jan. 1)
Is Holiday
etc.
If your company starts its month on the 24th, then you can add a "Fiscal Month" column that represents that.
Step 1: Join on the calendar table
Step 2: Group by the columns in the calendar table.
Calendar tables sound weird at first, but once you realize that they are in fact tiny even if they span a couple hundred years they quickly become a major asset.
Don't try to cheap out on disk space by using computed columns. You want real columns because they are much faster and can be indexed if necessary. (Though honestly, usually just the PK index is enough for even wide calendar tables.)

Postgres query for calendar

I am trying to write a query to retrieve data from an events query for a simple calendar app.
The table structure is as followed:
table name: events
Column | Type
---------+-----------
id | integer
start | timestamp
end | timestamp
the data inside of the table
id| start | end
--+---------------------+--------------------
1 | 2017-09-01 12:00:00 | 2017-09-01 12:00:00
2 | 2017-09-03 10:00:00 | 2017-09-03 12:00:00
3 | 2017-09-08 12:00:00 | 2017-09-11 12:00:00
4 | 2017-09-11 12:00:00 | 2017-09-11 12:00:00
the expected result is
date | event.id
-----------+---------
2017-09-01 | 1
2017-09-03 | 2
2017-09-08 | 3
2017-09-09 | 3
2017-09-10 | 3
2017-09-11 | 3
2017-09-11 | 4
As you can see, only days with an event (not just start and end, but also the days in between) is retrieved, days without an event are not retrieved at all.
In the second step I would like to be able to limit the amount of distinct days, e.g. "get 4 days with events" what might be more than 4 rows.
Right now I am able to retrieve the events based on start date only using the following query:
SELECT start::date, id FROM events WHERE events.start::date >= '2017-09-01' LIMIT 3
Thinks I already though about are DENSE_RANK and generate_series, but up to now I didn't find a way to fill the gaps between start and end, but not on days where there are no data.
So in short:
What I want to get is: get the next X days where there is an event. A date with an event is a day where start <= date >= end
Any ideas ?
Edit
Thanks to Tim I have now the following query (modified to use generate_series instead of a table and added a limit using dense_rank):
select date, id FROM (
SELECT
DENSE_RANK() OVER (ORDER BY t1.date) as rank,
t1.date,
events.id
FROM
generate_series([DATE]::date, [DATE]::date + interval '365 day', '1 day') as t1
INNER JOIN
events
ON t1.date BETWEEN events.start::date AND events."end"::date
) as t
WHERE rank <= [LIMIT]
This is working really good, even though I am not 100% sure about the performance hit with this kind of limit
I think you really need a calendar table here to cover the full range of dates in which your data may appear. In the first CTE below, I generate a table covering the month of September 2017. Then all we need to do is inner join this calendar table with the events table on the criteria of a given day appearing within a given range.
WITH cte AS (
SELECT CAST('2017-09-01' AS DATE) + (n || ' day')::INTERVAL AS date
FROM generate_series(0, 29) n
)
SELECT
t1.date,
t2.id
FROM cte t1
INNER JOIN events t2
ON t1.date BETWEEN CAST(t2.start AS DATE) AND CAST(t2.end AS DATE);
Output:
date id
1 01.09.2017 00:00:00 1
2 03.09.2017 00:00:00 2
3 08.09.2017 00:00:00 3
4 09.09.2017 00:00:00 3
5 10.09.2017 00:00:00 3
6 11.09.2017 00:00:00 3
7 11.09.2017 00:00:00 4
Demo here:
Rextester

Discard existing dates that are included in the result, SQL Server

In my database I have a Reservation table and it has three columns Initial Day, Last Day and the House Id.
I want to count the total days and omit those who are repeated, for example:
+-------------+------------+------------+
| | Results | |
+-------------+------------+------------+
| House Id | InitialDay | LastDay |
+-------------+------------+------------+
| 1 | 2017-09-18 | 2017-09-20 |
| 1 | 2017-09-18 | 2017-09-22 |
| 19 | 2017-09-18 | 2017-09-22 |
| 20 | 2017-09-18 | 2017-09-22 |
+-------------+------------+------------+
If you noticed the House Id with the number 1 has two rows, and each row has dates but the first row is in the interval of dates of the second row. In total the number of days should be 5 because the first shouldn't be counted as those days already exist in the second.
The reason why this is happening is that each house has two rooms, and different persons can stay in that house on the same dates.
My question is: how can I omit those cases, and only count the real days the house was occupied?
In your are using SQL Server 2012 or higher you can use LAG() to get the previous final date and adjust the initial date:
with ReservationAdjusted as (
select *,
lag(LastDay) over(partition by HouseID order by InitialDay, LastDay) as PreviousLast
from Reservation
)
select HouseId,
sum(case when PreviousLast>LastDay then 0 -- fully contained in the previous reservation
when PreviousLast>=InitialDay then datediff(day,PreviousLast,LastDay) -- overlap
else datediff(day,InitialDay,LastDay)+1 -- no overlap
end) as Days
from ReservationAdjusted
group by HouseId
The cases are:
The reservation is fully included in the previous reservation: we only need to compare end dates because the previous row is obtained ordering by InitialDay, LastDay, so the previous start date is always minor or equal than the current start date.
The current reservation overlaps with the previous: in this case we adjust the start and don't add 1 (the initial day is already counted), this case include when the previous end is equal to the current start (is a one day overlap).
There is no overlap: we just calculate the difference and add 1 to count also the initial day.
Note that we don't need extra condition for the reservation of a HouseID because by default the LAG() function returns NULL when there isn't a previous row, and comparisons with null always are false.
Sample input and output:
| HouseId | InitialDay | LastDay |
|---------|------------|------------|
| 1 | 2017-09-18 | 2017-09-20 |
| 1 | 2017-09-18 | 2017-09-22 |
| 1 | 2017-09-21 | 2017-09-22 |
| 19 | 2017-09-18 | 2017-09-27 |
| 19 | 2017-09-24 | 2017-09-26 |
| 19 | 2017-09-29 | 2017-09-30 |
| 20 | 2017-09-19 | 2017-09-22 |
| 20 | 2017-09-22 | 2017-09-26 |
| 20 | 2017-09-24 | 2017-09-27 |
| HouseId | Days |
|---------|------|
| 1 | 5 |
| 19 | 12 |
| 20 | 9 |
select house_id,min(initialDay),max(LastDay)
group by houseId
If I understood correctly!
Try out and let me know how it works out for you.
Ted.
While thinking through your question I came across the wonder that is the idea of a Calendar table. You'd use this code to create one, with whatever range of dates your want for your calendar. Code is from http://blog.jontav.com/post/9380766884/calendar-tables-are-incredibly-useful-in-sql
declare #start_dt as date = '1/1/2010';
declare #end_dt as date = '1/1/2020';
declare #dates as table (
date_id date primary key,
date_year smallint,
date_month tinyint,
date_day tinyint,
weekday_id tinyint,
weekday_nm varchar(10),
month_nm varchar(10),
day_of_year smallint,
quarter_id tinyint,
first_day_of_month date,
last_day_of_month date,
start_dts datetime,
end_dts datetime
)
while #start_dt < #end_dt
begin
insert into #dates(
date_id, date_year, date_month, date_day,
weekday_id, weekday_nm, month_nm, day_of_year, quarter_id,
first_day_of_month, last_day_of_month,
start_dts, end_dts
)
values(
#start_dt, year(#start_dt), month(#start_dt), day(#start_dt),
datepart(weekday, #start_dt), datename(weekday, #start_dt), datename(month, #start_dt), datepart(dayofyear, #start_dt), datepart(quarter, #start_dt),
dateadd(day,-(day(#start_dt)-1),#start_dt), dateadd(day,-(day(dateadd(month,1,#start_dt))),dateadd(month,1,#start_dt)),
cast(#start_dt as datetime), dateadd(second,-1,cast(dateadd(day, 1, #start_dt) as datetime))
)
set #start_dt = dateadd(day, 1, #start_dt)
end
select *
into Calendar
from #dates
Once you have a calendar table your query is as simple as:
select distinct t.House_id, c.date_id
from Reservation as r
inner join Calendar as c
on
c.date_id >= r.InitialDay
and c.date_id <= r.LastDay
Which gives you a row for each unique day each room was occupied. If you need a sum of how many days each room was occupied it becomes:
select a.House_id, count(a.House_id) as Days_occupied
from
(select distinct t.House_id, c.date_id
from so_test as t
inner join Calendar as c
on
c.date_id >= t.InitialDay
and c.date_id <= t.LastDay) as a
group by a.House_id
Create a table of all the possible dates and then join it to the Reservations table so that you have a list of all days between InitialDay and LastDay. Like this:
DECLARE #i date
DECLARE #last date
CREATE TABLE #temp (Date date)
SELECT #i = MIN(Date) FROM Reservations
SELECT #last = MAX(Date) FROM Reservations
WHILE #i <= #last
BEGIN
INSERT INTO #temp VALUES(#i)
SET #i = DATEADD(day, 1, #i)
END
SELECT HouseID, COUNT(*) FROM
(
SELECT DISTINCT HouseID, Date FROM Reservation
LEFT JOIN #temp
ON Reservation.InitialDay <= #temp.Date
AND Reservation.LastDay >= #temp.Date
) AS a
GROUP BY HouseID
DROP TABLE #temp

sql how to add missing week to the table

I have a table that has this data: Date when the employees reported and the week start-date(Monday) for that week. Now they did not work all the dates. For example there is no data on week of christmas. Is there a way I can add the missing week.So, I will still have the week start-date for each and every week. But the report-date can be null.
I cannot declare variables
This is what I have
and this is what i want to add the missing week
Query
SQLFIDDLEEXAMPLE:
CREATE TABLE tb
(
d1 date,
d2 date
);
INSERT INTO tb
(d1, d2)
VALUES
('2015-12-10', '2015-12-07'),
('2015-12-15', '2015-12-14'),
('2015-12-29', '2015-12-28'),
('2016-01-05', '2016-01-04');
SET DATEFIRST 1
INSERT INTO tb
( d1, d2 )
select null, DATEADD(day,number,'2015-01-01')
FROM master..spt_values t1
LEFT JOIN tb t2
ON DATEADD(day,number,'2015-01-01') = t2.d2
WHERE type = 'P'
AND DATEADD(day,number,'2015-01-01') >= '2015-12-01'
AND DATEADD(day,number,'2015-01-01') <= '2016-01-04'
AND DATEPART(weekday,DATEADD(day,number,'2015-01-01')) = 1
AND t2.d2 is null
SELECT *
FROM tb
Result:
| d1 | d2 |
|------------|------------|
| 2015-12-10 | 2015-12-07 |
| 2015-12-15 | 2015-12-14 |
| 2015-12-29 | 2015-12-28 |
| 2016-01-05 | 2016-01-04 |
| (null) | 2015-12-21 |
You can create a new Calendar/Weeks table containing all the weeks in the year. This table should be in advance.
You can then make a reference from your data table to this calendar table (by id or week/year).
Your report should be based on the calendar table with an outer join to your data table.
This way your report will contain all weeks even if some weeks don't have any data.
EDIT: You would need a new table like this:
Week:
| Start date | End date |
| 12/07/15 | 12/13/15 |
| 12/14/15 | 12/20/15 |
| 12/21/15 | 12/27/15 |
etc...
Assuming that #weekly_calendar table contains your valid work weeks (i.e., for Dec 2015). By the way, syntax is for MSSQL. You should specify what database you are using.
You can also dynamically create the calendar on run-time. This is just to show the concept in an easy to understand way.
-- week start dates
-- 2015-12-01
-- 2015-12-07
-- 2015-12-14
-- 2015-12-21
-- 2015-12-28
create table #weekly_calendar (
week_start_date datetime,
week_end_date datetime
)
Assuming that #report_date contains the report date of the employee.
-- report dates
-- 2015-12-02
-- 2015-12-15
-- 2015-12-29
create table #report_date (
report_date datetime
)
This is how you display the unreported dates.
select * from #weekly_calendar w
left join #report_date r
on r.report_date between w.week_start_date and w.week_end_date
If you do not have the week_end_date. Again, assuming your work days start from Monday to Friday.
select * from #weekly_calendar w
left join #report_date r
on r.report_date between w.week_start_date and DATEADD(dd, 6-(DATEPART(dw, w.week_end_date)), w.week_end_date)