How to extrapolate dates in SQL Server to calculate the daily counts?

This is what the data looks like. It's a long table.
I need to calculate the number of people employed on each day.
How do I write SQL Server logic to get this result? I tried to create a DATES table and then join, but this caused an error because the table is too big. Do I need recursive logic?

For future questions, don't post images of data. Instead, use a service like dbfiddle. I'll add a sketch of an answer anyway; with a better-prepared question you could have gotten a complete one. Here it goes:
-- extrema is the least and the greatest date in staff table
with extrema(mn, mx) as (
select least(min(hired),min(retired)) as mn
, greatest(max(hired),max(retired)) as mx
from staff
), calendar (dt) as (
-- we construct a calendar with every date between extreme values
select mn from extrema
union all
select dateadd(day, 1, dt)
from calendar
where dt < (select mx from extrema)
)
-- finally we can count the number of employed people for each such date
select dt, count(1)
from calendar c
join staff s
on c.dt between s.hired and s.retired
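group by dt
-- recursive CTEs stop after 100 levels by default; lift the limit for longer date ranges
option (maxrecursion 0);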
If you find yourself doing this kind of calculation often, it is a good idea to create a calendar table. You can add other attributes to it, such as whether a date falls on a weekday or a weekend.
With a constraint such as:
CHECK(hired <= retired)
the first part can be simplified to:
with extrema(mn, mx) as (
select min(hired) as mn
, max(retired) as mx
from staff
),
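The calendar-and-join idea above can be tried end to end without a SQL Server instance. The following is only a sketch: it uses Python's built-in sqlite3 module, so the dialect differs (SQLite's date(dt, '+1 day') stands in for dateadd, and WITH RECURSIVE is spelled out), and the two staff rows are made-up sample data. It assumes the CHECK(hired <= retired) constraint, so the simplified extrema CTE is used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- hypothetical sample data; dates stored as ISO-8601 text
    CREATE TABLE staff (name TEXT, hired TEXT, retired TEXT,
                        CHECK (hired <= retired));
    INSERT INTO staff VALUES
        ('Ann', '2020-01-01', '2020-01-03'),
        ('Bob', '2020-01-02', '2020-01-05');
""")

DAILY_HEADCOUNT = """
WITH RECURSIVE
extrema(mn, mx) AS (
    SELECT MIN(hired), MAX(retired) FROM staff
),
calendar(dt) AS (
    -- one row per day between the earliest hire and the latest retirement
    SELECT mn FROM extrema
    UNION ALL
    SELECT date(dt, '+1 day') FROM calendar
    WHERE dt < (SELECT mx FROM extrema)
)
SELECT c.dt, COUNT(*)
FROM calendar AS c
JOIN staff AS s ON c.dt BETWEEN s.hired AND s.retired
GROUP BY c.dt
ORDER BY c.dt
"""

headcount = conn.execute(DAILY_HEADCOUNT).fetchall()
for day, employed in headcount:
    print(day, employed)
```

With this sample data the headcount is 1 on the first day, 2 while both people are employed, and 1 again after Ann retires.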

Assuming Current Employees have a NULL retirement date
Declare @Date1 date = '2015-01-01'
Declare @Date2 date = getdate()
Select A.Date
,HeadCount = count(B.name)
From ( Select Top (DateDiff(DAY,@Date1,@Date2)+1)
Date=DateAdd(DAY,-1+Row_Number() Over (Order By (Select Null)),@Date1)
From master..spt_values n1,master..spt_values n2
) A
Left Join YourTable B on A.Date >= B.Hired and A.Date <= coalesce(B.Retired,getdate())
Group BY A.Date

You need a calendar table for this. You start with the calendar, and LEFT JOIN everything else, using BETWEEN logic.
You can use a real table. Or you can generate it on the fly, like this:
WITH
L0 AS ( SELECT c = 1
FROM (VALUES(1),(1),(1),(1),(1),(1),(1),(1),
(1),(1),(1),(1),(1),(1),(1),(1)) AS D(c) ),
L1 AS ( SELECT c = 1 FROM L0 A, L0 B, L0 C, L0 D ),
Nums AS ( SELECT rownum = ROW_NUMBER() OVER(ORDER BY (SELECT 1))
FROM L1 ),
Dates AS (
SELECT TOP (DATEDIFF(day, '20141231', GETDATE()))
Date = DATEADD(day, rownum, '20141231')
FROM Nums
)
SELECT
d.Date,
NumEmployed = COUNT(*)
FROM Dates d
JOIN YourTable t ON d.Date BETWEEN t.Hired AND t.Retired
GROUP BY
d.Date;
If your dates have a time component, then you need to use >= AND < logic instead of BETWEEN.
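To illustrate why the time component matters, here is a small sketch using Python's sqlite3 module rather than SQL Server. The table names, the single sample row and the exact form of the half-open comparison are assumptions made for the demo; the answer above only names the ">= AND <" idea. A calendar day is treated as the interval from its midnight up to, but not including, the next midnight.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE your_table (name TEXT, hired TEXT, retired TEXT);
    -- hypothetical row: hired at 09:00, retired mid-afternoon
    INSERT INTO your_table VALUES
        ('Ann', '2020-01-01 09:00:00', '2020-01-03 17:30:00');
    CREATE TABLE dates (dt TEXT);
    INSERT INTO dates VALUES
        ('2020-01-01'), ('2020-01-02'), ('2020-01-03'), ('2020-01-04');
""")

# BETWEEN compares midnight of each calendar day with the raw timestamps,
# so the hire day is lost (midnight is earlier than 09:00).
between_days = [row[0] for row in conn.execute("""
    SELECT d.dt FROM dates AS d
    JOIN your_table AS t ON datetime(d.dt) BETWEEN t.hired AND t.retired
    ORDER BY d.dt
""")]

# Half-open logic: employed on day d if hired before the end of d
# and retired at or after the start of d.
half_open_days = [row[0] for row in conn.execute("""
    SELECT d.dt FROM dates AS d
    JOIN your_table AS t
      ON t.hired   <  datetime(d.dt, '+1 day')
     AND t.retired >= datetime(d.dt)
    ORDER BY d.dt
""")]

print(between_days)
print(half_open_days)
```

In this sample the BETWEEN version drops 2020-01-01, while the half-open version keeps all three days on which Ann was employed for at least part of the day.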

Try limiting the scope of your date table. In this example I have a table of dates named TallyStickDT.
SELECT dt, COUNT(name)
FROM (
SELECT dt
FROM tallystickdt
WHERE dt >= (SELECT MIN(hired) FROM #employees)
AND dt <= GETDATE()
) A
LEFT OUTER JOIN #employees E ON A.dt >= E.Hired AND A.dt <= e.retired
GROUP BY dt
ORDER BY dt

Related

How to partition my data by a specific date and another identifier SQL

with cte as
(
select to_date('01-JUN-2020','DD-MON-YYYY')+(level-1) DT
from dual
connect bY level<= 30
)
select *
from cte x
left outer join
(select date from time where emp in (1, 2)) a on x.dt = a.date
In this scenario I am trying to find the days on which these persons didn't report to work. It works well for 1 person: I get back their missing days correctly. But when I add a second person, I do not get back the correct missing days for them, I guess because I'm only joining on date.
I would like to know how I can partition this data by the person's id and date to get the accurate days that each of them missed.
Please help, thanks.
You would typically cross join the list of dates with the list of persons, and then use not exists to pull out the missing person/date tuples:
with cte as (
select date '2020-06-01' + level - 1 dt
from dual
connect by level <= 30
)
select c.dt, e.emp
from cte c
cross join (select distinct emp from times) e
where not exists (
select 1
from times t
where t.emp = e.emp and t.dt = c.dt
)
Note that this uses a literal date rather than to_date(), which is more appropriate here.
This gives the missing tuples for all persons at once. If you want just for a predefined list of persons, then:
with cte as (
select date '2020-06-01' + level - 1 dt
from dual
connect by level <= 30
)
select c.dt, e.emp
from cte c
cross join (select 1 emp from dual union all select 2 from dual) e
where not exists (
select 1
from times t
where t.emp = e.emp and t.dt = c.dt
)
If you want to also see the "presence" dates, then use a left join rather than not exists, as in your original query:
with cte as (
select date '2020-06-01' + level - 1 dt
from dual
connect by level <= 30
)
select c.dt, e.emp, t.* -- or enumerate the relevant columns from "t" here
from cte c
cross join (select 1 emp from dual union all select 2 from dual) e
left join times t on t.emp = e.emp and t.dt = c.dt
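The cross join plus not exists pattern can be checked on a tiny data set. This is a sketch in Python's sqlite3 module, not Oracle: a recursive CTE replaces connect by level, the window is shortened to three days, and the rows in times are invented for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- hypothetical attendance rows: one row per person per day worked
    CREATE TABLE times (emp INTEGER, dt TEXT);
    INSERT INTO times VALUES
        (1, '2020-06-01'), (1, '2020-06-03'),
        (2, '2020-06-02'), (2, '2020-06-03');
""")

MISSING_DAYS = """
WITH RECURSIVE cte(dt) AS (
    SELECT '2020-06-01'
    UNION ALL
    SELECT date(dt, '+1 day') FROM cte WHERE dt < '2020-06-03'
)
SELECT c.dt, e.emp
FROM cte AS c
CROSS JOIN (SELECT DISTINCT emp FROM times) AS e  -- every date for every person
WHERE NOT EXISTS (
    SELECT 1 FROM times AS t
    WHERE t.emp = e.emp AND t.dt = c.dt           -- match on person AND date
)
ORDER BY e.emp, c.dt
"""

missing = conn.execute(MISSING_DAYS).fetchall()
print(missing)
```

Because the cross join produces one candidate row per person and date, each person's absences are evaluated separately, which is what joining on the date alone could not do.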

How to loop through dates, 30 times to find historical information at points in time?

I am trying to find out the transaction details of customers at different points in time in the past. I came up with a query but I have to change the date every single time in my declare statement. Is there a way to loop through the dates to get data back x amount of days in the past?
DECLARE @StartDate AS Date
SET @StartDate = '2018-10-30'
SELECT u.member_id AS Member_id
,CAST(MAX(pe.executed_time) AS Date) AS Max_Date
,DATEDIFF(dd,@StartDate,MAX(pe.executed_time))*-1+1 AS R
,COUNT(*) AS F
,SUM(pi.Price) AS M
FROM [user] AS u
LEFT JOIN purchase_entry AS pe
ON u.member_id=pe.member_id
LEFT JOIN purchase_item AS pi
ON pe.order_id=pi.order_id
WHERE 1=1
AND pe.[status] = 'purchase'
AND pe.executed_time <= @StartDate -- Change the declared Date to have historical information.
GROUP BY u.id, u.member_id
ORDER BY Recency
I need a way to have my query loop through @StartDate for, say, 30 days into the past. For example: @StartDate = '2018-11-30', then @StartDate = '2018-11-29', and so on.
You can list the dates that you want:
WITH dates as (
SELECT v.*
FROM (VALUES (CONVERT(DATE, '2018-11-30')), (CONVERT(DATE, '2018-11-29'))
) v(dte)
)
SELECT dates.dte, u.member_id, . . .
FROM dates CROSS JOIN
[user] u JOIN
purchase_entry pe
ON u.member_id = pe.member_id LEFT JOIN
purchase_item pi
ON pe.order_id = pi.order_id
WHERE pe.[status] = 'purchase' AND
pe.executed_time <= dates.dte -- Change the declared Date to have historical information.
GROUP BY dates.dte, u.id, u.member_id
ORDER BY Recency;
For a specific period of dates, you can construct them using a recursive CTE:
with dates as (
select convert(date, '2018-11-30') as dte, 1 as n
union all
select dateadd(day, -1, dates.dte), n + 1
from dates
where n < 30
)
. . .
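As a quick check of what the recursive CTE above generates, here is the same recursion sketched in Python's sqlite3 module. SQLite's date(dte, '-1 day') stands in for dateadd(day, -1, ...), so this shows the shape of the result rather than SQL Server syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

rows = conn.execute("""
    WITH RECURSIVE dates(dte, n) AS (
        SELECT '2018-11-30', 1          -- anchor: the most recent date
        UNION ALL
        SELECT date(dte, '-1 day'), n + 1
        FROM dates
        WHERE n < 30                    -- stop after 30 dates
    )
    SELECT dte FROM dates ORDER BY n
""").fetchall()

dates = [row[0] for row in rows]
print(dates[0], dates[-1], len(dates))
```

The counter n keeps the recursion bounded, so 30 rows come back, running from 2018-11-30 down to 2018-11-01.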
Use a tally table to generate the date and then query:
DECLARE @StartDate AS Date;
SET @StartDate = '2018-10-30';
WITH N AS (
SELECT N
FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL)) N(N)),
Tally AS(
SELECT TOP 30 ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) -1 AS I
FROM N N1
CROSS JOIN N N2
CROSS JOIN N N3),
Dates AS(
-- subtract I days so the dates run back from @StartDate
SELECT DATEADD(DAY, -I, @StartDate) AS [Date]
FROM Tally
)
SELECT D.[Date] AS As_Of_Date
,u.member_id AS Member_id
,CAST(MAX(pe.executed_time) AS Date) AS Max_Date
,DATEDIFF(dd,D.[Date],MAX(pe.executed_time))*-1+1 AS R
,COUNT(*) AS F
,SUM(pi.Price) AS M
FROM [user] AS u
CROSS JOIN Dates D
LEFT JOIN purchase_entry AS pe
ON u.member_id=pe.member_id
LEFT JOIN purchase_item AS pi
ON pe.order_id=pi.order_id
WHERE pe.[status] = 'purchase'
AND pe.executed_time <= D.[Date]
-- one result row per as-of date and member
GROUP BY D.[Date], u.id, u.member_id
ORDER BY D.[Date], R;

How do i find available date ranges from date ranges

Sql Server
I have already added bookings from my hotel room management system as reservation data. I want a SQL query that retrieves the date ranges in which rooms are available, and I also want to find out whether a specific date range is available.
You can use something like the following. It's not an easy query, so I'll try to explain it as simply as possible.
1. Use a recursive CTE to generate dates from a specified start date to a specified end date.
2. Join each date to the different room IDs you might have in your table to create all potential available dates.
3. Determine which dates are unavailable for each room.
4. Determine which dates are available for each room by joining all potential available dates and removing unavailable ones (point 2 vs 3).
5. Determine how to group by each range (I used a ROW_NUMBER with a DENSE_RANK).
6. Display results in intervals, for each room.
Script:
-- Period to consider
DECLARE @StartDate DATE = '2018-06-20'
DECLARE @EndDate DATE = '2018-09-01'
;WITH GeneratedDates AS
(
SELECT
GeneratedDate = @StartDate
UNION ALL
SELECT
GeneratedDate = DATEADD(DAY, 1, G.GeneratedDate)
FROM
GeneratedDates AS G
WHERE
G.GeneratedDate < @EndDate
),
ExistingRooms AS
(
SELECT DISTINCT
RoomId
FROM
HotelReservation.dbo.Reservation AS R
),
UnavailableDatesByRoom AS
(
SELECT DISTINCT
R.RoomID,
UnavailableDate = G.GeneratedDate
FROM
HotelReservation.dbo.Reservation AS R
INNER JOIN GeneratedDates AS G ON G.GeneratedDate BETWEEN R.CheckIn AND R.CheckOut
),
AvailableDaysByRoom AS
(
SELECT
AvailableDate = G.GeneratedDate,
E.RoomID,
DateRanking = ROW_NUMBER() OVER (PARTITION BY E.RoomID ORDER BY G.GeneratedDate ASC)
FROM
GeneratedDates AS G
CROSS JOIN ExistingRooms AS E
WHERE
NOT EXISTS (
SELECT
'unavailable date for that room'
FROM
UnavailableDatesByRoom AS U
WHERE
U.RoomID = E.RoomID AND
G.GeneratedDate = U.UnavailableDate)
),
AvailableDaysByRoomGroupings AS
(
SELECT
A.*,
MagicRanking = DENSE_RANK() OVER (PARTITION BY A.RoomID ORDER BY DateRanking - DATEDIFF(DAY, '2010-01-01', A.AvailableDate))
FROM
AvailableDaysByRoom AS A
)
SELECT
G.RoomID,
FirstAvailableStartDate = MIN(G.AvailableDate),
LastAvailableStartDate = MAX(G.AvailableDate)
FROM
AvailableDaysByRoomGroupings AS G
GROUP BY
G.RoomID,
G.MagicRanking
ORDER BY
G.RoomID,
FirstAvailableStartDate
OPTION
(MAXRECURSION 32000)
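The grouping step is the least obvious part of the script: within a run of consecutive dates, the row number and the day offset grow in lockstep, so their difference is constant and can serve as a group key. The sketch below shows only that idea in plain Python; date_ranges is a hypothetical helper name and the sample dates are made up, so it is an illustration of the trick, not a replacement for the query.

```python
from datetime import date
from itertools import groupby


def date_ranges(available_days):
    """Collapse individual dates into (first, last) runs of consecutive days."""
    days = sorted(set(available_days))
    ranges = []
    # Day ordinal minus position is constant inside a run of consecutive days,
    # the same role DateRanking - DATEDIFF(...) plays in the query.
    for _, run in groupby(enumerate(days),
                          key=lambda pair: pair[1].toordinal() - pair[0]):
        run_days = [day for _, day in run]
        ranges.append((run_days[0], run_days[-1]))
    return ranges


# hypothetical available dates for one room
available = [date(2018, 6, 20), date(2018, 6, 21), date(2018, 6, 22),
             date(2018, 6, 27), date(2018, 6, 28)]
ranges = date_ranges(available)
for first, last in ranges:
    print(first, last)
```

For the sample above this yields two ranges, 2018-06-20 to 2018-06-22 and 2018-06-27 to 2018-06-28.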

adding a row for missing data

I'm calculating a running balance over the date range 2017-02-01 to 2017-02-10.
There are days with missing data. How would I include these missing dates with the previous day's balance?
Example data:
We are missing data for 2017-02-04, 2017-02-05 and 2017-02-06. How would I add a row in the query with the previous balance?
The date range is a parameter, so it could change.
Can I use something like the LAG function?
I would be inclined to use a recursive CTE and then fill in the values. Here is one approach using outer apply:
with dates as (
select mind as dte, mind, maxd
from (select min(date) as mind, max(date) as maxd from t) t
union all
select dateadd(day, 1, dte), mind, maxd
from dates
where dte < maxd
)
select d.dte, t.balance
from dates d outer apply
(select top 1 t.*
from t
where t.date <= d.dte
order by t.date desc
) t;
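The same fill-forward idea can be tried in Python's sqlite3 module. This is a sketch under a few assumptions: SQLite has no OUTER APPLY, so a correlated subquery with ORDER BY ... LIMIT 1 takes its place, and the three balance rows are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- hypothetical balances; 2017-02-03 is missing
    CREATE TABLE t (date TEXT, balance REAL);
    INSERT INTO t VALUES
        ('2017-02-01', 100.0), ('2017-02-02', 120.0), ('2017-02-04', 90.0);
""")

FILLED = """
WITH RECURSIVE dates(dte) AS (
    SELECT MIN(date) FROM t
    UNION ALL
    SELECT date(dte, '+1 day') FROM dates
    WHERE dte < (SELECT MAX(date) FROM t)
)
SELECT d.dte,
       -- latest known balance on or before this date
       (SELECT t.balance FROM t
        WHERE t.date <= d.dte
        ORDER BY t.date DESC
        LIMIT 1) AS balance
FROM dates AS d
ORDER BY d.dte
"""

filled = conn.execute(FILLED).fetchall()
for day, balance in filled:
    print(day, balance)
```

The missing day 2017-02-03 appears in the output carrying the balance from 2017-02-02.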
You can generate dates using tally table as below:
Declare @d1 date ='2017-02-01'
Declare @d2 date ='2017-02-10'
;with cte_dates as (
Select top (datediff(D, @d1, @d2)+1) Dates = Dateadd(day, Row_Number() over (order by (Select NULL))-1, @d1) from
master..spt_values s1, master..spt_values s2
)
Select * from cte_dates left join ....
Then left join to your table and compute the running total.
Adding to the date range and CTE solutions: I have created date dimension tables in numerous databases, and I just left join to them.
There are free scripts online to create date dimension tables for SQL Server. I highly recommend them. They also make aggregation by other time periods (quarter, month, year, etc.) much more efficient.

SQL adding missing dates to query

I'm trying to add missing dates to a SQL query but it does not work.
Please can you tell me what I'm doing wrong.
I only have read only rights to database.
SQL query:
With cteDateGen AS
(
SELECT 0 as Offset, CAST(DATEADD(dd, 0, '2015-11-01') AS DATE) AS WorkDate
UNION ALL
SELECT Offset + 1, CAST(DATEADD(dd, Offset, '2015-11-05') AS DATE)
FROM cteDateGen
WHERE Offset < 100
), -- generate date from to --
cte AS (
SELECT COUNT(*) OVER() AS 'total' ,ROW_NUMBER()OVER (ORDER BY c.dt DESC) as row
, c.*
FROM clockL c
RIGHT JOIN cteDateGen d ON CAST(c.dt AS DATE) = d.WorkDate
WHERE
c.dt between '2015-11-01' AND '2015-11-05' and
--d.WorkDate BETWEEN '2015-11-01' AND '2015-11-05'
and c.id =10
) -- select user log and add missing dates --
SELECT *
FROM cte
--WHERE row BETWEEN 0 AND 15
--option (maxrecursion 0)
I think your problem is simply the dates in the CTE. You can also simplify it a bit:
With cteDateGen AS (
SELECT 0 as Offset, CAST('2015-11-01' AS DATE) AS WorkDate
UNION ALL
SELECT Offset + 1, DATEADD(day, 1, WorkDate)
-----------------------------------^
FROM cteDateGen
WHERE Offset < 100
), -- generate date from to --
cte AS
(SELECT COUNT(*) OVER () AS total,
ROW_NUMBER() OVER (ORDER BY c.dt DESC) as row,
c.*
FROM cteDateGen d LEFT JOIN
clockL c
ON CAST(c.dt AS DATE) = d.WorkDate AND c.id = 10
-----------------------------------------------^
WHERE d.WorkDate between '2015-11-01' AND '2015-11-05'
) -- select user log and add missing dates --
SELECT *
FROM cte
Notes:
Your query used a constant for the second date in the CTE. The constant was different from the first constant. Hence, it was missing some days.
I think that LEFT JOIN is much easier to follow than RIGHT JOIN. LEFT JOIN is basically "keep all rows in the first table".
The WHERE clause was undoing the outer join in any case. The c.id logic needs to move to the ON clause.
The date arithmetic in the first CTE was unnecessarily complex.
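The point about the WHERE clause undoing the outer join is easy to see on a few rows. The following sketch uses Python's sqlite3 module instead of SQL Server; work_dates is a hypothetical stand-in for the generated date CTE, and the clockL rows are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE work_dates (work_date TEXT);
    INSERT INTO work_dates VALUES ('2015-11-01'), ('2015-11-02'), ('2015-11-03');
    -- hypothetical clock-in rows; user 10 has nothing on 2015-11-02
    CREATE TABLE clockL (id INTEGER, dt TEXT);
    INSERT INTO clockL VALUES
        (10, '2015-11-01 08:00:00'),
        (11, '2015-11-02 08:05:00'),
        (10, '2015-11-03 07:55:00');
""")

# Filter in WHERE: rows where c.id is NULL or another user are discarded,
# so the date with no entry for user 10 disappears.
filter_in_where = conn.execute("""
    SELECT d.work_date, c.dt
    FROM work_dates AS d
    LEFT JOIN clockL AS c ON date(c.dt) = d.work_date
    WHERE c.id = 10
    ORDER BY d.work_date
""").fetchall()

# Filter in ON: every date is kept, with NULL where user 10 has no entry.
filter_in_on = conn.execute("""
    SELECT d.work_date, c.dt
    FROM work_dates AS d
    LEFT JOIN clockL AS c ON date(c.dt) = d.work_date AND c.id = 10
    ORDER BY d.work_date
""").fetchall()

print(filter_in_where)
print(filter_in_on)
```

Only the second query returns a row for 2015-11-02, with None in place of the clock-in time.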