SQL joining tables including NULL counts - sql

I'm hoping this is an easy fix. I have two tables: one holds every day in a 6-month period, the other holds site names, the day (date) and the count of attendances on that day.
I want to create a table that has, for each site, a row for every day in the 6-month period with that site's count for the day, and a NULL where there was no attendance on that day. I can produce only the days with attendance, but not the other way around. :(
Example data (note: the data is held in two temporary tables):
Date table #Data
CallDate rn
2022-08-01 1
2022-08-02 2
2022-08-03 3
2022-08-04 4
2022-08-05 5
2022-08-06 6
2022-08-07 7
2022-08-08 8
Attendance table: #SiteData
SiteName CallDate Count
Bassetlaw 2022-08-30 1
Bassetlaw 2022-08-31 1
Bassetlaw 2022-09-13 3
Bassetlaw 2022-09-15 5
Bassetlaw 2022-09-23 1
Bassetlaw 2022-09-27 1
Bassetlaw 2022-11-21 1
Bassetlaw 2022-11-23 1
Bassetlaw 2022-11-26 1
Bassetlaw 2022-11-28 1
So in this instance, I would have 6 months' worth of rows, but only 10 days' worth of data. I need NULLs for the other days, not just 8 rows.
NOTE: There are more sites, and I want this repeated for all sites. In essence, I want a table that has all sites, with a row per site per day for 6 months, irrespective of whether they had an attendance or not.

This is done by using the LEFT JOIN command.
See here: http://sqlfiddle.com/#!18/0218c/2
CREATE TABLE T_Data (
rn int,
CallDate date
);
CREATE TABLE T_SiteData (
CallDate date,
Amount int
);
INSERT INTO T_Data SELECT 1, '2022-08-01';
INSERT INTO T_Data SELECT 2, '2022-08-02';
INSERT INTO T_Data SELECT 3, '2022-08-03';
INSERT INTO T_Data SELECT 4, '2022-08-04';
INSERT INTO T_Data SELECT 5, '2022-08-05';
INSERT INTO T_Data SELECT 6, '2022-08-06';
INSERT INTO T_Data SELECT 7, '2022-08-07';
INSERT INTO T_Data SELECT 8, '2022-08-08';
INSERT INTO T_Data SELECT 10, '2022-08-09';
INSERT INTO T_Data SELECT 11, '2022-08-10';
INSERT INTO T_Data SELECT 12, '2022-08-11';
INSERT INTO T_Data SELECT 13, '2022-08-12';
INSERT INTO T_Data SELECT 14, '2022-08-13';
INSERT INTO T_Data SELECT 15, '2022-08-14';
INSERT INTO T_SiteData SELECT '2022-08-01', 1;
INSERT INTO T_SiteData SELECT '2022-08-03', 1;
INSERT INTO T_SiteData SELECT '2022-08-05', 3;
INSERT INTO T_SiteData SELECT '2022-08-12', 5;
SELECT
d.*,
sd.Amount AS [Count]
FROM
T_Data AS d
LEFT JOIN
T_SiteData AS sd
ON
sd.CallDate = d.CallDate
This returns the following:
rn CallDate Count
1 2022-08-01 1
2 2022-08-02 (null)
3 2022-08-03 1
4 2022-08-04 (null)
5 2022-08-05 3
6 2022-08-06 (null)
7 2022-08-07 (null)
8 2022-08-08 (null)
10 2022-08-09 (null)
11 2022-08-10 (null)
12 2022-08-11 (null)
13 2022-08-12 5
14 2022-08-13 (null)
15 2022-08-14 (null)
I used MS SQL Server syntax, but the SELECT query should be the same on almost any other DBMS.
If the table T_SiteData contains more than one entry per CallDate, the query needs to be adjusted. Your example only showed one entry per CallDate, though.
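The query above covers a single site. As a minimal sketch of the per-site version the question asks for, assuming the temporary tables are named #Data and #SiteData as in the question, you could cross join the distinct site names with the date table and then left join the attendances. The sample rows and the second site name 'Newark' are invented for illustration.

```sql
-- Hypothetical setup mirroring the question's temp tables (small sample).
CREATE TABLE #Data (CallDate date, rn int);
CREATE TABLE #SiteData (SiteName varchar(50), CallDate date, [Count] int);

INSERT INTO #Data VALUES ('2022-08-01', 1), ('2022-08-02', 2), ('2022-08-03', 3);
INSERT INTO #SiteData VALUES
    ('Bassetlaw', '2022-08-01', 1),
    ('Bassetlaw', '2022-08-03', 3),
    ('Newark',    '2022-08-02', 2);  -- invented second site

-- Every site paired with every day, then the attendance count where one exists.
SELECT s.SiteName, d.CallDate, sd.[Count]
FROM (SELECT DISTINCT SiteName FROM #SiteData) AS s
CROSS JOIN #Data AS d
LEFT JOIN #SiteData AS sd
    ON sd.SiteName = s.SiteName
   AND sd.CallDate = d.CallDate
ORDER BY s.SiteName, d.CallDate;

DROP TABLE #Data, #SiteData;
```

This returns one row per site per day (2 sites x 3 days = 6 rows here), with NULL in Count where no attendance row matches. A site with no rows at all in #SiteData would not appear; if you have a separate list of sites, cross join that list instead of the DISTINCT subquery.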

Related

How to find first time a price has changed in SQL

I have a table that contains an item ID, the date and the price. All items show their price for each day, but I want only to select the items that have not had their price change, and to show the days without change.
An example of the table is:
id     Price  Day  Month  Year
asdf   10     03   11     2022
asdr1  8      03   11     2022
asdf   10     02   11     2022
asdr1  8      02   11     2022
asdf   10     01   11     2022
asdr1  7      01   11     2022
asdf   9      31   10     2022
asdr1  8      31   10     2022
asdf   8      31   10     2022
asdr1  8      31   10     2022
The output I want is:
Date        id     Last_Price  First_Price_Appearance  DaysWOchange
2022-11-03  asdf   10          2022-11-01              2
2022-11-03  asdr1  8           2022-11-02              1
The solution needs to run quickly, so what are some efficient ways to solve this, considering that the table has millions of rows and that some items have not changed their price in years?
The efficiency problem arises because, for each id, I would need to scan the entire table looking for the first row where the price changed, and repeat this for thousands of items.
I am attempting to calculate the difference between the current last price and all of the history, but this becomes slow and may take several minutes to run over the full history.
The main concern for this problem is efficiency.
CREATE TABLE #table (id NVARCHAR(5), Price INT, Date DATE)
INSERT INTO #table (id, Price, Date) VALUES
('asdf', 10, '2022-10-20'),
('asdr1', 8, '2022-10-15'),
('asdf', 10, '2022-11-03'),
('asdr1', 8, '2022-11-02'),
('asdf', 10, '2022-11-02'),
('asdr1', 8, '2022-11-02'),
('asdf', 10, '2022-11-01'),
('asdr1', 7, '2022-11-01'),
('asdf', 9, '2022-10-31'),
('asdr1', 8, '2022-10-31'),
('asdf', 8, '2022-10-31'),
('asdr1', 8, '2022-10-31')
Tables of data are useful, but they are even more so if you put the demo data into an object.
SELECT id, FirstDate, LastChange, DaysSinceChange, Price
FROM (
SELECT id, MIN(Date) OVER (PARTITION BY id ORDER BY Date) AS FirstDate, Date AS LastChange, Price,
CASE WHEN LEAD(Date,1) OVER (PARTITION BY id ORDER BY Date) IS NULL THEN DATEDIFF(DAY,Date,CURRENT_TIMESTAMP)
ELSE DATEDIFF(DAY,LAG(Date) OVER (PARTITION BY id ORDER BY Date),Date)
END AS DaysSinceChange, ROW_NUMBER() OVER (PARTITION BY id ORDER BY date DESC) AS rn
FROM #table
) a
WHERE rn = 1
This is a quick way to get what you want. If you execute the subquery by itself you can see all the history.
id FirstDate LastChange Price DaysSinceChange
-------------------------------------------------------
asdf 2022-10-20 2022-11-03 10 0
asdr1 2022-10-15 2022-11-02 8 1
SELECT id, MIN(Date) OVER (PARTITION BY id ORDER BY Date) AS FirstDate, Date AS LastChange, Price,
CASE WHEN LEAD(Date,1) OVER (PARTITION BY id ORDER BY Date) IS NULL THEN DATEDIFF(DAY,Date,CURRENT_TIMESTAMP)
ELSE DATEDIFF(DAY,LAG(Date) OVER (PARTITION BY id ORDER BY Date),Date)
END AS DaysSinceChange, ROW_NUMBER() OVER (PARTITION BY id ORDER BY date DESC) AS rn
FROM #table
id FirstDate LastChange Price DaysSinceChange rn
------------------------------------------------------
asdf 2022-10-20 2022-11-03 10 0 1
asdf 2022-10-20 2022-11-02 10 1 2
asdf 2022-10-20 2022-11-01 10 1 3
asdf 2022-10-20 2022-10-31 9 11 4
asdf 2022-10-20 2022-10-31 8 0 5
asdf 2022-10-20 2022-10-20 10 NULL 6
asdr1 2022-10-15 2022-11-02 8 1 1
asdr1 2022-10-15 2022-11-02 8 1 2
asdr1 2022-10-15 2022-11-01 7 1 3
asdr1 2022-10-15 2022-10-31 8 16 4
asdr1 2022-10-15 2022-10-31 8 0 5
asdr1 2022-10-15 2022-10-15 8 NULL 6
You can use lag() and a windowed max():
select id, date, price
from (select t.*,
max(case when price <> lag_price then date end) over (partition by id) as price_change_date
from (select t.*, lag(price) over (partition by id order by date) as lag_price
from t
) t
) t
where price_change_date is null;
This calculates the most recent date of a price change for each id. It then keeps only the rows for ids whose price never changed.
The use of window functions should be highly efficient, taking advantage of indexes on (id, date) and (id, price, date).
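For reference, here is a sketch that produces the output layout requested in the question (last price, first appearance of the current price run, days without change). It assumes one row per id per day, which the question's sample data does not strictly satisfy (it has two rows for the same id on 2022-10-31), so the rows below are a cleaned-up, hypothetical subset. The table name #prices and the CTE names are made up for illustration.

```sql
-- Hypothetical sample: one row per id per day.
CREATE TABLE #prices (id nvarchar(5), Price int, [Date] date);
INSERT INTO #prices VALUES
    ('asdf', 10, '2022-11-03'), ('asdr1', 8, '2022-11-03'),
    ('asdf', 10, '2022-11-02'), ('asdr1', 8, '2022-11-02'),
    ('asdf', 10, '2022-11-01'), ('asdr1', 7, '2022-11-01'),
    ('asdf',  9, '2022-10-31'), ('asdr1', 8, '2022-10-31');

WITH flagged AS (
    -- A row is a "change" when its price differs from the previous day's price
    -- (the first row per id has no previous price and also counts as a change).
    SELECT id, Price, [Date],
           CASE WHEN Price = LAG(Price) OVER (PARTITION BY id ORDER BY [Date])
                THEN 0 ELSE 1 END AS IsChange
    FROM #prices
),
last_change AS (
    -- The most recent change marks where the current price first appeared.
    SELECT id, MAX([Date]) AS FirstPriceAppearance
    FROM flagged
    WHERE IsChange = 1
    GROUP BY id
),
latest AS (
    SELECT id, Price, [Date],
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY [Date] DESC) AS rn
    FROM #prices
)
SELECT l.[Date], l.id, l.Price AS Last_Price,
       c.FirstPriceAppearance AS First_Price_Appearance,
       DATEDIFF(DAY, c.FirstPriceAppearance, l.[Date]) AS DaysWOchange
FROM latest AS l
JOIN last_change AS c ON c.id = l.id
WHERE l.rn = 1
ORDER BY l.id;

DROP TABLE #prices;
```

With these rows it returns asdf with First_Price_Appearance 2022-11-01 and DaysWOchange 2, and asdr1 with 2022-11-02 and 1, matching the expected output in the question. How it performs on millions of rows depends on indexing, so measure it against your own data.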

display the number of excluded days from holiday leave

I have a leaves_table which contains id, holiday_start and holiday_end. I have another table, leaves_holiday, which contains each public holiday's name and its date. I want to add a new column to the leaves_table result that counts the days within the leave period that are public holidays, so that they can be excluded.
For example:
leaves_table
id  holiday_start  holiday_end
1   09-Jul-2022    13-Jul-2022
public holiday table
holiday_name  holiday_date
christmas     10-Jul-2022
The query should return the number of excluded days as 1:
id  holiday_start  holiday_end  excluded days
1   09-Jul-2022    13-Jul-2022  1
How do I do this?
Here are the CREATE TABLE and INSERT statements:
create table XX_LEAVES_EXCLUDES
(
exclude_id number not null primary key,
holiday_start date not null,
holiday_end date not null
);
create sequence seq_exclude_id MINVALUE 1
START WITH 1
INCREMENT BY 1
CACHE 2;
create or replace trigger trg_exclude_id
before insert
on XX_LEAVES_EXCLUDES
for each row
begin
:new.exclude_id:=seq_exclude_id.nextval;
end;
INSERT INTO XX_LEAVES_EXCLUDES (HOLIDAY_START, HOLIDAY_END) VALUES ('23-Jul-2022','20-Aug-2022');
INSERT INTO XX_LEAVES_EXCLUDES (HOLIDAY_START, HOLIDAY_END) VALUES ('01-Jul-2022','02-Aug-2022');
INSERT INTO XX_LEAVES_EXCLUDES (HOLIDAY_START, HOLIDAY_END) VALUES ('13-Jul-2022','29-Aug-2022');
INSERT INTO XX_LEAVES_EXCLUDES (HOLIDAY_START, HOLIDAY_END) VALUES ('12-Jul-2022','01-Aug-2022');
INSERT INTO XX_LEAVES_EXCLUDES (HOLIDAY_START, HOLIDAY_END) VALUES ('01-Jul-2022','29-Aug-2022');
INSERT INTO XX_LEAVES_EXCLUDES (HOLIDAY_START, HOLIDAY_END) VALUES ('08-Jul-2022','08-Aug-2022');
INSERT INTO XX_LEAVES_EXCLUDES (HOLIDAY_START, HOLIDAY_END) VALUES ('03-Jul-2022','20-Aug-2022');
2nd table (public holiday calendar table)
CREATE TABLE "XX_LEAVES_PUBLIC_HOLIDAYS"
( "PUBLIC_HOLIDAY_UAE_YEAR_2022" VARCHAR2(50) NOT NULL,
"HOLIDAY_DATE" DATE NOT NULL ENABLE
);
INSERT INTO XX_LEAVES_PUBLIC_HOLIDAYS (PUBLIC_HOLIDAY_UAE_YEAR_2022, HOLIDAY_DATE) VALUES ('National Day','10-Jul-2022');
Compare the leave date range with the holiday dates and get the count as excluded_days:
select l.id, l.holiday_start, l.holiday_end,
(select Count(1) from leaves_holiday lh
where l.holiday_start<= lh.holiday_date and l.holiday_end >= lh.holiday_date) as excluded_days
from leaves_table l
One option is to create a calendar of all leave dates (the leaves_calendar CTE in my example) and then join it to public_holiday so that you know which dates to exclude.
Sample data:
SQL> with
  leaves_table (id, holiday_start, holiday_end) as
    (select 1, date '2022-07-09', date '2022-07-13' from dual union all
     select 2, date '2022-05-25', date '2022-05-30' from dual
    ),
  public_holiday (holiday_name, holiday_date) as
    (select 'Christmas' , date '2022-07-10' from dual union all
     select 'My holiday', date '2022-07-12' from dual),
  --
Query begins here; first create a calendar ...
  leaves_calendar as
    (select l.id, l.holiday_start + column_value - 1 as datum
     from leaves_table l cross join
       table(cast(multiset(select level from dual
                           connect by level <= l.holiday_end - l.holiday_start + 1
                          ) as sys.odcinumberlist))
    )
... then return the result: start and end date, number of excluded dates and holiday names (you didn't ask for that, but ... not a problem)
  select c.id,
    min(c.datum) as holiday_start,
    max(c.datum) as holiday_end,
    sum(case when p.holiday_date = c.datum then 1 else 0 end) as excluded_days,
    listagg(p.holiday_name, ', ') within group (order by p.holiday_date) as excluded
  from leaves_calendar c left join public_holiday p on p.holiday_date = c.datum
  group by c.id;
ID HOLIDAY_START HOLIDAY_END EXCLUDED_DAYS EXCLUDED
---------- --------------- --------------- ------------- ------------------------------
1 09.07.2022 13.07.2022 2 Christmas, My holiday
2 25.05.2022 30.05.2022 0
SQL>
With sample data you provided:
SQL> select * from xx_leaves_excludes;
EXCLUDE_ID HOLIDAY_START HOLIDAY_END
---------- --------------- ---------------
1 23.07.2022 20.08.2022
2 01.07.2022 02.08.2022
3 13.07.2022 29.08.2022
4 12.07.2022 01.08.2022
5 01.07.2022 29.08.2022
6 08.07.2022 08.08.2022
7 03.07.2022 20.08.2022
7 rows selected.
SQL> select * from public_holiday;
HOLIDAY_NAME HOLIDAY_DATE
--------------- ---------------
Christmas 10.07.2022
My holiday 12.07.2022
Query looks like this:
SQL> with
  leaves_calendar as
    (select l.exclude_id, l.holiday_start + column_value - 1 as datum
     from xx_leaves_excludes l cross join
       table(cast(multiset(select level from dual
                           connect by level <= l.holiday_end - l.holiday_start + 1
                          ) as sys.odcinumberlist))
    )
  select c.exclude_id,
    min(c.datum) as holiday_start,
    max(c.datum) as holiday_end,
    sum(case when p.holiday_date = c.datum then 1 else 0 end) as excluded_days,
    listagg(p.holiday_name, ', ') within group (order by p.holiday_date) as excluded
  from leaves_calendar c left join public_holiday p on p.holiday_date = c.datum
  group by c.exclude_id;
EXCLUDE_ID HOLIDAY_START HOLIDAY_END EXCLUDED_DAYS EXCLUDED
---------- --------------- --------------- ------------- ----------------------------------------
1 23.07.2022 20.08.2022 0
2 01.07.2022 02.08.2022 2 Christmas, My holiday
3 13.07.2022 29.08.2022 0
4 12.07.2022 01.08.2022 1 My holiday
5 01.07.2022 29.08.2022 2 Christmas, My holiday
6 08.07.2022 08.08.2022 2 Christmas, My holiday
7 03.07.2022 20.08.2022 2 Christmas, My holiday
7 rows selected.
SQL>

Is this possible in SQL? Min and Max Dates On a Total. Where it changes in between Dates

I am trying to figure out how to write a query that will give me the correct historical data between dates, using only SQL. I know it is possible by coding a loop, but I'm not sure whether it is possible in a SQL query. Dates are in DD/MM/YYYY format.
An example of the data:
ID  Points  DATE
1   10      01/01/2018
1   20      02/01/2019
1   25      03/01/2020
1   10      04/01/2021
With a simple query
SELECT ID, Points, MIN(Date), MAX(Date)
FROM table
GROUP BY ID,POINTS
With that query, the Min date for 10 points would be 01/01/2018 and the Max date would be 04/01/2021, which is wrong in this instance. It should be:
ID  Points  Min DATE    Max DATE
1   10      01/01/2018  01/01/2019
1   20      02/01/2019  02/01/2020
1   25      03/01/2020  03/01/2021
1   10      04/01/2021  04/01/2021
I was thinking of using LAG, but need some ideas here. What I haven't told you is there is a record per day. So I would need to group until a change of points. This is to create a view from the data that I already have.
It looks like - for your sample data set - the following lead should suffice:
select id, points, date as MinDate,
IsNull(DateAdd(day, -1, Lead(Date,1) over(partition by Id order by Date)), Date) as MaxDate
from t
Example Fiddle
I'm guessing you want the MAX date to be 1 day before the next MIN date.
And you can use the window function LEAD to get the next MIN date.
And if you group also by the year, then the date ranges match the expected result.
SELECT ID, Points
, MIN([Date]) AS [Min Date]
, COALESCE(DATEADD(day, -1, LEAD(MIN([Date])) OVER (PARTITION BY ID ORDER BY MIN([Date]))), MAX([Date])) AS [Max Date]
FROM your_table
GROUP BY ID, Points, YEAR([Date]);
ID  Points  Min Date    Max Date
1   10      2018-01-01  2019-01-01
1   20      2019-01-02  2020-01-02
1   25      2020-01-03  2021-01-03
1   10      2021-01-04  2021-01-04
Test on db<>fiddle here
We can do this by creating two derived tables, one with the minimum and one with the maximum date for each grouping, and then joining them:
CREATE TABLE dataa(
id INT,
points INT,
ddate DATE);
INSERT INTO dataa values(1 , 10 ,'2018-10-01');
INSERT INTO dataa values(1 , 20 ,'2019-01-02');
INSERT INTO dataa values(1 , 25 ,'2020-01-03');
INSERT INTO dataa values(1 , 10 ,'2021-01-04');
SELECT
mi.id, mi.points,mi.date minDate, ma.date maxDate
FROM
(select id, points, min(ddate) date from dataa group by id,points) mi
JOIN
(select id, points, max(ddate) date from dataa group by id,points) ma
ON
mi.id = ma.id
AND
mi.points = ma.points;
DROP TABLE dataa;
this gives the following output
+------+--------+------------+------------+
| id | points | minDate | maxDate |
+------+--------+------------+------------+
| 1 | 10 | 2018-10-01 | 2021-01-04 |
| 1 | 20 | 2019-01-02 | 2019-01-02 |
| 1 | 25 | 2020-01-03 | 2020-01-03 |
+------+--------+------------+------------+
I've used the default date formatting. This could be modified if you wish.
*** See my other answer, as I don't think this answer is correct after re-examining the OP's question. I'm leaving this answer in place in case it has any value.
As I understand the problem consecutive daily values with the same value for a given ID may be ignored. This can be done by examining the prior value using the LAG() function and excluding records where the current value is unchanged from the prior.
From the remaining records, the LEAD() function can be used to look ahead to the next included record to extract the date where this value is superseded. Max Date is then calculated as one day prior.
Below is an example that includes expanded test data to cover multiple IDs and repeated Points values.
CREATE TABLE #Data (Id INT, Points INT, Date DATE)
INSERT #Data
VALUES
(1, 10, '2018-01-01'), -- Start
(1, 20, '2019-01-02'), -- Updated
(1, 25, '2020-01-03'), -- Updated
(1, 10, '2021-01-04'), -- Updated
(2, 10, '2022-01-01'), -- Start
(2, 20, '2022-02-01'), -- Updated
(2, 20, '2022-03-01'), -- No change
(2, 20, '2022-04-01'), -- No change
(2, 20, '2022-05-01'), -- No change
(2, 25, '2022-06-01'), -- Updated
(2, 25, '2022-07-01'), -- No change
(2, 20, '2022-08-01'), -- Updated
(2, 25, '2022-09-08'), -- Updated
(2, 10, '2022-10-09'), -- Updated
(3, 10, '2022-01-01'), -- Start
(3, 10, '2022-01-02'), -- No change
(3, 20, '2022-01-03'), -- Updated
(3, 20, '2022-01-04'), -- No change
(3, 20, '2022-01-05'), -- No change
(3, 10, '2022-01-06'), -- Updated
(3, 10, '2022-01-07'); -- No change
WITH CTE AS (
SELECT *, PriorPoints = LAG(Points) OVER (PARTITION BY Id ORDER BY Date)
FROM #Data
)
SELECT ID, Points, MinDate = Date,
MaxDate = DATEADD(day, -1, (LEAD(Date) OVER (PARTITION BY Id ORDER BY Date)))
FROM CTE
WHERE (PriorPoints <> Points OR PriorPoints IS NULL) -- Exclude unchanged
ORDER BY Id, Date
Results:
ID  Points  MinDate     MaxDate
1   10      2018-01-01  2019-01-01
1   20      2019-01-02  2020-01-02
1   25      2020-01-03  2021-01-03
1   10      2021-01-04  null
2   10      2022-01-01  2022-01-31
2   20      2022-02-01  2022-05-31
2   25      2022-06-01  2022-07-31
2   20      2022-08-01  2022-09-07
2   25      2022-09-08  2022-10-08
2   10      2022-10-09  null
3   10      2022-01-01  2022-01-02
3   20      2022-01-03  2022-01-05
3   10      2022-01-06  null
db<>fiddle
For the last value for a given ID, the calculated MaxDate is NULL indicating no upper bound to the date range. If you really want MaxDate = MinDate for this case, you can add ISNULL( ..., Date).
(I am adding this as an alternative (and simpler) interpretation of the OP's question.)
Problem restatement: Given a collection of IDs, Dates, and Points values, a group is defined as any consecutive sequence of the same Points value for a given ID in ascending date order. For each such group, calculate the min and max dates.
The start of such a group can be identified as a row where the Points value changes from the preceding value, or if there is no preceding value for a given ID. If we first tag such rows (NewGroup = 1), we can then assign group numbers based on a count of preceding tagged rows (including the current row). Once we have assigned group numbers, it is then a simple matter to apply a group and aggregate operation.
Below is a sample that includes some additional test data to show multiple IDs and repeating values.
CREATE TABLE #Data (Id INT, Points INT, Date DATE)
INSERT #Data
VALUES
(1, 10, '2018-01-01'), -- Start
(1, 20, '2019-01-02'), -- Updated
(1, 25, '2020-01-03'), -- Updated
(1, 10, '2021-01-04'), -- Updated
(2, 10, '2022-01-01'), -- Start
(2, 20, '2022-02-01'), -- Updated
(2, 20, '2022-03-01'), -- No change
(2, 20, '2022-04-01'), -- No change
(2, 20, '2022-05-01'), -- No change
(2, 25, '2022-06-01'), -- Updated
(2, 25, '2022-07-01'), -- No change
(2, 20, '2022-08-01'), -- Updated
(2, 25, '2022-09-08'), -- Updated
(2, 10, '2022-10-09'), -- Updated
(3, 10, '2022-01-01'), -- Start
(3, 10, '2022-01-02'), -- No change
(3, 20, '2022-01-03'), -- Updated
(3, 20, '2022-01-04'), -- No change
(3, 20, '2022-01-05'), -- No change
(3, 10, '2022-01-06'), -- Updated
(3, 10, '2022-01-07'); -- No change
WITH CTE AS (
SELECT *,
PriorPoints = LAG(Points) OVER (PARTITION BY Id ORDER BY Date)
FROM #Data
)
, CTE2 AS (
SELECT *,
NewGroup = CASE WHEN (PriorPoints <> Points OR PriorPoints IS NULL)
THEN 1 ELSE 0 END
FROM CTE
)
, CTE3 AS (
SELECT *, GroupNo = SUM(NewGroup) OVER(
PARTITION BY ID
ORDER BY Date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
)
FROM CTE2
)
SELECT Id, Points, MinDate = MIN(Date), MaxDate = MAX(Date)
FROM CTE3
GROUP BY Id, GroupNo, Points
ORDER BY Id, GroupNo
Results:
Id  Points  MinDate     MaxDate
1   10      2018-01-01  2018-01-01
1   20      2019-01-02  2019-01-02
1   25      2020-01-03  2020-01-03
1   10      2021-01-04  2021-01-04
2   10      2022-01-01  2022-01-01
2   20      2022-02-01  2022-05-01
2   25      2022-06-01  2022-07-01
2   20      2022-08-01  2022-08-01
2   25      2022-09-08  2022-09-08
2   10      2022-10-09  2022-10-09
3   10      2022-01-01  2022-01-02
3   20      2022-01-03  2022-01-05
3   10      2022-01-06  2022-01-07
To see the intermediate results, replace the final select with SELECT * FROM CTE3 ORDER BY Id, Date.
If you wish to treat gaps in dates as group criteria, add a PriorDate calculation to CTE and add a check that Date is not exactly one day after PriorDate, for example OR Date <> DATEADD(day, 1, PriorDate), to the NewGroup condition.
db<>fiddle
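A minimal sketch of the gap-aware variant mentioned above, under the assumption that a "gap" means the previous row for the same Id is not exactly one day earlier. The table name #GapDemo and its rows are hypothetical; PriorDate is the extra column referred to above.

```sql
-- Hypothetical sample: same Points on 01, 02 and 05 January, with 03 and 04 missing.
CREATE TABLE #GapDemo (Id INT, Points INT, [Date] DATE);
INSERT #GapDemo VALUES
    (1, 10, '2022-01-01'),
    (1, 10, '2022-01-02'),
    (1, 10, '2022-01-05'),
    (1, 20, '2022-01-06');

WITH CTE AS (
    SELECT *,
        PriorPoints = LAG(Points) OVER (PARTITION BY Id ORDER BY [Date]),
        PriorDate   = LAG([Date]) OVER (PARTITION BY Id ORDER BY [Date])
    FROM #GapDemo
)
, CTE2 AS (
    -- Start a new group on the first row, on a Points change, or after a date gap.
    SELECT *,
        NewGroup = CASE WHEN PriorPoints IS NULL
                          OR PriorPoints <> Points
                          OR [Date] <> DATEADD(day, 1, PriorDate)
                        THEN 1 ELSE 0 END
    FROM CTE
)
, CTE3 AS (
    SELECT *, GroupNo = SUM(NewGroup) OVER (
        PARTITION BY Id
        ORDER BY [Date]
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
    FROM CTE2
)
SELECT Id, Points, MinDate = MIN([Date]), MaxDate = MAX([Date])
FROM CTE3
GROUP BY Id, GroupNo, Points
ORDER BY Id, GroupNo;

DROP TABLE #GapDemo;
```

For this sample the 10-point rows split into two groups (2022-01-01 to 2022-01-02, and 2022-01-05 on its own), followed by the 20-point group on 2022-01-06.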
Caution: In your original post, you state that "this is to create a view". Beware that if the above logic is included in a view, the entire result may be recalculated every time the view is accessed, regardless of any ID or date criteria applied. It might make more sense to use the above to populate and periodically refresh a historic roll-up data table for efficient access. Another alternative is to make a stored procedure with appropriate parameters that could filter that data before feeding it into the above.

Get previous month date values from data stored within SQL Server table

My table structure in SQL Server looks as below.
id startdate enddate value
---------------------------------------
1 2019-02-06 2019-02-07 11
1 2019-01-22 2019-02-05 10
1 2019-01-15 2019-01-21 14
1 2018-12-13 2019-01-14 15
1 2018-12-09 2018-12-12 14
1 2018-08-13 2018-12-08 17
1 2018-07-19 2018-08-12 19
1 2018-06-13 2018-07-18 20
My query needs to display, for each month, the value from the highest start date in that month. That part is fine and I know how to do it. However, if a month has no start date at all, the value should be carried forward from the most recent earlier month that has one. In the data above there are no rows for November, October or September 2018, but I still want those months (as YYYY-MM) in the result, each with the value found in the most recent earlier month, which is August's value of 17. Note that enddate is always one day before the next start date. Perhaps that can be used for backfilling and carrying forward the missing month values?
So my result should look like below.
id date value
----------------------------
1 2019-02 11
1 2019-01 10
1 2018-12 15
1 2018-11 17
1 2018-10 17
1 2018-09 17
1 2018-08 17
1 2018-07 19
1 2018-06 20
Do you think this can be done without using cursor here?
Alexander Volok's answer is solid, so I won't go into too much extra code. But I thought I'd explain the reasoning. In essence, what you need to do is create a skeleton date table containing all the dates and primary keys you want returned. I'm guessing you have more than one id value in your real data, so probably something like this (whether you choose to persist it or not is up to you)
create table #skelly
(
id int,
_year int,
_month int,
primary key (id, _year, _month)
)
You can get much more precise if you need to be, by only including dates which fall between the min and max StartDate per id, but that's an exercise I leave up to you.
From there, it's then just a matter of filling in the values you care about against that skeleton table. You can do this in a number of ways; by joining, cross applying or a correlated subquery (as Alexander Volok used).
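As a rough sketch of that skeleton-then-fill idea, limited to the months between each id's first and last startdate: the table name #src, its sample rows, and the CTE names are hypothetical, and it assumes SQL Server 2012 or later (for DATEFROMPARTS, EOMONTH and FORMAT). The number list is taken from sys.all_objects and capped at 240 months, which is an arbitrary limit chosen for the example.

```sql
-- Hypothetical sample rows in the shape of the question's table.
CREATE TABLE #src (id int, startdate date, enddate date, [value] int);
INSERT INTO #src VALUES
    (1, '2018-12-09', '2018-12-12', 14),
    (1, '2018-08-13', '2018-12-08', 17),
    (1, '2018-07-19', '2018-08-12', 19);

WITH bounds AS (
    -- First month and number of months to cover, per id.
    SELECT id,
           DATEFROMPARTS(YEAR(MIN(startdate)), MONTH(MIN(startdate)), 1) AS first_month,
           DATEDIFF(MONTH, MIN(startdate), MAX(startdate)) AS month_span
    FROM #src
    GROUP BY id
),
numbers AS (
    SELECT TOP (240) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1 AS n
    FROM sys.all_objects
),
skeleton AS (
    -- One row per id per month: the "skeleton" to fill in.
    SELECT b.id, DATEADD(MONTH, n.n, b.first_month) AS month_start
    FROM bounds AS b
    JOIN numbers AS n ON n.n <= b.month_span
)
SELECT k.id,
       FORMAT(k.month_start, 'yyyy-MM') AS [date],
       v.[value]
FROM skeleton AS k
OUTER APPLY (
    -- Latest row that started on or before the end of this month.
    SELECT TOP (1) s.[value]
    FROM #src AS s
    WHERE s.id = k.id
      AND s.startdate <= EOMONTH(k.month_start)
    ORDER BY s.startdate DESC
) AS v
ORDER BY k.id, k.month_start DESC;

DROP TABLE #src;
```

For these rows it returns 2018-12 with 14, 2018-11 through 2018-08 with 17, and 2018-07 with 19. The skeleton could equally be persisted in a table like #skelly instead of being built in a CTE.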
DECLARE @start DATE, @end DATE;
SELECT @start = '20180601', @end = GETDATE();
;WITH Months AS
(
SELECT EOMONTH(DATEADD(month, n-1, @start)) AS DateValue FROM (
SELECT TOP (DATEDIFF(MONTH, @start, @end) + 1)
n = ROW_NUMBER() OVER (ORDER BY [object_id])
FROM sys.all_objects
) D
)
, InputData AS
(
SELECT 1 AS id, '2019-02-06' startdate, '2019-02-07' as enddate, 11 AS [value] UNION ALL
SELECT 1, '2019-01-22', '2019-01-25', 10 UNION ALL
SELECT 1, '2019-01-15', '2019-01-17', 14 UNION ALL
SELECT 1, '2018-12-13', '2018-12-19', 15 UNION ALL
SELECT 1, '2018-12-09', '2018-12-10', 14 UNION ALL
SELECT 1, '2018-08-13', '2018-12-08', 17 UNION ALL
SELECT 1, '2018-07-19', '2018-07-25', 19 UNION ALL
SELECT 1, '2018-06-13', '2018-07-18', 20
)
SELECT FORMAT(m.DateValue, 'yyyy-MM') AS [Month]
, (SELECT TOP 1 I.value FROM InputData I WHERE I.startdate <= M.DateValue ORDER BY I.startdate DESC ) [Value]
FROM months m
ORDER BY M.DateValue DESC
Results to:
Month Value
2019-02 11
2019-01 10
2018-12 15
2018-11 17
2018-10 17
2018-09 17
2018-08 17
2018-07 19
2018-06 20

Count of people by hour

I need some help working out how many people were on site for each hour.
The data looks like this
Id  Roomid  NumPeople  Starttime         Closetime
1   1       4          2018/10/03 09:06  2018/10/03 12:43
2   2       8          2018/10/03 10:16  2018/10/03 13:12
3   1       6          2018/10/03 13:02  2018/10/03 15:01
What I need out is the max count of people during the hour, each hour
Time | PeoplePresent
9 4
10 12
11 12
12 12
13 14
14 6
15 6
Getting the count of people as they arrived is simple enough, but I can't think where to start to get the presence for each hour. Can anyone suggest a strategy for this? I'm OK with simple SQL, but I'm certain this requires some more advanced SQL functions.
Tested the following in SQL Server 2008 R2:
You can use a recursive CTE to build the list of hours, including the row id and NumPeople values. Then you can sum them together to get your final output. I put together the following test data based on the question.
CREATE TABLE #times
(
Id int
, Roomid INT
, NumPeople INT
, Starttime DATETIME
, Closetime DATETIME
)
INSERT INTO #times
(
Id
,Roomid
,NumPeople
,Starttime
,Closetime
)
VALUES
(1, 1, 4 , '2018/10/03 09:06', '2018/10/03 12:43')
,(2, 2, 8, '2018/10/03 10:16', '2018/10/03 13:12')
,(3, 1, 6, '2018/10/03 13:02', '2018/10/03 15:01')
;WITH recursive_CTE (id, startHour, currentHour, diff, NumPeople) AS
(
SELECT
Id
,startHour = DATEPART(HOUR, t.Starttime)
,currentHour = DATEPART(HOUR, t.Starttime)
,diff = DATEDIFF(HOUR, Starttime, Closetime)
,t.NumPeople
FROM #times t
UNION ALL
SELECT
r.id
,r.startHour
,r.currentHour + 1
,r.diff
,r.NumPeople
FROM recursive_CTE r
WHERE r.currentHour < startHour + diff
)
SELECT
Time = currentHour
,PeoplePresent = SUM(NumPeople)
FROM recursive_CTE
GROUP BY currentHour
DROP TABLE #times
Query results:
Time PeoplePresent
9 4
10 12
11 12
12 12
13 14
14 6
15 6
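An alternative sketch without recursion, assuming every visit starts and ends on the same calendar day: join a fixed list of hours 0 to 23 to the visits whose start and close hours bracket each hour. The table name #visits and the CTE name hours are made up for the example; the rows are the ones from the question.

```sql
CREATE TABLE #visits (Id int, Roomid int, NumPeople int, Starttime datetime, Closetime datetime);
INSERT INTO #visits VALUES
    (1, 1, 4, '2018-10-03T09:06:00', '2018-10-03T12:43:00'),
    (2, 2, 8, '2018-10-03T10:16:00', '2018-10-03T13:12:00'),
    (3, 1, 6, '2018-10-03T13:02:00', '2018-10-03T15:01:00');

WITH hours AS (
    -- Fixed list of the 24 hours of a day.
    SELECT h
    FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),
                 (12),(13),(14),(15),(16),(17),(18),(19),(20),(21),(22),(23)) AS v(h)
)
SELECT h.h AS [Time], SUM(v.NumPeople) AS PeoplePresent
FROM hours AS h
JOIN #visits AS v
    -- A visit counts for every hour from its start hour to its close hour inclusive.
    ON h.h BETWEEN DATEPART(HOUR, v.Starttime) AND DATEPART(HOUR, v.Closetime)
GROUP BY h.h
ORDER BY h.h;

DROP TABLE #visits;
```

On the sample rows this gives the same seven rows as the recursive query (hours 9 to 15). Visits that cross midnight or span several days would need a calendar of date-plus-hour values rather than a bare hour of day.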