Fill missing gaps in data using a date column

I have a temp table that returns this output
PRICE | DATE
1.491500 | 2019-02-01
1.494000 | 2019-02-04
1.486500 | 2019-02-06
I want to fill the gaps in the data by duplicating the last known record before each gap, using the date. Is there a way to update the existing temp table, or create a new temp table, with this desired output dynamically:
PRICE | DATE
1.491500 | 2019-02-01
1.491500 | 2019-02-02
1.491500 | 2019-02-03
1.494000 | 2019-02-04
1.494000 | 2019-02-05
1.486500 | 2019-02-06
I am working on SQL Server 2008 R2.

Because SQL Server versions before 2022 do not support IGNORE NULLS in LAG(), this is a bit tricky. I would go for a recursive CTE of the form:
with cte as (
select price, date, dateadd(day, -1, lead(date) over (order by date)) as last_date
from t
union all
select price, dateadd(day, 1, date), last_date
from cte
where date < last_date
)
select price, date
from cte
order by date;
In SQL Server 2008, which has no lead(), you can replace it with a correlated subquery:
with cte as (
select price, date,
dateadd(day, -1, (select min(t2.date)
from t t2
where t2.date > t.date
)) as last_date -- day before the next known date, as in the lead() version
from t
union all
select price, dateadd(day, 1, date), last_date
from cte
where date < last_date
)
select price, date
from cte
order by date;
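To check the expected result outside the database, the same fill-forward logic can be sketched in Python. This is only an illustration, not part of the original answer: the function name `fill_gaps` is made up here, prices are kept as strings to avoid float formatting, and the dates are assumed to be distinct.

```python
from datetime import date, timedelta

def fill_gaps(rows):
    """Repeat each price for every day up to the day before the next known date."""
    rows = sorted(rows, key=lambda r: r[1])
    filled = []
    for i, (price, day) in enumerate(rows):
        # last_date mirrors the CTE column: the day before the next known date,
        # or the row's own date when there is no later row.
        if i + 1 < len(rows):
            last_date = rows[i + 1][1] - timedelta(days=1)
        else:
            last_date = day
        while day <= last_date:
            filled.append((price, day))
            day += timedelta(days=1)
    return filled

prices = [
    ("1.491500", date(2019, 2, 1)),
    ("1.494000", date(2019, 2, 4)),
    ("1.486500", date(2019, 2, 6)),
]
result = fill_gaps(prices)
for price, day in result:
    print(price, day)
```

The loop plays the role of the recursive member of the CTE: it keeps adding one day until it reaches last_date.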

Assuming there is a dates table (if not, you can easily make one), you can do this by left joining the dates table to the existing table. Then assign a group to each date using a running sum that increments on every date that has a price. Each group contains exactly one non-null price, so the max value per group is what is needed to fill in the missing values.
select dt,max(price) over(partition by grp) as price
from (select p.price,d.dt,sum(case when p.dt is null then 0 else 1 end) over(order by d.dt) as grp
from dates d
left join prices p on p.dt = d.dt
) t
To make a dates table, use a recursive CTE and persist the result as needed:
--Generate dates in 2019
with dates(dt) as (select cast('2019-01-01' as date)
union all
select dateadd(day,1,dt)
from dates
where dt < '2019-12-31'
)
select * from dates
option(maxrecursion 0)
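The grouping idea can be easier to follow outside SQL. Below is a rough Python sketch of the same steps (generate a calendar, left join, running count, one known price per group). The name `fill_with_calendar` and the chosen start and end dates are assumptions made for this illustration only.

```python
from datetime import date, timedelta
from itertools import accumulate

def fill_with_calendar(prices, start, end):
    """prices: dict mapping date -> price. Returns (date, price) for every calendar day."""
    calendar = [start + timedelta(days=n) for n in range((end - start).days + 1)]
    # Left join: every calendar day, with None where no price exists.
    joined = [(d, prices.get(d)) for d in calendar]
    # Running count of known prices: a known row and the gap days that
    # follow it end up with the same group number.
    grp = list(accumulate(1 if p is not None else 0 for _, p in joined))
    # Each group holds at most one known price (the SQL uses max() to pick it).
    known = {g: p for g, (_, p) in zip(grp, joined) if p is not None}
    return [(d, known.get(g)) for g, (d, _) in zip(grp, joined)]

prices = {
    date(2019, 2, 1): "1.491500",
    date(2019, 2, 4): "1.494000",
    date(2019, 2, 6): "1.486500",
}
filled = fill_with_calendar(prices, date(2019, 1, 31), date(2019, 2, 7))
for day, price in filled:
    print(day, price)
```

Days before the first known price fall into group 0 and stay empty, which matches the NULL the SQL query would return for them.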

Related

SQL Query - Combine rows based on multiple columns

On the image above, I'd like to combine rows with the same value on consecutive days.
Combined rows will have the earliest date on From column and the latest date on To column.
Looking at the example, even if Rows 3 and 4 have the same value, they were not combined because of the date gap.
I've tried using LAG and LEAD functions but no luck.

You can try the approach below:
with c as
(
select *, datediff(dd,todate,laedval) as leaddiff,
datediff(dd,todate,lagval) as lagdiff
from
(
select *,lead(todate) over(partition by value order by todate) laedval,
lag(todate) over(partition by value order by todate) lagval
from t1
)A
)
select * from
(
select value,min(todate) as fromdate,max(todate) as todate from c
where coalesce(leaddiff,0)+coalesce(lagdiff,0) in (1,-1)
group by value
union all
select value,fromdate,todate from c
where coalesce(leaddiff,0)+coalesce(lagdiff,0)>1 or coalesce(leaddiff,0)+coalesce(lagdiff,0)<-1
)A order by value
OUTPUT:
value fromdate todate
1 16/07/2019 00:00:00 17/07/2019 00:00:00
3 21/07/2019 00:00:00 26/07/2019 00:00:00
2 18/07/2019 00:00:00 18/07/2019 00:00:00
2 20/07/2019 00:00:00 20/07/2019 00:00:00

I am going to recommend the following approach:
1. Find where each new group begins. You can do this by comparing the previous maximum todate with the fromdate in this row.
2. Do a cumulative sum of the starts to define a group.
3. Aggregate the results.
This can be handled using window functions and aggregation:
select value, min(fromdate) as fromdate, max(todate) as todate
from (select t.*,
sum(case when prev_todate >= dateadd(day, -1, fromdate)
then 0 -- overlap, so this does not begin a new group
else 1 -- no overlap, so this does begin a new group
end) over
(partition by value order by fromdate) as grp
from (select t.*,
max(todate) over (partition by value
order by fromdate
rows between unbounded preceding and 1 preceding
) as prev_todate
from t
) t
) t
group by value, grp
order by value, min(fromdate);
Here is a db<>fiddle.
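The three steps can also be sketched in Python, as a way to reason about the expected result. The sample rows are hypothetical (the question's data was only available as an image), and `merge_ranges` is a name invented for this sketch.

```python
from datetime import date, timedelta

def merge_ranges(rows):
    """rows: (value, fromdate, todate). Merges overlapping or adjacent ranges per value."""
    merged = []
    # Sorting by value, then fromdate, mirrors "partition by value order by fromdate".
    for value, start, end in sorted(rows):
        last = merged[-1] if merged else None
        # Same value, and this range starts no later than one day after the
        # running maximum todate: extend the current group.
        if last and last[0] == value and last[2] >= start - timedelta(days=1):
            last[2] = max(last[2], end)
        else:
            merged.append([value, start, end])  # a new group begins here
    return [tuple(m) for m in merged]

# Hypothetical input, one row per day or short range.
rows = [
    (1, date(2019, 7, 16), date(2019, 7, 16)),
    (1, date(2019, 7, 17), date(2019, 7, 17)),
    (2, date(2019, 7, 18), date(2019, 7, 18)),
    (2, date(2019, 7, 20), date(2019, 7, 20)),
    (3, date(2019, 7, 21), date(2019, 7, 23)),
    (3, date(2019, 7, 24), date(2019, 7, 26)),
]
combined = merge_ranges(rows)
for row in combined:
    print(row)
```

The two rows for value 2 stay separate because of the one-day gap between them, which is the behaviour the question asks for.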

How to calculate daily average from aggregate results with SQL?

I'm working on outputting some data and I want to pull the daily average of some numbers.
As you can see, what I want to do is count the number of rows received (think of the row ID) and then divide the result by the day number to get the daily average: (30/1), (64/2), etc. I've tried everything, but I keep running into a wall with this.
As it stands, I'm guessing a subquery of some sort is needed to make this work. I just don't know how to get the day number (row ID 1, 2, 3, 4, etc.) to use for the division.
SELECT calendar_date, SUM(NY_dayscore * cAttendance)
FROM vw_Appointments
WHERE status = 'Confirmed'
Group by calendar_date
I attempted a count with distinct, to no avail:
SUM(NY_dayscore * cAttendance) / COUNT(DISTINCT calendar_date)
My original code is too long to post in full, so I am posting a small sample to get guidance on the issue.

In SQL Server 2012+, you would use the cumulative average:
select calendar_date, sum(NY_dayscore * cAttendance),
avg(sum(NY_dayscore * cAttendance)) over (order by calendar_date) as running_average
from vw_appointments a
where status = 'Confirmed'
group by calendar_date
order by calendar_date;
In SQL Server 2008, this is more difficult:
with a as (
select calendar_date, sum(NY_dayscore * cAttendance) as showed
from vw_appointments a
where status = 'Confirmed'
group by calendar_date
)
select a.*, a2.running_average
from a outer apply
(select avg(showed) as running_average
from a a2
where a2.calendar_date <= a.calendar_date
) a2
order by calendar_date;
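As a sanity check on what the running average should produce, here is a small Python sketch. The daily totals are made-up sample values chosen to match the (30/1), (64/2) pattern in the question, and `running_average` is a hypothetical helper name.

```python
from itertools import accumulate

def running_average(daily_totals):
    """daily_totals: (calendar_date, total) pairs. Returns (calendar_date, average so far)."""
    ordered = sorted(daily_totals)
    cumulative = accumulate(total for _, total in ordered)
    # Divide the cumulative sum by the number of days seen so far.
    return [
        (day, running_sum / n)
        for n, ((day, _), running_sum) in enumerate(zip(ordered, cumulative), start=1)
    ]

daily_totals = [
    ("2019-01-02", 30),
    ("2019-01-03", 34),
    ("2019-01-04", 41),
    ("2019-01-05", 48),
]
averages = running_average(daily_totals)
for day, avg in averages:
    print(day, avg)
```

One thing to keep in mind when comparing with the SQL: AVG over an integer expression in SQL Server returns an integer, so cast the sum to a decimal type if fractional averages matter.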

Is it ROW_NUMBER() that you are missing?
SELECT
calendar_date,
SUM(NY_dayscore * cAttendance) / (ROW_NUMBER() OVER (ORDER BY calendar_date ASC)) AS average
FROM vw_Appointments
WHERE status = 'Confirmed'
GROUP BY calendar_date
ORDER BY calendar_date

I think you need sum(showed) over (..)/row_number() over (..)
WITH Table1(date, showed) AS
(
SELECT '2019-01-02', 30 UNION ALL
SELECT '2019-01-03', 34 UNION ALL
SELECT '2019-01-03', 41 UNION ALL
SELECT '2019-01-04', 48
)
SELECT date,
sum(showed) over (order by date) /
row_number() over (order by date)
as daily_average
FROM Table1
GROUP BY showed, date;
date daily_average
2019-01-02 30
2019-01-03 52
2019-01-03 35
2019-01-04 38

SQL how to write a query that return missing date ranges?

I am trying to figure out how to write a query that looks at certain records and finds missing date ranges between today and 9999-12-31.
My data looks like below:
ID |start_dt |end_dt |prc_or_disc_1
10412 |2018-07-17 00:00:00.000 |2018-07-20 00:00:00.000 |1050.000000
10413 |2018-07-23 00:00:00.000 |2018-07-26 00:00:00.000 |1040.000000
So for this data I would want my query to return:
2018-07-10 | 2018-07-16
2018-07-21 | 2018-07-22
2018-07-27 | 9999-12-31
I'm not really sure where to start. Is this possible?

You can do that using the lag() function, which is available in SQL Server 2012 and later.
with myData as
(
select *,
lag(end_dt,1) over (order by start_dt) as lagEnd
from myTable),
myMax as
(
select Max(end_dt) as maxDate from myTable
)
select dateadd(d,1,lagEnd) as StartDate, dateadd(d, -1, start_dt) as EndDate
from myData
where lagEnd is not null and dateadd(d,1,lagEnd) < start_dt
union all
select dateAdd(d,1,maxDate) as StartDate, cast('99991231' as Datetime) as EndDate
from myMax
where maxDate < '99991231';
lag() is not available in SQL Server 2008; there you can mimic it with row_number() and a self-join.

select
CASE WHEN DATEDIFF(day, end_dt, ISNULL(LEAD(start_dt) over (order by ID), '99991231')) > 1 then end_dt +1 END as F1,
CASE WHEN DATEDIFF(day, end_dt, ISNULL(LEAD(start_dt) over (order by ID), '99991231')) > 1 then ISNULL(LEAD(start_dt) over (order by ID) - 1, '99991231') END as F2
from t
For the 2008 version:
SELECT
X.end_dt + 1 as F1,
ISNULL(Y.start_dt-1, '99991231') as F2
FROM t X
LEFT JOIN (
SELECT
*
, (SELECT MAX(ID) FROM t WHERE ID < A.ID) as ID2
FROM t A) Y ON X.ID = Y.ID2
WHERE DATEDIFF(day, X.end_dt, ISNULL(Y.start_dt, '99991231')) > 1

This should work in 2008, it assumes that ranges in your table do not overlap. It will also eliminate rows where the end_date of the current row is a day before the start date of the next row.
with dtRanges as (
select start_dt, end_dt, row_number() over (order by start_dt) as rownum
from table1
)
select t2.end_dt + 1, coalesce(start_dt_next -1,'99991231')
FROM
( select dr1.start_dt, dr1.end_dt,dr2.start_dt as start_dt_next
from dtRanges dr1
left join dtRanges dr2 on dr2.rownum = dr1.rownum + 1
) t2
where
t2.end_dt + 1 <> coalesce(start_dt_next,'99991231')

http://sqlfiddle.com/#!18/65238/1
SELECT
*
FROM
(
SELECT
end_dt+1 AS start_dt,
LEAD(start_dt-1, 1, '9999-12-31')
OVER (ORDER BY start_dt)
AS end_dt
FROM
yourTable
)
gaps
WHERE
gaps.end_dt >= gaps.start_dt
I would, however, strongly urge you to use end dates that are "exclusive". That is, the range is everything up to but excluding the end_dt.
That way, a range of one day becomes '2018-07-09', '2018-07-10'.
It's really clear that my range is one day long, if you subtract one from the other you get a day.
Also, if you ever change to needing hour granularity or minute granularity you don't need to change your data. It just works. Always. Reliably. Intuitively.
If you search the web you'll find plenty of documentation on why inclusive-start and exclusive-end is a very good idea from a software perspective. (Then, in the query above, you can remove the wonky +1 and -1.)
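To illustrate the point about exclusive end dates, here is a minimal Python sketch of the gap search on half-open ranges. It assumes the question's data has been converted so that each end date is one day later than the stored inclusive value. `gaps` and `FAR_FUTURE` are names invented for this example.

```python
from datetime import date

FAR_FUTURE = date(9999, 12, 31)

def gaps(ranges, today):
    """ranges: (start, end) pairs with an exclusive end. Returns uncovered half-open ranges."""
    result = []
    cursor = today  # first day not yet known to be covered
    for start, end in sorted(ranges):
        if start > cursor:
            result.append((cursor, start))  # no +1 / -1 adjustments needed
        cursor = max(cursor, end)
    if cursor < FAR_FUTURE:
        result.append((cursor, FAR_FUTURE))
    return result

# The question's two rows, with end_dt moved one day later to make it exclusive.
ranges = [
    (date(2018, 7, 17), date(2018, 7, 21)),
    (date(2018, 7, 23), date(2018, 7, 27)),
]
missing = gaps(ranges, today=date(2018, 7, 10))
for start, end in missing:
    print(start, end)
```

With exclusive ends, the end of one range can be compared directly with the start of the next, and taking the running maximum of the end dates also copes with overlapping ranges.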

This solves your case, but provide some sample data if there will ever be overlaps, fringe cases, etc.
Take one day after your end date and 1 day before the next line's start date.
DECLARE @t TABLE (ID int, start_dt DATETIME, end_dt DATETIME, prc VARCHAR(100))
INSERT INTO @t (id, start_dt, end_dt, prc)
VALUES
(10410, '2018-07-09 00:00:00.00','2018-07-12 00:00:00.000','1025.000000'),
(10412, '2018-07-17 00:00:00.00','2018-07-20 00:00:00.000','1050.000000'),
(10413, '2018-07-23 00:00:00.00','2018-07-26 00:00:00.000','1040.000000')
SELECT DATEADD(DAY, 1, end_dt)
-- subtract the day inside LEAD so the default for the last row stays 9999-12-31
, LEAD(DATEADD(DAY, -1, start_dt), 1, '9999-12-31') OVER(ORDER BY id)
FROM @t

You may want to take a look at this:
http://sqlfiddle.com/#!18/3a224/1
You just have to edit the begin range to today and the end range to 9999-12-31.

Day after max date in data

I am loading data into a table. I don't have any information on how often or when the source data is loaded; all I know is that I need data from the source to run my script.
Here's the issue: if I run max(date), I get the latest date from the source, but I don't know whether the data is still loading. I've run into cases where I've only gotten a percentage of the data. Thus, I need the business day before the max date.
I want to know whether there is a way to get the second-latest date in the system. I know I can get max(date) - 1, but that gives me literally the calendar day before, which is not what I need.
For example, if I run the script on Tuesday, max(date) will be Monday, but since weekends are not in the source system, I need to get Friday instead of Monday.
DATE
---------
2017-04-29
2017-04-25
2017-04-21
2017-04-19
2017-04-18
2017-04-15
2017-04-10
max(date) = 2017-04-29
how do I get 2017-04-25?

Depending on your version of SQL Server, you can use a windowing function like row_number:
select [Date]
from
(
select [Date],
rn = row_number() over(order by [Date] desc)
from #yourtable
) d
where rn = 2
Should you have multiple of the same date, you can perform a distinct first:
;with cte as
(
select distinct [date]
from #yourtable
)
select [date]
from
(
select [date],
rn = row_number() over(order by [date] desc)
from cte
) x
where rn = 2;
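For comparison, the "distinct first, then take row 2" logic looks like this in Python. This is only a sketch to show the intended result, with `second_latest` as a made-up helper name. It assumes the dates are ISO-formatted strings, which sort correctly as text.

```python
def second_latest(dates):
    """Return the second-latest distinct date, or None if there is no such date."""
    distinct = sorted(set(dates), reverse=True)  # same role as SELECT DISTINCT + ORDER BY ... DESC
    return distinct[1] if len(distinct) > 1 else None

dates = [
    "2017-04-29", "2017-04-29",  # the latest date may appear more than once
    "2017-04-25", "2017-04-21", "2017-04-19",
    "2017-04-18", "2017-04-15", "2017-04-10",
]
answer = second_latest(dates)
print(answer)
```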

You can use row_number() and take the second row, as below:
select * from ( select *, Rown= row_number() over (order by date desc) from yourtable ) a
where a.RowN = 2

SQL Server 2012 and later support OFFSET ... FETCH:
select date
from tablename
order by date desc
offset 1 row fetch first 1 row only
OFFSET 1 means skip one row. (The 2017-04-29 row.)

;With cte([DATE])
AS
(
SELECT '2017-04-29' union all
SELECT '2017-04-25' union all
SELECT '2017-04-21' union all
SELECT '2017-04-19' union all
SELECT '2017-04-18' union all
SELECT '2017-04-15' union all
SELECT '2017-04-10'
)
SELECT [DATE] FROM
(
-- order by the date itself so the row numbering is deterministic: 0 = latest, 1 = second latest
SELECT [DATE], ROW_NUMBER() OVER (ORDER BY [DATE] DESC) - 1 AS Rno
FROM cte
)Final
WHERE Final.Rno=1
Output
DATE
-----
2017-04-25

You can also use FIRST_VALUE with a dynamic date something like DATEADD(DD, -1, GETDATE()). The example below has the date hard coded.
SELECT DISTINCT
FIRST_VALUE([date]) OVER(ORDER BY [date] DESC) AS FirstDate
FROM CTE
WHERE [date] < '2017-04-25'

Another way
DECLARE @T TABLE ([DATE] DATE)
INSERT INTO @T VALUES
('2017-04-29'),
('2017-04-25'),
('2017-04-21'),
('2017-04-19'),
('2017-04-18'),
('2017-04-15'),
('2017-04-10');
SELECT
MAX([DATE]) AS [DATE]
FROM @T
WHERE DATENAME(DW,[DATE]) NOT IN ('Saturday','Sunday')

Another way of doing it, just for example sake...
SELECT MIN(A.date)
FROM
(
SELECT DISTINCT TOP 2 date
FROM YourTable AS C
ORDER BY date DESC
) AS A

To club the rows for week days

I have data like below:
StartDate EndDate Duration
----------
41890 41892 3
41898 41900 3
41906 41907 2
41910 41910 1
StartDate and EndDate are respective ID values for any dates from calendar. I want to calculate the sum of duration for consecutive days. Here I want to include the days which are weekends. E.g. in the above data, let's say 41908 and 41909 are weekends, then my required result set should look like below.
I already have another proc that can return me the next working day, i.e. if I pass 41907 or 41908 or 41909 as DateID in that proc, it will return 41910 as the next working day. Basically I want to check if the DateID returned by my proc when I pass the above EndDateID is same as the next StartDateID from above data, then both the rows should be clubbed. Below is the data I want to get.
ID StartDate EndDate Duration
----------
278457 41890 41892 3
278457 41898 41900 3
278457 41906 41910 3
Please let me know in case the requirement is not clear, I can explain further.
My Date Table is like below:
DateId Date Day
----------
41906 09-04-2014 Thursday
41907 09-05-2014 Friday
41908 09-06-2014 Saturday
41909 09-07-2014 Sunday
41910 09-08-2014 Monday
Here is the SQL Code for setup:
CREATE TABLE Table1
(
StartDate INT,
EndDate INT,
LeaveDuration INT
)
INSERT INTO Table1
VALUES(41890, 41892, 3),
(41898, 41900, 3),
(41906, 41907, 2),
(41910, 41910, 1)
CREATE TABLE DateTable
(
DateID INT,
Date DATETIME,
Day VARCHAR(20)
)
INSERT INTO DateTable
VALUES(41907, '09-05-2014', 'Friday'),
(41908, '09-06-2014', 'Saturday'),
(41909, '09-07-2014', 'Sunday'),
(41910, '09-08-2014', 'Monday'),
(41911, '09-09-2014', 'Tuesday')

This is rather complicated. Here is an approach using window functions.
First, use the date table to enumerate the dates without weekends (you can also take out holidays if you want). Then, expand the periods into one day per row, by using a non-equijoin.
You can then use a trick to identify sequential days. This trick is to generate a sequential number for each id and subtract it from the sequential number for the dates. This is a constant for sequential days. The final step is simply an aggregation.
The resulting query is something like this:
with d as (
select d.*, row_number() over (order by date) as seqnum
from dates d
where day not in ('Saturday', 'Sunday')
)
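-- after the join there is one row per working day, so count rows instead of summing the repeated duration
select t.id, min(t.date) as startdate, max(t.date) as enddate, count(*) as duration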
from (select t.*, ds.seqnum, ds.date,
(ds.seqnum - row_number() over (partition by id order by ds.date) ) as grp
from table t join
d ds
on ds.date between t.startdate and t.enddate
) t
group by t.id, grp;
EDIT:
The following is the version on this SQL Fiddle:
with d as (
select d.*, row_number() over (order by date) as seqnum
from datetable d
where day not in ('Saturday', 'Sunday')
)
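-- after the join there is one row per working day, so count rows instead of summing the repeated duration
select t.id, min(t.date) as startdate, max(t.date) as enddate, count(*) as duration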
from (select t.*, ds.seqnum, ds.date,
(ds.seqnum - row_number() over (partition by id order by ds.date) ) as grp
from (select t.*, 'abc' as id from table1 t) t join
d ds
on ds.dateid between t.startdate and t.enddate
) t
group by t.id, grp;
I believe this is working, but the date table doesn't have all the dates in it.
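The sequence-difference trick is easier to see on concrete numbers. The Python sketch below is an illustration only. It assumes date IDs increase by one per calendar day (as in the sample date table) and that 41908 is a Saturday, and it builds a hypothetical list of working-day IDs from that. `club_periods` is a made-up name.

```python
from itertools import groupby

def club_periods(periods, working_day_ids):
    """periods: (start_id, end_id) pairs. Returns (start_id, end_id, working_days) per run."""
    # seqnum numbers the working days consecutively, skipping weekends.
    seqnum = {d: n for n, d in enumerate(sorted(working_day_ids), start=1)}
    # Expand every period into one entry per working day it covers.
    days = sorted({d for s, e in periods for d in range(s, e + 1) if d in seqnum})
    result = []
    # seqnum minus position is constant within a run of consecutive working days.
    for _, run in groupby(enumerate(days, start=1), key=lambda p: seqnum[p[1]] - p[0]):
        ids = [d for _, d in run]
        result.append((ids[0], ids[-1], len(ids)))
    return result

# Assumed calendar: IDs 41890..41911, with 41908 a Saturday and 41909 a Sunday.
working_days = [d for d in range(41890, 41912) if (d - 41908) % 7 not in (0, 1)]
periods = [(41890, 41892), (41898, 41900), (41906, 41907), (41910, 41910)]
clubbed = club_periods(periods, working_days)
for row in clubbed:
    print(row)
```

The last two periods are clubbed because 41907 (Friday) and 41910 (Monday) are neighbours once the weekend is removed from the numbering.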
