How to fill date gaps in MySQL? - sql

How can I fill date gaps in MySQL? Here is my query:
SELECT DATE(posted_at) AS date,
COUNT(*) AS total,
SUM(attitude = 'positive') AS positive,
SUM(attitude = 'neutral') AS neutral,
SUM(attitude = 'negative') AS negative
FROM `messages`
WHERE (`messages`.brand_id = 1)
AND (`messages`.`spam` = 0
AND `messages`.`duplicate` = 0
AND `messages`.`ignore` = 0)
GROUP BY date ORDER BY date
It returns the proper result set, but I want to fill the gaps between the start and end dates with zeros. How can I do this?

You'll need to create a helper table and fill it with all dates from start to end, then just LEFT JOIN with that table:
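SELECT d.dt AS date,
COUNT(m.posted_at) AS total, -- counts matched messages only, so empty days give 0 rather than 1
COALESCE(SUM(m.attitude = 'positive'), 0) AS positive, -- SUM over no rows is NULL, so default to 0
COALESCE(SUM(m.attitude = 'neutral'), 0) AS neutral,
COALESCE(SUM(m.attitude = 'negative'), 0) AS negative
FROM dates d
LEFT JOIN
messages m
ON m.posted_at >= d.dt
AND m.posted_at < d.dt + INTERVAL 1 DAY
AND m.brand_id = 1
AND m.spam = 0
AND m.duplicate = 0
AND m.`ignore` = 0 -- IGNORE is a reserved word in MySQL, so it must be quoted
GROUP BY
d.dt
ORDER BY
d.dt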
Basically, what you need here is a dummy rowsource.
When this was written, MySQL was the only major system that lacked a way to generate one; MySQL 8.0 has since added recursive CTEs.
PostgreSQL implements a special function, generate_series, to do that, while Oracle and SQL Server can use recursion (CONNECT BY and recursive CTEs, respectively).
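On MySQL 8.0 or later, a recursive CTE can stand in for the helper table. The following is a minimal sketch under that assumption; the start and end dates are placeholders chosen for illustration, and only the total column from the question is shown.

```sql
-- Assumes MySQL 8.0+ (recursive CTEs) and the messages table from the question.
-- The two date literals are hypothetical range boundaries.
WITH RECURSIVE dates (dt) AS (
    SELECT DATE('2009-11-01')            -- first day of the range
    UNION ALL
    SELECT dt + INTERVAL 1 DAY           -- next day
    FROM dates
    WHERE dt < DATE('2009-11-30')        -- stop at the last day of the range
)
SELECT d.dt AS date,
       COUNT(m.posted_at) AS total       -- 0 for days without messages
FROM dates d
LEFT JOIN messages m
       ON m.posted_at >= d.dt
      AND m.posted_at <  d.dt + INTERVAL 1 DAY
GROUP BY d.dt
ORDER BY d.dt;
```

MySQL caps recursion at cte_max_recursion_depth (1000 by default), so ranges longer than that would need the setting raised or a real calendar table.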

I don't know whether MySQL will support the following/similar syntax; but if not, then you could just create and drop a temporary table.
--Inputs
declare @FromDate datetime, /*Inclusive*/
@ToDate datetime /*Inclusive*/
set @FromDate = '20091101'
set @ToDate = '20091130'
--Query
declare @Dates table (
DateValue datetime NOT NULL
)
set NOCOUNT ON
while @FromDate <= @ToDate /*Inclusive*/
begin
insert into @Dates(DateValue) values(@FromDate)
set @FromDate = @FromDate + 1
end
set NOCOUNT OFF
select dates.DateValue,
Col1...
from @Dates dates
left outer join SourceTableOrView data on
data.DateValue >= dates.DateValue
and data.DateValue < dates.DateValue + 1 /*NB: Exclusive*/
where ...?

Related

Return smalldatetime value from scalar function SELECT query

I'm looking to create a scalar function in SQL Server (2017) that leverages a calendar table I built a while back to calculate and return the date a given number of business days after a given date. I have been struggling with how to pass the SMALLDATETIME return value back appropriately. To give some idea of what I'm attempting:
CREATE FUNCTION dbo.AddBusDaysToDate
(
@startDate SMALLDATETIME,
@numBusDays INT,
)
RETURNS SMALLDATETIME
AS
BEGIN
DECLARE @rs SMALLDATETIME;
SELECT @rs = TOP(1) dt
FROM (
SELECT TOP(@numBusDays) dt
FROM dbo.OurCalendar
WHERE isWeekday = 1
AND isHoliday = 0
AND dt >= @startDate
ORDER BY dt ASC
) as ID
ORDER BY dt DESC
RETURN @rs
END
dt is a SMALLDATETIME data type on our calendar table.
The query itself runs as intended when values are plugged in for the variables, but I was trying to repurpose a similar function that calculated the difference in business days between two points on the calendar, with a different data type. So I'm unsure whether I'm pulling a row into @rs instead of the individual value, or how to isolate that specific 'cell' from the SELECT query result. I expect I'm probably missing something very simple.
Any help or a point in the right direction would be much appreciated.
I was able to resolve this with the following (note that TOP(1) comes before the variable assignment):
CREATE FUNCTION dbo.AddBusDaysToDate
(
@startDate SMALLDATETIME,
@numBusDays INT -- no trailing comma after the last parameter
)
RETURNS SMALLDATETIME
AS
BEGIN
DECLARE @rs SMALLDATETIME;
DECLARE @workdayModifier INT;
-- If the start date is itself a business day, take one extra row so it is not counted
IF EXISTS (
SELECT dt FROM dbo.OurCalendar
WHERE dt = @startDate
AND isWeekday = 1
AND isHoliday = 0
)
SET @workdayModifier = 1
ELSE
SET @workdayModifier = 0
SELECT TOP(1) @rs = dt
FROM (
SELECT TOP(@numBusDays + @workdayModifier) dt
FROM dbo.OurCalendar
WHERE isWeekday = 1
AND isHoliday = 0
AND dt >= @startDate
ORDER BY dt ASC
) as ID
ORDER BY dt DESC
RETURN @rs
END

How to run Query in loops and counts the number of rows of each loop

I have a query that collects data for me, at the end of it I'm filtering on two dates and I count the number of rows.
FROM TAB
WHERE
(tab.transfer_date < '2019-03-11' AND Real_Updated_date >= '2019-03-11')
ORDER BY transfer_date
Is it possible to increase both dates by one day at a time until '2019-03-20',
and count and print how many rows I get for each day?
Thanks!
Full Query:
WITH TAB AS (
SELECT
[vortex_hvc].[vortex_dbo].material_history.updated_datetime
,[vortex_hvc].[vortex_dbo].material_history.transfer_date
,cast(
case
when [vortex_hvc].[vortex_dbo].material_history.transfer_date = [vortex_hvc].[vortex_dbo].material_history.updated_datetime then getdate()
else [vortex_hvc].[vortex_dbo].material_history.updated_datetime end as datetime
) as Real_Updated_date
FROM [vortex_hvc].[vortex_dbo].[vw_public_material_location]
join [vortex_hvc].[vortex_dbo].[vw_public_material_unit]
on vw_public_material_location.material_name = vw_public_material_unit.unit_number
JOIN [vortex_hvc].[vortex_dbo].[material_history]
ON [vortex_hvc].[vortex_dbo].vw_public_material_location.material_id = [vortex_hvc].[vortex_dbo].material_history.material_id
where
DateDiff(d,[vortex_hvc].[vortex_dbo].material_history.transfer_date, getdate()) < 30
AND
[vortex_hvc].[vortex_dbo].vw_public_material_location.quantity = 1
and
[vortex_hvc].[vortex_dbo].material_history.location_id in ('3492','3500','3981','3493','3504','3497','4140',
'3498', '3496','3627','4378','3512','4376','4542','4379','3802','4517','4410','4182','4758','3499','4897','4239','4820',
'4133','4377','4342','5042','5113','5358','5100','5550','5548','5549','5359',
'5594','5601','5614','5696','5701')
)
select tab.*
FROM TAB
where
(tab.transfer_date < '2019-03-11' AND Real_Updated_date >= '2019-03-11')
order by transfer_date
You could do something like this:
DECLARE @dateFilter datetime, @enddate datetime
DECLARE @mycounts AS TABLE (mydate datetime, mycount int)
DECLARE @mydata AS TABLE (updated_datetime datetime, transfer_date datetime, real_updated_date datetime)
INSERT INTO @mydata
SELECT
[vortex_hvc].[vortex_dbo].material_history.updated_datetime
,[vortex_hvc].[vortex_dbo].material_history.transfer_date
,cast(
case
when [vortex_hvc].[vortex_dbo].material_history.transfer_date = [vortex_hvc].[vortex_dbo].material_history.updated_datetime then getdate()
else [vortex_hvc].[vortex_dbo].material_history.updated_datetime end as datetime
) as Real_Updated_date
FROM [vortex_hvc].[vortex_dbo].[vw_public_material_location]
join [vortex_hvc].[vortex_dbo].[vw_public_material_unit]
on vw_public_material_location.material_name = vw_public_material_unit.unit_number
JOIN [vortex_hvc].[vortex_dbo].[material_history]
ON [vortex_hvc].[vortex_dbo].vw_public_material_location.material_id = [vortex_hvc].[vortex_dbo].material_history.material_id
where
DateDiff(d,[vortex_hvc].[vortex_dbo].material_history.transfer_date, getdate()) < 30
AND
[vortex_hvc].[vortex_dbo].vw_public_material_location.quantity = 1
and
[vortex_hvc].[vortex_dbo].material_history.location_id in ('3492','3500','3981','3493','3504','3497','4140',
'3498', '3496','3627','4378','3512','4376','4542','4379','3802','4517','4410','4182','4758','3499','4897','4239','4820',
'4133','4377','4342','5042','5113','5358','5100','5550','5548','5549','5359',
'5594','5601','5614','5696','5701')
SET @dateFilter = '2019-03-11' --this is the first date used as filter
SET @enddate='2019-03-20' --this is the last one
WHILE @dateFilter <= @enddate
BEGIN
INSERT INTO @mycounts
SELECT @dateFilter, count(*) as mycount FROM @mydata tab
WHERE
(tab.transfer_date < @dateFilter AND Real_Updated_date >= @dateFilter)
SET @dateFilter = DATEADD(day,1,@dateFilter)
END
SELECT * FROM @mycounts
Because the loop advances with DATEADD, the range can cross month and year boundaries as well.
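The loop can also be replaced by a single set-based query. This is only a sketch: the two sample rows and the trimmed-down @mydata table are hypothetical stand-ins for the data collected above, and the days CTE name is an illustrative choice.

```sql
-- Hypothetical stand-in for the @mydata table variable filled in the answer above.
DECLARE @mydata TABLE (transfer_date datetime, real_updated_date datetime);
INSERT INTO @mydata VALUES
    ('2019-03-10', '2019-03-12'),
    ('2019-03-11', '2019-03-15');

DECLARE @startDate date = '2019-03-11', @endDate date = '2019-03-20';

-- Recursive CTE producing one row per day in the range.
WITH days AS (
    SELECT @startDate AS d
    UNION ALL
    SELECT DATEADD(day, 1, d) FROM days WHERE d < @endDate
)
SELECT days.d AS mydate,
       COUNT(t.transfer_date) AS mycount   -- 0 when no row matches that day
FROM days
LEFT JOIN @mydata AS t
       ON t.transfer_date < days.d
      AND t.real_updated_date >= days.d
GROUP BY days.d
ORDER BY days.d;
```

SQL Server stops recursion after 100 levels by default, so a range longer than about 100 days would need OPTION (MAXRECURSION n) appended to the final SELECT.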

SQL Server - Efficient generation of dates in a range

Using SQL Server 2016.
I have a stored procedure that produces a list of options against a range of dates. Carriage options against days for clarity but unimportant to the specifics here.
The first step in the stored procedure generates a list of dates to store additional data against, and generating this list takes substantially longer than the rest of the code. While each call is short, the number of calls means that this one piece of code puts the system under more load than anything else.
With that in mind I have been testing efficiency of several options.
Iterative common table expression:
CREATE FUNCTION [dbo].[udf_DateRange_CTE] (@StartDate DATE,@EndDate DATE)
RETURNS @Return TABLE (Date DATE NOT NULL)
AS
BEGIN
WITH dates(date)
AS (SELECT @StartDate [Date]
UNION ALL
SELECT DATEADD(dd, 1, [Date])
FROM dates
WHERE [Date] < @EndDate
)
INSERT INTO @Return
SELECT date
FROM dates
OPTION (MAXRECURSION 0)
RETURN
END
A while loop:
CREATE FUNCTION [dbo].[udf_DateRange_While] (@StartDate DATE,@EndDate DATE)
RETURNS @Retun TABLE (Date DATE NOT NULL,PRIMARY KEY (Date))
AS
BEGIN
WHILE @StartDate <= @EndDate
BEGIN
INSERT INTO @Retun
VALUES (@StartDate)
SET @StartDate = DATEADD(DAY,1,@StartDate)
END
RETURN
END
A lookup from a pre-populated table of dates:
CREATE FUNCTION [dbo].[udf_DateRange_query] (@StartDate DATE,@EndDate DATE)
RETURNS @Return TABLE (Date DATE NOT NULL)
AS
BEGIN
INSERT INTO @Return
SELECT Date
FROM DateLookup
WHERE Date >= @StartDate
AND Date <= @EndDate
RETURN
END
In terms of efficiency, I tested generating a year's worth of dates 1000 times and got the following results:
CTE: 10.0 seconds
While: 7.7 seconds
Query: 2.6 seconds
From this, the query is definitely the fastest option, but it requires a permanent table of dates that needs to be created and maintained. This means that the query is no longer "self-contained", and it would be possible to request a date outside the range the table covers.
Does anyone know of any more efficient ways of generating dates for a range, or any optimisation I can apply to the above?
Many thanks.
You can try the following. It should be fast compared with the CTE or the WHILE loop.
DECLARE @StartDate DATETIME = Getdate() - 1000
DECLARE @EndTime DATETIME = Getdate()
SELECT *
FROM (SELECT @StartDate + RN AS DATE
FROM (SELECT ROW_NUMBER()
OVER (
ORDER BY (SELECT NULL)) - 1 RN -- start at 0 so the start date itself is included
FROM master..[spt_values]) T) T1
WHERE T1.DATE <= @EndTime
ORDER BY DATE
Note: this works for a day difference of up to about 2537 days, the number of rows in master..[spt_values] (the count may vary between SQL Server versions).
If you need a larger range, you can CROSS JOIN master..[spt_values] with itself to generate 2537 * 2537 = 6436369 rows, like this:
DECLARE @StartDate DATETIME = Getdate() - 10000
DECLARE @EndTime DATETIME = Getdate()
SELECT @StartDate + RN AS DATE FROM
(
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1 RN -- 0-based day offset
FROM master..[spt_values] T1
CROSS JOIN master..[spt_values] T2
) T
WHERE RN <= DATEDIFF(DAY,@StartDate,@EndTime)
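If depending on master..[spt_values] is undesirable, the same row-numbering idea can be built from a small VALUES list cross-joined with itself and wrapped in an inline table-valued function. This is a sketch, not a benchmarked replacement: the function name udf_DateRange_Tally and the CTE names are hypothetical, and the 10,000-row cap is an arbitrary choice.

```sql
-- Hypothetical function name; assumes SQL Server 2008 or later (VALUES row constructor).
CREATE FUNCTION dbo.udf_DateRange_Tally (@StartDate DATE, @EndDate DATE)
RETURNS TABLE
AS
RETURN
    WITH e1(n) AS (   -- 10 rows
        SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS v(n)
    ),
    e2(n) AS (SELECT 1 FROM e1 AS a CROSS JOIN e1 AS b),   -- 100 rows
    e4(n) AS (SELECT 1 FROM e2 AS a CROSS JOIN e2 AS b),   -- 10,000 rows, roughly 27 years of days
    nums(n) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM e4)
    SELECT TOP (DATEDIFF(DAY, @StartDate, @EndDate) + 1)
           -- ROW_NUMBER() is BIGINT, DATEADD expects INT, hence the cast
           DATEADD(DAY, CAST(n - 1 AS INT), @StartDate) AS [Date]
    FROM nums
    ORDER BY n;
```

It would be called as SELECT [Date] FROM dbo.udf_DateRange_Tally('2019-01-01', '2019-12-31'). Because it is an inline function, the optimizer can expand it into the calling query instead of filling a table variable first; an end date more than one day before the start date makes TOP negative and raises an error, so callers would need to guard against that.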

How to avoid not to query tables or views in scalar functions?

I have four scalar functions in my view, and they drastically reduce the view's performance. I believe the reason is that the scalar functions run SELECT queries.
For example:
CREATE FUNCTION [dbo].[udf_BJs_GENERAL]
(
@TankSystemId int,
@TimeStamp datetime2(7)
)
RETURNS varchar(10)
AS
BEGIN
DECLARE @leakChk varchar(10);
DECLARE @allowableVariance float;
DECLARE @GallonsPumped int;
DECLARE @DailyOverOrShort float;
DECLARE @TimePeriod datetime2(7);
DECLARE @ReportDate datetime2(7)
SELECT TOP 1 @TimePeriod = Date
FROM [bjs].udv_DailySiraData
where TankSystemId=@TankSystemId ORDER BY Date DESC
SET @ReportDate=@TimePeriod
IF( @TimeStamp <= @TimePeriod)
SET @ReportDate=@TimeStamp
SELECT @GallonsPumped = SUM(GallonsPumped)
FROM [bjs].[udv_DailySiraData]
where TankSystemId=@TankSystemId
and Date <=@ReportDate and Date >= DATEADD(mm, DATEDIFF(mm,0,@ReportDate), 0)
SELECT @DailyOverOrShort = SUM(DailyVar)
FROM [bjs].[udv_DailySiraData]
where TankSystemId=@TankSystemId
and Date <=@ReportDate and Date >= DATEADD(mm, DATEDIFF(mm,0,@ReportDate), 0)
SELECT @allowableVariance= (@GallonsPumped/100) + 130
SET @leakChk='FAIL'
IF (@allowableVariance > ABS(@DailyOverOrShort))
SET @leakChk = 'PASS'
RETURN @leakChk;
END
How can I avoid this situation? Is there a way to run the SELECT queries in my view and pass the results to my scalar function?
Try this (written as an inline table-valued function, since a scalar function body cannot be a bare WITH ... SELECT):
create function dbo.udf_BJs_GENERAL(
@TankSystemId int,
@TimeStamp datetime2(7)
) returns table as
return
with dates as (
select top 1
ReportDate = case when @TimeStamp <= Date then @TimeStamp else Date end
from bjs.udv_DailySiraData
where TankSystemId = @TankSystemId
order by Date desc
),
gallons as (
select
allowableVariance = (sum(GallonsPumped) / 100) + 130,
DailyOverOrShort = sum(DailyVar)
from bjs.udv_DailySiraData data
join dates
on data.Date <= dates.ReportDate
and data.Date >= dateadd(mm, datediff(mm, 0, dates.ReportDate), 0) -- first day of the report month
where data.TankSystemId = @TankSystemId
)
select
leakChk = cast(case when allowableVariance > abs(DailyOverOrShort)
then 'PASS' else 'FAIL' end as varchar(10))
from gallons
Your case is special because of the input parameter. Assuming the timestamp parameter is at day granularity,
this view returns the check result for each TankSystemId on every day.
You can then join the view to your query on TankSystemId and day.
If the input parameter is more fine-grained than a day, I think it would be difficult to convert this function to a view.
CREATE view [dbo].[uvw_BJs_GENERAL]
AS
/*
SET @ReportDate=@TimePeriod
IF( @TimeStamp <= @TimePeriod)
SET @ReportDate=@TimeStamp
*/
SELECT a.TankSystemId,b.[Date]
,GallonsPumped = SUM(a.GallonsPumped),DailyOverOrShort = SUM(a.DailyVar)
,leakChk=CASE WHEN ((SUM(a.GallonsPumped)/100) + 130) > ABS(SUM(a.DailyVar)) THEN 'PASS' ELSE 'FAIL' END
FROM [bjs].[udv_DailySiraData] AS a
INNER JOIN (
SELECT TankSystemId, CONVERT(DATE,[Date]) AS [Date] FROM [bjs].[udv_DailySiraData] GROUP BY TankSystemId, CONVERT(DATE,[Date])
) b ON a.TankSystemId=b.TankSystemId AND DATEDIFF(d,a.[Date],b.[Date])>=0
-- and Date <=@ReportDate and Date >= DATEADD(mm, DATEDIFF(mm,0,@ReportDate), 0)
GROUP BY a.TankSystemId,b.[Date]

What is a good way to find gaps in a set of datespans?

What is a way to find gaps in a set of date spans?
For example, I have these date spans:
1/ 1/11 - 1/10/11
1/13/11 - 1/15/11
1/20/11 - 1/30/11
Then I have a start and end date of 1/7/11 and 1/14/11.
I want to be able to tell that between 1/10/11 and 1/13/11 there is a gap so the start and end date is not possible. Or I want to return only the datespans up to the first gap encountered.
If this can be done in SQL server that would be good.
I was thinking to go through each date to find out if it lands in a datespan... if it does not then there's a gap on that day.
Jump to the second-to-last code block of this answer for: "I want to be able to tell that between 1/10/11 and 1/13/11 there is a gap so the start and end date is not possible."
Jump to the last code block of this answer for: "I want to return only the datespans up to the first gap encountered."
First of all, here's a virtual table to discuss
create table spans (date1 datetime, date2 datetime);
insert into spans select '20110101', '20110110';
insert into spans select '20110113', '20110115';
insert into spans select '20110120', '20110130';
This query lists, individually, every date covered by a span that intersects the range:
declare @startdate datetime, @enddate datetime
select @startdate = '20110107', @enddate = '20110114'
select distinct a.date1+v.number
from spans A
inner join master..spt_values v
on v.type='P' and v.number between 0 and datediff(d, a.date1, a.date2)
-- we don't care about spans that don't intersect with our range
where A.date1 <= @enddate
and @startdate <= A.date2
Armed with this query, we can now test whether there are any gaps by
counting the covered days in the range against the expected number of days:
declare @startdate datetime, @enddate datetime
select @startdate = '20110107', @enddate = '20110114'
select case when count(distinct a.date1+v.number)
= datediff(d,@startdate, @enddate) + 1
then 'No gaps' else 'Gap' end
from spans A
inner join master..spt_values v
on v.type='P' and v.number between 0 and datediff(d, a.date1, a.date2)
-- we don't care about spans that don't intersect with our range
where A.date1 <= @enddate
and @startdate <= A.date2
-- count only those dates within our range
and a.date1 + v.number between @startdate and @enddate
Another way to do this is to build the calendar from @startdate
to @enddate up front and check whether each date falls within some span:
declare @startdate datetime, @enddate datetime
select @startdate = '20110107', @enddate = '20110114'
-- startdate+v.number is a day on the calendar
select @startdate + v.number
from master..spt_values v
where v.type='P' and v.number between 0
and datediff(d, @startdate, @enddate)
-- run the part above this line alone to see the calendar
-- the condition checks for dates that are not in any span (gap)
and not exists (
select *
from spans
where @startdate + v.number between date1 and date2)
The query returns ALL dates that are gaps in the date range @startdate - @enddate.
A TOP 1 can be added to just see whether there are any gaps.
To return all records that are before the gap, use the query as a
derived table in a larger query:
declare @startdate datetime, @enddate datetime
select @startdate = '20110107', @enddate = '20110114'
select *
from spans
where date1 <= @enddate and @startdate <= date2 -- overlaps
and date2 < ( -- before the gap; if there is no gap this subquery is NULL and no rows come back
select top 1 @startdate + v.number
from master..spt_values v
where v.type='P' and v.number between 0
and datediff(d, @startdate, @enddate)
and not exists (
select *
from spans
where @startdate + v.number between date1 and date2)
order by 1 ASC
)
Assuming MySQL, something like this would work:
select @olddate := null;
select start_date, end_date, datediff(start_date, @olddate) as diff, @olddate := end_date
from `table`
having diff > 1
order by start_date asc, end_date asc;
Basically: cache the previous row's end_date in the @olddate variable, and then compute the difference between that "old" value and the current row's start_date. The HAVING clause will return only the records where the gap between two consecutive rows is greater than a day.
Disclaimer: I haven't tested this, but the basic query construct should work.
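On MySQL 8.0 or later, the same previous-row comparison can be written with the LAG window function instead of a user variable. A minimal sketch, assuming a hypothetical table spans(start_date, end_date) whose spans do not nest inside one another:

```sql
-- Assumes MySQL 8.0+ (window functions) and a hypothetical table spans(start_date, end_date).
SELECT start_date,
       end_date,
       prev_end,
       DATEDIFF(start_date, prev_end) AS diff   -- days between previous end and this start
FROM (
    SELECT start_date,
           end_date,
           LAG(end_date) OVER (ORDER BY start_date, end_date) AS prev_end
    FROM spans
) AS s
WHERE DATEDIFF(start_date, prev_end) > 1        -- the first row has prev_end = NULL and is skipped
ORDER BY start_date;
```

Each returned row is a span that starts more than one day after the preceding span ended, so the gap lies between prev_end and start_date.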
"I want to be able to tell that between 1/10/11 and 1/13/11 there is a gap so the start and end date is not possible."
I think you're asking this question: does the data in your table have a gap between the start date and the end date?
I created a one-column table, date_span, and inserted your date spans into it.
You can identify a gap by counting the number of days between the start date and the end date, and comparing that to the number of rows in date_span for the same range.
select
date '2011-01-14' - date '2011-01-07' + 1 as elapsed_days,
count(*) from date_span
where cal_date between '2011-01-07' and '2011-01-14';
returns
elapsed_days count
-- --
8 6
Since they're not equal, there's a gap in the table "date_span" between 2011-01-07 and 2011-01-14. I'll stop there for now, because I'm really not certain what you're trying to do.