Need help to understand SQL query - sql

I have the following code that gets the months between two date ranges using a CTE
declare
#date_start DateTime,
#date_end DateTime
;WITH totalMonths AS
(
SELECT
DATEDIFF(MONTH, #date_start, #date_end) totalM
),
numbers AS
(
SELECT 1 num
UNION ALL
SELECT n.num + 1 num
FROM numbers n, totalMonths c
WHERE n.num <= c.totalM
)
SELECT
CONVERT(varchar(6), DATEADD(MONTH, numbers.num - 1, #date_start), 112)
FROM
numbers
OPTION (MAXRECURSION 0);
This works, but I do not understand how it works
Especially this part
numbers AS
(
SELECT 1 num
UNION ALL
SELECT n.num + 1 num
FROM numbers n, totalMonths c
WHERE n.num <= c.totalM
)
Thanks in advance, sorry for my English

This query is using two CTEs, one recursive, to generate a list of values from nothing (SQL isn't really good at doing this).
totalMonths AS (SELECT DATEDIFF(MONTH, #date_start, #date_end) totalM),
This is part is basically a convoluted way of binding the result of the DATEDIFF to the name totalM. This could've been implemented as just a variable if you can declare things:
DECLARE #totalM int = DATEDIFF(MONTH, #date_start, #date_end);
Then you would of course use #totalM to refer to the value.
numbers AS (
SELECT 1 num
UNION ALL
SELECT n.num+1 num FROM numbers n, totalMonths c
WHERE n.num<= c.totalM
)
This part is essentially a simple loop implemented using recursion to generate the numbers from 1 to totalMonths. The first SELECT specifies the first value (1) and the one after that specifies the next value, which is int greater than the previous one. Evaluating recursive CTEs has somewhat special semantics so it's a good idea to read up on them. Finally the WHERE specifies the stopping condition so that the recursion doesn't go on forever.
What all this does is generate an equivalent to a physical "numbers" table that just has one column the numbers from 1 onwards.
The SELECT at the very end uses the result of the numbers CTE to generate a bunch of dates.
Note that the OPTION (MAXRECURSION 0) at the end is also relevant to the recursive CTE. This disables the server-wide recursion depth limit so that the number generating query doesn't stop short if the range is very long, or a bothersome DBA set a very low default limit.

totalMonths query evaluates to a scalar result (single value) indicating the number of months that need to be generated. It probably makes more sense to just do this inline instead of using a named CTE.
numbers generates a sequence of rows with a column called num starting at 1 and ending at totalM + 1 which was computed in the previous step. It is able to reference this value by means of a cross join. Since there's only one row it essentially just appends that one column to the table horizontally. The query is recursive so each pass adds a new row to the result by adding 1 to the last added row (really just the one column) until the value of the previously added row exceeds totalM. The first half of the union is the starting value; the second half refers to itself via from numbers and incrementally builds the result in a sort of loop.
The output is derived from the numbers input. One is subtracted from each num giving a range from 0 to totalM and that value is treated as the number of months to add to the starting date. The date value is converted to a varchar of length six which means the final two characters containing the day are truncated.
Suppose that #date_start is January 31, 2016 and #date_end is March 1, 2016. There is never any comparison of the actual date values so it doesn't matter that March 31 is generated in the sequence but also falls later than the passed #date_end value. Any dates in the respective start and end months can be chosen to generate identical sequences.

SELECT 1 num
is your starting point of your recursive CTE and that is (numbers n) in the first teration.In the second iteration the out put of the first
SELECT n.num+1 num FROM numbers n, totalMonths c
WHERE n.num <= c.totalM
becomes numbers (n) and so on.

Related

SQL Server : average count of alerts per day, not including days with no alerts

I have a table that acts as a message log, with the two key tables being TIMESTAMP and TEXT. I'm working on a query that grabs all alerts (from TEXT) for the past 30 days (based on TIMESTAMP) and gives a daily average for those alerts.
Here is the query so far:
--goback 30 days start at midnight
declare #olderdate as datetime
set #olderdate = DATEADD(Day, -30, DATEDIFF(Day, 0, GetDate()))
--today at 11:59pm
declare #today as datetime
set #today = dateadd(ms, -3, (dateadd(day, +1, convert(varchar, GETDATE(), 101))))
print #today
--Grab average alerts per day over 30 days
select
avg(x.Alerts * 1.0 / 30)
from
(select count(*) as Alerts
from MESSAGE_LOG
where text like 'The process%'
and text like '%has alerted%'
and TIMESTAMP between #olderdate and #today) X
However, I want to add something that checks whether there were any alerts for a day and, if there are no alerts for that day, doesn't include it in the average. For example, if there are 90 alerts for a month but they're all in one day, I wouldn't want the average to be 3 alerts per day since that's clearly misleading.
Is there a way I can incorporate this into my query? I've searched for other solutions to this but haven't been able to get any to work.
This isn't written for your query, as I don't have any DDL or sample data, thus I'm going to provide a very simple example instead of how you would do this.
USE Sandbox;
GO
CREATE TABLE dbo.AlertMessage (ID int IDENTITY(1,1),
AlertDate date);
INSERT INTO dbo.AlertMessage (AlertDate)
VALUES('20190101'),('20190101'),('20190105'),('20190110'),('20190115'),('20190115'),('20190115');
GO
--Use a CTE to count per day:
WITH Tots AS (
SELECT AlertDate,
COUNT(ID) AS Alerts
FROM dbo.AlertMessage
GROUP BY AlertDate)
--Now the average
SELECT AVG(Alerts*1.0) AS DayAverage
FROM Tots;
GO
--Clean up
DROP TABLE dbo.AlertMessage;
You're trying to compute a double-aggregate: The average of daily totals.
Without using a CTE, you can try this as well, which is generalized a bit more to work for multiple months.
--get a list of events per day
DECLARE #Event TABLE
(
ID INT NOT NULL IDENTITY(1, 1)
,DateLocalTz DATE NOT NULL--make sure to handle time zones
,YearLocalTz AS DATEPART(YEAR, DateLocalTz) PERSISTED
,MonthLocalTz AS DATEPART(MONTH, DateLocalTz) PERSISTED
)
/*
INSERT INTO #Event(EntryDateLocalTz)
SELECT DISTINCT CONVERT(DATE, TIMESTAMP)--presumed to be in your local time zone because you did not specify
FROM dbo.MESSAGE_LOG
WHERE UPPER([TEXT]) LIKE 'THE PROCESS%' AND UPPER([TEXT]) LIKE '%HAS ALERTED%'--case insenitive
*/
INSERT INTO #Event(DateLocalTz)
VALUES ('2018-12-31'), ('2019-01-01'), ('2019-01-01'), ('2019-01-01'), ('2019-01-12'), ('2019-01-13')
--get average number of alerts per alerting day each month
-- (this will not return months with no alerts,
-- use a LEFT OUTER JOIN against a month list table if you need to include uneventful months)
SELECT
YearLocalTz
,MonthLocalTz
,AvgAlertsOfAlertingDays = AVG(CONVERT(REAL, NumDailyAlerts))
FROM
(
SELECT
YearLocalTz
,MonthLocalTz
,DateLocalTz
,NumDailyAlerts = COUNT(*)
FROM #Event
GROUP BY YearLocalTz, MonthLocalTz, DateLocalTz
) AS X
GROUP BY YearLocalTz, MonthLocalTz
ORDER BY YearLocalTz ASC, MonthLocalTz ASC
Some things to note in my code:
I use PERSISTED columns to get the month and year date parts (because I'm lazy when populating tables)
Use explicit CONVERT to escape integer math that rounds down decimals. Multiplying by 1.0 is a less-readable hack.
Use CONVERT(DATE, ...) to round down to midnight instead of converting back and forth between strings
Do case-insensitive string searching by making everything uppercase (or lowercase, your preference)
Don't subtract 3 milliseconds to get the very last moment before midnight. Change your semantics to interpret the end of a time range as exclusive, instead of dealing with the precision of your datatypes. The only difference is using explicit comparators (i.e. use < instead of <=). Also, DATETIME resolution is 1/300th of a second, not 3 milliseconds.
Avoid using built-in keywords as column names (i.e. "TEXT"). If you do, wrap them in square brackets to avoid ambiguity.
Instead of dividing by 30 to get the average, divide by the count of distinct days in your results.
select
avg(x.Alerts * 1.0 / x.dd)
from
(select count(*) as Alerts, count(distinct CAST([TIMESTAMP] AS date)) AS dd
...

How do you create a list of all the days in the month using a non recursive approach? (Recursive Sample Provided)

I am interested in the days of the current month. The following code is how I would do this recursively:
WITH AllMonthDays as (
SELECT n = 1
UNION ALL
SELECT n + 1 FROM AllMonthDays WHERE n + 1 <= DAY(EOMONTH(GETDATE()))
)
SELECT
datefromparts(YEAR(GETDATE()), MONTH(GETDATE()), n) as dates
FROM AllMonthDays;
I appreciate the above may not be optimal. But my question here is how would I achieve the above non recursively and not hard coding any dates.
EDIT: Removed CAST to VCHAR Entirely, thanks to the suggestion from 'Gordon Linoff'.
Often, a table called master..spt_values is used for this purpose. It is rather undocumented but generally available. Actually, any table with at least 31 rows can be used.
So, to generate the numbers and construct the date:
with n as (
select top 31 row_number() over (order by (select null)) as n
from master..spt_values
)
select datefromparts(year(getdate()), month(getdate()), n.n) as thedate
from n
where n.n <= day(eomonth(getdate()));
Note: datefromparts() is a very convenient way to construct a date. Also, when you use varchar in SQL Server, always include a length. The default varies by context and finding bugs caused by this problem can be quite difficult.

db2 suppress recursive warning

I have a recursive sql that I am running which works but gives me the following warning.
SQL0347W The recursive common table expression "DT_LAST_YEAR" may
contain an infinite loop. SQLSTATE=01605
How can I get rid of the warning?
INSERT INTO REP_MAN_TRAN_COUNTS (SITEDIRECTORYID, BUSINESSDATE, TRANCOUNT)
WITH dt_this_year (level, seqdate) AS
(
SELECT 1, date(current timestamp) -7 DAYS FROM sysibm.sysdummy1
UNION ALL
SELECT level, seqdate + level days FROM dt_this_year WHERE level < 1000 AND seqdate + 1 days < date(current timestamp)
)
,dt_last_year (level, seqdate) AS
(
SELECT 1, date(current timestamp) -7 DAYS - 1 year FROM sysibm.sysdummy1
UNION ALL
SELECT level, seqdate + level days FROM dt_last_year WHERE level < 1000 AND seqdate + 1 days < date(current timestamp) -1 year
)
select 10049, date(dts.calendarday), count(*) trancount
from (
SELECT seqdate AS calendarday FROM dt_this_year
UNION
SELECT seqdate AS calendarday FROM dt_last_year
) dts LEFT JOIN ccftrxheader ccf
ON date(dts.calendarday) = date(ccf.businessdate)
WHERE ccf.sitedirectoryid=10049
GROUP BY ccf.sitedirectoryid,dts.calendarday
How do you get rid of warnings?
By changing the code so that it no longer generates the warning in the first place. Hiding warnings is problematic, because it often disguises a potentially larger problem. I'm fairly certain it's complaining here because the termination clause you provide for level can't ever be reached (because you never manipulate it).
Personally, I'd probably re-write your query into something like this:
INSERT INTO Rep_Man_Tran_Counts (siteDirectoryId, businessDate, tranCount)
WITH dt_Calendar_Data (level, calendarDay) AS
(SELECT l, c
FROM (VALUES (1, CURRENT_DATE - 7 DAYS),
(1, CURRENT_DATE - 7 DAYS - 1 YEAR)) t(l, c)
UNION ALL
SELECT level + 1, calendarDay + 1 DAYS
FROM dt_Calendar_Data
WHERE level < 7)
SELECT 10049, dtCal.calendarDay, COALESCE(COUNT(*), 0) as tranCount
FROM dt_Calendar_Data dtCal
LEFT JOIN ccftrxHeader ccf
ON ccf.businessDate = dtCal.calendarDay
AND ccf.siteDirectoryId = 10049
GROUP BY dtCal.seqDate
(untested, as you've provided no sample data, and I don't have a DB2 instance)
I've assumed you actually wanted a LEFT JOIN, as opposed to the regular INNER JOIN you were actually getting (due to the condition in the WHERE clause, and probably the GROUP BY as well). To avoid adding nulls to your data, I've wrapped the count in COALESCE(...), which will give you 0 instead.
I've also assumed that businessDate is a DATE type, and not a timestamp. If it is a timestamp this query needs to be adjusted (note that the function you were using would for the optimizer to ignore indices).
Note that order of operations with dates matter! Thankfully when dealing with year ranges, you only have one day to worry about in the Gregorian calendar (February 29th). Your current ordering will compare identical calendar days at the start of the range (which one has the "gap" depends on whether this year or last year is a leap year).
EDIT:
Sure, lets look at that CTE:
FROM(VALUES (1, CURRENT_DATE - 7 DAYS),
(1, CURRENT_DATE - 7 DAYS - 1 YEAR)) t(l, c)
This is just a standard VALUES clause used as a table reference. This is the SQL Standard way to construct a small temp table (Rather than referencing the dummy tables, which tend to be vendor-specific). If the statement is run on 2014-02-26 then the resulting table will be:
t
l c
===============
1 "2014-02-19"
1 "2013-02-19"
These columns get renamed by the column listing of the CTE, which are then referenced in the join (and in the case of a recursive CTE, by the recursive portion).
This then forms the starting data for the rest of the recursive query:
UNION ALL
SELECT level + 1, calendarDay + 1 DAYS
FROM dt_Calendar_Data
WHERE level < 7
In DB2 (and some other RDBMSs), recursive CTEs essentially execute iteratively, acting off the results of the "previous" invocation. Every time around, we increment level, and add another day to calendarDay. The "next" rows are then:
level calendarDay
======================
2 "2014-02-20"
2 "2013-02-20"
This continues until the "previous" row has level = 7, which means a new row is not generated (check the WHERE clause). In general, it's best to only have one termination condition (and make progress every iteration), to make it easier for the optimizer to spot. The resulting data is then in the ranges:
level calendarDay
=====================
1 "2014-02-19"
. .....
7 "2014-02-26"
1 "2013-02-19"
. .....
7 "2013-02-26"
... as a side note, I generated the this year/last year data together to make the number of references shorter. If you only needed the one year, level is unnecessary.

Find closest date in SQL Server

I have a table dbo.X with DateTime column Y which may have hundreds of records.
My Stored Procedure has parameter #CurrentDate, I want to find out the date in the column Y in above table dbo.X which is less than and closest to #CurrentDate.
How to find it?
The where clause will match all rows with date less than #CurrentDate and, since they are ordered descendantly, the TOP 1 will be the closest date to the current date.
SELECT TOP 1 *
FROM x
WHERE x.date < #CurrentDate
ORDER BY x.date DESC
Use DateDiff and order your result by how many days or seconds are between that date and what the Input was
Something like this
select top 1 rowId, dateCol, datediff(second, #CurrentDate, dateCol) as SecondsBetweenDates
from myTable
where dateCol < #currentDate
order by datediff(second, #CurrentDate, dateCol)
I have a better solution for this problem i think.
I will show a few images to support and explain the final solution.
Background
In my solution I have a table of FX Rates. These represent market rates for different currencies. However, our service provider has had a problem with the rate feed and as such some rates have zero values. I want to fill the missing data with rates for that same currency that as closest in time to the missing rate. Basically I want to get the RateId for the nearest non zero rate which I will then substitute. (This is not shown here in my example.)
1) So to start off lets identify the missing rates information:
Query showing my missing rates i.e. have a rate value of zero
2) Next lets identify rates that are not missing.
Query showing rates that are not missing
3) This query is where the magic happens. I have made an assumption here which can be removed but was added to improve the efficiency/performance of the query. The assumption on line 26 is that I expect to find a substitute transaction on the same day as that of the missing / zero transaction.
The magic happens is line 23: The Row_Number function adds an auto number starting at 1 for the shortest time difference between the missing and non missing transaction. The next closest transaction has a rownum of 2 etc.
Please note that in line 25 I must join the currencies so that I do not mismatch the currency types. That is I don't want to substitute a AUD currency with CHF values. I want the closest matching currencies.
Combining the two data sets with a row_number to identify nearest transaction
4) Finally, lets get data where the RowNum is 1
The final query
The query full query is as follows;
; with cte_zero_rates as
(
Select *
from fxrates
where (spot_exp = 0 or spot_exp = 0)
),
cte_non_zero_rates as
(
Select *
from fxrates
where (spot_exp > 0 and spot_exp > 0)
)
,cte_Nearest_Transaction as
(
select z.FXRatesID as Zero_FXRatesID
,z.importDate as Zero_importDate
,z.currency as Zero_Currency
,nz.currency as NonZero_Currency
,nz.FXRatesID as NonZero_FXRatesID
,nz.spot_imp
,nz.importDate as NonZero_importDate
,DATEDIFF(ss, z.importDate, nz.importDate) as TimeDifferece
,ROW_NUMBER() Over(partition by z.FXRatesID order by abs(DATEDIFF(ss, z.importDate, nz.importDate)) asc) as RowNum
from cte_zero_rates z
left join cte_non_zero_rates nz on nz.currency = z.currency
and cast(nz.importDate as date) = cast(z.importDate as date)
--order by z.currency desc, z.importDate desc
)
select n.Zero_FXRatesID
,n.Zero_Currency
,n.Zero_importDate
,n.NonZero_importDate
,DATEDIFF(s, n.NonZero_importDate,n.Zero_importDate) as Delay_In_Seconds
,n.NonZero_Currency
,n.NonZero_FXRatesID
from cte_Nearest_Transaction n
where n.RowNum = 1
and n.NonZero_FXRatesID is not null
order by n.Zero_Currency, n.NonZero_importDate

3rd <day_of_week> of the Month - MySQL

I'm working on a recurrence application for events. I have a date range of say, January 1 2010 to December 31 2011. I want to return all of the 3rd Thursdays (arbitrary) of the each month, efficiently. I could do this pretty trivially in code, the caveat is that it must be done in a stored procedure. Ultimately I'd want something like:
CALL return_dates(event_id);
That event_id has a start_date of 1/1/2010 and end_date of 12/31/2011. Result set would be something like:
1/20/2010
2/14/2010
3/17/2010
4/16/2010
5/18/2010
etc.
I'm just curious what the most efficient method of doing this would be, considering I might end up with a very large result set in my actual usage.
One idea that comes to mind - you can create a table and store the dates you're interested in there.
Ok, I haven't tested it, but I think the most efficient way of doing it is via a tally table which is a useful thing to have in the db anyway:
IF EXISTS (SELECT * FROM sys.objects
WHERE object_id = OBJECT_ID(N'[dbo].[num_seq]') AND type in (N'U'))
DROP TABLE [dbo].[num_seq];
SELECT TOP 100000 IDENTITY(int,1,1) AS n
INTO num_seq
FROM MASTER..spt_values a, MASTER..spt_values b;
CREATE UNIQUE CLUSTERED INDEX idx_1 ON num_seq(n);
You can then use this to build up the date range between the two dates. It's fast because
it just uses the index (in fact often faster than a loop, so I'm led to believe)
create procedure getDates
#eventId int
AS
begin
declare #startdate datetime
declare #enddate datetime
--- get the start and end date, plus the start of the month with the start date in
select #startdate=startdate,
#enddate=enddate
from events where eventId=#eventId
select
#startdate+n AS date,
from
dbo.num_seq tally
where
tally.n<datediff(#monthstart, #enddate) and
Datepart(dd,#startdate+n) between 15 and 21 and
Datepart(dw, #startdate+n) = '<day>'
Aside from getting the start and end dates, the third x id each month must be between the 15th and the 21st inclusive.
The day names in that range must be unique, so we can locate it straight away.
If you wanted the second dayname, just modify the range appropriately or use a parameter to calculate it.
It constucts a date table using the startdate, and then adding days on (via the list of numbers in the tally table) until it reaches the end date.
Hope it helps!