Finding weekly events - SQL

I need to look back two years, week by week, to find whether there have been multiple events in a week. For example, if there were 3 events between 7/1/19 and 7/8/19, that member would appear. Is there any way to do this other than a giant CASE statement for each week? Like case when event = y and event_date between todate('07/01/2019','mm/dd/yyyy') and todate('07/08/2019','mm/dd/yyyy') then 1 else 0 end. Would I need to do that for all 104 weeks?
select distinct prov
,svcdate
,svccod
,membno
,unitct
from claim
where svcdate > '20170911'
That query returns the name (bike shop), svcdate (event date, '20180812'), code (h499), member id (456), and units (5).
Thanks.

You can group by the week of the year, and the year itself. The HAVING clause then allows you to filter based on aggregates:
--Set up example test table and populate with example data
DECLARE @Events TABLE (eventDate DATETIME)
INSERT INTO @Events VALUES ('20180101'), ('20180102'), ('20180103')

SELECT DATEPART(wk, eventDate), DATEPART(year, eventDate)
FROM @Events
WHERE eventDate BETWEEN @StartDate AND @EndDate -- your two-year window
GROUP BY DATEPART(wk, eventDate), DATEPART(year, eventDate)
HAVING COUNT(1) > 1
The above will return the week number and the year, and you can convert that into a more friendly form.
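If a friendlier output than a raw week number is wanted, one common idiom (a sketch, assuming SQL Server and the @Events table above) is to group by the date that starts each week instead; with the default settings this returns the Monday of each event's week:
SELECT DATEADD(wk, DATEDIFF(wk, 0, eventDate), 0) AS weekStart, -- Monday of that week
       COUNT(*) AS eventCount
FROM @Events
GROUP BY DATEADD(wk, DATEDIFF(wk, 0, eventDate), 0)
HAVING COUNT(*) > 1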
Edit: Having thought about this again - it depends on whether a) you're just interested in weeks with multiple events, or b) in weeks with more than a specified number of events. If a), you could adopt a simpler WHERE EXISTS (I wasn't sure what your PK is in that table, so I've called it claimID):
SELECT *
FROM claim c
WHERE EXISTS (SELECT 1
              FROM claim c2
              WHERE c.claimID > c2.claimID
                AND DATEPART(wk, c.svcdate) = DATEPART(wk, c2.svcdate)
                AND DATEPART(year, c.svcdate) = DATEPART(year, c2.svcdate))
The above falls down if you're after weeks with n or more events in it - then I think the first example works better.
It also depends on what output you're after - are you after the claim rows themselves, or just a list of dates?

Related

Match between tables with New, Not New

I am still learning T-SQL at the moment and I'm new here, so forgive me if I've not done this right.
I have a table that loads a new day's data each day. Each day's load has a report date for the previous day.
I want to get yesterday's data (e.g. 17/09/2019) from the table and look at the data in the same table from the day before that (e.g. 16/09/2019). I want to check the reference number: if the reference number appears on the day before, it should say Not New, and if it does not appear on the day before, it should say New.
The columns I have are:
ReferenceNumber, ReportData, NewAppt
The NewAppt column is where the New/Not New outcome goes.
Something like this should work:
WITH Yesterday AS (
    SELECT DISTINCT
        ReferenceNumber,
        CONVERT(DATE, ReportDate) AS ReportDate
    FROM
        MyTable
    WHERE
        CONVERT(DATE, ReportDate) = CONVERT(DATE, DATEADD(DAY, -1, GETDATE()))
),
DayBeforeYesterday AS (
    SELECT DISTINCT
        ReferenceNumber
    FROM
        MyTable
    WHERE
        CONVERT(DATE, ReportDate) = CONVERT(DATE, DATEADD(DAY, -2, GETDATE()))
)
SELECT
    y.ReferenceNumber,
    y.ReportDate,
    CASE
        WHEN x.ReferenceNumber IS NOT NULL THEN 0
        ELSE 1
    END AS NewAppointment
FROM
    Yesterday y
    LEFT JOIN DayBeforeYesterday x ON x.ReferenceNumber = y.ReferenceNumber;
Make a list of the DISTINCT reference numbers from each day, then join the two lists, with logic to check whether a reference number from yesterday also appeared in the list from the day before yesterday.
I suppose your column ReportData is some sort of 'date' type and contains only the date (no time).
Furthermore, for each date, there should be at most 1 record for a specific ReferenceNumber.
In that case, try this:
SELECT t1.ReferenceNumber,
       t1.ReportData,
       CASE
           WHEN t2.ReferenceNumber IS NULL THEN 'New'
           ELSE 'Not New'
       END AS NewAppt
FROM my_table t1
LEFT OUTER JOIN my_table t2
    ON t1.ReferenceNumber = t2.ReferenceNumber
    AND t2.ReportData = DATEADD(day, -1, t1.ReportData)
WHERE t1.ReportData = '2019-09-17';
Here's an approach using LAG which removes the need to join to the same table several times and instead just checks the preceding row for that Reference Number.
Note that in my interpretation of your request, if a Reference Number disappears for a day and then returns, it's flagged as New. You can adapt the query to instead check whether the number has appeared at any point in the past if that's not what you need; a sketch of that variant follows the example below.
CREATE TABLE #TestData (ReferenceNumber int, ReportData date)

INSERT INTO #TestData
VALUES (1,'2019-01-16'),(1,'2019-01-17'),(1,'2019-01-18'),(2,'2019-01-18'),(3,'2019-01-17'),(3,'2019-01-18'),(4,'2019-01-17')

SELECT
    ReferenceNumber
    ,ReportData
    ,IIF(
        LAG(ReportData) OVER (PARTITION BY ReferenceNumber ORDER BY ReportData)
            = DATEADD(day, -1, ReportData)
        ,'Not New'
        ,'New'
    ) AS NewAppt
FROM #TestData
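If the "appeared at any point in the past" behaviour is what you need instead, one possible variant (a sketch, not part of the original answer) flags a row as New only the first time its ReferenceNumber ever appears, by comparing against the earliest date seen for that number:
SELECT
    ReferenceNumber
    ,ReportData
    ,IIF(ReportData = MIN(ReportData) OVER (PARTITION BY ReferenceNumber)
        ,'New'
        ,'Not New'
    ) AS NewAppt
FROM #TestData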

MSSQL Date Analysis

I need to write a query that looks at a plethora of dates and determines whether each date was 3 or more years ago, 2 or more years ago, 1 or more years ago, 6 or more months ago, or less than 6 months ago.
Is there a way to do this without writing in physical dates, so that the analysis can be run again later without needing to change the dates?
I have not started to write the query yet, but I have been trying to map it out first.
You should use CASE. I would recommend something like:
select (case when datecol < dateadd(year, -3, getdate()) then '3 years ago'
             when datecol < dateadd(year, -2, getdate()) then '2 years ago'
             . . .
        end)
I specifically do not recommend using datediff(). It is counterintuitive because it counts the number of "boundaries" between two dates. So, 2016-12-31 and 2017-01-01 are one year apart.
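For example, the boundary counting is easy to see directly:
SELECT DATEDIFF(year, '2016-12-31', '2017-01-01') -- returns 1, even though the dates are one day apart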
You can use the DATEDIFF function to calculate the number of months, days, years, etc. between two dates, e.g.
select datediff(day, '2016-01-01', '2017-01-01')
returns 366, because 2016 was a leap year
To get the current date, use the GETDATE() function.
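For instance, a minimal sketch combining the two (the mytable/datecol names are placeholders, not from the question):
select datecol,
       datediff(month, datecol, getdate()) as months_ago,
       datediff(year, datecol, getdate()) as years_ago
from mytable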
I tend to use a generic Tier Table for several reasons.
Logic is moved from code.
Alternate Tiers may be deployed depending on your audience.
Most importantly, things change.
The following will generate a series of dates and then summarize them by the desired tier. I should add that this is a simplified example.
Example
-- Create Sample Tier Table
Declare @Tier table (Tier_Group varchar(50),Tier_Seq int,Tier_Title varchar(50),Tier_R1 int,Tier_R2 int)
Insert into @Tier values
('MyAgeTier',1,'+3 Years' ,36,999999)
,('MyAgeTier',2,'2 - 3 Years' ,24,36)
,('MyAgeTier',3,'1 - 2 Years' ,12,24)
,('MyAgeTier',4,'6 Mths - 1 Year',6 ,12)
,('MyAgeTier',5,'<6 Mths' ,0 ,6)
,('MyAgeTier',6,'Total ' ,0 ,999999)
Select Tier_Title
,Dates = count(*)
,MinDate = min(D)
,MaxDate = max(D)
From @Tier A
Join (
-- Your Actual Source
Select Top (DateDiff(DAY,'2010-01-01','2017-07-31')+1)
D=cast(DateAdd(DAY,-1+Row_Number() Over (Order By (Select Null)),'2010-01-01') as date)
From master..spt_values n1,master..spt_values n2
) B
On Tier_Group = 'MyAgeTier' and DateDiff(MONTH,D,GetDate()) between Tier_R1 and Tier_R2-1
Group By Tier_Title,Tier_R1
Order by Tier_R1 Desc
Returns (for this example) one row per tier, with the number of dates in that tier and the earliest and latest dates that fall into it.

Query to check number of records created in a month

My table creates a new record with a timestamp each day an integration is successful. I am trying to create a query (preferably automated) that would check the number of days in a month vs. the number of records in the table within a time frame.
For example, January has 31 days, so I would like to know how many days in January my process was not successful. If the number of records is less than 31, then I know the job failed 31 - x times.
I tried the following but was not getting very far:
SELECT COUNT (DISTINCT CompleteDate)
FROM table
WHERE CompleteDate BETWEEN '01/01/2015' AND '01/31/2015'
Every 7 days the system executes the job twice, so I get two records on the same day. I am trying to determine the number of days on which nothing happened (failures), so I assume some truncation of the date field is needed?
One way to do this is to use a calendar/date table as the main source of dates in the range, left join your table against it, and count the number of null values.
In the absence of a proper date table you can generate a range of dates using a number sequence like the one found in the master..spt_values table:
select count(*) failed
from (
select dateadd(day, number, '2015-01-01') date
from master..spt_values where type='P' and number < 365
) a
left join your_table b on a.date = b.CompleteDate
where b.CompleteDate is null
and a.date BETWEEN '01/01/2015' AND '01/31/2015'
Sample SQL Fiddle (with count grouped by month)
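In case the fiddle is unavailable, a grouped-by-month version along the same lines might look like this (a sketch, reusing the your_table/CompleteDate names from above):
select datepart(year, a.date) as yr,
       datepart(month, a.date) as mth,
       count(*) as failed
from (
    select dateadd(day, number, '2015-01-01') date
    from master..spt_values where type='P' and number < 365
) a
left join your_table b on a.date = b.CompleteDate
where b.CompleteDate is null
group by datepart(year, a.date), datepart(month, a.date)
order by yr, mth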
Assuming you have an Integers table, this query will pull all dates where no record is found in the target table:
declare @StartDate datetime = '01/01/2013',
        @EndDate datetime = '12/31/2013'

;with d as (
    select *, date = dateadd(d, i - 1, @StartDate)
    from dbo.Integers
    where i <= datediff(d, @StartDate, @EndDate) + 1
)
select d.date
from d
where not exists (
    select 1 from <target> t
    where DATEADD(dd, DATEDIFF(dd, 0, t.<timestamp>), 0) = DATEADD(dd, DATEDIFF(dd, 0, d.date), 0)
)
BETWEEN is not safe here, because CompleteDate may include a time component:
SELECT 31 - count(distinct(convert(date, CompleteDate)))
FROM table
WHERE CompleteDate >= '01/01/2015' AND CompleteDate < '02/01/2015'
You can use the following query:
SELECT DATEDIFF(day, t.d, dateadd(month, 1, t.d)) - COUNT(DISTINCT CompleteDate)
FROM mytable
CROSS APPLY (SELECT CAST(YEAR(CompleteDate) AS VARCHAR(4)) +
RIGHT('0' + CAST(MONTH(CompleteDate) AS VARCHAR(2)), 2) +
'01') t(d)
GROUP BY t.d
SQL Fiddle Demo
Explanation:
The value CROSS APPLY-ied, i.e. t.d, is the ANSI string of the first day of the month of CompleteDate, e.g. '20150101' for 12/01/2015, or 18/01/2015.
DATEDIFF uses the above mentioned value, i.e. t.d, in order to calculate the number of days of the month that CompleteDate belongs to.
GROUP BY essentially groups by (Year, Month), hence COUNT(DISTINCT CompleteDate) returns the number of distinct records per month.
The values returned by the query are the difference between the two, i.e. the number of days in the month minus the number of distinct days that have a record, which is the number of failures per month for each (Year, Month) of your initial data.
If you want to query a specific Year, Month then just simply add a WHERE clause to the above:
WHERE YEAR(CompleteDate) = 2015 AND MONTH(CompleteDate) = 1

Calculating Open incidents per month

We have Incidents in our system with Start Time, Finish Time and project name (and other info).
We would like to have a report: how many Incidents have 'open' status per month per project.
Open status means: not finished.
If an incident is created in December 2009 and closed in March 2010, then it should be counted in December 2009 and in January and February of 2010.
The needed structure should be like this:
Project Year Month Count
------- ------ ------- -------
Test 2009 December 2
Test 2010 January 10
Test 2010 February 12
....
In SQL Server:
SELECT
    Project,
    Year = YEAR(TimeWhenStillOpen),
    Month = DATENAME(month, MIN(TimeWhenStillOpen)),
    Count = COUNT(*)
FROM (
    SELECT
        i.Project,
        i.Incident,
        TimeWhenStillOpen = DATEADD(month, v.number, i.StartTime)
    FROM (
        SELECT
            Project,
            Incident,
            StartTime,
            FinishTime = ISNULL(FinishTime, GETDATE()),
            MonthDiff = DATEDIFF(month, StartTime, ISNULL(FinishTime, GETDATE()))
        FROM Incidents
    ) i
    INNER JOIN master..spt_values v ON v.type = 'P'
        AND v.number BETWEEN 0 AND MonthDiff - 1
) s
GROUP BY Project, YEAR(TimeWhenStillOpen), MONTH(TimeWhenStillOpen)
ORDER BY Project, YEAR(TimeWhenStillOpen), MONTH(TimeWhenStillOpen)
Briefly, how it works:
The innermost subselect, which works directly on the Incidents table, simply 'normalises' the data (it replaces NULL finish times with the current time) and adds a month-difference column, MonthDiff. If there can be no NULLs in your case, just remove the ISNULL expressions accordingly.
The outer subselect uses MonthDiff to break up the time range into a series of timestamps corresponding to the months where the incident was still open, i.e. the FinishTime month is not included. A system table called master..spt_values is also employed there as a ready-made numbers table.
Lastly, the main select is only left with the task of grouping the data.
A useful technique here is to create either a table of "all" dates (clearly that would be infinite so I mean a sufficiently large range for your purposes) OR create two tables: one of all the months (12 rows) and another of "all" years.
Let's assume you go for the 1st of these:
create table all_dates (d date)
and populate as appropriate (a sketch of one way to populate it follows the incident table definition below). I'm going to define your incident table as follows:
create table incident
(
incident_id int not null,
project_id int not null,
start_date date not null,
end_date date null
)
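As a sketch of populating all_dates (my own addition, assuming SQL Server; any RDBMS-specific date generation works just as well), a simple loop over the range of interest suffices:
declare @d date = '2009-01-01'
while @d <= '2020-12-31'
begin
    insert into all_dates (d) values (@d)
    set @d = dateadd(day, 1, @d)
end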
I'm not sure what RDBMS you are using, and date functions vary a lot between them, so the next bit may need adjusting for your needs.
select
project_id,
datepart(yy, all_dates.d) as "year",
datepart(mm, all_dates.d) as "month",
count(*) as "count"
from
incident,
all_dates
where
incident.start_date <= all_dates.d and
(incident.end_date >= all_dates.d or incident.end_date is null)
group by
project_id,
datepart(yy, all_dates.d),
datepart(mm, all_dates.d)
That is not quite going to work as we want, because the counts will be for every day that the incident was open in each month. To fix this we either need to use a subquery or a temporary table, and that really depends on the RDBMS...
Another problem is that, for open incidents, it will show them against all future months in your all_dates table. Adding an all_dates.d <= today condition solves that. Again, different RDBMSs have different methods of giving back now/today/systemtime...
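For example, one possible fix (a sketch in T-SQL) addresses both issues by counting each incident at most once per month and stopping at the current date:
select
    project_id,
    datepart(yy, all_dates.d) as "year",
    datepart(mm, all_dates.d) as "month",
    count(distinct incident.incident_id) as "count"
from
    incident,
    all_dates
where
    incident.start_date <= all_dates.d and
    (incident.end_date >= all_dates.d or incident.end_date is null) and
    all_dates.d <= getdate()
group by
    project_id,
    datepart(yy, all_dates.d),
    datepart(mm, all_dates.d)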
Another approach is to have an all_months rather than all_dates table that just has the date of first of the month in it:
create table all_months (first_of_month date)
select
project_id,
datepart(yy, all_months.first_of_month) as "year",
datepart(mm, all_months.first_of_month) as "month",
count(*) as "count"
from
incident,
all_months
where
incident.start_date <= dateadd(day, -1, dateadd(month, 1, first_of_month)) and
(incident.end_date >= first_of_month or incident.end_date is null)
group by
project_id,
datepart(yy, all_months.first_of_month),
datepart(mm, all_months.first_of_month)

SQL for counting events by date

I feel like I've seen this question asked before, but neither the SO search nor google is helping me... maybe I just don't know how to phrase the question. I need to count the number of events (in this case, logins) per day over a given time span so that I can make a graph of website usage. The query I have so far is this:
select
count(userid) as numlogins,
count(distinct userid) as numusers,
convert(varchar, entryts, 101) as date
from
usagelog
group by
convert(varchar, entryts, 101)
This does most of what I need (I get a row per date as the output containing the total number of logins and the number of unique users on that date). The problem is that if no one logs in on a given date, there will not be a row in the dataset for that date. I want it to add in rows indicating zero logins for those dates. There are two approaches I can think of for solving this, and neither strikes me as very elegant.
Add a column to the result set that lists the number of days between the start of the period and the date of the current row. When I'm building my chart output, I'll keep track of this value and if the next row is not equal to the current row plus one, insert zeros into the chart for each of the missing days.
Create a "date" table that has all the dates in the period of interest and outer join against it. Sadly, the system I'm working on already has a table for this purpose that contains a row for every date far into the future... I don't like that, and I'd prefer to avoid using it, especially since that table is intended for another module of the system and would thus introduce a dependency on what I'm developing currently.
Any better solutions or hints at better search terms for google? Thanks.
Frankly, I'd do this programmatically when building the final output. You're essentially trying to read something from the database which is not there (data for days that have no data). SQL isn't really meant for that sort of thing.
If you really want to do that, though, a "date" table seems your best option. To make it a bit nicer, you could generate it on the fly, using e.g. your DB's date functions and a derived table.
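For example, a sketch of that idea in SQL Server, generating the date range as a derived table (the range is hard-coded here purely for illustration):
SELECT d.dt AS date,
       COUNT(u.userid) AS numlogins,
       COUNT(DISTINCT u.userid) AS numusers
FROM (
    -- one row per day in the range of interest
    SELECT DATEADD(day, number, '2019-01-01') AS dt
    FROM master..spt_values
    WHERE type = 'P' AND number < 31
) d
LEFT JOIN usagelog u
    ON u.entryts >= d.dt AND u.entryts < DATEADD(day, 1, d.dt)
GROUP BY d.dt
ORDER BY d.dt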
I had to do exactly the same thing recently. This is how I did it in T-SQL (YMMV on speed, but I've found it performant enough over a couple of million rows of event data):
DECLARE @DaysTable TABLE ( [Year] INT, [Day] INT )
DECLARE @StartDate DATETIME
SET @StartDate = whatever -- your earliest date of interest

WHILE (@StartDate <= GETDATE())
BEGIN
    INSERT INTO @DaysTable ( [Year], [Day] )
    SELECT DATEPART(YEAR, @StartDate), DATEPART(DAYOFYEAR, @StartDate)

    SELECT @StartDate = DATEADD(DAY, 1, @StartDate)
END

-- This gives me a table of all days since whenever
-- (you could select @StartDate as the minimum date of your usage log)
SELECT days.Year, days.Day, events.NumEvents
FROM @DaysTable AS days
LEFT JOIN (
    SELECT
        COUNT(*) AS NumEvents,
        DATEPART(YEAR, LogDate) AS [Year],
        DATEPART(DAYOFYEAR, LogDate) AS [Day]
    FROM LogData
    GROUP BY
        DATEPART(YEAR, LogDate),
        DATEPART(DAYOFYEAR, LogDate)
) AS events ON days.Year = events.Year AND days.Day = events.Day
Create a memory table (a table variable) where you insert your date ranges, then outer join the logins table against it. Group by your start date, then you can perform your aggregations and calculations.
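A minimal sketch of that approach (the range and names here are illustrative):
DECLARE @Dates TABLE (d DATE)
DECLARE @d DATE = '2019-01-01'

-- fill the table variable with one row per day in the range
WHILE @d <= '2019-01-31'
BEGIN
    INSERT INTO @Dates (d) VALUES (@d)
    SET @d = DATEADD(DAY, 1, @d)
END

SELECT dt.d AS date,
       COUNT(u.userid) AS numlogins,
       COUNT(DISTINCT u.userid) AS numusers
FROM @Dates dt
LEFT JOIN usagelog u
    ON u.entryts >= dt.d AND u.entryts < DATEADD(DAY, 1, dt.d)
GROUP BY dt.d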
The strategy I normally use is to UNION with the opposite of the query, generally a query that retrieves data for rows that don't exist.
If I wanted to get the average mark for a course, but some courses weren't taken by any students, I'd need to UNION with those not taken by anyone to display a row for every class:
SELECT AVG(mark), course FROM `marks`
UNION
SELECT NULL, course FROM courses WHERE course NOT IN
(SELECT course FROM marks)
Your query will be more complex, but the same principle should apply. You may indeed need a table of dates for your second query.
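Applied to the login counts, the same pattern might look roughly like this (a sketch; the dates table and its d column are assumed, not given in the question):
SELECT COUNT(userid) AS numlogins,
       COUNT(DISTINCT userid) AS numusers,
       CONVERT(varchar, entryts, 101) AS date
FROM usagelog
GROUP BY CONVERT(varchar, entryts, 101)
UNION
SELECT 0, 0, CONVERT(varchar, d, 101)
FROM dates
WHERE CONVERT(varchar, d, 101) NOT IN (SELECT CONVERT(varchar, entryts, 101) FROM usagelog)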
Option 1
You can create a temp table, insert the dates in the range, and do a left outer join with usagelog.
Option 2
You can programmatically insert the missing dates while evaluating the result set to produce the final output.
WITH q(n) AS
(
SELECT 0
UNION ALL
SELECT n + 1
FROM q
WHERE n < 99
),
qq(n) AS
(
SELECT 0
UNION ALL
SELECT n + 1
FROM qq
WHERE n < 99
),
dates AS
(
SELECT q.n * 100 + qq.n AS ndate
FROM q, qq
)
SELECT COUNT(userid) AS numlogins,
       COUNT(DISTINCT userid) AS numusers,
       CAST('2000-01-01' AS DATETIME) + ndate AS date
FROM dates
LEFT JOIN usagelog
       ON entryts >= CAST('2000-01-01' AS DATETIME) + ndate
       AND entryts < CAST('2000-01-01' AS DATETIME) + ndate + 1
GROUP BY ndate
This will select up to 10,000 dates constructed on the fly, which should be enough to cover roughly 27 years.
SQL Server has a default limit of 100 recursions per CTE; that's why each inner query returns at most 100 rows.
If you need more than 10,000, just add a third CTE qqq(n) and cross-join with it in dates.