SQL find duplicate records from past 7 days - sql

Hi, I am trying to find duplicate web visits within the past 7 days. I have built a query, but it is taking too long to run. Any help in optimizing it would be much appreciated. I am identifying duplicates by VisitorGuid.
WITH TooClose
AS
(
SELECT
a.visitid AS BeforeID,
b.visitID AS AfterID,
a.omniturecid as [Before om id],
b.omniturecid as [after om id],
a.pubid as [Before pub id],
b.pubid as [after pub id],
a.VisitorGuid as [Before guid],
b.VisitorGuid as [after guid],
a.date as [Before date],
b.date as [after date]
FROM
webvisits a
INNER JOIN WebVisits b ON a.VisitorGuid = b.VisitorGuid
AND a.date < b.Date
AND DATEDIFF(DAY, a.date, b.date) < 7
Where a.Date >= '7/1/2015')
SELECT
*
FROM
TooClose
WHERE
BeforeID NOT IN (SELECT AfterID FROM TooClose)

If I understand your question correctly, you are trying to find all duplicate webvisits within the past 7 days. I'm not sure what qualifies as a duplicate webvisit, but here is my attempt at something that might work for you:
;WITH q1
AS (
SELECT a.VisitorGuid
,a.date
FROM webvisits a
WHERE a.DATE >= DATEADD(DAY, -7, cast(getdate() as date))
)
,q2 AS
(SELECT q1.VisitorGuid
,count(*) as rcount
FROM q1
GROUP BY q1.VisitorGuid
)
SELECT q2.VisitorGuid
FROM q2
WHERE q2.rcount > 1
SQL Fiddle Demo
UPDATED
;WITH q1
AS (
SELECT a.VisitorGuid
,a.date
,a.omniturecid
FROM webvisits a
WHERE a.DATE >= DATEADD(DAY, -7, cast(getdate() as date))
)
,q2 AS
(SELECT q1.VisitorGuid
,count(*) as rcount
FROM q1
GROUP BY q1.VisitorGuid
HAVING Count(*)> 1
)
SELECT q1.VisitorGuid,
q1.omniturecid,
q1.date
FROM q1
INNER JOIN q2 on q1.VisitorGuid = q2.VisitorGuid
SQL Fiddle Demo2
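For anyone who wants to try the GROUP BY / HAVING idea without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3. The table layout, the sample rows, and the fixed "today" value are my own assumptions for illustration, and the date arithmetic uses SQLite's date() rather than DATEADD/GETDATE.

```python
import sqlite3

def repeat_visitors(rows, today, window_days=7):
    """Return visits from the last window_days for visitors seen more than once."""
    con = sqlite3.connect(":memory:")
    # Hypothetical, simplified version of the webvisits table.
    con.execute("CREATE TABLE webvisits (visitorguid TEXT, date TEXT, omniturecid TEXT)")
    con.executemany("INSERT INTO webvisits VALUES (?, ?, ?)", rows)
    sql = """
        WITH q1 AS (              -- visits inside the window
            SELECT visitorguid, date, omniturecid
            FROM webvisits
            WHERE date >= date(?, ?)
        ),
        q2 AS (                   -- visitors with more than one visit in the window
            SELECT visitorguid FROM q1 GROUP BY visitorguid HAVING COUNT(*) > 1
        )
        SELECT q1.visitorguid, q1.omniturecid, q1.date
        FROM q1 JOIN q2 ON q1.visitorguid = q2.visitorguid
        ORDER BY q1.visitorguid, q1.date
    """
    result = con.execute(sql, (today, f"-{window_days} days")).fetchall()
    con.close()
    return result

sample_visits = [
    ("g1", "2015-07-04", "c1"),
    ("g1", "2015-07-08", "c2"),
    ("g2", "2015-07-05", "c3"),
    ("g3", "2015-06-20", "c4"),  # outside the window
    ("g3", "2015-07-09", "c5"),
]
duplicates = repeat_visitors(sample_visits, today="2015-07-10")
print(duplicates)
```

Only g1 comes back here, since g3 has just one visit inside the 7-day window.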

As others have pointed out, it is a good idea to provide sample data, preferably as an sqlfiddle or as create and insert statements. Here's one approach. I'm too lazy to invent the structure and fill it with data to test with, so it might contain some errors:
SELECT ... FROM (
SELECT VisitorGuid
, date
, lead(date) over (partition by visitorguid
order by date) as next_date
, omniturecid
, lead(omniturecid) over (partition by visitorguid
order by date) as next_omniturecid
, ...
, lead(...) ...
FROM webvisits a
) as x
WHERE DATEDIFF(DAY, date, next_date) < 7
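A rough, runnable sketch of the LEAD idea, in case it helps. It uses Python's sqlite3 (window functions need SQLite 3.25 or newer), a cut-down webvisits table, and made-up rows; julianday() stands in for DATEDIFF, so treat it as an illustration rather than a drop-in for SQL Server.

```python
import sqlite3

def close_repeat_visits(rows, max_gap_days=7):
    """Return (guid, date, next_date) where the visitor's next visit
    is less than max_gap_days after the current one."""
    con = sqlite3.connect(":memory:")
    # Hypothetical, simplified version of the webvisits table.
    con.execute("CREATE TABLE webvisits (visitid INTEGER, visitorguid TEXT, date TEXT)")
    con.executemany("INSERT INTO webvisits VALUES (?, ?, ?)", rows)
    sql = """
        SELECT visitorguid, date, next_date
        FROM (
            SELECT visitorguid,
                   date,
                   -- date of the same visitor's following visit, NULL if none
                   LEAD(date) OVER (PARTITION BY visitorguid ORDER BY date) AS next_date
            FROM webvisits
        ) AS x
        WHERE next_date IS NOT NULL
          AND julianday(next_date) - julianday(date) < ?
        ORDER BY visitorguid, date
    """
    result = con.execute(sql, (max_gap_days,)).fetchall()
    con.close()
    return result

sample = [
    (1, "g1", "2015-07-01"),
    (2, "g1", "2015-07-03"),
    (3, "g1", "2015-07-20"),
    (4, "g2", "2015-07-02"),
]
pairs = close_repeat_visits(sample)
print(pairs)
```

Each row is compared only with the next visit of the same visitor, so there is no self-join on the whole table.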

Related

Taking most recent values in sum over date range

I have a table which has the following columns: DeskID *, ProductID *, Date *, Amount (where the columns marked with * make the primary key). The products in use vary over time, as represented in the image below.
Table format on the left, and a (hopefully) intuitive representation of the data on the right for one desk
The objective is to have the sum of the latest amounts of products by desk and date, including products which are no longer in use, over a date range.
e.g. using the data above the desired table is:
So on the 1st Jan, the sum is 1 of Product A
On the 2nd Jan, the sum is 2 of A and 5 of B, so 7
On the 4th Jan, the sum is 1 of A (out of use, so take the value from the 3rd), 5 of B, and 2 of C, so 8 in total
etc.
I have tried using a partition on the desk and product ordered by date to get the most recent value and turned the following code into a function (Function1 below) with @date Date parameter
select @date 'Date', t.DeskID, SUM(t.Amount) 'Sum' from (
select @date 'Date', t.DeskID, t.ProductID, t.Amount
, row_number() over (partition by t.DeskID, t.ProductID order by t.Date desc) as roworder
from Table1 t
where 1 = 1
and t.Date <= @date
) t
where t.roworder = 1
group by t.DeskID
And then using a utility calendar table and cross apply to get the required values over a time range, as below
select * from Calendar c
cross apply Function1(c.CalendarDate)
where c.CalendarDate >= '20190101' and c.CalendarDate <= '20191009'
This has the expected results, but is far too slow. Currently each desk uses around 50 products, and the products roll every month, so after just 5 years each desk has a history of ~3000 products, which causes the whole thing to grind to a halt. (Roughly 30 seconds for a range of a single month)
Is there a better approach?
Changing your function to the following should be faster:
select @date 'Date', t.DeskID, SUM(t.Amount) 'Sum'
FROM (SELECT m.DeskID, m.ProductID, MAX(m.[Date]) AS MaxDate
FROM Table1 m
where m.[Date] <= @date
GROUP BY m.DeskID, m.ProductID) d
INNER JOIN Table1 t
ON d.DeskID=t.DeskID
AND d.ProductID=t.ProductID
and t.[Date] = d.MaxDate
group by t.DeskID
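Here is a small sketch of that "latest row per desk and product, then sum" join, runnable with Python's sqlite3. The sample rows are an assumption on my part, chosen to reproduce the totals described in the question (1 on 1 Jan, 7 on 2 Jan, 8 on 4 Jan), and the function name is made up.

```python
import sqlite3

# Assumed sample rows: (DeskID, ProductID, Date, Amount)
ROWS = [
    (1, "A", "2019-01-01", 1), (1, "A", "2019-01-02", 2), (1, "B", "2019-01-02", 5),
    (1, "A", "2019-01-03", 1), (1, "B", "2019-01-03", 4), (1, "C", "2019-01-03", 3),
    (1, "B", "2019-01-04", 5), (1, "C", "2019-01-04", 2), (1, "C", "2019-01-05", 2),
]

def sum_latest_amounts(as_of):
    """Sum, per desk, the most recent Amount of every product on or before as_of."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Table1 (DeskID INTEGER, ProductID TEXT, Date TEXT, Amount INTEGER)")
    con.executemany("INSERT INTO Table1 VALUES (?, ?, ?, ?)", ROWS)
    sql = """
        SELECT t.DeskID, SUM(t.Amount)
        FROM (SELECT DeskID, ProductID, MAX(Date) AS MaxDate   -- latest date per product
              FROM Table1
              WHERE Date <= ?
              GROUP BY DeskID, ProductID) AS d
        JOIN Table1 AS t
          ON t.DeskID = d.DeskID
         AND t.ProductID = d.ProductID
         AND t.Date = d.MaxDate
        GROUP BY t.DeskID
    """
    result = con.execute(sql, (as_of,)).fetchall()
    con.close()
    return result

print(sum_latest_amounts("2019-01-04"))
```

On 4 Jan product A has no row, so its 3 Jan value is carried forward, which is the behaviour the question asks for.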
The performance of TVF usually suffers. The following removes the TVF completely:
-- DROP TABLE Table1;
CREATE TABLE Table1 (DeskID int not null, ProductID nvarchar(32) not null, [Date] Date not null, Amount int not null, PRIMARY KEY ([Date],DeskID,ProductID));
INSERT Table1(DeskID,ProductID,[Date],Amount)
VALUES (1,'A','2019-01-01',1),(1,'A','2019-01-02',2),(1,'B','2019-01-02',5),(1,'A','2019-01-03',1)
,(1,'B','2019-01-03',4),(1,'C','2019-01-03',3),(1,'B','2019-01-04',5),(1,'C','2019-01-04',2),(1,'C','2019-01-05',2)
GO
DECLARE @StartDate date=N'2019-01-01';
DECLARE @EndDate date=N'2019-01-05';
;WITH cte_p
AS
(
SELECT DISTINCT DeskID,ProductID
FROM Table1
WHERE [Date] <= @EndDate
),
cte_a
AS
(
SELECT @StartDate AS [Date], p.DeskID, p.ProductID, ISNULL(a.Amount,0) AS Amount
FROM (
SELECT t.DeskID, t.ProductID
, MAX(t.Date) AS FirstDate
FROM Table1 t
WHERE t.Date <= @StartDate
GROUP BY t.DeskID, t.ProductID) f
INNER JOIN Table1 a
ON f.DeskID=a.DeskID
AND f.ProductID=a.ProductID
AND f.[FirstDate]=a.[Date]
RIGHT JOIN cte_p p
ON p.DeskID=a.DeskID
AND p.ProductID=a.ProductID
UNION ALL
SELECT DATEADD(DAY,1,a.[Date]) AS [Date], t.DeskID, t.ProductID, t.Amount
FROM Table1 t
INNER JOIN cte_a a
ON t.DeskID=a.DeskID
AND t.ProductID=a.ProductID
AND t.[Date] > a.[Date]
AND t.[Date] <= DATEADD(DAY,1,a.[Date])
WHERE a.[Date]<@EndDate
UNION ALL
SELECT DATEADD(DAY,1,a.[Date]) AS [Date], a.DeskID, a.ProductID, a.Amount
FROM cte_a a
WHERE NOT EXISTS(SELECT 1 FROM Table1 t
WHERE t.DeskID=a.DeskID
AND t.ProductID=a.ProductID
AND t.[Date] > a.[Date]
AND t.[Date] <= DATEADD(DAY,1,a.[Date]))
AND a.[Date]<@EndDate
)
SELECT [Date], DeskID, SUM(Amount)
FROM cte_a
GROUP BY [Date], DeskID;

Efficient sql subquery on same table based off datetime value

Below I have a simple query to get all the movie ratings for today joining an "event" table and "movie" table.
Select e.*, m.moviename
From Event e, movie m
Where e.eventdate >= DATEADD(day, -1, GETDATE())
and e.moviekey = m.moviekey
order by e.Ratings desc;
Question
In the example above, how would you retrieve the ratings from 1 week ago and 1 month ago? The query would then return 2 extra columns, RatingsOneWeekAgo and RatingsOneMonthAgo, etc.
I've looked into subqueries and it's not clicking. Any help would be appreciated.
Thanks
You could use CTEs to pull this information in (similar to using subqueries).
The following query assumes that you have ratings for every day and no duplicates (multiple ratings for the same movie on the same day):
WITH cteOneWeekAgo
AS
(
SELECT
moviekey
, Ratings
FROM Event
WHERE CAST(eventdate AS date) = DATEADD(WEEK, -1, CAST(GETDATE() AS date))
)
,
cteOneMonthAgo
AS
(
SELECT
moviekey
, Ratings
FROM Event
WHERE CAST(eventdate AS date) = DATEADD(MONTH, -1, CAST(GETDATE() AS date))
)
SELECT
e.*
, m.moviename
, w.Ratings Ratings_OneWeekAgo
, mth.Ratings Ratings_OneMonthAgo
FROM
Event e
JOIN movie m ON e.moviekey = m.moviekey
LEFT JOIN cteOneWeekAgo w ON e.moviekey = w.moviekey
LEFT JOIN cteOneMonthAgo mth ON e.moviekey = mth.moviekey
WHERE e.eventdate >= DATEADD(DAY, -1, GETDATE())
ORDER BY e.Ratings DESC
I also wrote a more complex query, which will pull in the most recent ratings for the movie before the date you're looking for if ratings for that date don't exist.
WITH cteOneWeekAgo
AS
(
SELECT
moviekey
, Ratings
, eventdate
FROM
(
SELECT
moviekey
, Ratings
, eventdate
, ROW_NUMBER() OVER (PARTITION BY moviekey ORDER BY eventdate DESC) R
FROM Event
WHERE CAST(eventdate AS date) <= DATEADD(WEEK, -1, CAST(GETDATE() AS date))
) Q
WHERE R = 1
)
,
cteOneMonthAgo
AS
(
SELECT
moviekey
, Ratings
, eventdate
FROM
(
SELECT
moviekey
, Ratings
, eventdate
, ROW_NUMBER() OVER (PARTITION BY moviekey ORDER BY eventdate DESC) R
FROM Event
WHERE CAST(eventdate AS date) <= DATEADD(MONTH, -1, CAST(GETDATE() AS date))
) Q
WHERE R = 1
)
SELECT
e.*
, m.moviename
, w.eventdate Ratings_OneWeekAgo_MostRecentDate
, w.Ratings Ratings_OneWeekAgo
, mth.eventdate Ratings_OneMonthAgo_MostRecentDate
, mth.Ratings Ratings_OneMonthAgo
FROM
Event e
JOIN movie m ON e.moviekey = m.moviekey
LEFT JOIN cteOneWeekAgo w ON e.moviekey = w.moviekey
LEFT JOIN cteOneMonthAgo mth ON e.moviekey = mth.moviekey
WHERE e.eventdate >= DATEADD(DAY, -1, GETDATE())
ORDER BY e.Ratings DESC
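To see the "most recent rating on or before the target date" part in isolation, here is a minimal sketch with Python's sqlite3 (window functions need SQLite 3.25+). The Event rows, the fixed "today", and the helper name are assumptions for illustration; date() replaces the CAST/DATEADD calls.

```python
import sqlite3
from datetime import date, timedelta

def latest_rating_on_or_before(con, target_date):
    """For each movie, return its most recent rating dated on or before target_date."""
    sql = """
        SELECT moviekey, Ratings, eventdate
        FROM (
            SELECT moviekey, Ratings, eventdate,
                   -- r = 1 marks the newest qualifying row per movie
                   ROW_NUMBER() OVER (PARTITION BY moviekey ORDER BY eventdate DESC) AS r
            FROM Event
            WHERE date(eventdate) <= date(?)
        ) AS q
        WHERE r = 1
        ORDER BY moviekey
    """
    return con.execute(sql, (target_date,)).fetchall()

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Event (moviekey INTEGER, eventdate TEXT, Ratings REAL)")
con.executemany("INSERT INTO Event VALUES (?, ?, ?)", [
    (1, "2024-03-01 09:00:00", 7.0),
    (1, "2024-03-06 09:00:00", 7.5),
    (1, "2024-03-10 09:00:00", 8.0),
    (2, "2024-03-10 09:00:00", 6.0),  # no history a week back
])

today = date(2024, 3, 10)  # stand-in for GETDATE()
week_ago = (today - timedelta(days=7)).isoformat()
week_ago_rows = latest_rating_on_or_before(con, week_ago)
print(week_ago_rows)
```

Movie 1 has no rating exactly one week back, so the 1 March row is used; movie 2 has nothing old enough and would show NULL after the LEFT JOIN.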

Display Month Gaps for Each location

I have the following query which takes in the opps and calculates the duration and revenue for each month. However, for some locations where there is no data, it is missing some months. Essentially, I would like all months to appear for each location and record type. I tried a left outer join on the calendar, but that didn't seem to work either.
Here is the query:
;With DateSequence( [Date] ) as
(
Select CAST(@fromdate as DATE) as [Date]
union all
Select CAST(dateadd(day, 1, [Date]) as Date)
from DateSequence
where Date < @todate
)
INSERT INTO CalendarTemp (Date, Day, DayOfWeek, DayOfYear, WeekOfYear, Month, MonthName, Year)
Select
[Date] as [Date],
DATEPART(DAY,[Date]) as [Day],
DATENAME(dw, [Date]) as [DayOfWeek],
DATEPART(DAYOFYEAR,[Date]) as [DayOfYear],
DATEPART(WEEK,[Date]) as [WeekOfYear],
DATEPART(MONTH,[Date]) as [Month],
DATENAME(MONTH,[Date]) as [MonthName],
DATEPART(YEAR,[Date]) as [Year]
from DateSequence option (MaxRecursion 10000)
;
DELETE FROM CalendarTemp WHERE DayOfWeek IN ('Saturday', 'Sunday');
SELECT
AccountId
,AccountName
,Office
,Stage = (CASE WHEN StageName = 'Closed Won' THEN 'Closed Won'
ELSE 'Open'
END)
,Id
,Name
,RecordType= (CASE
WHEN recordtypeid = 'LAS1' THEN 'S'
END)
,Start_Date
,End_Date
,Probability
,Estimated_Revenue_Won = ISNULL(Amount, 0)
,ROW_NUMBER() OVER(PARTITION BY Name ORDER BY Name) AS Row
--,Revenue_Per_Day = CAST(ISNULL(Amount/NULLIF(dbo.CalculateNumberOFWorkDays(Start_Date, End_Date),0),0) as money)
,YEAR(c.Date) as year
,MONTH(c.Date) as Month
,c.MonthName
--, ISNULL(CAST(Sum((Amount)/NULLIF(dbo.CalculateNumberOFWorkDays(Start_Date, End_Date),0)) as money),0) As RevenuePerMonth
FROM SF_Extracted_Opps o
LEFT OUTER JOIN CalendarTemp c on o.Start_Date <= c.Date AND o.End_Date >= c.Date
WHERE
Start_Date <= @todate AND End_Date >= @fromdate
AND Office IN (@Location)
AND recordtypeid IN ('LAS1')
GROUP BY
AccountId
,AccountName
,Office
,(CASE WHEN StageName = 'Closed Won' THEN 'Closed Won'
ELSE 'Open'
END)
,Id
,Name
,(CASE
WHEN recordtypeid = 'LAS1' THEN 'S'
END)
,Amount
--, CAST(ISNULL(Amount/NULLIF(dbo.CalculateNumberOFWorkDays(Start_Date, End_Date),0),0) as money)
,Start_Date
,End_Date
,Probability
,YEAR(c.Date)
,Month(c.Date)
,c.MonthName
,dbo.CalculateNumberOFWorkDays(Start_Date, End_Date)
ORDER BY Office
, (CASE
WHEN recordtypeid = 'LAS1' THEN 'S'
END)
,(CASE WHEN StageName = 'Closed Won' THEN 'Closed Won'
ELSE 'Open'
END)
, [Start_Date], Month(c.Date), AccountName, Row;
I tried adding another left outer join to this, using it as a subquery and joining on the calendar by year and month, but that did not seem to work either. Suggestions would be greatly appreciated.
--Date Calendar for each location:
;With DateSequence( [Date], Location) as
(
Select CAST(@fromdate as DATE) as [Date], oo.Office as location
union all
Select CAST(dateadd(day, 1, [Date]) as Date), oo.Office as location
from DateSequence dts
join Opportunity_offices oo on 1 = 1
where Date < @todate
)
--select result
INSERT INTO CalendarTemp (Location,Date, Day, DayOfWeek, DayOfYear, WeekOfYear, Month, MonthName, Year)
Select
location,
[Date] as [Date],
DATEPART(DAY,[Date]) as [Day],
DATENAME(dw, [Date]) as [DayOfWeek],
DATEPART(DAYOFYEAR,[Date]) as [DayOfYear],
DATEPART(WEEK,[Date]) as [WeekOfYear],
DATEPART(MONTH,[Date]) as [Month],
DATENAME(MONTH,[Date]) as [MonthName],
DATEPART(YEAR,[Date]) as [Year]
from DateSequence option (MaxRecursion 10000)
;
You have your LEFT JOIN backwards. If you want all records from CalendarTemp and only those that match from SF_Extracted_Opps, then CalendarTemp should be the table on the LEFT. Alternatively, you can switch LEFT JOIN to RIGHT JOIN and that should fix it. The other issue is that your WHERE clause uses columns from your SF_Extracted_Opps table, which effectively turns the join back into an INNER JOIN.
Here is one way to fix it:
SELECT
.....
FROM
CalendarTemp c
LEFT JOIN SF_Extracted_Opps o
ON o.Start_Date <= c.Date AND o.End_Date >= c.Date
AND o.Start_Date <= @todate AND o.End_Date >= @fromdate
AND o.Office IN (@Location)
AND o.recordtypeid IN ('LAS1')
The other issue you might run into is that, because you remove weekends from your CalendarTemp table, not all dates are represented. I would test with the weekends both in and out and see if you get different results.
This line:
AND o.Start_Date <= @todate AND o.End_Date >= @fromdate
should not be needed either, because the dates are already limited by the line before it and by the values in your CalendarTemp table.
A note about your CalendarTemp table: you don't have to go back and delete those records. Simply add the day-of-week filter as a WHERE clause on the SELECT that populates the table.
Edit: for all offices, you can cross join your offices table with your CalendarTemp table. Do it in your final query, not in the CTE that builds the calendar. The problem with doing it in the calendar CTE is that the CTE is recursive, so you would have to do it in both the anchor and the recursive member definitions.
SELECT
.....
FROM
CalendarTemp c
CROSS JOIN Opportunity_offices oo
LEFT JOIN SF_Extracted_Opps o
ON o.Start_Date <= c.Date AND o.End_Date >= c.Date
AND o.Start_Date <= @todate AND o.End_Date >= @fromdate
AND oo.office = o.Office
AND o.recordtypeid IN ('LAS1')
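A tiny runnable sketch of the calendar CROSS JOIN offices, then LEFT JOIN opps pattern, using Python's sqlite3. The office names, the single opportunity, and the one-date-per-month calendar are invented for illustration; the point is only that every office/month pair survives, with 0 where nothing matches.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Hypothetical, heavily simplified versions of the tables in the question.
    CREATE TABLE CalendarTemp (Date TEXT);
    CREATE TABLE Opportunity_offices (Office TEXT);
    CREATE TABLE SF_Extracted_Opps (Id INTEGER, Office TEXT, Start_Date TEXT, End_Date TEXT);
    INSERT INTO CalendarTemp VALUES ('2024-01-15'), ('2024-02-15'), ('2024-03-15');
    INSERT INTO Opportunity_offices VALUES ('Boston'), ('Denver');
    INSERT INTO SF_Extracted_Opps VALUES (1, 'Boston', '2024-01-01', '2024-02-28');
""")

rows = con.execute("""
    SELECT oo.Office,
           strftime('%Y-%m', c.Date) AS ym,
           COUNT(o.Id) AS opps            -- counts matches only, so gaps show 0
    FROM CalendarTemp AS c
    CROSS JOIN Opportunity_offices AS oo  -- every date for every office
    LEFT JOIN SF_Extracted_Opps AS o
           ON o.Start_Date <= c.Date AND o.End_Date >= c.Date
          AND o.Office = oo.Office        -- filters stay in ON, not WHERE
    GROUP BY oo.Office, ym
    ORDER BY oo.Office, ym
""").fetchall()

for row in rows:
    print(row)
```

Moving the office condition into a WHERE clause would drop the Denver rows again, which is the INNER JOIN effect described above.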

SQL COUNT that includes 0 values

I have a query that lists the number of jobs for each day over a 7-day period.
It works fine, but it doesn't include days with 0 results.
What do I need to do to have it include 0 results?
select date_received, count(*)
from calls with (nolock)
where contract = 'BLAH'
and date_received between DATEADD(day,-8,GETDATE()) AND GETDATE()-1
group by date_Received
order by date_received
This query produces results for 6 days. The 7th day has 0 calls, but I want that day to be included.
If you have a calendar table, you can do:
SELECT A.[Date] date_received,
COUNT(B.date_received) N -- count matched calls only, so days without calls give 0
FROM dbo.Calendar A
LEFT JOIN ( SELECT *
FROM dbo.calls
WHERE contract = 'BLAH') B
ON A.[Date] = B.date_received
WHERE A.[Date] >= DATEADD(DAY,-8,CONVERT(DATE,GETDATE()))
AND A.[Date] <= DATEADD(DAY,-1,CONVERT(DATE,GETDATE()))
GROUP BY A.[Date]
If not, you can use a CTE for your calendar table:
;WITH Calendar AS
(
SELECT DATEADD(DAY,-1*number,CONVERT(DATE,GETDATE())) [Date]
FROM master..spt_values
WHERE type = 'P'
AND number BETWEEN 1 AND 8
)
SELECT A.[Date] date_received,
COUNT(B.date_received) N -- count matched calls only, so days without calls give 0
FROM Calendar A
LEFT JOIN ( SELECT *
FROM dbo.calls
WHERE contract = 'BLAH') B
ON A.[Date] = B.date_received
GROUP BY A.[Date]
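One thing worth checking with this pattern is which COUNT you use: after a LEFT JOIN, COUNT(*) still counts the NULL-extended row for an empty day, while COUNT of a column from the joined table gives 0. A quick sketch with Python's sqlite3, with invented call rows and a recursive CTE standing in for the calendar (cal_date is my own column name):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE calls (date_received TEXT, contract TEXT);
    INSERT INTO calls VALUES
        ('2024-05-01', 'BLAH'),
        ('2024-05-01', 'BLAH'),
        ('2024-05-03', 'BLAH'),
        ('2024-05-02', 'OTHER');   -- different contract, must not be counted
""")

sql = """
    WITH RECURSIVE Calendar(cal_date) AS (
        SELECT '2024-05-01'
        UNION ALL
        SELECT date(cal_date, '+1 day') FROM Calendar WHERE cal_date < '2024-05-03'
    )
    SELECT A.cal_date,
           COUNT(*)               AS star_count,  -- counts the NULL-extended row too
           COUNT(B.date_received) AS call_count   -- counts real matches only
    FROM Calendar AS A
    LEFT JOIN (SELECT * FROM calls WHERE contract = 'BLAH') AS B
           ON A.cal_date = B.date_received
    GROUP BY A.cal_date
    ORDER BY A.cal_date
"""
counts = con.execute(sql).fetchall()
for row in counts:
    print(row)
```

For 2 May the first count is 1 and the second is 0, so the column form is the one that reports an empty day correctly.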

How to output only one max value from this query in SQL?

Yesterday Thomas helped me a lot by providing exactly the query I wanted. Now I need a variant of it, and I hope someone can help me out.
I want it to output only one row, namely a max value - but it has to build on the algorithm in the following query:
WITH Calendar AS (SELECT CAST(@StartDate AS datetime) AS Date
UNION ALL
SELECT DATEADD(d, 1, Date) AS Expr1
FROM Calendar AS Calendar_1
WHERE (DATEADD(d, 1, Date) < @EndDate))
SELECT C.Date, C2.Country, COALESCE (SUM(R.[Amount of people per day needed]), 0) AS [Allocated testers]
FROM Calendar AS C CROSS JOIN
Country AS C2 LEFT OUTER JOIN
Requests AS R ON C.Date BETWEEN R.[Start date] AND R.[End date] AND R.CountryID = C2.CountryID
WHERE (C2.Country = @Country)
GROUP BY C.Date, C2.Country OPTION (MAXRECURSION 0)
The output from above will be like:
Date Country Allocated testers
06/01/2010 Chile 3
06/02/2010 Chile 4
06/03/2010 Chile 0
06/04/2010 Chile 0
06/05/2010 Chile 19
but what I need right now is
Allocated testers
19
That is, only one column and one row: the max value itself, for the period of dates and the country selected via the (already existing) parameters.
Use ORDER BY and LIMIT:
ORDER BY [Allocated testers] DESC LIMIT 1
EDITED
As LIMIT does not exist in SQL Server, use ORDER BY and TOP instead:
select TOP 1 .... ORDER BY [Allocated testers] DESC
WITH Calendar
AS (
SELECT
CAST(@StartDate AS datetime) AS Date
UNION ALL
SELECT
DATEADD(d, 1, Date) AS Expr1
FROM
Calendar AS Calendar_1
WHERE
( DATEADD(d, 1, Date) < @EndDate )
)
SELECT TOP 1 *
FROM
(
SELECT
C.Date
,C2.Country
,COALESCE(SUM(R.[Amount of people per day needed]), 0) AS [Allocated testers]
FROM
Calendar AS C
CROSS JOIN Country AS C2
LEFT OUTER JOIN Requests AS R
ON C.Date BETWEEN R.[Start date] AND R.[End date]
AND R.CountryID = C2.CountryID
WHERE
( C2.Country = @Country )
GROUP BY
C.Date
,C2.Country
) lst
ORDER BY lst.[Allocated testers] DESC
OPTION
( MAXRECURSION 0 )
Full example following the discussion in @Salil's answer:
WITH Calendar AS (SELECT CAST(@StartDate AS datetime) AS Date
UNION ALL
SELECT DATEADD(d, 1, Date) AS Expr1
FROM Calendar AS Calendar_1
WHERE (DATEADD(d, 1, Date) < @EndDate))
SELECT TOP 1 C.Date, C2.Country, COALESCE (SUM(R.[Amount of people per day needed]), 0) AS [Allocated testers]
FROM Calendar AS C CROSS JOIN
Country AS C2 LEFT OUTER JOIN
Requests AS R ON C.Date BETWEEN R.[Start date] AND R.[End date] AND R.CountryID = C2.CountryID
WHERE (C2.Country = @Country)
GROUP BY C.Date, C2.Country
ORDER BY 3 DESC
OPTION (MAXRECURSION 0)
The ORDER BY 3 means order by the 3rd field in the SELECT statement, so if you remove the first two fields, change this accordingly.
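If only the single number is wanted, another option (not what the answers above use) is to wrap the per-day query in a derived table and take MAX over it. A sketch with Python's sqlite3 follows; the shortened column names (people, start_date, end_date), the sample requests, and the function name are assumptions, and the calendar is SQLite's WITH RECURSIVE rather than the T-SQL CTE with MAXRECURSION.

```python
import sqlite3

def max_allocated(con, start, end, country):
    """Return the highest per-day total of requested people in [start, end)."""
    sql = """
        WITH RECURSIVE Calendar(d) AS (
            SELECT date(?)
            UNION ALL
            SELECT date(d, '+1 day') FROM Calendar WHERE date(d, '+1 day') < date(?)
        )
        SELECT MAX(daily.allocated)
        FROM (
            -- one row per calendar day, 0 when no request covers it
            SELECT C.d, COALESCE(SUM(R.people), 0) AS allocated
            FROM Calendar AS C
            CROSS JOIN Country AS C2
            LEFT JOIN Requests AS R
                   ON C.d BETWEEN R.start_date AND R.end_date
                  AND R.CountryID = C2.CountryID
            WHERE C2.Country = ?
            GROUP BY C.d
        ) AS daily
    """
    return con.execute(sql, (start, end, country)).fetchone()[0]

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Country (CountryID INTEGER, Country TEXT);
    CREATE TABLE Requests (CountryID INTEGER, start_date TEXT, end_date TEXT, people INTEGER);
    INSERT INTO Country VALUES (1, 'Chile');
    INSERT INTO Requests VALUES
        (1, '2010-06-01', '2010-06-02', 3),
        (1, '2010-06-02', '2010-06-02', 1),
        (1, '2010-06-05', '2010-06-05', 19);
""")
peak = max_allocated(con, "2010-06-01", "2010-06-06", "Chile")
print(peak)
```

With these rows the daily totals are 3, 4, 0, 0 and 19, so the result is the single value 19, with no extra columns to strip.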