SQL Results show up in both queries - sql

I'm trying to return results where people have signed a particular survey. However, I'm having issues returning survey answers when they have previously answered Survey 1 in the past and show in both Survey 1 & Survey 2.
How do I ensure survey answers only appear once by selecting the most recent survey results, so that they do not show in both surveys?
The results in italics represent a duplicate record for a store which has answered both surveys, but I only want the most recently answered survey to appear. In this instance store 2147380 should only appear in Survey 1, as that answer is the most recent.
CODE
go
use [database]
--Select Outlets that have answers to Survey 1
(select distinct activityanswers.CustomerNumber as Outlet, 'Survey 1' as
'Survey Program', max(answereddate) as 'Last Answered Date'
from dbo.activityanswers
where activityid in (select id from activitys where ActivityGroupId =
'1061293')
group by customernumber
)
--Select Outlets that have answers to Survey 2
(select distinct activityanswers.CustomerNumber as Outlet, 'Survey 2' as
'Survey Program', max(answereddate) as 'Last Answered Date'
from dbo.activityanswers
where activityid in (select id from activitys where ActivityGroupId =
'1061294')
group by customernumber
)
Survey 1 RESULTS
Store Survey AnswerTime
1285939 Survey 1 2018-08-27 10:13:57.000
1348372 Survey 1 2018-08-27 09:21:18.000
2142522 Survey 1 2018-08-27 15:26:29.000
2147380 Survey 1 2018-08-24 22:26:49.000
Survey 2 RESULTS
Store Survey AnswerTime
2147380 Survey 2 2018-08-24 21:58:59.000
2641188 Survey 2 2018-08-27 11:39:31.000

You can get the result with a single SQL statement. Try using the ROW_NUMBER or RANK function.
Something like this might work:
;WITH CTE
AS
(
SELECT
RN =ROW_NUMBER() OVER(PARTITION BY ANS.CustomerNumber ORDER BY ANS.answereddate DESC,
ACT.ActivityGroupId ASC),
ANS.CustomerNumber as Outlet,
CASE ACT.ActivityGroupId
WHEN '1061293' THEN 'Survey 1'
ELSE 'Survey 2' END as 'Survey Program',
ANS.answereddate as 'Last Answered Date'
FROM dbo.ActivityAnswers ANS
INNER JOIN Activitys ACT
ON ANS.activityid = ACT.ID
WHERE ACT.ActivityGroupId IN
(
'1061293',
'1061294'
)
)
SELECT
*
FROM CTE
WHERE RN = 1

I think you should filter after grouping, try something like this:
select a.CustomerNumber as Outlet, a.Last_Answered_Date from (
select CustomerNumber, max(answereddate) as 'Last_Answered_Date'
from activityanswers
group by customernumber) a
join activityanswers b on a.CustomerNumber = b.CustomerNumber
and a.[Last_Answered_Date] = b.answereddate
where b.activityid in (select id from activitys where ActivityGroupId = '1061294')

You can do it by checking whether a particular customer has a newer answer or not (not exists subquery). That way, you can also eliminate the need to group by.
select
CustomerNumber as Outlet,
'Survey 1' as 'Survey Program',
answereddate as 'Last Answered Date'
from dbo.activityanswers a
where activityid in (
select id from activitys where ActivityGroupId = '1061293')
and not exists (
select 1 from dbo.activityanswers b
where b.CustomerNumber = a.CustomerNumber
and b.answereddate > a.answereddate)
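As a rough illustration of the NOT EXISTS idea applied across both surveys at once, here is a minimal sketch using Python's built-in sqlite3 module rather than SQL Server. The table name answers and its three columns are stand-ins (assumptions), loaded with the sample rows from the question; the real schema joins activityanswers to activitys instead.

```python
import sqlite3

# Stand-in table (assumption): one row per outlet/survey with its latest answer time.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE answers (outlet INTEGER, survey TEXT, answered TEXT)")
conn.executemany(
    "INSERT INTO answers VALUES (?, ?, ?)",
    [
        (1285939, "Survey 1", "2018-08-27 10:13:57"),
        (1348372, "Survey 1", "2018-08-27 09:21:18"),
        (2142522, "Survey 1", "2018-08-27 15:26:29"),
        (2147380, "Survey 1", "2018-08-24 22:26:49"),
        (2147380, "Survey 2", "2018-08-24 21:58:59"),
        (2641188, "Survey 2", "2018-08-27 11:39:31"),
    ],
)

# Keep a row only if the same outlet has no later answer in any survey.
latest = conn.execute(
    """
    SELECT a.outlet, a.survey, a.answered
    FROM answers a
    WHERE NOT EXISTS (
        SELECT 1
        FROM answers b
        WHERE b.outlet = a.outlet
          AND b.answered > a.answered
    )
    ORDER BY a.outlet
    """
).fetchall()

for row in latest:
    print(row)
```

Each outlet comes back once, and outlet 2147380 is reported under Survey 1 because that answer is the later of its two. ISO-formatted timestamps are stored as text here so that plain string comparison orders them correctly.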

Maybe a subquery would serve your purpose:
SELECT
Outlet
, SurveyProgram
, LastAnsweredDate
FROM (
SELECT
ans.CustomerNumber Outlet
, CASE WHEN ActivityGroupId = '1061293' THEN 'Survey 1' ELSE 'Survey 2' END SurveyProgram
, answereddate LastAnsweredDate
, ROW_NUMBER() OVER(PARTITION BY ans.CustomerNumber ORDER BY answereddate DESC) RN
FROM
activityanswers ans
INNER JOIN activitys act ON act.ID = ans.activityid AND ActivityGroupId IN('1061293', '1061294')
) D
WHERE
RN = 1
I have replaced the IN() subquery with a JOIN, which I find a cleaner approach for this query. The inner query returns rows for both Survey 1 and Survey 2, so ROW_NUMBER is used to filter them. The rows are ordered by answereddate DESC within each CustomerNumber, so the most recent answer gets RN = 1, and keeping only RN = 1 gives you the most recent record per customer. This also leaves room to extend the query later.

Related

Trying to simplify a SQL query without UNION

I'm very bad at explaining, so let me try to lay out my issue. I have a table that resembles the following:
Source    Value    User
========  =======  ======
old1      1        Phil
new       2        Phil
old2      3        Phil
new       4        Phil
old1      1        Mike
old2      2        Mike
new       1        Jeff
new       2        Jeff
What I need to do is create a query that gets values for users based on the source and the value. It should follow this rule:
For every user, get the highest value. However, disregard the 'new'
source if either 'old1' or 'old2' exists for that user.
So based on those rules, my query should return the following from this table:
Value    User
=======  ======
3        Phil
2        Mike
2        Jeff
I've come up with a query that does close to what is asked:
SELECT MAX([Value]), [User]
FROM
(
SELECT CASE [Source]
WHEN 'old1' THEN 1
WHEN 'old2' THEN 1
WHEN 'new' THEN 2
END AS [SourcePriority],
[Value],
[User]
FROM #UserValues
) MainPriority
WHERE [SourcePriority] = 1
GROUP BY [User]
UNION
SELECT MAX([Value]), [User]
FROM
(
SELECT CASE [Source]
WHEN 'old1' THEN 1
WHEN 'old2' THEN 1
WHEN 'new' THEN 2
END AS [SourcePriority],
[Value],
[User]
FROM #UserValues
) SecondaryPriority
WHERE [SourcePriority] = 2
GROUP BY [User]
However this returns the following results:
Value    User
=======  ======
3        Phil
4        Phil
2        Mike
2        Jeff
Obviously that extra value for Phil=4 is not desired. How should I attempt to fix this query? I also understand that this is a pretty convoluted solution and that it can probably be more easily solved by proper use of aggregates, however I'm not too familiar with aggregates yet which resulted in me resorting to a union. Essentially I'm looking for help creating the cleanest-looking solution possible.
Here is the SQL code if anyone wanted to populate the table themselves to give it a try:
CREATE TABLE #UserValues
(
[Source] VARCHAR(10),
[Value] INT,
[User] VARCHAR(10)
)
INSERT INTO #UserValues VALUES
('old1', 1, 'Phil'),
('new', 2, 'Phil'),
('old2', 3, 'Phil'),
('new', 4, 'Phil'),
('old1', 1, 'Mike'),
('old2', 2, 'Mike'),
('new', 1, 'Jeff'),
('new', 2, 'Jeff')
You can solve it fairly easily without resorting to window functions. In this case, you need the maximum value where ((not new) OR (there isn't an old1 or old2 entry)).
Here's a query that works correctly with your sample data:
SELECT
MAX(U1.[Value]) as 'Value'
,U1.[User]
FROM
#UserValues U1
WHERE
U1.[Source] <> 'new'
OR NOT EXISTS (SELECT * FROM #UserValues U2 WHERE U2.[User] = U1.[User] AND U2.[Source] IN ('old1','old2'))
GROUP BY U1.[User]
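To check that filter against the sample rows, here is a minimal sketch using Python's built-in sqlite3 module instead of SQL Server. The table name user_values and the column name user_name are stand-ins (assumptions) for #UserValues and [User].

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_values (source TEXT, value INTEGER, user_name TEXT)")
conn.executemany(
    "INSERT INTO user_values VALUES (?, ?, ?)",
    [
        ("old1", 1, "Phil"), ("new", 2, "Phil"), ("old2", 3, "Phil"), ("new", 4, "Phil"),
        ("old1", 1, "Mike"), ("old2", 2, "Mike"),
        ("new", 1, "Jeff"), ("new", 2, "Jeff"),
    ],
)

# A row qualifies if it is not 'new', or if its user has no old1/old2 rows at all.
rows = conn.execute(
    """
    SELECT u1.user_name, MAX(u1.value) AS value
    FROM user_values u1
    WHERE u1.source <> 'new'
       OR NOT EXISTS (
            SELECT 1
            FROM user_values u2
            WHERE u2.user_name = u1.user_name
              AND u2.source IN ('old1', 'old2')
       )
    GROUP BY u1.user_name
    ORDER BY u1.user_name
    """
).fetchall()

print(rows)
```

Phil's 'new' rows are dropped because he has old rows, while Jeff's are kept because he has none, so each user ends up with exactly one maximum.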
You can also rank the sources by priority with row_number():
select top (1) with ties uv.*
from #UserValues uv
order by row_number() over (partition by [user]
order by (case when source = 'old2' then 1 when source = 'old1' then 2 else 3 end), value desc
);
However, if source is limited to these three values, you can also do:
. . .
order by row_number() over (partition by [user]
order by (case when source = 'new' then 2 else 1 end), value desc
)
with raw_data
as (
select row_number() over(partition by a.[user] order by a.value desc) as rnk
,count(case when a.source in('old1','old2') then 1 end) over(partition by a.[user]) as cnt_old
,a.*
from #UserValues a
)
,curated_data
as(select *
,row_number() over(partition by rd.[user] order by rd.value desc) as rnk2
from raw_data rd
where 0 = case when rnk=1 and source='new' and cnt_old>0 then 1 else 0 end
)
select *
from curated_data
where rnk2=1
I am doing the following:
raw_data -> First I rank the values per user from highest to lowest (rnk). I also count how many of the user's records have old1 or old2 in the source column (cnt_old).
curated_data -> I eliminate the highest-value record (rnk=1) if its source is new and cnt_old > 0. I then rank the remaining records again by value (rnk2).
Finally, I select the highest available value from curated_data (i.e. rnk2=1).
I think you should consider setting up an XREF table that defines the priority of each source, in case the prioritization becomes more complicated in the future. I do it with a temp table:
CREATE TABLE #SourcePriority
(
[Source] VARCHAR(10),
[SourcePriority] INT
)
INSERT INTO #SourcePriority VALUES
('old1', 1),
('old2', 1),
('new', 2)
You might also create a view that looks up the SourcePriority for the original table. I do it with a CTE, plus a possible implementation of how to look up the top priority with the highest value:
;WITH CTE as (
SELECT s.[SourcePriority], u.[Value], u.[User]
FROM #UserValues as u
INNER JOIN #SourcePriority as s on u.[Source] = s.[Source]
)
SELECT MAX (v.[Value]) as [Value], v.[User]
FROM (
SELECT MIN ([SourcePriority]) as [TopPriority], [User]
FROM cte
GROUP BY [User]
) as s
INNER JOIN cte as v
ON s.[User] = v.[User] and s.[TopPriority] = v.[SourcePriority]
GROUP BY v.[User]
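The lookup-table idea can be tried out with a minimal sketch in Python's built-in sqlite3 module (not SQL Server). The names user_values, source_priority, user_name and the helper top_values are stand-ins chosen for this illustration, and the reprioritization at the end is a hypothetical change, not something from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE user_values (source TEXT, value INTEGER, user_name TEXT);
    CREATE TABLE source_priority (source TEXT PRIMARY KEY, priority INTEGER);
    INSERT INTO user_values VALUES
        ('old1', 1, 'Phil'), ('new', 2, 'Phil'), ('old2', 3, 'Phil'), ('new', 4, 'Phil'),
        ('old1', 1, 'Mike'), ('old2', 2, 'Mike'),
        ('new', 1, 'Jeff'), ('new', 2, 'Jeff');
    INSERT INTO source_priority VALUES ('old1', 1), ('old2', 1), ('new', 2);
    """
)

def top_values(connection):
    """Highest value per user among the rows of that user's best (lowest) priority."""
    return connection.execute(
        """
        WITH ranked AS (
            SELECT p.priority, u.value, u.user_name
            FROM user_values u
            JOIN source_priority p ON p.source = u.source
        )
        SELECT v.user_name, MAX(v.value) AS value
        FROM (
            SELECT user_name, MIN(priority) AS top_priority
            FROM ranked
            GROUP BY user_name
        ) t
        JOIN ranked v
          ON v.user_name = t.user_name
         AND v.priority = t.top_priority
        GROUP BY v.user_name
        ORDER BY v.user_name
        """
    ).fetchall()

before = top_values(conn)

# Hypothetical rule change: 'new' now outranks the old sources.
conn.execute("UPDATE source_priority SET priority = 0 WHERE source = 'new'")
after = top_values(conn)

print(before)
print(after)
```

The point of the design is that the rule lives in data: changing one row in the priority table changes the outcome without touching the query.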
I think you want:
select top (1) with ties uv.*
from (select uv.*,
sum(case when source in ('old1', 'old2') then 1 else 0 end) over (partition by [user]) as cnt_old
from #UserValues uv
) uv
where cnt_old = 0 or source <> 'new'
order by row_number() over (partition by [user] order by value desc);

how to self join a table?

After updating the query I got the output shown below.
I am writing the following query to put two sets of records next to each other.
WITH cte AS
(SELECT SubSubsidaryAccountCode,
sum(Debit) AS debit,
sum(Credit) AS credit,
ROW_NUMBER() over (PARTITION BY SubSubsidaryAccountCode
ORDER BY SubSubsidaryAccountCode) AS RN
FROM TBLLedgerLevel4
WHERE SubSubsidaryAccountCode LIKE '4%'
OR SubSubsidaryAccountCode LIKE '5%'
GROUP BY SubSubsidaryAccountCode ),
cte2 AS
(SELECT *
FROM TBLLevel4)
SELECT a.SubSubsidaryAccountCode ,
c.SubSubsidaryAccountName AS RevenuName ,
sum(a.Debit) AS debit ,
b.SubSubsidaryAccountCode AS SubsidaryAccount2 ,
d.SubSubsidaryAccountName AS ExpenseName ,
sum(b.Credit) AS credit
FROM cte a
JOIN cte2 c ON a.SubSubsidaryAccountCode = c.SubSubsidaryAccountCode
FULL JOIN cte b ON a.RN = b.RN
JOIN cte2 d ON b.SubSubsidaryAccountCode=d.SubSubsidaryAccountCode
WHERE a.SubSubsidaryAccountCode LIKE '4%'
AND b.SubSubsidaryAccountCode LIKE '5%'
GROUP BY a.SubSubsidaryAccountCode,
b.SubSubsidaryAccountCode,
c.SubSubsidaryAccountName,
d.SubSubsidaryAccountName
Output:
SubSubsidaryAccountCode  RevenuName  debit  SubsidaryAccount2  ExpenseName  credit
4-106-1001-10026         Cash Sale   52889  5-105-1005-10011   Rf Battles   18091289
4-108-1012-10037         New Sale1   1000   5-105-1005-10011   Rf Battles   18091289
The above output contains records for accounting codes that start with 4 and 5. The records for codes starting with 4 are populated as desired, but in the second row the columns for the code starting with 5 have no matching record and should be NULL instead of repeating the first row's values. Please help me solve this problem.
Your sample makes it seem like it's arbitrary which ones are populated and which are NULL, like you just want to put two sets of records next to each other. You could do that by adding a ROW_NUMBER() and using that in a LEFT JOIN:
;with cte AS (SELECT *,ROW_NUMBER() OVER(PARTITION BY MainAccountID ORDER BY LedgerDate) AS RN
FROM TBLLedgerLevel1
)
SELECT a.LedgerDate
, a.MainAccountId
, a.VoucherCode
, a.Debit
, b.LedgerDate AS Date2
, b.MainAccountId AS MainAccount2
, b.VoucherCode AS VoucherCode2
, b.Credit
FROM cte a
LEFT JOIN cte b
ON a.RN = b.RN
and b.MainAccountId='5'
WHERE a.MainAccountId='4'
If there could be more MainAccountId='5' records than MainAccountId='4' you'd probably want a FULL JOIN and could use COALESCE() to choose which field to display. Also, making use of aliases cleans up code significantly in my opinion.
Update: Not sure exactly on this, but to add the name you'll need to add a JOIN to TBLLevel1, something like:
;with cte AS (SELECT *,ROW_NUMBER() OVER(PARTITION BY MainAccountID ORDER BY LedgerDate) AS RN
FROM TBLLedgerLevel1
)
SELECT a.LedgerDate
, a.MainAccountId
, a.VoucherCode
, a.Debit
, b.LedgerDate AS Date2
, b.MainAccountId AS MainAccount2
, b.VoucherCode AS VoucherCode2
, b.Credit
, c.MainAccountName
FROM cte a
JOIN TBLLevel1 c
ON a.MainAccountID = c.ID
LEFT JOIN cte b
ON a.RN = b.RN
and b.MainAccountId='5'
WHERE a.MainAccountId='4'
If there's more than 1 record per MainAccountId in the TBLLevel1 table you'll need to add criteria to the JOIN to make sure only the proper value is included.
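To see the row-number pairing in action, here is a minimal sketch using Python's built-in sqlite3 module (window functions need SQLite 3.25 or later). The table ledger, its columns, and every row in it are made-up illustration data, not the asker's ledger; account '4' deliberately has one more row than account '5'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ledger (account_id TEXT, ledger_date TEXT, voucher TEXT, "
    "debit INTEGER, credit INTEGER)"
)
conn.executemany(
    "INSERT INTO ledger VALUES (?, ?, ?, ?, ?)",
    [
        ("4", "2020-01-01", "V1", 100, 0),
        ("4", "2020-01-02", "V2", 200, 0),
        ("4", "2020-01-03", "V3", 300, 0),
        ("5", "2020-01-01", "V4", 0, 50),
        ("5", "2020-01-05", "V5", 0, 75),
    ],
)

# Number the rows inside each account, then line up row n of '4' with row n of '5'.
paired = conn.execute(
    """
    WITH numbered AS (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY account_id ORDER BY ledger_date) AS rn
        FROM ledger
    )
    SELECT a.voucher, a.debit, b.voucher, b.credit
    FROM numbered a
    LEFT JOIN numbered b
      ON b.rn = a.rn
     AND b.account_id = '5'
    WHERE a.account_id = '4'
    ORDER BY a.rn
    """
).fetchall()

for row in paired:
    print(row)
```

The third account '4' row has no partner, so its right-hand columns come back as None (NULL). Keeping the account '5' filter in the ON clause rather than the WHERE clause is what preserves that unmatched row.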

SQL: multiple counts from same table

I am having a real problem trying to get a query with the data I need. I have tried a few methods without success. I can get the data with 4 separate queries, I just can't get them into 1 query. All data comes from 1 table. I will list as much info as I can.
My data looks like this. I have a customerID and 3 columns that record who has worked on the record for that customer as well as the assigned acct manager
RecID  Customer  CreatedBy    LastUser     AcctMan
1      1374      Bob Jones    Mary Willis  Bob Jones
2      1375      Mary Willis  Bob Jones    Bob Jones
3      1376      Jay Scott    Mary Willis  Mary Willis
4      1377      Jay Scott    Mary Willis  Jay Scott
5      1378      Bob Jones    Jay Scott    Jay Scott
I want the query to return the following data. See below for a description of how each is obtained.
Employee     Created  Modified  Mod Own  Created Own
Bob Jones    2        1         1        1
Mary Willis  1        2         1        0
Jay Scott    2        1         1        1
Created = Counts the number of records created by each Employee
Modified = Number of records where the Employee is listed as Last User
(except where they created the record)
Mod Own = Number of records for each where the LastUser = Acctman
(account manager)
Created Own = Number of Records created by the employee where they are
the account manager for that customer
I can get each of these from a query, just need to somehow combine them:
Select CreatedBy, COUNT(CreatedBy) as Created
FROM [dbo].[Cust_REc] GROUP By CreatedBy
Select LastUser, COUNT(LastUser) as Modified
FROM [dbo].[Cust_REc] Where LastUser != CreatedBy GROUP By LastUser
Select AcctMan, COUNT(AcctMan) as CreatePort
FROM [dbo].[Cust_REc] Where AcctMan = CreatedBy GROUP By AcctMan
Select AcctMan, COUNT(AcctMan) as ModPort
FROM [dbo].[Cust_REc] Where AcctMan = LastUser AND NOT AcctMan = CreatedBy GROUP By AcctMan
Can someone see a way to do this? I may have to join the table to itself, but my attempts have not given me the correct data.
The following will give you the results you're looking for.
select
e.employee,
create_count=(select count(*) from customers c where c.createdby=e.employee),
mod_count=(select count(*) from customers c where c.lastmodifiedby=e.employee),
create_own_count=(select count(*) from customers c where c.createdby=e.employee and c.acctman=e.employee),
mod_own_count=(select count(*) from customers c where c.lastmodifiedby=e.employee and c.acctman=e.employee)
from (
select employee=createdby from customers
union
select employee=lastmodifiedby from customers
union
select employee=acctman from customers
) e
Note: there are other approaches that are more efficient than this but potentially far more complex as well. Specifically, I would bet there is a master Employee table somewhere that would prevent you from having to do the inline view just to get the list of names.
This seems pretty straightforward. Try this:
select a.employee,b.created,c.modified ....
from (select distinct created_by as employee from data) as a
inner join
(select created_by,count(*) as created from data group by created_by) as b
on a.employee = b.created_by
inner join ....
This highly inefficient query may be a rough start to what you are looking for. Once you validate the data then there are things you can do to tidy it up and make it more efficient.
Also, I don't think you need the DISTINCT on the UNION part because the UNION will return DISTINCT values unless UNION ALL is specified.
SELECT
Employees.EmployeeID,
Created =(SELECT COUNT(*) FROM Cust_REc WHERE Cust_REc.CreatedBy=Employees.EmployeeID),
Modified =(SELECT COUNT(*) FROM Cust_REc WHERE Cust_REc.LastUser=Employees.EmployeeID AND Cust_REc.CreatedBy<>Employees.EmployeeID),
ModOwn =
CASE WHEN Employees.IsManager = 0 THEN NULL ELSE
(SELECT COUNT(*) FROM Cust_REc WHERE AcctMan=Employees.EmployeeID)
END,
CreatedOwn=(SELECT COUNT(*) FROM Cust_REc WHERE AcctMan=Employees.EmployeeID AND CreatedBy=Employees.EmployeeID)
FROM
(
SELECT
EmployeeID,
IsManager=CASE WHEN EXISTS(SELECT AcctMan FROM Cust_REc WHERE AcctMan=EmployeeID) THEN 1 ELSE 0 END
FROM
(
SELECT DISTINCT
EmployeeID
FROM
(
SELECT EmployeeID=CreatedBy FROM Cust_Rec
UNION
SELECT EmployeeID=LastUser FROM Cust_Rec
UNION
SELECT EmployeeID=AcctMan FROM Cust_Rec
)AS Z
)AS Y
)
AS Employees
I had the same issue with the Modified column. All the other columns worked okay. DCR example would work well with the join on an employees table if you have it.
SELECT CreatedBy AS [Employee],
COUNT(CreatedBy) AS [Created],
--Couldn't get modified to pull the right results
SUM(CASE WHEN LastUser = AcctMan THEN 1 ELSE 0 END) [Mod Own],
SUM(CASE WHEN CreatedBy = AcctMan THEN 1 ELSE 0 END) [Created Own]
FROM Cust_Rec
GROUP BY CreatedBy
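One way to get all four counts, including Modified, in a single pass is to pair every employee name with every record and sum conditional flags. The sketch below uses Python's built-in sqlite3 module rather than SQL Server, with a stand-in table cust_rec holding the question's five sample rows. It follows the written definitions of the four columns; under those definitions Mary Willis gets Modified = 3 for the sample rows (records 1, 3 and 4), whereas the question's expected table lists 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cust_rec (RecID INTEGER, Customer INTEGER, "
    "CreatedBy TEXT, LastUser TEXT, AcctMan TEXT)"
)
conn.executemany(
    "INSERT INTO cust_rec VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1374, "Bob Jones", "Mary Willis", "Bob Jones"),
        (2, 1375, "Mary Willis", "Bob Jones", "Bob Jones"),
        (3, 1376, "Jay Scott", "Mary Willis", "Mary Willis"),
        (4, 1377, "Jay Scott", "Mary Willis", "Jay Scott"),
        (5, 1378, "Bob Jones", "Jay Scott", "Jay Scott"),
    ],
)

# Build the employee list from all three name columns, then count per definition.
counts = conn.execute(
    """
    SELECT e.employee,
           SUM(CASE WHEN c.CreatedBy = e.employee THEN 1 ELSE 0 END) AS created,
           SUM(CASE WHEN c.LastUser = e.employee
                     AND c.CreatedBy <> e.employee THEN 1 ELSE 0 END) AS modified,
           SUM(CASE WHEN c.LastUser = e.employee
                     AND c.AcctMan = e.employee THEN 1 ELSE 0 END) AS mod_own,
           SUM(CASE WHEN c.CreatedBy = e.employee
                     AND c.AcctMan = e.employee THEN 1 ELSE 0 END) AS created_own
    FROM (
        SELECT CreatedBy AS employee FROM cust_rec
        UNION
        SELECT LastUser FROM cust_rec
        UNION
        SELECT AcctMan FROM cust_rec
    ) e
    CROSS JOIN cust_rec c
    GROUP BY e.employee
    ORDER BY e.employee
    """
).fetchall()

for row in counts:
    print(row)
```

The cross join is fine for a small table; with a real employee table and many records, joining on each name column or using the correlated subqueries from the answers above may be the better trade-off.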

Count Response once in 30 days SQL

If I have a customer respond to the same survey in 30 days more than once, I only want to count it once. Can someone show me code to do that please?
create table #Something
(
CustID Char(10),
SurveyId char(5),
ResponseDate datetime
)
insert #Something
select 'Cust1', '100', '5/6/13' union all
select 'Cust1', '100', '5/13/13' union all
select 'Cust2', '100', '4/20/13' union all
select 'Cust2', '100', '5/22/13'
select distinct custid, SurveyId, Count(custid) as CountResponse from #Something
group by CustID, SurveyId
The above code only gives me the total count of Response, not sure how to code to count only once per 30 day period.
The output I'm looking for should be like this:
CustomerID SurveyId CountResponse
Cust1 100 1
Cust2 100 2
Going on the theory that you want your periods calculated as 30 days from the first time a survey is submitted, here is a (gross) solution.
create table #Something
(
CustID Char(10),
SurveyId char(5),
ResponseDate datetime
)
insert #Something
select 'Cust1', '100', '5/6/13' union all
select 'Cust1', '100', '5/13/13' union all
select 'Cust1', '100', '7/13/13' union all
select 'Cust2', '100', '4/20/13' union all
select 'Cust2', '100', '5/22/13' union all
select 'Cust2', '100', '7/20/13' union all
select 'Cust2', '100', '7/24/13' union all
select 'Cust2', '100', '9/28/13'
--SELECT CustID,SurveyId,COUNT(*) FROM (
select a.CustID,a.SurveyId,b.ResponseStart,--CONVERT(int,a.ResponseDate-b.ResponseStart),
CASE
WHEN CONVERT(int,a.ResponseDate-b.ResponseStart) > 30
THEN ((CONVERT(int,a.ResponseDate-b.ResponseStart))-(CONVERT(int,a.ResponseDate-b.ResponseStart) % 30))/30+1
ELSE 1
END CustomPeriod -- defines periods 30 days out from first entry of survey
from #Something a
inner join
(select CustID,SurveyId,MIN(ResponseDate) ResponseStart
from #Something
group by CustID,SurveyId) b
on a.SurveyId=b.SurveyId
and a.CustID=b.CustID
group by a.CustID,a.SurveyId,b.ResponseStart,
CASE
WHEN CONVERT(int,a.ResponseDate-b.ResponseStart) > 30
THEN ((CONVERT(int,a.ResponseDate-b.ResponseStart))-(CONVERT(int,a.ResponseDate-b.ResponseStart) % 30))/30+1
ELSE 1
END
--) x GROUP BY CustID,SurveyId
At the very least you'd probably want to make the CASE statement a function so it reads a bit cleaner. Better would be defining explicit windows in a separate table. This may not be feasible if you want to avoid situations like surveys returned at the end of period one followed by another in period two a couple days later.
You should consider handling this on input if possible. For example, if you are identifying a customer in an online survey, reject attempts to fill out the same survey again within 30 days. Or if someone is mailing these in, make the data entry person reject it if one has come within 30 days.
Or, along the same lines as "wild and crazy", add a bit and an INSERT trigger. Only turn the bit on if no surveys of that type for that customer found within the time period.
Overall, phrasing the issue a little more completely would be helpful. However I do appreciate the actual coded example.
I'm not a SQL Server guy, but in Oracle if you subtract integer values from a 'date', you're effectively subtracting "days," so something like this could work:
SELECT custid, surveyid
FROM Something a
WHERE NOT EXISTS (
SELECT 1
FROM Something b
WHERE a.custid = b.custid
AND a.surveyid = b.surveyid
AND b.responseDate >= a.responseDate - 30 AND b.responseDate < a.responseDate
);
To get your counts (if I understand what you're asking for):
-- Count of times custID returned surveyID, not counting same
-- survey within 30 day period.
SELECT custid, surveyid, count(*) countResponse
FROM Something a
WHERE NOT EXISTS (
SELECT 1
FROM Something b
WHERE a.custid = b.custid
AND a.surveyid = b.surveyid
AND b.responseDate >= a.responseDate - 30 AND b.responseDate < a.responseDate
)
GROUP BY custid, surveyid
UPDATE: Per the case raised below, this actually wouldn't quite work. What you should probably do is iterate through your something table and insert the rows for the surveys you want to keep in a results table, then compare against the results table to see if there's already been a survey received in the last 30 days you want considered. I could show you how to do something like this in oracle PL/SQL, but I don't know the syntax off hand for SQL server. Maybe someone else who knows sql server wants to steal this strategy to code up an answer for you, or maybe this is enough for you to go on.
Call me wild and crazy, but I would solve this problem by storing more state with each survey. The approach I would take is to add a bit type column that indicates whether a particular survey should be counted (i.e., a Countable column). This solves the tracking of state problem inherent in solving this relationally.
I would set values in Countable to 1 upon insertion, if no survey with the same CustID/SurveyId can be found in the preceding 30 days with a Countable set to 1. I would set it to 0, otherwise.
Then the problem becomes trivially solvable. Just group by CustID/SurveyId and sum up the values in the Countable column.
One caveat of this approach is that it imposes that surveys must be added in chronological order and cannot be deleted without a recalculation of Countable values.
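The Countable rule is easy to prototype outside the database. Here is a minimal sketch in Python, where the function name count_countable and the tuple layout are assumptions made for this illustration. It also assumes that a response exactly 30 days after the last counted one still falls inside the window, which the description leaves open.

```python
from datetime import date, timedelta

def count_countable(responses, window_days=30):
    """Count responses per (customer, survey), skipping any response that falls
    within window_days of the last response that was itself counted."""
    window = timedelta(days=window_days)
    last_counted = {}  # (cust_id, survey_id) -> date of the last countable response
    totals = {}
    # Process in chronological order, as the approach requires.
    for cust_id, survey_id, when in sorted(responses, key=lambda r: r[2]):
        key = (cust_id, survey_id)
        previous = last_counted.get(key)
        if previous is None or when - previous > window:
            last_counted[key] = when
            totals[key] = totals.get(key, 0) + 1
    return totals

sample = [
    ("Cust1", "100", date(2013, 5, 6)),
    ("Cust1", "100", date(2013, 5, 13)),
    ("Cust2", "100", date(2013, 4, 20)),
    ("Cust2", "100", date(2013, 5, 22)),
]
print(count_countable(sample))
```

Because responses are handled in date order, only the most recent counted response needs to be remembered: if it is outside the window, every earlier counted one is too.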
Here's one way to handle it I believe. I tested quickly, and it worked on the small sample of records so I'm hopeful it will help you out. Best of luck.
SELECT s.CustID, COUNT(s.SurveyID) AS SurveyCount
FROM #something s
INNER JOIN (SELECT CustID, SurveyId, ResponseDate
FROM (SELECT #Something.*,
ROW_NUMBER() OVER (PARTITION BY custid ORDER BY ResponseDate ASC) AS RN
FROM #something) AS t
WHERE RN = 1 ) f ON s.CustID = f.CustID
WHERE s.ResponseDate BETWEEN f.ResponseDate AND f.ResponseDate+30
GROUP BY s.CustID
HAVING COUNT(s.SurveyID) > 1
Your question is ambiguous, which may be the source of your difficulty.
insert #Something values
('Cust3', '100', '1/1/13'),
('Cust3', '100', '1/20/13'),
('Cust3', '100', '2/10/13')
Should the count for Cust3 be 1 or 2? Is the '2/10/13' response invalid because it was less than 30 days after the '1/20/13' response? Or is the '2/10/13' response valid because the '1/20/13' is invalidated by the '1/1/13' response and therefore more than 30 days after the previous valid response?
The code below is one approach which yields your example output. However, if you add a select 'Cust1', '100', '4/20/13', the result will still be Cust1 100 1 because they are all within 30 days of each prior survey response and so only the first one would be counted. Is this the desired behavior?
SELECT CustID, SurveyID, COUNT(*) AS CountResponse
FROM #SurveysTaken
WHERE (NOT EXISTS
(SELECT 1
FROM #SurveysTaken AS PriorSurveys
WHERE (CustID = #SurveysTaken.CustID)
AND (SurveyId = #SurveysTaken.SurveyId)
AND (ResponseDate >= DATEADD(d, - 30, #SurveysTaken.ResponseDate))
AND (ResponseDate < #SurveysTaken.ResponseDate)))
GROUP BY CustID, SurveyID
Alternatively, you could break the year into arbitrary 30 day periods, resetting with each new year.
SELECT CustID, SurveyID, COUNT(*) AS CountResponse
FROM (SELECT DISTINCT CustID, SurveyID, YEAR(ResponseDate) AS ResponseYear,
DATEPART(DAYOFYEAR, ResponseDate) / 30 AS ThirtyDayPeriod
FROM #SurveysTaken) AS SurveysByPeriod
GROUP BY CustID, SurveyID
You could also just go by month.
SELECT CustID, SurveyID, COUNT(*) AS CountResponse
FROM (SELECT DISTINCT CustID, SurveyID, YEAR(ResponseDate) AS ResponseYear,
MONTH(ResponseDate) AS ResponseMonth
FROM #SurveysTaken) AS SurveysByMonth
GROUP BY CustID, SurveyID
You could use 30 day periods from an arbitrary epoch date. (Perhaps by pulling the date the survey was first created from another query?)
SELECT CustID, SurveyID, COUNT(*) AS CountResponse
FROM (SELECT DISTINCT CustID, SurveyID, DATEDIFF(D, '1/1/2013', ResponseDate) / 30 AS ThirtyDayPeriod
FROM #SurveysTaken) AS SurveysByPeriod
GROUP BY CustID, SurveyID
One final variation on arbitrary thirty-day periods is to base them on the first time the customer ever responded to the survey in question.
SELECT CustID, SurveyID, COUNT(*) AS CountResponse
FROM (SELECT DISTINCT CustID, SurveyID, DATEDIFF(DAY,
(SELECT MIN(ResponseDate)
FROM #SurveysTaken AS FirstSurvey
WHERE (CustID = #SurveysTaken.CustID)
AND (SurveyId = #SurveysTaken.SurveyId)), ResponseDate) / 30 AS ThirtyDayPeriod
FROM #SurveysTaken) AS SurveysByPeriod
GROUP BY CustID, SurveyID
There is one issue that you run into with the epoch/period trick which is that the counted surveys occur only once per period but aren't necessarily 30 days apart.
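That caveat can be made concrete with a small sketch in Python. The helper names thirty_day_period and count_periods and the three dates are hypothetical, picked only to show two responses landing on either side of a period boundary.

```python
from datetime import date

def thirty_day_period(first_response, response, period_days=30):
    """Index of the fixed-length period a response falls into, counted from the first response."""
    return (response - first_response).days // period_days

def count_periods(dates, period_days=30):
    """Number of distinct periods that contain at least one response."""
    first = min(dates)
    return len({thirty_day_period(first, d, period_days) for d in dates})

first = date(2013, 1, 1)
near_end = date(2013, 1, 30)    # 29 days after the first response
just_after = date(2013, 2, 1)   # 31 days after the first response

print(thirty_day_period(first, near_end), thirty_day_period(first, just_after))
print(count_periods([first, near_end, just_after]))
```

The last two responses are only two days apart, yet they fall into different periods and are therefore both counted. If that is not acceptable, the period approach needs to be replaced by one that tracks the last counted response.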

Seemingly Simple Query in Pl Sql

I have a table "defects" in the following format:
id  status  stat_date  line  div  area
1   Open    09/21/09   F     A    cube
1   closed  01/01/10   F     A    cube
2   Open    10/23/09   B     C    Back
3   Open    11/08/09   S     B    Front
3   closed  12/12/09   S     B    Front
My problem is that I want to write a query that just extracts the "Open" defects. If I write a query to simply extract all open defects, then I get the wrong result because there are some defects that have 2 records associated with them. For example, with the query that I wrote I would get defect id#s 1 and 3 in my result even though they are closed. I hope I have explained my problem well. Thank you.
Use:
SELECT t.*
FROM DEFECTS t
JOIN (SELECT d.id,
MAX(d.stat_date) AS msd
FROM DEFECTS d
GROUP BY d.id) x ON x.id = t.id
AND x.msd = t.stat_date
WHERE t.status != 'closed'
The subquery in the join gets the most recent date for each id value.
Joining back to the original table on the id and date returns only the most recent row for each id.
Filtering out the rows with the closed status leaves the defects that are currently open.
So you want to get the most recent row per id and of those, only select those that are open. This is a variation of the common greatest-n-per-group problem.
I would do it this way:
SELECT d1.*
FROM defects d1
LEFT OUTER JOIN defects d2
ON (d1.id = d2.id AND d1.stat_date < d2.stat_date)
WHERE d2.id IS NULL
AND d1.status = 'Open';
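As a quick check of the anti-join against the sample data, here is a minimal sketch using Python's built-in sqlite3 module rather than Oracle. The dates are rewritten in ISO format (an assumption of this sketch) so that text comparison orders them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE defects (id INTEGER, status TEXT, stat_date TEXT, "
    "line TEXT, div TEXT, area TEXT)"
)
conn.executemany(
    "INSERT INTO defects VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "Open", "2009-09-21", "F", "A", "cube"),
        (1, "closed", "2010-01-01", "F", "A", "cube"),
        (2, "Open", "2009-10-23", "B", "C", "Back"),
        (3, "Open", "2009-11-08", "S", "B", "Front"),
        (3, "closed", "2009-12-12", "S", "B", "Front"),
    ],
)

# d2 matches any later row for the same id; no match means d1 is the latest row.
open_defects = conn.execute(
    """
    SELECT d1.id, d1.status, d1.stat_date
    FROM defects d1
    LEFT JOIN defects d2
      ON d1.id = d2.id
     AND d1.stat_date < d2.stat_date
    WHERE d2.id IS NULL
      AND d1.status = 'Open'
    """
).fetchall()

print(open_defects)
```

Only defect 2 survives, since defects 1 and 3 each have a later closed row. A defect that is closed and later reopened would also be returned, because its latest row is Open again.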
Select *
from defects d
where status = 'Open'
and not exists (
select 1 from defects d1
where d1.status = 'closed'
and d1.id = d.id
and d1.stat_date > d.stat_date
)
This should get what you want. I wouldn't have a record for open and closing a defect, rather just a single record to track a single defect. But that may not be something you can change easily.
SELECT id FROM defects
WHERE status = 'Open' AND id NOT IN
(SELECT id FROM defects WHERE status = 'closed')
This query handles multiple opens/closes/opens, and only does one pass through the data (i.e. no self-joins):
SELECT * FROM
(SELECT DISTINCT
id
,FIRST_VALUE(status)
OVER (PARTITION BY id
ORDER BY stat_date desc)
as last_status
,FIRST_VALUE(stat_date)
over (PARTITION BY id
ORDER BY stat_date desc)
AS last_stat_date
,line
,div
,area
FROM defects)
WHERE last_status = 'Open';