Count Response once in 30 days SQL

If I have a customer respond to the same survey in 30 days more than once, I only want to count it once. Can someone show me code to do that please?
create table #Something
(
CustID Char(10),
SurveyId char(5),
ResponseDate datetime
)
insert #Something
select 'Cust1', '100', '5/6/13' union all
select 'Cust1', '100', '5/13/13' union all
select 'Cust2', '100', '4/20/13' union all
select 'Cust2', '100', '5/22/13'
select distinct custid, SurveyId, Count(custid) as CountResponse from #Something
group by CustID, SurveyId
The above code only gives me the total count of responses. I'm not sure how to count a response only once per 30-day period.
The output I'm looking for should be like this:
CustomerID SurveyId CountResponse
Cust1 100 1
Cust2 100 2

Going on the theory that you want your periods calculated as 30 days from the first time a survey is submitted, here is a (gross) solution.
create table #Something
(
CustID Char(10),
SurveyId char(5),
ResponseDate datetime
)
insert #Something
select 'Cust1', '100', '5/6/13' union all
select 'Cust1', '100', '5/13/13' union all
select 'Cust1', '100', '7/13/13' union all
select 'Cust2', '100', '4/20/13' union all
select 'Cust2', '100', '5/22/13' union all
select 'Cust2', '100', '7/20/13' union all
select 'Cust2', '100', '7/24/13' union all
select 'Cust2', '100', '9/28/13'
--SELECT CustID,SurveyId,COUNT(*) FROM (
select a.CustID,a.SurveyId,b.ResponseStart,--CONVERT(int,a.ResponseDate-b.ResponseStart),
CASE
WHEN CONVERT(int,a.ResponseDate-b.ResponseStart) > 30
THEN ((CONVERT(int,a.ResponseDate-b.ResponseStart))-(CONVERT(int,a.ResponseDate-b.ResponseStart) % 30))/30+1
ELSE 1
END CustomPeriod -- defines periods 30 days out from first entry of survey
from #Something a
inner join
(select CustID,SurveyId,MIN(ResponseDate) ResponseStart
from #Something
group by CustID,SurveyId) b
on a.SurveyId=b.SurveyId
and a.CustID=b.CustID
group by a.CustID,a.SurveyId,b.ResponseStart,
CASE
WHEN CONVERT(int,a.ResponseDate-b.ResponseStart) > 30
THEN ((CONVERT(int,a.ResponseDate-b.ResponseStart))-(CONVERT(int,a.ResponseDate-b.ResponseStart) % 30))/30+1
ELSE 1
END
--) x GROUP BY CustID,SurveyId
At the very least you'd probably want to make the CASE statement a function so it reads a bit cleaner. Better would be defining explicit windows in a separate table. This may not be feasible if you want to avoid situations like surveys returned at the end of period one followed by another in period two a couple days later.
You should consider handling this on input if possible. For example, if you are identifying a customer in an online survey, reject repeat attempts to fill out the same survey within 30 days. Or, if someone is mailing these in, have the data entry person reject a response if one has already come in within the past 30 days.
Or, along the same lines as "wild and crazy", add a bit and an INSERT trigger. Only turn the bit on if no surveys of that type for that customer found within the time period.
Overall, phrasing the issue a little more completely would be helpful. However I do appreciate the actual coded example.

I'm not a SQL Server guy, but in Oracle if you subtract integer values from a 'date', you're effectively subtracting "days," so something like this could work:
SELECT custid, surveyid
FROM Something a
WHERE NOT EXISTS (
SELECT 1
FROM Something b
WHERE a.custid = b.custid
AND a.surveyid = b.surveyid
AND b.responseDate >= a.responseDate - 30
AND b.responseDate < a.responseDate
);
To get your counts (if I understand what you're asking for):
-- Count of times custID returned surveyID, not counting same
-- survey within 30 day period.
SELECT custid, surveyid, count(*) countResponse
FROM Something a
WHERE NOT EXISTS (
SELECT 1
FROM Something b
WHERE a.custid = b.custid
AND a.surveyid = b.surveyid
AND b.responseDate >= a.responseDate - 30
AND b.responseDate < a.responseDate
)
GROUP BY custid, surveyid
UPDATE: Per the case raised below, this actually wouldn't quite work. What you should probably do is iterate through your something table and insert the rows for the surveys you want to keep in a results table, then compare against the results table to see if there's already been a survey received in the last 30 days you want considered. I could show you how to do something like this in oracle PL/SQL, but I don't know the syntax off hand for SQL server. Maybe someone else who knows sql server wants to steal this strategy to code up an answer for you, or maybe this is enough for you to go on.

Call me wild and crazy, but I would solve this problem by storing more state with each survey. The approach I would take is to add a bit type column that indicates whether a particular survey should be counted (i.e., a Countable column). This solves the tracking of state problem inherent in solving this relationally.
I would set values in Countable to 1 upon insertion, if no survey with the same CustID/SurveyId can be found in the preceding 30 days with a Countable set to 1. I would set it to 0, otherwise.
Then the problem becomes trivially solvable. Just group by CustID/SurveyId and sum up the values in the Countable column.
One caveat of this approach is that it imposes that surveys must be added in chronological order and cannot be deleted without a recalculation of Countable values.
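As a rough illustration of the Countable idea, here is a minimal sketch in Python using an in-memory SQLite database as a stand-in for SQL Server. The table name Responses, the helper add_response, and the choice to treat a response exactly 30 days later as countable are all assumptions made for the example, not part of the original schema.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Responses ("
    " CustID TEXT, SurveyId TEXT, ResponseDate TEXT, Countable INTEGER)"
)

def add_response(cust_id, survey_id, response_date, window_days=30):
    """Insert a response; Countable is 1 only when no counted response for
    the same customer and survey exists in the preceding window."""
    # Assumption: a response exactly window_days later falls outside the window.
    window_start = (response_date - timedelta(days=window_days)).isoformat()
    (already_counted,) = conn.execute(
        "SELECT COUNT(*) FROM Responses"
        " WHERE CustID = ? AND SurveyId = ? AND Countable = 1"
        " AND ResponseDate > ? AND ResponseDate <= ?",
        (cust_id, survey_id, window_start, response_date.isoformat()),
    ).fetchone()
    countable = 0 if already_counted else 1
    conn.execute(
        "INSERT INTO Responses VALUES (?, ?, ?, ?)",
        (cust_id, survey_id, response_date.isoformat(), countable),
    )

# Rows must be added in chronological order (the caveat noted above).
for cust, d in [
    ("Cust1", date(2013, 5, 6)),
    ("Cust1", date(2013, 5, 13)),
    ("Cust2", date(2013, 4, 20)),
    ("Cust2", date(2013, 5, 22)),
]:
    add_response(cust, "100", d)

counts = conn.execute(
    "SELECT CustID, SurveyId, SUM(Countable) FROM Responses"
    " GROUP BY CustID, SurveyId ORDER BY CustID"
).fetchall()
print(counts)
```

With the sample rows from the question, the final query reduces to a plain GROUP BY with SUM(Countable), which is the point of storing the flag.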

Here's one way to handle it I believe. I tested quickly, and it worked on the small sample of records so I'm hopeful it will help you out. Best of luck.
SELECT s.CustID, COUNT(s.SurveyID) AS SurveyCount
FROM #something s
INNER JOIN (SELECT CustID, SurveyId, ResponseDate
FROM (SELECT #Something.*,
ROW_NUMBER() OVER (PARTITION BY custid ORDER BY ResponseDate ASC) AS RN
FROM #something) AS t
WHERE RN = 1 ) f ON s.CustID = f.CustID
WHERE s.ResponseDate BETWEEN f.ResponseDate AND f.ResponseDate+30
GROUP BY s.CustID
HAVING COUNT(s.SurveyID) > 1

Your question is ambiguous, which may be the source of your difficulty.
insert #Something values
('Cust3', '100', '1/1/13'),
('Cust3', '100', '1/20/13'),
('Cust3', '100', '2/10/13')
Should the count for Cust3 be 1 or 2? Is the '2/10/13' response invalid because it was less than 30 days after the '1/20/13' response? Or is the '2/10/13' response valid because the '1/20/13' is invalidated by the '1/1/13' response and therefore more than 30 days after the previous valid response?
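To make the two readings concrete, here is a small Python sketch. The function names are made up for illustration, and "within 30 days" is assumed to mean strictly fewer than 30 days apart.

```python
from datetime import date

def count_vs_any_prior(dates, window_days=30):
    """Count a response only if no earlier response, counted or not,
    falls within the preceding window."""
    ordered = sorted(dates)
    counted = 0
    for i, d in enumerate(ordered):
        # Checking the immediate predecessor is enough because the list is sorted.
        if i == 0 or (d - ordered[i - 1]).days >= window_days:
            counted += 1
    return counted

def count_vs_last_counted(dates, window_days=30):
    """Count a response only if the most recent *counted* response
    is at least window_days older."""
    counted = 0
    last_counted = None
    for d in sorted(dates):
        if last_counted is None or (d - last_counted).days >= window_days:
            counted += 1
            last_counted = d
    return counted

cust3 = [date(2013, 1, 1), date(2013, 1, 20), date(2013, 2, 10)]
print(count_vs_any_prior(cust3))      # first reading
print(count_vs_last_counted(cust3))   # second reading
```

Under the first reading Cust3 gets a count of 1, under the second a count of 2, so the requirement has to pick one before any query can be called correct.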

The code below is one approach which yields your example output. However, if you add a select 'Cust1', '100', '4/20/13', the result will still be Cust1 100 1 because they are all within 30 days of each prior survey response and so only the first one would be counted. Is this the desired behavior?
SELECT CustID, SurveyID, COUNT(*) AS CountResponse
FROM #SurveysTaken
WHERE (NOT EXISTS
(SELECT 1
FROM #SurveysTaken AS PriorSurveys
WHERE (CustID = #SurveysTaken.CustID)
AND (SurveyId = #SurveysTaken.SurveyId)
AND (ResponseDate >= DATEADD(d, - 30, #SurveysTaken.ResponseDate))
AND (ResponseDate < #SurveysTaken.ResponseDate)))
GROUP BY CustID, SurveyID
Alternatively, you could break the year into arbitrary 30 day periods, resetting with each new year.
SELECT CustID, SurveyID, COUNT(*) AS CountResponse
FROM (SELECT DISTINCT CustID, SurveyID, YEAR(ResponseDate) AS ResponseYear,
DATEPART(DAYOFYEAR, ResponseDate) / 30 AS ThirtyDayPeriod
FROM #SurveysTaken) AS SurveysByPeriod
GROUP BY CustID, SurveyID
You could also just go by month.
SELECT CustID, SurveyID, COUNT(*) AS CountResponse
FROM (SELECT DISTINCT CustID, SurveyID, YEAR(ResponseDate) AS ResponseYear,
MONTH(ResponseDate) AS ResponseMonth
FROM #SurveysTaken) AS SurveysByMonth
GROUP BY CustID, SurveyID
You could use 30 day periods from an arbitrary epoch date. (Perhaps by pulling the date the survey was first created from another query?)
SELECT CustID, SurveyID, COUNT(*) AS CountResponse
FROM (SELECT DISTINCT CustID, SurveyID, DATEDIFF(D, '1/1/2013', ResponseDate) / 30 AS ThirtyDayPeriod
FROM #SurveysTaken) AS SurveysByPeriod
GROUP BY CustID, SurveyID
One final variation on arbitrary thirty-day periods is to base them on the first time the customer ever responded to the survey in question.
SELECT CustID, SurveyID, COUNT(*) AS CountResponse
FROM (SELECT DISTINCT CustID, SurveyID, DATEDIFF(DAY,
(SELECT MIN(ResponseDate)
FROM #SurveysTaken AS FirstSurvey
WHERE (CustID = #SurveysTaken.CustID)
AND (SurveyId = #SurveysTaken.SurveyId)), ResponseDate) / 30 AS ThirtyDayPeriod
FROM #SurveysTaken) AS SurveysByPeriod
GROUP BY CustID, SurveyID
There is one issue that you run into with the epoch/period trick which is that the counted surveys occur only once per period but aren't necessarily 30 days apart.
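A quick way to see that issue is to compute the period number the same way the DATEDIFF(...) / 30 expression does. This is only a sketch in Python; the two response dates are hypothetical and the epoch is the same arbitrary 1/1/2013 used above.

```python
from datetime import date

EPOCH = date(2013, 1, 1)  # arbitrary epoch, as in the DATEDIFF example

def thirty_day_period(d, epoch=EPOCH):
    """Integer period index: whole days since the epoch divided by 30."""
    return (d - epoch).days // 30

first = date(2013, 1, 30)   # hypothetical response, 29 days after the epoch
second = date(2013, 2, 1)   # hypothetical response, 31 days after the epoch

print(thirty_day_period(first), thirty_day_period(second))
print((second - first).days)
```

The two responses are only 2 days apart but land in different periods, so both would be counted.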

Related

count the number of times a combination of values occurs

Dataset looking at the types of crime for a given city.
Incident ID | Incident Code | Incident Category | Incident Subcategory | Incident Description
618691 | 4134 | Assault | Simple Assault | Battery
618691 | 15300 | Offences Against The Family And Children | Other | Hate Crime (secondary only)
618701 | 7053 | Vehicle Impounded | Vehicle Impounded | Vehicle, Impounded
618701 | 65010 | Traffic Violation Arrest | Traffic Violation Arrest | Traffic Violation Arrest
618701 | 65050 | Other Miscellaneous | Other | Driving While Under The Influence Of Alcohol
626010 | 5043 | Burglary | Burglary - Residential | Burglary, Residence, Unlawful Entry
626010 | 6381 | Larceny Theft | Larceny Theft - Other | Embezzlement from Dependent or Elder Adult by Caretaker
626010 | 7041 | Recovered Vehicle | Recovered Vehicle | Vehicle, Recovered, Auto
626010 | 16650 | Drug Offense | Drug Violation | Methamphetamine Offense
Each IncidentID has 2, 3, or 4 Incident Codes associated with it.
I want to be able to count the number of times each combination of 2, 3, or 4 Incident Codes appears in the entire dataset.
For example:
Incident Codes 4134, 15300: x amount of times
Incident Codes 7053, 65010, 65050: x amount of times
Incident Codes 5043, 6381, 7041, 16650: x amount of times
I apologize if I've given a poor explanation - this is my first post on SO and quite frankly I don't know how to best communicate this question.
I don't know what SQL code to run to get my answer. The closest I've come to finding an answer is this post, Select combination of two columns, and count occurrences of this combination, but there the data is already separated into two columns, which mine is not.
My thought is to split the additional codes into other columns, but perhaps there is a way to avoid doing that by having the code run the calculation for me without it.
I appreciate any and all input you may be able to give!
Let's suppose your table is named "TableX". I think this query should be near to what you need:
Select T1.IncidentCode, T2.IncidentCode, T3.IncidentCode, T4.IncidentCode, Count(1) AS AmountOfTimes
From TableX T1
Join TableX T2 ON T2.IncidentID = T1.IncidentID AND
T2.IncidentCode <> T1.IncidentCode
Left Join TableX T3 ON T3.IncidentID = T1.IncidentID AND
T3.IncidentCode <> T1.IncidentCode AND
T3.IncidentCode <> T2.IncidentCode
Left Join TableX T4 ON T4.IncidentID = T1.IncidentID AND
T4.IncidentCode <> T1.IncidentCode AND
T4.IncidentCode <> T2.IncidentCode AND
T4.IncidentCode <> T3.IncidentCode
Group By T1.IncidentCode, T2.IncidentCode, T3.IncidentCode, T4.IncidentCode
You would probably be best off NOT trying to get all 3 combinations in one query, and here is why. Let's say, for example, that one officer enters the codes as 1, 2, 3, another enters them as 3, 1, 2, and yet another as 2, 3, 1. They are all the same "set" of codes, just in a different order. If you rely on the codes appearing in the same order, you would get 3 different rows showing the same thing, each with a count of 1.
You would be better served by running 3 distinct queries with a WHERE and HAVING clause based on just the codes you are interested in the "set". Something simple like
select
YT.IncidentID,
count(*) HowMany
from
YourTable YT
where
YT.IncidentCode in ( 4134, 15300 )
group by
YT.IncidentID
having
count(*) = 2
This will return all incidents that have BOTH parts, even if the incident was associated with any 3rd and/or 4th additional codes in a given incident. Having the total records IS your count.
So, now, take your codes of interest, e.g. 1 & 2. Each incident can carry up to 2 more incident codes, which adds 30+ possible combinations of third and fourth codes into the mix. If you don't care about the others that may be "extra", they do not screw up your count on the precise piece(s) you are looking for.
Then, all you have to do to get your other "what if" scenario counts is change your IN clause once and the having to match the count. Since you are only filtering based on the specific codes in question, you only want those that have the same count regardless of extra incident codes per example stated.
YT.IncidentCode in ( 7053, 65010, 65050 )
group by
YT.IncidentID
having
count(*) = 3
YT.IncidentCode in ( 5043, 6381, 7041, 16650 )
group by
YT.IncidentID
having
count(*) = 4
Now, if you only really care about the final count of each respectively, just wrap that up one more to get the count of rows returned such as
select
count(*) NumberOfIncidents
from
( select
YT.IncidentID,
count(*) HowMany
from
YourTable YT
where
YT.IncidentCode in ( 4134, 15300 )
group by
YT.IncidentID
having
count(*) = 2 ) PreQualified
Then, if you wanted to do this on some time period basis such as you have a given date of the incident, and you wanted to keep running the same query / counts, you could expand and do something like this by doing a UNION to each query.
select
'Assault and Offenses against Family and Children' as Activity,
count(*) NumberOfIncidents
from
( select
YT.IncidentID,
count(*) HowMany
from
YourTable YT
where
YT.IncidentCode in ( 4134, 15300 )
AND WhateverDateFilters...
group by
YT.IncidentID
having
count(*) = 2 ) PreQualified
UNION
select
'Vehicle Impound, Traffic Arrest, Other Misc' as Activity,
count(*) NumberOfIncidents
from
( select
YT.IncidentID,
count(*) HowMany
from
YourTable YT
where
YT.IncidentCode in ( 7053, 65010, 65050 )
AND WhateverDateFilters...
group by
YT.IncidentID
having
count(*) = 3 ) PreQualified
UNION
select
'Burglary, Theft, Drugs and Vehicle Recovery' as Activity,
count(*) NumberOfIncidents
from
( select
YT.IncidentID,
count(*) HowMany
from
YourTable YT
where
YT.IncidentCode in ( 5043, 6381, 7041, 16650 )
AND WhateverDateFilters...
group by
YT.IncidentID
having
count(*) = 4 ) PreQualified
Notice each query in the UNION returns the same number, and order of columns. So it will just return a list (in this case) of 3 rows with a description and count per category regardless of the physical order the incident codes were entered, even IF they were entered in the 3rd and 4th when only looking for 2 code possibilities.
Sometimes a generic query (as in the left-join sample) is ok, and nothing wrong with it, but ask yourself the flexibility and do you want to drill into each permutation just to get your final result numbers.
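For comparison, the order-independence point can be sketched outside SQL. The following Python example is only an illustration: it treats each incident's codes as a sorted tuple so that entry order does not matter, then counts identical tuples. Unlike the IN/HAVING queries, it counts exact sets only, so an incident with extra codes is a different combination. Incident 999999 is a made-up row added to show two incidents sharing one combination.

```python
from collections import Counter, defaultdict

# (IncidentID, IncidentCode) pairs from the question, plus one hypothetical
# incident (999999) whose codes were entered in the opposite order.
rows = [
    (618691, 4134), (618691, 15300),
    (618701, 7053), (618701, 65010), (618701, 65050),
    (626010, 5043), (626010, 6381), (626010, 7041), (626010, 16650),
    (999999, 15300), (999999, 4134),
]

def count_code_combinations(rows):
    """Return a Counter mapping each sorted tuple of codes to the number
    of incidents carrying exactly that set of codes."""
    codes_by_incident = defaultdict(set)
    for incident_id, code in rows:
        codes_by_incident[incident_id].add(code)
    return Counter(tuple(sorted(codes)) for codes in codes_by_incident.values())

combo_counts = count_code_combinations(rows)
for combo, n in combo_counts.items():
    print(combo, n)
```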

Recursive subtraction from two separate tables to fill in historical data

I have two datasets hosted in Snowflake with social media follower counts by day. The main table we will be using going forward (follower_counts) shows follower counts by day:
This table is live as of 4/4/2020 and will be updated daily. Unfortunately, I am unable to get historical data in this format. Instead, I have a table with historical data (follower_gains) that shows net follower gains by day for several accounts:
Ideally - I want to take the follower_count value from the minimum date in the current table (follower_counts) and subtract the sum of gains (organic + paid gains) for each day, until the minimum date of the follower_gains table, to fill in the follower_count historically. In addition, there are several accounts with data in these tables, so it would need to be grouped by account. It should look like this:
I've only gotten as far as unioning these two tables together, but don't even know where to start with looping through these rows:
WITH a AS (
SELECT
account_id,
date,
organizational_entity,
organizational_entity_type,
vanity_name,
localized_name,
localized_website,
organization_type,
total_followers_count,
null AS organic_follower_gain,
null AS paid_follower_gain,
account_name,
last_update
FROM follower_counts
UNION ALL
SELECT
account_id,
date,
organizational_entity,
organizational_entity_type,
vanity_name,
localized_name,
localized_website,
organization_type,
null AS total_followers_count,
organic_follower_gain,
paid_follower_gain,
account_name,
last_update
FROM follower_gains)
SELECT
a.account_id,
a.date,
a.organizational_entity,
a.organizational_entity_type,
a.vanity_name,
a.localized_name,
a.localized_website,
a.organization_type,
a.total_followers_count,
a.organic_follower_gain,
a.paid_follower_gain,
a.account_name,
a.last_update
FROM a
ORDER BY date desc LIMIT 100
UPDATE: Changed union to union all and added not exists to remove duplicates. Made changes per the comments.
NOTE: Please make sure you don't post images of the tables. It's difficult to recreate your scenario to write a correct query. Test this solution and update so that I can make modifications if necessary.
You don't loop through rows in SQL because it's not a procedural language. The operation you define in the query is performed for all the rows in a table.
with cte as (SELECT a.account_id,
a.date,
a.organizational_entity,
a.organizational_entity_type,
a.vanity_name,
a.localized_name,
a.localized_website,
a.organization_type,
(a.follower_count - (b.organic_gain+b.paid_gain)) AS follower_count,
a.account_name,
a.last_update,
b.organic_gain,
b.paid_gain
FROM follower_counts a
JOIN follower_gains b ON a.account_id = b.account_id
AND b.date < (select min(date) from
follower_counts c where a.account_id = c.account_id)
)
SELECT b.account_id,
b.date,
b.organizational_entity,
b.organizational_entity_type,
b.vanity_name,
b.localized_name,
b.localized_website,
b.organization_type,
b.follower_count,
b.account_name,
b.last_update,
b.organic_gain,
b.paid_gain
FROM cte b
UNION ALL
SELECT a.account_id,
a.date,
a.organizational_entity,
a.organizational_entity_type,
a.vanity_name,
a.localized_name,
a.localized_website,
a.organization_type,
a.follower_count,
a.account_name,
a.last_update,
NULL as organic_gain,
NULL as paid_gain
FROM follower_counts a where not exists (select 1 from
follower_gains c where a.account_id = c.account_id AND a.date = c.date)
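The backfill the question describes is essentially a running total taken backwards in time. Here is a minimal Python sketch of that arithmetic for one account, under the assumption that the count on a historical date equals the first known count minus all gains recorded from that date up to (but not including) the first known date. The function name and the sample numbers are hypothetical.

```python
from datetime import date
from itertools import accumulate

def backfill_counts(first_known_count, gains_by_date):
    """gains_by_date maps each historical date to its net gain
    (organic + paid). Returns a dict of date -> reconstructed count."""
    dates_desc = sorted(gains_by_date, reverse=True)
    # Running sum of gains, walking from the most recent date backwards.
    running_gain = accumulate(gains_by_date[d] for d in dates_desc)
    return {d: first_known_count - g for d, g in zip(dates_desc, running_gain)}

# Hypothetical account: 1000 followers on the first live date (2020-04-04).
gains = {
    date(2020, 4, 3): 10,
    date(2020, 4, 2): 5,
    date(2020, 4, 1): -3,
}
history = backfill_counts(1000, gains)
for d in sorted(history):
    print(d, history[d])
```

For several accounts the same function would be called once per account_id; in SQL the equivalent would be a windowed SUM partitioned by account and ordered by date descending.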
You could do something like this. Instead of using a variable, you can also wrap the whole expression in another pair of parentheses and end it with ) AS FollowerGrowth
DECLARE @FollowerGrowth INT =
( SELECT total_followers_count
FROM follower_gains
WHERE AccountID = xx )
-
( SELECT TOP 1 follower_count
FROM follower_counts
WHERE AccountID = xx
ORDER BY date ASC )

sql count/sum the number of calls until a specific date in another column

I have data that shows customer calls. I have columns for customer number, phone number (one customer can have many), the date of each voice call, and the duration of the call. The table looks like the example below.
CusID | PhoneNum | Date | Duration
20111 43576233 20.01.2016-14:00 00:10:12
20111 44498228 14.01.2016-15:30 00:05:12
20112 43898983 14.01.2016-15:30
What I want is to count the number of call attempts for each number before it is answered (Duration > 0), so that I can estimate how many times on average I should call to reach a customer or phone number. It should basically count the rows per phone number dated before the MIN(Date) where Duration > 0.
SELECT Phone, Min(Date) FROM XX WHERE Duration IS NOT NULL GROUP BY Phone --
I think This should give me the time limit until when I should count the number of calls. I could not figure out how to finish the rest of the job
EDIT- I will add an example
And the result should only count row number 5 since it is the call before the customer is reached for the first time. So resulted table should be like :
Your first step is valid:
SELECT
CusID
,PhoneNum
,MIN(Date) AS MinDate
FROM XX
WHERE Duration IS NOT NULL
GROUP BY CusID, PhoneNum
This gives you one row per PhoneNum with the date of the first successful call.
Now join this to original table and leave only those rows that have a prior date (per PhoneNum). Group it by PhoneNum again and count. The join should be LEFT JOIN to have a row with zero count for numbers that were answered on the first attempt.
WITH
CTE
AS
(
SELECT
CusID
,PhoneNum
,MIN(Date) AS MinDate
FROM XX
WHERE Duration IS NOT NULL
GROUP BY CusID, PhoneNum
)
SELECT
CTE.CusID
,CTE.PhoneNum
,COUNT(XX.PhoneNum) AS Count
FROM
CTE
LEFT JOIN XX
ON XX.PhoneNum = CTE.PhoneNum
AND XX.Date < CTE.MinDate
GROUP BY CTE.CusID, CTE.PhoneNum
;
If a number was never answered, it will not be included in the result set at all.
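To check the behaviour on a few rows, here is a sketch that runs the same CTE plus LEFT JOIN shape against an in-memory SQLite database from Python. SQLite stands in for the real server, dates are stored as ISO strings so that string comparison orders them correctly, and the unanswered rows are invented for the example. The CTE name FirstAnswered and the alias FailedBefore are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE XX (CusID INTEGER, PhoneNum TEXT, Date TEXT, Duration TEXT)"
)
conn.executemany(
    "INSERT INTO XX VALUES (?, ?, ?, ?)",
    [
        (20111, "43576233", "2016-01-10 09:00", None),        # hypothetical miss
        (20111, "43576233", "2016-01-12 11:00", None),        # hypothetical miss
        (20111, "43576233", "2016-01-20 14:00", "00:10:12"),  # first answer
        (20111, "44498228", "2016-01-14 15:30", "00:05:12"),  # answered at once
        (20112, "43898983", "2016-01-14 15:30", None),        # never answered
    ],
)

query = """
WITH FirstAnswered AS (
    SELECT CusID, PhoneNum, MIN(Date) AS MinDate
    FROM XX
    WHERE Duration IS NOT NULL
    GROUP BY CusID, PhoneNum
)
SELECT f.CusID, f.PhoneNum, COUNT(x.PhoneNum) AS FailedBefore
FROM FirstAnswered AS f
LEFT JOIN XX AS x
    ON x.PhoneNum = f.PhoneNum
   AND x.Date < f.MinDate
GROUP BY f.CusID, f.PhoneNum
ORDER BY f.PhoneNum
"""
result = conn.execute(query).fetchall()
print(result)
```

The number answered on the first attempt appears with a count of 0 because of the LEFT JOIN, and the number that was never answered is absent.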
Please try this query:
SELECT phonecalls.CusID, COUNT(0) AS failedcalls, phonecalls.phonenumber, success.firstsuccess FROM phonecalls,
(SELECT min(Date) AS firstsuccess, CusID, phonenumber FROM phonecalls WHERE Duration IS NOT NULL GROUP BY CusID, phonenumber) success
WHERE phonecalls.CusID = success.CusID AND phonecalls.phonenumber = success.phonenumber AND phonecalls.Date < success.firstsuccess
GROUP BY phonecalls.CusID, phonecalls.phonenumber, success.firstsuccess;
I've not tested it...
Note: users who have never had a successful call are not listed. Is this ok, or do you need them listed as well? If so, you need to "left join":
SELECT phonecalls.CusID, COUNT(0) AS failedcalls, phonecalls.phonenumber, success.firstsuccess FROM phonecalls LEFT JOIN
(SELECT min(Date) AS firstsuccess, CusID, phonenumber FROM phonecalls WHERE Duration IS NOT NULL GROUP BY CusID, phonenumber) success ON
phonecalls.CusID = success.CusID AND phonecalls.phonenumber = success.phonenumber AND phonecalls.Date < success.firstsuccess
GROUP BY phonecalls.CusID, phonecalls.phonenumber, success.firstsuccess;
In SQL Server 2012+, you can use the following logic:
Assign the number of "unanswered" calls to each row in the data. This uses conditional aggregation with a window function.
Then, take the maximum of the count for answered calls for each user.
Count the number of answered calls.
The ratio is the average.
This ignores strings of unanswered calls not followed by an answered call.
The resulting query:
select phone, max(cume_unanswered), count(*) as num_answered,
max(cume_unanswered) * 1.0 / count(*) as ratio
from (select t.*,
sum(case when duration is null then 1 else 0 end) over (partition by phone order by date) as cume_unanswered
from t
) t
where duration is not null
group by phone;
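The same steps can be traced in plain Python to see what the ratio means. This is a sketch only: the function name and the sample calls are invented, and a duration of None stands for an unanswered call.

```python
from collections import defaultdict

def attempts_ratio(calls):
    """calls: iterable of (phone, timestamp, duration_or_None).
    Returns phone -> unanswered calls before the last answered call,
    divided by the number of answered calls."""
    by_phone = defaultdict(list)
    for phone, ts, duration in calls:
        by_phone[phone].append((ts, duration))

    ratios = {}
    for phone, rows in by_phone.items():
        unanswered_so_far = 0              # cumulative sum, like cume_unanswered
        unanswered_before_last_answer = 0  # max of that sum over answered rows
        answered = 0
        for ts, duration in sorted(rows, key=lambda r: r[0]):
            if duration is None:
                unanswered_so_far += 1
            else:
                answered += 1
                unanswered_before_last_answer = unanswered_so_far
        if answered:  # phones never answered are skipped, as in the query
            ratios[phone] = unanswered_before_last_answer / answered
    return ratios

calls = [
    ("A", 1, None), ("A", 2, None), ("A", 3, 60),
    ("A", 4, None), ("A", 5, 30), ("A", 6, None),
    ("B", 1, None), ("B", 2, None),
]
ratios = attempts_ratio(calls)
print(ratios)
```

Phone A has 3 unanswered calls before its last answered call and 2 answered calls. The trailing unanswered call and phone B are ignored, as the answer states.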

Datediff between two tables

I have those two tables
1-Add to queue table
TransID , ADD date
10 , 10/10/2012
11 , 14/10/2012
11 , 18/11/2012
11 , 25/12/2012
12 , 1/1/2013
2-Removed from queue table
TransID , Removed Date
10 , 15/1/2013
11 , 12/12/2012
11 , 13/1/2013
11 , 20/1/2013
The TransID is the key between the two tables, and I can't modify those tables. What I want is to query the amount of time each transaction spent in the queue.
It's easy when there is one item in each table , but when the item get queued more than once how do I calculate that?
Assuming the order TransIDs are entered into the Add table is the same order they are removed, you can use the following:
WITH OrderedAdds AS
( SELECT TransID,
AddDate,
[RowNumber] = ROW_NUMBER() OVER(PARTITION BY TransID ORDER BY AddDate)
FROM AddTable
), OrderedRemoves AS
( SELECT TransID,
RemovedDate,
[RowNumber] = ROW_NUMBER() OVER(PARTITION BY TransID ORDER BY RemovedDate)
FROM RemoveTable
)
SELECT OrderedAdds.TransID,
OrderedAdds.AddDate,
OrderedRemoves.RemovedDate,
[DaysInQueue] = DATEDIFF(DAY, OrderedAdds.AddDate, ISNULL(OrderedRemoves.RemovedDate, CURRENT_TIMESTAMP))
FROM OrderedAdds
LEFT JOIN OrderedRemoves
ON OrderedAdds.TransID = OrderedRemoves.TransID
AND OrderedAdds.RowNumber = OrderedRemoves.RowNumber;
The key part is that each record gets a rownumber based on the transaction id and the date it was entered, you can then join on both rownumber and transID to stop any cross joining.
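The pairing logic can be sketched in Python to show what the row numbers achieve: the n-th add for a transaction is matched with its n-th removal. The sample data is taken from the question. The function name and the fixed today value (standing in for CURRENT_TIMESTAMP) are assumptions for the example.

```python
from collections import defaultdict
from datetime import date
from itertools import zip_longest

def days_in_queue(adds, removes, today):
    """Pair the n-th add with the n-th removal per TransID and return
    (trans_id, added, removed_or_None, days) tuples."""
    adds_by_id, removes_by_id = defaultdict(list), defaultdict(list)
    for trans_id, d in adds:
        adds_by_id[trans_id].append(d)
    for trans_id, d in removes:
        removes_by_id[trans_id].append(d)

    result = []
    for trans_id in sorted(adds_by_id):
        pairs = zip_longest(sorted(adds_by_id[trans_id]),
                            sorted(removes_by_id[trans_id]))
        for added, removed in pairs:
            if added is None:
                continue  # removal without an add; the LEFT JOIN drops it too
            result.append((trans_id, added, removed,
                           ((removed or today) - added).days))
    return result

adds = [(10, date(2012, 10, 10)), (11, date(2012, 10, 14)),
        (11, date(2012, 11, 18)), (11, date(2012, 12, 25)),
        (12, date(2013, 1, 1))]
removes = [(10, date(2013, 1, 15)), (11, date(2012, 12, 12)),
           (11, date(2013, 1, 13)), (11, date(2013, 1, 20))]

queue_rows = days_in_queue(adds, removes, today=date(2013, 2, 1))
for row in queue_rows:
    print(row)
```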
Example on SQL Fiddle
DISCLAIMER: There are probably problems with this, but I hope to point you in one possible direction. Expect to run into issues.
You can try in the following direction (which might work in some way depending on your system, version, etc) :
SELECT transId, (SUM(remove_date_sum) - SUM(add_date_sum)) / (60*60*24) AS days_in_queue
FROM
(
SELECT transId, SUM(UNIX_TIMESTAMP(add_date)) AS add_date_sum, 0 AS remove_date_sum
FROM add_to_queue
GROUP BY transId
UNION ALL
SELECT transId, 0 AS add_date_sum, SUM(UNIX_TIMESTAMP(remove_date)) AS remove_date_sum
FROM remove_from_queue
GROUP BY transId
) AS queue_sums
GROUP BY transId;
A bit of explanation: as far as I know, you cannot sum dates, but you can convert them to some sort of timestamp. Check if UNIX_TIMESTAMP works for you, or figure out something else. Then you can sum in each table, build the union by conveniently leaving the other column as zero, and then subtract in the outer query. Note that this only gives a sensible result when every add has a matching remove.
As for the division at the end of the first SELECT: UNIX_TIMESTAMP returns seconds, so you divide by 60*60*24 to get days, or by whatever unit you want.
This all said - I would probably solve this using a stored procedure or some client script. SQL is not a weapon for every battle. Making two separate queries can be much simpler.
Answer 2: after your comments. (As a side note, some of your dates 15/1/2013,13/1/2013 do not represent proper date formats )
select transId, sum(numberOfDays) totalQueueTime
from (
select a.transId,
datediff(day,a.addDate,isnull(r.removeDate,a.addDate)) numberOfDays
from AddTable a left join RemoveTable r on a.transId = r.transId
) X
group by transId
Answer 1: before your comments
Assuming that there won't be a new record added unless it is being removed. Also note following query will bring numberOfDays as zero for unremoved records;
select a.transId, a.addDate, r.removeDate,
datediff(day,a.addDate,isnull(r.removeDate,a.addDate)) numberOfDays
from AddTable a left join RemoveTable r on a.transId = r.transId
order by a.transId, a.addDate, r.removeDate

Finding the number of concurrent days two events happen over the course of time using a calendar table

I have a table with a structure
(rx)
clmID int
patid int
drugclass char(3)
drugName char(25)
fillDate date
scriptEndDate date
strength int
And a query
;with PatientDrugList(patid, filldate,scriptEndDate,drugClass,strength)
as
(
select rx.patid,rx.fillDate,rx.scriptEndDate,rx.drugClass,rx.strength
from rx
)
,
DrugList(drugName)
as
(
select x.drugClass
from (values('h3a'),('h6h'))
as x(drugClass)
where x.drugClass is not null
)
SELECT PD.patid, C.calendarDate AS overlap_date
FROM PatientDrugList AS PD, Calendar AS C
WHERE drugClass IN ('h3a','h6h')
AND calendardate BETWEEN filldate AND scriptenddate
GROUP BY PD.patid, C.CalendarDate
HAVING COUNT(DISTINCT drugClass) = 2
order by pd.patid,c.calendarDate
The Calendar is simply a calendar table with all possible dates throughout the length of the study and no other columns.
My query returns data that looks like
The overlap_date represents every day that a person was prescribed a drug in the two classes listed after the PatientDrugList CTE.
I would like to find the number of consecutive days that each person was prescribed both families of drugs. I can't use a simple max and min aggregate because that wouldn't tell me if someone stopped this regimen and then started again. What is an efficient way to find this out?
EDIT: The row constructor in the DrugList CTE should be a parameter for a stored procedure and was amended for the purposes of this example.
You are looking for consecutive sequences of dates. The key observation is that if you subtract a sequence from the dates, you'll get a constant date. This defines a group of dates all in sequence, which can then be grouped.
select patid
,MIN(overlap_date) as start_overlap
,MAX(overlap_date) as end_overlap
from(select cte.*,(dateadd(day,-row_number() over(partition by patid order by overlap_Date),overlap_date)) as groupDate
from cte
)t
group by patid, groupDate
This code is untested, so it might have some typos.
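The "subtract a sequence" observation can be checked with a few dates in Python. This is a sketch of the same idea rather than a translation of the query: within a run of consecutive days, date minus row number stays constant, so it serves as a group key. The function name and sample dates are invented.

```python
from datetime import date, timedelta
from itertools import groupby

def consecutive_runs(dates):
    """Return (start, end, length) for each run of consecutive dates."""
    ordered = sorted(set(dates))
    # Date minus its 1-based position is constant inside a consecutive run.
    keyed = [(d - timedelta(days=i), d) for i, d in enumerate(ordered, start=1)]
    runs = []
    for _, group in groupby(keyed, key=lambda pair: pair[0]):
        days = [d for _, d in group]
        runs.append((days[0], days[-1], len(days)))
    return runs

overlap_dates = [
    date(2013, 1, 1), date(2013, 1, 2), date(2013, 1, 3),  # first run
    date(2013, 1, 10), date(2013, 1, 11),                  # second run
]
runs = consecutive_runs(overlap_dates)
for start, end, length in runs:
    print(start, end, length)
```

The length of each run is the number of consecutive days, which in the SQL version would be COUNT(*) per patid and groupDate.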
You need to pivot on something, and MAX over a CASE expression works that out. Can you state, per date, whether someone had both drugs? Then you would be limiting by date, if I understand your question correctly.
Example SQL:
declare @Temp table ( person varchar(8), dt date, drug varchar(8));
insert into @Temp values ('Brett','1-1-2013', 'h3a'),('Brett', '1-1-2013', 'h6h'),('Brett','1-2-2013', 'h3a'),('Brett', '1-2-2013', 'h6h'),('Joe', '1-1-2013', 'H3a'),('Joe', '1-2-2013', 'h6h');
with a as
(
select
person
, dt
, max(case when drug = 'h3a' then 1 else 0 end) as h3a
, max(case when drug = 'h6h' then 1 else 0 end) as h6h
from @Temp
group by person, dt
)
, b as
(
select *, case when h3a = 1 and h6h = 1 then 1 end as Logic
from a
)
select person, count(Logic) as DaysOnBothPrescriptions
from b
group by person