Select 1st and 2nd Record before record X - sql

SQL Server 2008/2012
I have a table:
InteractionKey char(18)
dEventTime datetime
SeqNo int
cEventData1
There will be multiple entries per InteractionKey. dEventTime is only precise to the second, and SeqNo is incremented when two entries occur in the same second.
What I need to do is select the first and second records BEFORE the record where cEventData1 = 'Disconnect'.
The final product will give me a count of occurrences grouped by cEventData1.
I am currently using a cursor (I will update the question with the cursor source shortly). I would like to use a CTE, but I really struggle to understand them.
Any ideas would be appreciated!
Update with data sample:
InteractionKey      dEventTime               SeqNo  cEventData1
100186322420130722  2013-07-22 11:50:49.000  1      EnterPassword
100186322420130722  2013-07-22 11:50:49.000  2      CheckPassword
100186322420130722  2013-07-22 11:50:49.000  3      Attendant Disconnect
Ideally, the result of the query would tell me the following (NOTE: the Action column can simply be the literal 'Attendant Disconnect' AS Action):
cEventData1    Action                Count
CheckPassword  Attendant Disconnect  1
Here is the query I ended up using, based on the answer below:
SELECT DISTINCT t1.InteractionKey,
       DisconnectTime    = t1.dEventTime,
       PreviousEventTime = t2.dEventTime,
       PreviousEvent     = t2.cEventData1,
       t2.SeqNo
FROM IVRHistory t1
OUTER APPLY
    ( -- the single event immediately before this disconnect
      SELECT TOP 1 t2.dEventTime, t2.SeqNo, t2.cEventData1
      FROM IVRHistory t2
      WHERE t1.InteractionKey = t2.InteractionKey
        -- earlier means: an earlier second, or the same second with a lower SeqNo
        AND (t2.dEventTime < t1.dEventTime
             OR (t2.dEventTime = t1.dEventTime AND t2.SeqNo < t1.SeqNo))
        AND t2.cEventData1 <> 'Attendant Disconnect'
      ORDER BY t2.dEventTime DESC, t2.SeqNo DESC
    ) t2
WHERE t1.cEventData1 = 'Attendant Disconnect'
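Treating dEventTime and SeqNo as two independent conditions can skip a row. An event from an earlier second that happens to carry a higher SeqNo fails a check like "SeqNo is lower". The sketch below runs the "previous event" lookup in SQLite through Python's sqlite3 module and compares the (dEventTime, SeqNo) pair as a unit. It is only an illustration, not the asker's production code. SQLite has no OUTER APPLY, so a correlated subquery with ORDER BY ... LIMIT 1 stands in for TOP 1. The first three rows come from the question. The second interaction key and its rows are hypothetical test data.

```python
import sqlite3

# First three rows: the question's sample. The ...799 interaction is hypothetical:
# its earlier event has SeqNo 2 while the later disconnect has SeqNo 1.
ROWS = [
    ("100186322420130722", "2013-07-22 11:50:49.000", 1, "EnterPassword"),
    ("100186322420130722", "2013-07-22 11:50:49.000", 2, "CheckPassword"),
    ("100186322420130722", "2013-07-22 11:50:49.000", 3, "Attendant Disconnect"),
    ("100186322420130799", "2013-07-22 12:00:00.000", 2, "EnterPassword"),
    ("100186322420130799", "2013-07-22 12:00:05.000", 1, "Attendant Disconnect"),
]

# For each disconnect, fetch the single preceding event (the TOP 1 case).
PREVIOUS_EVENT = """
SELECT d.InteractionKey,
       (SELECT p.cEventData1
          FROM IVRHistory AS p
         WHERE p.InteractionKey = d.InteractionKey
           -- earlier second, or same second with a lower SeqNo
           AND (p.dEventTime < d.dEventTime
                OR (p.dEventTime = d.dEventTime AND p.SeqNo < d.SeqNo))
         ORDER BY p.dEventTime DESC, p.SeqNo DESC
         LIMIT 1) AS PreviousEvent
  FROM IVRHistory AS d
 WHERE d.cEventData1 = 'Attendant Disconnect'
 ORDER BY d.InteractionKey
"""


def previous_events(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE IVRHistory ("
        "InteractionKey TEXT, dEventTime TEXT, SeqNo INTEGER, cEventData1 TEXT)"
    )
    conn.executemany("INSERT INTO IVRHistory VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(PREVIOUS_EVENT).fetchall()
    conn.close()
    return result


print(previous_events(ROWS))
```

The hypothetical second interaction has a higher SeqNo on its earlier row. Requiring the SeqNo to be lower independently of the time would therefore leave it with a NULL previous event, while the pair comparison returns EnterPassword.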

I would approach this using APPLY:
SELECT t1.InteractionKey,
       DisconnectTime    = t1.dEventTime,
       PreviousEventTime = t2.dEventTime,
       PreviousEvent     = t2.cEventData1,
       t2.SeqNo
FROM T t1
OUTER APPLY
    ( -- the two events immediately before this disconnect
      SELECT TOP 2 t2.dEventTime, t2.SeqNo, t2.cEventData1
      FROM T t2
      WHERE t1.InteractionKey = t2.InteractionKey
        -- earlier second, or same second with a lower SeqNo
        AND (t2.dEventTime < t1.dEventTime
             OR (t2.dEventTime = t1.dEventTime AND t2.SeqNo < t1.SeqNo))
      ORDER BY t2.dEventTime DESC, t2.SeqNo DESC
    ) t2
WHERE t1.cEventData1 = 'Disconnect';
This will give you the two records immediately preceding the disconnect event. If you order by dEventTime alone and want to keep every record that ties on time, you can use TOP 2 WITH TIES.
Without your sample input and output I am guessing a bit, but from what you have said your final aggregate would be:
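SELECT t2.cEventData1,
       Occurrences = COUNT(*)
FROM T t1
-- OUTER APPLY keeps disconnects with no earlier events; they are counted under a NULL cEventData1
OUTER APPLY
    ( SELECT TOP 2 t2.dEventTime, t2.SeqNo, t2.cEventData1
      FROM T t2
      WHERE t1.InteractionKey = t2.InteractionKey
        AND (t2.dEventTime < t1.dEventTime
             OR (t2.dEventTime = t1.dEventTime AND t2.SeqNo < t1.SeqNo))
      ORDER BY t2.dEventTime DESC, t2.SeqNo DESC
    ) t2
WHERE t1.cEventData1 = 'Disconnect'
GROUP BY t2.cEventData1;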
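The question mentions wanting a CTE, so here is a hedged sketch of a different technique from the APPLY answer. It numbers each interaction's events with ROW_NUMBER() inside a CTE, then joins each disconnect to the rows numbered one and two below it. The example runs in SQLite through Python's sqlite3 module on the sample rows from the question. The table name IVRHistory is taken from the asker's query. Window functions need SQLite 3.25 or later. The SQL itself uses only CTE and ROW_NUMBER() features that SQL Server 2005+ also has, but treat portability as an assumption.

```python
import sqlite3

# Sample rows from the question; IVRHistory is the asker's table name.
ROWS = [
    ("100186322420130722", "2013-07-22 11:50:49.000", 1, "EnterPassword"),
    ("100186322420130722", "2013-07-22 11:50:49.000", 2, "CheckPassword"),
    ("100186322420130722", "2013-07-22 11:50:49.000", 3, "Attendant Disconnect"),
]

QUERY = """
WITH ordered AS (
    -- Number each interaction's events in time order; SeqNo breaks same-second ties.
    SELECT InteractionKey, cEventData1,
           ROW_NUMBER() OVER (PARTITION BY InteractionKey
                              ORDER BY dEventTime, SeqNo) AS rn
    FROM IVRHistory
)
SELECT prev.cEventData1,
       'Attendant Disconnect' AS "Action",
       COUNT(*) AS Occurrences
FROM ordered AS d
JOIN ordered AS prev
  ON prev.InteractionKey = d.InteractionKey
 AND prev.rn IN (d.rn - 1, d.rn - 2)  -- 1st and 2nd records before the disconnect
WHERE d.cEventData1 = 'Attendant Disconnect'
GROUP BY prev.cEventData1
ORDER BY prev.cEventData1
"""


def count_preceding_events(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE IVRHistory ("
        "InteractionKey TEXT, dEventTime TEXT, SeqNo INTEGER, cEventData1 TEXT)"
    )
    conn.executemany("INSERT INTO IVRHistory VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(count_preceding_events(ROWS))
```

Because it counts both preceding records, this returns one row each for CheckPassword and EnterPassword. The single CheckPassword row in the question's desired output corresponds to looking only at the immediately preceding record.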

Related

In SAS, how can I select all ID groups that have a specific relationship between other variables within the ID group?

For example, I want to get dataset2 from dataset1.
From dataset1, all IDs whose value1 in any phase is more than 10 points greater than the value2 of a previous phase within the same ID (marked with an arrow) were selected into dataset2.
I am using SAS Enterprise Guide (EG), and I was unable to write such a query.
Thank you very much in advance.
You can do this in SQL. To get the rows matching the condition:
select t.*
from t join
t tnext
on tnext.id = t.id and
tnext.phase = t.phase + 1
where tnext.value1 > t.value2 + 10;
Then you can list all rows for those ids using IN or EXISTS:
select t.*
from t
where t.id in (select t2.id
from t t2 join
t tnext
on tnext.id = t2.id and
tnext.phase = t2.phase + 1
where tnext.value1 > t2.value2 + 10
);
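As a minimal sketch of the self-join plus IN query, here it is run in SQLite through Python's sqlite3 module. In SAS EG the same SQL would go inside PROC SQL. The table t and its rows are hypothetical, and the phases are assumed to be consecutive integers, which the `phase + 1` join relies on.

```python
import sqlite3

# Hypothetical data: (id, phase, value1, value2) with consecutive phase numbers.
ROWS = [
    (1, 1, 5, 10), (1, 2, 25, 3),                  # 25 > 10 + 10, so id 1 qualifies
    (2, 1, 5, 10), (2, 2, 15, 8), (2, 3, 17, 1),   # 15 > 20? no. 17 > 18? no.
    (3, 1, 100, 0),                                # single phase, nothing to compare
]

# All rows of every id with at least one qualifying phase-to-phase jump.
QUERY = """
SELECT t.*
  FROM t
 WHERE t.id IN (SELECT t2.id
                  FROM t AS t2
                  JOIN t AS tnext
                    ON tnext.id = t2.id
                   AND tnext.phase = t2.phase + 1
                 WHERE tnext.value1 > t2.value2 + 10)
 ORDER BY t.id, t.phase
"""


def select_id_groups(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t (id INTEGER, phase INTEGER, value1 INTEGER, value2 INTEGER)"
    )
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(select_id_groups(ROWS))  # [(1, 1, 5, 10), (1, 2, 25, 3)]
```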
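1. Calculate the difference within each ID group (value1 minus the previous phase's value2, via LAG())
2. Get the IDs where the difference is greater than 10
3. Filter the main table
data temp;
    set have;
    by id phase;
    /*Part 1: value1 of this phase minus value2 of the previous phase.
      LAG() must execute on every row, so call it unconditionally
      and blank it at the start of each ID group.*/
    prev_value2 = lag(value2);
    if first.id then prev_value2 = .;
    difference = value1 - prev_value2;
    /*Part 2: keep rows whose jump exceeds 10 (a missing value is never > 10)*/
    if difference > 10 then output;
run;
/*Part 3*/
proc sql;
    create table want as
    select * from have
    where ID in (select distinct ID from temp);
quit;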
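SAS itself is not runnable here, so the following plain-Python sketch mirrors the by-group logic of the DATA step: LAG(value2) is reset at the start of each ID, and each ID is checked for a jump over 10. It is an illustration only. The rows are the same kind of hypothetical (id, phase, value1, value2) data, and `ids_with_jump` / `filter_have` are made-up helper names.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical data: (id, phase, value1, value2).
ROWS = [
    (1, 1, 5, 10), (1, 2, 25, 3),
    (2, 1, 5, 10), (2, 2, 15, 8), (2, 3, 17, 1),
    (3, 1, 100, 0),
]


def ids_with_jump(rows, threshold=10):
    """Ids where value1 exceeds the previous phase's value2 by more than threshold."""
    selected = set()
    # Sorting by (id, phase) mirrors BY id phase; groupby mirrors the FIRST.id reset.
    for id_, group in groupby(sorted(rows), key=itemgetter(0)):
        prev_value2 = None  # like LAG(value2) blanked at FIRST.id
        for _, _, value1, value2 in group:
            if prev_value2 is not None and value1 - prev_value2 > threshold:
                selected.add(id_)
            prev_value2 = value2
    return selected


def filter_have(rows, threshold=10):
    """Keep every row whose id qualified (Part 3's WHERE ID IN (...))."""
    keep = ids_with_jump(rows, threshold)
    return [row for row in sorted(rows) if row[0] in keep]


print(ids_with_jump(ROWS))  # {1}
print(filter_have(ROWS))    # [(1, 1, 5, 10), (1, 2, 25, 3)]
```

Unlike the SQL self-join, which needs `phase + 1` to exist, this compares each row with the previous row of the same ID, so gaps in phase numbering do not matter.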

SQL Server - How to check if a value does not exist in other rows of the same table for same column values?

Following are the two tables in SQL Server: TABLE_A and TABLE_B
I need to get the output as follows:
1. Get the IDs from TABLE_A where Exist = 0. We would get 100, 101 and 102.
2. Among 100, 101 and 102, no other row (in the same table) with the same ID value may have Exist = 1. Hence, 100 can't be selected, as it has Exist = 1 in the 2nd row, so only 101 and 102 remain.
3. Check the remaining IDs (101 and 102) against the ID column in TABLE_B: the Exist column must not equal 1 in any of the matching rows. In TABLE_B, the 4th row has Exist = 1 for 102, so it can't be selected.
4. Only 101 remains. This is the required output and should be selected.
Could you let me know how to write the simplest query to achieve this please? Let me know if the question needs to be improved.
You can use EXISTS and NOT EXISTS:
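WITH t AS (
    SELECT a.*
    FROM TABLE_A a
    -- the ID has at least one row with Exist = 0 ...
    WHERE EXISTS (SELECT 1 FROM TABLE_A a2 WHERE a2.ID = a.ID AND a2.Exist = 0)
      -- ... and no row in TABLE_A with Exist = 1
      AND NOT EXISTS (SELECT 1 FROM TABLE_A a2 WHERE a2.ID = a.ID AND a2.Exist = 1)
)
SELECT t.*
FROM t
-- ... and no row in TABLE_B with Exist = 1
WHERE NOT EXISTS (SELECT 1 FROM TABLE_B b WHERE b.ID = t.ID AND b.Exist = 1);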
Try:
SELECT
ID,
SUM(CAST(Exist AS int)) AS [Exists]
FROM
TABLE_A
GROUP BY ID
-- SUM() rejects BIT, so cast to int here as well
HAVING SUM(CAST(Exist AS int)) = 0
will give you the answer to the first part. You can then JOIN this to a similar query for TABLE_B. That is a "simple" way to show how this works; you can write more complex queries like the one from @Yogest Sharma.
As @Peter Smith mentioned, you can use the aggregate function SUM. Note that you need a cast, because SQL Server does not allow SUM on a column of the BIT data type.
;WITH CTE AS
(
SELECT ID, SUM(CAST(Exist AS INT)) AS AggExist FROM TABLE_A GROUP BY ID
-- UNION ALL keeps both rows when TABLE_A and TABLE_B produce the same (ID, AggExist) pair
UNION ALL
SELECT ID, SUM(CAST(Exist AS INT)) AS AggExist FROM TABLE_B GROUP BY ID
)
SELECT ID, SUM(AggExist) AS TotalExist FROM CTE GROUP BY ID
HAVING SUM(AggExist) = 0
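To check that the EXISTS/NOT EXISTS approach and the SUM/HAVING approach agree, here is a hedged sketch that runs both in SQLite through Python's sqlite3 module. The question's actual table contents are not shown, so the rows below are hypothetical, shaped only to match its walkthrough. SQLite has no BIT type, so Exist is stored as INTEGER here.

```python
import sqlite3

# Hypothetical (ID, Exist) rows matching the walkthrough: 100 has Exist = 1 in
# TABLE_A's 2nd row, and 102 has Exist = 1 in TABLE_B's 4th row.
TABLE_A = [(100, 0), (100, 1), (101, 0), (102, 0)]
TABLE_B = [(100, 0), (101, 0), (102, 0), (102, 1)]

EXISTS_QUERY = """
WITH t AS (
    SELECT a.*
      FROM TABLE_A AS a
     WHERE EXISTS (SELECT 1 FROM TABLE_A AS a2 WHERE a2.ID = a.ID AND a2.Exist = 0)
       AND NOT EXISTS (SELECT 1 FROM TABLE_A AS a2 WHERE a2.ID = a.ID AND a2.Exist = 1)
)
SELECT DISTINCT t.ID
  FROM t
 WHERE NOT EXISTS (SELECT 1 FROM TABLE_B AS b WHERE b.ID = t.ID AND b.Exist = 1)
 ORDER BY t.ID
"""

SUM_QUERY = """
WITH CTE AS (
    SELECT ID, SUM(CAST(Exist AS INT)) AS AggExist FROM TABLE_A GROUP BY ID
    UNION ALL
    SELECT ID, SUM(CAST(Exist AS INT)) AS AggExist FROM TABLE_B GROUP BY ID
)
SELECT ID
  FROM CTE
 GROUP BY ID
HAVING SUM(AggExist) = 0
 ORDER BY ID
"""


def run(query):
    conn = sqlite3.connect(":memory:")
    for name, rows in (("TABLE_A", TABLE_A), ("TABLE_B", TABLE_B)):
        conn.execute(f"CREATE TABLE {name} (ID INTEGER, Exist INTEGER)")
        conn.executemany(f"INSERT INTO {name} VALUES (?, ?)", rows)
    ids = [row[0] for row in conn.execute(query)]
    conn.close()
    return ids


print(run(EXISTS_QUERY), run(SUM_QUERY))  # [101] [101]
```

The two are not identical in general. The SUM version would also return an ID that appears only in TABLE_B with every Exist = 0, while the EXISTS version requires the ID to be in TABLE_A.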

ROW_NUMBER() Query Plan SORT Optimization

The query below reads the Votes table, which contains over 30 million rows. The result set is then filtered with WHERE n = 1. In the query plan, the SORT operation for the ROW_NUMBER() window function accounts for 95% of the query's cost, and the query takes over 6 minutes to complete.
I already have an index on (same_voter, eid, country) INCLUDE (vid, nid, sid, vote, time_stamp, new) to cover the WHERE clause.
Is the most efficient fix to add an index on (vid, nid, sid, new DESC, time_stamp DESC), or is there an alternative to ROW_NUMBER() that achieves the same results more efficiently?
SELECT v.vid, v.nid, v.sid, v.vote, v.time_stamp, v.new, v.eid,
       ROW_NUMBER() OVER (
           PARTITION BY v.vid, v.nid, v.sid
           ORDER BY v.new DESC, v.time_stamp DESC) AS n
FROM dbo.Votes v
WHERE v.same_voter <> 1
  AND v.eid <= @EId
  AND v.eid > (@EId - 5)
  AND v.country = @Country
One possible alternative to using ROW_NUMBER():
SELECT
    V.vid,
    V.nid,
    V.sid,
    V.vote,
    V.time_stamp,
    V.new,
    V.eid
FROM
    dbo.Votes V
    -- any row in the same partition that would rank higher
    LEFT OUTER JOIN dbo.Votes V2 ON
        V2.vid = V.vid AND
        V2.nid = V.nid AND
        V2.sid = V.sid AND
        V2.same_voter <> 1 AND
        V2.eid <= @EId AND
        V2.eid > (@EId - 5) AND
        V2.country = @Country AND
        (V2.new > V.new OR (V2.new = V.new AND V2.time_stamp > V.time_stamp))
WHERE
    V.same_voter <> 1 AND
    V.eid <= @EId AND
    V.eid > (@EId - 5) AND
    V.country = @Country AND
    V2.vid IS NULL
The query gets all rows matching your criteria, then left-joins to any other rows that match the same criteria but would rank higher within the partition based on the new and time_stamp columns. If no such row is found, this must be the row you want (it is ranked highest), and in that case V2.vid will be NULL. I'm assuming vid can never otherwise be NULL; if it is a nullable column in your table, you'll need to adjust the last line of the query. Note that, unlike ROW_NUMBER(), this returns every row that ties for first place on both new and time_stamp.
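A small sketch showing that the ROW_NUMBER() query and the anti-join return the same "top row per partition" when there are no ties. It runs in SQLite through Python's sqlite3 module. The Votes rows are hypothetical, and the same_voter / eid / country filters are deliberately left out to keep it short. Window functions need SQLite 3.25+. Nothing here measures performance on 30 million rows.

```python
import sqlite3

# Hypothetical Votes rows: (vid, nid, sid, vote, time_stamp, new).
VOTES = [
    (1, 1, 1, "a", "2020-01-01", 0),
    (1, 1, 1, "b", "2020-01-02", 0),
    (1, 1, 1, "c", "2020-01-01", 1),   # highest new wins partition (1, 1, 1)
    (2, 1, 1, "d", "2020-01-05", 0),
    (2, 1, 1, "e", "2020-01-06", 0),   # same new, later time_stamp wins
]

ROW_NUMBER_QUERY = """
SELECT vid, nid, sid, vote
  FROM (SELECT v.*,
               ROW_NUMBER() OVER (PARTITION BY vid, nid, sid
                                  ORDER BY new DESC, time_stamp DESC) AS n
          FROM Votes AS v) AS ranked
 WHERE n = 1
 ORDER BY vid, nid, sid
"""

ANTI_JOIN_QUERY = """
SELECT V.vid, V.nid, V.sid, V.vote
  FROM Votes AS V
  LEFT OUTER JOIN Votes AS V2
    ON V2.vid = V.vid AND V2.nid = V.nid AND V2.sid = V.sid
   AND (V2.new > V.new OR (V2.new = V.new AND V2.time_stamp > V.time_stamp))
 WHERE V2.vid IS NULL   -- no higher-ranked row exists in the partition
 ORDER BY V.vid, V.nid, V.sid
"""


def run(query):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Votes (vid INTEGER, nid INTEGER, sid INTEGER,"
        " vote TEXT, time_stamp TEXT, new INTEGER)"
    )
    conn.executemany("INSERT INTO Votes VALUES (?, ?, ?, ?, ?, ?)", VOTES)
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


print(run(ROW_NUMBER_QUERY))  # [(1, 1, 1, 'c'), (2, 1, 1, 'e')]
print(run(ANTI_JOIN_QUERY))   # [(1, 1, 1, 'c'), (2, 1, 1, 'e')]
```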

Fastest way to check if the most recent result for a patient has a certain value

Mssql < 2005
I have a complex database with lots of tables, but for now only the patient table and the measurements table matter.
What I need is the number of patients whose most recent value of 'code' matches a certain value. Also, datemeasurement has to be after '2012-04-01'. I have solved this in two different ways:
SELECT
COUNT(P.patid)
FROM T_Patients P
WHERE P.patid IN (SELECT M.patid
    FROM T_Measurements M
    WHERE (M.code = 'xxxx' AND M.result = 'xx')
    AND M.datemeasurement =
        (SELECT MAX(datemeasurement) FROM T_Measurements
         WHERE datemeasurement > '2012-01-04' AND patid = M.patid)
)
and:
SELECT
COUNT(P.patid)
FROM T_Patient P
WHERE 1 = (SELECT TOP 1 case when result = 'xx' then 1 else 0 end
FROM T_Measurements M
WHERE (M.code ='xxxx') AND datemeasurement > '2012-01-04' AND patid = P.patid
ORDER by datemeasurement DESC
)
This works, but it makes the query incredibly slow, because the subquery is correlated with the outer table and has to be evaluated for every patient. The query takes 10 seconds without the most-recent check and 3 minutes with it.
I'm pretty sure this can be done a lot more efficiently, so please enlighten me if you will :).
I tried HAVING datemeasurement = MAX(datemeasurement), but that keeps throwing errors.
My approach would be to write a query that gets each patient's last result since 01-04-2012 and then filter that for your code and result. Something like:
select
count(1)
from
T_Measurements M
inner join (
SELECT PATID, MAX(datemeasurement) as lastMeasuredDate from
T_Measurements M
where datemeasurement > '01-04-2012'
group by patID
) lastMeasurements
on lastMeasurements.lastmeasuredDate = M.datemeasurement
and lastMeasurements.PatID = M.PatID
where
M.Code = 'Xxxx' and M.result = 'XX'
The fastest way may be to use row_number():
SELECT COUNT(m.patid)
from (select m.*,
ROW_NUMBER() over (partition by patid order by datemeasurement desc) as seqnum
FROM T_Measurements m
where datemeasurement > '2012-01-04'
) m
where seqnum = 1 and code = 'XXX' and result = 'xx'
ROW_NUMBER() numbers the records for each patient, so the most recent one gets a value of 1; the outer query then just selects those rows. Note that ROW_NUMBER() requires SQL Server 2005 or later.
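A hedged sketch that runs both answers' ideas in SQLite through Python's sqlite3 module: the MAX(datemeasurement) derived-table join and the ROW_NUMBER() filter. The measurement rows are hypothetical. The cutoff is passed as an ISO date parameter ('2012-04-01', following the wording of the question), which sidesteps the ambiguous '2012-01-04' / '01-04-2012' literals. ROW_NUMBER() needs SQLite 3.25+.

```python
import sqlite3

CUTOFF = "2012-04-01"  # assumption: ISO dates, per "after 2012-04-01"

# Hypothetical rows: (patid, code, result, datemeasurement).
MEASUREMENTS = [
    (1, "xxxx", "xx", "2012-05-01"),
    (1, "xxxx", "xx", "2012-06-01"),   # latest is 'xx' -> counted
    (2, "xxxx", "xx", "2012-05-01"),
    (2, "xxxx", "yy", "2012-07-01"),   # latest is 'yy' -> not counted
    (3, "xxxx", "xx", "2011-12-01"),   # before the cutoff -> not counted
    (4, "xxxx", "xx", "2012-08-01"),   # counted
]

MAX_JOIN_QUERY = """
SELECT COUNT(*)
  FROM T_Measurements AS M
  JOIN (SELECT patid, MAX(datemeasurement) AS lastMeasuredDate
          FROM T_Measurements
         WHERE datemeasurement > ?
         GROUP BY patid) AS lm
    ON lm.patid = M.patid AND lm.lastMeasuredDate = M.datemeasurement
 WHERE M.code = 'xxxx' AND M.result = 'xx'
"""

ROW_NUMBER_QUERY = """
SELECT COUNT(*)
  FROM (SELECT m.*,
               ROW_NUMBER() OVER (PARTITION BY patid
                                  ORDER BY datemeasurement DESC) AS seqnum
          FROM T_Measurements AS m
         WHERE datemeasurement > ?) AS ranked
 WHERE seqnum = 1 AND code = 'xxxx' AND result = 'xx'
"""


def count_patients(query):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE T_Measurements ("
        "patid INTEGER, code TEXT, result TEXT, datemeasurement TEXT)"
    )
    conn.executemany("INSERT INTO T_Measurements VALUES (?, ?, ?, ?)", MEASUREMENTS)
    (count,) = conn.execute(query, (CUTOFF,)).fetchone()
    conn.close()
    return count


print(count_patients(MAX_JOIN_QUERY), count_patients(ROW_NUMBER_QUERY))  # 2 2
```

One difference worth knowing: if a patient had two matching rows at the same latest datemeasurement, the MAX join would count that patient twice, whereas ROW_NUMBER() picks exactly one row per patient.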

Fetch unique combinations of two field values

This has probably been asked before, but I cannot find an answer.
Table Data has two columns:
Source Dest
1 2
1 2
2 1
3 1
I am trying to come up with an MS Access 2003 SQL query that will return:
1 2
3 1
So far, all to no avail. Please help!
UPDATE: Exactly, I'm trying to exclude 2,1 because 1,2 is already included. I need only unique combinations where the order doesn't matter.
For MS Access you can try:
SELECT DISTINCT
*
FROM Table1 tM
WHERE NOT EXISTS(SELECT 1 FROM Table1 t WHERE tM.Source = t.Dest AND tM.Dest = t.Source AND tm.Source > t.Source)
EDIT:
Example with table Data, which is the same...
SELECT DISTINCT
*
FROM Data tM
WHERE NOT EXISTS(SELECT 1 FROM Data t WHERE tM.Source = t.Dest AND tM.Dest = t.Source AND tm.Source > t.Source)
Or (nicely formatted the way Access writes it):
SELECT DISTINCT *
FROM Data AS tM
WHERE (((Exists (SELECT 1 FROM Data t WHERE tM.Source = t.Dest AND tM.Dest = t.Source AND tm.Source > t.Source))=False));
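The NOT EXISTS query can be checked outside Access. This minimal sketch runs it in SQLite through Python's sqlite3 module against the Data rows from the question. A row is dropped only when its reversed pair exists and the row itself has the larger Source, so exactly one orientation of each pair survives.

```python
import sqlite3

# The Data table from the question.
DATA = [(1, 2), (1, 2), (2, 1), (3, 1)]

QUERY = """
SELECT DISTINCT *
  FROM Data AS tM
 WHERE NOT EXISTS (SELECT 1
                     FROM Data AS t
                    WHERE tM.Source = t.Dest
                      AND tM.Dest = t.Source
                      AND tM.Source > t.Source)
 ORDER BY Source, Dest
"""


def unordered_pairs(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Data (Source INTEGER, Dest INTEGER)")
    conn.executemany("INSERT INTO Data VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(unordered_pairs(DATA))  # [(1, 2), (3, 1)]
```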
Your question is worded incorrectly: "unique combinations" are all of your records. But I think you mean one line per Source, so it is:
SELECT *
FROM tab t1
WHERE t1.Dest IN
(
SELECT TOP 1 t2.Dest
FROM tab t2
WHERE t1.Source = t2.Source
)
SELECT t1.* FROM
(SELECT
LEAST(Source, Dest) AS min_val,
GREATEST(Source, Dest) AS max_val
FROM table_name) AS t1
GROUP BY t1.min_val, t1.max_val
This will return the following in MySQL:
1, 2
1, 3
To eliminate duplicates, "select distinct" is easier than "group by":
select distinct source,dest from data;
EDIT: I see now that you're trying to get unique combinations (not including both 1,2 and 2,1). You can do that like this:
select distinct source,dest from data
minus
select dest,source from data where source < dest
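The second query flips each pair around, and "minus" removes those flipped pairs from the first result, eliminating cases where you already have a match; the "where source < dest" keeps you from removing both (1,2) and (2,1). Note that MINUS is Oracle syntax (SQL Server and standard SQL call it EXCEPT), and Access 2003 supports neither.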
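Here is a minimal sketch of the same set-difference idea in SQLite, via Python's sqlite3 module, where the operator is spelled EXCEPT. EXCEPT also removes duplicate rows, so the repeated (1, 2) appears once. The Data rows are the ones from the question.

```python
import sqlite3

DATA = [(1, 2), (1, 2), (2, 1), (3, 1)]

# EXCEPT is SQLite's (and standard SQL's) spelling of Oracle's MINUS.
QUERY = """
SELECT Source, Dest FROM Data
EXCEPT
SELECT Dest, Source FROM Data WHERE Source < Dest
ORDER BY 1, 2
"""


def unordered_pairs_except(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Data (Source INTEGER, Dest INTEGER)")
    conn.executemany("INSERT INTO Data VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(unordered_pairs_except(DATA))  # [(1, 2), (3, 1)]
```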
Use this query:
SELECT DISTINCT * FROM tabval;