Determine cluster of access time within 10min intervals per user per day in SQL Server - sql

How can I write a SQL query that groups (clusters) the access_time values from the sample data per user per day within 10-minute intervals?

This is a complete guess, based on reading between the lines, and is untested due to a lack of consumable sample data.
It, however, looks like you are after a triangular JOIN (these can perform poorly, especially as this won't be SARGable) and a DENSE_RANK:
SELECT YT.[date],
YT.User_ID,
YT2.AccessTime,
DENSE_RANK() OVER (PARTITION BY YT.[date], YT.User_ID ORDER BY YT.AccessTime) AS Cluster
FROM dbo.YourTable YT
JOIN dbo.YourTable YT2 ON YT.[date] = YT2.[date]
AND YT.User_ID = YT2.User_ID
AND YT.AccessTime <= YT2.AccessTime --This will join the row to itself
AND DATEADD(MINUTE,10,YT.AccessTime) >= YT2.AccessTime; --That is intentional

If I have understood your problem, you want to group a user's accesses on a given day whenever all accesses in the group fall within a 10-minute interval. Single accesses do not count: an access more than 10 minutes away from every other access is not a cluster.
You can identify the clusters by joining the access table with itself to get every pair of accesses less than 10 minutes apart, and numbering those pairs.
Finally, join the access table again to get the accesses belonging to each cluster:
; with
user_clusters as (
select a1.date, a1.user_id, a1.access_time cluster_start, a2.access_time cluster_end,
ROW_NUMBER() over (partition by a1.date, a1.user_id order by a1.access_time) user_cluster_id
from ACCESS_TIMES a1
join ACCESS_TIMES a2 on a1.date = a2.date and a1.user_id = a2.user_id
and a1.access_time < a2.access_time
and datediff(minute, a1.access_time, a2.access_time)<10
)
select *
from user_clusters c
join ACCESS_TIMES a on a.date = c.date and a.user_id = c.user_id and a.access_time between c.cluster_start and cluster_end
order by a.date, a.user_id, c.user_cluster_id, a.access_time
output:
date user_id access_time user_cluster_id
'2020-09-19', 'AA083P', '2020-09-19 18:15:00', 1
'2020-09-19', 'AA083P', '2020-09-19 18:22:00', 1
'2020-09-19', 'AA083P', '2020-09-19 18:22:00', 2
'2020-09-19', 'AA083P', '2020-09-19 18:28:00', 2
'2020-09-20', 'AB162Y', '2020-09-20 19:34:00', 1
'2020-09-20', 'AB162Y', '2020-09-20 19:37:00', 1
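For anyone who wants to try the self-join approach above without a SQL Server instance, here is a minimal sketch that runs the same idea in SQLite through Python's sqlite3 module. The rows are reconstructed from the output shown above, plus one isolated access that is made up for illustration. The seconds-based comparison with strftime stands in for DATEDIFF and is an assumption: SQL Server's DATEDIFF(minute, ...) counts minute boundaries, so results can differ at the edges. ROW_NUMBER() needs SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE access_times (date TEXT, user_id TEXT, access_time TEXT)")
conn.executemany("INSERT INTO access_times VALUES (?, ?, ?)", [
    ("2020-09-19", "AA083P", "2020-09-19 18:15:00"),
    ("2020-09-19", "AA083P", "2020-09-19 18:22:00"),
    ("2020-09-19", "AA083P", "2020-09-19 18:28:00"),
    ("2020-09-19", "AA083P", "2020-09-19 21:00:00"),  # isolated access (made up)
    ("2020-09-20", "AB162Y", "2020-09-20 19:34:00"),
    ("2020-09-20", "AB162Y", "2020-09-20 19:37:00"),
])

query = """
WITH user_clusters AS (
    -- every pair of accesses by the same user on the same day, under 600 s apart
    SELECT a1.date, a1.user_id,
           a1.access_time AS cluster_start,
           a2.access_time AS cluster_end,
           ROW_NUMBER() OVER (PARTITION BY a1.date, a1.user_id
                              ORDER BY a1.access_time) AS user_cluster_id
    FROM access_times a1
    JOIN access_times a2
      ON a1.date = a2.date AND a1.user_id = a2.user_id
     AND a1.access_time < a2.access_time
     AND strftime('%s', a2.access_time) - strftime('%s', a1.access_time) < 600
)
-- join back to list the accesses that fall inside each cluster window
SELECT a.date, a.user_id, a.access_time, c.user_cluster_id
FROM user_clusters c
JOIN access_times a
  ON a.date = c.date AND a.user_id = c.user_id
 AND a.access_time BETWEEN c.cluster_start AND c.cluster_end
ORDER BY a.date, a.user_id, c.user_cluster_id, a.access_time
"""
clusters = conn.execute(query).fetchall()
for row in clusters:
    print(row)
```

With this data the 21:00 access joins to nothing and drops out, while the 18:22 access appears in two overlapping clusters, as in the output above.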

Related

Calculate time span between two specific statuses on the database for each ID

I have a table in the database that contains the status updates for each of my vehicles. I want to calculate how many days each vehicle spends between two specific statuses, 'Maintenance' and 'Ready'.
My table looks something like this
and I want the result to look like this, showing only the number of days a vehicle spends in maintenance before becoming ready on a specific day
The code I have written looks like this
drop table if exists #temps1
select
VehicleId,
json_value(VehiclesHistoryStatusID.text,'$.en') as VehiclesHistoryStatus,
VehiclesHistory.CreationTime,
datediff(day, VehiclesHistory.CreationTime ,
lead(VehiclesHistory.CreationTime ) over (order by VehiclesHistory.CreationTime ) ) as days,
lag(json_value(VehiclesHistoryStatusID.text,'$.en')) over (order by VehiclesHistory.CreationTime) as PrevStatus,
case
when (lag(json_value(VehiclesHistoryStatusID.text,'$.en')) over (order by VehiclesHistory.CreationTime) <> json_value(VehiclesHistoryStatusID.text,'$.en')) THEN datediff(day, VehiclesHistory.CreationTime , (lag(VehiclesHistory.CreationTime ) over (order by VehiclesHistory.CreationTime ))) else 0 end as testing
into #temps1
from fleet.VehicleHistory VehiclesHistory
left join Fleet.Lookups as VehiclesHistoryStatusID on VehiclesHistoryStatusID.Id = VehiclesHistory.StatusId
where (year(VehiclesHistory.CreationTime) > 2021 and (VehiclesHistory.StatusId = 140 Or VehiclesHistory.StatusId = 144) )
group by VehiclesHistory.VehicleId ,VehiclesHistory.CreationTime , VehiclesHistoryStatusID.text
order by VehicleId desc
drop table if exists #temps2
select * into #temps2 from #temps1 where testing <> 0
select * from #temps2
Try this
SELECT innerQ.VehichleID,innerQ.CreationDate,innerQ.Status
,SUM(DATEDIFF(DAY,innerQ.PrevMaintenance,innerQ.CreationDate)) AS DayDuration
FROM
(
SELECT t1.VehichleID,t1.CreationDate,t1.Status,
(SELECT top(1) t2.CreationDate FROM dbo.Test t2
WHERE t1.VehichleID=t2.VehichleID
AND t2.CreationDate<t1.CreationDate
AND t2.Status='Maintenance'
ORDER BY t2.CreationDate Desc) AS PrevMaintenance
FROM
dbo.Test t1 WHERE t1.Status='Ready'
) innerQ
WHERE innerQ.PrevMaintenance IS NOT NULL
GROUP BY innerQ.VehichleID,innerQ.CreationDate,innerQ.Status
This query first finds, in the innermost subquery, the most recent 'Maintenance' date before each 'Ready' date (if one exists). It then calculates the time span with DATEDIFF and sums these spans for each vehicle and 'Ready' date.
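The "most recent Maintenance row before each Ready row" lookup can be tried out with a small sketch in SQLite via Python's sqlite3 module. Everything here is an assumption for illustration: the table and column names, the sample rows, and the use of julianday() in place of SQL Server's DATEDIFF (ORDER BY ... LIMIT 1 plays the role of TOP(1)).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (vehicle_id INTEGER, creation_date TEXT, status TEXT)")
conn.executemany("INSERT INTO test VALUES (?, ?, ?)", [
    (1, "2022-01-01", "Maintenance"),
    (1, "2022-01-05", "Ready"),
    (1, "2022-02-10", "Maintenance"),
    (1, "2022-02-12", "Ready"),
    (2, "2022-03-01", "Ready"),  # no earlier Maintenance row, so it is filtered out
])

query = """
SELECT q.vehicle_id, q.creation_date,
       CAST(julianday(q.creation_date) - julianday(q.prev_maintenance) AS INTEGER)
           AS day_duration
FROM (
    SELECT t1.vehicle_id, t1.creation_date,
           -- latest Maintenance date strictly before this Ready date
           (SELECT t2.creation_date
            FROM test t2
            WHERE t2.vehicle_id = t1.vehicle_id
              AND t2.creation_date < t1.creation_date
              AND t2.status = 'Maintenance'
            ORDER BY t2.creation_date DESC
            LIMIT 1) AS prev_maintenance
    FROM test t1
    WHERE t1.status = 'Ready'
) q
WHERE q.prev_maintenance IS NOT NULL
ORDER BY q.vehicle_id, q.creation_date
"""
durations = conn.execute(query).fetchall()
for row in durations:
    print(row)
```

Each Ready row is paired with only its nearest preceding Maintenance row, so repeated maintenance cycles for the same vehicle are reported separately.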

Creating average for specific timeframe

I'm setting up a time series with each row = 1 hr.
The input data has sometimes multiple values per hour. This can vary.
Right now the specific code looks like this:
select
patientunitstayid
, generate_series(ceil(min(nursingchartoffset)/60.0),
ceil(max(nursingchartoffset)/60.0)) as hr
, avg(case when nibp_systolic >= 1 and nibp_systolic <= 250 then
nibp_systolic else null end) as nibp_systolic_avg
from nc
group by patientunitstayid
order by patientunitstayid asc;
and generates this data:
It takes the average of the entire time series for each patient instead of taking it for each hour. How can I fix this?
I'm expecting something like this:
select nc.patientunitstayid, gs.hr,
avg(case when nc.nibp_systolic >= 1 and nc.nibp_systolic <= 250
then nibp_systolic
end) as nibp_systolic_avg
from (select nc.*,
min(nursingchartoffset) over (partition by patientunitstayid) as min_nursingchartoffset,
max(nursingchartoffset) over (partition by patientunitstayid) as max_nursingchartoffset
from nc
) nc cross join lateral
generate_series(ceil(min_nursingchartoffset/60.0),
ceil(max_nursingchartoffset/60.0)
) as gs(hr)
group by nc.patientunitstayid, hr
order by nc.patientunitstayid asc, hr asc;
That is, you need to be aggregating by hr. I put this into the from clause, to highlight that this generates rows. If you are using an older version of Postgres, then you might not have lateral joins. If so, just use a subquery in the from clause.
EDIT:
You can also try:
from (select nc.*,
generate_series(ceil(min(nursingchartoffset) over (partition by patientunitstayid) / 60.0),
ceil(max(nursingchartoffset) over (partition by patientunitstayid)/ 60.0)
) hr
from nc
) nc
And adjust the references to hr in the outer query.
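To see the "aggregate by hr" idea on a tiny data set, here is a rough sketch in SQLite via Python's sqlite3 module. It is not the Postgres query above: a recursive CTE stands in for generate_series, ceil is registered from Python because SQLite builds may lack it, and the sample rows are made up. Each reading is joined to the hour bucket it falls in, so hours without readings come back as NULL.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
conn.create_function("ceil", 1, math.ceil)  # assumption: provide ceil() ourselves
conn.execute(
    "CREATE TABLE nc (patientunitstayid INTEGER, nursingchartoffset INTEGER, "
    "nibp_systolic REAL)"
)
conn.executemany("INSERT INTO nc VALUES (?, ?, ?)", [
    (1, 30, 120.0), (1, 45, 130.0),  # both fall in hour 1
    (1, 170, 110.0),                 # hour 3 (hour 2 has no readings)
    (1, 175, 999.0),                 # out of range, ignored by the CASE
    (2, 60, 100.0),                  # hour 1
])

query = """
WITH RECURSIVE bounds AS (
    SELECT patientunitstayid,
           ceil(min(nursingchartoffset) / 60.0) AS first_hr,
           ceil(max(nursingchartoffset) / 60.0) AS last_hr
    FROM nc
    GROUP BY patientunitstayid
),
hours(patientunitstayid, hr, last_hr) AS (
    -- one row per patient per hour between first_hr and last_hr
    SELECT patientunitstayid, first_hr, last_hr FROM bounds
    UNION ALL
    SELECT patientunitstayid, hr + 1, last_hr FROM hours WHERE hr < last_hr
)
SELECT h.patientunitstayid, h.hr,
       avg(CASE WHEN nc.nibp_systolic BETWEEN 1 AND 250
                THEN nc.nibp_systolic END) AS nibp_systolic_avg
FROM hours h
LEFT JOIN nc
  ON nc.patientunitstayid = h.patientunitstayid
 AND ceil(nc.nursingchartoffset / 60.0) = h.hr
GROUP BY h.patientunitstayid, h.hr
ORDER BY h.patientunitstayid, h.hr
"""
hourly = conn.execute(query).fetchall()
for row in hourly:
    print(row)
```

The LEFT JOIN keeps the empty hour 2 for patient 1 in the result; an inner join would drop it.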

Joining Tables on Time, IF NULL edit time by 1 minute

I have two tables.
Table 1 = My Trades
Table 2 = Market Trades
I want to query the market trade 1 minute prior to my trade. If there is no market trade in Table 2 that is 1 minute before mine, then I want to look back 2 minutes, and so on, until I have a match.
Right now my query gets me the trade 1 minute before, but I can't figure out how to look 2 minutes back if that is NULL, or 3 minutes back if that is NULL too (up to 30 minutes). I think it would be best to use a variable, but I'm not sure of the best way to approach this.
Select
A.Ticker
,a.date_time
,CONVERT(CHAR(16),a.date_time - '00:01',120) AS '1MINCHANGE'
,A.Price
,B.Date_time
,B.Price
FROM
Trade..MyTrade as A
LEFT JOIN Trade..Market as B
on (a.ticker = b.ticker)
and (CONVERT(CHAR(16),a.date_time - '00:01',120) = b.Date_time)
There is no great way to do this in MySQL. But, because your code looks like SQL Server, I'll show that solution here, using APPLY:
select t.Ticker,
convert(CHAR(16), t.date_time - '00:01', 120) AS '1MINCHANGE',
t.Price,
m.Date_time,
m.Price
from Trade..MyTrade as t outer apply
(select top 1 m.*
from Trade..Market m
where m.ticker = t.ticker and
convert(CHAR(16), t.date_time - '00:01', 120) >= m.Date_time
order by m.Date_time desc
) m;
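The "latest market trade at least one minute earlier" lookup can be sketched in SQLite through Python's sqlite3 module. This is only an illustration under assumptions: the table names and rows are invented, a correlated subquery with ORDER BY ... LIMIT 1 stands in for OUTER APPLY with TOP 1, and the 30-minute lower bound is taken from the "up to 30 minutes" remark in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE my_trade (ticker TEXT, date_time TEXT, price REAL);
CREATE TABLE market   (ticker TEXT, date_time TEXT, price REAL);
INSERT INTO my_trade VALUES ('ABC', '2020-01-02 10:05', 10.5),
                            ('XYZ', '2020-01-02 11:00', 20.0);
INSERT INTO market VALUES ('ABC', '2020-01-02 10:01', 10.1),
                          ('ABC', '2020-01-02 10:02', 10.2),
                          ('ABC', '2020-01-02 10:05', 10.6),
                          ('XYZ', '2020-01-02 09:00', 19.0);
""")

query = """
SELECT t.ticker, t.date_time, t.price,
       -- newest market trade between 30 minutes and 1 minute before my trade
       (SELECT m.date_time
        FROM market m
        WHERE m.ticker = t.ticker
          AND m.date_time <= strftime('%Y-%m-%d %H:%M', t.date_time, '-1 minutes')
          AND m.date_time >= strftime('%Y-%m-%d %H:%M', t.date_time, '-30 minutes')
        ORDER BY m.date_time DESC
        LIMIT 1) AS market_time
FROM my_trade t
ORDER BY t.ticker
"""
matches = conn.execute(query).fetchall()
for row in matches:
    print(row)
```

Because the subquery takes the newest qualifying row, there is no need to retry at 2, 3, ... minutes: a single range condition covers the whole look-back window, and trades with no match in that window get NULL.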

sql to select first n unique lines on sorted result

I have a query that returns one column of strings. Example result:
NAME:
-----
SOF
OTP
OTP
OTP
SOF
VIL
OTP
SOF
GGG
I want to get SOF, OTP, VIL: the first 3 unique values from the top.
I tried using DISTINCT and GROUP BY, but it does not work because the sort order is lost.
The query building this result is :
SELECT DISTINCT d.adst
FROM (SELECT a.date adate,
b.date bdate,
a.price + b.price total,
( b.date - a.date ) days,
a.dst adst
FROM flights a
JOIN flights b
ON a.dst = b.dst
ORDER BY total) d
I have "flights" table with details, and I need to get the 3 (=n) cheapest destinations.
Thanks
This can easily be done using window functions:
select *
from (
SELECT a.date as adate,
b.date as bdate,
a.price + b.price as total,
dense_rank() over (order by a.price + b.price) as rnk,
b.date - a.date as days,
a.dst as adst
FROM flights a
JOIN flights b ON a.dst = b.dst
) t
where rnk <= 3
order by rnk;
More details on window functions can be found in the manual:
http://www.postgresql.org/docs/current/static/tutorial-window.html
Found a way to do it.
I am selecting the DST and the PRICE, grouping by DST with the MIN function on the price, and limiting to 3.
Is there a better way to do it?
SELECT d.adst , min(d.total) mttl
FROM (SELECT a.date adate,
b.date bdate,
a.price + b.price total,
( b.date - a.date ) days,
a.dst adst
FROM flights a
JOIN flights b
ON a.dst = b.dst
ORDER BY total) d
group by adst order by mttl limit 3;
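The group-by-destination idea can be checked quickly with a sketch in SQLite via Python's sqlite3 module. The rows are invented, and the b.date > a.date condition is an extra assumption (the second flight is the return leg) that is not part of the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flights (dst TEXT, date TEXT, price REAL)")
conn.executemany("INSERT INTO flights VALUES (?, ?, ?)", [
    ("SOF", "2020-01-01", 50), ("SOF", "2020-01-08", 60),
    ("OTP", "2020-01-02", 70), ("OTP", "2020-01-09", 55),
    ("VIL", "2020-01-03", 90), ("VIL", "2020-01-10", 80),
    ("GGG", "2020-01-04", 200), ("GGG", "2020-01-11", 150),
])

query = """
SELECT a.dst, MIN(a.price + b.price) AS min_total
FROM flights a
JOIN flights b
  ON a.dst = b.dst
 AND b.date > a.date          -- assumption: b is the return flight
GROUP BY a.dst                -- one row per destination
ORDER BY min_total            -- cheapest round trip first
LIMIT 3
"""
cheapest = conn.execute(query).fetchall()
for row in cheapest:
    print(row)
```

Grouping first and sorting on the aggregate avoids the problem from the question, where DISTINCT discarded the ordering by total.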
select
name
from
testname
where
name in (
select distinct(name) from testname)
group by name order by min(ctid) limit 3
You can tweak your query to return the correct result, by adding where days > 0 and limit 3 in the outer query like this:
select *
from
(
select
a.date adate,
b.date bdate,
(a.price + b.price) total,
(b.date - a.date) days ,
a.dst adst
from flights a
join flights b on a.dst = b.dst
order by total
) d
where days > 0
limit 3;
This assumes that the second entry is the return flight, with a date later than the first entry, so that the difference in days is positive.
Note that your query without days > 0 gives you a cross join between the table and itself: for each flight you get 4 rows, two joined with itself with days = 0 and another with negative days, so I used days > 0 to keep the correct row.
I recommend adding a new column, Flight_Id, as a primary key, and a foreign key, something like From_Flight_Id. The primary flight would have a null From_Flight_Id, and the returning flight would have a From_Flight_Id equal to the Flight_Id of the primary flight. This way you can join them properly instead.
SELECT DISTINCT(`EnteredOn`) FROM `rm_pr_patients` Group By `EnteredOn`
SELECT DISTINCT ON (column_name) * FROM table_name ORDER BY column_name LIMIT 3;

Detecting duplicates which fall outside of a date interval

I searched on SO but couldn't find a direct answer.
There are patients, hospitals, medical branches (ER, urology, orthopedics, internal disease, etc.), medical operation codes (examination, surgical operation, MRI, ultrasound, or something else) and patient visit dates.
A patient visits a doctor; the doctor prescribes medicine and asks the patient to come again for a control check.
If the patient returns after more than 10 days, (s)he has to pay another examination fee to the same hospital. Hospitals may schedule the appointment later than 10 days, claiming there are no available slots in the following 10 days, in order to collect the examination fee.
Table structure is like:
Patient id.no  Hospital  Medical Branch  Medical Op. Code  Date
1              H1        M0              P1                01/05/2011
5              H1        M1              P9                03/05/2011
3              H2        M0              P2                09/05/2011
1              H1        M0              P1                14/05/2011
3              H1        M0              P2                20/05/2011
5              H1        M2              P9                25/05/2011
1              H1        M0              P3                26/05/2011
Here, the visits of patients no. 3 and 5 are not a problem, as patient no. 3 visits different hospitals and patient no. 5 visits different medical branches. They would pay the examination fee even if they visited within 10 days.
Patient no. 1, however, visits the same hospital and the same branch, and is subject to the same process (P1: examination), on 01/05 and 14/05.
26/05 doesn't count because it is not a medical examination.
What I want to flag is the same patient, same hospital, same branch and same medical operation code (specifically medical examination: P1), with a date gap of more than 10 days.
The format of resulting table:
HOSPITAL  TOTAL NUM. of PATIENTS  NUM. of PATIENTS OUT OF DATE RANGE
H1        x                       a
H2        y                       b
H3        z                       c
Thanks.
Once again, it's analytic functions to the rescue.
This query uses the LAG() function to link a record in YOUR_TABLE with the previous (defined by DATE) matching record (defined by PATIENT_ID) in the table.
select hospital_id
, count(*) as total_num_of_patients
, sum (out_of_range) as num_of_patients_out_of_range
from (
select patient_id
, hospital_id_1 as hospital_id
, case
when hospital_id_1 = hospital_id_0
and visit_1 > visit_0 + 10
and med_op_code_1 = med_op_code_0
then 1
else 0
end as out_of_range
from (
select patient_id
, hospital_id as hospital_id_1
, date as visit_1
, med_op_code as med_op_code_1
, lag (date) over (partition by patient_id order by date) as visit_0
, lag (hospital_id) over (partition by patient_id order by date) as hospital_id_0
, lag (med_op_code) over (partition by patient_id order by date) as med_op_code_0
from your_table
where med_op_code = 'P1'
)
)
group by hospital_id
/
Caveat: I haven't tested this code, so it may contain syntax errors. I will check it the next time I can access an Oracle database.
This is a little rough, as I haven't got an Oracle DB to hand, but the key feature is the same: the analytical function LAG(). Along with its companion function, LEAD(), they're great for helping to deal with things like periods of activity.
Here's my attempt at the code:
select n.hospital, COUNT(n.patient_id) as patients_out_of_date_range
from (
select *
from (
select d.*, lag(date, 1) over (partition by d.patient_id, d.hospital, d.medical_branch, d.medical_op_code order by d.date) as prev_date
from datatable d inner join
(
select d.patient_id, d.hospital, d.medical_branch, d.medical_op_code
from datatable d
where d.medical_op_code = 'P1'
group by d.patient_id, d.hospital, d.medical_branch, d.medical_op_code
having COUNT(d.date) > 1
) t on d.patient_id = t.patient_id and d.hospital = t.hospital and d.medical_branch = t.medical_branch and d.medical_op_code = t.medical_op_code
) m
where date - prev_date > 10
) n
group by n.hospital
Like I say, this isn't tested, but it should at least get you started in the right direction.
Some references:
http://www.adp-gmbh.ch/ora/sql/analytical/lag.html
http://www.oracle-base.com/articles/misc/LagLeadAnalyticFunctions.php
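As a rough, runnable illustration of the LAG() idea from the answers above, here is a sketch in SQLite via Python's sqlite3 module rather than Oracle. The rows come from the sample table in the question (dates rewritten as ISO strings); the table and column names, the julianday() date arithmetic and the choice to count distinct patients per hospital are assumptions. Window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE visits (patient_id INTEGER, hospital TEXT, branch TEXT, "
    "op_code TEXT, visit_date TEXT)"
)
conn.executemany("INSERT INTO visits VALUES (?, ?, ?, ?, ?)", [
    (1, "H1", "M0", "P1", "2011-05-01"),
    (5, "H1", "M1", "P9", "2011-05-03"),
    (3, "H2", "M0", "P2", "2011-05-09"),
    (1, "H1", "M0", "P1", "2011-05-14"),
    (3, "H1", "M0", "P2", "2011-05-20"),
    (5, "H1", "M2", "P9", "2011-05-25"),
    (1, "H1", "M0", "P3", "2011-05-26"),
])

query = """
WITH exams AS (
    -- previous examination by the same patient at the same hospital and branch
    SELECT patient_id, hospital, visit_date,
           LAG(visit_date) OVER (PARTITION BY patient_id, hospital, branch
                                 ORDER BY visit_date) AS prev_date
    FROM visits
    WHERE op_code = 'P1'
),
late AS (
    SELECT hospital, COUNT(DISTINCT patient_id) AS late_patients
    FROM exams
    WHERE julianday(visit_date) - julianday(prev_date) > 10
    GROUP BY hospital
),
totals AS (
    SELECT hospital, COUNT(DISTINCT patient_id) AS total_patients
    FROM visits
    GROUP BY hospital
)
SELECT t.hospital, t.total_patients, COALESCE(l.late_patients, 0)
FROM totals t
LEFT JOIN late l ON l.hospital = t.hospital
ORDER BY t.hospital
"""
report = conn.execute(query).fetchall()
for row in report:
    print(row)
```

With the sample data only patient 1 is flagged, for the two P1 examinations at H1 that are 13 days apart.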
I think this is what you're trying for:
WITH Patient_Visits (Patient_Id, Hospital_Id, Branch_Id, Visit_Date, Visit_Order) as (
SELECT Patient_Id, Hospital_Id, Branch_Id, Visit_Date,
ROW_NUMBER() OVER(PARTITION BY Patient_Id, Hospital_Id, Branch_Id
ORDER BY Visit_Date)
FROM Hospital_Visits
WHERE Procedure_Id = 'P1'),
Hospital_Recent_Visits (Hospital_Id, Recent_Visitor_Count) as (
SELECT a.Hospital_Id, COUNT(DISTINCT a.Patient_Id)
FROM Patient_Visits as a
JOIN Patient_Visits as b
ON b.Hospital_Id = a.Hospital_Id
AND b.Branch_Id = a.Branch_Id
AND b.Patient_Id = a.Patient_Id
AND b.Visit_Order = a.Visit_Order - 1
AND b.Visit_Date + 10 > a.Visit_Date
GROUP BY a.Hospital_Id),
Hospital_Patient_Count (Hospital_Id, Patient_Count) as (
SELECT Hospital_Id, COUNT(DISTINCT Patient_Id)
FROM Hospital_Visits
GROUP BY Hospital_Id)
SELECT a.Hospital_Id, b.Patient_Count, c.Recent_Visitor_Count
FROM Hospitals as a
LEFT JOIN Hospital_Patient_Count as b
ON b.Hospital_Id = a.Hospital_Id
LEFT JOIN Hospital_Recent_Visits as c
ON c.Hospital_id = a.Hospital_Id
Please note that this was written and tested against a DB2 system. I think Oracle databases have the relevant functionality, so the query should still work as written. However, DB2 appears to lack some of the OLAP functions Oracle has (my version, at least), which could be useful in knocking out some of the CTEs.