SQL query to find customers who order too frequently?

My database isn't actually customers and orders, it's customers and prescriptions for their eye tests (just in case anyone was wondering why I'd want my customers to make orders less frequently!)
I have a database for a chain of opticians. The prescriptions table has the branch ID number, the patient ID number, and the date they had their eyes tested. Over time, patients will have more than one eye test listed in the database. How can I get a list of patients who have had a prescription entered on the system more than once within a short period? In other words, where the date of one prescription is, for example, within three months of the date of the previous prescription for the same patient.
Sample data:
Branch Patient DateOfTest
1 1 2007-08-12
1 1 2008-08-30
1 1 2008-08-31
1 2 2006-04-15
1 2 2007-04-12
I don't need to know the actual dates in the result set, and it doesn't have to be exactly three months, just a list of patients who have a prescription too close to the previous prescription. In the sample data given, I want the query to return:
Branch Patient
1 1
This sort of query isn't going to be run very regularly, so I'm not overly bothered about efficiency. On our live database I have a quarter of a million records in the prescriptions table.

Something like this:
select distinct p1.branch, p1.patient
from prescription p1, prescription p2
where p1.patient=p2.patient
and p1.dateoftest > p2.dateoftest
and datediff(day, p2.dateoftest, p1.dateoftest) < 90;
should do. The distinct stops a patient being listed once per matching pair of tests. You might want to add
and p1.dateoftest > getdate()
to limit to future test prescriptions.
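To check the self-join idea above against the sample data from the question, here is a minimal sketch that runs it in an in-memory SQLite database from Python. SQLite has no DATEDIFF, so julianday() subtraction stands in for it; that substitution, the ISO date strings and the 90-day threshold are assumptions made for the illustration, not part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE prescription (branch INTEGER, patient INTEGER, dateoftest TEXT)"
)
# Sample rows copied from the question.
conn.executemany(
    "INSERT INTO prescription VALUES (?, ?, ?)",
    [
        (1, 1, "2007-08-12"),
        (1, 1, "2008-08-30"),
        (1, 1, "2008-08-31"),
        (1, 2, "2006-04-15"),
        (1, 2, "2007-04-12"),
    ],
)

# julianday(a) - julianday(b) gives the gap in days (SQLite's stand-in
# for SQL Server's DATEDIFF(day, b, a)).
too_close = conn.execute(
    """
    SELECT DISTINCT p1.branch, p1.patient
    FROM prescription p1
    JOIN prescription p2
      ON p1.patient = p2.patient
     AND p1.dateoftest > p2.dateoftest
     AND julianday(p1.dateoftest) - julianday(p2.dateoftest) < 90
    ORDER BY p1.branch, p1.patient
    """
).fetchall()

print(too_close)  # [(1, 1)]
```

Only patient 1 comes back, because the tests on 2008-08-30 and 2008-08-31 are one day apart, while patient 2's tests are almost a year apart.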

This one will efficiently use an index on (Branch, Patient, DateOfTest), which you should of course have:
SELECT Patient, DateOfTest, pDate
FROM (
SELECT p.Patient, p.DateOfTest, (
SELECT TOP 1 DateOfTest
FROM Patients pp
WHERE pp.Branch = p.Branch
AND pp.Patient = p.Patient
AND pp.DateOfTest >= DATEADD(month, -3, p.DateOfTest)
AND pp.DateOfTest < p.DateOfTest
ORDER BY
DateOfTest DESC
) pDate
FROM Patients p
) po
WHERE pDate IS NOT NULL

One way:
select d.branch, d.patient
from data d
where exists
( select null from data d1
where d1.branch = d.branch
and d1.patient = d.patient
and "difference (d1.dateoftest ,d.dateoftest) < 6 months"
);
This part needs changing - I'm not familiar with SQL Server's date operations:
"difference (d1.dateoftest ,d.dateoftest) < 6 months"

Self-join:
select a.branch, a.patient
from prescriptions a
join prescriptions b
on a.branch = b.branch
and a.patient = b.patient
and a.dateoftest > b.dateoftest
and datediff(day, b.dateoftest, a.dateoftest) < 180
group by a.branch, a.patient
This assumes you want patients who visit the same branch twice. If you don't, take out the branch part.

SELECT DISTINCT Branch
,Patient
FROM (SELECT P1.Branch
,P1.Patient
,P1.DateOfTest
,P2.DateOfTest AS DateOfOtherTest
FROM Prescriptions P1
JOIN Prescriptions P2
ON P2.Branch = P1.Branch
AND P2.Patient = P1.Patient
AND P2.DateOfTest > P1.DateOfTest
) AS SubQuery
WHERE DATEDIFF(day, SubQuery.DateOfTest, SubQuery.DateOfOtherTest) < 90

Related

concurrent bookings

I have a table on booking orders
Bookings (order_no, user_id, booking_time,complete_time)
I try to write a query to return the order_no from all rows where customers made concurrent bookings (customer made a new booking before they completed the previous booking).
Explanation:
Customer X booked #000 at 1:15, and completed it at 1:25.
Customer X booked #001 at 1:20, and completed it at 1:25.
Customer X booked #002 at 5:30, and completed it at 6:00.
Customer Y booked #020 at 1:20, and completed it at 2:10.
Customer Y booked #021 at 6:55, and completed it at 7:16.
Only Customer X had a concurrent booking. The correct query would return order_no #000 and #001.
Output should be
000
001
I have tried using a subquery in the criteria, but I still can't work out the logic.
Any help with this would be appreciated.
If you want both bookings on separate rows, then one method is window functions:
select b.*
from (select b.*,
lag(booking_time) over (partition by user_id order by booking_time) as prev_booking_time,
lead(booking_time) over (partition by user_id order by booking_time) as next_booking_time,
lag(coalesce(complete_time, cancel_time)) over (partition by user_id order by booking_time) as prev_end_time
from bookings b
) b
where (next_booking_time >= booking_time and
next_booking_time < coalesce(complete_time, cancel_time)
) or
(booking_time > prev_booking_time and
booking_time < prev_end_time
);
If you want the overlaps on one row, then you can do:
select b1.*, b2.*
from bookings b1 join
bookings b2
on b2.user_id = b1.user_id and
b2.order_no <> b1.order_no and
b2.booking_time >= b1.booking_time and
(b2.booking_time <= b1.complete_time or
b2.booking_time <= b1.cancel_time
);
Note that for multiple overlaps on the same booking, this produces a row for each pair.
This is just the overlapping date range problem. You may solve this via a self join:
SELECT b1.*
FROM Bookings b1
INNER JOIN Bookings b2
ON b1.user_id = b2.user_id AND
b1.order_no <> b2.order_no
WHERE
b2.booking_time < b1.complete_time AND
b2.complete_time > b1.booking_time;
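The overlap self-join above can be tried against the five bookings from the question. This is a small sketch using Python's sqlite3 module; storing the times as zero-padded "HH:MM" strings (so that plain string comparison orders them correctly) and adding DISTINCT and ORDER BY are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Bookings "
    "(order_no TEXT, user_id TEXT, booking_time TEXT, complete_time TEXT)"
)
# Rows from the question; times are zero-padded so they compare as text.
conn.executemany(
    "INSERT INTO Bookings VALUES (?, ?, ?, ?)",
    [
        ("000", "X", "01:15", "01:25"),
        ("001", "X", "01:20", "01:25"),
        ("002", "X", "05:30", "06:00"),
        ("020", "Y", "01:20", "02:10"),
        ("021", "Y", "06:55", "07:16"),
    ],
)

# Two bookings overlap when each one starts before the other one ends.
concurrent = [
    row[0]
    for row in conn.execute(
        """
        SELECT DISTINCT b1.order_no
        FROM Bookings b1
        JOIN Bookings b2
          ON b1.user_id = b2.user_id
         AND b1.order_no <> b2.order_no
        WHERE b2.booking_time < b1.complete_time
          AND b2.complete_time > b1.booking_time
        ORDER BY b1.order_no
        """
    )
]

print(concurrent)  # ['000', '001']
```

DISTINCT matters once a booking overlaps more than one other booking, since the join would otherwise return it once per overlapping partner.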

Count number of transactions for first 30 days of account creation for all accounts

I want to count the number of transactions for the first 30 days from an account's creation for all accounts. The issue is not all accounts were created at the same time.
Example: [Acct_createdTable]
Acct Created_date
909099 01/02/2015
878787 02/03/2003
676767 09/03/2013
I can't Declare a datetime variable since it can only take one datetime.
and I can't do :
Select acctnumber,min,count(*)
from transaction_table
where transactiondate between (
select Created_date from Acct_createdTable where Acct = 909099)
and (
select Created_date from Acct_createdTable where Acct = 909099)+30
Since then it'll only count the number of transaction for only one acct.
What I want for my output is.
Acct First_30_days_count
909099 23
878787 190
676767 23
I think what you're looking for is a basic GROUP BY query.
SELECT
ac.Acct,
COUNT(td.acctnumber) AS First_30_days_count
FROM Acct_createdTable ac
LEFT JOIN transaction_table td ON
td.acctnumber = ac.Acct
AND
td.transactiondate BETWEEN ac.Created_date AND DATEADD(DAY, 30, ac.Created_date)
GROUP BY
ac.Acct
This should return number of transactions within first 30 days for each account. This of course is pseudocode as you didn't state your database platform. The left join will ensure that accounts with no transactions in that period will get displayed.
An alternative solution would be to use outer apply like this:
select a.acct, o.First_30_days_count
from acct_createdtable a
outer apply (
select count(*) First_30_days_count
from transaction_table
where acctnumber = a.acct
and transactiondate between a.created_date and dateadd(day, 30, a.created_date)
) o;
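As a quick check of the per-account counting idea, the sketch below runs the LEFT JOIN plus GROUP BY form in SQLite via Python. The transaction rows are made up for the illustration, the created dates are written as ISO strings (one possible reading of the question's ambiguous dates), and date(x, '+30 days') is SQLite's counterpart to DATEADD; all of these are assumptions rather than details from the original posts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Acct_createdTable (Acct INTEGER, Created_date TEXT);
    CREATE TABLE transaction_table (acctnumber INTEGER, transactiondate TEXT);
    """
)
conn.executemany(
    "INSERT INTO Acct_createdTable VALUES (?, ?)",
    [(909099, "2015-01-02"), (878787, "2003-02-03"), (676767, "2013-09-03")],
)
# Hypothetical transactions, chosen to hit the window edges.
conn.executemany(
    "INSERT INTO transaction_table VALUES (?, ?)",
    [
        (909099, "2015-01-02"),  # creation day: counted
        (909099, "2015-01-20"),  # inside the window: counted
        (909099, "2015-02-01"),  # exactly 30 days later: counted (BETWEEN is inclusive)
        (909099, "2015-02-02"),  # day 31: not counted
        (878787, "2003-02-10"),  # counted
        (878787, "2003-06-01"),  # far outside the window
    ],
)

counts = conn.execute(
    """
    SELECT a.Acct, COUNT(t.acctnumber) AS First_30_days_count
    FROM Acct_createdTable a
    LEFT JOIN transaction_table t
      ON t.acctnumber = a.Acct
     AND t.transactiondate BETWEEN a.Created_date
                               AND date(a.Created_date, '+30 days')
    GROUP BY a.Acct
    ORDER BY a.Acct
    """
).fetchall()

print(counts)  # [(676767, 0), (878787, 1), (909099, 3)]
```

Because the date filter sits in the ON clause rather than in WHERE, account 676767 still appears with a count of zero.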

Counting concurrent records based on startdate and enddate columns

The table structure:
StaffingRecords
PersonnelId int
GroupId int
StaffingStartDateTime datetime
StaffingEndDateTime datetime
How can I get a list of staffing records, given a date and a group id that employees belong to, where the count of present employees fell below a threshold, say, 3, at any minute of the day?
The way my brain works, I would call a stored proc repeatedly with each minute of the day, but of course this would be horribly inefficient:
SELECT COUNT(PersonnelId)
FROM DailyRosters
WHERE GroupId=@GroupId
AND StaffingStartTime <= @TimeParam
AND StaffingEndTime > @TimeParam
GROUP BY GroupId
HAVING COUNT(PersonnelId) < 3
Edit: If it helps to refine the question, employees may come and go throughout the day. Personnel may have a staffing record from 0800 - 0815, and another from 1000 - 1045, for example.
Here is a solution where I find all of the distinct start and end times, and then query to see how many other people are clocked in at that time. Every time the answer is less than 4, you know you are understaffed at that time, and presumably until the NEXT start time.
with meaningfulDtms(meaningfulTime, timeType, group_id)
as
(
select distinct StaffingStartTime , 'start' as timeType, group_id
from DailyRosters
union
select distinct StaffingEndTime , 'end' as timeType, group_id
from DailyRosters
)
select COUNT(*), meaningfulDtms.group_id, meaningfulDtms.meaningfulTime
from DailyRosters dr
inner join meaningfulDtms on dr.group_id = meaningfulDtms.group_id
and (
(dr.StaffingStartTime < meaningfulDtms.meaningfulTime
and dr.StaffingEndTime >= meaningfulDtms.meaningfulTime
and meaningfulDtms.timeType = 'start')
OR
(dr.StaffingStartTime <= meaningfulDtms.meaningfulTime
and dr.StaffingEndTime > meaningfulDtms.meaningfulTime
and meaningfulDtms.timeType = 'end')
)
group by meaningfulDtms.group_id, meaningfulDtms.meaningfulTime
having COUNT(*) < 4
Create a table with all minutes in the day, with dt as the PK.
It will have 1440 rows.
This will not give you a count of zero (no staff):
select allMinutes.dt, worktime.grpID, count(distinct(worktime.personID))
from allMinutes
join worktime
on allMinutes.dt > worktime.start
and allMinutes.dt < worktime.end
group by allMinutes.dt, worktime.grpID
having count(distinct(worktime.personID)) < 3
For times with zero staff, I think the best way is a master table of grpID,
but I am not sure about this one:
select allMinutes.dt, grpMaster.grpID, count(distinct(worktime.personID))
from grpMaster
cross join allMinutes
left join worktime
on allMinutes.dt > worktime.start
and allMinutes.dt < worktime.end
and worktime.grpID = grpMaster.grpID
group by allMinutes.dt, grpMaster.grpID
having count(distinct(worktime.personID)) < 3
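The minutes-table approach above, including the group master for zero-staff minutes, can be sketched in SQLite from Python as follows. Minutes are stored as integers 0..1439, the columns are named start_min and end_min (hypothetical names, chosen to avoid the reserved word end), the staffing rows are invented, and the interval is treated as half-open (start inclusive, end exclusive), which differs slightly from the strict comparisons in the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE allMinutes (dt INTEGER PRIMARY KEY);  -- minute of day, 0..1439
    CREATE TABLE grpMaster (grpID INTEGER PRIMARY KEY);
    CREATE TABLE worktime (personID INTEGER, grpID INTEGER,
                           start_min INTEGER, end_min INTEGER);
    """
)
conn.executemany("INSERT INTO allMinutes VALUES (?)", [(m,) for m in range(1440)])
conn.execute("INSERT INTO grpMaster VALUES (1)")
# Hypothetical roster: all three people overlap only from 08:10 to 08:30.
conn.executemany(
    "INSERT INTO worktime VALUES (?, ?, ?, ?)",
    [(1, 1, 480, 540), (2, 1, 480, 510), (3, 1, 490, 540)],
)

understaffed = conn.execute(
    """
    SELECT m.dt, g.grpID, COUNT(DISTINCT w.personID) AS present
    FROM grpMaster g
    CROSS JOIN allMinutes m
    LEFT JOIN worktime w
      ON w.grpID = g.grpID
     AND w.start_min <= m.dt
     AND m.dt < w.end_min
    GROUP BY m.dt, g.grpID
    HAVING COUNT(DISTINCT w.personID) < 3
    ORDER BY m.dt
    """
).fetchall()

minutes = {dt for dt, _, _ in understaffed}
print(len(understaffed))  # 1420 (1440 minutes minus the 20 fully staffed ones)
print(understaffed[0])    # (0, 1, 0): midnight, group 1, nobody present
```

The LEFT JOIN from the cross product of groups and minutes is what lets minutes with no staff at all show up with a count of zero.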

Detecting duplicates which fall outside of a date interval

I searched on SO but couldn't find a direct answer.
There are patients, hospitals, medical branches(ER,urology,orthopedics,internal disease etc), medical operation codes (examination,surgical operation, MRI, ultrasound or sth. else) and patient visiting dates.
Patient visits doctor, doctor prescribes medicine and asks to come again for control check.
If patient returns after 10 days, (s)he has to pay another examination fee to the same hospital. Hospitals may appoint a date after 10 days telling there are no available slots in following 10 days, in order to get the examination fee.
Table structure is like:
Patient id.no Hospital Medical Branch Medical Op. Code Date
1 H1 M0 P1 01/05/2011
5 H1 M1 P9 03/05/2011
3 H2 M0 P2 09/05/2011
1 H1 M0 P1 14/05/2011
3 H1 M0 P2 20/05/2011
5 H1 M2 P9 25/05/2011
1 H1 M0 P3 26/05/2011
Here, the visits of patients no. 3 and 5 do not constitute a problem, as patient no. 3 visits different hospitals and patient no. 5 visits different medical branches. They would pay the examination fee even if they visited within 10 days.
Patient no. 1, however, visits the same hospital and the same branch, and is subject to the same process (P1: examination) on 01/05 and 14/05.
26/05 doesn't count because it is not a medical examination.
What I want to flag is the same patient, same hospital, same branch and same medical operation code (specifically medical examination: P1), with visits more than 10 days apart.
The format of resulting table:
HOSPITAL TOTAL NUM. of PATIENTS NUM. of PATIENTS OUT OF DATE RANGE
H1 x a
H2 y b
H3 z c
Thanks.
Once again, it's analytic functions to the rescue.
This query uses the LAG() function to link a record in YOUR_TABLE with the previous (defined by DATE) matching record (defined by PATIENT_ID) in the table.
select hospital_id
, count(*) as total_num_of_patients
, sum (out_of_range) as num_of_patients_out_of_range
from (
select patient_id
, hospital_id_1 as hospital_id
, case
when hospital_id_1 = hospital_id_0
and visit_1 > visit_0 + 10
and med_op_code_1 = med_op_code_0
then 1
else 0
end as out_of_range
from (
select patient_id
, hospital_id as hospital_id_1
, date as visit_1
, med_op_code as med_op_code_1
, lag (date) over (partition by patient_id order by date) as visit_0
, lag (hospital_id) over (partition by patient_id order by date) as hospital_id_0
, lag (med_op_code) over (partition by patient_id order by date) as med_op_code_0
from your_table
where med_op_code = 'P1'
)
)
group by hospital_id
/
Caveat: I haven't tested this code, so it may contain syntax errors. I will check it the next time I can access an Oracle database.
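For readers without an Oracle database to hand, the LAG() idea can be tried in SQLite (window functions need SQLite 3.25 or later) through Python's sqlite3 module. The sketch below is not the answer's query: it uses the question's seven sample rows with dates rewritten as ISO strings, hypothetical table and column names (visits, branch, op_code, visit_date), a partition that also includes hospital and branch, and counts distinct patients per hospital. All of that is an assumption made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE visits (patient_id INTEGER, hospital TEXT, branch TEXT, "
    "op_code TEXT, visit_date TEXT)"
)
# Sample rows from the question, with dd/mm/yyyy rewritten as yyyy-mm-dd.
conn.executemany(
    "INSERT INTO visits VALUES (?, ?, ?, ?, ?)",
    [
        (1, "H1", "M0", "P1", "2011-05-01"),
        (5, "H1", "M1", "P9", "2011-05-03"),
        (3, "H2", "M0", "P2", "2011-05-09"),
        (1, "H1", "M0", "P1", "2011-05-14"),
        (3, "H1", "M0", "P2", "2011-05-20"),
        (5, "H1", "M2", "P9", "2011-05-25"),
        (1, "H1", "M0", "P3", "2011-05-26"),
    ],
)

report = conn.execute(
    """
    WITH gaps AS (
        -- previous P1 visit by the same patient at the same hospital and branch
        SELECT patient_id, hospital, visit_date,
               LAG(visit_date) OVER (
                   PARTITION BY patient_id, hospital, branch, op_code
                   ORDER BY visit_date
               ) AS prev_date
        FROM visits
        WHERE op_code = 'P1'
    ),
    late AS (
        SELECT hospital, COUNT(DISTINCT patient_id) AS n_late
        FROM gaps
        WHERE julianday(visit_date) - julianday(prev_date) > 10
        GROUP BY hospital
    ),
    totals AS (
        SELECT hospital, COUNT(DISTINCT patient_id) AS n_total
        FROM visits
        GROUP BY hospital
    )
    SELECT t.hospital, t.n_total, COALESCE(l.n_late, 0)
    FROM totals t
    LEFT JOIN late l ON l.hospital = t.hospital
    ORDER BY t.hospital
    """
).fetchall()

print(report)  # [('H1', 3, 1), ('H2', 1, 0)]
```

Patient 1's two P1 visits at H1 are 13 days apart, so H1 reports one patient out of range. A first visit has no previous date, so its gap is NULL and it is filtered out automatically.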
This is a little rough, as I haven't got an Oracle DB to hand, but the key feature is the same: the analytical function LAG(). Along with its companion function, LEAD(), they're great for helping to deal with things like periods of activity.
Here's my attempt at the code:
select n.hospital, COUNT(n.patient_id) as patients_out_of_date_range
from (
select *
from (
select d.*, lag(date, 1) over (partition by d.patient_id, d.hospital, d.medical_branch, d.medical_op_code order by d.date) as prev_date
from datatable d inner join
(
select d.patient_id, d.hospital, d.medical_branch, d.medical_op_code
from datatable d
where d.medical_op_code = 'P1'
group by d.patient_id, d.hospital, d.medical_branch, d.medical_op_code
having COUNT(d.date) > 1
) t on d.patient_id = t.patient_id and d.hospital = t.hospital and d.medical_branch = t.medical_branch and d.medical_op_code = t.medical_op_code
) m
where date - prev_date > 10
) n
group by n.hospital
Like I say, this isn't tested, but it should at least get you started in the right direction.
Some references:
http://www.adp-gmbh.ch/ora/sql/analytical/lag.html
http://www.oracle-base.com/articles/misc/LagLeadAnalyticFunctions.php
I think this is what you're trying for:
WITH Patient_Visits (Patient_Id, Hospital_Id, Branch_Id, Visit_Date, Visit_Order) as (
SELECT Patient_Id, Hospital_Id, Branch_Id, Visit_Date,
ROW_NUMBER() OVER(PARTITION BY Patient_Id, Hospital_Id, Branch_Id
ORDER BY Patient_Id, Hospital_Id, Branch_Id, Visit_Date)
FROM Hospital_Visits
WHERE Procedure_Id = 'P1'),
Hospital_Recent_Visits (Hospital_Id, Recent_Visitor_Count) as (
SELECT a.Hospital_Id, COUNT(DISTINCT a.Patient_Id)
FROM Patient_Visits as a
JOIN Patient_Visits as b
ON b.Hospital_Id = a.Hospital_Id
AND b.Branch_Id = a.Branch_Id
AND b.Patient_Id = a.Patient_Id
AND b.Visit_Order = a.Visit_Order - 1
AND b.Visit_Date + 10 > a.Visit_Date
GROUP BY a.Hospital_Id),
Hospital_Patient_Count (Hospital_Id, Patient_Count) as (
SELECT Hospital_Id, COUNT(DISTINCT Patient_Id)
FROM Hospital_Visits
GROUP BY Hospital_Id)
SELECT a.Hospital_Id, b.Patient_Count, c.Recent_Visitor_Count
FROM Hospitals as a
LEFT JOIN Hospital_Patient_Count as b
ON b.Hospital_Id = a.Hospital_Id
LEFT JOIN Hospital_Recent_Visits as c
ON c.Hospital_id = a.Hospital_Id
Please note that this was written and tested against a DB2 system. I think Oracle databases have the relevant functionality, so the query should still work as written. However, DB2 appears to lack some of the OLAP functions Oracle has (my version, at least), which could be useful in knocking out some of the CTEs.

Count records with a criteria like "within days"

I have a table as below on sql.
OrderID Account OrderMethod OrderDate DispatchDate DispatchMethod
2145 qaz 14 20/3/2011 23/3/2011 2
4156 aby 12 15/6/2011 25/6/2011 1
I want to count all records that have reordered 'within 30 days' of dispatch date where Dispatch Method is '2' and OrderMethod is '12' and it has come from the same Account.
Can this all be achieved with one query, or do I need to create different tables and do it in stages, as I think I will have to do now? Can someone please help with the code/query?
Many thanks
T
Try the following, replacing [tablename] with the name of your table.
SELECT Count(OriginalOrders.OrderID) AS [Total_Orders]
FROM [tablename] AS OriginalOrders
INNER JOIN [tablename] AS Reorders
ON OriginalOrders.Account = Reorders.Account
AND OriginalOrders.OrderDate < Reorders.OrderDate
AND DATEDIFF(day, OriginalOrders.DispatchDate, Reorders.OrderDate) <= 30
AND Reorders.DispatchMethod = '2'
AND Reorders.OrderMethod = '12';
By using an inner join you'll be sure to only grab orders that meet all the criteria.
By linking the two tables (which are essentially the same table with itself using aliases) you make sure only orders under the same account are counted.
The results from the join are further filtered based on the criteria you mentioned requiring only orders that have been placed within 30 days of the dispatch date of a previous order.
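To see the self-join explanation above in action, here is a small sketch run in SQLite via Python. The table name orders and every row in it are hypothetical, dates are stored as ISO strings, julianday() subtraction replaces DATEDIFF, and the method filters are applied to the reorder row as in the query above. COUNT(DISTINCT ...) is an added assumption so that a reorder matching several earlier orders is counted once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (OrderID INTEGER, Account TEXT, OrderMethod INTEGER, "
    "OrderDate TEXT, DispatchDate TEXT, DispatchMethod INTEGER)"
)
# Hypothetical data covering the interesting cases.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "qaz", 12, "2011-03-01", "2011-03-05", 2),
        (2, "qaz", 12, "2011-03-20", "2011-03-23", 2),  # 15 days after order 1 shipped
        (3, "qaz", 12, "2011-06-01", "2011-06-03", 2),  # too late after either order
        (4, "aby", 12, "2011-06-15", "2011-06-25", 1),
        (5, "aby", 12, "2011-07-01", "2011-07-04", 1),  # in time, wrong dispatch method
        (6, "zed", 14, "2011-01-01", "2011-01-02", 2),
        (7, "zed", 12, "2011-01-10", "2011-01-12", 2),  # 8 days after order 6 shipped
    ],
)

(reorder_count,) = conn.execute(
    """
    SELECT COUNT(DISTINCT r.OrderID)
    FROM orders o
    JOIN orders r
      ON r.Account = o.Account
     AND r.OrderDate > o.OrderDate
     AND julianday(r.OrderDate) - julianday(o.DispatchDate) <= 30
    WHERE r.DispatchMethod = 2
      AND r.OrderMethod = 12
    """
).fetchone()

print(reorder_count)  # 2 (orders 2 and 7)
```

Whether the method filters belong on the reorder, the original order, or both is not settled by the question, so the WHERE clause may need moving to the o alias.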
Totally possible with one query, though my SQL is a little stale..
select count(*) from table
where DispatchMethod = 2
AND OrderMethod = 12
AND DATEDIFF(day, OrderDate, DispatchDate) <= 30;
(Untested, but it's something similar)
One query can do it.
SELECT COUNT(*) FROM myTable reOrder
INNER JOIN myTable originalOrder
ON reOrder.Account = originalOrder.Account
AND reOrder.OrderID <> originalOrder.OrderID
-- all re-orders that are placed after the original order and
-- within 30 days of the original order's dispatch date
AND reOrder.OrderDate > originalOrder.OrderDate
AND DATEDIFF(d, originalOrder.DispatchDate, reOrder.OrderDate) <= 30
WHERE reOrder.DispatchMethod = 2
AND reOrder.OrderMethod = 12
You need a self-join.
The query below assumes that a given account will have either 1 or 2 records in the table - 2 if they've reordered, else 1.
If 3 records exist for a given account, 2 orders + 1 reorder then this won't work - but we'd then need more information on how to distinguish between an order and a reorder.
SELECT COUNT(*) FROM myTable new, myTable prev
WHERE new.DispatchMethod = 2
AND new.OrderMethod = 12
AND DATEDIFF(day, prev.DispatchDate, new.OrderDate) <=30
AND prev.Account = new.Account
AND prev.OrderDate < new.OrderDate
Can we use GROUP BY in this case, such as the following?
SELECT COUNT(Account)
FROM myTable
WHERE DispatchMethod = 2 AND OrderMethod = 12
AND DATEDIFF(d, DispatchDate, OrderDate) <=30
GROUP BY Account
Will the above work or am I missing something here?