Using JOIN in SQL with a bound - sql

I'm new to SQL and I have a question about JOINs.
There are two tables. The first, Patients, has an attribute called Field, which stores the medical field under which the patient was treated. The second, Doctors, has an attribute called Specialization, which stores the medical field in which the doctor specializes.
Medical fields are, e.g., Cardiology, Virology, and so on.
Several doctors can practice in the same specialization.
If I join the tables on Doctors.Specialization and Patients.Field, with the constraint that each doctor is matched with at most 5 patients, what would the query be?
SELECT *
FROM Patients
inner join Doctors on Patients.Diagnosis = Doctors.Specialization;

I would solve it like this:
Join the two tables using specialization and diagnosis columns.
Rank doctors and patients within each specialization using the DENSE_RANK() analytic function.
Filter the data. Each patient's rank must fall in a range whose:
lower bound (exclusive) is (doctor's rank - 1) * 5.
If the doctor's rank is 1, then it's 0.
If the doctor's rank is 2, then it's 5.
upper bound (inclusive) is doctor's rank * 5.
If the doctor's rank is 1, then it's 5.
If the doctor's rank is 2, then it's 10.
WITH base AS (
SELECT d.specialization,
d.id AS doctor_id,
d.name AS doctor_name,
p.id AS patient_id,
p.name AS patient_name,
-- Rank doctors by specialization.
DENSE_RANK() OVER (
PARTITION BY d.specialization
ORDER BY d.id
) AS doc_spec_rank,
-- Rank patients by specialization
DENSE_RANK() OVER (
PARTITION BY d.specialization
ORDER BY p.id
) AS patient_spec_rank
FROM doctors d
INNER JOIN patients p
ON d.specialization = p.diagnosis
)
SELECT *
FROM base
WHERE (
(doc_spec_rank - 1) * 5 < patient_spec_rank
AND doc_spec_rank * 5 >= patient_spec_rank
)
ORDER BY specialization, doc_spec_rank, patient_spec_rank
;
Since you didn't provide your RDBMS or test data, I took the liberty of creating a sample schema in Oracle 18c.
Here's a fiddle with the schema and the solution: https://dbfiddle.uk/4_kikOO7
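If you want to try the idea without an Oracle instance, here is a minimal sketch that runs the same ranking logic through Python's built-in sqlite3 module. The sample rows (two cardiologists, seven cardiology patients) are made up for illustration, and it assumes the bundled SQLite is 3.25 or newer, since window functions are needed.

```python
import sqlite3

# Hypothetical sample data: 2 cardiologists and 7 cardiology patients.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE doctors  (id INTEGER PRIMARY KEY, name TEXT, specialization TEXT);
    CREATE TABLE patients (id INTEGER PRIMARY KEY, name TEXT, diagnosis TEXT);
    INSERT INTO doctors VALUES (1, 'Dr. A', 'Cardiology'), (2, 'Dr. B', 'Cardiology');
""")
conn.executemany(
    "INSERT INTO patients VALUES (?, ?, 'Cardiology')",
    [(i, f"Patient {i}") for i in range(1, 8)],
)

QUERY = """
WITH base AS (
    SELECT d.id AS doctor_id,
           p.id AS patient_id,
           -- Rank doctors within their specialization.
           DENSE_RANK() OVER (PARTITION BY d.specialization ORDER BY d.id) AS doc_spec_rank,
           -- Rank patients within the same specialization.
           DENSE_RANK() OVER (PARTITION BY d.specialization ORDER BY p.id) AS patient_spec_rank
    FROM doctors d
    INNER JOIN patients p ON d.specialization = p.diagnosis
)
SELECT doctor_id, patient_id
FROM base
WHERE (doc_spec_rank - 1) * 5 < patient_spec_rank
  AND doc_spec_rank * 5 >= patient_spec_rank
ORDER BY doctor_id, patient_id
"""

assignments = conn.execute(QUERY).fetchall()
for doctor_id, patient_id in assignments:
    print(doctor_id, patient_id)
```

With this data the first doctor gets patients 1 to 5 and the second gets the remaining two. Note that patients beyond 5 times the number of doctors in a specialization are simply left unmatched.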

Related

SQL How to pull in all records that don't contain

This is a bit of a tricky question to explain, but I'll try my best.
The essence of the question is that I have an employee salary table with these columns: Employee ID, Month of Salary, Salary (Currency).
I want to run a select that shows me all of the employees that don't have a record for month X.
I have attached an image to help visualise this, and here is an example of what I would want from this data:
Let's say, from this small example, that I want to see all of the employees that weren't paid on 1st October 2021. From looking, I know that employee 3 was the only one paid, and 1 and 2 were not. How would I query this on a much larger range of data, without knowing in advance which month they weren't paid in?
You need to join your EmployeeSalary table against a list of expected EmployeeID/MonthOfSalary values and find the gaps: the instances where there is no matching record in the EmployeeSalary table. A LEFT OUTER JOIN can be used here. Whenever there is no matching record in your EmployeeSalary table, the LEFT OUTER JOIN gives you NULL for its columns.
The following query shows how to perform the LEFT OUTER JOIN. Note, however, that I've joined your table to itself to get the list of EmployeeID and MonthOfSalary values. It would be better to take these from other tables: I assume you have an Employee table with all the IDs in it, which would be more efficient (and more accurate) to use than building the ID list from the EmployeeSalary table, as I've done.
SELECT EmployeeList.EmployeeID, MonthList.MonthOfSalary
FROM (SELECT DISTINCT MonthOfSalary FROM EmployeeSalary) MonthList
CROSS JOIN (SELECT DISTINCT EmployeeID FROM EmployeeSalary) EmployeeList
LEFT OUTER JOIN EmployeeSalary
ON MonthList.MonthOfSalary = EmployeeSalary.MonthOfSalary
AND EmployeeList.EmployeeID = EmployeeSalary.EmployeeID
WHERE EmployeeSalary.EmployeeID IS NULL
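As a quick sanity check of this pattern, here is a small sketch using Python's built-in sqlite3 module. The four salary rows are invented to mirror the example in the question (only employee 3 is paid for October 2021), and the aliases m, e and s are just shorthand I've introduced.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EmployeeSalary (EmployeeID INTEGER, MonthOfSalary TEXT, Salary REAL);
    -- Hypothetical data: everyone is paid in September, only employee 3 in October.
    INSERT INTO EmployeeSalary VALUES
        (1, '2021-09-01', 1000), (2, '2021-09-01', 1000),
        (3, '2021-09-01', 1000), (3, '2021-10-01', 1000);
""")

missing = conn.execute("""
    SELECT e.EmployeeID, m.MonthOfSalary
    FROM (SELECT DISTINCT MonthOfSalary FROM EmployeeSalary) AS m
    CROSS JOIN (SELECT DISTINCT EmployeeID FROM EmployeeSalary) AS e
    LEFT OUTER JOIN EmployeeSalary AS s
           ON s.MonthOfSalary = m.MonthOfSalary
          AND s.EmployeeID = e.EmployeeID
    WHERE s.EmployeeID IS NULL          -- keep only the expected rows with no match
    ORDER BY e.EmployeeID, m.MonthOfSalary
""").fetchall()

print(missing)
```

The cross join builds every expected employee/month pair, and the outer join plus the IS NULL filter keeps only the pairs that have no salary row.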
First get the latest paid month for each employee, then calculate how long ago it was and filter on that difference. The filter can be done with a where clause on the computed delay.
I propose the following starting point, which you might need to adapt, at least to cast some formats according to your column types.
with latest_pay as (
-- For each employee, get the latest paid month
select Employee_ID, max(Month) as latest_pay_month
from your_table
group by Employee_ID
)
-- Look for employees not paid for more than 'your_threshold' months
select Employee_ID, latest_pay_month, datediff(month, latest_pay_month, getdate()) as latest_paid_month_delay
from latest_pay
where datediff(month, latest_pay_month, getdate()) > your_threshold
By the way, I know it's an example, but avoid column names such as Month, which clash with SQL keywords and lead to confusion and errors.
This is ideally where you would use a calendar table. Having one available is handy for tasks such as this, where you need to find missing dates.
You can build one on the fly, as I have done in this example, but you would normally have a permanent table to use.
To determine which rows are missing, you need to generate a list of expected rows. An outer join to your actual data will then reveal the missing rows.
So here we have a CTE that generates a list of dates (based on a date range you can set), followed by another that gives a list of all the EmployeeId values.
You expect each EmployeeId to have a row for each month, so we do a cross join to generate the list of expected results. We then outer join with the actual data and filter to the null rows. These are the employees who have not been paid for that month.
See example DB<>Fiddle
declare @from date='20210101', @to date='20211001';
with dates as (
select DateAdd(month,n,@from) dt from (
select top(100) Row_Number() over(order by (select null))-1 n from master.dbo.spt_values
)v
), e as (select distinct employeeId from t)
select dt, e.EmployeeId
from dates d cross join e
left join t on DatePart(month,d.dt)=DatePart(month,t.PaidDate) and t.EmployeeId=e.EmployeeId
where d.dt<=@to
and t.EmployeeId is null
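For anyone not on SQL Server, the same calendar idea can be sketched with a recursive CTE. The version below runs on SQLite through Python's sqlite3 module. It is an adaptation, not the query above: the sample rows and the from_date/to_date parameter names are my own assumptions, and it matches on year and month rather than on month alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (EmployeeId INTEGER, PaidDate TEXT);
    -- Hypothetical data: employee 2 has no payment in February 2021.
    INSERT INTO t VALUES (1, '2021-01-01'), (1, '2021-02-01'), (1, '2021-03-01'),
                         (2, '2021-01-01'), (2, '2021-03-01');
""")

unpaid = conn.execute("""
    WITH RECURSIVE dates(dt) AS (
        -- On-the-fly calendar: one row per month from :from_date to :to_date.
        SELECT :from_date
        UNION ALL
        SELECT date(dt, '+1 month') FROM dates WHERE dt < :to_date
    ),
    e AS (SELECT DISTINCT EmployeeId FROM t)
    SELECT d.dt, e.EmployeeId
    FROM dates d
    CROSS JOIN e
    LEFT JOIN t
           ON strftime('%Y-%m', t.PaidDate) = strftime('%Y-%m', d.dt)
          AND t.EmployeeId = e.EmployeeId
    WHERE t.EmployeeId IS NULL
    ORDER BY d.dt, e.EmployeeId
""", {"from_date": "2021-01-01", "to_date": "2021-03-01"}).fetchall()

print(unpaid)
```

Matching on both year and month keeps the join correct when the date range spans more than one year.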

SQL Select exact rows

I have some tables, and I want to determine the current rank of each customer.
I have a log table that records every time a customer got a "point", and I created a view that counts the points for every customer. Now I'm trying to create another view that matches each customer's points with the rank they currently hold. I also have a "rank" table that holds the name of each rank and the minimum points needed to reach it. My problem is that when I do
SELECT r.neededVisits, r.name, av.customerId
FROM Rank r, amountOfVisits av
WHERE av.amount >= r.neededVisits
I get something like this:
[Table Output]
The left column "besuche" holds the value that is needed for that rank i.e. for the rank "gast" you need 0 visits. For the rank "Stammgast" you need 25 visits.
So I get every rank that a customer has ever passed. But I just want the last rank for each customer.
Is there any way I can do this?
The desired result would be something like this:
[Desired Result]
The Table that holds the ranks
[Rank Table]
[Rank Table Values]
The table that holds the counted visits for each user
[Amount of visits for each user]
[Amount of visits Table values]
I assume you are looking for something like
SELECT r.neededVisits, r.name, av.customerId
FROM Rank r, amountOfVisits av
WHERE av.amount >= r.neededVisits
AND NOT EXISTS (SELECT * FROM rank r2 WHERE r2.neededVisits > r.neededVisits AND av.amount >= r2.neededVisits)
This uses your current logic, but the final condition removes the in between ranks.
As pointed out in the comments, you should probably try to rewrite with an inner join which would be more like
SELECT r.neededVisits, r.name, av.customerId
FROM Rank r INNER JOIN amountOfVisits av
ON av.amount >= r.neededVisits
WHERE NOT EXISTS (SELECT * FROM rank r2 WHERE r2.neededVisits > r.neededVisits AND av.amount >= r2.neededVisits)
I think the above is pretty readable, but a more modern method (depending on your DBMS) would be to use window functions. Something like
SELECT neededVisits, name, customerId
FROM
(
SELECT r.neededVisits, r.name, av.customerId, RANK() OVER (PARTITION BY av.customerId ORDER BY r.neededVisits DESC) tr
FROM Rank r INNER JOIN amountOfVisits av
ON av.amount >= r.neededVisits
) iq
WHERE tr=1
The inner query calculates a column "tr" that numbers the matching ranks for each customer in descending order of neededVisits. The outer query keeps the first one, i.e. the highest rank the customer has reached.
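To check that both approaches pick the highest reached rank, here is a small sketch on SQLite via Python's sqlite3 module. The 'VIP' rank and the three customers are invented for the test. The NOT EXISTS variant drops a rank whenever the customer also qualifies for a higher one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "Rank" (name TEXT, neededVisits INTEGER);
    CREATE TABLE amountOfVisits (customerId INTEGER, amount INTEGER);
    -- 'VIP' and the customer rows are hypothetical sample data.
    INSERT INTO "Rank" VALUES ('gast', 0), ('Stammgast', 25), ('VIP', 50);
    INSERT INTO amountOfVisits VALUES (1, 3), (2, 30), (3, 50);
""")

# Window-function version: number each customer's reachable ranks, highest first.
window_rows = conn.execute("""
    SELECT customerId, name
    FROM (
        SELECT av.customerId, r.name,
               RANK() OVER (PARTITION BY av.customerId
                            ORDER BY r.neededVisits DESC) AS tr
        FROM "Rank" r
        INNER JOIN amountOfVisits av ON av.amount >= r.neededVisits
    ) iq
    WHERE tr = 1
    ORDER BY customerId
""").fetchall()

# NOT EXISTS version: keep a rank only if no higher reachable rank exists.
not_exists_rows = conn.execute("""
    SELECT av.customerId, r.name
    FROM "Rank" r
    INNER JOIN amountOfVisits av ON av.amount >= r.neededVisits
    WHERE NOT EXISTS (SELECT 1 FROM "Rank" r2
                      WHERE r2.neededVisits > r.neededVisits
                        AND av.amount >= r2.neededVisits)
    ORDER BY av.customerId
""").fetchall()

print(window_rows)
print(not_exists_rows)
```

Both queries should return one row per customer, and they should agree with each other.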

Avg used on count

I have two tables with the following attributes:
DOCTORS          OPERATIONS
D_ID             DATE
Name             TYPE
Specialization   DOCTORS_D_ID
                 PACIENTS_PACIENT_ID
I want to return the name and ID of doctors who performed more than the average number of operations per doctor.
I have created the following SQL command:
SELECT Name D_ID,COUNT(*) FROM DOCTORS
JOIN OPERATION
ON D_ID = DOCTORS_D_ID
GROUP BY D_ID,Name
HAVING COUNT(*) > ( SELECT AVG(COUNT(DOCTORs_D_ID))
FROM OPERATIONS GROUP by DOCTORS_D_ID )
This results in the following table:
D_ID COUNTS(*)
Dr. Martin 3
The D_ID column contains the name instead of the ID, so only one of the two attributes is returned. How can I return both the name and D_ID from this command?
I am not a fan of nested aggregation functions. I would just do this by calculating the average directly:
SELECT Name, D_ID, COUNT(*)
FROM DOCTORS JOIN
OPERATIONS
ON D_ID = DOCTORS_D_ID
GROUP BY D_ID, Name
HAVING COUNT(*) > (SELECT COUNT(*) / COUNT(DISTINCT DOCTORS_D_ID)
FROM OPERATIONS
);
One issue is that doctors who performed no operations are not counted in the average. An average computed from the operations table alone (or from an inner join with it) will therefore be higher than the true value, which is the number of rows in the operations table divided by the number of doctors in the doctors table.
To compensate for this, you can do:
SELECT Name,
D_ID,
num_operations
FROM ( SELECT Name,
D_ID,
COUNT( 1 ) OVER () AS num_doctors
FROM doctors ) d
LEFT OUTER JOIN
( SELECT DISTINCT
DOCTORS_D_ID,
COUNT( 1 ) OVER ( PARTITION BY DOCTORS_D_ID ) AS num_operations,
COUNT( 1 ) OVER () AS total_operations
FROM operations ) o
ON ( d.d_id = o.doctors_d_id )
WHERE num_operations > total_operations / num_doctors;
It has the added bonus of using analytic functions to calculate the counts rather than performing a third table scan.
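To see how much the zero-operation doctors matter, here is a rough sketch on SQLite via Python's sqlite3 module. It uses a different formulation from the query above (a LEFT JOIN with GROUP BY in a CTE, then AVG over that CTE), and 'Dr. Novak' and 'Dr. Idle' are made-up rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE doctors (d_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE operations (doctors_d_id INTEGER, type TEXT);
    -- Hypothetical data: 3 doctors, one of whom has performed no operations.
    INSERT INTO doctors VALUES (1, 'Dr. Martin'), (2, 'Dr. Novak'), (3, 'Dr. Idle');
    INSERT INTO operations VALUES (1, 'a'), (1, 'b'), (1, 'c'), (2, 'd'), (2, 'e');
""")

# Average over ALL doctors: 5 operations / 3 doctors.
all_doctors = conn.execute("""
    WITH per_doctor AS (
        SELECT d.d_id, d.name, COUNT(o.doctors_d_id) AS num_operations
        FROM doctors d
        LEFT JOIN operations o ON o.doctors_d_id = d.d_id
        GROUP BY d.d_id, d.name
    )
    SELECT name, d_id, num_operations
    FROM per_doctor
    WHERE num_operations > (SELECT AVG(num_operations) FROM per_doctor)
    ORDER BY d_id
""").fetchall()

# Average over operating doctors only: 5 operations / 2 doctors.
operating_only = conn.execute("""
    SELECT d.name, d.d_id, COUNT(*)
    FROM doctors d
    JOIN operations o ON d.d_id = o.doctors_d_id
    GROUP BY d.d_id, d.name
    HAVING COUNT(*) > (SELECT COUNT(*) * 1.0 / COUNT(DISTINCT doctors_d_id)
                       FROM operations)
    ORDER BY d.d_id
""").fetchall()

print(all_doctors)
print(operating_only)
```

With this data the second doctor is above average only when the idle doctor is included in the denominator, which is exactly the discrepancy described above. The `* 1.0` avoids integer division in SQLite.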
with num_operations as (
select doctors_d_id, count(*) as operations
from operations
group by doctors_d_id
having count(*) >
(select avg(count(doctors_d_id)) from operations group by doctors_d_id)
)
select a.doctors_d_id, a.operations, b.name
from num_operations a, doctors b
where a.doctors_d_id = b.d_id

SQL: Get the first value

I have two tables:
patients(ID, Firstname, Lastname, ...)
records(ID, Date, Time, Version)
I want to (inner) join these tables, so that I have the records together with the patient data, but in the Version column I always want the first value that was recorded for the patient (i.e. the one with the minimum date and time for that patient ID). I tried a subquery, but HANA doesn't allow an ORDER BY or LIMIT clause in subqueries.
How can I implement this with SQL? (HANA SQL)
Kind regards and thanks in advance.
HANA supports window functions, so you can join against a derived table that picks the first version:
select p.*, r.id, r.date, r.time, r.version
from patients p
join (
select id, date, time, version, patient_id,
row_number() over (partition by patient_id order by date, time) as rn
from records
) r on p.id = r.patient_id and r.rn = 1
The above assumes that the records table has a column patient_id that contains the id of the patient to whom the record belongs.
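If you want to keep every record row and only overwrite the version column with the earliest one, a FIRST_VALUE window function is one possible variation. The sketch below is tested only on SQLite through Python's sqlite3 module, so treat HANA support for the exact syntax as an assumption to verify. The patient_id column and all sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patients (id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT);
    CREATE TABLE records (id INTEGER PRIMARY KEY, patient_id INTEGER,
                          date TEXT, time TEXT, version TEXT);
    -- Hypothetical data; patient_id is an assumed foreign key to patients.id.
    INSERT INTO patients VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray');
    INSERT INTO records VALUES
        (10, 1, '2020-01-02', '09:00', 'v2'),
        (11, 1, '2020-01-01', '15:00', 'v1'),
        (12, 1, '2020-01-01', '08:00', 'v0'),
        (13, 2, '2020-03-01', '10:00', 'v7');
""")

rows = conn.execute("""
    SELECT p.firstname, r.id, r.date, r.time,
           -- Earliest recorded version per patient, repeated on every row.
           FIRST_VALUE(r.version) OVER (PARTITION BY r.patient_id
                                        ORDER BY r.date, r.time) AS first_version
    FROM patients p
    JOIN records r ON r.patient_id = p.id
    ORDER BY r.id
""").fetchall()

for row in rows:
    print(row)
```

Unlike the row_number() filter, this returns all records, each carrying the first version recorded for its patient.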

SQL to group on maximum of two columns

I am having trouble displaying data from two tables, using what I think should be a group method.
I currently have a table containing pupils, and another containing the grades achieved (points) each year and term. See below:
PupilID, FirstName, Surname, DOB
GradeID, PupilID, SchoolYear, Term, Points
I want to query both tables and display all pupils with their latest grade, this should look for the maximum SchoolYear, then the maximum Term, and display the Points alongside the PupilID, FirstName and Surname.
I would appreciate any help anyone can offer with this
This will select the latest grade per pupil based on SchoolYear and Term:
select * from (
select p.*, g.schoolyear, g.term, g.points,
row_number() over (partition by p.PupilID order by g.SchoolYear desc, g.Term desc) rn
from pupils p
join grades g on g.PupilID = p.PupilID
) t1 where rn = 1
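Here is a small sketch of the same row_number() pattern running on SQLite through Python's sqlite3 module, in case you want to experiment with it. The table names pupils and grades follow the query above, and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pupils (PupilID INTEGER PRIMARY KEY, FirstName TEXT, Surname TEXT);
    CREATE TABLE grades (GradeID INTEGER PRIMARY KEY, PupilID INTEGER,
                         SchoolYear INTEGER, Term INTEGER, Points INTEGER);
    -- Hypothetical sample data.
    INSERT INTO pupils VALUES (1, 'Ada', 'Hill'), (2, 'Ben', 'Cole');
    INSERT INTO grades VALUES
        (1, 1, 2020, 3, 40), (2, 1, 2021, 1, 45), (3, 1, 2021, 2, 50),
        (4, 2, 2020, 2, 30), (5, 2, 2020, 3, 35);
""")

latest = conn.execute("""
    SELECT PupilID, FirstName, Surname, SchoolYear, Term, Points
    FROM (
        SELECT p.PupilID, p.FirstName, p.Surname,
               g.SchoolYear, g.Term, g.Points,
               -- Number each pupil's grades, newest year and term first.
               ROW_NUMBER() OVER (PARTITION BY p.PupilID
                                  ORDER BY g.SchoolYear DESC, g.Term DESC) AS rn
        FROM pupils p
        JOIN grades g ON g.PupilID = p.PupilID
    ) t1
    WHERE rn = 1
    ORDER BY PupilID
""").fetchall()

print(latest)
```

Because the ranking is done per pupil, each pupil gets their own latest year and term, even if other pupils have newer grades.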
Try this:
declare @varSchoolYear int
declare @varTerm int
-- latest school year, then the latest term within that year
set @varSchoolYear = (select max(SchoolYear) from Grade)
set @varTerm = (select max(Term) from Grade where SchoolYear = @varSchoolYear)
select a.FirstName, a.Surname, b.Points
from Pupil a
inner join Grade b
on a.PupilID = b.PupilID
where b.Term = @varTerm and b.SchoolYear = @varSchoolYear