Time overlaps from Nesting queries - sql

Based on the current schema, I have been asked to find:
-- people who were untested and exposed to someone infectious
-- Do not list anyone twice and do not list known sick people
-- Exposed = at the same place, and overlap in time (no minimum overlap duration is required, for simplicity)
The query below gives me the answer, except that I cannot remove the people who are 'Positive', because the second part of my query (the time overlap) depends on the first part (the times at which the positive people visited the same locations).
select * from (
select DISTINCT person.PersonID, Register.LocID, Register.Checkin, Register.CheckOut
from person
join Register on Person.PersonID = Register.PersonID
join testing on person.PersonID = testing.PersonID
where testing.Results is 'Positive' ) a
join (
SELECT DISTINCT Person.PersonID, Register.LocID , Register.Checkin, Register.CheckOut
from person join Register on Person.PersonID = Register.PersonID
where person.PersonID
not in (SELECT DISTINCT testing.PersonID from testing)) b on a.LocID = b.LocID
and b.checkin >= a.CheckIn and b.CheckIn <= a.CheckOut
So my question is: what modification does this query need in order to show only the results of the second part?
I consider the first part to be
select * from (
select DISTINCT person.PersonID, Register.LocID, Register.Checkin, Register.CheckOut
from person
join Register on Person.PersonID = Register.PersonID
join testing on person.PersonID = testing.PersonID
where testing.Results is 'Positive' ) a
And the second part to be
join (
SELECT DISTINCT Person.PersonID, Register.LocID , Register.Checkin, Register.CheckOut
from person join Register on Person.PersonID = Register.PersonID
where person.PersonID
not in (SELECT DISTINCT testing.PersonID from testing)) b on a.LocID = b.LocID
and b.checkin >= a.CheckIn and b.CheckIn <= a.CheckOut

For readability you can create CTEs like this:
with
-- returns all the untested persons
untested as (select p.* from person p left join testing t on t.personid = p.personid where t.testingid is null),
-- returns all the infected persons
infected as (select * from testing where results = 'Positive'),
-- returns all the locids that infected persons visited and the start and end dates of these visits
loc_positive as (
select r.locid, i.timestamp startdate, r.checkout enddate
from register r inner join infected i
on i.personid = r.personid and i.timestamp between r.checkin and r.checkout
)
-- returns the distinct untested persons that visited the same locids with persons tested positive at the same time after they were tested
select distinct u.*
from untested u
inner join register r on r.personid = u.personid
inner join loc_positive lp on lp.locid = r.locid
where lp.startdate <= r.checkout and lp.enddate >= r.checkin

This is a complicated query. Because you do not want duplicates, I am going to suggest exists with the outer query just using persons.
The idea to get people in the same place at the same time is a self-join on register using both location and time overlaps. I think that is the most complex part of the query. The rest is checking if a person is or is not positive:
select p.*
from person p
where not exists (select 1
from testing t
where t.personid = p.personId and t.results = 'Positive'
) and
exists (select 1
from register r1 join
register r2
on r1.locid = r2.locid and
-- two visits overlap when each one starts before the other ends
r1.checkin < r2.checkout and
r1.checkout > r2.checkin join
testing t2
on r2.personid = t2.personid and
t2.results = 'Positive' and
t2.timestamp < r2.checkout
where r1.personid = p.personid
);
The timing is a little tricky, but I think the timing makes sense. Someone needs to test positive before they are in the same place. Of course, you can remove the t2.timestamp < r2.checkout if there is no timing constraint.
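The overlap test described above can be tried out on a small in-memory SQLite database. This is only a sketch: the table layout, the sample rows and the names are assumptions made for illustration, the "known sick" filter is replaced by "no row in testing at all" to match the question's definition of untested, and the optional t2.timestamp condition is left out.

```python
import sqlite3

# Hypothetical toy schema modeled on the question; column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person   (personid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE register (personid INTEGER, locid INTEGER, checkin TEXT, checkout TEXT);
CREATE TABLE testing  (personid INTEGER, results TEXT);

INSERT INTO person VALUES (1,'Ann'),(2,'Bob'),(3,'Cy'),(4,'Di');
-- Ann is positive and was at location 10 from 09:00 to 11:00.
INSERT INTO testing VALUES (1,'Positive'),(4,'Negative');
INSERT INTO register VALUES
  (1,10,'2020-05-01 09:00','2020-05-01 11:00'),
  (2,10,'2020-05-01 10:30','2020-05-01 12:00'),  -- overlaps Ann
  (2,10,'2020-05-01 10:45','2020-05-01 10:50'),  -- overlaps Ann a second time
  (3,10,'2020-05-01 11:30','2020-05-01 12:30'),  -- same place, after Ann left
  (4,10,'2020-05-01 09:30','2020-05-01 10:00');  -- overlaps, but Di was tested
""")

EXPOSED_UNTESTED = """
SELECT p.personid, p.name
FROM person p
WHERE NOT EXISTS (SELECT 1 FROM testing t WHERE t.personid = p.personid)
  AND EXISTS (
        SELECT 1
        FROM register r1
        JOIN register r2 ON r1.locid = r2.locid
                        AND r1.checkin  < r2.checkout   -- r1 starts before r2 ends
                        AND r1.checkout > r2.checkin    -- r1 ends after r2 starts
        JOIN testing t2  ON t2.personid = r2.personid
                        AND t2.results = 'Positive'
        WHERE r1.personid = p.personid)
ORDER BY p.personid
"""

exposed = conn.execute(EXPOSED_UNTESTED).fetchall()
print(exposed)
```

Because the outer query reads only from person and the visits are checked inside EXISTS, Bob is listed once even though two of his visits overlap Ann's.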

The solution to this question was to add DISTINCT and replace the star in the first line with a column name.
select DISTINCT unt.PersonID from (
select person.PersonID, Register.LocID, Register.Checkin, Register.CheckOut
from person join Register on Person.PersonID = Register.PersonID join testing on person.PersonID = testing.PersonID
where testing.Results is 'Positive' ) pos
join (
SELECT Person.PersonID, Register.LocID , Register.Checkin, Register.CheckOut
from person join Register on Person.PersonID = Register.PersonID where person.PersonID
not in (SELECT testing.PersonID from testing)) unt on pos.LocID = unt.LocID
and unt.checkin >= pos.CheckIn and unt.CheckIn <= pos.CheckOut;

Related

Nesting queries

My query against the attached schema should find the locations that people who tested positive visited and that untested people also visited. (Untested means people who do not appear in the testing table.)
--find the same locations of where the positive people and the untested people went
select checkin.LocID, checkin.PersonID
from checkin join testing on checkin.personid = testing.personid
where results = 'Positive'
and (select CheckIn.PersonID
from checkin join testing on checkin.PersonID = testing.PersonID where CheckIn.PersonID
not in (select testing.PersonID from testing));
In my view the query is stating the following:
Select a location and person by joining the checkin and testing tables where the result is positive, and select a person from the checkin table who is not in the testing table.
The result I get is empty, yet I know from checking manually that there are matching people. What am I doing wrong?
I hope this makes sense.
You can get the people tested 'Positive' with this query:
select personid from testing where results = 'Positive'
and the untested people with:
select p.personid
from person p left join testing t
on t.personid = p.personid
where t.testingid is null
You must join a copy of checkin to each of these queries and then join the two copies together:
select l.*
from (select personid from testing where results = 'Positive') p
inner join checkin cp on cp.personid = p.personid
inner join checkin cu on cu.lid = cp.lid
inner join (
select p.personid
from person p left join testing t
on t.personid = p.personid
where t.testingid is null
) pu on pu.personid = cu.personid
inner join location l on l.locationid = cu.lid
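The two building blocks above (the LEFT JOIN ... IS NULL anti-join for untested people, and two copies of checkin joined on the location) can be checked on toy data. The sketch below uses SQLite from Python; the table layout, the sample rows and the column name locid are assumptions, and the final join to a location table is omitted.

```python
import sqlite3

# Hypothetical miniature of the schema; names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person  (personid INTEGER PRIMARY KEY);
CREATE TABLE testing (testingid INTEGER PRIMARY KEY, personid INTEGER, results TEXT);
CREATE TABLE checkin (personid INTEGER, locid INTEGER);

INSERT INTO person VALUES (1),(2),(3),(4);
INSERT INTO testing VALUES (1,1,'Positive'),(2,2,'Negative');
INSERT INTO checkin VALUES (1,100),(3,100),(2,100),(1,200),(4,300);
""")

# Anti-join: keep persons for whom the LEFT JOIN found no testing row.
untested = [row[0] for row in conn.execute("""
    SELECT p.personid
    FROM person p
    LEFT JOIN testing t ON t.personid = p.personid
    WHERE t.testingid IS NULL
    ORDER BY p.personid
""")]

# Two copies of checkin: cp belongs to the positive person, cu to the untested one.
shared = conn.execute("""
    SELECT DISTINCT cu.locid, cu.personid
    FROM testing tp
    JOIN checkin cp ON cp.personid = tp.personid
    JOIN checkin cu ON cu.locid = cp.locid
    LEFT JOIN testing tu ON tu.personid = cu.personid
    WHERE tp.results = 'Positive'
      AND tu.testingid IS NULL
    ORDER BY cu.locid, cu.personid
""").fetchall()

print(untested, shared)
```

Person 4 is untested but never shared a location with the positive person, so only person 3 at location 100 comes back.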
If what you want are the positive people who are at a location where there is also someone who is not tested, you might consider:
select ch.LocID,
group_concat(case when t.results = 'Positive' then ch.PersonID end) as positive_persons
from checkin ch left join
testing t
on ch.personid = t.personid
group by ch.LocId
having sum(case when t.results = 'Positive' then 1 else 0 end) > 0 and
count(*) <> count(t.personid); -- at least one person not tested
With this structure, you can get the untested people using:
group_concat(case when t.personid is null then ch.personid end)
You have several mistakes (a missing exists, and a subquery that is not correlated with the outer query). I believe this should do the job:
select ch1.LocID, ch1.PersonID
from checkin ch1
join testing t1 on ch1.personid = t1.personid
where results = 'Positive'
and exists (
select 1
from checkin ch2
where ch1.LocID = ch2.LocID and ch2.PersonID not in (
select testing.PersonID
from testing
)
);
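One thing that may be worth checking with the NOT IN form: if testing.PersonID can ever be NULL, NOT IN returns no rows at all, while a correlated NOT EXISTS is unaffected. The sketch below shows this in SQLite from Python; the tables, the rows and the NULL entry are hypothetical, chosen only to trigger the effect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE checkin (personid INTEGER, locid INTEGER);
CREATE TABLE testing (personid INTEGER, results TEXT);
INSERT INTO checkin VALUES (1,100),(3,100),(4,300);
-- The second testing row has no person recorded (hypothetical bad data).
INSERT INTO testing VALUES (1,'Positive'),(NULL,'Negative');
""")

# Same outer query as the answer; only the "untested" test is swapped in.
TEMPLATE = """
SELECT ch1.locid, ch1.personid
FROM checkin ch1
JOIN testing t1 ON t1.personid = ch1.personid
WHERE t1.results = 'Positive'
  AND EXISTS (SELECT 1 FROM checkin ch2
              WHERE ch2.locid = ch1.locid AND {untested})
"""

with_not_in = conn.execute(TEMPLATE.format(
    untested="ch2.personid NOT IN (SELECT personid FROM testing)"
)).fetchall()

with_not_exists = conn.execute(TEMPLATE.format(
    untested="NOT EXISTS (SELECT 1 FROM testing t WHERE t.personid = ch2.personid)"
)).fetchall()

print(with_not_in, with_not_exists)
```

With the NULL present, `3 NOT IN (1, NULL)` evaluates to unknown rather than true, so the NOT IN version finds nothing, while NOT EXISTS still reports the positive person at location 100.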

Access Subquery On multiple conditions

This SQL query needs to be done in ACCESS.
I am trying to do a subquery on the total sales, but I want to link the sale to the province AND to the product. The query below works with one condition or the other from (po.product_name = allp.all_products) AND (p.province = allp.all_province), but it will not take both.
I will add every month to this query once I can figure out the subquery with two criteria.
Select
p.province as [Province],
po.product_name as [Product],
all_price
FROM
(purchase_order po
INNER JOIN person p
on p.person_id = po.person_id)
left join
(
select
po1.product_name AS [all_products],
sum(pp1.price) AS [all_price],
p1.province AS [all_province]
from (purchase_order po1
INNER JOIN product pp1
on po1.product_name = pp1.product_name)
INNER JOIN person p1
on po1.person_id = p1.person_id
group by po1.product_name, pp1.price, p1.province
)
as allp
on (po.product_name = allp.all_products) AND (p.province = allp.all_province);
Turn the first SELECT into a derived table by giving it an alias, then join table1 to table2. I don't have your table structure or data to test it, but I think this will lead you down the right path:
select table1.*, table2.*
from
(Select
p.province as [Province],
po.product_name as [Product]
--removed this ,all_price
FROM
(purchase_order po
INNER JOIN person p
on p.person_id = po.person_id)) as table1
left join
(
select
po1.product_name AS [all_products],
sum(pp1.price) AS [all_price],
p1.province AS [all_province]
from (purchase_order po1
INNER JOIN product pp1
on po1.product_name = pp1.product_name)
INNER JOIN person p1
on po1.person_id = p1.person_id
group by po1.product_name, pp1.price, p1.province --check your group by, I don't think you want pp1.price here if you want to aggregate
) as table2 --changed from allp
on (table1.product = table2.all_products) AND (table1.province = table2.all_province);
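The shape of this join (a list of province/product pairs on the left, a per-pair total on the right, matched on both columns) can be tried outside Access. The sketch below uses SQLite from Python, so it does not prove anything about Access's own join syntax; the tables, the rows and the DISTINCT on the left side are assumptions added for illustration, and price is left out of the GROUP BY as the comment above suggests.

```python
import sqlite3

# Hypothetical data; table and column names follow the question where possible.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (person_id INTEGER PRIMARY KEY, province TEXT);
CREATE TABLE product (product_name TEXT PRIMARY KEY, price REAL);
CREATE TABLE purchase_order (order_id INTEGER PRIMARY KEY,
                             person_id INTEGER, product_name TEXT);
INSERT INTO person VALUES (1,'ON'),(2,'ON'),(3,'BC');
INSERT INTO product VALUES ('widget',10.0),('gadget',25.0);
INSERT INTO purchase_order VALUES
  (1,1,'widget'),(2,2,'widget'),(3,3,'widget'),(4,1,'gadget');
""")

rows = conn.execute("""
    SELECT t1.province, t1.product_name, t2.all_price
    FROM (SELECT DISTINCT p.province, po.product_name   -- DISTINCT is an addition
          FROM purchase_order po
          JOIN person p ON p.person_id = po.person_id) AS t1
    LEFT JOIN (SELECT po1.product_name AS all_products,
                      p1.province      AS all_province,
                      SUM(pp1.price)   AS all_price
               FROM purchase_order po1
               JOIN product pp1 ON pp1.product_name = po1.product_name
               JOIN person  p1  ON p1.person_id = po1.person_id
               GROUP BY po1.product_name, p1.province) AS t2
           ON t1.product_name = t2.all_products     -- first join key
          AND t1.province     = t2.all_province     -- second join key
    ORDER BY t1.province, t1.product_name
""").fetchall()

print(rows)
```

Both ON conditions are applied together here, so each province/product pair picks up only its own total.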

Count with subselect really slow in postgres

I have this query:
SELECT c.name, COUNT(t.id)
FROM Cinema c
JOIN CinemaMovie cm ON cm.cinema_id = c.id
JOIN Ticket t ON cm.id = cinema_movie_id
WHERE cm.id IN (
SELECT cm1.id
FROM CinemaMovie cm1
JOIN Movie m1 ON m1.id = cm1.movie_id
JOIN Ticket t1 ON t1.cinema_movie_id = cm1.id
WHERE m1.name = 'Hellboy'
AND t1.time >= timestamp '2019-04-18 00:00:00'
AND t1.time <= timestamp '2019-04-18 23:59:59' )
GROUP BY c.id;
and the problem is that this query runs really slowly (more than 1 minute) when the table has about 20 million rows. From what I understand, the problem seems to be the inner query, which takes a long time. I also have indexes on all foreign keys. What am I missing?
Also note that when I filter only by name (omitting the date), everything takes about 10 seconds.
EDIT
What I am trying to do is count the number of tickets for each cinema name, based on the movie name and the timestamp on the ticket.
I don't understand why you are using a subquery. Does this do what you want?
SELECT c.name, COUNT(t.id)
FROM Cinema c JOIN
CinemaMovie cm
ON cm.cinema_id = c.id JOIN
Ticket t
ON cm.id = t.cinema_movie_id JOIN
Movie m
ON m.id = cm.movie_id
WHERE m.name = 'Hellboy' AND
t.time >= '2019-04-18'::timestamp AND
t.time < '2019-04-19'::timestamp
GROUP BY c.id, c.name;
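The flat join with a half-open date range (>= start of day, < start of next day) can be checked on a few rows. The sketch below uses SQLite from Python rather than Postgres, with timestamps stored as ISO text; the rows and cinema names are made up, so it says nothing about performance, only about which tickets are counted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Cinema (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Movie  (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE CinemaMovie (id INTEGER PRIMARY KEY, cinema_id INTEGER, movie_id INTEGER);
CREATE TABLE Ticket (id INTEGER PRIMARY KEY, cinema_movie_id INTEGER, time TEXT);

INSERT INTO Cinema VALUES (1,'Rex'),(2,'Lux');
INSERT INTO Movie  VALUES (1,'Hellboy'),(2,'Other');
INSERT INTO CinemaMovie VALUES (1,1,1),(2,2,1),(3,1,2);
INSERT INTO Ticket VALUES
  (1,1,'2019-04-18 10:00:00'),
  (2,1,'2019-04-18 23:59:59.500'),  -- last half second of the day
  (3,1,'2019-04-19 00:00:00'),      -- next day, must not count
  (4,2,'2019-04-18 12:00:00'),
  (5,3,'2019-04-18 12:00:00');      -- different movie
""")

QUERY = """
SELECT c.name, COUNT(t.id)
FROM Cinema c
JOIN CinemaMovie cm ON cm.cinema_id = c.id
JOIN Ticket t       ON t.cinema_movie_id = cm.id
JOIN Movie m        ON m.id = cm.movie_id
WHERE m.name = 'Hellboy' AND {time_filter}
GROUP BY c.id, c.name
ORDER BY c.name
"""

half_open = conn.execute(QUERY.format(
    time_filter="t.time >= '2019-04-18' AND t.time < '2019-04-19'")).fetchall()
closed = conn.execute(QUERY.format(
    time_filter="t.time >= '2019-04-18 00:00:00' AND t.time <= '2019-04-18 23:59:59'"
)).fetchall()

print(half_open, closed)
```

The half-open range picks up the ticket sold at 23:59:59.500, which the closed range ending at 23:59:59 leaves out.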

sql subquery join group by

I am trying to get a list of our users from our database along with the number of people from the same cohort as them - which in this case is defined as being from the same medical school at the same time.
medical_school_id is stored in the doctor_record table
graduation_dt is stored in the doctor_record table as well.
I have managed to write this query out using a subquery which does a select statement counting the number of others for each row but this takes forever. My logic is telling me that I ought to run a simple GROUP BY query once first and then somehow JOIN the medical_school_id on to that.
The group by query is as follows
select count(ca.id) , cdr.medical_school_id, cdr.graduation_dt
from account ca
LEFT JOIN doctor cd on ca.id = cd.account_id
LEFT JOIN doctor_record cdr on cd.gmc_number = cdr.gmc_number
GROUP BY cdr.medical_school_id, cdr.graduation_dt
The long select query is
select a.id, a.email , dr.medical_school_id,
(select count(ba.id) from account ba
LEFT JOIN doctor bd on ba.id = bd.account_id
LEFT JOIN doctor_record bdr on bd.gmc_number = bdr.gmc_number
WHERE bdr.medical_school_id = dr.medical_school_id AND bdr.graduation_dt = dr.graduation_dt) AS med_count
from account a
LEFT JOIN doctor d on a.id = d.account_id
LEFT JOIN doctor_record dr on d.gmc_number = dr.gmc_number
If you could push me in the right direction that would be amazing
I think you just want window functions:
select a.id, a.email, dr.medical_school_id, dr.graduation_dt,
count(*) over (partition by dr.medical_school_id, dr.graduation_dt) as cohort_size
from account a left join
doctor d
on a.id = d.account_id left join
doctor_record dr
on d.gmc_number = dr.gmc_number;
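A quick way to see what the window function returns is to run it on a handful of rows. The sketch below uses SQLite from Python and assumes an SQLite build with window-function support (3.25 or later); the tables are cut down to the columns the query needs and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account (id INTEGER PRIMARY KEY, email TEXT);
CREATE TABLE doctor (account_id INTEGER, gmc_number TEXT);
CREATE TABLE doctor_record (gmc_number TEXT, medical_school_id INTEGER, graduation_dt TEXT);

INSERT INTO account VALUES (1,'a@x.test'),(2,'b@x.test'),(3,'c@x.test'),(4,'d@x.test');
INSERT INTO doctor VALUES (1,'A'),(2,'B'),(3,'C');          -- account 4 is not a doctor
INSERT INTO doctor_record VALUES
  ('A',1,'2010-07-01'),('B',1,'2010-07-01'),('C',2,'2011-07-01');
""")

rows = conn.execute("""
    SELECT a.id, dr.medical_school_id, dr.graduation_dt,
           COUNT(*) OVER (PARTITION BY dr.medical_school_id, dr.graduation_dt)
             AS cohort_size
    FROM account a
    LEFT JOIN doctor d         ON a.id = d.account_id
    LEFT JOIN doctor_record dr ON d.gmc_number = dr.gmc_number
    ORDER BY a.id
""").fetchall()

cohort_size = {account_id: size for account_id, _, _, size in rows}
print(cohort_size)
```

Accounts with no doctor_record end up together in the partition where both keys are NULL, so their cohort_size counts the other unmatched accounts; filter those rows out or use inner joins if that is not wanted.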
Using your same code for group by:
SELECT * FROM (
(
SELECT acc.[id]
, acc.[email]
-- the join columns must be selected here so the ON clause below can use them
, doc_rec.[medical_school_id]
, doc_rec.[graduation_dt]
FROM
account acc
LEFT JOIN
doctor doc
ON
acc.id = doc.account_id
LEFT JOIN
doctor_record doc_rec
ON
doc.gmc_number = doc_rec.gmc_number
) label
LEFT JOIN
(
SELECT count(acco.id) AS cohort_size
, doc_reco.medical_school_id
, doc_reco.graduation_dt
FROM
account acco
LEFT JOIN
doctor doct
ON
acco.id = doct.account_id
LEFT JOIN
doctor_record doc_reco
ON
doct.gmc_number = doc_reco.gmc_number
GROUP BY
doc_reco.medical_school_id,
doc_reco.graduation_dt
) cnt
ON
cnt.[medical_school_id]=label.[medical_school_id]
AND
cnt.[graduation_dt]=label.[graduation_dt]
)
How about something like this?
select a.doctor_id
, count(*) - 1
from doctor_record a
left join doctor_record b on a.medical_school_id = b.medical_school_id
and a.graduation_dt = b.graduation_dt
group by a.doctor_id
Subtract 1 from the count so that you're not counting the doctor in the "other folks in same cohort" number
I'm defining "same cohort" as "same medical school & graduation date".
I'm unclear on what GMC number is and how it is related. Is it something to do with cohort?

Extra row to SELECT statement, crosstab? Access 2007

I'm working with some biostats people and of course they love SAS. I have a select statement below that tests for the presence of certain problems a person can have. It's binary: they either have the problem or they don't. If a person has a heart problem and a respiratory problem, their patientID is listed twice. How can I add an extra column of 1 or 0 for every morbidity? So, if I have three problems, "HEART", "LUNG" and "UTI", an extra column would be generated for each one, holding a 1 or 0 depending on whether the person had that problem.
I suppose I can use Excel to make it a crosstab, but eventually it will need to be in that format. Below is my SELECT statement. Thanks, folks!
EDITED:
TRANSFORM First(Person.PersonID) AS Morbidity
SELECT Person.PersonID, Person.Age, Person.Sex
FROM tblKentuckyCounties INNER JOIN ((tblComorbidity INNER JOIN comorbidVisits ON tblComorbidity.ID = comorbidVisits.comorbidFK) INNER JOIN (Person INNER JOIN tblComorbidityPerson ON Person.PersonID = tblComorbidityPerson.personID) ON tblComorbidity.ID = tblComorbidityPerson.comorbidityFK) ON tblKentuckyCounties.ID = Person.County
WHERE (((tblComorbidity.comorbidityexplanation)="anxiety and depression" Or (tblComorbidity.comorbidityexplanation)="heart" Or (tblComorbidity.comorbidityexplanation)="hypertension" Or (tblComorbidity.comorbidityexplanation)="pressure sores" Or (tblComorbidity.comorbidityexplanation)="tobacco" Or (tblComorbidity.comorbidityexplanation)="uti"))
GROUP BY Person.PersonID, Person.Age, Person.Sex, tblComorbidity.comorbidityexplanation
PIVOT Person.Race;
This is not tested:
TRANSFORM IIf(Count(c.comorbidityexplanation) > 0, 1, 0) AS Morbidity
SELECT p.PersonID, p.Age, p.Sex, p.Race
FROM tblKentuckyCounties kc
INNER JOIN ((tblComorbidity c
INNER JOIN comorbidVisits cv
ON c.ID = cv.comorbidFK)
INNER JOIN (Person p
INNER JOIN tblComorbidityPerson cp
ON p.PersonID = cp.personID)
ON c.ID = cp.comorbidityFK)
ON kc.ID = p.County
GROUP BY p.PersonID, p.Age, p.Sex, p.Race
PIVOT c.comorbidityexplanation
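Outside Access, the same one-column-per-morbidity layout is often produced with conditional aggregation instead of TRANSFORM/PIVOT. The sketch below shows that swapped-in technique in SQLite from Python; the two tables are a simplified, hypothetical stand-in for the joined tables in the question, and the list of morbidities is hard-coded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified stand-in tables; names and rows are illustrative only.
CREATE TABLE person (personid INTEGER PRIMARY KEY, age INTEGER, sex TEXT);
CREATE TABLE comorbidity_person (personid INTEGER, explanation TEXT);

INSERT INTO person VALUES (1,70,'F'),(2,64,'M'),(3,58,'F');
INSERT INTO comorbidity_person VALUES (1,'heart'),(1,'uti'),(2,'heart');
""")

# One MAX(CASE ...) per morbidity: 1 if any row for the person matches, else 0.
rows = conn.execute("""
    SELECT p.personid,
           MAX(CASE WHEN cp.explanation = 'heart' THEN 1 ELSE 0 END) AS heart,
           MAX(CASE WHEN cp.explanation = 'lung'  THEN 1 ELSE 0 END) AS lung,
           MAX(CASE WHEN cp.explanation = 'uti'   THEN 1 ELSE 0 END) AS uti
    FROM person p
    LEFT JOIN comorbidity_person cp ON cp.personid = p.personid
    GROUP BY p.personid
    ORDER BY p.personid
""").fetchall()

print(rows)
```

Each person appears on exactly one row, and a person with no recorded problems gets 0 in every column because of the LEFT JOIN.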