How to assemble cohort using data from 2 separate tables PostgreSQL - sql

I am trying to assemble a cohort of patients who meet a set of criteria, using data from 2 different tables. I am trying to create a list of patients who:
Have been seen for a drug overdose
Had the encounter after 07-15-1999
Were between 18 and 35 years old at the time of the encounter
Every patient in this table must meet all of these conditions. I have created a new table (dcohort) to hold the information for all of these patients. I have already figured out how to determine which patients meet the first two conditions, but I am struggling to work out which meet the age condition, because age is not a column in either of the 2 provided tables. Age must be calculated using the birthdate from one table (patients) and the encounter date from the other table (encounters). I want to know how to alter my code below to filter for patients who meet the age requirement. The code I have written thus far is:
CREATE TABLE dcohort (
PATIENT_ID VARCHAR(50) NULL
,ENCOUNTER_ID VARCHAR(50) NULL
,HOSPITAL_ENCOUNTER_DATE DATE NULL
,AGE_AT_VISIT NUMERIC(2,0) NULL
,DEATH_AT_VISIT_IND BIT NULL
,COUNT_CURRENT_MEDS NUMERIC(2,0) NULL
,CURRENT_OPIOID_IND BIT NULL
,READMISSION_90_DAY_IND BIT NULL
,READMISSION_30_DAY_IND BIT NULL
,FIRST_READMISSION_DATE DATE NULL
);
----------
INSERT INTO dcohort (patient_id, encounter_id, hospital_encounter_date, age_at_visit)
SELECT encounters.patient, encounters.encounterid, encounters.start, [placeholder]
FROM encounters
JOIN patients
ON encounters.patient = patients.id
WHERE reasondescription = 'Drug overdose'
AND start > '1999-7-15'

I would be starting with something like this:
SELECT DISTINCT ON (e.patient) e.patient, e.encounterid, e.start,
age(e.start, p.birthdate)
FROM encounters e JOIN
patients p
ON e.patient = p.id
WHERE e.start > '1999-07-15' AND
e.reasondescription = 'Drug overdose' AND
e.start >= p.birthdate + interval '18 year' AND
e.start < p.birthdate + interval '36 year'
ORDER BY e.patient, e.start;
This uses DISTINCT ON to return only one record per patient (the earliest eligible encounter, because of the ORDER BY), regardless of the number of eligible encounters that the patient has.
Note that the filter does not use age() to decide eligibility; age() appears only in the SELECT list to report the age. The WHERE clause compares the dates directly, which is usually more accurate.
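The date comparison in the WHERE clause can be checked outside the database. The following is a minimal Python sketch of the same rule, not the answer's code: `add_years` and `in_age_range` are hypothetical helper names, and the Feb 29 handling assumes PostgreSQL's behaviour of clamping to the end of the month when adding a year interval.

```python
from datetime import date


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years; Feb 29 falls back to Feb 28 in non-leap years."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # Only reachable for Feb 29 when the target year is not a leap year.
        return d.replace(year=d.year + years, day=28)


def in_age_range(birthdate: date, encounter: date, low: int = 18, high: int = 35) -> bool:
    """True when the patient is between `low` and `high` years old (inclusive)
    on the encounter date, using the same half-open comparison as the query:
    encounter >= birthdate + low years AND encounter < birthdate + (high + 1) years."""
    return add_years(birthdate, low) <= encounter < add_years(birthdate, high + 1)


birth = date(1980, 7, 16)
print(in_age_range(birth, date(1998, 7, 15)))  # day before the 18th birthday
print(in_age_range(birth, date(1998, 7, 16)))  # the 18th birthday
print(in_age_range(birth, date(2016, 7, 15)))  # last day of being 35
print(in_age_range(birth, date(2016, 7, 16)))  # the 36th birthday
```

Using the 36th birthday as an exclusive upper bound is what keeps a patient who is "35 and eleven months" inside the cohort.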

Related

SQL query with summed statistical data, grouped by date

I'm trying to wrap my head around a problem with making a query for a statistical overview of a system.
The table I want to pull data from is called 'Event', and holds the following columns (among others, only the necessary is posted):
date (as timestamp)
positionId (as number)
eventType (as string)
Another table that is most likely necessary is 'Location', which, among others, holds the following columns:
id (as number)
clinic (as boolean)
What I want is a sum of events in different conditions, grouped by days. The user can give an input over the range of days wanted, which means the output should only show a line per day inside the given limits. The columns should be the following:
date: a date, grouping the data by days
deliverySum: A sum of entries for the given day, where eventType is 'sightingDelivered', and the Location with id=positionId has clinic=true
pickupSum: Same as deliverySum, but eventType is 'sightingPickup'
rejectedSum: A sum over events for the day, where the positionId is 4000
acceptedSum: Same as rejectedSum, but positionId is 3000
So, one line should show the sums for the given day over the different criteria.
I'm fairly well read in SQL, but my practical experience is quite low, which led me to ask here.
Any help would be appreciated
SQL Server has neither timestamps nor booleans, so I'll answer this for MySQL.
select date(e.date),
sum( e.eventType = 'sightingDelivered' and l.clinic ) as deliverySum,
sum( e.eventType = 'sightingPickup' and l.clinic ) as pickupSum,
sum( e.positionId = 4000 ) as rejectedSum,
sum( e.positionId = 3000 ) as acceptedSum
from event e left join
location l
on e.positionId = l.id
where e.date >= $date1 and e.date < $date2 + interval 1 day
group by date(e.date);
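To see the conditional sums in action, here is a small runnable sketch that uses Python's built-in sqlite3 module rather than MySQL. SQLite also evaluates a comparison to 0 or 1, so the same SUM-of-a-condition trick applies. The sample rows are invented for illustration, and the upper bound uses SQLite's date(?, '+1 day') in place of MySQL's interval syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE location (id INTEGER PRIMARY KEY, clinic INTEGER);
CREATE TABLE event (date TEXT, positionId INTEGER, eventType TEXT);
-- Invented sample data: location 1 is a clinic, the others are not.
INSERT INTO location VALUES (1, 1), (2, 0), (3000, 0), (4000, 0);
INSERT INTO event VALUES
  ('2020-01-01 08:00:00', 1,    'sightingDelivered'),
  ('2020-01-01 09:00:00', 1,    'sightingPickup'),
  ('2020-01-01 10:00:00', 2,    'sightingDelivered'),
  ('2020-01-01 11:00:00', 4000, 'other'),
  ('2020-01-02 08:00:00', 3000, 'other'),
  ('2020-01-02 09:00:00', 1,    'sightingDelivered');
""")

# Each SUM adds up a 0/1 condition, so it counts the rows that satisfy it.
rows = conn.execute("""
SELECT date(e.date) AS day,
       SUM(e.eventType = 'sightingDelivered' AND l.clinic) AS deliverySum,
       SUM(e.eventType = 'sightingPickup' AND l.clinic)    AS pickupSum,
       SUM(e.positionId = 4000)                            AS rejectedSum,
       SUM(e.positionId = 3000)                            AS acceptedSum
FROM event e
LEFT JOIN location l ON e.positionId = l.id
WHERE e.date >= ? AND e.date < date(?, '+1 day')
GROUP BY date(e.date)
ORDER BY day
""", ("2020-01-01", "2020-01-02")).fetchall()

for row in rows:
    print(row)
```

The half-open range (>= first day, < day after the last day) keeps every timestamp on the final day inside the result.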

Postgres: Unable to determine percent of successful events ending in a completed trip

SQL Gurus,
I'm trying to solve this challenging problem as I'm practicing my SQL skills, however I'm stuck and would appreciate if someone could help.
A signup is defined as an event labelled ‘sign_up_success’ within the events table. For each city (‘A’ and ‘B’) and each day of the week, determine the percentage of signups in the first week of 2016 that resulted in a completed trip within 10 hours of the sign up date.
Table Name: trips
Column Name      Datatype
id               integer
client_id        integer (Foreign keyed to events.rider_id)
driver_id        integer
city_id          Integer (Foreign keyed to cities.city_id)
client_rating    integer
driver_rating    integer
request_at       Timestamp with timezone
predicted_eta    Integer
actual_eta       Integer
status           Enum(‘completed’, ‘cancelled_by_driver’, ‘cancelled_by_client’)
Table Name: cities
Column Name      Datatype
city_id          integer
city_name        string
Table Name: events
Column Name      Datatype
device_id        integer
rider_id         integer
city_id          integer
event_name       Enum(‘sign_up_success’, ‘attempted_sign_up’, ‘sign_up_failure’)
_ts              Timestamp with timezone
I tried something along these lines, however it's nowhere near the expected answer:
SELECT *
FROM trips AS trips
LEFT JOIN cities AS cities ON trips.city_id = cities.city_id
LEFT JOIN events AS events ON events.client_id = events.rider_id
WHERE events.event_name = "sign_up_success"
AND Convert(datetime, trips.request_at') <= Convert(datetime, '2016-01-07' )
AND DATEDIFF(d, Convert(datetime, events._ts), Convert(datetime, trips.request_at)) < 7 days
AND events.status = "completed
Desired Results look like below:
Monday A x%
Monday B y%
Tuesday A z%
Tuesday B p%
Can someone please help.
First of all, I assume that "trips"."city_id" is mandatory, so I use INNER JOIN instead of LEFT JOIN when joining with cities.
Then, to specify string constants, you need to use single quotes.
There are some other changes in the query -- hope you'll notice them yourself.
Also, the query might fail, since I didn't actually run it (unfortunately you didn't provide SQL to create and populate the tables).
The date_trunc() function with 'week' as its first parameter converts your timestamp to "first day of the corresponding week, time 00:00:00", based on your current timezone settings (see https://www.postgresql.org/docs/current/static/functions-datetime.html).
I used GROUP BY on that value, and the second "layer" of grouping is the city ID.
Also, I used "filter (where ...)" next to count() -- it lets you count only the desired rows.
Finally, I used a CTE to improve the query's structure and readability.
Let me know if it fails and I'll fix it. In general, this approach should work.
with data as (
select
left(date_trunc('week', t.request_at)::text, 10) as period,
c.city_id,
count(distinct t.id) as trips_count,
count(*) filter (
where
e.event_name = 'sign_up_success'
and e._ts < t.request_at + interval '10 hour'
) as successes_count
from trips as t
join cities as c on t.city_id = c.city_id
left join events as e on t.client_id = e.rider_id
where
t.request_at between '2016-01-01' and '2016-01-08'
group by 1, 2
)
select
*,
round(100 * successes_count::numeric / trips_count, 2)::text || '%' as ratio_percent
from data
order by period, city_id
;

SQL Queries in Oracle

I have created a database of a hospital and "management would like to know how many people got diagnosed with cancer in the last year".
CREATE TABLE patients (
ID_patients INTEGER NOT NULL,
Name VARCHAR NOT NULL
);
CREATE TABLE visit(
ID_visit INTEGER NOT NULL,
DATE_visit DATE NOT NULL,
FK_patients INTEGER NOT NULL
);
CREATE TABLE Diagnosis(
ID_Diagnosis INTEGER NOT NULL,
FK_disease INTEGER NOT NULL,
FK_visit INTEGER
);
CREATE TABLE Disease(
ID_disease INTEGER NOT NULL,
Name_disease VARCHAR NOT NULL
);
Now we need to find out: which patients got diagnosed with cancer last year.
I used query below to get patients that have visited last year, but I do not know how to target those with cancer ? I think I should use "VIEW AS" but I'm not sure.
SELECT *
FROM Visit
WHERE Date_Visit BETWEEN
(CURRENT_DATE - interval '2' year) AND CURRENT_DATE - INTERVAL '1' YEAR;
Assuming you only need a patient count and you already know how to define cancer, you'll want to use a JOIN to connect these tables together:
SELECT COUNT(v.FK_patients)
FROM visit v
JOIN Diagnosis d ON d.FK_visit = v.ID_visit --Here is where you connect the tables
WHERE v.Date_Visit BETWEEN
(CURRENT_DATE - interval '2' year) AND CURRENT_DATE - INTERVAL '1' YEAR
AND d.FK_disease IN (/* your list of cancer IDs */);
As Dank illustrated very nicely, you need a JOIN here. We can also use Oracle date functions instead of INTERVAL arithmetic, which makes the code cleaner. Since we want data for the past year, I am assuming you need the data for 01/01/2015 to 12/31/2015. Hope the snippet below helps.
SELECT COUNT(v.FK_patients)
FROM visit v
JOIN Diagnosis d ON d.FK_visit = v.ID_visit --Here is where you connect the tables
WHERE v.Date_Visit BETWEEN
TRUNC(ADD_MONTHS(SYSDATE,-12),'YEAR') AND TRUNC(SYSDATE,'YEAR')-1 ;
This should be straightforward, I guess:
select pa.ID_patients, pa.Name
from patients pa, visit vi, Diagnosis dia, Disease dis
where vi.FK_patients = pa.ID_patients
and dia.FK_visit = vi.ID_visit
and dis.ID_disease = dia.FK_disease
and upper(dis.Name_disease) like '%CANCER%'
Just add your date filtering to it and it should show the desired result...
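Putting the pieces together (join, disease filter, date filter), here is a runnable sketch using Python's sqlite3 module instead of Oracle. It follows the posted schema, in which Diagnosis.FK_visit points at visit.ID_visit. The sample rows are invented, the date bounds are passed in as parameters rather than derived from SYSDATE, and COUNT(DISTINCT ...) is an assumption on my part so that a patient with several cancer visits is counted once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patients (ID_patients INTEGER NOT NULL, Name TEXT NOT NULL);
CREATE TABLE visit (ID_visit INTEGER NOT NULL, DATE_visit TEXT NOT NULL,
                    FK_patients INTEGER NOT NULL);
CREATE TABLE Diagnosis (ID_Diagnosis INTEGER NOT NULL, FK_disease INTEGER NOT NULL,
                        FK_visit INTEGER);
CREATE TABLE Disease (ID_disease INTEGER NOT NULL, Name_disease TEXT NOT NULL);
-- Invented sample data.
INSERT INTO patients VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO Disease VALUES (10, 'Lung cancer'), (20, 'Flu');
INSERT INTO visit VALUES (100, '2015-03-01', 1), (101, '2015-06-01', 1),
                         (102, '2015-07-01', 2), (103, '2014-05-01', 3);
INSERT INTO Diagnosis VALUES (1000, 10, 100), (1001, 10, 101),
                             (1002, 20, 102), (1003, 10, 103);
""")

# Count each patient once, even if they had several cancer visits in the year.
count = conn.execute("""
SELECT COUNT(DISTINCT v.FK_patients)
FROM visit v
JOIN Diagnosis d ON d.FK_visit = v.ID_visit
JOIN Disease s   ON s.ID_disease = d.FK_disease
WHERE upper(s.Name_disease) LIKE '%CANCER%'
  AND v.DATE_visit >= ? AND v.DATE_visit < ?
""", ("2015-01-01", "2016-01-01")).fetchone()[0]

print(count)
```

In this sample, Ann has two cancer visits in 2015, Bob only has flu, and Cy's cancer visit falls in 2014, so only Ann is counted.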

SQL Server - Need to obtain duplicate records based on mutiple criteria of the same column

I work with a huge dataset of hospital activity records. Each record represents something done on behalf of a patient. My focus is on patients that have experienced 'outpatient' activity, such as attended an appointment or clinic.
In the data, we get records that are duplicates in that a patient is shown to have attended their first outpatient appointment more than once in a six-month period. This is an error on the part of the hospital that sends the data. We have to identify these records to send back as challenges.
I have the following SQL statement which is finding records where the 'Patient Code' appears more than once.
SELECT * FROM dbo.Z_ForQueries a
JOIN (SELECT PatientCode
FROM dbo.Z_ForQueries
GROUP BY PatientCode
HAVING COUNT (*) > 1 ) b
ON a.PatientCode = b.PatientCode
WHERE [Multiple OPFA in month] = 'y'
I cannot for the life of me figure out the syntax for the next bit: for each set of duplicated patient codes, I only want to see the records where one of the records has a 'Month' of 7 (that's just the current month I'm working on). If none of the records in a group of duplicates has '7' in the month, then I don't need to see that group.
For example, patient code L000066715 has 4 records, I can see that each record represents the same initial outpatient appointment in the same hospital speciality. Obviously you can only 'first attend' once. Each record has a month number; 3,4,6 & 7. Because this patient code has one of their duplicate records in month 7, I need it to be returned in the results along with the other 3 records.
Other patient codes exist in duplicate but none of their records are from month 7, so they don't need to be returned.
I hope I've set the scene properly for some help! Thanks.
Something like this should work:
SELECT *
FROM dbo.Z_ForQueries a
JOIN (
SELECT PatientCode,
MAX(CASE WHEN MONTH(dateColumn) = 7 THEN 1 ELSE 0 END) As InMonth
FROM dbo.Z_ForQueries
GROUP BY PatientCode
HAVING COUNT (*) > 1
) b ON a.PatientCode = b.PatientCode
And InMonth = 1
WHERE [Multiple OPFA in month] = 'y'
Explanation:
The CASE expression returns 1 for rows where Month=7, and 0 in all other cases. The MAX(..) around this CASE expression thus returns 1 if any row in the group had Month=7, and 0 only if none of them did.
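The same grouping logic can be traced by hand in a few lines of Python. This is only an illustrative sketch: the first patient code and its months come from the question, while codes "P2" and "P3" are made up to show a duplicate group without month 7 and a non-duplicated record.

```python
from collections import defaultdict

records = [
    {"PatientCode": "L000066715", "Month": 3},
    {"PatientCode": "L000066715", "Month": 4},
    {"PatientCode": "L000066715", "Month": 6},
    {"PatientCode": "L000066715", "Month": 7},
    {"PatientCode": "P2", "Month": 2},   # hypothetical: duplicated, no month 7
    {"PatientCode": "P2", "Month": 5},
    {"PatientCode": "P3", "Month": 7},   # hypothetical: month 7, but not duplicated
]

# GROUP BY PatientCode
groups = defaultdict(list)
for rec in records:
    groups[rec["PatientCode"]].append(rec)

# HAVING COUNT(*) > 1, plus the MAX(CASE WHEN month = 7 THEN 1 ELSE 0 END) = 1 flag
flagged = {
    code
    for code, rows in groups.items()
    if len(rows) > 1 and max(1 if r["Month"] == 7 else 0 for r in rows) == 1
}

# Join back to the detail rows to return every record of the flagged patients.
result = [rec for rec in records if rec["PatientCode"] in flagged]
print(flagged, len(result))
```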

Window moving average in sql server

I am trying to create a function that computes a windowed moving average in SQL Server 2008. I am quite new to SQL so I am having a fair bit of difficulty. The data that I am trying to perform the moving average on needs to be grouped by day (it is all timestamped data) and then a variable moving average window needs to be applied to it.
I already have a function that groups the data by day (and @id) which is shown at the bottom. I have a few questions:
Would it be better to call the grouping function inside the moving average function or should I do it all at once?
Is it possible to get the moving average for the dates input into the function, but go back n days to begin the moving average so that the first n days of the returned data will not have 0 for their average? (ie. if they want a 7 day moving average from 01-08-2011 to 02-08-2011 that I start the moving average calculation on 01-01-2011 so that the first day they defined has a value?)
I am in the process of looking into how to do the moving average, and know that a moving window seems to be the best option (currentSum = prevSum + todayCount - nthDayAgoCount, then average = currentSum / nDays), but I am still working on figuring out the SQL implementation of this.
I have a grouping function that looks like this (some variables removed for visibility purposes):
SELECT
'ALL' as GeogType,
CAST(v.AdmissionOn as date) as dtAdmission,
CASE WHEN @id IS NULL THEN 99 ELSE v.ID END,
COUNT(*) as nVisits
FROM dbo.Table1 v INNER JOIN dbo.Table2 t ON v.FSLDU = t.FSLDU5
WHERE v.AdmissionOn >= '01-01-2010' AND v.AdmissionOn < DATEADD(day,1,'02-01-2010')
AND v.ID = Coalesce(@id,v.ID)
GROUP BY
CAST(v.AdmissionOn as date),
CASE WHEN @id IS NULL THEN 99 ELSE v.ID END
ORDER BY 2,3,4
Which returns a table like so:
ALL 2010-01-01 1 103
ALL 2010-01-02 1 114
ALL 2010-01-03 1 86
ALL 2010-01-04 1 88
ALL 2010-01-05 1 84
ALL 2010-01-06 1 87
ALL 2010-01-07 1 82
EDIT: To answer the first question I asked:
I ended up creating a function which declared a temporary table and inserted the results from the count function into it, then used the example from user662852 to compute the moving average.
Take the hardcoded date range out of your query. Write the output (like your sample at the end) to a temp table (I called it #visits below).
Try this self join to the temp table:
Select list.dtadmission
, AVG(data.nvisits) as Avg
, SUM(data.nvisits) as sum
, COUNT(data.nvisits) as RollingDayCount
, MIN(data.dtadmission) as Verifymindate
, MAX(data.dtadmission) as Verifymaxdate
from #visits as list
inner join #visits as data
on list.dtadmission between data.dtadmission and DATEADD(DD,6,data.dtadmission)
group by list.dtadmission
EDIT: I didn't have enough room in Comments to say this in response to your question:
My join is "kinda cartesian" because it uses a between in the join constraint. Each record in list is going up against every other record, and then I want the ones where the date I report is between a lower bound of (-7) days and today. Every data date is available to list date, this is the key to your question. I could have written the join condition as
list.dtadmission between DATEADD(DD,-6,data.dtadmission) and data.dtadmission
But what really happened was I tested it as
list.dtadmission between DATEADD(DD,6,data.dtadmission) and data.dtadmission
Which returns no records because the syntax is "Between LOW and HIGH". I facepalmed on 0 records and swapped the arguments, that's all.
Try the following, see what I mean: This is the cartesian join for just one listdate:
SELECT
list.[dtAdmission] as listdate
,data.[dtAdmission] as datadate
,data.nVisits as datadata
,DATEADD(dd,6,list.dtadmission) as listplus6
,DATEADD(dd,6,data.dtAdmission ) as dataplus6
from [sandbox].[dbo].[admAvg] as list inner join [sandbox].[dbo].[admAvg] as data
on
1=1
where list.dtAdmission = '5-Jan-2011'
Compare this to the actual join condition
SELECT
list.[dtAdmission] as listdate
,data.[dtAdmission] as datadate
,data.nVisits as datadata
,DATEADD(dd,6,list.dtadmission) as listplus6
,DATEADD(dd,6,data.dtAdmission ) as dataplus6
from [sandbox].[dbo].[admAvg] as list inner join [sandbox].[dbo].[admAvg] as data
on
list.dtadmission between data.dtadmission and DATEADD(DD,6,data.dtadmission)
where list.dtAdmission = '5-Jan-2011'
See how list date is between datadate and dataplus6 in all the records?
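For readers who want to check the join condition without a database, here is a minimal Python sketch of the same trailing-window idea. It uses the seven sample rows from the question; `trailing_average` is a hypothetical helper name, and the window test mirrors "list date between data date and data date + 6 days".

```python
from datetime import date, timedelta

# Daily visit counts taken from the sample output in the question.
visits = {
    date(2010, 1, 1): 103,
    date(2010, 1, 2): 114,
    date(2010, 1, 3): 86,
    date(2010, 1, 4): 88,
    date(2010, 1, 5): 84,
    date(2010, 1, 6): 87,
    date(2010, 1, 7): 82,
}


def trailing_average(daily, window_days=7):
    """For each reported date, average the counts of every data date whose
    [data_date, data_date + window_days - 1] range contains the reported date."""
    out = {}
    for list_date in sorted(daily):
        window = [
            n
            for data_date, n in daily.items()
            if data_date <= list_date <= data_date + timedelta(days=window_days - 1)
        ]
        out[list_date] = (sum(window) / len(window), len(window))
    return out


result = trailing_average(visits)
for day, (avg, n_days) in result.items():
    print(day, avg, n_days)
```

The second value in each tuple plays the role of RollingDayCount: it shows that the first days of the range average over fewer than 7 rows unless earlier data is loaded as well.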