SQL Queries in Oracle

I have created a hospital database, and "management would like to know how many people were diagnosed with cancer in the last year".
CREATE TABLE patients (
ID_patients INTEGER NOT NULL,
Name VARCHAR2(100) NOT NULL -- Oracle requires a length for VARCHAR2
);
CREATE TABLE visit (
ID_visit INTEGER NOT NULL,
DATE_visit DATE NOT NULL,
FK_patients INTEGER NOT NULL -- no trailing comma before ")"
);
CREATE TABLE Diagnosis (
ID_Diagnosis INTEGER NOT NULL,
FK_disease INTEGER NOT NULL,
FK_visit INTEGER -- the visit at which the disease was diagnosed
);
CREATE TABLE Disease (
ID_disease INTEGER NOT NULL,
Name_disease VARCHAR2(100) NOT NULL
);
Now we need to find out which patients were diagnosed with cancer last year.
I used the query below to get the patients who visited last year, but I don't know how to target those with cancer. I think I should use "VIEW AS", but I'm not sure.
SELECT *
FROM Visit
WHERE Date_Visit BETWEEN
(CURRENT_DATE - interval '2' year) AND CURRENT_DATE - INTERVAL '1' YEAR;

Assuming you only need a patient count and you already know how to define cancer, you'll want to use a JOIN to connect these tables together:
SELECT COUNT(DISTINCT v.FK_patients) -- DISTINCT: a patient with several visits is counted once
FROM visit v
JOIN Diagnosis d ON d.FK_visit = v.ID_visit --Here is where you connect the tables
WHERE v.Date_Visit BETWEEN
(CURRENT_DATE - INTERVAL '2' YEAR) AND CURRENT_DATE - INTERVAL '1' YEAR
AND d.FK_disease IN (/* your list of cancer ids */);

Building on Dank's answer: here we can use Oracle's date functions instead of INTERVAL arithmetic, which makes the code look cleaner. Also, since you want data for the past year, I am assuming you need the data from 01/01/2015 to 12/31/2015 (the previous calendar year). Hope the snippet below helps.
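SELECT COUNT(DISTINCT v.FK_patients)
FROM visit v
JOIN Diagnosis d ON d.FK_visit = v.ID_visit --Here is where you connect the tables
-- from Jan 1 of last year (inclusive) to Jan 1 of this year (exclusive),
-- so visits at any time of day on Dec 31 are still included
WHERE v.Date_Visit >= TRUNC(ADD_MONTHS(SYSDATE, -12), 'YEAR')
AND v.Date_Visit < TRUNC(SYSDATE, 'YEAR')
AND d.FK_disease IN (/* your list of cancer ids */);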

This should be straightforward, I guess:
select distinct pa.ID_patients, pa.Name -- distinct: one row per patient, even with several diagnoses
from patients pa, visit vi, Diagnosis dia, Disease dis
where vi.FK_patients = pa.ID_patients
and dia.FK_visit = vi.ID_visit
and dis.ID_disease = dia.FK_disease
and upper(dis.Name_disease) like '%CANCER%'
Just add your date filtering to it, and it should show the desired result.
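Putting the pieces of these answers together, here is a minimal end-to-end sketch that joins all four tables, filters on the disease name as in the last answer, and counts each patient once. It assumes Diagnosis links to visit through FK_visit, uses invented sample rows, and treats "the last year" as the last 365 days. It sticks to constructs that Oracle and PostgreSQL both accept (`CURRENT_DATE - n`, one row per `INSERT`); swap in the `TRUNC`/`ADD_MONTHS` range if you mean the previous calendar year.

```sql
-- Schema from the question, with portable VARCHAR(100) lengths.
CREATE TABLE patients  (ID_patients INTEGER NOT NULL, Name VARCHAR(100) NOT NULL);
CREATE TABLE visit     (ID_visit INTEGER NOT NULL, DATE_visit DATE NOT NULL, FK_patients INTEGER NOT NULL);
CREATE TABLE Diagnosis (ID_Diagnosis INTEGER NOT NULL, FK_disease INTEGER NOT NULL, FK_visit INTEGER);
CREATE TABLE Disease   (ID_disease INTEGER NOT NULL, Name_disease VARCHAR(100) NOT NULL);

-- Hypothetical sample data.
INSERT INTO Disease VALUES (1, 'Lung cancer');
INSERT INTO Disease VALUES (2, 'Influenza');
INSERT INTO patients VALUES (10, 'Alice');
INSERT INTO patients VALUES (20, 'Bob');
INSERT INTO patients VALUES (30, 'Carol');
INSERT INTO visit VALUES (100, CURRENT_DATE - 30, 10);   -- Alice, cancer
INSERT INTO visit VALUES (101, CURRENT_DATE - 60, 10);   -- Alice again, cancer
INSERT INTO visit VALUES (200, CURRENT_DATE - 30, 20);   -- Bob, flu only
INSERT INTO visit VALUES (300, CURRENT_DATE - 400, 30);  -- Carol, cancer but too long ago
INSERT INTO Diagnosis VALUES (1000, 1, 100);
INSERT INTO Diagnosis VALUES (1001, 1, 101);
INSERT INTO Diagnosis VALUES (2000, 2, 200);
INSERT INTO Diagnosis VALUES (3000, 1, 300);

-- Number of distinct patients with a cancer diagnosis in the last 365 days.
SELECT COUNT(DISTINCT v.FK_patients) AS cancer_patients
FROM visit v
JOIN Diagnosis d  ON d.FK_visit = v.ID_visit
JOIN Disease dis  ON dis.ID_disease = d.FK_disease
WHERE UPPER(dis.Name_disease) LIKE '%CANCER%'
  AND v.DATE_visit >= CURRENT_DATE - 365
  AND v.DATE_visit <= CURRENT_DATE;
```

With this data the count is 1: Alice's two cancer visits collapse into one patient, Bob has no cancer diagnosis, and Carol's visit falls outside the window.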

Related

Quickly querying with effective dates on rows themselves with dates

Suppose I have a table called measurement. This table's purpose is to store a numeric "value" (which is itself calculated from other data) for a "series_id" at a particular "date".
Now let's add effective dating to this table with "effective_start" (inclusive) and "effective_end" (inclusive) fields.
DDL:
CREATE TABLE public.measurement
(
date date NOT NULL,
effective_end date NOT NULL,
effective_start date NOT NULL,
series_id character varying(255) COLLATE pg_catalog."default" NOT NULL,
value numeric,
CONSTRAINT measurement_pkey PRIMARY KEY (date, effective_end, effective_start, series_id)
)
My challenge now is to construct, quickly and with SQL only (I have Java code and a partial query that solves this), a query that returns the following:
For all series, at a particular date in time (query parameter), return back the measurement that is the most recent (maximum "date") that was effective at the particular date in time being queried.
My current "all-SQL" solution is a view, combined with a query over the view:
DDL for the view:
CREATE OR REPLACE VIEW public.known_at AS
SELECT o.date,
o.effective_end,
o.effective_start,
o.series_id,
o.value
FROM measurement o
JOIN ( SELECT o_1.series_id,
min(o_1.effective_start) AS effective_start,
o_1.date
FROM measurement o_1
GROUP BY o_1.series_id, o_1.date) x ON o.series_id::text = x.series_id::text AND o.effective_start = x.effective_start AND o.date = x.date
JOIN ( SELECT o_1.series_id,
o_1.effective_start,
max(o_1.date) AS date
FROM measurement o_1
GROUP BY o_1.series_id, o_1.effective_start) y ON x.series_id::text = y.series_id::text AND x.effective_start = y.effective_start AND x.date = y.date
WHERE o.date <= o.effective_start
ORDER BY o.date DESC, o.series_id DESC;
Query:
select k.* from known_at k
inner join (
select
k.series_id,
max(k.date) as date
from known_at k
-- the passed in date here is a parameter as described above
where k.date <= '2020-03-26'
group by k.series_id) as mx
on k.series_id = mx.series_id and k.date = mx.date
order by k.series_id;
Unfortunately, the combination view and query is slow (~400ms) despite btree indices on series_id, date, effective_end, and effective_start. How can I do better?
I think this query should give you the results you want, though without having your dataset it's hard to say what its performance would be like. For this query, I'd recommend a multi-column index on (effective_start, effective_end, series_id, date DESC).
SELECT DISTINCT ON (series_id) *
FROM measurement
WHERE effective_start <= '2020-03-26' -- the passed-in date
AND effective_end >= '2020-03-26' -- the passed-in date
ORDER BY series_id, date DESC;
Explanation: The query filters for rows that include the passed-in date within the effective period, then for each series_id in the filtered rows, the row with the max date is taken.
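Here is a small self-contained sketch of that suggestion: a temporary copy of the question's table, the recommended multi-column index, a few invented rows, and the DISTINCT ON query. The index name and sample data are hypothetical; on real data, check with EXPLAIN ANALYZE whether the planner actually uses the index.

```sql
-- Temporary copy of the measurement table from the question.
CREATE TEMP TABLE measurement (
    date            date NOT NULL,
    effective_end   date NOT NULL,
    effective_start date NOT NULL,
    series_id       varchar(255) NOT NULL,
    value           numeric,
    PRIMARY KEY (date, effective_end, effective_start, series_id)
);

-- The multi-column index recommended above (name is illustrative).
CREATE INDEX measurement_effective_idx
    ON measurement (effective_start, effective_end, series_id, date DESC);

-- Hypothetical rows.
INSERT INTO measurement VALUES
    ('2020-03-01', '2020-12-31', '2020-03-02', 'a', 1),
    ('2020-03-20', '2020-12-31', '2020-03-21', 'a', 2),  -- latest row effective on 2020-03-26
    ('2020-03-25', '2020-12-31', '2020-03-30', 'a', 3),  -- not yet effective on 2020-03-26
    ('2020-02-01', '2020-03-10', '2020-02-02', 'b', 4),  -- expired before 2020-03-26
    ('2020-01-15', '2020-12-31', '2020-01-16', 'b', 5);

-- For each series, the most recent row that was effective on the passed-in date.
SELECT DISTINCT ON (series_id) series_id, date, value
FROM measurement
WHERE effective_start <= DATE '2020-03-26'
  AND effective_end   >= DATE '2020-03-26'
ORDER BY series_id, date DESC;
```

For 2020-03-26 this returns value 2 for series 'a' and value 5 for series 'b'.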
Also, you may want to consider using a daterange type for the effective dates. Range types come with some useful range operators.
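A minimal sketch of the daterange idea, assuming the two effective columns are merged into one hypothetical `effective` column (table and index names are made up). The `@>` operator tests whether a range contains a single date, and a GiST index can support it.

```sql
-- Hypothetical variant of the table with one inclusive daterange for the effective period.
CREATE TEMP TABLE measurement_r (
    date      date NOT NULL,
    effective daterange NOT NULL,   -- built with '[]', so both bounds are inclusive
    series_id varchar(255) NOT NULL,
    value     numeric
);

-- GiST index usable by the containment operator @>.
CREATE INDEX measurement_r_effective_idx ON measurement_r USING gist (effective);

INSERT INTO measurement_r VALUES
    ('2020-03-01', daterange('2020-03-02', '2020-12-31', '[]'), 'a', 1),
    ('2020-03-20', daterange('2020-03-21', '2020-12-31', '[]'), 'a', 2),
    ('2020-02-01', daterange('2020-02-02', '2020-03-10', '[]'), 'b', 4);

-- "effective @> date" reads as "the effective range contains this date".
SELECT DISTINCT ON (series_id) series_id, date, value
FROM measurement_r
WHERE effective @> DATE '2020-03-26'
ORDER BY series_id, date DESC;
```

Here only series 'a' qualifies (value 2); the single 'b' row expired on 2020-03-10.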

How to assemble cohort using data from 2 separate tables PostgreSQL

I am trying to assemble a cohort of patients who meet a set of criteria (using data from 2 different tables). I am trying to create a list of patients who
Have been seen for a drug overdose
Had that encounter after 07-15-1999
Were between 18 and 35 years old at the time of the encounter
Every patient in this table must meet all of these conditions. I have created a new table (dcohort) to insert the information for all of these patients. I have already figured out how to determine which patients meet the first two conditions, but I am struggling to figure out which meet the age condition, because age is not a column in either of the 2 provided tables. Age must be calculated using the birthdate from one table (patients) and the encounter date from the other table (encounters). I want to know how to alter my code below to filter for patients who meet the age requirement. The code I have written so far is:
CREATE TABLE dcohort (
PATIENT_ID VARCHAR(50) NULL
,ENCOUNTER_ID VARCHAR(50) NULL
,HOSPITAL_ENCOUNTER_DATE DATE NULL
,AGE_AT_VISIT NUMERIC(2,0) NULL
,DEATH_AT_VISIT_IND BIT NULL
,COUNT_CURRENT_MEDS NUMERIC(2,0) NULL
,CURRENT_OPIOID_IND BIT NULL
,READMISSION_90_DAY_IND BIT NULL
,READMISSION_30_DAY_IND BIT NULL
,FIRST_READMISSION_DATE DATE NULL
);
----------
INSERT INTO dcohort (patient_id, encounter_id, hospital_encounter_date, age_at_visit)
SELECT encounters.patient, encounters.encounterid, encounters.start, [placeholder]
FROM encounters
JOIN patients
ON encounters.patient = patients.id
WHERE reasondescription = 'Drug overdose'
AND start > '1999-7-15'
I would start with something like this:
SELECT DISTINCT ON (e.patient) e.patient, e.encounterid, e.start,
age(e.start, p.birthdate) AS age_at_visit -- age(later, earlier) gives a positive interval
FROM encounters e JOIN
patients p
ON e.patient = p.id
WHERE e.start > '1999-07-15' AND
e.reasondescription = 'Drug overdose' AND
e.start >= p.birthdate + interval '18 year' AND
e.start < p.birthdate + interval '36 year'
ORDER BY e.patient, e.start;
This uses DISTINCT ON to get only one record per patient, regardless of the number of eligible encounters the patient has.
Note that the age range is not checked with age(). Instead, the WHERE clause compares the encounter date directly with the birthdate plus 18 and 36 years (age() only appears in the select list, to report the age). That is usually more accurate.
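To fill the `[placeholder]` in the question's INSERT, the same query can feed dcohort directly, turning the interval into whole years with `EXTRACT(YEAR FROM age(...))`. This is a sketch on trimmed-down temporary tables with invented rows; only the column names come from the question.

```sql
-- Hypothetical, trimmed-down tables (column names follow the question).
CREATE TEMP TABLE patients   (id varchar(50), birthdate date);
CREATE TEMP TABLE encounters (encounterid varchar(50), patient varchar(50),
                              start timestamp, reasondescription text);
CREATE TEMP TABLE dcohort    (patient_id varchar(50), encounter_id varchar(50),
                              hospital_encounter_date date, age_at_visit numeric(2,0));

INSERT INTO patients VALUES ('p1', '1980-05-01'), ('p2', '1960-01-01'), ('p3', '1975-01-01');
INSERT INTO encounters VALUES
    ('e1', 'p1', '2005-06-01 10:00', 'Drug overdose'),   -- age 25: eligible
    ('e2', 'p1', '2010-06-01 10:00', 'Drug overdose'),   -- eligible, but later than e1
    ('e3', 'p2', '2005-06-01 10:00', 'Drug overdose'),   -- age 45: too old
    ('e4', 'p3', '1998-01-01 10:00', 'Drug overdose'),   -- before 1999-07-15
    ('e5', 'p1', '2004-01-01 10:00', 'Sprained ankle');  -- not an overdose

INSERT INTO dcohort (patient_id, encounter_id, hospital_encounter_date, age_at_visit)
SELECT DISTINCT ON (e.patient)                     -- first eligible encounter per patient
       e.patient,
       e.encounterid,
       e.start::date,
       EXTRACT(YEAR FROM age(e.start, p.birthdate)) -- completed years at the encounter
FROM encounters e
JOIN patients p ON e.patient = p.id
WHERE e.reasondescription = 'Drug overdose'
  AND e.start > '1999-07-15'
  AND e.start >= p.birthdate + interval '18 year'
  AND e.start <  p.birthdate + interval '36 year'
ORDER BY e.patient, e.start;
```

With these rows, dcohort receives a single row: p1, encounter e1 on 2005-06-01, age 25.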

Postgres: Unable to determine percent of successful events ending in a completed trip

SQL Gurus,
I'm trying to solve this challenging problem while practicing my SQL skills; however, I'm stuck and would appreciate it if someone could help.
A signup is defined as an event labelled 'sign_up_success' within the events table. For each city ('A' and 'B') and each day of the week, determine the percentage of signups in the first week of 2016 that resulted in a completed trip within 10 hours of the signup.
Table Name: trips
Column Name     Datatype
id              integer
client_id       integer (foreign key to events.rider_id)
driver_id       integer
city_id         integer (foreign key to cities.city_id)
client_rating   integer
driver_rating   integer
request_at      timestamp with time zone
predicted_eta   integer
actual_eta      integer
status          enum('completed', 'cancelled_by_driver', 'cancelled_by_client')
Table Name: cities
Column Name     Datatype
city_id         integer
city_name       string
Table Name: events
Column Name     Datatype
device_id       integer
rider_id        integer
city_id         integer
event_name      enum('sign_up_success', 'attempted_sign_up', 'sign_up_failure')
_ts             timestamp with time zone
I tried something along these lines, but it's nowhere near the expected answer:
SELECT *
FROM trips AS trips
LEFT JOIN cities AS cities ON trips.city_id = cities.city_id
LEFT JOIN events AS events ON events.client_id = events.rider_id
WHERE events.event_name = "sign_up_success"
AND Convert(datetime, trips.request_at') <= Convert(datetime, '2016-01-07' )
AND DATEDIFF(d, Convert(datetime, events._ts), Convert(datetime, trips.request_at)) < 7 days
AND events.status = "completed
The desired results look like this:
Monday A x%
Monday B y%
Tuesday A z%
Tuesday B p%
Can someone please help?
First of all, I assume that "trips"."city_id" is mandatory, so I use INNER JOIN instead of LEFT JOIN when joining with cities.
Then, to specify string constants, you need to use single quotes.
There are some other changes in the query -- hope you'll notice them yourself.
Also, the query might fail, since I didn't actually run it (unfortunately, you didn't provide DDL and sample data).
The date_trunc() function with 'week' as its first parameter converts your timestamp to "the first day of the corresponding week, time 00:00:00", based on your current time zone setting (see https://www.postgresql.org/docs/current/static/functions-datetime.html).
I used GROUP BY on that value, with city ID as the second "layer" of grouping.
Also, I used "filter (where ...)" next to count(); it lets you count only the desired rows.
Finally, I used a CTE to improve the query's structure and readability.
Let me know if it fails and I'll fix it. In general, this approach should work.
with data as (
select
left(date_trunc('week', t.request_at)::text, 10) as period,
c.city_id,
count(distinct t.id) as trips_count,
-- completed trips requested within 10 hours after a successful sign-up;
-- distinct, because one trip can join to several events
count(distinct t.id) filter (
where
e.event_name = 'sign_up_success'
and t.status = 'completed'
and t.request_at < e._ts + interval '10 hour'
) as successes_count
from trips as t
join cities as c on t.city_id = c.city_id
left join events as e on t.client_id = e.rider_id and e._ts <= t.request_at
where
t.request_at between '2016-01-01' and '2016-01-08'
group by 1, 2
)
select
*,
round(100 * successes_count::numeric / trips_count, 2)::text || '%' as ratio_percent
from data
order by period, city_id
;
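The query above still counts trips per calendar week rather than signups per weekday. If you want exactly what the question asks (the share of successful signups in the first week of 2016, per city and day of the week, that led to a completed trip within 10 hours), one option is to start from the signups and test each one with EXISTS. This is a sketch under several assumptions: "first week" means 2016-01-01 up to but excluding 2016-01-08, the enums are stored as text, the session time zone is UTC, and all sample rows and the view name are invented.

```sql
SET TIME ZONE 'UTC';   -- day-of-week boundaries depend on the session time zone

-- Hypothetical, trimmed-down versions of the three tables.
CREATE TEMP TABLE cities (city_id int, city_name text);
CREATE TEMP TABLE events (rider_id int, city_id int, event_name text, _ts timestamptz);
CREATE TEMP TABLE trips  (id int, client_id int, city_id int, request_at timestamptz, status text);

INSERT INTO cities VALUES (1, 'A'), (2, 'B');
INSERT INTO events VALUES
    (1, 1, 'sign_up_success',   '2016-01-04 08:00+00'),  -- Monday, city A
    (2, 1, 'sign_up_success',   '2016-01-04 09:00+00'),  -- Monday, city A
    (3, 2, 'sign_up_success',   '2016-01-05 10:00+00'),  -- Tuesday, city B
    (4, 1, 'attempted_sign_up', '2016-01-04 07:00+00');  -- not a signup
INSERT INTO trips VALUES
    (1, 1, 1, '2016-01-04 12:00+00', 'completed'),            -- 4 h after signup
    (2, 2, 1, '2016-01-04 10:00+00', 'cancelled_by_client'),  -- not completed
    (3, 3, 2, '2016-01-06 10:00+00', 'completed');            -- 24 h after signup

CREATE TEMP VIEW signup_conversion AS
WITH signups AS (
    SELECT e.rider_id, e.city_id, e._ts,
           -- did this rider complete a trip within 10 hours after signing up?
           EXISTS (SELECT 1
                   FROM trips t
                   WHERE t.client_id = e.rider_id
                     AND t.status = 'completed'
                     AND t.request_at >= e._ts
                     AND t.request_at <  e._ts + interval '10 hours') AS converted
    FROM events e
    WHERE e.event_name = 'sign_up_success'
      AND e._ts >= '2016-01-01' AND e._ts < '2016-01-08'
)
SELECT EXTRACT(ISODOW FROM s._ts)::int AS dow,        -- 1 = Monday, for sorting
       to_char(s._ts, 'FMDay')         AS day_name,
       c.city_name,
       round(100.0 * count(*) FILTER (WHERE s.converted) / count(*), 2) AS pct_completed
FROM signups s
JOIN cities c ON c.city_id = s.city_id
WHERE c.city_name IN ('A', 'B')
GROUP BY 1, 2, 3
ORDER BY 1, 3;

SELECT day_name, city_name, pct_completed FROM signup_conversion;
```

With the sample rows this gives Monday / A / 50.00 and Tuesday / B / 0.00. Starting from signups rather than trips keeps the denominator equal to the number of signups, which a LEFT JOIN from trips cannot guarantee.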

Next available Date

I have a tabular model with a standard star schema.
On my dim date table there is a column that flags UK holidays.
If a key points to a date that has been flagged, I would like to use the next available (unflagged) date instead.
I don't have enough access to the database to build a function for this, as I've seen others do.
Could anyone suggest some DAX or another method of doing this?
Thanks so much in advance
You can create a calculated column that returns the next working dateKey if the date is flagged as a non-working date. If the date is not flagged, the column simply contains its own dateKey value.
Use this DAX expression in the calculated column:
=
IF (
[isDefaultCalendarNonWorkingDay] = 1,
CALCULATE (
MIN ( [dateKey] ),
FILTER (
DimDate,
[dateKey] > EARLIER ( [dateKey] )
&& [isDefaultCalendarNonWorkingDay] = 0
)
),
[dateKey]
)
I've recreated your DimDate table with some sample data:
Let me know if this helps.

Can SQL view have infinite number of rows? (Repeating schedule, each row a day?)

Can I have a view with an infinite number of rows? I don't want to
select all the rows at once, but is it possible to have a view that
represents a repeating weekly schedule, with rows for any date?
I have a database with information about businesses and their
hours on different days of the week. Their names:
# SELECT company_name FROM company;
company_name
--------------------
Acme, Inc.
Amalgamated
...
(47 rows)
Their weekly schedules:
# SELECT days, open_time, close_time
FROM hours JOIN company USING(company_id)
WHERE company_name='Acme, Inc.';
days | open_time | close_time
---------+-----------+-----------
1111100 | 08:30:00 | 17:00:00
0000010 | 09:00:00 | 12:30:00
Another table, not shown, has holidays they're closed.
So I can trivially create a user-defined function in the form of a
stored procedure that takes a particular date as an argument and
returns the business hours of each company:
SELECT company_name,open_time,close_time FROM schedule_for(current_date);
But I want to do it as a table query, so that any
SQL-compatible host-language library will have no problem
interfacing with it, like this:
SELECT company_name, open_time, close_time
FROM schedule_view
WHERE business_date=current_date;
Relational database theory tells me that tables (relations) are
functions in the sense of being a unique mapping from each
primary key to a row (tuple). Obviously if the WHERE clause on
the above query were omitted it would result in a table (view)
having an infinite number of rows, which would be a practical issue. But
I'm willing to agree never to query such a view without a WHERE
clause that restricts the number of rows.
How can such a view be created (in PostgreSQL)? Or is a view even the way to do what I want?
Update
Here are some more details about my tables. The days of the week are stored as bits, and I select the appropriate row using a bit mask with a single bit set, shifted right by the position of the requested day within the week. To wit:
The company table:
# \d company
Table "company"
Column | Type | Modifiers
----------------+------------------------+-----------
company_id | smallint | not null
company_name | character varying(128) | not null
timezone | timezone | not null
The hours table:
# \d hours
Table "hours"
Column | Type | Modifiers
------------+------------------------+-----------
company_id | smallint | not null
days | bit(7) | not null
open_time | time without time zone | not null
close_time | time without time zone | not null
The holiday table:
# \d holiday
Table "holiday"
Column | Type | Modifiers
---------------+----------+-----------
company_id | smallint | not null
month_of_year | smallint | not null
day_of_month | smallint | not null
The function I currently have that does what I want (besides invocation) is defined as:
CREATE FUNCTION schedule_for(requested_date date)
RETURNS table(company_name text, open_time timestamptz, close_time timestamptz)
AS $$
WITH field AS (
/* shift the mask as many bits as the requested day of the week */
SELECT B'1000000' >> (to_char(requested_date,'ID')::int -1) AS day_of_week,
to_char(requested_date, 'MM')::int AS month_of_year,
to_char(requested_date, 'DD')::int AS day_of_month
)
SELECT company_name,
(requested_date+open_time) AT TIME ZONE timezone AS open_time,
(requested_date+close_time) AT TIME ZONE timezone AS close_time
FROM hours INNER JOIN company USING (company_id)
CROSS JOIN field
CROSS JOIN holiday
/* if the bit-mask anded with the DOW is the DOW */
WHERE (hours.days & field.day_of_week) = field.day_of_week
AND NOT EXISTS (SELECT 1
FROM holiday h
WHERE h.company_id = hours.company_id
AND field.month_of_year = h.month_of_year
AND field.day_of_month = h.day_of_month);
$$
LANGUAGE SQL;
So again, my goal is to be able to get today's schedule by doing this:
SELECT open_time, close_time FROM schedule_view
WHERE company='Acme, Inc.' AND requested_date=CURRENT_DATE;
and also be able to get the schedule for any arbitrary date by doing this:
SELECT open_time, close_time FROM schedule_view
WHERE company='Acme, Inc.' AND requested_date=CAST ('2013-11-01' AS date);
I'm assuming this would require creating the view here referred to as schedule_view but maybe I'm mistaken about that. In any event I want to keep any messy SQL code hidden from usage at the command-line-interface and client-language database libraries, as it currently is in the user-defined function I have.
In other words, I just want to invoke the function I already have by passing the argument in a WHERE clause instead of inside parentheses.
You could create a view with infinite rows by using a recursive CTE. But even that needs a starting point and a terminating condition or it will error out.
A more practical approach with set returning functions (SRF):
WITH x AS (SELECT '2013-10-09'::date AS day) -- supply your date
SELECT company_id, x.day + open_time AS open_ts
, x.day + close_time AS close_ts
FROM (
SELECT *, unnest(arr)::bool AS open, generate_subscripts(arr, 1) AS dow
FROM (SELECT *, string_to_array(days::text, NULL) AS arr FROM hours) sub
) sub2
CROSS JOIN x
WHERE open
AND dow = EXTRACT(ISODOW FROM x.day);
-- AND NOT EXISTS (SELECT 1 FROM holiday WHERE holiday = x.day)
Expanding SRFs side by side in the SELECT list is generally frowned upon (and for good reason: it's not in the SQL standard, and it shows surprising behavior if the number of elements is not the same). The new WITH ORDINALITY feature in the upcoming Postgres 9.4 will allow cleaner syntax. Consider this related answer on dba.SE, or similarly:
PostgreSQL unnest() with element number
I am assuming bit(7) is the most efficient data type for days. To work with it, I am converting it to an array in the first subquery sub.
Note the difference between ISODOW and DOW as field pattern for EXTRACT().
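With Postgres 9.4 or later, the WITH ORDINALITY syntax mentioned above replaces the side-by-side `unnest()`/`generate_subscripts()` pair: each character of the bit string comes back together with its 1-based position. This is a sketch on a temporary copy of the hours table with invented rows; the holiday check is left out as in the query above.

```sql
-- Hypothetical sample data in the shape of the question's hours table.
CREATE TEMP TABLE hours (company_id int, days bit(7), open_time time, close_time time);
INSERT INTO hours VALUES
    (1, '1111100', '08:30', '17:00'),   -- Monday to Friday
    (2, '0000010', '09:00', '12:30');   -- Saturday only

WITH x AS (SELECT DATE '2013-10-09' AS day)   -- supply your date (a Wednesday)
SELECT h.company_id,
       x.day + h.open_time  AS open_ts,
       x.day + h.close_time AS close_ts
FROM hours h
CROSS JOIN x
-- one row per character of the bit string, numbered 1..7 (Monday..Sunday)
CROSS JOIN LATERAL unnest(string_to_array(h.days::text, NULL))
           WITH ORDINALITY AS d(flag, dow)
WHERE d.flag = '1'
  AND d.dow = EXTRACT(ISODOW FROM x.day);
```

For 2013-10-09 only company 1 is open, from 08:30 to 17:00.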
Updated question
Your function looks good, except for this line:
CROSS JOIN holiday
Otherwise, if I take the bit-shifting route, I end up with a similar query:
WITH x AS (SELECT '2013-10-09'::date AS day) -- supply your date
,y AS (SELECT day, B'1000000' >> (EXTRACT(ISODOW FROM day)::int - 1) AS dow
FROM x)
-- parentheses needed: AT TIME ZONE binds tighter than +
SELECT c.company_name, (y.day + open_time) AT TIME ZONE c.timezone AS open_ts
, (y.day + close_time) AT TIME ZONE c.timezone AS close_ts
FROM hours h
JOIN company c USING (company_id)
CROSS JOIN y
WHERE h.days & y.dow = y.dow
-- AND NOT EXISTS ...
;
EXTRACT(ISODOW FROM requested_date)::int is just a faster equivalent of to_char(requested_date,'ID')::int
"Pass" day in WHERE clause?
To make that work you would have to generate a huge temporary table covering all possible days before selecting rows for the day in the WHERE clause. Possible (I would employ generate_series()), but very expensive.
My answer to your first draft is a smaller version of this: I expand all rows only for a pattern week before selecting the day matching the date in the WHERE clause. The tricky part is to display timestamps built from the input in the WHERE clause. That is not possible, so you are back to the huge table covering all days. Unless you have only a few companies and a reasonably small date range, I would not go there.
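For completeness, here is what the "huge table" route looks like when the date range is bounded: a view that expands every day of a fixed window with generate_series() and applies the same bit-mask and holiday tests as the question's schedule_for(). It is only a sketch: the two-years-either-side window, the sample rows, and leaving out the timezone conversion are all assumptions. Because the WHERE filter is applied after the series is expanded, every query still builds the whole window (1,461 days times the hours rows), which is exactly the cost warned about above.

```sql
-- Hypothetical sample data shaped like the question's tables (timezone column left out).
CREATE TEMP TABLE company (company_id int, company_name text);
CREATE TEMP TABLE hours   (company_id int, days bit(7), open_time time, close_time time);
CREATE TEMP TABLE holiday (company_id int, month_of_year int, day_of_month int);

INSERT INTO company VALUES (1, 'Acme, Inc.'), (2, 'Amalgamated');
INSERT INTO hours VALUES
    (1, '1111100', '08:30', '17:00'),
    (2, '0000010', '09:00', '12:30');
INSERT INTO holiday VALUES (1, 12, 25);   -- Acme is closed on December 25

-- Bounded calendar view: two years either side of today.
CREATE TEMP VIEW schedule_view AS
SELECT c.company_name,
       d.business_date,
       d.business_date + h.open_time  AS open_time,
       d.business_date + h.close_time AS close_time
FROM (SELECT current_date + g.i AS business_date
      FROM generate_series(-730, 730) AS g(i)) d
-- same bit-mask test as schedule_for(): keep hours rows whose bit for this weekday is set
JOIN hours h
  ON (h.days & (B'1000000' >> (EXTRACT(ISODOW FROM d.business_date)::int - 1)))
     = (B'1000000' >> (EXTRACT(ISODOW FROM d.business_date)::int - 1))
JOIN company c USING (company_id)
WHERE NOT EXISTS (SELECT 1
                  FROM holiday ho
                  WHERE ho.company_id = h.company_id
                    AND ho.month_of_year = EXTRACT(MONTH FROM d.business_date)
                    AND ho.day_of_month  = EXTRACT(DAY FROM d.business_date));

-- The date is now "passed" in the WHERE clause.
SELECT company_name, open_time, close_time
FROM schedule_view
WHERE business_date = current_date;
```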
This builds on the previous answers.
The sample data:
CREATE temp TABLE company (company_id int, company text);
INSERT INTO company VALUES
(1, 'Acme, Inc.')
,(2, 'Amalgamated');
CREATE temp TABLE hours(company_id int, days bit(7), open_time time, close_time time);
INSERT INTO hours VALUES
(1, '1111100', '08:30:00', '17:00:00')
,(2, '0000010', '09:00:00', '12:30:00');
create temp table holidays(company_id int, month_of_year int, day_of_month int);
insert into holidays values
(1, 1, 1),
(2, 1, 1),
(2, 1, 12) -- this was a saturday in 2013
;
First, match a date's day of the week against the hours table's days, using the logic you provided:
select *
from company a
left join hours b
on a.company_id = b.company_id
left join holidays c
on b.company_id = c.company_id
where (b.days & (B'1000000' >> (to_char(current_date,'ID')::int -1)))
= (B'1000000' >> (to_char(current_date,'ID')::int -1))
;
Postgres lets you create custom operators to simplify expressions like the one in that where clause, so you might want an operator that matches the day of the week between a bit string and a date. First, the function that performs the test:
CREATE FUNCTION match_day_of_week(bit, date)
RETURNS boolean
AS $$
select ($1 & (B'1000000' >> (to_char($2,'ID')::int -1))) = (B'1000000' >> (to_char($2,'ID')::int -1))
$$
LANGUAGE sql IMMUTABLE STRICT;
You could stop there and make your where clause look something like "where match_day_of_week(days, some-date)". The custom operator makes this look a little prettier:
CREATE OPERATOR == (
leftarg = bit,
rightarg = date,
procedure = match_day_of_week
);
Now you've got syntax sugar to simplify that predicate. Here I've also added in the next test (that the month_of_year and day_of_month of a holiday don't correspond with the supplied date):
select *
from company a
left join hours b
on a.company_id = b.company_id
left join holidays c
on b.company_id = c.company_id
where b.days == current_date
and extract(month from current_date) != month_of_year
and extract(day from current_date) != day_of_month
;
For simplicity, I start by adding a composite type (another awesome Postgres feature) to encapsulate the month and day of the holiday.
create type month_day as (month_of_year int, day_of_month int);
Now repeat the process above to make another custom operator.
CREATE FUNCTION match_day_of_month(month_day, date)
RETURNS boolean
AS $$
select extract(month from $2) = $1.month_of_year
and extract(day from $2) = $1.day_of_month
$$
LANGUAGE sql IMMUTABLE STRICT;
CREATE OPERATOR == (
leftarg = month_day,
rightarg = date,
procedure = match_day_of_month
);
Finally, the original query is reduced to this:
select *
from company a
left join hours b
on a.company_id = b.company_id
left join holidays c
on b.company_id = c.company_id
where b.days == current_date
and not ((c.month_of_year, c.day_of_month)::month_day == current_date)
;
Reducing that down to a view looks like this:
create view x
as
select b.days,
(c.month_of_year, c.day_of_month)::month_day as holiday,
a.company_id,
b.open_time,
b.close_time
from company a
left join hours b
on a.company_id = b.company_id
left join holidays c
on b.company_id = c.company_id
;
And you could use that like this:
select company_id, open_time, close_time
from x
where days == current_date
and not (holiday == current_date)
;
Edit: You'll need to work on this logic a bit, by the way - this was more about showing the idea of how to do it with custom operators. For starters, if a company has multiple holidays defined you'll likely get multiple results back for that company.
I posted a similar response on PostgreSQL mailing list. Basically, avoiding the use of a function-invocation API in this situation is likely a foolish decision. The function call is the best API for this use-case. If you have a concrete scenario that you need to support where a function will not work then please provide that and maybe that scenario can be solved without having to compromise the PostgreSQL API. All your comments so far are about planning for an unknown future that very well may never come to be.