Fill missing dates in PostgreSQL with zero - sql

I've a query like this in PostgreSQL:
select count(id_student) students, date_beginning_course from
data.sessions_courses
left join my_schema.students on id_session_course=id_sesion
where course_name='First course'
group by date_beginning_course
What I obtain with this query is the number of students who attended a session of "First course" on each date, for example:
Students Date_beginning_course
____________________________________
5 2019-06-26
1 2019-06-28
5 2019-06-30
6 2019-07-01
2 2019-07-02
I'd like to fill this table with the missing date values, and, for each missing value, assign a '0' in Students column, because there are no students for this date. Example:
Students Date_beginning_course
____________________________________
5 2019-06-26
0 2019-06-27 <--new row
1 2019-06-28
0 2019-06-29 <--new row
5 2019-06-30
6 2019-07-01
2 2019-07-02
Could you help me? Thanks! :)

You could generate a list of dates with the handy Postgres set-returning function generate_series() and LEFT JOIN it with the sessions_courses and students table:
SELECT
COUNT(s.id_student) students,
d.dt
FROM
(
SELECT dt::date
FROM generate_series('2019-06-26', '2019-07-02', '1 day'::interval) dt
) d
LEFT JOIN data.sessions_courses c
ON c.date_beginning_course = d.dt
AND c.course_name='First course'
LEFT JOIN my_schema.students s
ON s.id_session_course = c.id_session
GROUP BY d.dt
You can change the date range by modifying the first two parameters of generate_series().
NB: it is generally good practice to prefix the column names in a query with the relevant table name (or table alias), so that it is explicit which table each column belongs to. I changed your query accordingly and had to make a few assumptions, which you might need to adapt.
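For anyone who wants to try the idea without a PostgreSQL instance, here is a minimal sketch using Python's built-in sqlite3 module. It is only an approximation of the query above: SQLite stands in for PostgreSQL, a recursive CTE replaces generate_series(), the schema prefixes are dropped, and the table contents are reconstructed from the counts shown in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sessions_courses (id_session INTEGER, course_name TEXT, date_beginning_course TEXT);
CREATE TABLE students (id_student INTEGER PRIMARY KEY, id_session_course INTEGER);
""")

# Sample data rebuilt from the question: one session per date, N students each.
counts = {"2019-06-26": 5, "2019-06-28": 1, "2019-06-30": 5, "2019-07-01": 6, "2019-07-02": 2}
for session_id, (day, n) in enumerate(counts.items(), start=1):
    conn.execute("INSERT INTO sessions_courses VALUES (?, 'First course', ?)", (session_id, day))
    conn.executemany("INSERT INTO students (id_session_course) VALUES (?)", [(session_id,)] * n)

# The recursive CTE plays the role of generate_series() (assumption: SQLite, not PostgreSQL).
query = """
WITH RECURSIVE d(dt) AS (
    SELECT '2019-06-26'
    UNION ALL
    SELECT date(dt, '+1 day') FROM d WHERE dt < '2019-07-02'
)
SELECT COUNT(s.id_student) AS students, d.dt
FROM d
LEFT JOIN sessions_courses c
       ON c.date_beginning_course = d.dt
      AND c.course_name = 'First course'
LEFT JOIN students s
       ON s.id_session_course = c.id_session
GROUP BY d.dt
ORDER BY d.dt
"""
rows = conn.execute(query).fetchall()
for students, dt in rows:
    print(students, dt)
```

COUNT(s.id_student) only counts non-NULL values, which is why the dates with no matching session come out as 0 rather than 1.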

Related

Selecting data from two different tables with case statement

I have 2 tables.
tab1:
dates
2021-09-30
2021-10-01
2021-10-02
2021-10-04
2021-11-15
buckets:
bucket_dates
2021-10-01
2021-10-02
2021-10-03
2021-11-03
I want to join these two tables to get the final result shown below. If there is a matching date in tab1, it should be used; if there is no matching date, the next highest date from tab1 should be used instead (e.g. for 2021-10-03 and 2021-11-03).
Result table:
bucket_dates | final_dates
2021-10-01 2021-10-01
2021-10-02 2021-10-02
2021-10-03 2021-10-04
2021-11-03 2021-11-15
I tried to do this using a join query:
select a.bucket_dates,
case when b.dates is null then (select min(c.dates) from tab1 c where c.dates > a.bucket_dates)
else b.dates end as final_dates
from buckets a left join tab1 b
on a.bucket_dates = b.dates;
but this query gives the following error:
Correlated column is not allowed in a non-equality predicate
Any suggestion will be really helpful.
If pyspark won't allow > or >= in a correlated sub-query, just use the MIN() in the main query...
SELECT
b.bucket_dates,
MIN(t.dates) AS final_dates
FROM
buckets AS b
LEFT JOIN
tab1 AS t
ON t.dates >= b.bucket_dates
GROUP BY
b.bucket_dates
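To check the logic against the sample data from the question, here is a small sketch that runs the same query through Python's built-in sqlite3 module. SQLite is only a stand-in here (an assumption on my part, since the question is about Spark), and the dates are stored as ISO-formatted text so that string comparison orders them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab1 (dates TEXT)")
conn.execute("CREATE TABLE buckets (bucket_dates TEXT)")
conn.executemany(
    "INSERT INTO tab1 VALUES (?)",
    [("2021-09-30",), ("2021-10-01",), ("2021-10-02",), ("2021-10-04",), ("2021-11-15",)],
)
conn.executemany(
    "INSERT INTO buckets VALUES (?)",
    [("2021-10-01",), ("2021-10-02",), ("2021-10-03",), ("2021-11-03",)],
)

# Non-equality join plus MIN(): every bucket date is paired with all tab1 dates
# on or after it, and the earliest of those is kept.
query = """
SELECT b.bucket_dates, MIN(t.dates) AS final_dates
FROM buckets AS b
LEFT JOIN tab1 AS t
       ON t.dates >= b.bucket_dates
GROUP BY b.bucket_dates
ORDER BY b.bucket_dates
"""
rows = conn.execute(query).fetchall()
for bucket_date, final_date in rows:
    print(bucket_date, final_date)
```

Because it is a LEFT JOIN, a bucket date later than every date in tab1 would still appear, with NULL as its final_dates.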

Finding a min() date for one column and then using this to join with other tables that have a date LESS than this date

In short, I have two tables:
(1) pharmacy_claims (columns: user_id, date_service, claim_id, record_id, prescription)
(2) medical_claims (columns: user_id, date_service, provider, npi, cost)
I want to find user_id's in (1) that have a certain prescription value, find their earliest date_service (e.g. min(date_service)) and then use these user_id's with their earliest date of service as a cohort to pull all of their associated data from (2). Basically I want to find all of their medical_claims data PRIOR to the first time they were prescribed a given prescription in pharmacy_claims.
pharmacy_claims looks something like this:
user_id | prescription | date_service
1 a 2018-05-01
1 a 2018-02-11
1 a 2019-10-11
1 b 2018-07-12
2 a 2019-01-02
2 a 2019-03-10
2 c 2018-04-11
3 c 2019-05-26
So for instance, if I was interested in prescription = 'a', I would only want user_id 1 and 2 returned, with dates 2018-02-11 and 2019-01-02, respectively. Then I would want to pull user_id 1 and 2 from the medical_claims, and get all of their data PRIOR to these respective dates.
The way I tried to go about this was to build a temp table from the pharmacy_claims table to query the user_id's that have a given medication, and then left join this back to the table to create a cohort of user_id's with a date_service.
Here's what I did:
(1) Pulled all of the relevant data from the main pharmacy claims table:
CREATE TABLE user.temp_pharmacy_claims AS
SELECT user_id, claim_id, record_id, date_service
FROM dw.pharmacyclaims
WHERE date_service between '2018-01-01' and '2019-08-31'
This results in ~50,000 user_id's
(2) Created a table with just the user_id's and min(date_service):
CREATE TABLE user.temp_pharmacy_claims_index AS
SELECT distinct user_id, min(date_service) AS Min_Date
FROM user.temp_pharmacy_claims
GROUP BY 1
(3) Created a final table (to get the desired cohort):
CREATE TABLE user.temp_pharmacy_claims_final_index AS
SELECT a.userid
FROM user.temp_pharmacy_claims a
LEFT JOIN user.temp_pharmacy_claims_index b
ON a.user = b.user
WHERE a.date_service < Min_Date
However, this gets me 0 results when there should be a few thousand. Is this set up correctly? It's probably not the most efficient approach, but it looks sound to me, so not sure what's going on.
I think you just want a correlated subquery:
select mc.*
from medical_claims mc
where mc.date_service < (select min(pc.date_service)
from pharmacy_claims pc
where pc.user_id = mc.user_id and
pc.prescription = ?
);
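Here is a minimal sketch of that correlated subquery, run through Python's built-in sqlite3 module (SQLite standing in for whatever warehouse you actually use). The pharmacy_claims rows are the sample from the question; the medical_claims rows are hypothetical, made up purely for illustration, and both tables are trimmed to the columns the query needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pharmacy_claims (user_id INTEGER, prescription TEXT, date_service TEXT)")
conn.execute("CREATE TABLE medical_claims (user_id INTEGER, date_service TEXT)")

# Sample rows from the question.
conn.executemany("INSERT INTO pharmacy_claims VALUES (?, ?, ?)", [
    (1, "a", "2018-05-01"), (1, "a", "2018-02-11"), (1, "a", "2019-10-11"),
    (1, "b", "2018-07-12"), (2, "a", "2019-01-02"), (2, "a", "2019-03-10"),
    (2, "c", "2018-04-11"), (3, "c", "2019-05-26"),
])
# Hypothetical medical claims (not from the question).
conn.executemany("INSERT INTO medical_claims VALUES (?, ?)", [
    (1, "2018-01-15"), (1, "2018-03-01"),
    (2, "2018-12-31"), (2, "2019-02-01"),
    (3, "2018-01-01"),
])

query = """
SELECT mc.user_id, mc.date_service
FROM medical_claims mc
WHERE mc.date_service < (SELECT MIN(pc.date_service)
                         FROM pharmacy_claims pc
                         WHERE pc.user_id = mc.user_id
                           AND pc.prescription = ?)
ORDER BY mc.user_id, mc.date_service
"""
rows = conn.execute(query, ("a",)).fetchall()
for user_id, date_service in rows:
    print(user_id, date_service)
```

User 3 never received prescription 'a', so the subquery returns NULL for that user, the comparison is not true, and none of that user's medical claims are returned.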

Create row_num column in Access SQL, order by field first

I am trying to write a simple query that will take a table, order by the date field of that table and add a column that includes a row count. This is the easiest thing in T-SQL, but Access does not support the ROW_NUMBER() function.
So, let's say my "Dates" table looks like this:
ID Date
1 02/01/2017
2 02/03/2017
3 01/27/2017
4 02/05/2017
5 02/01/2017
6 02/03/2017
And the result of my Access query should look like this:
ID Date RowNum
3 01/27/2017 1
1 02/01/2017 2
5 02/01/2017 3
2 02/03/2017 4
6 02/03/2017 5
4 02/05/2017 6
I have tried to find an answer to this question, but all the answers I have found seem to rely on the difference in the values of the ID field from one row to the next. So then I tried to apply the concepts I found (creating a column with a dcount where A.ID > ID) to the Date field, but then I get a count per date. But I need a count for every single date, even if there might be multiple dates that are the same.
Thanks in advance
One method is a correlated subquery:
select d.*,
(select count(*) from dates as d2 where d2.date <= d.date) as rownum
from dates as d
order by d.date;
This is not very efficient, but on a small table it does accomplish what you want. The simplest way, though, is probably to use a cursor over the table.
This assumes that the dates are distinct, as in the example data in the question.
EDIT:
On closer inspection, the dates are not unique. So you can use multiple conditions:
select d.*,
(select count(*)
from dates as d2
where d2.date < d.date or
(d2.date = d.date and d2.id <= d.id)
) as rownum
from dates as d
order by d.date, d.id;
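As a quick check of the tie-breaking version, here is a sketch that runs it on the sample data with Python's built-in sqlite3 module. SQLite is an assumption here, not Access, and the dates are rewritten as ISO text so that they compare in date order; the query itself is otherwise the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dates (id INTEGER, date TEXT)")
conn.executemany("INSERT INTO dates VALUES (?, ?)", [
    (1, "2017-02-01"), (2, "2017-02-03"), (3, "2017-01-27"),
    (4, "2017-02-05"), (5, "2017-02-01"), (6, "2017-02-03"),
])

# For each row, count the rows that sort before it or are the row itself:
# earlier dates first, and for equal dates the smaller id wins.
query = """
SELECT d.id, d.date,
       (SELECT COUNT(*)
        FROM dates AS d2
        WHERE d2.date < d.date
           OR (d2.date = d.date AND d2.id <= d.id)
       ) AS rownum
FROM dates AS d
ORDER BY d.date, d.id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The id column acts as the tie-breaker, so the numbering is only gap-free if id is unique within each date.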

how to perform date calculations from different tables?

Please forgive me if this is a basic question, I'm a beginner in SQL and need some help performing date calculations from 2 tables in SQL.
I have two tables (patient and chd) they look like this:
Patient:
ID|Age|date |Alive
--------------------------
1 50 01/09/2013 Y
2 52 11/05/2015 N
3 19 20/07/2016 N
CHD:
ID|Age|indexdate
--------------------
1 50 01/08/2012
2 52 11/11/2013
3 19 10/07/2015
The patient table contains about 500,000 records from 2010-2016 and the CHD table contains about 350,000 records from 2012-2013. What I want to do is see how many CHD patients died between 2012 and 2016 and, for those who died, whether 12 months had passed.
I'm not sure how to do this, but I know a join is needed on the ID, with a where condition on Alive being NOT 'Y'.
The final output should look like this based on the sample above:
ID|Age|indexdate| deathdate
---------------------------
2 52 11/11/2013 11/05/2015
3 19 10/07/2015 20/07/2016
Any questions let me know!
EDIT: just to make it clear, patients can appear multiple times in the patient table until they die.
Thanks
Let me assume that this query gets the date of death from the patient table:
select p.id, min(p.date) as deathdate
from patient p
where p.Alive = 'N'
group by p.id;
Then, you can get what you want with a join:
select count(*)
from chd c join
(select p.id, min(p.date) as deathdate
from patient p
where p.Alive = 'N'
group by p.id
) pd
on c.id = pd.id;
You can then address your questions with a where clause in the outer query. For instance:
where deathdate >= current_date - interval '1 year'

Adding in missing dates from results in SQL

I have a database that currently looks like this
Date | valid_entry | profile
1/6/2015 | 1 | 1
3/6/2015 | 2 | 1
3/6/2015 | 2 | 2
5/6/2015 | 4 | 4
I am trying to grab the dates, but I need the query to also display dates that do not exist in the list, such as 2/6/2015.
This is a sample of what i need it to be:
Date | valid_entry
1/6/2015 1
2/6/2015 0
3/6/2015 2
3/6/2015 2
4/6/2015 0
5/6/2015 4
My query:
select date, count(valid_entry)
from database
where profile = 1
group by 1;
This query will only display the dates that exist in the table. Is there a way to make the query populate the results with dates that do not exist in it?
You can generate a list of all dates that are between the start and end date from your source table using generate_series(). These dates can then be used in an outer join to sum the values for all dates.
with all_dates (date) as (
select dt::date
from generate_series( (select min(date) from some_table), (select max(date) from some_table), interval '1' day) as x(dt)
)
select ad.date, sum(coalesce(st.valid_entry,0))
from all_dates ad
left join some_table st on ad.date = st.date
group by ad.date, st.profile
order by ad.date;
some_table is your table with the sample data you have provided.
Based on your sample output, you also seem to want group by date and profile, otherwise there can't be two rows with 2015-06-03. You also don't seem to want where profile = 1 because that as well wouldn't generate two rows with 2015-06-03 as shown in your sample output.
SQLFiddle example: http://sqlfiddle.com/#!15/b0b2a/2
Unrelated, but: I hope that the column names are only made up. date is a horrible name for a column. For one because it is also a keyword, but more importantly it does not document what this date is for. A start date? An end date? A due date? A modification date?
You can use a calendar table for this purpose. In this case you can create an in-line table with the dates required, then LEFT JOIN your table to it:
select t.d, count(db.valid_entry)
from (
SELECT DATE '2015-06-01' AS d UNION ALL SELECT DATE '2015-06-02' UNION ALL SELECT DATE '2015-06-03' UNION ALL
SELECT DATE '2015-06-04' UNION ALL SELECT DATE '2015-06-05' UNION ALL SELECT DATE '2015-06-06') AS t
left join database AS db on t.d = db."date" and db.profile = 1
group by t.d;
Note: Predicate profile = 1 should be applied in the ON clause of the LEFT JOIN operation. If it is placed in the WHERE clause instead then LEFT JOIN essentially becomes an INNER JOIN.
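The difference between the two placements is easy to see on the sample data. Below is a rough sketch using Python's built-in sqlite3 module; SQLite stands in for your database, and the table name entries is a hypothetical replacement for the table in the question. It runs the same LEFT JOIN twice, once with the predicate in ON and once in WHERE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (date TEXT, valid_entry INTEGER, profile INTEGER)")
conn.executemany("INSERT INTO entries VALUES (?, ?, ?)", [
    ("2015-06-01", 1, 1), ("2015-06-03", 2, 1),
    ("2015-06-03", 2, 2), ("2015-06-05", 4, 4),
])

# In-line calendar covering 2015-06-01 .. 2015-06-05.
calendar = " UNION ALL ".join(f"SELECT '2015-06-0{day}' AS d" for day in range(1, 6))

# Predicate in ON: unmatched calendar dates survive with a count of 0.
on_query = f"""
SELECT t.d, COUNT(e.valid_entry)
FROM ({calendar}) AS t
LEFT JOIN entries AS e ON t.d = e.date AND e.profile = 1
GROUP BY t.d
ORDER BY t.d
"""

# Predicate in WHERE: rows where e.profile is NULL are filtered out afterwards,
# so the join behaves like an INNER JOIN.
where_query = f"""
SELECT t.d, COUNT(e.valid_entry)
FROM ({calendar}) AS t
LEFT JOIN entries AS e ON t.d = e.date
WHERE e.profile = 1
GROUP BY t.d
ORDER BY t.d
"""

on_rows = conn.execute(on_query).fetchall()
where_rows = conn.execute(where_query).fetchall()
print(len(on_rows), "rows with the predicate in ON")
print(len(where_rows), "rows with the predicate in WHERE")
```

With the predicate in ON, all five calendar dates come back; with it in WHERE, only the dates that actually have a profile = 1 row remain.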