Selecting data from two different tables with case statement - sql

I have 2 tables.
tab1:
dates
2021-09-30
2021-10-01
2021-10-02
2021-10-04
2021-11-15
buckets:
bucket_dates
2021-10-01
2021-10-02
2021-10-03
2021-11-03
I want to join these two tables to get the final result below. If a bucket date has a matching date in tab1, it should use that date; if there is no matching date, it should use the next highest date from tab1 (e.g. for 2021-10-03 and 2021-11-03).
Result table:
bucket_dates | final_dates
2021-10-01 | 2021-10-01
2021-10-02 | 2021-10-02
2021-10-03 | 2021-10-04
2021-11-03 | 2021-11-15
I tried to do this with a join query:
select a.bucket_dates,
case when b.dates is null then (select min(c.dates) from tab1 c where c.dates > a.bucket_dates)
else b.dates end as final_dates
from buckets a left join tab1 b
on a.bucket_dates = b.dates;
but this query gives the error below:
Correlated column is not allowed in a non-equality predicate
Any suggestion will be really helpful.

If pyspark won't allow > or >= in a correlated sub-query, just use the MIN() in the main query...
SELECT
b.bucket_dates,
MIN(t.dates) AS final_dates
FROM
buckets AS b
LEFT JOIN
tab1 AS t
ON t.dates >= b.bucket_dates
GROUP BY
b.bucket_dates
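To check the non-equi join plus MIN() idea against the sample data from the question, here is a minimal sketch. It assumes an in-memory SQLite database as a stand-in (not PySpark, which the question is about) and stores the dates as ISO-formatted text so that string comparison matches date order.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the asker's engine.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tab1 (dates TEXT);
    CREATE TABLE buckets (bucket_dates TEXT);
    INSERT INTO tab1 VALUES ('2021-09-30'), ('2021-10-01'), ('2021-10-02'),
                            ('2021-10-04'), ('2021-11-15');
    INSERT INTO buckets VALUES ('2021-10-01'), ('2021-10-02'),
                               ('2021-10-03'), ('2021-11-03');
""")

# Join every bucket date to all tab1 dates on or after it,
# then keep the earliest of those per bucket date.
rows = conn.execute("""
    SELECT b.bucket_dates, MIN(t.dates) AS final_dates
    FROM buckets AS b
    LEFT JOIN tab1 AS t ON t.dates >= b.bucket_dates
    GROUP BY b.bucket_dates
    ORDER BY b.bucket_dates
""").fetchall()

for bucket_date, final_date in rows:
    print(bucket_date, final_date)
```

Because the join condition is >=, an exact match and the "next highest date" case are both covered by the same MIN(); a bucket date later than every tab1 date would come back with a NULL final_dates.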

Related

SQL query on denormalized tables

I have the two denormalized tables shown below, without any data constraints. Records_audit will not have duplicate rows per AuditID, though the table doesn't have any constraints.
I need a SQL query to extract all fields of Records_audit plus an additional matching column, refgroup_Name, from the second table, matching on AuditID in both tables, with printedCount greater than 1 and R_status equal to 'Y'. I tried a left join, but it selects all records.
Can you help correct my query? I tried the query below, but it selects all the unwanted rows from the second table:
SELECT a.*, d.refgroup_Name
from Records_audit a
left join Patients_audit d ON ( (a.AUDITID=d.AUDITID )
and (a.printedCount> 1)
AND (a.R_status='Y')
)
ORDER BY 3 DESC
Records_audit:
AuditID | record_id | created_d_t | patient_ID | branch_ID | R_status | printedCount
1 | Img77862 | 2020-02-01 08:40:12.614 | xq123 | aesop96 | Y | 2
2 | Img87962 | 2021-02-01 08:40:12.614 | xy123 | aesop96 | Y | 1
Patients_audit:
AuditID | dept_name | visited_d_t | patient_ID | branch_ID | emp_No | refgroup_Name
1 | Imaging | 2020-02-01 11:41:12.614 | xq123 | aesop96 | 976581 | finnyTown
1 | EMR | 2020-02-01 12:42:12.614 | xq123 | aesop96 | 976581 | finnyTown
2 | Imaging | 2021-02-01 12:40:12.614 | xy123 | himpo77 | 976581 | georgeTown
2 | FrontOffice | 2021-02-01 13:41:12.614 | xy123 | himpo77 | 976581 | georgeTown
2 | EMR | 2021-02-01 14:42:12.614 | xy123 | himpo77 | 976581 | georgeTown
A left join will give you all records in the "left" table, that is the from table. Since you have no where clause to constrain the query you're going to get all records in Records_audit.
See Visual Representation of SQL Joins for more about joins.
If your intent is to get all records in Records_audit which have an R_status of Y and a printedCount > 1, put those into a where clause.
select ra.*, pa.refgroup_name
from records_audit ra
left join patients_audit pa on ra.auditId = pa.auditId
where ra.printedCount > 1
and ra.r_status = 'Y'
order by ra.created_d_t desc
This will return all records in Records_audit which satisfy the where clause. The left join ensures they are returned even if they do not have a matching Patients_audit record.
Other notes:
Your order by 3 relies on the order in which columns were declared in Records_audit. If you mean to order by records_audit.created_d_t write order by a.created_d_t.
If your query is making an assumption about the data, add a constraint to make sure it is true and remains true.
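To see the difference between filtering in the ON clause and filtering in the WHERE clause, here is a small sketch on the sample rows from the question. It assumes an in-memory SQLite database as a stand-in for the asker's engine and keeps only the columns the filters touch.

```python
import sqlite3

# Assumption: SQLite stand-in, trimmed to the columns the query needs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Records_audit (AuditID INTEGER, record_id TEXT,
                                R_status TEXT, printedCount INTEGER);
    CREATE TABLE Patients_audit (AuditID INTEGER, dept_name TEXT,
                                 refgroup_Name TEXT);
    INSERT INTO Records_audit VALUES (1, 'Img77862', 'Y', 2),
                                     (2, 'Img87962', 'Y', 1);
    INSERT INTO Patients_audit VALUES
        (1, 'Imaging', 'finnyTown'), (1, 'EMR', 'finnyTown'),
        (2, 'Imaging', 'georgeTown'), (2, 'FrontOffice', 'georgeTown'),
        (2, 'EMR', 'georgeTown');
""")

# Filters in ON: every Records_audit row survives; rows failing the
# filter simply get NULL for the joined columns.
filters_in_on = conn.execute("""
    SELECT a.AuditID, d.refgroup_Name
    FROM Records_audit a
    LEFT JOIN Patients_audit d
      ON a.AuditID = d.AuditID AND a.printedCount > 1 AND a.R_status = 'Y'
    ORDER BY a.AuditID
""").fetchall()

# Filters in WHERE: rows failing the filter are dropped from the result.
filters_in_where = conn.execute("""
    SELECT a.AuditID, d.refgroup_Name
    FROM Records_audit a
    LEFT JOIN Patients_audit d ON a.AuditID = d.AuditID
    WHERE a.printedCount > 1 AND a.R_status = 'Y'
    ORDER BY a.AuditID
""").fetchall()

print(filters_in_on)
print(filters_in_where)
```

Note that AuditID 1 still appears twice in both results, because Patients_audit holds two rows for it; if one row per audit record is wanted, the refgroup_Name lookup would need to be de-duplicated first.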

Fill missing dates in PostgreSQL with zero

I've a query like this in PostgreSQL:
select count(id_student) students, date_beginning_course from
data.sessions_courses
left join my_schema.students on id_session_course=id_sesion
where course_name='First course'
group by date_beginning_course
What I obtain with this query is the number of students that have attended a session of "First course" in several dates, for example:
Students Date_beginning_course
____________________________________
5 2019-06-26
1 2019-06-28
5 2019-06-30
6 2019-07-01
2 2019-07-02
I'd like to fill this table with the missing date values, and, for each missing value, assign a '0' in Students column, because there are no students for this date. Example:
Students Date_beginning_course
____________________________________
5 2019-06-26
0 2019-06-27 <--new row
1 2019-06-28
0 2019-06-29 <--new row
5 2019-06-30
6 2019-07-01
2 2019-07-02
Could you help me? Thanks! :)
You could generate a list of dates with the handy Postgres set-returning function generate_series() and LEFT JOIN it with the sessions_courses and students table:
SELECT
COUNT(s.id_student) students,
d.dt
FROM
(
SELECT dt::date
FROM generate_series('2019-06-26', '2019-07-02', '1 day'::interval) dt
) d
LEFT JOIN data.sessions_courses c
ON c.date_beginning_course = d.dt
AND c.course_name='First course'
LEFT JOIN my_schema.students s
ON s.id_session_course = c.id_session
GROUP BY d.dt
You can change the date range by modifying the first two parameters of generate_series().
NB: it is generally good practice to prefix the column names in the query with the relevant table name (or table alias), so it is explicit which table each column belongs to. I changed your query accordingly and had to make a few assumptions, which you might need to adapt.
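The same gap-filling pattern can be tried outside Postgres. The sketch below is an assumption-heavy stand-in: it uses in-memory SQLite, swaps generate_series() for a recursive CTE (which is not what the answer uses), drops the schema prefixes, and invents a few rows, since the question shows no source data.

```python
import sqlite3

# Assumptions: SQLite instead of Postgres, made-up rows, no schema prefixes.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sessions_courses (id_session INTEGER, course_name TEXT,
                                   date_beginning_course TEXT);
    CREATE TABLE students (id_student INTEGER, id_session_course INTEGER);
    INSERT INTO sessions_courses VALUES
        (1, 'First course', '2019-06-26'),
        (2, 'First course', '2019-06-28'),
        (3, 'Other course', '2019-06-27');
    INSERT INTO students VALUES (10, 1), (11, 1), (12, 2), (13, 3);
""")

rows = conn.execute("""
    WITH RECURSIVE d(dt) AS (
        -- one row per day in the wanted range
        SELECT '2019-06-26'
        UNION ALL
        SELECT date(dt, '+1 day') FROM d WHERE dt < '2019-06-28'
    )
    SELECT COUNT(s.id_student) AS students, d.dt
    FROM d
    LEFT JOIN sessions_courses c
           ON c.date_beginning_course = d.dt
          AND c.course_name = 'First course'
    LEFT JOIN students s ON s.id_session_course = c.id_session
    GROUP BY d.dt
    ORDER BY d.dt
""").fetchall()

for students, day in rows:
    print(students, day)
```

COUNT(s.id_student) counts only non-NULL values, which is why days without a matching session come out as 0 rather than 1.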

Modify left join clause [duplicate]

I am trying to write an Impala query that does the following with the two tables provided below:
Table A
Date num
01-16 10
02-20 12
03-20 13
Table B contains every day between 02-20 and 03-20, exclusive, i.e.
Date val
02-21 100
02-22 101
02-23 102
. .
. .
03-19 110
Now we want to calculate, for every day between 02-20 and 03-20 (exclusive), the total value using the A.num of date 02-20 (the starting date of the period). So for example, the total value for 02-21 is 100*12, for 02-22 it is 101*12, and for 03-19 it is 110*12.
I have written the query
SELECT A.Date,A.num*B.val AS total
FROM TableA A
LEFT JOIN Tableb B
ON B.Date >= A.Date
GROUP BY A.Date,A.num,B.val
But it returns two entries for each day. For instance, for 02-22 it returns 101*12 and 101*10, but I only want 101*12. I have noticed that this is caused by the join on B.Date >= A.Date: 02-21 is indeed greater than 01-16, so it takes both values of num, from 01-16 and 02-20, to compute the total value.
Does anyone know how I should modify this join clause so that it uses only the num from 02-20, instead of both 02-20 and 01-16?
EDIT
Sample output
Date total
02-21 1200
02-22 1212
02-23 1224
. .
. .
03-19 1320
This should work. If need be, change the SUM to either MIN or MAX.
SELECT A.`Date`,SUM(A.`num`*B.`val`) AS `total`
FROM `TableA` A
LEFT JOIN `Tableb` B
ON B.`Date` >= A.`Date`
GROUP BY A.`Date`
This produces the results you need. I didn't see a need for GROUP BY, and you said you only wanted results for the date '02-20', so I just added a WHERE and changed the SELECT to grab Table B's date.
SELECT
B.Date,
A.num * B.val AS total
FROM TableA A
LEFT JOIN Tableb B ON B.Date >= A.Date
WHERE A.Date = '02-20'
NOTE : You should avoid using reserved keywords as table/column name.
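A quick way to see both the duplication and the effect of the WHERE filter is to run the two queries on the rows shown in the question. This sketch assumes an in-memory SQLite database rather than Impala, keeps the dates as 'MM-DD' text (which compares correctly as strings here), and loads only the first three Table B rows.

```python
import sqlite3

# Assumption: SQLite stand-in; "Date" is quoted because it is a keyword-like name.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA ("Date" TEXT, num INTEGER);
    CREATE TABLE TableB ("Date" TEXT, val INTEGER);
    INSERT INTO TableA VALUES ('01-16', 10), ('02-20', 12), ('03-20', 13);
    INSERT INTO TableB VALUES ('02-21', 100), ('02-22', 101), ('02-23', 102);
""")

# Original join: each B row pairs with every A row dated on or before it.
duplicated = conn.execute("""
    SELECT B."Date", A.num * B.val AS total
    FROM TableA A
    LEFT JOIN TableB B ON B."Date" >= A."Date"
    ORDER BY B."Date", total
""").fetchall()

# With the WHERE filter, only the 02-20 row of TableA takes part.
fixed = conn.execute("""
    SELECT B."Date", A.num * B.val AS total
    FROM TableA A
    LEFT JOIN TableB B ON B."Date" >= A."Date"
    WHERE A."Date" = '02-20'
    ORDER BY B."Date"
""").fetchall()

print(duplicated)
print(fixed)
```

The filter hard-codes the period start, so it fits the single period in the question; several periods in Table A would need a join condition that picks the latest A.Date on or before each B.Date instead.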
Well here you go..
SELECT b.`date`,max(a.`num`*b.`val`) AS `total`
FROM test a
LEFT JOIN `test2` b
ON b.`date` >= a.`date`
WHERE b.date is not null
GROUP BY b.`date`;
sqlfiddle : http://www.sqlfiddle.com/#!9/55c0e1/1

divide column from one table to column from another table

I've two tables
TB_1 TB_3
Month Total Month Total
2012-01 6 2012-01 12
2013-02 6 2013-02 12
2014-03 10 2014-03 20
2015-04 10 2015-04 20
In the result table I need the following result:
RESULT_TB
Month Total
2012-01 2
2013-02 2
2014-03 2
2015-04 2
I tried the following:
Select TB_3.total / TB_1.total
From TB_3, TB_1
But it does not work. Could you please tell me how to do this?
Try this:
select t1.month,
case
when t1.total = 0
then null
else t2.total / t1.total
end total
from tb_1 t1
join tb_3 t2 on t1.month = t2.month
I used CASE to ensure if there is zero total in tb_1 table, the output comes as NULL instead of throwing a divide by zero error.
You only seem to need a proper JOIN condition:
Select t3.month, t3.total / t1.total
From TB_3 t3 JOIN
TB_1 t1
ON t3.month = t1.month;
Note: If the tables have different sets of months, then you might want some sort of outer join.
Also, Postgres does integer division when both operands are integers. It is unclear whether you want that or not. But if the values were "15" and "2", then the result would be "7" rather than "7.5".
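The integer-division point is easy to check. The sketch below assumes SQLite as a stand-in (it also truncates when both operands are integers) and shows one way around it by casting one operand; in Postgres the cast would more typically be to numeric.

```python
import sqlite3

# Assumption: SQLite stand-in; integer / integer truncates here as well.
conn = sqlite3.connect(":memory:")
integer_result, real_result = conn.execute(
    "SELECT 15 / 2, CAST(15 AS REAL) / 2"
).fetchone()

print(integer_result)  # truncated quotient
print(real_result)     # quotient after casting one operand
```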
You can try this code:
SELECT ISNULL(T1.Month,T2.Month) AS Month
, CASE WHEN ISNULL(T1.Total,0) = 0 THEN ISNULL(T2.Total,0)
ELSE ISNULL(T2.Total,0)/ISNULL(T1.Total,0)
END AS Total
FROM TB_1 AS T1
FULL JOIN TB_3 AS T2 ON T1.Month = T2.Month
I have used FULL JOIN to catch all dates from both tables. I have checked for all NULL values and replaced the NULLs by divisible values.
For matching months only, use this -
select TB_1.month as month
,TB_3.total / nullif(TB_1.total,0) as total
from TB_1
join TB_3
on TB_3.month = TB_1.month
order by month
If you wish to also display results for nonmatching months, use this -
select coalesce(TB_1.month,TB_3.month) as month
,TB_3.total / nullif(TB_1.total,0) as total
from TB_1
full join TB_3
on TB_3.month = TB_1.month
order by month
nullif(X,0) is a clean way to express:
Return X unless it is equal to 0, in which case return NULL.
If the total columns are of integer type, replace TB_3.total with TB_3.total::numeric(12,2) (or another precision, as you like).
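Here is the matching-months query run on the sample data, as a rough sketch. It assumes an in-memory SQLite stand-in, and it adds one hypothetical month ('2016-05', with a zero total in TB_1) that is not in the question, purely to show what NULLIF does. SQLite happens to return NULL for division by zero anyway, so this demonstrates the value produced, not the avoided error.

```python
import sqlite3

# Assumptions: SQLite stand-in; the '2016-05' rows are invented for the zero case.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TB_1 (month TEXT, total INTEGER);
    CREATE TABLE TB_3 (month TEXT, total INTEGER);
    INSERT INTO TB_1 VALUES ('2012-01', 6), ('2013-02', 6), ('2014-03', 10),
                            ('2015-04', 10), ('2016-05', 0);
    INSERT INTO TB_3 VALUES ('2012-01', 12), ('2013-02', 12), ('2014-03', 20),
                            ('2015-04', 20), ('2016-05', 5);
""")

# NULLIF turns a zero divisor into NULL, and anything divided by NULL is NULL.
rows = conn.execute("""
    SELECT TB_1.month AS month,
           TB_3.total / NULLIF(TB_1.total, 0) AS total
    FROM TB_1
    JOIN TB_3 ON TB_3.month = TB_1.month
    ORDER BY TB_1.month
""").fetchall()

for month, total in rows:
    print(month, total)
```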

Adding in missing dates from results in SQL

I have a database that currently looks like this
Date | valid_entry | profile
1/6/2015 | 1 | 1
3/6/2015 | 2 | 1
3/6/2015 | 2 | 2
5/6/2015 | 4 | 4
I am trying to grab the dates, but I need the query to also display dates that do not exist in the list, such as 2/6/2015.
This is a sample of what i need it to be:
Date | valid_entry
1/6/2015 | 1
2/6/2015 | 0
3/6/2015 | 2
3/6/2015 | 2
4/6/2015 | 0
5/6/2015 | 4
My query:
select date, count(valid_entry)
from database
where profile = 1
group by 1;
This query only displays the dates that exist in the table. Is there a way, within the query, to populate the results with dates that do not exist there?
You can generate a list of all dates that are between the start and end date from your source table using generate_series(). These dates can then be used in an outer join to sum the values for all dates.
with all_dates (date) as (
select dt::date
from generate_series( (select min(date) from some_table), (select max(date) from some_table), interval '1' day) as x(dt)
)
select ad.date, sum(coalesce(st.valid_entry,0))
from all_dates ad
left join some_table st on ad.date = st.date
group by ad.date, st.profile
order by ad.date;
some_table is your table with the sample data you have provided.
Based on your sample output, you also seem to want group by date and profile, otherwise there can't be two rows with 2015-06-03. You also don't seem to want where profile = 1 because that as well wouldn't generate two rows with 2015-06-03 as shown in your sample output.
SQLFiddle example: http://sqlfiddle.com/#!15/b0b2a/2
Unrelated, but: I hope that the column names are only made up. date is a horrible name for a column. For one because it is also a keyword, but more importantly it does not document what this date is for. A start date? An end date? A due date? A modification date?
You have to use a calendar table for this purpose. In this case you can create an in-line table with the dates required, then LEFT JOIN your table to it:
select t.d, count(db.valid_entry)
from (
SELECT DATE '2015-06-01' AS d UNION ALL SELECT DATE '2015-06-02' UNION ALL SELECT DATE '2015-06-03' UNION ALL
SELECT DATE '2015-06-04' UNION ALL SELECT DATE '2015-06-05' UNION ALL SELECT DATE '2015-06-06') AS t
left join database AS db on t.d = db."date" and db.profile = 1
group by t.d;
Note: Predicate profile = 1 should be applied in the ON clause of the LEFT JOIN operation. If it is placed in the WHERE clause instead then LEFT JOIN essentially becomes an INNER JOIN.
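The ON-versus-WHERE point can be checked on the sample rows from the question. This sketch assumes an in-memory SQLite stand-in, a hypothetical table name entries, dates rewritten in ISO form (reading 1/6/2015 as 1 June 2015), and a recursive CTE as the calendar in place of the hand-written UNION ALL list.

```python
import sqlite3

# Assumptions: SQLite stand-in, table name `entries`, ISO-formatted dates.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entries ("date" TEXT, valid_entry INTEGER, profile INTEGER);
    INSERT INTO entries VALUES ('2015-06-01', 1, 1), ('2015-06-03', 2, 1),
                               ('2015-06-03', 2, 2), ('2015-06-05', 4, 4);
""")

# Calendar of consecutive days, shared by both queries below.
CALENDAR = """
    WITH RECURSIVE cal(d) AS (
        SELECT '2015-06-01'
        UNION ALL
        SELECT date(d, '+1 day') FROM cal WHERE d < '2015-06-05'
    )
"""

# Predicate in ON: every calendar day is kept, unmatched days count 0.
in_on = conn.execute(CALENDAR + """
    SELECT cal.d, COUNT(e.valid_entry)
    FROM cal
    LEFT JOIN entries e ON e."date" = cal.d AND e.profile = 1
    GROUP BY cal.d
    ORDER BY cal.d
""").fetchall()

# Predicate in WHERE: days without a profile-1 row are filtered out again.
in_where = conn.execute(CALENDAR + """
    SELECT cal.d, COUNT(e.valid_entry)
    FROM cal
    LEFT JOIN entries e ON e."date" = cal.d
    WHERE e.profile = 1
    GROUP BY cal.d
    ORDER BY cal.d
""").fetchall()

print(in_on)
print(in_where)
```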