Write one SQL query to combine 3 independent tables using GROUP BY and CASE WHEN

I have 3 tables as follows:
Table school_a with 4 columns:
date: (2020-11-31,...)
note: (20, 30, 40,...)
home: (23.45, 45.34, 65.67, ...)
id: (54326, 87332, ...)
Table school_b with 4 columns:
time: (2020-11-31,...)
grade: (34, 54, 34,...)
homework: (12.32, 34.65,...)
user: ('student', 'professor', 'student',...)
Table school_c with 4 columns:
day: (2020-11-31,...)
number: (34, 54, 34,...)
amount: (10.24 AGE, 11.25 AGE, 12.63 AGE, ...)
title: ('54934-ST-string-student-str.st', '54934-ST-string-teacher-str.st',....)
Sorry for the table presentation, but it is not easy to put this in a table format.
I wrote a SQL query that calculates what I need for each table, but I have not managed to combine the 3 queries into one. I cannot figure out the logic needed to combine them.
Here is my SQL code for each table:
SELECT
SUM(home/note) AS kpi,
CASE
WHEN id IN (34564, 87423, 89076, 32145, 87653) THEN 'Student'
WHEN id IN (67543, 87413, 78996, 34215 ) THEN 'Teacher'
ELSE 'Other'
END AS role
FROM school_a
WHERE date >= '2020-08-01' AND date <= '2020-08-31'
GROUP BY role
SELECT
SUM(grade)/COUNT(user) AS kpi,
CASE
WHEN user = 'Student' THEN 'Student'
WHEN user = 'Professor' THEN 'Teacher'
ELSE 'Other'
END AS role
FROM school_b
WHERE time >= '2020-08-01' AND time <= '2020-08-31'
GROUP BY role
SELECT
SUM((REPLACE(amount,' AGE',''))/number) AS kpi,
CASE
WHEN title IN ('41320 - ST-STtr-Student-str.st', '89064 - ST-stRst-str-strr user-strr.str/blablabla/strstr') THEN 'Student'
WHEN title IN ('43789 - ST-STred-Teacher-stee.str', '65283-CH-strstrs-teacher-strr.str--STR') THEN 'Teacher'
ELSE 'Other'
END AS role
FROM school_c
WHERE day >= '2020-08-01' AND day <= '2020-08-31'
GROUP BY role
As you can see, I need to measure the kpi for each table for the full month of August 2020; the tables have different column names and the columns have different meanings.
When I run each query separately I get what I need. Now I would like to combine the 3 queries into one. If I write them as a single query I get a message such as
Error: ambiguous column name: role
Any feedback to improve my current queries is welcome. Thanks for reading.
----- Edited to clarify the result
The expected result is a table with 2 columns (role and kpi) and 3 rows (Student, Teacher, Other).
Using "union" I get almost what I want: 2 columns (role and kpi), but more than 3 rows, because the result is grouped by school and then by role. I want one row per role, with the kpi summed per role.

Have you tried to simply UNION the different outputs?
SELECT SUM(home/note) AS kpi,
CASE
WHEN id IN (34564, 87423, 89076, 32145, 87653) THEN 'Student'
WHEN id IN (67543, 87413, 78996, 34215 ) THEN 'Teacher'
ELSE 'Other'
END AS role
FROM school_a
WHERE date >= '2020-08-01' AND date <= '2020-08-31'
GROUP BY role
UNION
SELECT SUM(grade)/COUNT(user) AS kpi,
CASE
WHEN user = 'Student' THEN 'Student'
WHEN user = 'Professor' THEN 'Teacher'
ELSE 'Other'
END AS role
FROM school_b
WHERE time >= '2020-08-01' AND time <= '2020-08-31'
GROUP BY role
UNION
SELECT SUM((REPLACE(amount,' AGE',''))/number) AS kpi,
CASE
WHEN title IN ('41320 - ST-STtr-Student-str.st', '89064 - ST-stRst-str-strr user-strr.str/blablabla/strstr') THEN 'Student'
WHEN title IN ('43789 - ST-STred-Teacher-stee.str', '65283-CH-strstrs-teacher-strr.str--STR') THEN 'Teacher'
ELSE 'Other'
END AS role
FROM school_c
WHERE day >= '2020-08-01' AND day <= '2020-08-31'
GROUP BY role
I haven't tested this, but as long as the columns match it shouldn't be a problem.
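To get the single row per role asked for in the edit, one option is to wrap the combined queries in an outer query that groups by role again. This is a minimal sketch, run through Python's sqlite3 module so it can be tried as-is. The sample rows are invented, only two of the three tables are included to keep it short, and UNION ALL is used instead of UNION so that two tables producing the same (kpi, role) pair are not collapsed into one row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Invented sample rows; the column names follow the question.
conn.executescript("""
CREATE TABLE school_a (date TEXT, note REAL, home REAL, id INTEGER);
CREATE TABLE school_b (time TEXT, grade REAL, homework REAL, user TEXT);
INSERT INTO school_a VALUES ('2020-08-03', 20, 40.0, 34564),
                            ('2020-08-04', 10, 30.0, 67543),
                            ('2020-09-01', 10, 99.0, 34564);
INSERT INTO school_b VALUES ('2020-08-05', 30, 1.0, 'Student'),
                            ('2020-08-06', 50, 1.0, 'Student'),
                            ('2020-08-07', 70, 1.0, 'Professor');
""")

rows = conn.execute("""
SELECT role, SUM(kpi) AS kpi            -- second aggregation: one row per role
FROM (
    SELECT CASE WHEN id IN (34564) THEN 'Student'
                WHEN id IN (67543) THEN 'Teacher'
                ELSE 'Other' END AS role,
           SUM(home / note) AS kpi
    FROM school_a
    WHERE date BETWEEN '2020-08-01' AND '2020-08-31'
    GROUP BY role
    UNION ALL                           -- keep every per-table row, even duplicates
    SELECT CASE WHEN user = 'Student' THEN 'Student'
                WHEN user = 'Professor' THEN 'Teacher'
                ELSE 'Other' END AS role,
           SUM(grade) / COUNT(user) AS kpi
    FROM school_b
    WHERE time BETWEEN '2020-08-01' AND '2020-08-31'
    GROUP BY role
) AS per_table
GROUP BY role
ORDER BY role
""").fetchall()

print(rows)
```

The outer query only sees the derived table per_table, which has exactly one column called role, so grouping on it is unambiguous.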

Related

How to count people of certain ages who have a log record in another table in SQL?

I want to count how many people aged 18 are recorded in the logs table, counting each person only once. Right now, if the same person has logged in 2 times, I get 2 people with age 18. I can't make each person count only once. How do I do this?
My logs table and members table are connected by card_id.
My logs table has the login date and card_id,
while my members table has the birthdate and card_id columns.
Here is the query I wrote:
select
card_id, sum("18") as "18"
from
( select logs.login, members.card_id,
count(distinct (case when 0 <= age and age <= 18 then age end)) as "18",
count( (case when 19 <= age and age <= 30 then age end)) as "30",
count ( (case when 31 <= age and age <= 50 then age end)) as "50"
from
(select login, date_part('year', age(birthdate)) as age, members.card_id as card_id,
logs.login
from members
left join logs on logs.card_id=members.card_id
) as members
left join logs on logs.card_id=members.card_id
group by logs.login, members.card_id
) as members
where login <= '20221029' group by card_id;
I want to create a table like this:
18 | 30 | 50 |
---------------
2 | 0 | 0
Count the distinct card_id-s.
select count(distinct card_id)
from members join logs using (card_id)
where extract('year' from age(birthdate)) = 18
and login <= '20221029';
Unrelated, but it seems that you are storing login as text. This is not a good idea; use type date instead.
Addition after the question update:
select count(*) filter (where user_age = 18) as age_18,
count(*) filter (where user_age between 19 and 30) as age_30,
count(*) filter (where user_age between 31 and 50) as age_50
from
(
select distinct on (card_id)
extract('year' from age(birthdate)) user_age
from members inner join logs using (card_id)
where login <= '20221029'
order by card_id, login desc -- pick the latest login
) AS t;
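The idea in both queries above (reduce the join to one row per card_id first, then count per age bracket) can be tried out with the sketch below. It runs on SQLite through Python's sqlite3 module rather than PostgreSQL, so it makes some assumptions: the age is stored directly in a hypothetical age column instead of being derived with age(birthdate), SUM(CASE ...) stands in for count(*) filter (...), and the brackets are the ones from the question (0 to 18, 19 to 30, 31 to 50). The rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Invented rows; `age` is a simplifying assumption (the real table stores birthdate).
conn.executescript("""
CREATE TABLE members (card_id INTEGER PRIMARY KEY, age INTEGER);
CREATE TABLE logs (card_id INTEGER, login TEXT);
INSERT INTO members VALUES (1, 18), (2, 18), (3, 25), (4, 40);
INSERT INTO logs VALUES (1, '2022-10-01'), (1, '2022-10-02'),
                        (2, '2022-10-03'), (3, '2022-10-04'),
                        (4, '2022-11-15');
""")

row = conn.execute("""
SELECT SUM(CASE WHEN age <= 18 THEN 1 ELSE 0 END)            AS age_18,
       SUM(CASE WHEN age BETWEEN 19 AND 30 THEN 1 ELSE 0 END) AS age_30,
       SUM(CASE WHEN age BETWEEN 31 AND 50 THEN 1 ELSE 0 END) AS age_50
FROM (
    -- DISTINCT collapses repeated logins of the same card into one row
    SELECT DISTINCT m.card_id, m.age
    FROM members AS m
    JOIN logs AS l ON l.card_id = m.card_id
    WHERE l.login <= '2022-10-29'
) AS visitors
""").fetchone()

print(row)
```

Card 1 logged in twice but is counted once, and card 4 only logged in after the cut-off date, so it is not counted at all.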

How to solve a nested aggregate function in SQL?

I'm trying to use a nested aggregate function. I know that SQL does not support it, but I really need to do something like the query below. Basically, I want to count the number of users for each day, but only the users that haven't completed an order within a 15-day window (relative to that day) and that have completed an order within a 30-day window (relative to that day). I already know that it is not possible to solve this problem using a regular subquery (it does not let the subquery values change for each date). The "id" and the "state" attributes are related to the orders. Also, I'm using Fivetran with Snowflake.
SELECT
db.created_at::date as Date,
count(case when
(count(case when (db.state = 'finished')
and (db.created_at::date between dateadd(day,-15,Date) and dateadd(day,-1,Date)) then db.id end)
= 0) and
(count(case when (db.state = 'finished')
and (db.created_at::date between dateadd(day,-30,Date) and dateadd(day,-16,Date)) then db.id end)
> 0) then db.user end)
FROM
data_base as db
WHERE
db.created_at::date between '2020-01-01' and dateadd(day,-1,current_date)
GROUP BY Date
In other words, I want to transform the below query in a way that the "current_date" changes for each date.
WITH completed_15_days_before AS (
select
db.user as User,
count(case when db.state = 'finished' then db.id end) as Completed
from
data_base as db
where
db.created_at::date between dateadd(day,-15,current_date) and dateadd(day,-1,current_date)
group by User
),
completed_16_days_before AS (
select
db.user as User,
count(case when db.state = 'finished' then db.id end) as Completed
from
data_base as db
where
db.created_at::date between dateadd(day,-30,current_date) and dateadd(day,-16,current_date)
group by User
)
SELECT
date(db.created_at) as Date,
count(distinct case when comp_15.Completed = 0 and comp_16.Completed > 0 then comp_15.user end) as "Total Users Churn",
count(distinct case when comp_15.Completed > 0 then comp_15.user end) as "Total Users Active",
week(Date) as Week
FROM
data_base as db
left join completed_15_days_before as comp_15 on comp_15.user = db.user
left join completed_16_days_before as comp_16 on comp_16.user = db.user
WHERE
db.created_at::date between '2020-01-01' and dateadd(day,-1,current_date)
GROUP BY Date
Does anyone have a clue on how to solve this puzzle? Thank you very much!
The following should give you roughly what you want. It is difficult to test without sample data, but it should be a good enough starting point for you to amend until it gives you exactly what you want.
I've commented the code to hopefully explain what each section does.
-- set parameter for the first date you want to generate the resultset for
set start_date = TO_DATE('2020-01-01','YYYY-MM-DD');
-- calculate the number of days between the start_date and the current date
set num_days = (Select datediff(day, $start_date , current_date()+1));
--generate a list of all the dates from the start date to the current date
-- i.e. every date that needs to appear in the resultset
WITH date_list as (
select
dateadd(
day,
'-' || row_number() over (order by null),
dateadd(day, '+1', current_date())
) as date_item
from table (generator(rowcount => ($num_days)))
)
--Create a list of all the orders that are in scope
-- i.e. 30 days before the start_date up to the current date
-- amend WHERE clause to in/exclude records as appropriate
,order_list as (
SELECT created_at, rt_id
from data_base
where created_at between dateadd(day,-30,$start_date) and current_date()
and state = 'finished'
)
SELECT dl.date_item
,COUNT (DISTINCT ol30.RT_ID) AS USER_COUNT
,COUNT (ol30.RT_ID) as ORDER_COUNT
FROM date_list dl
-- get all orders between -30 and -16 days of each date in date_list
left outer join order_list ol30 on ol30.created_at between dateadd(day,-30,dl.date_item) and dateadd(day,-16,dl.date_item)
-- exclude records that have the same RT_ID as in the ol30 dataset but have a date between 0 and -15 of the date in date_list
WHERE NOT EXISTS (SELECT ol15.RT_ID
FROM order_list ol15
WHERE ol30.RT_ID = ol15.RT_ID
AND ol15.created_at between dateadd(day,-15,dl.date_item) and dl.date_item)
GROUP BY dl.date_item
ORDER BY dl.date_item;
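The same pattern (generate one row per date, join each date to the orders in its older window, and exclude users who also ordered in the recent window) can be tried on a small data set. The sketch below is an assumption-laden stand-in, not the Snowflake query: it runs on SQLite through Python's sqlite3 module, uses a recursive CTE instead of generator(), a hypothetical orders table with invented rows, and the windows from the question (finished orders 30 to 16 days before the date, none 15 to 1 days before). The exclusion sits in the join condition rather than in WHERE, so a date keeps its row with a count of 0 when every candidate user is excluded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and invented rows, for illustration only.
conn.executescript("""
CREATE TABLE orders (user_id TEXT, state TEXT, created_at TEXT);
INSERT INTO orders VALUES
    ('u1', 'finished', '2020-01-01'),
    ('u2', 'finished', '2020-01-01'),
    ('u2', 'finished', '2020-01-20');
""")

rows = conn.execute("""
WITH RECURSIVE date_list(d) AS (        -- one row per day in the report range
    SELECT '2020-01-16'
    UNION ALL
    SELECT date(d, '+1 day') FROM date_list WHERE d < '2020-02-01'
)
SELECT dl.d, COUNT(DISTINCT o30.user_id) AS churned_users
FROM date_list AS dl
LEFT JOIN orders AS o30
       ON o30.state = 'finished'
      -- finished order 30 to 16 days before this date
      AND o30.created_at BETWEEN date(dl.d, '-30 days') AND date(dl.d, '-16 days')
      -- and no finished order by the same user 15 to 1 days before it
      AND NOT EXISTS (
              SELECT 1 FROM orders AS o15
              WHERE o15.user_id = o30.user_id
                AND o15.state = 'finished'
                AND o15.created_at BETWEEN date(dl.d, '-15 days') AND date(dl.d, '-1 day')
          )
GROUP BY dl.d
ORDER BY dl.d
""").fetchall()

churn = dict(rows)
print(churn['2020-01-17'], churn['2020-01-21'])
```

In this data both users count as churned from 2020-01-17 on; u2 drops out again from 2020-01-21 because of the order on 2020-01-20.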

Using two Select statements in one query to bring back desired data set

Please help in correcting this query.
I am trying to bring back the TRADE_AMT for the last week (W1) and then for the year to date (YTD). But the query keeps failing. I know there is a simpler solution, but that is not what I am looking for: I would like to learn how to use multiple SELECT statements to build one data set, as in the very first SELECT statement.
Also, I would like to group by login ID, but I know I can have only one FROM clause.
Here is my query:
SELECT login, W1.TRADE_AMT, YTD.TRADE_AMT
FROM
(SELECT sum(TRADE_AMT) FROM CORPS
WHERE DATE > 20180903 AND DATE_ID <= 20180910
AND REGION = 'London'
AND Rating = 'High'
AND LOGIN IN ('ITI1','RAB0','RR12') ) AS W1
UNION ALL
(SELECT sum(TRADE_AMT) FROM CORPS
WHERE DATE > 20180101 AND DATE_ID < 20180911
AND REGION = 'London'
AND Rating = 'High'
AND LOGIN IN ('ITI1','RAB0','RR12') ) AS YTD
GROUP BY login
Sample data from the CORPS table:
Login  Trade_AMT  Date      Rating
ITI1   100        20180509  High
RAB0   150        20180910  High
RR12   25         20180104  High
YTRT   100        20180225  Low
ACE1   123        20180908  Low
ITI1   354        20180903  Low
RAB0   254        20180331  High
RR12   245        20180314  High
RR12   5236       20180505  High
Desired result:
Login  W1_Volume  YTD_Volume
ITI1   100        2000
RAB0   150        2500
RR12   25         3000
You could join two subqueries, each grouped by login:
SELECT W1.login, W1.TRADE_AMT, YTD.TRADE_AMT
FROM
(SELECT login, sum(TRADE_AMT) AS TRADE_AMT FROM CORPS
WHERE DATE > 20180903 AND DATE_ID <= 20180910
AND REGION = 'London'
AND Rating = 'High'
AND LOGIN IN ('ITI1','RAB0','RR12')
GROUP BY LOGIN) AS W1
LEFT JOIN (
SELECT login, sum(TRADE_AMT) AS TRADE_AMT FROM CORPS
WHERE DATE > 20180101 AND DATE_ID < 20180911
AND REGION = 'London'
AND Rating = 'High'
AND LOGIN IN ('ITI1','RAB0','RR12')
GROUP BY LOGIN ) AS YTD ON YTD.LOGIN = W1.LOGIN
The left join also shows logins from the week subquery that have no match in the year subquery; use an inner join if you only need logins present in both.
Judging by the sample, it looks like you need min() of TRADE_AMT in the first subquery, but it is not clear to me which aggregate function the second subquery needs:
SELECT W1.login, W1.TRADE_AMT, YTD.TRADE_AMT
FROM
(SELECT login,min(TRADE_AMT) as TRADE_AMT FROM CORPS
WHERE DATE > '20180903' AND DATE_ID <= '20180910'
AND REGION = 'London'
AND Rating = 'High'
AND LOGIN IN ('ITI1','RAB0','RR12')
group by login) AS W1
join
(SELECT login, sum(TRADE_AMT) as TRADE_AMT FROM CORPS
WHERE DATE > '20180101' AND DATE_ID < '20180911'
AND REGION = 'London'
AND Rating = 'High'
AND LOGIN IN ('ITI1','RAB0','RR12')
group by login
) AS YTD
on W1.login=YTD.login
Simply use case expressions to do conditional aggregation:
SELECT login,
sum(case when DATE > 20180903 AND DATE_ID <= 20180910 then TRADE_AMT end) W1_Volume,
sum(case when DATE > 20180101 AND DATE_ID < 20180911 then TRADE_AMT end) YTD_Volume
FROM CORPS
WHERE REGION = 'London'
AND Rating = 'High'
AND LOGIN IN ('ITI1','RAB0','RR12')
GROUP BY login
Optionally you can put back a date condition to the WHERE clause to speed things up:
AND DATE > 20180101 AND DATE_ID < 20180911
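For anyone who wants to try the conditional aggregation, here is a small sketch that loads the sample rows from the question into SQLite through Python's sqlite3 module. A few things are assumptions: the date column is called trade_date here (the question uses both DATE and DATE_ID), the REGION filter is left out because the sample data has no region column, and ELSE 0 is added so that a login without trades in the week shows 0 rather than NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Sample rows from the question; trade_date is an assumed column name.
conn.executescript("""
CREATE TABLE corps (login TEXT, trade_amt INTEGER, trade_date INTEGER, rating TEXT);
INSERT INTO corps VALUES
    ('ITI1', 100,  20180509, 'High'),
    ('RAB0', 150,  20180910, 'High'),
    ('RR12', 25,   20180104, 'High'),
    ('YTRT', 100,  20180225, 'Low'),
    ('ACE1', 123,  20180908, 'Low'),
    ('ITI1', 354,  20180903, 'Low'),
    ('RAB0', 254,  20180331, 'High'),
    ('RR12', 245,  20180314, 'High'),
    ('RR12', 5236, 20180505, 'High');
""")

rows = conn.execute("""
SELECT login,
       -- each SUM only adds the rows that fall inside its own date range
       SUM(CASE WHEN trade_date > 20180903 AND trade_date <= 20180910
                THEN trade_amt ELSE 0 END) AS w1_volume,
       SUM(CASE WHEN trade_date > 20180101 AND trade_date < 20180911
                THEN trade_amt ELSE 0 END) AS ytd_volume
FROM corps
WHERE rating = 'High'
  AND login IN ('ITI1', 'RAB0', 'RR12')
GROUP BY login
ORDER BY login
""").fetchall()

print(rows)
```

The table is scanned once and both totals come out on the same row per login, which is why no join or union is needed for this shape of result.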
The bottom line is that your outer query can't pick columns from the individual inner queries (W1.something, YTD.something) once you combine them with a union. They get melded into a single result set. When you union, you have to leave "clues" in the result set to tell which query each row came from:
SELECT [login], TRADE_AMT, amt_type
FROM
(
SELECT
[login], sum(TRADE_AMT) as TRADE_AMT, 'week' as amt_type
FROM
CORPS
WHERE
DATE > 20180903 AND DATE_ID <= 20180910
AND REGION = 'London'
AND Rating = 'High'
AND [LOGIN] IN ('ITI1','RAB0','RR12')
group by
[login]
UNION ALL
SELECT
[login], sum(TRADE_AMT) as TRADE_AMT, 'ytd' as amt_type
FROM
CORPS
WHERE
DATE > 20180101 AND DATE_ID < 20180911
AND REGION = 'London'
AND Rating = 'High'
AND [LOGIN] IN ('ITI1','RAB0','RR12')
group by
[login]
) as t
...where the amt_type indicates whether it was from the ytd query or the week query.
That's the issue with union queries. You can pivot the result set, but that's unnecessary...unless you just need the results in a particular format. It depends on the code consuming the result set.
(Also, LOGIN is a reserved word, so you probably ought to bracket it wherever you mean the column rather than the reserved word.)
You can get your results on one line per login with a join (rather than a union), as per scaisEdge's answer, but I assumed you were trying to figure out the union approach.

List all months with a total regardless of null

I have a very small SQL table that lists courses attended and the date of attendance. I can use the code below to count the attendees for each month
select to_char(DATE_ATTENDED,'YYYY/MM'),
COUNT (*)
FROM TRAINING_COURSE_ATTENDED
WHERE COURSE_ATTENDED = 'Fire Safety'
GROUP BY to_char(DATE_ATTENDED,'YYYY/MM')
ORDER BY to_char(DATE_ATTENDED,'YYYY/MM')
This returns a list as expected for each month that has attendees. However I would like to list it as
January 2
February 0
March 5
How do I show the count results along with the nulls? My table is very basic
1234 01-JAN-15 Fire Safety
108 01-JAN-15 Fire Safety
1443 02-DEC-15 Healthcare
1388 03-FEB-15 Emergency
1355 06-MAR-15 Fire Safety
1322 09-SEP-15 Fire Safety
1234 11-DEC-15 Fire Safety
I just need to display each month and the total attendees for Fire Safety only. Not used SQL developer for a while so any help appreciated.
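You would need a calendar table to select the period you want to display. The course filter has to go in the join condition (in the WHERE clause it would discard the months without attendees), and the count has to be on a column of the joined table so that empty months give 0. Simplified code would look like this:
select to_char(c.Date_dt,'YYYY/MM')
, COUNT (tca.DATE_ATTENDED)
FROM calendar c
left join TRAINING_COURSE_ATTENDED tca
on tca.DATE_ATTENDED = c.Date_dt
and tca.COURSE_ATTENDED = 'Fire Safety'
WHERE c.Date_dt between [period_start_dt] and [period_end_dt]
GROUP BY to_char(c.Date_dt,'YYYY/MM')
ORDER BY to_char(c.Date_dt,'YYYY/MM')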
You can generate the required set of year-months on the fly with a count of 0 and combine it with your query as below.
Select yrmth,sum(counter) from
(
select to_char(date_attended,'YYYYMM') yrmth,
COUNT (1) counter
From TRAINING_COURSE_ATTENDED Where COURSE_ATTENDED = 'Fire Safety'
Group By to_char(date_attended,'YYYYMM')
Union All
Select To_Char(2015||Lpad(Rownum,2,0)),0 from Dual Connect By Rownum <= 12
)
group by yrmth
order by 1
If you want to show multiple years, just change the 2nd query to
Select To_Char(Year||Lpad(Month,2,0)) , 0
From
(select Rownum Month from Dual Connect By Rownum <= 12),
(select 2015+Rownum-1 Year from Dual Connect By Rownum <= 3)
Try this :
SELECT Trunc(date_attended, 'MM') Month,
Sum(CASE
WHEN course_attended = 'Fire Safety' THEN 1
ELSE 0
END) Fire_Safety
FROM training_course_attended
GROUP BY Trunc(date_attended, 'MM')
ORDER BY Trunc(date_attended, 'MM')
Another way to generate a calendar table inline:
with calendar (month_start, month_end) as
( select add_months(date '2014-12-01', rownum)
, add_months(date '2014-12-01', rownum +1) - interval '1' second
from dual
connect by rownum <= 12 )
select to_char(c.month_start,'YYYY/MM') as course_month
, count(tca.course_attended) as attended
from calendar c
left join training_course_attended tca
on tca.date_attended between c.month_start and c.month_end
and tca.course_attended = 'Fire Safety'
group by to_char(c.month_start,'YYYY/MM')
order by 1;
(You could also have only the month start in the calendar table, and join on trunc(tca.date_attended,'MONTH') = c.month_start, though if you had indexes or partitioning on tca.date_attended that might be less efficient.)
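The inline-calendar approach can be checked against the sample rows from the question with the sketch below. It is a SQLite translation run through Python's sqlite3 module, not the Oracle query itself: a recursive CTE replaces connect by, the join matches on the year-month string, the dates are stored as ISO text, and the attendee_id column name is an assumption since the question does not name its columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Sample rows from the question; attendee_id is an assumed column name.
conn.executescript("""
CREATE TABLE training_course_attended (
    attendee_id INTEGER, date_attended TEXT, course_attended TEXT);
INSERT INTO training_course_attended VALUES
    (1234, '2015-01-01', 'Fire Safety'),
    (108,  '2015-01-01', 'Fire Safety'),
    (1443, '2015-12-02', 'Healthcare'),
    (1388, '2015-02-03', 'Emergency'),
    (1355, '2015-03-06', 'Fire Safety'),
    (1322, '2015-09-09', 'Fire Safety'),
    (1234, '2015-12-11', 'Fire Safety');
""")

rows = conn.execute("""
WITH RECURSIVE calendar(month_start) AS (   -- the 12 months of 2015
    SELECT '2015-01-01'
    UNION ALL
    SELECT date(month_start, '+1 month') FROM calendar
    WHERE month_start < '2015-12-01'
)
SELECT strftime('%Y/%m', c.month_start) AS course_month,
       COUNT(tca.attendee_id)           AS attended   -- counts matches only, so 0 for empty months
FROM calendar AS c
LEFT JOIN training_course_attended AS tca
       ON strftime('%Y-%m', tca.date_attended) = strftime('%Y-%m', c.month_start)
      AND tca.course_attended = 'Fire Safety'         -- filter in ON, not WHERE
GROUP BY c.month_start
ORDER BY c.month_start
""").fetchall()

for month, attended in rows:
    print(month, attended)
```

Every month of the year appears once, and the months with no Fire Safety attendance show 0 because the count is taken on a column of the left-joined table.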

More than 1 appointment on the same day for a patient and display both appointments

I am trying to find patients that have more than 1 appointment on the same day. I want to then display all the appointments the patient may have. Do I need to use a subquery to do this? Here is what I have so far:
Select
Appt.ID-PatNm as Patient,
ApptNum,
Sched_ApptType.Prov.Mnemonic as Type,
Appt.Provider-Name as Provider,
Appt.Dt,
Appt.Tm,
Appt.Department-Mnemonic As Dept,
Appt.SchedulerInits,
Case $EXTRACT(Appt.InternalStatus,1)
when 'P' then 'Pending'
when 'A' then 'Arrived'
when 'R' then 'Rescheduled'
End as Status
From Sched.Appointment Appt
JOIN Sched_ApptType.Prov ON
Appt.Department = Sched_ApptType.Prov.Department
and
Appt.Provider = Sched_ApptType.Prov.Provider
and
Appt.Type = Sched_ApptType.Prov.ApptType
Where (Appt.Dt) > DATEADD('DD',-120,CURRENT_DATE)
AND Appt.InternalStatus IN ('P','R','A')
AND Appt.Department-Mnemonic= 'EYE'
Group By
Appt.ID-PatNm,
Appt.Dt
You get the patients having more than one appointment in a day by grouping by patient and day:
select distinct a.id_patnm
from sched.appointment a
group by a.id_patnm, a.dt
having count(*) > 1
So yes, you need a subquery:
Where (Appt.Dt) > DATEADD('DD',-120,CURRENT_DATE)
AND Appt.InternalStatus IN ('P','R','A')
AND Appt.Department_Mnemonic= 'EYE'
AND Appt.ID_PatNm IN
(
select a.id_patnm
from sched.appointment a
group by a.id_patnm, a.dt
having count(*) > 1
)
(BTW: I used id_patnm instead of id-patnm here, for I don't know any DBMS that would allow the hyphen. When using a hyphen in a column name you have to use quotes on the name, e.g. "id-patnm".)
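A small sketch of the subquery approach, run on SQLite through Python's sqlite3 module with a hypothetical, cut-down appointment table and invented rows. It shows two readings of the requirement: the IN filter from above returns every appointment of a patient who has a busy day, while joining on both patient and date returns only the appointments that fall on such a day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical cut-down table with invented rows.
conn.executescript("""
CREATE TABLE appointment (appt_num INTEGER, id_patnm TEXT, dt TEXT, tm TEXT);
INSERT INTO appointment VALUES
    (1, 'P1', '2024-03-01', '09:00'),
    (2, 'P1', '2024-03-01', '14:00'),
    (3, 'P1', '2024-03-05', '10:00'),
    (4, 'P2', '2024-03-01', '11:00');
""")

# All appointments of patients who have at least one day with several appointments.
all_for_patient = [r[0] for r in conn.execute("""
SELECT appt_num
FROM appointment
WHERE id_patnm IN (SELECT id_patnm
                   FROM appointment
                   GROUP BY id_patnm, dt
                   HAVING COUNT(*) > 1)
ORDER BY appt_num
""")]

# Only the appointments that share a day with another appointment of the same patient.
same_day_only = [r[0] for r in conn.execute("""
SELECT a.appt_num
FROM appointment AS a
JOIN (SELECT id_patnm, dt
      FROM appointment
      GROUP BY id_patnm, dt
      HAVING COUNT(*) > 1) AS busy
  ON busy.id_patnm = a.id_patnm AND busy.dt = a.dt
ORDER BY a.appt_num
""")]

print(all_for_patient, same_day_only)
```

Which of the two is wanted depends on whether the appointment on 2024-03-05 should be listed for patient P1.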
I suppose you could add a column for Appointment_id which would then allow you to get the desired result.