How do you join a table with a different WHERE condition after you already used a join - sql

Hi, I have two tables, employees and medical leaves, related through the employee ID. I want a result set with one column that counts leaves filtered by month and year, and another column that counts leaves filtered by year only.
EMPLOYEES
|employee|ID|
A          1
B          2
C          3
D          4
MEDICAL
|ID|DateOfLeave|
1   2019/1/3
1   2019/4/15
2   2019/5/16
select employees.employee,Employees.ID,count(medical.dateofleave) as NumberofLeaves
from employees
left outer join Medical on employees.emp = MedBillInfo.emp
and month(medbillinfo.date) in(1) and year(medbillinfo.date) in (2019)
group by Employees.employee,employees.ID
RESULT SET
|Employee|ID|NumberOfLeaves|YearlyLeaves|  -- I want to add this last column
A          1  1              2
B          2  0              1
C          3  0              0
D          4  0              0
But I have no idea how to extend the current SQL statement so that a YearlyLeaves column is added to my current result set, which contains only employee, ID and NumberOfLeaves.

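I think you want conditional aggregation:
select e.employee, e.ID,
       sum(case when month(m.DateOfLeave) = 1 then 1 else 0 end) as NumberOfLeaves,
       count(m.DateOfLeave) as YearlyLeaves
from employees e left join
     Medical m
     on e.ID = m.ID and
        m.DateOfLeave >= '2019-01-01' and m.DateOfLeave < '2020-01-01'
group by e.employee, e.ID;
Notes:
This replaces the MedBillInfo references, which name a table that is not in the query, with the Medical table.
The year filter uses direct date comparisons rather than functions, and it stays in the on clause so that employees with no leaves in 2019 still appear with a count of 0.
count(m.DateOfLeave) ignores the NULLs produced by the left join, whereas count(*) would report 1 for an employee with no leaves.
This introduces table aliases so the query is easier to write and to read.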
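To try the conditional aggregation idea end to end, here is a minimal sketch that loads the question's sample rows into an in-memory SQLite database from Python. SQLite has no month() or year() functions, so this sketch assumes the dates are stored as ISO text and uses range comparisons for both columns; the output column names are taken from the desired result set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (employee TEXT, ID INTEGER);
    CREATE TABLE medical (ID INTEGER, DateOfLeave TEXT);  -- ISO dates stored as text (assumption)
    INSERT INTO employees VALUES ('A', 1), ('B', 2), ('C', 3), ('D', 4);
    INSERT INTO medical VALUES (1, '2019-01-03'), (1, '2019-04-15'), (2, '2019-05-16');
""")

rows = conn.execute("""
    SELECT e.employee, e.ID,
           -- leaves taken in January 2019 only
           SUM(CASE WHEN m.DateOfLeave >= '2019-01-01' AND m.DateOfLeave < '2019-02-01'
                    THEN 1 ELSE 0 END) AS NumberOfLeaves,
           -- all leaves in 2019; COUNT skips the NULLs of unmatched employees
           COUNT(m.DateOfLeave) AS YearlyLeaves
    FROM employees e
    LEFT JOIN medical m
           ON e.ID = m.ID
          AND m.DateOfLeave >= '2019-01-01' AND m.DateOfLeave < '2020-01-01'
    GROUP BY e.employee, e.ID
    ORDER BY e.ID
""").fetchall()

for row in rows:
    print(row)
```

Because the year filter sits in the ON clause, employees C and D stay in the result with zero counts.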

Your query probably needs to be corrected first, as the group by does not appear to match the select columns. Based on what you asked, though, I think you need a date function to group the leaves by year. In SQL Server, the YEAR(date) function returns the year of a given date; in your case that date would be MEDICAL.DateOfLeave.

Related

Inner join + group by - select common columns and aggregate functions

Let's say i have two tables
Customer
---
Id Name
1 Foo
2 Bar
and
CustomerPurchase
---
CustomerId, Amount, AmountVAT, Accountable(bit)
1 10 11 1
1 20 22 0
2 5 6 0
2 2 3 0
I need a single record for every joined and grouped Customer and CustomerPurchase group.
Every record would contain:
- columns from table Customer
- some aggregate functions like SUM
- a 'calculated' column, for example the difference of other columns
- the result of a subquery on the CustomerPurchase table
An example of result i would like to get
CustomerPurchases
---
Name Total TotalVAT VAT TotalAccountable
Foo 30 33 3 10
Bar 7 9 2 0
I was able to get a single row only by grouping by all the common columns, which I don't think is the right way to do it. Plus, I have no idea how to do the 'VAT' column or the 'TotalAccountable' column, which filters out only certain rows of CustomerPurchase and then runs some kind of aggregate function on the result. The following example doesn't work, of course, but it shows what I would like to achieve:
select C.Name,
SUM(CP.Amount) as 'Total',
SUM(CP.AmountVAT) as 'TotalVAT',
diff? as 'VAT',
subquery? as 'TotalAccountable'
from Customer C
inner join CustomerPurchase CP
on C.Id = CP.CustomerId
group by C.Id
I would suggest you just need the following slight changes to your query. For clarity, I would also consider using the terms net and gross, if you can, which are typical for prices excluding and including VAT.
select c.[Name],
       Sum(cp.Amount) as Total,
       Sum(cp.AmountVAT) as TotalVAT,
       Sum(cp.AmountVAT) - Sum(cp.Amount) as VAT,
       -- else 0 keeps the total at 0 rather than NULL when no row is accountable
       Sum(case when cp.Accountable = 1 then cp.Amount else 0 end) as TotalAccountable
from Customer c
join CustomerPurchase cp on cp.CustomerId = c.Id
group by c.Id, c.[Name]; -- grouping by Id keeps customers that share a name separate
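As a quick check of this approach against the sample data, the sketch below runs an equivalent query in an in-memory SQLite database from Python. The table and column names come from the question; the ELSE 0 branch and the grouping by Id are choices made for this sketch so that the output matches the desired result table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE CustomerPurchase (
        CustomerId INTEGER, Amount INTEGER, AmountVAT INTEGER, Accountable INTEGER);
    INSERT INTO Customer VALUES (1, 'Foo'), (2, 'Bar');
    INSERT INTO CustomerPurchase VALUES (1, 10, 11, 1), (1, 20, 22, 0), (2, 5, 6, 0), (2, 2, 3, 0);
""")

rows = conn.execute("""
    SELECT c.Name,
           SUM(cp.Amount)                      AS Total,
           SUM(cp.AmountVAT)                   AS TotalVAT,
           SUM(cp.AmountVAT) - SUM(cp.Amount)  AS VAT,
           -- conditional aggregation: only accountable rows contribute
           SUM(CASE WHEN cp.Accountable = 1 THEN cp.Amount ELSE 0 END) AS TotalAccountable
    FROM Customer c
    JOIN CustomerPurchase cp ON cp.CustomerId = c.Id
    GROUP BY c.Id, c.Name
    ORDER BY c.Id
""").fetchall()

for row in rows:
    print(row)
```

The calculated VAT column is just arithmetic on two aggregates, and the filtered total needs no subquery.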

Re-coding/transforming SQL values into new columns from linked data: why is CASE WHEN returning multiple values?

I work with a lot of linked data from multiple tables. As a result, I'm running into some challenges with deduplication and re-coding values into new columns in a more meaningful way.
My core data set is a list of person-level records as rows. However, the linked data include multiple rows per person based on the dates they've been booked into events, whether they've showed up or not, and whether they're a member of our organisation. There are usually multiple bookings. It is possible to lose membership status and continue to attend events/cancel/etc, but we are interested in whether or not they have ever been a member and if not, which is the highest level of contact they have ever had with our organisation.
In short: If they have ever been a member, that needs to take precedence.
select distinct
a.ticketnumber,
a.id,
-- (many additional columns from multiple tables here)
case
when b.Went_Member >=1 then 'Member'
when b.Went_NonMember >=1 then 'Attended but not member'
when b.Going_NonMember >=1 then 'Going but not member'
when b.OptOut='1' then 'Opt Out'
when b.Cancelled >=1 then 'Cancelled'
when c.MemberStatus = '9' then 'Member'
when c.MemberStatus = '6' then 'Attended but not member'
when c.DateBooked > current_timestamp then 'Going but not member'
when c.OptOut='1' then 'Opt out'
when c.MemberStatus = '8' then 'Cancelled'
end [NewMemberStatus]
from table1 a
left join TableWithMemberStatus1 b on a.id = b.id
left join TableWithMemberStatus2 c on a.id = c.id
-- (further left joins to additional tables here)
order by a.ticketnumber
Table b is more accurate because these are our internal records, whereas table c is from a third party. Annoyingly, the numbers in C aren't in the same meaningful order as we've decided so I can't have it select the highest value for each ID.
I was under the impression that CASE goes down the list of WHEN statements and returns the first matching value, but this will produce multiple rows. For example:
ID      NewMemberStatus
989898  NULL
989898  Cancelled
777777  Member
111111  Cancelled
111111  Member
I feel like maybe there is something missing in terms of ORDER BY or GROUP BY that I should be adding? I tried COALESCE with CASE inside and it didn't work. Should I be nesting some things in parentheses?
In your query you are showing all rows (all bookings), because there is no WHERE clause and no aggregation. But you only want one result row per person.
You want a person's best status from the internal table. If there is no entry for the person in the internal table, you want their best status from the third party table. You get the best statuses by aggregating the rows in the internal and third party tables by person. Then join to the person.
I am using status numbers, because these can be ordered (I use 1 for the best status (member), so I look for the minimum status). In the end I replace the number found with the related text (e.g. 'Member' for status 1).
select
  p.*,
  case coalesce(i.best_status, tp.best_status)
    when 1 then 'Member'
    when 2 then 'Attended but not member'
    when 3 then 'Going but not member'
    when 4 then 'Opt out'
    when 5 then 'Cancelled'
    else 'unknown'
  end as status
from person p
left join
(
  select
    person_id,
    min(case when went_member >= 1 then 1
             when went_nonmember >= 1 then 2
             when going_nonmember >= 1 then 3
             when optout = 1 then 4
             when cancelled >= 1 then 5
        end) as best_status
  from internal_table
  group by person_id
) i on i.person_id = p.person_id
left join
(
  select
    person_id,
    min(case when MemberStatus = 9 then 1
             when MemberStatus = 6 then 2
             when DateBooked > current_timestamp then 3
             when optout = 1 then 4
             when memberstatus = 8 then 5
        end) as best_status
  from thirdparty_table
  group by person_id
) tp on tp.person_id = p.person_id
order by p.person_id;
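The pattern of ranking statuses, aggregating per person, and then falling back from the internal table to the third-party table can be tried on a small scale. The sketch below is a cut-down version run in SQLite from Python; the sample rows are invented for illustration, and only two of the five statuses are modelled to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (person_id INTEGER PRIMARY KEY);
    CREATE TABLE internal_table (person_id INTEGER, went_member INTEGER, cancelled INTEGER);
    CREATE TABLE thirdparty_table (person_id INTEGER, MemberStatus INTEGER);
    INSERT INTO person VALUES (111111), (777777), (989898);
    -- 111111 has a cancelled booking and a member attendance (hypothetical rows)
    INSERT INTO internal_table VALUES (111111, 0, 1), (111111, 1, 0), (777777, 1, 0);
    -- 989898 is known only to the third party, as cancelled (status 8)
    INSERT INTO thirdparty_table VALUES (989898, 8), (111111, 8);
""")

rows = conn.execute("""
    SELECT p.person_id,
           CASE COALESCE(i.best_status, tp.best_status)  -- internal table wins when present
                WHEN 1 THEN 'Member'
                WHEN 5 THEN 'Cancelled'
                ELSE 'unknown'
           END AS status
    FROM person p
    LEFT JOIN (SELECT person_id,
                      MIN(CASE WHEN went_member >= 1 THEN 1
                               WHEN cancelled >= 1 THEN 5 END) AS best_status
               FROM internal_table
               GROUP BY person_id) i ON i.person_id = p.person_id
    LEFT JOIN (SELECT person_id,
                      MIN(CASE WHEN MemberStatus = 9 THEN 1
                               WHEN MemberStatus = 8 THEN 5 END) AS best_status
               FROM thirdparty_table
               GROUP BY person_id) tp ON tp.person_id = p.person_id
    ORDER BY p.person_id
""").fetchall()

for row in rows:
    print(row)
```

Each person appears once: 111111 resolves to 'Member' even though a 'Cancelled' row exists, because MIN picks the best-ranked status.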

Oracle SQL Remove Duplicates on 2 of 4 fields

I am using Oracle SQL to extract the data;
I have supply periods for IDs in 2 systems. I have this working with the below code:
select distinct b.ID_Code, b.supply_start_date, b.supply_end_date, b.system_id
from (
select ID_Code, max(supply_start_date) as max_dt
from tmp_mmt_sup
group by ID_Code) a
inner join tmp_mmt_sup b
on a.ID_Code=b.ID_Code and a.max_dt=b.SUPPLY_START_DATE;
However, I have several records that are on the 2 different systems, but have the same start date/end dates. I only want to keep one of them - not bothered which!
So instead of
ID_Code  Start       End         System
123      01-04-2018  30-04-2018  ABC
123      01-04-2018  30-04-2018  DEF
I want only one of these records.
Many thanks
D
If you don't care which one is returned, then one of the aggregate functions (such as MIN or MAX) does the job. For example:
select b.id_code,
       b.supply_start_date,
       b.supply_end_date,
       max(b.system_id) system_id      --> added MAX here ...
from (select id_code,
             max(supply_start_date) as max_dt
      from tmp_mmt_sup
      group by id_code
     ) a
inner join tmp_mmt_sup b
  on a.id_code = b.id_code and a.max_dt = b.supply_start_date
group by b.id_code,                    --> ... and GROUP BY here
         b.supply_start_date,
         b.supply_end_date;
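A small sketch of this MAX plus GROUP BY idea, run in SQLite from Python rather than Oracle, shows the two duplicate rows collapsing into one. The third row (an older supply period) is invented for this example, and the dates are stored as ISO text so that MAX compares them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tmp_mmt_sup (
        id_code INTEGER, supply_start_date TEXT, supply_end_date TEXT, system_id TEXT);
    INSERT INTO tmp_mmt_sup VALUES
        (123, '2018-04-01', '2018-04-30', 'ABC'),
        (123, '2018-04-01', '2018-04-30', 'DEF'),
        (123, '2018-03-01', '2018-03-31', 'ABC');  -- older period, hypothetical
""")

rows = conn.execute("""
    SELECT b.id_code, b.supply_start_date, b.supply_end_date,
           MAX(b.system_id) AS system_id          -- pick one system arbitrarily
    FROM (SELECT id_code, MAX(supply_start_date) AS max_dt
          FROM tmp_mmt_sup
          GROUP BY id_code) a
    JOIN tmp_mmt_sup b
      ON a.id_code = b.id_code AND a.max_dt = b.supply_start_date
    GROUP BY b.id_code, b.supply_start_date, b.supply_end_date
""").fetchall()

print(rows)
```

Swapping MAX for MIN would keep 'ABC' instead; either way a single row per supply period survives.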

Adding in missing dates from results in SQL

I have a database that currently looks like this
Date     | valid_entry | profile
1/6/2015 | 1           | 1
3/6/2015 | 2           | 1
3/6/2015 | 2           | 2
5/6/2015 | 4           | 4
I am trying to select the dates, but I need the query to also display dates that do not exist in the table, such as 2/6/2015.
This is a sample of what i need it to be:
Date     | valid_entry
1/6/2015 | 1
2/6/2015 | 0
3/6/2015 | 2
3/6/2015 | 2
4/6/2015 | 0
5/6/2015 | 4
My query:
select date, count(valid_entry)
from database
where profile = 1
group by 1;
This query only displays the dates that exist in the table. Is there a way for the query to fill in the dates that do not exist there?
You can generate a list of all dates that are between the start and end date from your source table using generate_series(). These dates can then be used in an outer join to sum the values for all dates.
with all_dates (date) as (
select dt::date
from generate_series( (select min(date) from some_table), (select max(date) from some_table), interval '1' day) as x(dt)
)
select ad.date, sum(coalesce(st.valid_entry,0))
from all_dates ad
left join some_table st on ad.date = st.date
group by ad.date, st.profile
order by ad.date;
some_table is your table with the sample data you have provided.
Based on your sample output, you also seem to want group by date and profile, otherwise there can't be two rows with 2015-06-03. You also don't seem to want where profile = 1 because that as well wouldn't generate two rows with 2015-06-03 as shown in your sample output.
SQLFiddle example: http://sqlfiddle.com/#!15/b0b2a/2
Unrelated, but: I hope that the column names are only made up. date is a horrible name for a column. For one because it is also a keyword, but more importantly it does not document what this date is for. A start date? An end date? A due date? A modification date?
You can use a calendar table for this purpose. In this case you can create an in-line table with the dates required, then LEFT JOIN your table to it:
select t.d, count(db.valid_entry)
from (
SELECT DATE '2015-06-01' AS d UNION ALL SELECT DATE '2015-06-02' UNION ALL
SELECT DATE '2015-06-03' UNION ALL SELECT DATE '2015-06-04' UNION ALL
SELECT DATE '2015-06-05' UNION ALL SELECT DATE '2015-06-06') AS t
left join database AS db on t.d = db."date" and db.profile = 1
group by t.d
order by t.d;
Note: Predicate profile = 1 should be applied in the ON clause of the LEFT JOIN operation. If it is placed in the WHERE clause instead then LEFT JOIN essentially becomes an INNER JOIN.
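The difference between filtering in the ON clause and in the WHERE clause is easy to see on the sample data. The sketch below uses SQLite from Python; the table name entries and column name entry_date are placeholders chosen for this example, and the calendar is a hard-coded inline list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entries (entry_date TEXT, valid_entry INTEGER, profile INTEGER);
    INSERT INTO entries VALUES
        ('2015-06-01', 1, 1), ('2015-06-03', 2, 1), ('2015-06-03', 2, 2), ('2015-06-05', 4, 4);
""")

# Inline calendar covering every day in the range of interest.
calendar = """
    SELECT '2015-06-01' AS d UNION ALL SELECT '2015-06-02' UNION ALL SELECT '2015-06-03'
    UNION ALL SELECT '2015-06-04' UNION ALL SELECT '2015-06-05'
"""

# Filter in ON: unmatched calendar days survive with a count of 0.
on_rows = conn.execute(f"""
    SELECT t.d, COUNT(e.valid_entry)
    FROM ({calendar}) AS t
    LEFT JOIN entries AS e ON t.d = e.entry_date AND e.profile = 1
    GROUP BY t.d ORDER BY t.d
""").fetchall()

# Filter in WHERE: rows where e.profile is NULL are discarded after the join.
where_rows = conn.execute(f"""
    SELECT t.d, COUNT(e.valid_entry)
    FROM ({calendar}) AS t
    LEFT JOIN entries AS e ON t.d = e.entry_date
    WHERE e.profile = 1
    GROUP BY t.d ORDER BY t.d
""").fetchall()

print(on_rows)
print(where_rows)
```

Only the first query returns all five days; the second returns just the two days that have a profile 1 entry, which is what an INNER JOIN would give.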

Identify open cases for each week during a year

I am trying to produce a report which identifies client cases which were open during each week of a year. Currently I have the following SQL, which returns all clients with an indicator of whether their case was open during week 1 of our calendar. Two conditions identify whether a client's case is open: their MOV_START_DATE and ESU_START_DATE should be earlier than the end date of the period, and their MOV_END_DATE and ESU_END_DATE should each be either null or later than the start date of the period.
The below code works, but I thought I could just copy the left join WK1 and rename it WK2 to return information for week 2 but I'm getting an error relating to ambiguously named columns. Additionally, I'm guessing that having 52 (one for each week) left joins on a report isn't particularly advisable, so again I'm wondering if there is a better way of achieving this?
SELECT
A.ESU_PER_GRO_ID,
A.ESU_ID,
A.STATUS,
B.MOV_ID,
B.MOV_START_DATE,
B.MOV_END_DATE,
A.ESU_START_DATE,
A.ESU_END_DATE,
LS.CLS_DESC,
nvl2(wk1.PRD_PERIOD_NUM,'Y','N') as "Week1"
FROM
A
LEFT JOIN B ON B.MOV_PER_GRO_ID = A.ESU_PER_GRO_ID
LEFT JOIN LS ON LS.CLS_CODE = A.STATUS
LEFT JOIN O_PERIODS WK1 ON B.MOV_START_DATE < WK1.PRD_END_DATE
AND (B.MOV_END_DATE IS NULL OR B.MOV_END_DATE > WK1.PRD_START_DATE)
AND A.ESU_START_DATE < WK1.PRD_END_DATE
AND (A.ESU_END_DATE IS NULL OR A.ESU_END_DATE > WK1.PRD_START_DATE)
AND PRD_CAL_ID = 'E1190' AND WK1.PRD_PERIOD_NUM = 1 AND WK1.PRD_YEAR = 2012
WHERE
B.MOV_START_DATE Is Not Null
AND A.STATUS <> ('X')
Hopefully I have provided enough information, but if not, I am happy to answer questions. Thanks!
Sample Data (Produced by above query)
P ID  ESU_ID  STATUS  MOV_ID  M_START     M_END  DESC  Week1
1     ESU1    New     1M      01/01/2012         Boo   Y
2     ESU2    New     2M      01/03/2012         Boo   N
Desired output (Week1 - Week 52)
P ID  ESU_ID  STATUS  MOV_ID  M_START     M_END  DESC  Week1  Week2
1     ESU1    New     1M      01/01/2012         Boo   Y      Y
2     ESU2    New     2M      01/03/2012         Boo   N      N
I suspect that the reason creating a WK2 join like WK1 didn't work was that the column PRD_CAL_ID didn't have a table alias on it. However, as you guessed, 52 joins is probably not going to perform very well. Try the following:
SELECT A.ESU_PER_GRO_ID,
A.ESU_ID,
A.STATUS,
B.MOV_ID,
B.MOV_START_DATE,
B.MOV_END_DATE,
A.ESU_START_DATE,
A.ESU_END_DATE,
LS.CLS_DESC,
'Week' || TRIM(TO_CHAR(pd.PRD_PERIOD_NUM)) WEEK_DESC
FROM A
LEFT JOIN B
ON B.MOV_PER_GRO_ID = A.ESU_PER_GRO_ID
LEFT JOIN LS
ON LS.CLS_CODE = A.STATUS
LEFT JOIN O_PERIODS pd
ON B.MOV_START_DATE < pd.PRD_END_DATE AND
(B.MOV_END_DATE IS NULL OR
B.MOV_END_DATE > pd.PRD_START_DATE) AND
A.ESU_START_DATE < pd.PRD_END_DATE AND
(A.ESU_END_DATE IS NULL OR
A.ESU_END_DATE > pd.PRD_START_DATE)
WHERE B.MOV_START_DATE Is Not Null AND
A.STATUS <> ('X') AND
pd.PRD_CAL_ID = 'E1190' AND
pd.PRD_YEAR = 2012
ORDER BY pd.PRD_PERIOD_NUM -- numeric order; sorting by WEEK_DESC would place Week10 before Week2
This produces slightly different results than your original query, having a WEEK_DESC instead of trying to create 52 different columns, one for each week, but I think it will perform better.
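To illustrate the one-row-per-open-week shape, here is a reduced sketch in SQLite from Python. The cases and periods tables and their rows are hypothetical stand-ins for A/B and O_PERIODS, with a single start and end date per case, and the overlap test mirrors the join conditions above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cases (case_id INTEGER, start_date TEXT, end_date TEXT);
    CREATE TABLE periods (period_num INTEGER, period_start TEXT, period_end TEXT);
    -- case 1 is still open; case 2 was open for three days in week 2
    INSERT INTO cases VALUES (1, '2012-01-01', NULL), (2, '2012-01-10', '2012-01-12');
    INSERT INTO periods VALUES
        (1, '2012-01-02', '2012-01-08'),
        (2, '2012-01-09', '2012-01-15'),
        (3, '2012-01-16', '2012-01-22');
""")

rows = conn.execute("""
    SELECT c.case_id, 'Week' || p.period_num AS week_desc
    FROM cases c
    JOIN periods p
      ON c.start_date < p.period_end                               -- started before the week ended
     AND (c.end_date IS NULL OR c.end_date > p.period_start)       -- and not closed before it began
    ORDER BY c.case_id, p.period_num
""").fetchall()

for row in rows:
    print(row)
```

A single join against the periods table replaces the 52 separate joins; turning the rows back into Week1..Week52 columns, if needed, is a separate pivot step.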
Share and enjoy.