Re-coding/transforming SQL values into new columns from linked data: why is CASE WHEN returning multiple values? - sql

I work with a lot of linked data from multiple tables. As a result, I'm running into some challenges with deduplication and re-coding values into new columns in a more meaningful way.
My core data set is a list of person-level records as rows. However, the linked data include multiple rows per person based on the dates they've been booked into events, whether they've showed up or not, and whether they're a member of our organisation. There are usually multiple bookings. It is possible to lose membership status and continue to attend events/cancel/etc, but we are interested in whether or not they have ever been a member and if not, which is the highest level of contact they have ever had with our organisation.
In short: If they have ever been a member, that needs to take precedence.
select distinct
    a.ticketnumber,
    a.id,
    -- (many additional columns from multiple tables here)
    case
        when b.Went_Member >=1 then 'Member'
        when b.Went_NonMember >=1 then 'Attended but not member'
        when b.Going_NonMember >=1 then 'Going but not member'
        when b.OptOut='1' then 'Opt Out'
        when b.Cancelled >=1 then 'Cancelled'
        when c.MemberStatus = '9' then 'Member'
        when c.MemberStatus = '6' then 'Attended but not member'
        when c.DateBooked > current_timestamp then 'Going but not member'
        when c.OptOut='1' then 'Opt out'
        when c.MemberStatus = '8' then 'Cancelled'
    end [NewMemberStatus]
from table1 a
left join TableWithMemberStatus1 b on a.id = b.id
left join TableWithMemberStatus2 c on a.id = c.id
-- (further left joins to additional tables here)
order by a.ticketnumber
Table b is more accurate because these are our internal records, whereas table c is from a third party. Annoyingly, the numbers in C aren't in the same meaningful order as we've decided so I can't have it select the highest value for each ID.
I was under the impression that CASE goes down the list of WHEN statements and returns the first matching value, but this will produce multiple rows. For example:
ID      NewMemberStatus
989898  NULL
989898  Cancelled
777777  Member
111111  Cancelled
111111  Member
I feel like maybe there is something missing in terms of ORDER BY or GROUP BY that I should be adding? I tried COALESCE with CASE inside and it didn't work. Should I be nesting some things in parentheses?

In your query you are showing all rows (all bookings), because there is no WHERE clause and no aggregation. But you only want one result row per person.
You want a person's best status from the internal table. If there is no entry for the person in the internal table, you want their best status from the third party table. You get the best statuses by aggregating the rows in the internal and third party tables by person. Then join to the person.
I am using status numbers, because these can be ordered (I use 1 for the best status (member), so I look for the minimum status). In the end I replace the number found with the related text (e.g. 'Member' for status 1).
select
    p.*,
    case coalesce(i.best_status, tp.best_status)
        when 1 then 'Member'
        when 2 then 'Attended but not member'
        when 3 then 'Going but not member'
        when 4 then 'Opt out'
        when 5 then 'Cancelled'
        else 'unknown'
    end as status
from person p
left join
(
    select
        person_id,
        min(case when went_member >= 1 then 1
                 when went_nonmember >= 1 then 2
                 when going_nonmember >= 1 then 3
                 when optout = 1 then 4
                 when cancelled >= 1 then 5
            end) as best_status
    from internal_table
    group by person_id
) i on i.person_id = p.person_id
left join
(
    select
        person_id,
        min(case when MemberStatus = 9 then 1
                 when MemberStatus = 6 then 2
                 when DateBooked > current_timestamp then 3
                 when optout = 1 then 4
                 when memberstatus = 8 then 5
            end) as best_status
    from thirdparty_table
    group by person_id
) tp on tp.person_id = p.person_id
order by p.person_id;
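To see the aggregate-then-join pattern on a small data set, here is a minimal sketch. The table names follow the query above, but the sample rows are invented and the tables are cut down to a few columns, so treat it as an illustration under those assumptions rather than the real schema. It uses multi-row inserts, which work in PostgreSQL, MySQL, SQL Server and SQLite.

```sql
-- Hypothetical sample data; only the columns needed for the demo.
create table person (person_id int primary key);
create table internal_table (person_id int, went_member int, cancelled int);
create table thirdparty_table (person_id int, memberstatus int);

insert into person values (111111), (777777), (989898);

-- 111111 cancelled one booking and attended another as a member.
insert into internal_table values (111111, 0, 1), (111111, 1, 0);
-- 777777 and 989898 are known only to the third party.
-- 989898 has one cancelled booking (8) and one row with an unmapped status (3).
insert into thirdparty_table values (777777, 9), (989898, 8), (989898, 3);

select
    p.person_id,
    case coalesce(i.best_status, tp.best_status)
        when 1 then 'Member'
        when 5 then 'Cancelled'
        else 'unknown'
    end as status
from person p
left join
(
    -- one row per person: the lowest number is the best status
    select person_id,
           min(case when went_member >= 1 then 1
                    when cancelled >= 1 then 5
               end) as best_status
    from internal_table
    group by person_id
) i on i.person_id = p.person_id
left join
(
    select person_id,
           min(case when memberstatus = 9 then 1
                    when memberstatus = 8 then 5
               end) as best_status
    from thirdparty_table
    group by person_id
) tp on tp.person_id = p.person_id
order by p.person_id;

-- Result: one row per person
-- 111111  Member
-- 777777  Member
-- 989898  Cancelled
```

The CASE expression is still evaluated once per booking row. It is the MIN over each person's rows that collapses them to a single status, and MIN ignores the NULL produced by the unmapped row.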

Related

How to write oracle sql query for selecting single record which is having highest status

Assume a scenario where a request table may hold multiple records with the same request ID but different statuses,
such as Draft, InProgress, Approved, and Completed. We need to fetch the single record with the highest status. The preferred order is Completed -> Approved -> InProgress -> Draft.
If there are three records, one InProgress, one Approved, and one Completed, then I need to fetch only the record with the highest status, Completed.
If there are two records, one InProgress and one Draft, then I need to fetch only the record with the highest status, InProgress.
Could anyone please suggest how to do this?
Use the ROW_NUMBER analytic function to order the rows based on a CASE expression that converts your string values to priorities:
SELECT *
FROM (
  SELECT t.*,
         ROW_NUMBER() OVER (
           PARTITION BY request_id
           ORDER BY CASE status
                    WHEN 'Completed'  THEN 1
                    WHEN 'Approved'   THEN 2
                    WHEN 'InProgress' THEN 3
                    WHEN 'Draft'      THEN 4
                    ELSE 5
                    END
         ) as rn
  FROM table_name t
)
WHERE rn = 1;
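As a quick check of the ROW_NUMBER approach, here is a small sketch using the two scenarios from the question. The table name request_table and the sample rows are assumptions for illustration. The derived table is given an alias (ranked) because some databases require one, and the multi-row insert would need to be split into single-row inserts on older Oracle versions.

```sql
-- Hypothetical table and rows mirroring the two scenarios in the question.
create table request_table (request_id int, status varchar(20));

insert into request_table values
    (1, 'InProgress'), (1, 'Approved'), (1, 'Completed'),
    (2, 'InProgress'), (2, 'Draft');

select request_id, status
from (
    select t.*,
           row_number() over (
               partition by request_id
               -- lower number = higher priority
               order by case status
                        when 'Completed'  then 1
                        when 'Approved'   then 2
                        when 'InProgress' then 3
                        when 'Draft'      then 4
                        else 5
                        end
           ) as rn
    from request_table t
) ranked
where rn = 1
order by request_id;

-- Result:
-- 1  Completed
-- 2  InProgress
```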
It's a bit of a heinous solution (tested on PostgreSQL), but you can convert your textual status into a number with a CASE expression and then use that plus a subquery to get the highest status:
SELECT rt.*
FROM
  (SELECT
     id,
     MAX(CASE
           WHEN status = 'Draft' THEN 0
           WHEN status = 'InProgress' THEN 10
           WHEN status = 'Approved' THEN 20
           WHEN status = 'Completed' THEN 30
         END) AS msid
   FROM
     request_table
   GROUP BY
     id) max_per_id
INNER JOIN
  request_table rt ON max_per_id.id = rt.id
  AND max_per_id.msid = CASE
                          WHEN rt.status = 'Draft' THEN 0
                          WHEN rt.status = 'InProgress' THEN 10
                          WHEN rt.status = 'Approved' THEN 20
                          WHEN rt.status = 'Completed' THEN 30
                        END
The subquery
SELECT
  id,
  MAX(CASE
        WHEN status = 'Draft' THEN 0
        WHEN status = 'InProgress' THEN 10
        WHEN status = 'Approved' THEN 20
        WHEN status = 'Completed' THEN 30
      END) AS msid
FROM
  request_table
GROUP BY
  id
provides the highest numeric status for each id. That result is then joined back to the original table on the id and on the numeric version of the status.

Creating row with different where

I have this code to get the number of users of all items in the list and the average level.
select itemId,count(c.characterid) as numberOfUse, avg(maxUpgrade) as averageLevel
from items i inner join characters c on i.characterId=c.characterId
where itemid in (22001,22002,22003,22004,22005,22006,22007,22008,22009,22010,22011,22012,22013,22014,22015,22016,22030,22031,22032,22033,22034,22035,22036,22037,22038,22039,22040,22041,22042,22050,22051,22052,22053,22054,22055,22056,22057,22058,22059,22060,22070,22071,22072,22073,22074,22075,22076,22077,22085,22086,22087,22091,22092)
and attached>0
group by itemId
It creates one row per rune id, with the number of users and the average level to which people upgrade it, and it does that for all players of the server.
I would like a new pair of columns for every 10 levels, so I can see which item is used most depending on player level. The item level depends on the level, so to select only a certain level range I currently use WHERE itemid>0 and itemid<10, and I do that every 10 levels, copy the data, and paste it into a Google Sheet.
So I would like a result with these columns:
itemid use_1-10 avg_level_1-10 use_11-20 avg_level_11-20 etc...
That way I could copy all the results at once instead of repeating the same process 15 times.
If I am following this correctly, you can do conditional aggregation. Assuming that a "level" is stored in column level in table characters, you would do:
select i.itemId,
    sum(case when c.level between 1 and 10 then 1 else 0 end) as use_1_10,
    avg(case when c.level between 1 and 10 then maxUpgrade end) as avg_level_1_10,
    sum(case when c.level between 11 and 20 then 1 else 0 end) as use_11_20,
    avg(case when c.level between 11 and 20 then maxUpgrade end) as avg_level_11_20,
    ...
from items i
inner join characters c on i.characterId = c.characterId
where i.itemid in (...) and attached > 0
group by i.itemId
Note: consider prefixing column attached in the where clause with the table it belongs to, in order to avoid ambiguity.
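A minimal sketch of this conditional aggregation on made-up data follows. The placement of maxUpgrade and attached in the items table, the level column in characters, and all sample rows are assumptions for illustration only.

```sql
-- Hypothetical schema: which table holds maxUpgrade and attached is a guess.
create table characters (characterId int, level int);
create table items (itemId int, characterId int, maxUpgrade int, attached int);

insert into characters values (1, 5), (2, 8), (3, 15);
insert into items values
    (22001, 1, 4, 1),
    (22001, 2, 6, 1),
    (22001, 3, 9, 1),
    (22002, 3, 2, 1);

select i.itemId,
    -- each CASE only "sees" the characters in its level band
    sum(case when c.level between 1 and 10 then 1 else 0 end) as use_1_10,
    avg(case when c.level between 1 and 10 then i.maxUpgrade end) as avg_level_1_10,
    sum(case when c.level between 11 and 20 then 1 else 0 end) as use_11_20,
    avg(case when c.level between 11 and 20 then i.maxUpgrade end) as avg_level_11_20
from items i
inner join characters c on i.characterId = c.characterId
where i.itemId in (22001, 22002) and i.attached > 0
group by i.itemId
order by i.itemId;

-- Item 22001: 2 uses in band 1-10 (average upgrade 5), 1 use in band 11-20 (average 9).
-- Item 22002: 0 uses in band 1-10 (average is NULL), 1 use in band 11-20 (average 2).
```

The avg columns omit the ELSE branch on purpose: rows outside the band yield NULL, which AVG ignores, whereas ELSE 0 would drag the average down. How the averages are formatted (5, 5.0, 5.0000) depends on the database.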

How do you join a table with a different WHERE condition after you already used a join

Hi, I have 2 tables, employees and medical leaves, related through the employee ID. Basically, I want to make a result set where one column counts leaves filtered by month and year, and another column counts leaves filtered by year only.
EMPLOYEES          MEDICAL
|employee|ID|      |ID|DateOfLeave|
A         1         1  2019/1/3
B         2         1  2019/4/15
C         3         2  2019/5/16
D         4
select employees.employee,Employees.ID,count(medical.dateofleave) as
NumberofLeaves
from employees
left outer join Medical on employees.emp = MedBillInfo.emp
and month(medbillinfo.date) in(1) and year(medbillinfo.date) in (2019)
group by Employees.employee,employees.ID
RESULT SET
|Employee|ID|NumberOfLeaves|YearlyLeaves| --i want to join this column
A         1  1              2
B         2  0              1
C         3  0              0
D         4  0              0
But I have no idea how to extend the current SQL statement to add a yearly leaves column to my current result set, which has only employee, id and numberofleaves.
I think you want conditional aggregation:
select e.employee, e.ID,
       count(m.date) as num_leaves,
       sum(case when month(m.date) = 1 then 1 else 0 end) as num_leaves_in_month_1
from employees e left join
     Medical m
     on e.emp = m.emp and
        -- the year filter belongs in the ON clause; in a WHERE clause it
        -- would drop employees who have no leave at all
        m.date >= '2019-01-01' and m.date < '2020-01-01'
group by e.employee, e.ID;
Notes:
This removes the where clause which seems to refer to a non-existent table alias.
The date arithmetic uses direct comparisons rather than functions.
This introduces table aliases so the query is easier to write and to read.
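Here is a self-contained sketch of the same idea on the sample data from the question. The column names (id, dateofleave) follow the sample tables rather than the queries, dates are written in ISO format, and the month test is expressed as a date range so that no database-specific month() function is needed; all of that is an assumption for the sake of a portable example. The year filter sits in the ON clause so that employees without any leave still appear with 0.

```sql
create table employees (employee varchar(10), id int);
create table medical (id int, dateofleave date);

insert into employees values ('A', 1), ('B', 2), ('C', 3), ('D', 4);
insert into medical values
    (1, '2019-01-03'),
    (1, '2019-04-15'),
    (2, '2019-05-16');

select e.employee, e.id,
       -- leaves in January 2019 only
       sum(case when m.dateofleave >= '2019-01-01'
                 and m.dateofleave <  '2019-02-01'
                then 1 else 0 end) as numberofleaves,
       -- all leaves in 2019; count(column) skips the NULLs from unmatched rows
       count(m.dateofleave) as yearlyleaves
from employees e
left join medical m
       on m.id = e.id
      and m.dateofleave >= '2019-01-01'
      and m.dateofleave <  '2020-01-01'
group by e.employee, e.id
order by e.id;

-- Result:
-- A  1  1  2
-- B  2  0  1
-- C  3  0  0
-- D  4  0  0
```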
Your question probably needs to be corrected as the group by condition does not match with select columns. But based on what you asked, I think you need to use truncate date function in order to group the leaves by year. For SQL Server, there is YEAR(date) function which returns the year of the given date. This date would be MEDICAL.DateOfLeave in your case.

Using Count distinct case in sql and group by multiple columns

I have a query that works great (listed below). The issue I am having is that we have run into a patient who has had events on two different days, and because I am counting distinct PATNUM values, the patient only shows up once.
How can I get it to count 1 for each combination where the PATNUM and SCHDT are different?
Example:
PATNUM SCHDT
12345 30817
12345 30817
54321 30817
54321 30717
PATNUM 12345 should only count once while PATNUM 54321 should count twice.
My count statement is this:
SELECT ph.*, pi.*,
COUNT(DISTINCT CASE WHEN `SERVTYPE` IN ('INPT','INPFOP','INFOBS','IP') AND Complete ='7' THEN pi.PATNUM ELSE NULL END) AS count1,
COUNT(DISTINCT CASE WHEN `SERVTYPE` IN ('INPT','INPFOP','INFOBS','IP') AND Complete ='8' THEN pi.PATNUM ELSE NULL END) AS count2
FROM patientinfo as pi
INNER JOIN physicians as ph ON pi.SURGEON=ph.PName
WHERE PID NOT IN ('1355','988','767','1289','484','2784')
GROUP BY SURGEON
ORDER BY Dept,SURGEON ASC
Which columns do you want to see?
You can adjust your GROUP BY:
SELECT
    ph.pname,
    ph.specialty,
    SUM(CASE WHEN complete = 7 THEN 1 ELSE 0 END) count1,
    SUM(CASE WHEN complete = 8 THEN 1 ELSE 0 END) count2
FROM
(
    SELECT
        DISTINCT
        surgeon,
        patnum,
        schdt,
        complete,
        servtype
    FROM patientinfo
    WHERE complete IN (7,8)
    AND servtype IN ('INPT','INPFOP','INFOBS','IP')
    AND pid NOT IN ('1355','988','767','1289','484','2784')
) pisub
INNER JOIN physicians ph ON pisub.surgeon = ph.pname
GROUP BY ph.pname, ph.specialty
ORDER BY ph.pname, ph.specialty;
Also, I would make a few suggestions:
If you're going to give your tables an alias, then use the alias when referring to any column in your query. I've made a guess here about some of your columns as to which table they come from (e.g. dept), so feel free to change it if it is not correct
You don't need to select all records from both tables if you don't need them
The query won't run if you don't GROUP BY all columns you're selecting. I've written about this for Oracle and SQL in general, but actually in MySQL I think it does run but show incorrect results.
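To illustrate why the DISTINCT subquery gives the desired count, here is a reduced sketch using the four example rows from the question. The surgeon name, the complete value, and the trimmed-down column list are invented for the demo.

```sql
-- Hypothetical, reduced version of patientinfo.
create table patientinfo (surgeon varchar(20), patnum int, schdt int, complete int);

insert into patientinfo values
    ('Smith', 12345, 30817, 7),
    ('Smith', 12345, 30817, 7),   -- exact duplicate: should count once
    ('Smith', 54321, 30817, 7),
    ('Smith', 54321, 30717, 7);   -- same patient, different day: counts again

select pisub.surgeon,
       sum(case when pisub.complete = 7 then 1 else 0 end) as count1
from (
    -- collapse duplicates on the (patient, date) combination first
    select distinct surgeon, patnum, schdt, complete
    from patientinfo
    where complete in (7, 8)
) pisub
group by pisub.surgeon;

-- Result:
-- Smith  3
```

PATNUM 12345 contributes 1 and PATNUM 54321 contributes 2, which matches the counts asked for in the question.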

Excluding null entries from multiples values with SQL

From 3 different tables, I want to know whether a person (table1), with multiple visits to a store (table2), has purchased toys and enjoyed them (table3). In table3, 0 stands for either negative (not enjoyed) or not bought; 1 stands for positive. Every visit has its own identification number.
My problem is that for every ID in table1 I have multiple entries in table2, for each of which I have multiple entries in table3, and only one of them is null.
Person       Visit                 Toy
ID  age      Number  Visit  ID     number  name   value
1   12       1       1      1      1       Plane
2   10       2       1      2      1       Train  1
             3       2      1      2       Plane  1
             4       2      2      2       Train  0
                                   3       Plane  0
                                   3       Train  1
(goes on for every id) (goes on for every visit)
I want to know how many people have enjoyed a certain toy. However, since some of the info is null, I have trouble keeping only the people for whom I have a value for both of their visits. For instance, the following code works only if the null condition is placed on just one of the visits:
Select p.id, max(t.value) as value
from person p
join visit v on p.id = v.id
join toy t on v.number = t.number
where
    ((t.name='plane' and v.visit=1)
    or (t.name='plane' and v.visit=2))
    and (
    (v.visit=1 and ((t.value=1 or t.value=0) is not null))
    ---and (v.visit=2 and ((t.value=1 or t.value=0) is not null))
    )
group by p.id
order by p.id
I have tried many ways of writing this. It works if I try each null condition on its own, but if I remove the -- and apply the condition to both visit 1 and visit 2, it doesn't work. Note that I am using max on the value because I want a positive value if possible.
If you want to know how many people have enjoyed a certain toy, then you may simply write this:
select count(*) from toy t where t.name='TOY NAME' and t.value=1;
If you want something else, then kindly clarify.
Edited query:
Select p.id, max(t.value) as value
from person p
join visit v on p.id = v.id
join toy t on v.number = t.number
where
    t.name='plane'
    and t.value is not null
group by p.id
order by p.id
I used count as a way to eliminate all the null entries. The sum of null and a value is always null, so adding the restriction count=2 eliminates the nulls.
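One possible reading of the count=2 restriction is a HAVING clause on COUNT(t.value), which counts only non-null values and therefore keeps a person only when both visits have a value. That reading is an assumption, as are the sample rows below; the table and column names follow the question.

```sql
-- Hypothetical sample rows; names follow the question's tables.
create table person (id int, age int);
create table visit (number int, visit int, id int);
create table toy (number int, name varchar(20), value int);

insert into person values (1, 12), (2, 10);
insert into visit values (1, 1, 1), (2, 1, 2), (3, 2, 1), (4, 2, 2);
insert into toy values
    (1, 'plane', null),   -- person 1, visit 1: no information
    (3, 'plane', 0),      -- person 1, visit 2
    (2, 'plane', 1),      -- person 2, visit 1
    (4, 'plane', 0);      -- person 2, visit 2

select p.id, max(t.value) as value
from person p
join visit v on p.id = v.id
join toy t on v.number = t.number
where t.name = 'plane'
  and v.visit in (1, 2)
group by p.id
-- count(column) ignores NULLs, so this keeps only people with a value on both visits
having count(t.value) = 2
order by p.id;

-- Result:
-- 2  1
```

Person 1 is dropped because one of the two plane values is null, while person 2 is kept with the maximum value 1.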