Using Count distinct case in sql and group by multiple columns


I have a query that works great (listed below). The problem is that we have run into a patient who had an event on two different days, and because I am grouping by PATNUM, it only shows up as one.
How can I get it to count once for each distinct combination of PATNUM and SCHDT?
Example:
PATNUM SCHDT
12345 30817
12345 30817
54321 30817
54321 30717
PATNUM 12345 should only count once while PATNUM 54321 should count twice.
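Here is a minimal sketch of that counting rule. It runs through Python's built-in sqlite3 module so it is self-contained, which means the dialect is SQLite rather than MySQL. The table name `visits` is hypothetical. Removing duplicate (PATNUM, SCHDT) pairs before counting is what makes 12345 count once and 54321 count twice:

```python
import sqlite3

# In-memory database holding the four example rows (table name "visits" is hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (PATNUM INTEGER, SCHDT INTEGER)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?)",
    [(12345, 30817), (12345, 30817), (54321, 30817), (54321, 30717)],
)

# Collapse duplicate (PATNUM, SCHDT) pairs first, then count what is left per patient.
rows = conn.execute(
    """
    SELECT PATNUM, COUNT(*) AS visit_count
    FROM (SELECT DISTINCT PATNUM, SCHDT FROM visits) AS pairs
    GROUP BY PATNUM
    ORDER BY PATNUM
    """
).fetchall()

print(rows)  # [(12345, 1), (54321, 2)]
```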
My count statement is this:
SELECT ph.*, pi.*,
COUNT(DISTINCT CASE WHEN `SERVTYPE` IN ('INPT','INPFOP','INFOBS','IP') AND Complete ='7' THEN pi.PATNUM ELSE NULL END) AS count1,
COUNT(DISTINCT CASE WHEN `SERVTYPE` IN ('INPT','INPFOP','INFOBS','IP') AND Complete ='8' THEN pi.PATNUM ELSE NULL END) AS count2
FROM patientinfo as pi
INNER JOIN physicians as ph ON pi.SURGEON=ph.PName
WHERE PID NOT IN ('1355','988','767','1289','484','2784')
GROUP BY SURGEON
ORDER BY Dept,SURGEON ASC

Which columns do you want to see?
You can adjust your GROUP BY and de-duplicate the rows in a subquery first:
SELECT
    ph.pname,
    ph.dept,
    SUM(CASE WHEN pisub.complete = 7 THEN 1 ELSE 0 END) AS count1,
    SUM(CASE WHEN pisub.complete = 8 THEN 1 ELSE 0 END) AS count2
FROM
(
    -- one row per surgeon / patient / day / completion status
    SELECT DISTINCT
        surgeon,
        patnum,
        schdt,
        complete
    FROM patientinfo
    WHERE complete IN (7,8)
    AND servtype IN ('INPT','INPFOP','INFOBS','IP')
    AND pid NOT IN ('1355','988','767','1289','484','2784')
) pisub
INNER JOIN physicians ph ON pisub.surgeon = ph.pname
GROUP BY ph.pname, ph.dept
ORDER BY ph.dept, ph.pname;
Also, I would make a few suggestions:
If you're going to give your tables an alias, then use the alias when referring to any column in your query. I've made a guess here about which table some of your columns come from (e.g. dept), so feel free to change it if it's not correct.
You don't need to select all columns from both tables (ph.*, pi.*) if you don't need them.
In most databases the query won't run unless every non-aggregated column you select is in the GROUP BY. I've written about this for Oracle and SQL in general. MySQL does run it when ONLY_FULL_GROUP_BY is disabled (the default before MySQL 5.7.5), but the non-grouped columns then take their values from an arbitrary row in each group, so the results can be misleading.
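To check the de-duplicate-then-sum idea on toy data, here is a sketch using Python's sqlite3 module. The physician and patient rows, and the `dept` column, are made up for illustration. The `pid` filter is left out because the sample has no such column. This sketch keeps `servtype` out of the DISTINCT list, so a patient-day with two qualifying service types is counted once:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE physicians (pname TEXT, dept TEXT);
    CREATE TABLE patientinfo (
        surgeon TEXT, patnum INTEGER, schdt INTEGER,
        complete INTEGER, servtype TEXT);

    -- Hypothetical sample rows.
    INSERT INTO physicians VALUES ('Smith', 'Ortho'), ('Jones', 'Cardio');
    INSERT INTO patientinfo VALUES
        ('Smith', 12345, 30817, 7, 'INPT'),
        ('Smith', 12345, 30817, 7, 'IP'),     -- same patient and day, second service type
        ('Smith', 54321, 30817, 7, 'INPT'),
        ('Smith', 54321, 30717, 7, 'INPT'),   -- same patient, different day
        ('Smith', 11111, 30717, 8, 'IP'),
        ('Jones', 22222, 30817, 7, 'OUTPT'),  -- filtered out by servtype
        ('Jones', 33333, 30817, 8, 'INFOBS');
    """
)

rows = conn.execute(
    """
    SELECT ph.pname,
           ph.dept,
           SUM(CASE WHEN pisub.complete = 7 THEN 1 ELSE 0 END) AS count1,
           SUM(CASE WHEN pisub.complete = 8 THEN 1 ELSE 0 END) AS count2
    FROM (
        -- one row per surgeon / patient / day / completion status
        SELECT DISTINCT surgeon, patnum, schdt, complete
        FROM patientinfo
        WHERE complete IN (7, 8)
          AND servtype IN ('INPT', 'INPFOP', 'INFOBS', 'IP')
    ) AS pisub
    INNER JOIN physicians ph ON pisub.surgeon = ph.pname
    GROUP BY ph.pname, ph.dept
    ORDER BY ph.dept, ph.pname
    """
).fetchall()

print(rows)  # [('Jones', 'Cardio', 0, 1), ('Smith', 'Ortho', 3, 1)]
```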

Related

Re-coding/transforming SQL values into new columns from linked data: why is CASE WHEN returning multiple values?

I work with a lot of linked data from multiple tables. As a result, I'm running into some challenges with deduplication and re-coding values into new columns in a more meaningful way.
My core data set is a list of person-level records as rows. However, the linked data include multiple rows per person based on the dates they've been booked into events, whether they've shown up or not, and whether they're a member of our organisation. There are usually multiple bookings. It is possible to lose membership status and continue to attend events/cancel/etc., but we are interested in whether or not they have ever been a member and, if not, which is the highest level of contact they have ever had with our organisation.
In short: If they have ever been a member, that needs to take precedence.
select distinct
a.ticketnumber,
a.id,
-- (many additional columns from multiple tables here)
case
when b.Went_Member >=1 then 'Member'
when b.Went_NonMember >=1 then 'Attended but not member'
when b.Going_NonMember >=1 then 'Going but not member'
when b.OptOut='1' then 'Opt Out'
when b.Cancelled >=1 then 'Cancelled'
when c.MemberStatus = '9' then 'Member'
when c.MemberStatus = '6' then 'Attended but not member'
when c.DateBooked > current_timestamp then 'Going but not member'
when c.OptOut='1' then 'Opt out'
when c.MemberStatus = '8' then 'Cancelled'
end [NewMemberStatus]
from table1 a
left join TableWithMemberStatus1 b on a.id = b.id
left join TableWithMemberStatus2 c on a.id = c.id
-- (further left joins to additional tables here)
order by a.ticketnumber
Table b is more accurate because these are our internal records, whereas table c is from a third party. Annoyingly, the status numbers in table c aren't in the order of importance we've decided on, so I can't just select the highest value for each ID.
I was under the impression that CASE goes down the list of WHEN clauses and returns the first matching value, but this produces multiple rows per ID. For example:
ID      NewMemberStatus
989898  NULL
989898  Cancelled
777777  Member
111111  Cancelled
111111  Member
I feel like maybe there is something missing in terms of ORDER BY or GROUP BY that I should be adding? I tried COALESCE with CASE inside and it didn't work. Should I be nesting some things in parentheses?

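In your query you are showing all rows (all bookings), because there is no WHERE clause and no aggregation. CASE does return the first matching WHEN, but it is evaluated separately for every joined row, so each booking gets its own status. You only want one result row per person.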
You want a person's best status from the internal table. If there is no entry for the person in the internal table, you want their best status from the third party table. You get the best statuses by aggregating the rows in the internal and third party tables by person. Then join to the person.
I am using status numbers because these can be ordered: 1 is the best status (member), so I look for the minimum status per person. At the end I replace the number found with the related text (e.g. 'Member' for status 1).
select
p.*,
case coalesce(i.best_status, tp.best_status)
when 1 then 'Member'
when 2 then 'Attended but not member'
when 3 then 'Going but not member'
when 4 then 'Opt out'
when 5 then 'Cancelled'
else 'unknown'
end as status
from person p
left join
(
select
person_id,
min(case when went_member >= 1 then 1
when went_nonmember >= 1 then 2
when going_nonmember >= 1 then 3
when optout = 1 then 4
when cancelled >= 1 then 5
end) as best_status
from internal_table
group by person_id
) i on i.person_id = p.person_id
left join
(
select
person_id,
min(case when MemberStatus = 9 then 1
when MemberStatus = 6 then 2
when DateBooked > current_timestamp then 3
when optout = 1 then 4
when memberstatus = 8 then 5
end) as best_status
from thirdparty_table
group by person_id
) tp on tp.person_id = p.person_id
order by p.person_id;
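Here is a small, self-contained check of the aggregate-then-coalesce approach. It uses Python's sqlite3 module and the generic table names from the answer (`person`, `internal_table`, `thirdparty_table`). The rows are invented to cover the interesting cases:

* several bookings per person
* a booking that matches no WHEN, giving NULL, which MIN ignores
* a person known only to the third party
* a person with no data at all

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE person (person_id INTEGER PRIMARY KEY);
    CREATE TABLE internal_table (
        person_id INTEGER, went_member INTEGER, went_nonmember INTEGER,
        going_nonmember INTEGER, optout INTEGER, cancelled INTEGER);
    CREATE TABLE thirdparty_table (
        person_id INTEGER, MemberStatus INTEGER, DateBooked TEXT, optout INTEGER);

    -- Hypothetical sample data.
    INSERT INTO person VALUES (111111), (123456), (555555), (777777), (989898);
    INSERT INTO internal_table VALUES
        (111111, 0, 0, 0, 0, 1),   -- cancelled booking
        (111111, 1, 0, 0, 0, 0),   -- member booking: should win
        (777777, 1, 0, 0, 0, 0),
        (989898, 0, 0, 0, 0, 0),   -- no WHEN matches: CASE yields NULL
        (989898, 0, 0, 0, 0, 1);
    INSERT INTO thirdparty_table VALUES
        (555555, 6, '2000-01-01 00:00:00', 0),  -- only known to the third party
        (777777, 8, '2000-01-01 00:00:00', 0);  -- ignored: internal status exists
    """
)

rows = conn.execute(
    """
    SELECT p.person_id,
           CASE COALESCE(i.best_status, tp.best_status)
               WHEN 1 THEN 'Member'
               WHEN 2 THEN 'Attended but not member'
               WHEN 3 THEN 'Going but not member'
               WHEN 4 THEN 'Opt out'
               WHEN 5 THEN 'Cancelled'
               ELSE 'unknown'
           END AS status
    FROM person p
    LEFT JOIN (
        SELECT person_id,
               MIN(CASE WHEN went_member >= 1 THEN 1
                        WHEN went_nonmember >= 1 THEN 2
                        WHEN going_nonmember >= 1 THEN 3
                        WHEN optout = 1 THEN 4
                        WHEN cancelled >= 1 THEN 5 END) AS best_status
        FROM internal_table GROUP BY person_id
    ) i ON i.person_id = p.person_id
    LEFT JOIN (
        SELECT person_id,
               MIN(CASE WHEN MemberStatus = 9 THEN 1
                        WHEN MemberStatus = 6 THEN 2
                        WHEN DateBooked > CURRENT_TIMESTAMP THEN 3
                        WHEN optout = 1 THEN 4
                        WHEN MemberStatus = 8 THEN 5 END) AS best_status
        FROM thirdparty_table GROUP BY person_id
    ) tp ON tp.person_id = p.person_id
    ORDER BY p.person_id
    """
).fetchall()

for row in rows:
    print(row)
```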

SQL - joining two queries against same table for grid output

I should probably be able to figure this out from other questions/answers I've read here, but I just can't get anything to work today. Any help is really appreciated.
I have two queries counting the instances of "GOOD" feedback and "BAD" feedback in a single table. I just want to join them so that I can see something like this:
ID | GOOD | BAD
121 | 0 | 7
123 | 5 | 0
287 | 32 | 8
I'm running numerous queries from VBA, if that matters, and the 0s can just be null. I can clean that up in VBA.
Query 1:
select ID, count(*)
from HLFULL
where DEPT= 'HLAK'
and feedback = 'GOOD'
group by ID
Query 2:
select ID, count(*)
from HLFULL
where DEPT= 'HLAK'
and feedback = 'BAD'
group by ID
I've tried UNION, UNION ALL, JOIN, INNER JOIN, OUTER JOIN, aggregations, etc.

You can do conditional aggregation like this:
select ID,
count(case when feedback = 'GOOD' then 1 end) as Good,
count(case when feedback = 'BAD' then 1 end) as Bad
from HLFULL
where DEPT = 'HLAK'
and feedback in ('GOOD', 'BAD')
group by ID

You should be able to get the result using conditional aggregation. This type of query uses a CASE expression along with your aggregate function to get multiple columns:
select ID,
count(case when feedback = 'GOOD' then Id end) as Good,
count(case when feedback = 'BAD' then Id end) as Bad
from HLFULL
where DEPT= 'HLAK'
group by ID
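The two answers differ in one detail, which a quick sketch with Python's sqlite3 module makes visible. The rows, including the `NEUTRAL` feedback value and the `OTHER` department, are hypothetical. With the `feedback in ('GOOD', 'BAD')` filter, IDs that only have some other feedback disappear. Without it, they appear with 0 in both columns:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE HLFULL (ID INTEGER, DEPT TEXT, feedback TEXT)")
conn.executemany(
    "INSERT INTO HLFULL VALUES (?, ?, ?)",
    [
        (121, "HLAK", "BAD"), (121, "HLAK", "BAD"),
        (123, "HLAK", "GOOD"), (123, "HLAK", "GOOD"),
        (287, "HLAK", "GOOD"), (287, "HLAK", "BAD"),
        (300, "HLAK", "NEUTRAL"),  # hypothetical third feedback value
        (121, "OTHER", "GOOD"),    # different department, filtered out
    ],
)

# First answer: only GOOD/BAD rows reach the aggregation.
filtered = conn.execute(
    """
    SELECT ID,
           COUNT(CASE WHEN feedback = 'GOOD' THEN 1 END) AS Good,
           COUNT(CASE WHEN feedback = 'BAD' THEN 1 END) AS Bad
    FROM HLFULL
    WHERE DEPT = 'HLAK' AND feedback IN ('GOOD', 'BAD')
    GROUP BY ID
    ORDER BY ID
    """
).fetchall()

# Second answer: no feedback filter, so ID 300 shows up with 0 and 0.
unfiltered = conn.execute(
    """
    SELECT ID,
           COUNT(CASE WHEN feedback = 'GOOD' THEN ID END) AS Good,
           COUNT(CASE WHEN feedback = 'BAD' THEN ID END) AS Bad
    FROM HLFULL
    WHERE DEPT = 'HLAK'
    GROUP BY ID
    ORDER BY ID
    """
).fetchall()

print(filtered)    # [(121, 0, 2), (123, 2, 0), (287, 1, 1)]
print(unfiltered)  # [(121, 0, 2), (123, 2, 0), (287, 1, 1), (300, 0, 0)]
```

COUNT ignores the NULLs that the CASE produces for non-matching rows, so an ID with no GOOD rows gets 0 rather than NULL.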

SQL Nested Select Statement

I have the following SQL Code which is not giving me my desired results.
SELECT
POLICIES.CLIENTS_ID,
POLICIES.CLIENTCODE,
COUNT(POLICIES.POLICIES_ID) as [Total Policies],
(
SELECT
COUNT(POLICIES.POLICIES_ID)
FROM
POLICIES
WHERE
POLICIES.COVCODE = 'AUT'
) as [Auto Policies]
FROM
POLICIES
LEFT JOIN CLIENTS
ON CLIENTS.CLIENTS_ID = POLICIES.CLIENTS_ID
WHERE
POLICIES.CNR IS NULL
GROUP BY
POLICIES.CLIENTS_ID,
POLICIES.CLIENTCODE
ORDER BY
POLICIES.CLIENTS_ID
I get a result like this:
ID CODE Total Auto
3 ABCDE1 1 999999
4 ABCDE2 1 999999
5 ABCDE3 2 999999
6 ABCDE4 2 999999
I would like the last column to COUNT the auto policies that exist for that client ID rather than all of the auto policies in the table. I believe I need a nested SELECT statement that somehow groups all like results on the client ID, but it ends up returning more than one row and throws an error.
If I add:
GROUP BY
POLICIES.CLIENTS_ID
I get:
Subquery returned more than 1 value. This is not permitted when the....
Any help would be appreciated greatly!
Thank you

You can use a CASE statement to do this. Instead of your subquery in the SELECT clause use:
SUM(CASE WHEN POLICIES.COVCODE = 'AUT' THEN 1 ELSE 0 END) as [AUTO POLICIES]
As Martin Smith pointed out, if CLIENTS_ID has multiple CLIENTCODE values, this will give you the count of records for each CLIENTS_ID/CLIENTCODE combination. If CLIENTS_ID is 1:1 with CLIENTCODE, this will give you a count of records for each distinct CLIENTS_ID. I suspect this is the case, based on your example and question.
Unrelated: you have a LEFT JOIN to your CLIENTS table, but you don't use that table anywhere in the query. Consider removing it if you don't need to select or filter by any of its fields, since it's just unused overhead.
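Here is a sketch of the CASE-based count against made-up POLICIES rows, run with Python's sqlite3 module. The `HOM` coverage code and all values are hypothetical. It also runs the original uncorrelated subquery to show why every row got the same number, and it leaves out the unused CLIENTS join:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE POLICIES (POLICIES_ID INTEGER, CLIENTS_ID INTEGER, "
    "CLIENTCODE TEXT, COVCODE TEXT, CNR TEXT)"
)
conn.executemany(
    "INSERT INTO POLICIES VALUES (?, ?, ?, ?, ?)",
    [
        (1, 3, "ABCDE1", "AUT", None),
        (2, 5, "ABCDE3", "AUT", None),
        (3, 5, "ABCDE3", "HOM", None),  # "HOM" is a made-up non-auto code
        (4, 6, "ABCDE4", "HOM", None),
        (5, 6, "ABCDE4", "HOM", None),
        (6, 6, "ABCDE4", "AUT", "X"),   # CNR not null: excluded by the WHERE
    ],
)

# Conditional aggregation: the auto count is computed per group, like the total.
per_client = conn.execute(
    """
    SELECT CLIENTS_ID,
           CLIENTCODE,
           COUNT(POLICIES_ID) AS total_policies,
           SUM(CASE WHEN COVCODE = 'AUT' THEN 1 ELSE 0 END) AS auto_policies
    FROM POLICIES
    WHERE CNR IS NULL
    GROUP BY CLIENTS_ID, CLIENTCODE
    ORDER BY CLIENTS_ID
    """
).fetchall()

# The original subquery is not correlated with the outer row, so it counts
# every AUT row in the table and each client gets the same number.
uncorrelated = conn.execute(
    """
    SELECT CLIENTS_ID,
           (SELECT COUNT(POLICIES_ID) FROM POLICIES WHERE COVCODE = 'AUT') AS auto_policies
    FROM POLICIES
    WHERE CNR IS NULL
    GROUP BY CLIENTS_ID, CLIENTCODE
    ORDER BY CLIENTS_ID
    """
).fetchall()

print(per_client)    # [(3, 'ABCDE1', 1, 1), (5, 'ABCDE3', 2, 1), (6, 'ABCDE4', 2, 0)]
print(uncorrelated)  # [(3, 3), (5, 3), (6, 3)]
```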

What if you replace the inner query for getting the count with something like this:
SUM(CASE WHEN POLICIES.COVCODE = 'AUT' THEN 1 ELSE 0 END) as [Auto Policies]

How can I combine 3 queries into one query so the result looks like a schedule table?

I have 3 SELECT queries:
The first returns the headings of my table (e.g. select id, name from cars).
The second returns the left side of my schedule table, i.e. the dates of sales (e.g. select date from dates inner join car on date.carid = car.carid where date.date1 > XXX/XX/XX).
The third returns the data inside the table: the price of each car on each date.
But I don't know how to combine them.

I guess you need something like this.
You need either of the following:
Pivot feature of SQL Server
Aggregate function with group-by
Query: Pivot feature of SQL Server
SELECT *
FROM
(
SELECT [SALE_DATE], [CAR_NAME], [COST]
FROM CARS_SALES
) AS source
PIVOT
(
MAX(COST)
FOR [CAR_NAME] IN ([BENZ] , [BMW], [RENAULT])
) as pvt;
Query: Aggregate function with group-by
SELECT SALE_DATE,
MAX(CASE WHEN CAR_NAME = 'BENZ' THEN COST ELSE NULL END) [BENZ],
MAX(CASE WHEN CAR_NAME = 'BMW' THEN COST ELSE NULL END) [BMW],
MAX(CASE WHEN CAR_NAME = 'RENAULT' THEN COST ELSE NULL END) [RENAULT]
FROM CARS_SALES
GROUP BY SALE_DATE
Both queries give the following output:
SALE_DATE BENZ BMW RENAULT
09/07/2014 (null) (null) 900
09/08/2014 100 200 300
09/09/2014 400 600 (null)
09/10/2014 700 500 800
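SQLite has no PIVOT operator, but the MAX(CASE ...) version can be checked with Python's sqlite3 module. The CARS_SALES rows below are reconstructed from the output grid above. They are an assumption about the underlying data rather than the asker's real table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CARS_SALES (SALE_DATE TEXT, CAR_NAME TEXT, COST INTEGER)")
conn.executemany(
    "INSERT INTO CARS_SALES VALUES (?, ?, ?)",
    [
        ("09/07/2014", "RENAULT", 900),
        ("09/08/2014", "BENZ", 100), ("09/08/2014", "BMW", 200), ("09/08/2014", "RENAULT", 300),
        ("09/09/2014", "BENZ", 400), ("09/09/2014", "BMW", 600),
        ("09/10/2014", "BENZ", 700), ("09/10/2014", "BMW", 500), ("09/10/2014", "RENAULT", 800),
    ],
)

# One output row per date. Each CASE keeps only one car's cost and MAX picks it,
# giving NULL (None in Python) when that car has no sale on that date.
# Sorting MM/DD/YYYY text is only chronological within a single year.
grid = conn.execute(
    """
    SELECT SALE_DATE,
           MAX(CASE WHEN CAR_NAME = 'BENZ' THEN COST END) AS BENZ,
           MAX(CASE WHEN CAR_NAME = 'BMW' THEN COST END) AS BMW,
           MAX(CASE WHEN CAR_NAME = 'RENAULT' THEN COST END) AS RENAULT
    FROM CARS_SALES
    GROUP BY SALE_DATE
    ORDER BY SALE_DATE
    """
).fetchall()

for row in grid:
    print(row)
```

With this approach, the list of car columns is fixed in the query text, as it is in the PIVOT version's IN list. A new car name needs a new CASE line.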

It's really unclear, but based on what you've posted, the solution would be something like this:
select cars.name, dates.date, dates.price
from dates
left join cars on (cars.carid=dates.carid)
order by cars.name, dates.date;
This gets the car's name, price and the date in one query. But I don't understand what your third query is for. If you provide more information I'll update this answer.

How to count 2 different things in one query

I need to count occurrences of some data across two columns in one query. The DB is SQL Server 2005.
For example I have this table:
Person: Id, Name, Age
And I need to get these results in one query:
1. Count of persons named 'John'
2. Count of persons named 'John' who are older than 30
I can do that with subqueries like this (it is only an example):
SELECT (SELECT COUNT(Id) FROM Persons WHERE Name = 'John'),
(SELECT COUNT (Id) FROM Persons WHERE Name = 'John' AND age > 30)
FROM Persons
But this is very slow, and I'm looking for a faster method.
I found this solution for MySQL (it almost solves my problem, but it is not for SQL Server).
Do you know a better way to calculate several counts in one query than using subqueries?

Using a CASE statement lets you count whatever you want in a single query:
SELECT
SUM(CASE WHEN Persons.Name = 'John' THEN 1 ELSE 0 END) AS JohnCount,
SUM(CASE WHEN Persons.Name = 'John' AND Persons.Age > 30 THEN 1 ELSE 0 END) AS OldJohnsCount,
COUNT(*) AS AllPersonsCount
FROM Persons

Use:
SELECT COUNT(p.id),
SUM(CASE WHEN p.age > 30 THEN 1 ELSE 0 END)
FROM PERSONS p
WHERE p.name = 'John'
When a query accesses the same table more than once, it's always worth reviewing how it could be done in a single pass (one SELECT statement). It won't always be possible.
Edit:
If you need to do other things in the query, see Chris Shaffer's answer.
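Both single-pass queries can be compared on a tiny made-up Persons table with Python's sqlite3 module. The last query shows one behavioral difference worth knowing: when the WHERE clause matches no rows, COUNT returns 0 but SUM returns NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Persons (Id INTEGER, Name TEXT, Age INTEGER)")
conn.executemany(
    "INSERT INTO Persons VALUES (?, ?, ?)",
    [(1, "John", 25), (2, "John", 35), (3, "John", 40), (4, "Mary", 50)],
)

# Single pass over the whole table; each SUM(CASE ...) is an independent count.
all_counts = conn.execute(
    """
    SELECT SUM(CASE WHEN Name = 'John' THEN 1 ELSE 0 END) AS JohnCount,
           SUM(CASE WHEN Name = 'John' AND Age > 30 THEN 1 ELSE 0 END) AS OldJohnsCount,
           COUNT(*) AS AllPersonsCount
    FROM Persons
    """
).fetchone()

# Filter first, then count: only the John rows are aggregated.
john_counts = conn.execute(
    """
    SELECT COUNT(p.Id), SUM(CASE WHEN p.Age > 30 THEN 1 ELSE 0 END)
    FROM Persons p
    WHERE p.Name = 'John'
    """
).fetchone()

# With no matching rows, COUNT gives 0 but SUM gives NULL (None in Python).
no_match = conn.execute(
    "SELECT COUNT(p.Id), SUM(CASE WHEN p.Age > 30 THEN 1 ELSE 0 END) "
    "FROM Persons p WHERE p.Name = 'Nobody'"
).fetchone()

print(all_counts, john_counts, no_match)  # (3, 2, 4) (3, 2) (0, None)
```

If a 0 is needed in the empty case, wrapping the sum as COALESCE(SUM(...), 0) works in both SQL Server and SQLite.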
