Select Top 1 with multiple Group By - sql

I've got a table in SQL Server 2016 (I believe it was originally from 2008 or 2012 and it's just moved to the 2016 cluster) with patient events, the type of event, and the severity of the event (called a grade). There are several instances where the same patient will have multiple events occur but with varying grades. So, a sample of data will look something like this:
| Pt_id | Event | Grade |
+-------+----------------+-------+
| 01 | Pain | 2 |
| 01 | Pain | 4 |
| 01 | Nausea | 2 |
| 02 | Headache | 2 |
| 02 | Headache | 3 |
| 03 | Blurred Vision | 3 |
| 03 | Blurred Vision | 4 |
| 03 | Blurred Vision | 3 |
| 03 | Nausea | 4 |
| 03 | Nausea | 2 |
I'm trying to get the highest grade for each of the different events per patient. My desired output for that data would be as follows:
| Pt_id | Event | Grade |
+-------+----------------+-------+
| 01 | Pain | 4 |
| 01 | Nausea | 2 |
| 02 | Headache | 3 |
| 03 | Blurred Vision | 4 |
| 03 | Nausea | 4 |
I've tried using TOP 1 in the query, ROW_NUMBER, PARTITION, and everything else Google has thrown at me, but the results are too restricted: I'm getting around 30 rows, but I went through the Excel file (I'm trying to do some QA here) and I should have just under 400 rows. I think that when I use these functions I'm missing something, and it's either grouping all Pt_ids and picking just one row across all the Events for that Pt_id, or doing the same with the Event. No matter what I try, it won't give me one row per patient, per event, with the highest grade for that event and patient.
Although I've used SQL throughout the years, it's never been my primary function so your assistance is greatly appreciated!

Isn't GROUP BY with MAX() enough?
SELECT Pt_id, Event, MAX(Grade) AS Grade
FROM table t
GROUP BY Pt_id, Event;
If the table has more than these three columns, use TOP (1) WITH TIES ordered by ROW_NUMBER():
SELECT TOP (1) WITH TIES t.*
FROM table t
ORDER BY ROW_NUMBER() OVER (PARTITION BY Pt_id, Event ORDER BY Grade DESC);
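The GROUP BY approach can be checked against the sample data from the question. This is only a minimal sketch: it uses Python's built-in sqlite3 module as a stand-in for SQL Server, and the table name events is a hypothetical name chosen for the demo.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server here (an assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (Pt_id TEXT, Event TEXT, Grade INTEGER)")

# Event names are spelled consistently so identical events group together.
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("01", "Pain", 2), ("01", "Pain", 4), ("01", "Nausea", 2),
        ("02", "Headache", 2), ("02", "Headache", 3),
        ("03", "Blurred Vision", 3), ("03", "Blurred Vision", 4),
        ("03", "Blurred Vision", 3), ("03", "Nausea", 4), ("03", "Nausea", 2),
    ],
)

# One row per (patient, event) pair, keeping the highest grade.
max_grades = conn.execute(
    """
    SELECT Pt_id, Event, MAX(Grade) AS Grade
    FROM events
    GROUP BY Pt_id, Event
    ORDER BY Pt_id, Event
    """
).fetchall()

for row in max_grades:
    print(row)
```

Grouping by both Pt_id and Event is what yields one row per patient per event; grouping by only one of them collapses the other.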

Use the ROW_NUMBER() window function:
with cte as
(
select *,
row_number() over(partition by Pt_id, Event order by Grade desc) rn
from your_table
)
select * from cte where rn = 1
You have to order by Grade desc to get the max value.

Related

SQL: Get an aggregate (SUM) of a calculation of two fields (DATEDIFF) that has conditional logic (CASE WHEN)

I have a dataset that includes a bunch of stay data (at a hotel). Each row contains a start date and an end date, but no duration field. I need to get a sum of the durations.
Sample Data:
| Stay ID | Client ID | Start Date | End Date |
| 1 | 38 | 01/01/2018 | 01/31/2019 |
| 2 | 16 | 01/03/2019 | 01/07/2019 |
| 3 | 27 | 01/10/2019 | 01/12/2019 |
| 4 | 27 | 05/15/2019 | NULL |
| 5 | 38 | 05/17/2019 | NULL |
There are some added complications:
I am using Crystal Reports and this is a SQL Expression, which obeys slightly different rules. Basically, it returns a single scalar value. Here is some more info: http://www.cogniza.com/wordpress/2005/11/07/crystal-reports-using-sql-expression-fields/
Sometimes, the end date field is blank (they haven't booked out yet). If blank, I would like to replace it with the current timestamp.
I only want to count nights that have occurred in the past year. If the start date of a given stay is more than a year ago, I need to adjust it.
I need to get a sum by Client ID
I'm not actually any good at SQL so all I have is guesswork.
The proper syntax for a Crystal Reports SQL Expression is something like this:
(
SELECT (CASE
WHEN StayDateStart < DATEADD(year,-1,CURRENT_TIMESTAMP) THEN DATEDIFF(day,DATEADD(year,-1,CURRENT_TIMESTAMP),ISNULL(StayDateEnd,CURRENT_TIMESTAMP))
ELSE DATEDIFF(day,StayDateStart,ISNULL(StayDateEnd,CURRENT_TIMESTAMP))
END)
)
And that's giving me the correct value for a single row, if I wanted to do this:
| Stay ID | Client ID | Start Date | End Date | Duration |
| 1 | 38 | 01/01/2018 | 01/31/2019 | 210 | // only days since June 4 2018 are counted
| 2 | 16 | 01/03/2019 | 01/07/2019 | 4 |
| 3 | 27 | 01/10/2019 | 01/12/2019 | 2 |
| 4 | 27 | 05/15/2019 | NULL | 21 |
| 5 | 38 | 05/17/2019 | NULL | 19 |
But I want to get the SUM of Duration per client, so I want this:
| Stay ID | Client ID | Start Date | End Date | Duration |
| 1 | 38 | 01/01/2018 | 01/31/2019 | 229 | // 210+19
| 2 | 16 | 01/03/2019 | 01/07/2019 | 4 |
| 3 | 27 | 01/10/2019 | 01/12/2019 | 23 | // 2+21
| 4 | 27 | 05/15/2019 | NULL | 23 |
| 5 | 38 | 05/17/2019 | NULL | 229 |
I've tried to just wrap a SUM() around my CASE but that doesn't work:
(
SELECT SUM(CASE
WHEN StayDateStart < DATEADD(year,-1,CURRENT_TIMESTAMP) THEN DATEDIFF(day,DATEADD(year,-1,CURRENT_TIMESTAMP),ISNULL(StayDateEnd,CURRENT_TIMESTAMP))
ELSE DATEDIFF(day,StayDateStart,ISNULL(StayDateEnd,CURRENT_TIMESTAMP))
END)
)
It gives me an error that the StayDateEnd is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause. But I don't even know what that means, so I'm not sure how to troubleshoot, or where to go from here. And then the next step is to get the SUM by Client ID.
Any help would be greatly appreciated!
Although the explanation and data set are almost impossible to match, I think this is an approximation to what you want.
declare @your_data table (StayId int, ClientId int, StartDate date, EndDate date)
insert into @your_data values
(1,38,'2018-01-01','2019-01-31'),
(2,16,'2019-01-03','2019-01-07'),
(3,27,'2019-01-10','2019-01-12'),
(4,27,'2019-05-15',NULL),
(5,38,'2019-05-17',NULL)
;with data as (
select *,
datediff(day,
case
when datediff(day,StartDate,getdate())>365 then dateadd(year,-1,getdate())
else StartDate
end,
isnull(EndDate,getdate())
) days
from @your_data
)
select *,
sum(days) over (partition by ClientId) as total_days
from data
https://rextester.com/HCKOR53440
You need a subquery that sums per client_id (GROUP BY) and a join between your table and that subquery, e.g.:
select Stay_id, your_table.client_id, Start_date, End_date, t.sum_duration
from your_table
inner join (
select Client_id,
SUM(CASE
WHEN StayDateStart < DATEADD(year,-1,CURRENT_TIMESTAMP) THEN DATEDIFF(day,DATEADD(year,-1,CURRENT_TIMESTAMP),ISNULL(StayDateEnd,CURRENT_TIMESTAMP))
ELSE DATEDIFF(day,StayDateStart,ISNULL(StayDateEnd,CURRENT_TIMESTAMP))
END) sum_duration
from your_table
group by Client_id
) t on t.Client_id = your_table.client_id
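The join-to-a-grouped-subquery idea can be tried on the sample stays. This is a rough sketch under several assumptions: SQLite (via Python's sqlite3) stands in for SQL Server, julianday() differences replace DATEDIFF, COALESCE replaces ISNULL, the table and column names are hypothetical, and the "current date" is pinned to 2019-06-05 so the result is reproducible.

```python
import sqlite3

TODAY = "2019-06-05"     # assumed "current date" for a reproducible demo
YEAR_AGO = "2018-06-05"  # one year before TODAY

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stays (stay_id INTEGER, client_id INTEGER, "
    "start_date TEXT, end_date TEXT)"
)
conn.executemany(
    "INSERT INTO stays VALUES (?, ?, ?, ?)",
    [
        (1, 38, "2018-01-01", "2019-01-31"),
        (2, 16, "2019-01-03", "2019-01-07"),
        (3, 27, "2019-01-10", "2019-01-12"),
        (4, 27, "2019-05-15", None),
        (5, 38, "2019-05-17", None),
    ],
)

# The subquery clamps each start date to one year back, fills open-ended
# stays with TODAY, and sums the day counts per client. The outer query
# attaches that per-client total to every stay row.
totals = conn.execute(
    """
    SELECT s.stay_id, s.client_id, t.sum_duration
    FROM stays AS s
    JOIN (
        SELECT client_id,
               SUM(CAST(julianday(COALESCE(end_date, :today))
                        - julianday(MAX(start_date, :year_ago)) AS INTEGER))
                   AS sum_duration
        FROM stays
        GROUP BY client_id
    ) AS t ON t.client_id = s.client_id
    ORDER BY s.stay_id
    """,
    {"today": TODAY, "year_ago": YEAR_AGO},
).fetchall()

for row in totals:
    print(row)
```

The totals depend on the pinned date, so they will not necessarily match the figures quoted in the question.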

SELECT Top values for each records

I have been battling with this query/query design for some time now and I thought it's time to ask the experts! Here are my table results:
ID | Status | date |
---------------------------------
05 | Returned | 20/6/2018 |
03 | Sent | 12/5/2018 |
01 | Pending | 07/6/2018 |
01 | Engaged | 11/4/2018 |
03 | Contacted | 16/4/2018 |
05 | Surveyed | 04/3/2017 |
05 | No Contact | 05/3/2017 |
How do I get it to return the top/newest value for each ID:
ID | Status | date |
---------------------------------
05 | Returned | 20/6/2018 |
03 | Sent | 12/5/2018 |
01 | Pending | 07/6/2018 |
I've tried GROUP BY, TOP 1, and DISTINCT, and the results are still not what I wanted. Also, displaying the top 5% of results won't do either, as there can be more than just 3 IDs.
My QUERY below:
INSERT INTO TmpAllcomsEmployee ( StatusID, EmployeeID, CommunicationDate )
SELECT DISTINCT CommunicationLog.StatusID, TmpAllcomsEmployee.EmployeeID,
Max(CommunicationLog.CommunicationDate) AS MaxOfCommunicationDate
FROM CommunicationLog RIGHT JOIN TmpAllcomsEmployee ON
CommunicationLog.EmployeeID = TmpAllcomsEmployee.EmployeeID
GROUP BY CommunicationLog.StatusID, TmpAllcomsEmployee.EmployeeID
ORDER BY Max(CommunicationLog.CommunicationDate) DESC;
One method is a correlated subquery:
select cl.*
from CommunicationLog as cl
where cl.CommunicationDate = (select max(cl2.CommunicationDate)
from CommunicationLog as cl2
where cl2.EmployeeID = cl.EmployeeID
);
This gets the most recent record for each employee in CommunicationLog. You can join in the other table if you really need it. It does not seem necessary unless you are using it for filtering.
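To see the correlated subquery at work on the sample rows, here is a small sketch using Python's sqlite3 as a stand-in database. The column names EmployeeID and CommunicationDate come from the question's own query; the text Status column and the ISO-formatted dates are assumptions made for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CommunicationLog "
    "(EmployeeID TEXT, Status TEXT, CommunicationDate TEXT)"
)
# Dates are stored as ISO strings (YYYY-MM-DD) so MAX() orders them correctly.
conn.executemany(
    "INSERT INTO CommunicationLog VALUES (?, ?, ?)",
    [
        ("05", "Returned", "2018-06-20"),
        ("03", "Sent", "2018-05-12"),
        ("01", "Pending", "2018-06-07"),
        ("01", "Engaged", "2018-04-11"),
        ("03", "Contacted", "2018-04-16"),
        ("05", "Surveyed", "2017-03-04"),
        ("05", "No Contact", "2017-03-05"),
    ],
)

# Keep a row only if its date equals the latest date for the same employee.
latest = conn.execute(
    """
    SELECT cl.EmployeeID, cl.Status, cl.CommunicationDate
    FROM CommunicationLog AS cl
    WHERE cl.CommunicationDate = (
        SELECT MAX(cl2.CommunicationDate)
        FROM CommunicationLog AS cl2
        WHERE cl2.EmployeeID = cl.EmployeeID
    )
    ORDER BY cl.EmployeeID
    """
).fetchall()

for row in latest:
    print(row)
```

If two rows for the same employee share the latest date, both are returned, so a tie-breaker would be needed in that case.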

How to exclude rows that have matching fields in other rows

I have a table in MS Access 2013 that has a number of different columns. As part of the data that is entered into the main table, there are duplicates in certain columns. However when I 'pot up' the volumes of rows based on their status, I need to be able to exclude those with the same values in other columns.
------------------------------------------------------------
HeaderID | Date | Number | EffectiveDate | Reg | Status
------------------------------------------------------------
2 | 01/01/2016| 100001 | 01/12/2015 | 01 | Ready
3 | 01/01/2016| 100001 | 01/12/2015 | 02 | Ready
4 | 02/02/2016| 100002 | 12/11/2015 | R | Pending
5 | 02/02/2016| 100002 | 12/11/2015 | T | Pending
6 | 02/02/2016| 100002 | 12/11/2015 | N | Pending
7 | 15/09/2015| 100003 | 30/11/2015 | 01 | Ready
8 | 14/09/2015| 100004 | 20/02/2016 | 01 | New
I have the basic below code already:
Select
tbl_Progression.Status,
Count(tbl_Progression.HeaderID) AS CountofHeaderID
From tbl_Progression
Group By tbl_Progression.Status
Using the example data above, I'd like the results to look like the below: the Status is counted by HeaderID, but records that share the same Date, Number and EffectiveDate (and differ only in Reg) are counted once:
------------------------
Status | CountofHeaderID
------------------------
Pending | 1
Ready | 2
New | 1
Instead of what the current code is doing:
------------------------
Status | CountofHeaderID
------------------------
Pending | 3
Ready | 3
New | 1
MS Access doesn't support COUNT(DISTINCT). You can, however, use a subquery with DISTINCT (or GROUP BY):
Select p.Status, Count(*) as new_CountofHeaderID
From (select distinct p.status, p.Date, p.Number, p.EffectiveDate
from tbl_Progression as p
) as p
Group By p.Status;
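The DISTINCT-subquery pattern can be verified on the example rows. This sketch runs on SQLite through Python's sqlite3 rather than MS Access, and renames the Date column to EntryDate purely for the demo; both are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl_Progression (HeaderID INTEGER, EntryDate TEXT, "
    "Number INTEGER, EffectiveDate TEXT, Reg TEXT, Status TEXT)"
)
conn.executemany(
    "INSERT INTO tbl_Progression VALUES (?, ?, ?, ?, ?, ?)",
    [
        (2, "2016-01-01", 100001, "2015-12-01", "01", "Ready"),
        (3, "2016-01-01", 100001, "2015-12-01", "02", "Ready"),
        (4, "2016-02-02", 100002, "2015-11-12", "R", "Pending"),
        (5, "2016-02-02", 100002, "2015-11-12", "T", "Pending"),
        (6, "2016-02-02", 100002, "2015-11-12", "N", "Pending"),
        (7, "2015-09-15", 100003, "2015-11-30", "01", "Ready"),
        (8, "2015-09-14", 100004, "2016-02-20", "01", "New"),
    ],
)

# The inner DISTINCT collapses rows that differ only in HeaderID/Reg;
# the outer query then counts the surviving combinations per status.
counts = conn.execute(
    """
    SELECT p.Status, COUNT(*) AS CountofHeaderID
    FROM (
        SELECT DISTINCT Status, EntryDate, Number, EffectiveDate
        FROM tbl_Progression
    ) AS p
    GROUP BY p.Status
    ORDER BY p.Status
    """
).fetchall()

for row in counts:
    print(row)
```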

SQL Group by one column and decide which column to choose

Let's say I have data like this :
| id | code | name | number |
-----------------------------------------------
| 1 | 20 | A | 10 |
| 2 | 20 | B | 20 |
| 3 | 10 | C | 30 |
| 4 | 10 | D | 80 |
I would like to group rows by code value, but get real rows back (not some aggregate function).
I know that just
select *
from table
group by code
won't work because the database doesn't know which row to return when the code is the same.
So my question is how to tell the database to select (for example) the row with the lowest number, so in my case:
| id | code | name | number |
-----------------------------------------------
| 1 | 20 | A | 10 |
| 3 | 10 | C | 30 |
P.S.
I know how to do this with PARTITION, but this is only allowed in Oracle databases and can't be created in JPA criteria builder (which is my ultimate goal).
Why don't you use code like this?
SELECT
id,
code,
name,
number
FROM
(
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY code ORDER BY number ASC) AS RowNo
FROM table
) s
WHERE s.RowNo = 1
You can look at this site;
Data Partitioning
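Since the question rules out PARTITION, one window-free variant may also be worth a look. Note that this is not the ROW_NUMBER query shown above but a different technique: joining the table to a grouped MIN() subquery. The sketch below uses Python's sqlite3 and a hypothetical table name items.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER, code INTEGER, name TEXT, number INTEGER)"
)
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?, ?)",
    [(1, 20, "A", 10), (2, 20, "B", 20), (3, 10, "C", 30), (4, 10, "D", 80)],
)

# The subquery finds the lowest number per code; joining back on both
# code and number recovers the full row that holds that minimum.
lowest = conn.execute(
    """
    SELECT t.id, t.code, t.name, t.number
    FROM items AS t
    JOIN (
        SELECT code, MIN(number) AS min_number
        FROM items
        GROUP BY code
    ) AS m ON m.code = t.code AND m.min_number = t.number
    ORDER BY t.id
    """
).fetchall()

for row in lowest:
    print(row)
```

Unlike ROW_NUMBER, this returns every tied row when several rows in a code share the same lowest number.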

LEFT JOINing the max/top

I have two tables from which I'm trying to run a query to return the maximum (or top) transaction for each person. I should note that I cannot change the table structure. Rather, I can only pull data.
People
+-----------+
| id | name |
+-----------+
| 42 | Bob |
| 65 | Ted |
| 99 | Stu |
+-----------+
Transactions (there is no primary key)
+---------------------------------+
| person | amount | date |
+---------------------------------+
| 42 | 3 | 9/14/2030 |
| 42 | 4 | 7/02/2015 |
| 42 | *NULL* | 2/04/2020 |
| 65 | 7 | 1/03/2010 |
| 65 | 7 | 5/20/2020 |
+---------------------------------+
Ultimately, for each person I want to return the highest amount. If that doesn't work then I'd like to look at the date and return the most recent date.
So, I'd like my query to return:
+----------------------------------------+
| person_id | name | amount | date |
+----------------------------------------+
| 42 | Bob | 4 | 7/02/2015 | (<- highest amount)
| 65 | Ted | 7 | 5/20/2020 | (<- most recent date)
| 99 | Stu | *NULL* | *NULL* | (<- no records in Transactions table)
+----------------------------------------+
SELECT People.id, name, amount, date
FROM People
LEFT JOIN (
SELECT TOP 1 person_id
FROM Transactions
WHERE person_id = People.id
ORDER BY amount DESC, date ASC
)
ON People.id = person_id
I can't figure out what I am doing wrong, but I know it's wrong. Any help would be much appreciated.
You are almost there, but because the Transactions table has several rows per person, you need to pick one row per person with the ROW_NUMBER() function.
Try this:
With cte as
(Select person, amount, date, row_number() over (partition by person
order by amount desc, date desc) as row_num
from Transactions)
Select * from People as a
left join cte as b
on a.id = b.person
and b.row_num = 1
The result is in Sql Fiddle
Edit: Row_number() from MSDN
Returns the sequential number of a row within a partition of a result set,
starting at 1 for the first row in each partition.
PARTITION BY groups the result set, and the OVER clause determines the
partitioning and ordering of the rowset before the associated window
function is applied.
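The ROW_NUMBER plus LEFT JOIN pattern can be run against the question's sample tables. This sketch assumes SQLite 3.25 or newer (for window functions) through Python's sqlite3 as a stand-in for SQL Server, and renames the date column to tx_date for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE People (id INTEGER, name TEXT)")
conn.execute(
    "CREATE TABLE Transactions (person INTEGER, amount INTEGER, tx_date TEXT)"
)
conn.executemany(
    "INSERT INTO People VALUES (?, ?)",
    [(42, "Bob"), (65, "Ted"), (99, "Stu")],
)
conn.executemany(
    "INSERT INTO Transactions VALUES (?, ?, ?)",
    [
        (42, 3, "2030-09-14"),
        (42, 4, "2015-07-02"),
        (42, None, "2020-02-04"),
        (65, 7, "2010-01-03"),
        (65, 7, "2020-05-20"),
    ],
)

# Rank each person's transactions by amount, then by most recent date,
# and left join only the top-ranked row so people without any
# transactions still appear with NULLs.
top_rows = conn.execute(
    """
    WITH ranked AS (
        SELECT person, amount, tx_date,
               ROW_NUMBER() OVER (
                   PARTITION BY person
                   ORDER BY amount DESC, tx_date DESC
               ) AS row_num
        FROM Transactions
    )
    SELECT p.id, p.name, r.amount, r.tx_date
    FROM People AS p
    LEFT JOIN ranked AS r
           ON r.person = p.id AND r.row_num = 1
    ORDER BY p.id
    """
).fetchall()

for row in top_rows:
    print(row)
```

Putting the row_num = 1 condition in the ON clause rather than in a WHERE clause is what keeps the person with no transactions in the output.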