SQL conditional GROUP BY: how to do it?

Let's say I have the following SQL query:
SELECT Meeting.id AS meetingId, Bill.id AS billId
FROM Meeting
LEFT JOIN Bill ON Meeting.FK_BillId = Bill.id
That outputs the following:
meetingId | billId
------------------
a | NULL
b | NULL
c | 1
d | 1
e | 1
f | 2
g | 2
And I would like the following output, which groups only the rows whose billId isn't NULL:
meetingId | billId
------------------
a | NULL
b | NULL
c | 1
f | 2
How can I achieve that? By the way, I am not concerned about which meetingId is returned for the grouped rows.
Thanks for your help!

In SQL Server:
SELECT meetingId, billid
FROM (
SELECT ROW_NUMBER() OVER (PARTITION BY billId ORDER BY meetingID) AS rn,
m.*
FROM mytable m
) q
WHERE rn = 1 OR billid IS NULL
ANSI:
SELECT MIN(meetingId), billid
FROM mytable
WHERE billid IS NOT NULL
GROUP BY
billId
UNION ALL
SELECT meetingId, billId
FROM mytable
WHERE billid IS NULL
MySQL:
SELECT meetingId, billid
FROM mytable
WHERE billid IS NOT NULL
GROUP BY
billId
UNION ALL
SELECT meetingId, billId
FROM mytable
WHERE billid IS NULL
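This is a trifle more efficient than MIN if you really don't care which meetingId is returned, as long as it belongs to the right group. Note that selecting a non-aggregated, non-grouped column like this relies on MySQL's GROUP BY extension and is rejected when the ONLY_FULL_GROUP_BY SQL mode is enabled.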

You could union 2 queries, one of which does the groups in the non-null entries, and the other that gets the null ones.
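The two-query idea can be tried out quickly. The following is only a minimal sketch, assuming the joined result from the question has been loaded into an in-memory SQLite table (the table name `mytable` and the use of Python's `sqlite3` module are illustrative assumptions, not part of the original question):

```python
import sqlite3

# Hypothetical setup: the joined Meeting/Bill rows from the question,
# stored in an in-memory table called `mytable`.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (meetingId TEXT, billId INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [("a", None), ("b", None), ("c", 1), ("d", 1),
     ("e", 1), ("f", 2), ("g", 2)],
)

# First branch: one row per non-NULL billId.
# Second branch: every row whose billId is NULL, ungrouped.
query = """
SELECT MIN(meetingId) AS meetingId, billId
FROM mytable
WHERE billId IS NOT NULL
GROUP BY billId
UNION ALL
SELECT meetingId, billId
FROM mytable
WHERE billId IS NULL
"""

# UNION ALL gives no ordering guarantee, so sort in Python for display.
rows = sorted(conn.execute(query).fetchall(), key=lambda r: r[0])
for row in rows:
    print(row)
```

With the sample data this keeps both NULL rows and collapses bills 1 and 2 to one meeting each.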

Related

Select first rows where condition [duplicate]

Here's what I'm trying to do. Let's say I have this table t:
key_id | id | record_date | other_cols
1 | 18 | 2011-04-03 | x
2 | 18 | 2012-05-19 | y
3 | 18 | 2012-08-09 | z
4 | 19 | 2009-06-01 | a
5 | 19 | 2011-04-03 | b
6 | 19 | 2011-10-25 | c
7 | 19 | 2012-08-09 | d
For each id, I want to select the row containing the minimum record_date. So I'd get:
key_id | id | record_date | other_cols
1 | 18 | 2011-04-03 | x
4 | 19 | 2009-06-01 | a
The only solutions I've seen to this problem assume that all record_date entries are distinct, but that is not the case in my data. Using a subquery and an inner join with two conditions would give me duplicate rows for some ids, which I don't want:
key_id | id | record_date | other_cols
1 | 18 | 2011-04-03 | x
5 | 19 | 2011-04-03 | b
4 | 19 | 2009-06-01 | a

How about something like:
SELECT mt.*
FROM MyTable mt INNER JOIN
(
SELECT id, MIN(record_date) AS MinDate
FROM MyTable
GROUP BY id
) t ON mt.id = t.id AND mt.record_date = t.MinDate
This gets the minimum date per ID, and then gets the values based on those values. The only time you would have duplicates is if there are duplicate minimum record_dates for the same ID.

I could get your expected result just by doing this in MySQL:
SELECT id, min(record_date), other_cols
FROM mytable
GROUP BY id
Does this work for you?

To get the cheapest product in each category, you use the MIN() function in a correlated subquery as follows:
SELECT categoryid,
productid,
productName,
unitprice
FROM products a WHERE unitprice = (
SELECT MIN(unitprice)
FROM products b
WHERE b.categoryid = a.categoryid)
The outer query scans all rows in the products table and returns the products whose unit price matches the lowest price in their category, as returned by the correlated subquery.

I would like to add to some of the other answers here: if you don't need the first item but, say, the second one, you can use ROW_NUMBER() in a subquery and base your result set on that.
SELECT * FROM
(
SELECT
ROW_NUMBER() OVER (PARTITION BY Id ORDER BY record_date, other_cols) AS rownum,
P.*
FROM MyTable P
) ranked
WHERE rownum = 2
This also allows you to order by multiple columns in the subquery, which may help if two record_dates have identical values. You can also partition by multiple columns if needed by separating them with commas.

This does it simply:
select t2.id, t2.record_date, t2.other_cols
from (select ROW_NUMBER() over (partition by id order by record_date) as rownum,
id, record_date, other_cols
from MyTable) t2
where t2.rownum = 1

If record_date has no duplicates within a group:
think of it as filtering. Simply get (WHERE) the one row with MIN(record_date) from the current group:
SELECT * FROM t t1 WHERE record_date = (
select MIN(record_date)
from t t2 where t2.group_id = t1.group_id)
If there could be 2+ min record_date within a group:
filter out non-min rows (see above)
then (AND) pick only one from the 2+ min record_date rows, within the given group_id. E.g. pick the one with the min unique key:
AND key_id = (select MIN(key_id)
from t t3 where t3.record_date = t1.record_date
and t3.group_id = t1.group_id)
so
key_id | group_id | record_date | other_cols
1 | 18 | 2011-04-03 | x
4 | 19 | 2009-06-01 | a
8 | 19 | 2009-06-01 | e
will select key_ids: #1 and #4
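Putting the two filters together, the tie-breaking idea above can be sketched as one runnable query. This is only an illustration under assumptions: it uses Python's `sqlite3` module with an in-memory table `t`, the `group_id` column name from this answer, and a few sample rows including the tied row with key_id 8.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (key_id INTEGER PRIMARY KEY, group_id INTEGER, "
    "record_date TEXT, other_cols TEXT)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (1, 18, "2011-04-03", "x"),
        (2, 18, "2012-05-19", "y"),
        (4, 19, "2009-06-01", "a"),
        (5, 19, "2011-04-03", "b"),
        (8, 19, "2009-06-01", "e"),  # ties with key_id 4 on the minimum date
    ],
)

# Step 1: keep only rows carrying the minimum record_date of their group.
# Step 2: among tied rows, keep the one with the smallest key_id.
# ISO-8601 date strings compare correctly as text.
query = """
SELECT t1.key_id, t1.group_id, t1.record_date, t1.other_cols
FROM t t1
WHERE t1.record_date = (SELECT MIN(t2.record_date)
                        FROM t t2
                        WHERE t2.group_id = t1.group_id)
  AND t1.key_id = (SELECT MIN(t3.key_id)
                   FROM t t3
                   WHERE t3.group_id = t1.group_id
                     AND t3.record_date = t1.record_date)
ORDER BY t1.key_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Exactly one row per group survives, even though group 19 has two rows sharing its minimum date.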

SELECT p.* FROM tbl p
INNER JOIN(
SELECT t.id, MIN(record_date) AS MinDate
FROM tbl t
GROUP BY t.id
) t ON p.id = t.id AND p.record_date = t.MinDate
GROUP BY p.id
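This code eliminates the duplicate rows that appear when the same id has several rows sharing the minimum record_date. Note that selecting p.* while grouping only by p.id works only in databases that allow it, such as MySQL with ONLY_FULL_GROUP_BY disabled.
If you want the duplicates, remove the last line, GROUP BY p.id.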

This is an old question, but this may be useful for someone.
In my case I can't use a subquery, because I have a big query and need to apply min() to its result; with a subquery the database would have to execute my big query again. I'm using MySQL.
select t.*
from (select m.*, @g := 0
from MyTable m -- here I have a big query
order by id, record_date) t
where (1 = case when @g = 0 or @g <> id then 1 else 0 end)
and (@g := id) IS NOT NULL
Basically, I ordered the result and then used a variable to keep only the first record in each group.

The query below takes the first date for each work order (in a table showing all status changes):
SELECT
WORKORDERNUM,
MIN(DATE)
FROM
WORKORDERS
WHERE
DATE >= to_date('2015-01-01','YYYY-MM-DD')
GROUP BY
WORKORDERNUM

select
department,
min_salary,
(select s1.last_name from staff s1 where s1.salary = s3.min_salary and s1.department = s3.department) lastname
from
(select department, min (salary) min_salary from staff s2 group by s2.department) s3

Extract Historical Status for list of dates

I have two tables
TableA has a list of historical statuses for an entity. These can also be reversed, if reversed the row needs to be ignored.
EntityID | StatusDate | Status | IsReversed
-------------------------------------------------
1 | 2014-01-15 | A | NULL
1 | 2014-06-17 | B | Y
1 | 2015-01-19 | C | NULL
TableB has a list of maintenance dates for the entity
EntityID | MaintDate
-----------------------
1 | 2014-02-20
1 | 2014-03-30
1 | 2015-11-22
I would like to produce a list of the maintenance dates which also lists the status of the entity as at the maintenance date.
I've been shown how to retrieve an individual status for one individual date
SELECT TOP 1
Status
FROM
TableA
WHERE
StatusDate < '2014-03-30' AND IsReversed != 'Y'
ORDER BY
StatusDate DESC
But I can't work out how to integrate this into a query to retrieve the status for every date.
Thanks in advance for your help!
Here is the actual query; #SITELIST is TableB and "LNSP_UAT.dbo.INSTALLSTATUS" is TableA:
SELECT DISTINCT
I.INSTALL [INSTALL]
,RD.GBR$CDATE [ENDDATE]
,RD.GBR$PDATE [STARTDATE]
INTO #SITELIST
FROM
LNSP_UAT.dbo.INSTALL I
LEFT JOIN LNSP_UAT.dbo.GBBILLREG RG ON RG.GBB$INSTALL = I.INSTALL
LEFT JOIN LNSP_UAT.dbo.GBREGISTER RD ON RD.GBR$BREGKEY = RG.GBB$REGKEY AND RD.GBR$STATUS = 25 AND RD.GBR$CTYPE IN ('N','D','A','G','S')
SELECT
SL.INSTALL
,SL.STARTDATE
,SL.ENDDATE
,A.ISSTATUS
FROM
#SITELIST SL
OUTER APPLY (
SELECT TOP 1
ST.ISSTATUS
FROM
LNSP_UAT.dbo.INSTALLSTATUS ST
WHERE
ST.ISINSTALL = SL.INSTALL
AND ST.ISREVERSE != 'Y'
AND ST.ISEFFDATE <= SL.STARTDATE
ORDER BY
ISEFFDATE DESC
) A
ORDER BY ENDDATE DESC
DROP TABLE #SITELIST

What you're looking for is APPLY:
SELECT
b.*, a.Status
FROM TableB b
OUTER APPLY (
SELECT TOP 1 Status
FROM TableA
WHERE
EntityID = b.EntityID
AND (IsReversed IS NULL OR IsReversed != 'Y') -- a bare != 'Y' would also drop rows where IsReversed is NULL
AND StatusDate <= b.MaintDate
ORDER BY StatusDate DESC
) a

SELECT
b.*,
(SELECT TOP 1
Status
FROM
TableA
WHERE
EntityID = b.EntityID
AND (IsReversed IS NULL OR IsReversed != 'Y') AND StatusDate < b.MaintDate
ORDER BY
StatusDate DESC) as status
FROM TableB b
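For readers without SQL Server at hand, the same "latest status as of each maintenance date" lookup can be sketched with a correlated subquery in SQLite, where `LIMIT 1` plays the role of `TOP 1`. This is a rough illustration only: the use of Python's `sqlite3` module is an assumption, and the maintenance date 2014-07-01 is a hypothetical extra row added to show the reversed status being skipped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableA (EntityID INTEGER, StatusDate TEXT, Status TEXT, IsReversed TEXT);
CREATE TABLE TableB (EntityID INTEGER, MaintDate TEXT);
INSERT INTO TableA VALUES
    (1, '2014-01-15', 'A', NULL),
    (1, '2014-06-17', 'B', 'Y'),
    (1, '2015-01-19', 'C', NULL);
INSERT INTO TableB VALUES
    (1, '2014-02-20'),
    (1, '2014-03-30'),
    (1, '2014-07-01'),  -- hypothetical row, not in the question
    (1, '2015-11-22');
""")

# For each maintenance date, pick the most recent non-reversed status
# on or before that date. NULL in IsReversed means "not reversed", so it
# has to be allowed explicitly: NULL <> 'Y' is not true in SQL.
query = """
SELECT b.EntityID,
       b.MaintDate,
       (SELECT a.Status
        FROM TableA a
        WHERE a.EntityID = b.EntityID
          AND (a.IsReversed IS NULL OR a.IsReversed <> 'Y')
          AND a.StatusDate <= b.MaintDate
        ORDER BY a.StatusDate DESC
        LIMIT 1) AS Status
FROM TableB b
ORDER BY b.MaintDate
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The 2014-07-01 row still reports status A, because the only newer status at that point (B) is flagged as reversed.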

SQL Joins . One to many relationship

I have two tables as below
Table 1
-----------------------------------
UserID | UserName | Age | Salary
-----------------------------------
1 | foo | 22 | 33000
-----------------------------------
Table 2
------------------------------------------------
UserID | Age | Salary | CreatedDate
------------------------------------------------
1 | NULL | 35000 | 2015-01-01
------------------------------------------------
1 | 28 | NULL | 2015-02-01
------------------------------------------------
1 | NULL | 28000 | 2015-03-01
------------------------------------------------
I need the result like this.
Result
-----------------------------------
UserID | UserName | Age | Salary
-----------------------------------
1 | foo | 28 | 28000
-----------------------------------
This is just an example. In my real project I have around 6 columns like Age and Salary in above tables.
In Table 2, each record will have only one value, i.e. if Age has a value then Salary will be NULL, and vice versa.
UPDATE:
Table 2 has a CreatedDate column, so I want to get the latest non-NULL value of each column instead of the maximum value.

You can get this done using a simple MAX() and GROUP BY:
select t1.userid,t1.username, MAX(t2.Age) as Age, MAX(t2.Salary) as Salary
from table1 t1 join
table2 t2 on t1.userid=t2.userid
group by t1.userid,t1.username
Result:
userid username Age Salary
--------------------------------
1 foo 28 35000
Sample result in SQL Fiddle

Note: I'm giving you the benefit of the doubt that you know what you're doing, and you just haven't told us everything about your schema.
It looks like Table 2 is actually an "updates" table, in which each row contains a delta of changes to apply to the base entity in Table 1. In which case you can retrieve each column's data with a correlated join (technically an outer-apply) and put the results together. Something like the following:
select a.UserID, a.UserName,
coalesce(aAge.Age, a.Age),
coalesce(aSalary.Salary, a.Salary)
from [Table 1] a
outer apply (
select Age
from [Table 2] x
where x.UserID = a.UserID
and x.Age is not null
and not exists (
select 1
from [Table 2] y
where x.UserID = y.UserID
and y.Id > x.Id
and y.Age is not null
)
) aAge
outer apply (
select Salary
from [Table 2] x
where x.UserID = a.UserID
and x.Salary is not null
and not exists (
select 1
from [Table 2] y
where x.UserID = y.UserID
and y.Id > x.Id
and y.Salary is not null
)
) aSalary
Do note I am assuming you have at minimum an Id column in Table 2 which is monotonically increasing with each insert. If you have a "change time" column, use this instead to get the latest row, as it is better.
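Since the question's Table 2 does have a CreatedDate column, the "latest non-NULL value per column" idea can also be sketched with one correlated subquery per column. The following is a minimal illustration, assuming Python's `sqlite3` module and table names `Table1` and `Table2` without spaces (both are assumptions made for the sake of a runnable example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (UserID INTEGER, UserName TEXT, Age INTEGER, Salary INTEGER);
CREATE TABLE Table2 (UserID INTEGER, Age INTEGER, Salary INTEGER, CreatedDate TEXT);
INSERT INTO Table1 VALUES (1, 'foo', 22, 33000);
INSERT INTO Table2 VALUES
    (1, NULL, 35000, '2015-01-01'),
    (1, 28,   NULL,  '2015-02-01'),
    (1, NULL, 28000, '2015-03-01');
""")

# For each column, take the most recently created non-NULL update;
# fall back to the base value in Table1 when no update exists.
query = """
SELECT t1.UserID,
       t1.UserName,
       COALESCE((SELECT x.Age
                 FROM Table2 x
                 WHERE x.UserID = t1.UserID AND x.Age IS NOT NULL
                 ORDER BY x.CreatedDate DESC
                 LIMIT 1), t1.Age) AS Age,
       COALESCE((SELECT x.Salary
                 FROM Table2 x
                 WHERE x.UserID = t1.UserID AND x.Salary IS NOT NULL
                 ORDER BY x.CreatedDate DESC
                 LIMIT 1), t1.Salary) AS Salary
FROM Table1 t1
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With about six such columns the query grows by one subquery per column, which is repetitive but keeps each column's "latest value" logic independent.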

To get the latest value based on CreatedDate, you can use ROW_NUMBER to filter for the latest rows. Here the partition is based on UserID and on which of the other columns, Age or Salary, is populated.
SQL Fiddle
;WITH Cte AS(
SELECT
UserID,
Age = MAX(Age),
Salary = MAX(Salary)
FROM(
SELECT *, Rn = ROW_NUMBER() OVER(
PARTITION BY
UserID,
CASE
WHEN Age IS NOT NULL THEN 1
WHEN Salary IS NOT NULL THEN 2
END
ORDER BY CreatedDate DESC
)
FROM Table2
)t
WHERE Rn = 1
GROUP BY UserID
)
SELECT
t.UserID,
t.UserName,
Age = ISNULL(c.Age, t.Age),
Salary = ISNULL(c.Salary, t.Salary)
FROM Table1 t
LEFT JOIN Cte c
ON t.UserID = c.UserID

The following query should work (it works fine in MSSQL):
select a.userID,a.username,b.age,b.sal from <table1> a
inner join
(select userID,MAX(age) age,MAX(sal) sal from <table2> group by userID) b
on a.userID=b.userID

Compare Multiple rows In SQL Server

I have a SQL Server database full of the following (fictional) data in the following structure:
ID | PatientID | Exam | (NON DB COLUMN FOR REFERENCE)
------------------------------------
1 | 12345 | CT | OK
2 | 11234 | CT | OK(Same PID but Different Exam)
3 | 11234 | MRI | OK(Same PID but Different Exam)
4 | 11123 | CT | BAD(Same PID, Same Exam)
5 | 11123 | CT | BAD(Same PID, Same Exam)
6 | 11112 | CT | BAD(Conflicts With ID 8)
7 | 11112 | MRI | OK(SAME PID but different Exam)
8 | 11112 | CT | BAD(Conflicts With ID 6)
9 | 11123 | CT | BAD(Same PID, Same Exam)
10 | 11123 | CT | BAD(Same PID, Same Exam)
I am trying to write a query which will go through and identify everything that isn't bad as per my example above.
Overall, a patient (identified by PatientID) can have many rows, but may not have 2 or more rows with the same exam!
I have attempted various modifications of examples I found on here, but still with no luck.
Thanks.

You seem to want to identify duplicates, ranking them as good or bad. Here is a method using window functions:
select t.id, t.patientid, t.exam,
(case when cnt > 1 then 'BAD' else 'OK' end)
from (select t.*, count(*) over (partition by patientid, exam) as cnt
from table t
) t;

Use COUNT() OVER():
select *, case when COUNT(*) over (partition by PatientID, Exam) > 1 then 'bad' else 'ok' end as status
from yourtable

You can also use:
;WITH CTE_Patients
(ID, PatientID, Exam, RowNumber)
AS
(
SELECT ID, PatientID, Exam,
ROW_NUMBER() OVER (PARTITION BY PatientID, Exam ORDER BY ID)
FROM YourTableName
)
SELECT TableB.ID, TableB.PatientID, TableB.Exam, [DuplicateOf] = TableA.ID
FROM CTE_Patients TableB
INNER JOIN CTE_Patients TableA
ON TableB.PatientID = TableA.PatientID
AND TableB.Exam = TableA.Exam
WHERE TableB.RowNumber > 1 -- Duplicate rows
AND TableA.RowNumber = 1 -- Unique rows
I have a sample here: "SQL Server – Identifying unique and duplicate rows in a table". With it you can identify unique rows as well as duplicate rows.

If you don't want to use a CTE or COUNT OVER, you can also group the source table and select from there (but I'd be surprised if @Gordon was too far off the mark with the original answer :) ).
SELECT a.PatientID, a.Exam, CASE WHEN a.cnt > 1 THEN 'BAD' ELSE 'OK' END
FROM ( SELECT PatientID
,Exam
,COUNT(*) AS cnt
FROM tableName
GROUP BY Exam
,PatientID
) a
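The grouped query above returns one row per (PatientID, Exam) pair and so loses the ID column. A possible variation, sketched here under assumptions, joins the counts back to the source rows so every ID gets its own verdict. The table name `exams`, the `Verdict` alias, and the use of Python's `sqlite3` module are all illustrative choices, not taken from the original answers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE exams (ID INTEGER PRIMARY KEY, PatientID INTEGER, Exam TEXT)"
)
conn.executemany(
    "INSERT INTO exams VALUES (?, ?, ?)",
    [
        (1, 12345, "CT"), (2, 11234, "CT"), (3, 11234, "MRI"),
        (4, 11123, "CT"), (5, 11123, "CT"), (6, 11112, "CT"),
        (7, 11112, "MRI"), (8, 11112, "CT"), (9, 11123, "CT"),
        (10, 11123, "CT"),
    ],
)

# Count rows per (PatientID, Exam), then join the counts back to each row.
# A count above 1 means the patient has the same exam more than once.
query = """
SELECT e.ID, e.PatientID, e.Exam,
       CASE WHEN g.cnt > 1 THEN 'BAD' ELSE 'OK' END AS Verdict
FROM exams e
JOIN (SELECT PatientID, Exam, COUNT(*) AS cnt
      FROM exams
      GROUP BY PatientID, Exam) g
  ON g.PatientID = e.PatientID AND g.Exam = e.Exam
ORDER BY e.ID
"""
rows = conn.execute(query).fetchall()
ok_ids = [r[0] for r in rows if r[3] == "OK"]
bad_ids = [r[0] for r in rows if r[3] == "BAD"]
print("OK:", ok_ids)
print("BAD:", bad_ids)
```

On the question's sample data this marks the same rows OK and BAD as the reference column in the question.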

Select those patients that never have 2 or more exams of same type.
select * from patients t1
where not exists (select 1 from patients t2
where t1.PatientID = t2.PatientID
group by exam
having count(*) > 1)
Or, if you want all rows, like in your example:
select ID,
PatientID,
Exam,
case when exists (select 1 from patients t2
where t1.PatientID = t2.PatientID
group by exam
having count(*) > 1) then 'BAD' else 'OK' end
from patients t1
