Compare Multiple rows In SQL Server - sql

I have a SQL Server database full of the following (fictional) data in the following structure:
ID | PatientID | Exam | (NON DB COLUMN FOR REFERENCE)
------------------------------------
1 | 12345 | CT | OK
2 | 11234 | CT | OK(Same PID but Different Exam)
3 | 11234 | MRI | OK(Same PID but Different Exam)
4 | 11123 | CT | BAD(Same PID, Same Exam)
5 | 11123 | CT | BAD(Same PID, Same Exam)
6 | 11112 | CT | BAD(Conflicts With ID 8)
7 | 11112 | MRI | OK(SAME PID but different Exam)
8 | 11112 | CT | BAD(Conflicts With ID 6)
9 | 11123 | CT | BAD(Same PID, Same Exam)
10 | 11123 | CT | BAD(Same PID, Same Exam)
I am trying to write a query which will go through and identify everything that isn't bad, as per my example above.
Overall, a patient (identified by PatientID) can have many rows, but may not have 2 or more rows with the same exam!
I have attempted various modifications of examples I found on here, but still with no luck.
Thanks.

You seem to want to identify duplicates, ranking them as good or bad. Here is a method using window functions:
select t.id, t.patientid, t.exam,
(case when cnt > 1 then 'BAD' else 'OK' end)
from (select t.*, count(*) over (partition by patientid, exam) as cnt
from table t
) t;
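As a rough check of the window-function approach, here is a minimal sketch that runs the same idea against the question's sample rows. It uses Python's built-in sqlite3 module rather than SQL Server, the table name exams is hypothetical, and it assumes the bundled SQLite is 3.25 or newer (needed for window functions).

```python
import sqlite3

# Rows copied from the question; the table name "exams" is an assumption.
rows = [
    (1, 12345, "CT"), (2, 11234, "CT"), (3, 11234, "MRI"),
    (4, 11123, "CT"), (5, 11123, "CT"), (6, 11112, "CT"),
    (7, 11112, "MRI"), (8, 11112, "CT"), (9, 11123, "CT"),
    (10, 11123, "CT"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE exams (id INTEGER PRIMARY KEY, patientid INTEGER, exam TEXT)"
)
conn.executemany("INSERT INTO exams VALUES (?, ?, ?)", rows)

# Count the rows sharing each (patientid, exam) pair without collapsing them.
query = """
SELECT id, patientid, exam,
       CASE WHEN COUNT(*) OVER (PARTITION BY patientid, exam) > 1
            THEN 'BAD' ELSE 'OK' END AS status
FROM exams
ORDER BY id
"""
status_by_id = {row[0]: row[3] for row in conn.execute(query)}
ok_ids = sorted(i for i, s in status_by_id.items() if s == "OK")
print(ok_ids)
```

With the sample data, only IDs 1, 2, 3 and 7 come back as OK, which matches the reference column in the question.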

Use COUNT() OVER():
select *, case when COUNT(*) over(partition by PatientID, Exam) > 1 then 'bad' else 'ok' end as status
from yourtable

You can also use:
;WITH CTE_Patients
(ID, PatientID, Exam, RowNumber)
AS
(
SELECT ID, PatientID, Exam,
ROW_NUMBER() OVER (PARTITION BY PatientID, Exam ORDER BY ID)
FROM YourTableName
)
SELECT TableB.ID, TableB.PatientID, TableB.Exam, [DuplicateOf] = TableA.ID
FROM CTE_Patients TableB
INNER JOIN CTE_Patients TableA
ON TableB.PatientID = TableA.PatientID
AND TableB.Exam = TableA.Exam
WHERE TableB.RowNumber > 1 -- Duplicate rows
AND TableA.RowNumber = 1 -- Unique rows
I have a sample here: SQL Server – Identifying unique and duplicate rows in a table, you can identify unique rows as well as duplicate rows

If you don't want to use a CTE or COUNT() OVER(), you can also group the source table and select from there (but I'd be surprised if @Gordon was too far off the mark with the original answer :) )
SELECT a.PatientID, a.Exam, CASE WHEN a.cnt > 1 THEN 'BAD' ELSE 'OK' END
FROM ( SELECT PatientID
,Exam
,COUNT(*) AS cnt
FROM tableName
GROUP BY Exam
,PatientID
) a

Select those patients that never have 2 or more exams of same type.
select * from patients t1
where not exists (select 1 from patients t2
where t1.PatientID = t2.PatientID
group by exam
having count(*) > 1)
Or, if you want all rows, like in your example:
select ID,
PatientID,
Exam,
case when exists (select 1 from patients t2
where t1.PatientID = t2.PatientID
group by exam
having count(*) > 1) then 'BAD' else 'OK' end
from patients t1

Related

Select first rows where condition [duplicate]

Here's what I'm trying to do. Let's say I have this table t:
key_id | id | record_date | other_cols
1 | 18 | 2011-04-03 | x
2 | 18 | 2012-05-19 | y
3 | 18 | 2012-08-09 | z
4 | 19 | 2009-06-01 | a
5 | 19 | 2011-04-03 | b
6 | 19 | 2011-10-25 | c
7 | 19 | 2012-08-09 | d
For each id, I want to select the row containing the minimum record_date. So I'd get:
key_id | id | record_date | other_cols
1 | 18 | 2011-04-03 | x
4 | 19 | 2009-06-01 | a
The only solutions I've seen to this problem assume that all record_date entries are distinct, but that is not the case in my data. Using a subquery and an inner join with two conditions would give me duplicate rows for some ids, which I don't want:
key_id | id | record_date | other_cols
1 | 18 | 2011-04-03 | x
5 | 19 | 2011-04-03 | b
4 | 19 | 2009-06-01 | a
How about something like:
SELECT mt.*
FROM MyTable mt INNER JOIN
(
SELECT id, MIN(record_date) AS MinDate
FROM MyTable
GROUP BY id
) t ON mt.id = t.id AND mt.record_date = t.MinDate
This gets the minimum date per ID, and then gets the values based on those values. The only time you would have duplicates is if there are duplicate minimum record_dates for the same ID.
I could get to your expected result just by doing this in mysql:
SELECT id, min(record_date), other_cols
FROM mytable
GROUP BY id
Does this work for you?
To get the cheapest product in each category, you use the MIN() function in a correlated subquery as follows:
SELECT categoryid,
productid,
productName,
unitprice
FROM products a WHERE unitprice = (
SELECT MIN(unitprice)
FROM products b
WHERE b.categoryid = a.categoryid)
The outer query scans all rows in the products table and returns the products whose unit price matches the lowest price in their category, as returned by the correlated subquery.
I would like to add to some of the other answers here: if you don't need the first item but, say, the second, you can use ROW_NUMBER() in a subquery and base your result set on that.
SELECT * FROM
(
SELECT
ROW_NUMBER() OVER (PARTITION BY Id ORDER BY record_date, other_cols) as rownum,
*
FROM products P
) ranked
WHERE rownum = 2
This also allows you to order by multiple columns in the subquery, which may help if two record_dates have identical values. You can also partition by multiple columns if needed, separating them with commas.
This does it simply:
select t2.id,t2.record_date,t2.other_cols
from (select ROW_NUMBER() over(partition by id order by record_date)as rownum,id,record_date,other_cols from MyTable)t2
where t2.rownum = 1
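To see how ROW_NUMBER() behaves when the minimum date is tied, here is a small sketch using Python's built-in sqlite3 module (assuming SQLite 3.25+ for window functions). The rows are the question's sample data plus one hypothetical extra row (key_id 8) that ties with key_id 4; adding key_id to the ORDER BY is an assumption about how ties should be broken.

```python
import sqlite3

rows = [
    (1, 18, "2011-04-03", "x"), (2, 18, "2012-05-19", "y"),
    (3, 18, "2012-08-09", "z"), (4, 19, "2009-06-01", "a"),
    (5, 19, "2011-04-03", "b"), (6, 19, "2011-10-25", "c"),
    (7, 19, "2012-08-09", "d"),
    (8, 19, "2009-06-01", "e"),  # hypothetical row that ties with key_id 4
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (key_id INTEGER PRIMARY KEY, id INTEGER, "
    "record_date TEXT, other_cols TEXT)"
)
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)

# ISO-formatted date strings sort chronologically, so TEXT ordering is enough here.
# key_id in the ORDER BY makes the choice among tied dates deterministic.
query = """
SELECT key_id, id, record_date, other_cols
FROM (SELECT t.*,
             ROW_NUMBER() OVER (PARTITION BY id
                                ORDER BY record_date, key_id) AS rn
      FROM t) AS ranked
WHERE rn = 1
ORDER BY id
"""
first_rows = conn.execute(query).fetchall()
print(first_rows)
```

Without the extra key_id in the ORDER BY, either of the tied rows could be numbered 1, so which one is returned would not be guaranteed.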
If record_date has no duplicates within a group:
think of it as filtering. Simply get (WHERE) the one row with MIN(record_date) from the current group:
SELECT * FROM t t1 WHERE record_date = (
select MIN(record_date)
from t t2 where t2.group_id = t1.group_id)
If there could be 2+ min record_date within a group:
filter out non-min rows (see above)
then (AND) pick only one from the 2+ min record_date rows, within the given group_id. E.g. pick the one with the min unique key:
AND key_id = (select MIN(key_id)
from t t3 where t3.record_date = t1.record_date
and t3.group_id = t1.group_id)
so
key_id | group_id | record_date | other_cols
1 | 18 | 2011-04-03 | x
4 | 19 | 2009-06-01 | a
8 | 19 | 2009-06-01 | e
will select key_ids: #1 and #4
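The two filters combined can be sketched as follows, using Python's built-in sqlite3 module and the three example rows above. The table name t and the column names follow this answer; nothing else is assumed.

```python
import sqlite3

rows = [
    (1, 18, "2011-04-03", "x"),
    (4, 19, "2009-06-01", "a"),
    (8, 19, "2009-06-01", "e"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (key_id INTEGER PRIMARY KEY, group_id INTEGER, "
    "record_date TEXT, other_cols TEXT)"
)
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)

# First condition keeps rows holding the group's minimum date;
# second condition keeps only the lowest key_id among those tied rows.
query = """
SELECT key_id
FROM t AS t1
WHERE record_date = (SELECT MIN(record_date) FROM t AS t2
                     WHERE t2.group_id = t1.group_id)
  AND key_id = (SELECT MIN(key_id) FROM t AS t3
                WHERE t3.record_date = t1.record_date
                  AND t3.group_id = t1.group_id)
ORDER BY key_id
"""
selected_keys = [row[0] for row in conn.execute(query)]
print(selected_keys)
```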
SELECT p.* FROM tbl p
INNER JOIN(
SELECT t.id, MIN(record_date) AS MinDate
FROM tbl t
GROUP BY t.id
) t ON p.id = t.id AND p.record_date = t.MinDate
GROUP BY p.id
This code eliminates duplicate rows in case the same id has several rows with the same minimum record_date.
If you want the duplicates, remove the last line, GROUP BY p.id.
This is an old question, but this may be useful for someone.
In my case I can't use a subquery because I have a big query and need to apply min() to its result; with a subquery the database would have to re-execute the big query. I'm using MySQL.
select t.*
from (select m.*, @g := 0
from MyTable m -- here I have a big query
order by id, record_date) t
where (1 = case when @g = 0 or @g <> id then 1 else 0 end )
and (@g := id) IS NOT NULL
Basically I ordered the result and then put a variable in order to get only the first record in each group.
The query below takes the first date for each work order (in a table showing all status changes):
SELECT
WORKORDERNUM,
MIN(DATE)
FROM
WORKORDERS
WHERE
DATE >= to_date('2015-01-01','YYYY-MM-DD')
GROUP BY
WORKORDERNUM
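select
department,
min_salary,
-- correlate on department too, so a matching salary in another department is not picked up
(select s1.last_name from staff s1 where s1.salary=s3.min_salary and s1.department=s3.department ) lastname
from
(select department, min (salary) min_salary from staff s2 group by s2.department) s3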

Getting distinct result with Oracle SQL

I have the following data structure
ID | REFID | NAME
1 | 100 | A
2 | 101 | B
3 | 101 | C
With
SELECT DISTINCT REFID, ID, NAME
FROM my_table
ORDER BY ID
I would like to have the following result:
1 | 100 | A
2 | 101 | B
Columns NAME and ID should contain the MIN or FIRST value.
But I am stuck on how to use MIN/FIRST here.
Any tips are welcome :-)
select id,
refid,
name
from (select id,
refid,
name,
row_number() over(partition by refid order by name) as rn
from my_table)
where rn = 1
order by id
You can use a subquery to do this.
WITH Q AS
( SELECT MIN(NAME) AS NAME, REFID FROM T GROUP BY REFID )
SELECT T.ID, T.REFID, T.NAME
FROM T
JOIN Q
ON (T.NAME = Q.NAME AND T.REFID = Q.REFID)
Also, note that SQL tables have no order. So there's no "First" value.

Getting percentage value of records using the max count of records in SQL Server

I'm new to SQL and have an issue I'd like help with. I've got a table User_Activity_Log that contains the names of students with their ID (user_id) and the date of attendance in the year (User_timestamp) in the format (February 25,2015).
Say the User_Activity_Log table contains
| user_id | user_timestamp |
| jude | February 22 |
| jude | February 24 |
| annie | February 1 |
| sam | January |
I'd like to know how to get a table showing the user ID, the number of times a student is seen in the month, and the percentage count, which should be calculated against the max(count) of any student.
Here's what I've done so far; it gives me an error.
USE FinalYearProject
declare @maxval int
select
@maxval = (SELECT MAX(fromsubq.SM) as PA
FROM
(SELECT COUNT (user_Id) as SM
FROM dbo.User_Activity_Log
WHERE user_Timestamp LIKE 'February%'
GROPU BY User_Id) fromsubq
)
(SELECT COUNT
FROM dbo.User_Activity_Log
WHERE user_Timestamp like 'February%'
GROUP BY user_Id) * 100.0 / @maxval
Expected output should be
| User_id | Count | PercentageCount |
| Jude | 2 | 100 % |
| annie | 1 | 50 % |
| sam | 0 | 0 % |
Please help me point out the problem and possible solutions
Thanks in advance.
You can do this by using conditional aggregation in a subquery/cte and adding OVER() to an aggregate:
;with cte AS (SELECT User_ID
,SUM(CASE WHEN user_timestamp LIKE 'February%' THEN 1 ELSE 0 END) as CT
FROM User_Activity_Log
GROUP BY User_ID
)
SELECT User_ID
,CT
,CT*100.0 / MAX(CT) OVER() AS PercentageCount
FROM cte
ORDER BY CT DESC
Demo: SQL Fiddle
Note: It's bad practice to store dates as strings, if you can avoid it at all you should.
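For a quick sanity check of the conditional-aggregation approach, here is a minimal sketch run against the question's four sample rows. It uses Python's built-in sqlite3 module instead of SQL Server and assumes SQLite 3.25+ for the OVER() clause; the table and column names are taken from the question.

```python
import sqlite3

rows = [
    ("jude", "February 22"), ("jude", "February 24"),
    ("annie", "February 1"), ("sam", "January"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE User_Activity_Log (user_id TEXT, user_timestamp TEXT)")
conn.executemany("INSERT INTO User_Activity_Log VALUES (?, ?)", rows)

# Counting inside SUM(CASE ...) keeps users with no February rows (count 0),
# which a WHERE filter on the month would drop.
query = """
WITH cte AS (
    SELECT user_id,
           SUM(CASE WHEN user_timestamp LIKE 'February%' THEN 1 ELSE 0 END) AS ct
    FROM User_Activity_Log
    GROUP BY user_id
)
SELECT user_id, ct, ct * 100.0 / MAX(ct) OVER () AS pct
FROM cte
ORDER BY ct DESC
"""
result = conn.execute(query).fetchall()
print(result)
```

One caveat worth keeping in mind: if nobody has any rows in the chosen month, MAX(ct) is 0 and the division needs guarding (SQL Server raises a divide-by-zero error in that case).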
Edit: Here's how it would be done with a subquery instead of cte:
SELECT User_ID
,CT
,CT*100.0 / MAX(CT) OVER() AS PercentageCount
FROM (SELECT User_ID
,SUM(CASE WHEN user_timestamp LIKE 'February%' THEN 1 ELSE 0 END) as CT
FROM User_Activity_Log
GROUP BY User_ID
) AS Sub
ORDER BY CT DESC
UPDATE: To use the PercentageCount in a CASE expression, something like:
;with cte AS (SELECT User_ID
,SUM(CASE WHEN user_timestamp LIKE 'February%' THEN 1 ELSE 0 END) as CT
FROM User_Activity_Log
GROUP BY User_ID
)
,cte2 AS (SELECT User_ID
,CT
,CT*100.0 / MAX(CT) OVER() AS PercentageCount
FROM cte
)
SELECT *,CASE WHEN PercentageCount > 50 THEN 'Qualified' ELSE 'NotQualified' END AS Qualified -- PercentageCount is on a 0-100 scale
FROM cte2
ORDER BY CT DESC
First find the count per user_id in a sub-select, then find the percentage in the outer query.
Use MAX() OVER() to find the highest count, then divide each count by it to get the percentage. Multiplying by 100.0 first avoids integer division. Try this.
SELECT user_Id,
Cnt AS [Count],
Cnt * 100.0 / Max(Cnt) OVER() AS PercentageCount
FROM (SELECT Count(user_Id) AS Cnt,
user_Id
FROM dbo.User_Activity_Log
WHERE user_Timestamp LIKE 'February%'
GROUP BY User_Id) A

SQL group by with a count

I have a table (simplified below)
|company|name |age|
| 1 | a | 3 |
| 1 | a | 3 |
| 1 | a | 2 |
| 2 | b | 8 |
| 3 | c | 1 |
| 3 | c | 1 |
For various reasons the age column should be the same for each company. I have another process that updates this table, and sometimes it puts an incorrect age in. For company 1 the age should always be 3.
I want to find out which companies have a mismatch of age.
I've done this
select company, name, age from table group by company, name, age
but I don't know how to get the rows where the age is different. This table is a lot wider and has loads of columns, so I cannot really eyeball it.
Can anyone help?
Thanks
You should not be including age in the group by clause.
SELECT company
FROM tableName
GROUP BY company, name
HAVING COUNT(DISTINCT age) <> 1
SQLFiddle Demo
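A minimal sketch of the COUNT(DISTINCT age) check against the question's sample rows, using Python's built-in sqlite3 module. The table name companies is hypothetical, since the question does not give the real one.

```python
import sqlite3

rows = [
    (1, "a", 3), (1, "a", 3), (1, "a", 2),
    (2, "b", 8),
    (3, "c", 1), (3, "c", 1),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE companies (company INTEGER, name TEXT, age INTEGER)")
conn.executemany("INSERT INTO companies VALUES (?, ?, ?)", rows)

# A group with more than one distinct age has at least one mismatched row.
query = """
SELECT company
FROM companies
GROUP BY company, name
HAVING COUNT(DISTINCT age) <> 1
ORDER BY company
"""
mismatched = [row[0] for row in conn.execute(query)]
print(mismatched)
```

Only company 1 is reported, because it is the only group holding two different ages (3 and 2).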
If you want to find the row(s) with a different age than the max-count age of each company/name group:
WITH CTE AS
(
select company, name, age,
maxAge=(select top 1 age
from dbo.table1 t2
group by company,name, age
having( t1.company=t2.company and t1.name=t2.name)
order by count(*) desc)
from dbo.table1 t1
)
select * from cte
where age <> maxAge
Demonstration
If you want to update the incorrect ages with the correct ones, you just need to replace the SELECT with an UPDATE:
WITH CTE AS
(
select company, name, age,
maxAge=(select top 1 age
from dbo.table1 t2
group by company,name, age
having( t1.company=t2.company and t1.name=t2.name)
order by count(*) desc)
from dbo.table1 t1
)
UPDATE cte SET AGE = maxAge
WHERE age <> maxAge
Demonstration
Since you mentioned "how to get the rows where the age is different" and not just the companies:
Add a unique row id (a primary key) if there isn't already one. Let's call it id.
Then, do
select id from table
where company in
(select company from table
group by company
having count(distinct age)>1)
