Remove duplicates in SQL Result set of ONE table


Afternoon/Evening all,
I'm looking for the final touches to the query below. I need to remove rows that duplicate another row's values in particular columns. I'm currently using this SQL:
SELECT CBNEW.*
FROM CallbackNewID CBNEW
INNER JOIN (SELECT IDNEW, MAX(CallbackDate) AS MaxDate
FROM CallbackNewID
GROUP BY IDNEW) AS groupedCBNEW
ON (CBNEW.CallbackDate = groupedCBNEW.MaxDate) AND (CBNEW.IDNEW = groupedCBNEW.IDNEW);
My result set looks like the below
ID      RecID  Comp  Rem  Date_                IDNEW  IDOLD  CB?  CallbackDate
138618  83209  1     0    2012-03-16 12:40:00  83209  83209  2    16-Mar-12
138619  83209  1     0    2012-03-16 12:40:00  83209  83209  2    16-Mar-12
110470  83799  1     0    2011-07-27 11:46:00  83799  83799  10   27-Jul-11
110471  83799  1     0    2011-07-27 11:46:00  83799  83799  10   27-Jul-11
This however gives me duplicate values in the CallBackDate and IDNEW Column because in the table there are some different Primary Keys with the same IDNEW and CallbackDate values.
If I dump this result into Excel, I can just use remove duplicates on the first ID column, and the problem's solved.
But what I want to do is make sure my result only includes the FIRST instance of the ID column, where IDNEW and CallbackDate are duplicated.
I'm sure I just need to append a tiny piece of SQL, but I haven't been able to find the answer so far.
Your help is very much appreciated.

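Try adding MIN(ID) to the inner query and then adding it to the ON clause as well. Note that MIN(ID) is taken over every row for the IDNEW, so this only works when the lowest ID also carries the latest CallbackDate: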
SELECT CBNEW.*
FROM CallbackNewID CBNEW
INNER JOIN (SELECT IDNEW, MIN(ID) AS MinId, MAX(CallbackDate) AS MaxDate
FROM CallbackNewID
GROUP BY IDNEW) AS groupedCBNEW
ON (CBNEW.CallbackDate = groupedCBNEW.MaxDate)
AND (CBNEW.IDNEW = groupedCBNEW.IDNEW)
AND (CBNEW.ID = groupedCBNEW.MinId) ;
sqlfiddle demo

Here is a rather "brute force" approach. It just takes the results of your original query and does Min() on [ID], Max() on [Comp] and [Rem], and GROUP BY on everything else:
SELECT
Min(t.ID) AS MinOfID,
t.RecID,
Max(t.Comp) AS MaxOfComp,
Max(t.Rem) AS MaxOfRem,
t.Date_,
t.IDNEW,
t.IDOLD,
t.[CB?],
t.CallbackDate
FROM
(
SELECT CBNEW.*
FROM
CallbackNewID CBNEW
INNER JOIN
(
SELECT IDNEW, MAX(CallbackDate) AS MaxDate
FROM CallbackNewID
GROUP BY IDNEW
) AS groupedCBNEW
ON (CBNEW.CallbackDate = groupedCBNEW.MaxDate)
AND (CBNEW.IDNEW = groupedCBNEW.IDNEW)
) t
GROUP BY
t.RecID,
t.Date_,
t.IDNEW,
t.IDOLD,
t.[CB?],
t.CallbackDate;
It might not be terribly elegant, but if it works....

In MS SQL Server, I think you are looking for the ROW_NUMBER() function.
Something like this should help you get what you are looking for:
SELECT
X.*
FROM
(
SELECT
CBNEW.*,
ROW_NUMBER() OVER (PARTITION BY CBNEW.IDNEW, groupedCBNEW.MaxDate ORDER BY CBNEW.ID) AS [row_num]
FROM
CallbackNewID CBNEW
INNER JOIN
(
SELECT
IDNEW,
MAX(CallbackDate) AS MaxDate
FROM
CallbackNewID
GROUP BY
IDNEW
) AS groupedCBNEW ON (CBNEW.CallbackDate = groupedCBNEW.MaxDate) AND (CBNEW.IDNEW = groupedCBNEW.IDNEW)
) X
WHERE
X.row_num = 1

SELECT
A.*
FROM
(SELECT
*,
ROW_NUMBER() OVER (PARTITION BY IDNEW ORDER BY CallbackDate DESC, ID)
AS [row_num]
FROM CallbackNewID
) A
WHERE
A.row_num = 1
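
To try the ROW_NUMBER() idea without a SQL Server instance, here is a minimal sketch that runs the same shape of query against an in-memory SQLite database from Python. The assumptions: SQLite 3.25 or later (for window functions), a cut-down table with only the three relevant columns, and one extra hypothetical row (ID 100001) whose ID is lower but whose callback is older. ID is added to the ORDER BY as a tie-breaker so that the lowest ID wins when IDNEW and CallbackDate are duplicated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CallbackNewID (ID INTEGER PRIMARY KEY, IDNEW INTEGER, CallbackDate TEXT)"
)
conn.executemany(
    "INSERT INTO CallbackNewID VALUES (?, ?, ?)",
    [
        (138618, 83209, "2012-03-16"),
        (138619, 83209, "2012-03-16"),
        (110470, 83799, "2011-07-27"),
        (110471, 83799, "2011-07-27"),
        (100001, 83799, "2011-01-05"),  # hypothetical: lower ID, older callback
    ],
)

# Number the rows of each IDNEW, latest callback first, lowest ID first on ties,
# then keep only the first row of each partition.
query = """
SELECT ID, IDNEW, CallbackDate
FROM (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY IDNEW
               ORDER BY CallbackDate DESC, ID ASC
           ) AS row_num
    FROM CallbackNewID
) AS numbered
WHERE row_num = 1
ORDER BY IDNEW
"""
latest_rows = conn.execute(query).fetchall()
print(latest_rows)
```

Each IDNEW comes back once, with the lowest ID among the rows that share the latest CallbackDate.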

Related

How to pivot two rows into two columns

I have the following SQL Query:
select
distinct
Equipment_Reserved.Equipment_Attached_To,
Equipment.Name
from
Equipment,
Studies,
Equipment_Reserved
where
Studies.Study = 'MAINT19-01'
and
Equipment.idEquipment = Equipment_Reserved.Equipment_idEquipment
and
Studies.idStudies = Equipment_Reserved.Studies_idStudies
and
Equipment.Type = 'Probe'
This query produces the following results:
Equipment_Attached_To Name
2297 R1-P1
2297 R1-P2
2299 R1-P3
I would like to change it to the following:
Equipment_Attached_To Name1 Name2
2297 R1-P1 R1-P2
2299 R1-P3 NULL
Thanks for your help!

I'd first change your query from the old, legacy JOIN syntax to an explicit join as it makes the query easier to understand:
SELECT
DISTINCT
Equipment_Reserved.Equipment_Attached_To,
Equipment.Name
FROM
Equipment
INNER JOIN Equipment_Reserved ON Equipment_Reserved.Equipment_idEquipment = Equipment.idEquipment
INNER JOIN Studies ON Studies.idStudies = Equipment_Reserved.Studies_idStudies
WHERE
Studies.Study = 'MAINT19-01'
AND
Equipment.Type = 'Probe'
I don't think you actually need a PIVOT - I think you can do this with a nested query with the ROW_NUMBER function. I've seen that PIVOT queries often have worse query execution plans than nested-queries.
Let's add ROW_NUMBER (which requires an ORDER BY as it's a windowing function) and a matching ORDER BY on the whole query to keep it consistent. Let's also use PARTITION BY so the row number resets for each Equipment_Attached_To value:
SELECT
DISTINCT
Equipment_Reserved.Equipment_Attached_To,
Equipment.Name,
ROW_NUMBER() OVER (PARTITION BY Equipment_Attached_To ORDER BY [Name]) AS RowNumber
FROM
Equipment
INNER JOIN Equipment_Reserved ON Equipment_Reserved.Equipment_idEquipment = Equipment.idEquipment
INNER JOIN Studies ON Studies.idStudies = Equipment_Reserved.Studies_idStudies
WHERE
Studies.Study = 'MAINT19-01'
AND
Equipment.Type = 'Probe'
ORDER BY
Equipment_Attached_To,
[Name]
This will give output like this:
Equipment_Attached_To Name RowNumber
2297 R1-P1 1
2297 R1-P2 2
2299 R1-P3 1
This can then be split into explicit columns as shown below. The choice of MAX() is arbitrary (MIN() would work equally well): the GROUP BY requires some aggregate, and each CASE WHEN expression yields a non-NULL value for at most one row per group anyway.
SELECT
Equipment_Attached_To,
MAX( CASE WHEN RowNumber = 1 THEN [Name] END ) AS Name1,
MAX( CASE WHEN RowNumber = 2 THEN [Name] END ) AS Name2
FROM
(
-- the query from above
) AS numbered
GROUP BY
Equipment_Attached_To
ORDER BY
Equipment_Attached_To,
Name1,
Name2
So the final query is:
SELECT
Equipment_Attached_To,
MAX( CASE WHEN RowNumber = 1 THEN [Name] END ) AS Name1,
MAX( CASE WHEN RowNumber = 2 THEN [Name] END ) AS Name2
FROM
(
SELECT
DISTINCT
Equipment_Reserved.Equipment_Attached_To,
Equipment.Name,
ROW_NUMBER() OVER (PARTITION BY Equipment_Attached_To ORDER BY [Name]) AS RowNumber
FROM
Equipment
INNER JOIN Equipment_Reserved ON Equipment_Reserved.Equipment_idEquipment = Equipment.idEquipment
INNER JOIN Studies ON Studies.idStudies = Equipment_Reserved.Studies_idStudies
WHERE
Studies.Study = 'MAINT19-01'
AND
Equipment.Type = 'Probe'
) AS numbered
GROUP BY
Equipment_Attached_To
ORDER BY
Equipment_Attached_To,
Name1,
Name2
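
The ROW_NUMBER plus MAX(CASE ...) pattern can be checked on the sample data with a small sketch like the one below. It is not the original schema: `probes` is a hypothetical single table standing in for the three-table join, the database is in-memory SQLite (3.25 or later assumed), and one duplicate row is added on purpose. The DISTINCT is applied in an inner subquery before the rows are numbered, because a duplicate that is numbered first would get its own row number and survive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flattened table standing in for the Equipment/Studies join result.
conn.execute("CREATE TABLE probes (Equipment_Attached_To INTEGER, Name TEXT)")
conn.executemany(
    "INSERT INTO probes VALUES (?, ?)",
    [
        (2297, "R1-P1"),
        (2297, "R1-P2"),
        (2297, "R1-P1"),  # deliberate duplicate, removed by the inner DISTINCT
        (2299, "R1-P3"),
    ],
)

query = """
SELECT Equipment_Attached_To,
       MAX(CASE WHEN RowNumber = 1 THEN Name END) AS Name1,
       MAX(CASE WHEN RowNumber = 2 THEN Name END) AS Name2
FROM (
    SELECT Equipment_Attached_To,
           Name,
           ROW_NUMBER() OVER (
               PARTITION BY Equipment_Attached_To
               ORDER BY Name
           ) AS RowNumber
    FROM (SELECT DISTINCT Equipment_Attached_To, Name FROM probes) AS deduped
) AS numbered
GROUP BY Equipment_Attached_To
ORDER BY Equipment_Attached_To
"""
pivoted = conn.execute(query).fetchall()
print(pivoted)
```

A group with only one name yields NULL (None in Python) for Name2, which is the shape asked for in the question.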

Let's start with some basics.
To make the code easier to read, I added aliases to the tables using their initials.
Then, I converted the old join syntax, which is partly deprecated, to the explicit JOIN syntax that has been standard since 1992 (27 years and people still use the old syntax).
Finally, since there are at most 2 names per Equipment_Attached_To, we can use MIN and MAX to separate them into 2 columns.
And because we're using aggregate functions, we remove the DISTINCT and use GROUP BY.
The code now looks like this:
SELECT er.Equipment_Attached_To,
--Gets the first row for the id
MIN( e.Name) AS Name1,
--If the MAX is equal to the MIN, returns a NULL. If not, it returns the second value.
NULLIF( MAX(e.Name), MIN( e.Name)) AS Name2
FROM Equipment e
JOIN Equipment_Reserved er ON e.idEquipment = er.Equipment_idEquipment
JOIN Studies s ON s.idStudies = er.Studies_idStudies
WHERE s.Study = 'MAINT19-01'
AND e.Type = 'Probe'
GROUP BY er.Equipment_Attached_To;

Update table with another column in the same table

I have a table like this
Test_order
Order Num  Order ID  Prev Order ID
987Y7OP89  919325    0
987Y7OP90  1006626   919325
987Y7OP91  1029350   1006626
987Y7OP92  1756689   0
987Y7OP93  1756690   0
987Y7OP94  1950100   1756690
987Y7OP95  1977570   1950100
987Y7OP96  2160462   1977570
987Y7OP97  2288982   2160462
Target table should be like below,
Order Num  Order ID  Prev Order ID
987Y7OP89  919325    0
987Y7OP90  1006626   919325
987Y7OP91  1029350   1006626
987Y7OP92  1756689   1029350
987Y7OP93  1756690   1756689
987Y7OP94  1950100   1756690
987Y7OP95  1977570   1950100
987Y7OP96  2160462   1977570
987Y7OP97  2288982   2160462
987Y7OP97  2288900   2288982
Prev Order ID should be updated with the Order ID from the previous record from the same table.
I'm trying to build a dummy data set and update from it, but it's not working:
WITH A AS
(SELECT ORDER_NUM, ORDER_ID, PRIOR_ORDER_ID, ROWNUM RID1 FROM TEST_ORDER),
B AS
(SELECT ORDER_NUM, ORDER_ID, PRIOR_ORDER_ID, ROWNUM+1 RID2 FROM TEST_ORDER)
SELECT A.ORDER_NUM, B.ORDER_ID, A.PRIOR_ORDER_ID, B.PRIOR_ORDER_ID
FROM A, B WHERE RID1 = RID2

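You could use Oracle's analytic functions (also called window functions) to pick up the value from the previous order. Oracle does not accept an analytic function directly in the SET clause of an UPDATE, so compute LAG() in a subquery and apply it with MERGE:
MERGE INTO Test_Order t
USING (SELECT ORDER_ID, LAG(ORDER_ID, 1, 0) OVER (ORDER BY ORDER_NUM ASC) AS NEW_PRIOR_ID
FROM Test_Order) s
ON (t.ORDER_ID = s.ORDER_ID)
WHEN MATCHED THEN UPDATE SET t.PRIOR_ORDER_ID = s.NEW_PRIOR_ID
WHERE t.PRIOR_ORDER_ID = 0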
See here for the documentation on LAG()

In SQL Server you cannot use a window function directly in an UPDATE statement, and I'm not positive but I don't think you can in Oracle either. To get around that, you can update a CTE as follows (LAG requires SQL Server 2012 or later):
WITH cte AS (
SELECT
*
,NewPreviousOrderId = LAG(OrderId,1,0) OVER (ORDER BY OrderNum)
FROM
TableName
)
UPDATE cte
SET PrevOrderId = NewPreviousOrderId
And if you want to stick with the ROW_NUMBER route you were going this would be the way of doing it.
;WITH cte AS (
SELECT
*
,ROW_NUMBER() OVER (ORDER BY OrderNum) AS RowNum
FROM
TableName
)
UPDATE c1
SET PrevOrderId = c2.OrderId
FROM
cte c1
INNER JOIN cte c2
ON (c1.RowNum - 1) = c2.RowNum
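
For a quick check of the LAG-in-a-CTE idea, the sketch below applies it to the first six rows of the question's data in an in-memory SQLite database. This is an assumption-laden stand-in rather than SQL Server or Oracle syntax: SQLite (3.25 or later) does not allow updating a CTE directly, so the CTE named `lagged` (a hypothetical name) is read through a correlated subquery in the SET clause instead. Column names follow the question's own query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test_order (order_num TEXT, order_id INTEGER, prior_order_id INTEGER)"
)
conn.executemany(
    "INSERT INTO test_order VALUES (?, ?, ?)",
    [
        ("987Y7OP89", 919325, 0),
        ("987Y7OP90", 1006626, 919325),
        ("987Y7OP91", 1029350, 1006626),
        ("987Y7OP92", 1756689, 0),
        ("987Y7OP93", 1756690, 0),
        ("987Y7OP94", 1950100, 1756690),
    ],
)

# LAG() fetches the order_id of the previous row in order_num order (0 for the first row).
# The UPDATE then copies that value into prior_order_id, matching rows on order_id.
conn.execute(
    """
    WITH lagged AS (
        SELECT order_id,
               LAG(order_id, 1, 0) OVER (ORDER BY order_num) AS new_prior
        FROM test_order
    )
    UPDATE test_order
    SET prior_order_id = (
        SELECT new_prior FROM lagged WHERE lagged.order_id = test_order.order_id
    )
    """
)

updated = conn.execute(
    "SELECT order_id, prior_order_id FROM test_order ORDER BY order_num"
).fetchall()
print(updated)
```

The rows that previously held 0 in the middle of the table now point at the order before them, while the very first row keeps 0.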

Grouping by date range combined with another field?

In SQL Server 2008, I have something like the following:
Create table #RateHistory (RatePlan char(1), EventDate datetime)
Insert into #RateHistory (RatePlan, EventDate)
VALUES
('a','10/01/2013')
,('a','10/04/2013')
,('a','10/06/2013')
,('a','10/08/2013')
,('b','10/21/2013')
,('b','11/05/2013')
,('b','11/12/2013')
,('b','12/05/2013')
,('a','12/08/2013')
,('a','12/09/2013')
,('a','12/10/2013')
,('a','12/15/2013')
I'd like to see an output like this:
Rateplan MinDate MaxDate
-------- ----------- -----------
a 2013-10-01 2013-10-08
b 2013-10-21 2013-12-05
a 2013-12-08 2013-12-15
(originally this was a bit different, but I believe this result set makes it clearer what I actually need, which is the correct grouping)
Note that RatePlan "a" shows up twice, and that I want it to be grouped separately - once for the 10/1/2013 to 10/8/2013 data, and once for the 12/8/2013 to 12/15/2013 data. I've got the solution I need with this :
-- Get initial row numbers
;with Test as (
Select
*
,RowNumber = ROW_NUMBER() over (order by EventDate)
from #RateHistory
)
-- Flag the last row of each run of the same RatePlan
, Test2 as (
Select
Main.RowNumber
,Main.EventDate
,Main.RatePlan
,FollowingRatePlan = Following.RatePlan
,NewGroup =
case
when Main.RatePlan <> Following.RatePlan
-- if Following RatePlan is null, that means this is the last record
or (Following.RatePlan is null )
then Main.EventDate
else null
end
from Test Main
left join Test following
on Following.RowNumber = Main.RowNumber + 1
)
, Test3 as (
select
#RateHistory.RatePlan
,#RateHistory.EventDate
,MaxDate = min(Test2.NewGroup)
from #RateHistory
join Test2
on #RateHistory .RatePlan = Test2.RatePlan
and #RateHistory .EventDate <= Test2.NewGroup
where Test2.NewGroup is not null
group by
#RateHistory.RatePlan
,#RateHistory.EventDate
)
select Rateplan, MinDate = MIN(EventDate) , MaxDate
from Test3
group by RatePlan,MaxDate
...but I'm thinking - there's GOT to be a better, more elegant way of doing this. Thoughts? If nobody has anything better, I'll just go ahead and put this in as an answer...
Thanks!

I can think of a solution using correlated scalar sub-queries. You tell me if it's more elegant. Or better performing.
select distinct
rh0.RatePlan,
(
select min(EventDate)
from RateHistory rh1
where rh1.RatePlan = rh0.RatePlan
and rh1.EventDate <= rh0.EventDate
and not exists
(
select * from RateHistory rh2
where rh2.RatePlan != rh0.RatePlan
and rh2.EventDate > rh1.EventDate
and rh2.EventDate < rh0.EventDate
)
) as mindate,
(
select max(EventDate)
from RateHistory rh1
where rh1.RatePlan = rh0.RatePlan
and rh1.EventDate >= rh0.EventDate
and not exists
(
select * from RateHistory rh2
where rh2.RatePlan != rh0.RatePlan
and rh2.EventDate < rh1.EventDate
and rh2.EventDate > rh0.EventDate
)
) as maxdate
from RateHistory rh0
order by mindate
Check out the SQLFiddle. BTW, SQL Server 2012 has some cool features, such as LAG and LEAD, that could make your version of the query more elegant.
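
Another option that needs only ROW_NUMBER() (so it should also be usable on SQL Server 2008) is the common "difference of row numbers" technique for this kind of consecutive-run grouping. This is a different approach from the one in the answer above, sketched here against in-memory SQLite (3.25 or later assumed) with the question's data and ISO-formatted dates. The overall row number minus the per-RatePlan row number stays constant within each consecutive run, so it can serve as a grouping key (`grp` is a made-up name).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE RateHistory (RatePlan TEXT, EventDate TEXT)")
conn.executemany(
    "INSERT INTO RateHistory VALUES (?, ?)",
    [
        ("a", "2013-10-01"), ("a", "2013-10-04"), ("a", "2013-10-06"), ("a", "2013-10-08"),
        ("b", "2013-10-21"), ("b", "2013-11-05"), ("b", "2013-11-12"), ("b", "2013-12-05"),
        ("a", "2013-12-08"), ("a", "2013-12-09"), ("a", "2013-12-10"), ("a", "2013-12-15"),
    ],
)

# grp = (position among all rows) - (position among rows of the same RatePlan).
# Within one consecutive run both counters advance together, so grp is constant.
query = """
SELECT RatePlan, MIN(EventDate) AS MinDate, MAX(EventDate) AS MaxDate
FROM (
    SELECT RatePlan,
           EventDate,
           ROW_NUMBER() OVER (ORDER BY EventDate)
         - ROW_NUMBER() OVER (PARTITION BY RatePlan ORDER BY EventDate) AS grp
    FROM RateHistory
) AS numbered
GROUP BY RatePlan, grp
ORDER BY MinDate
"""
islands = conn.execute(query).fetchall()
print(islands)
```

Grouping by both RatePlan and grp matters: two different plans can share the same grp value (here the "b" run and the second "a" run both get 4), so grp alone would not separate them.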

Fastest way to check if the most recent result for a patient has a certain value

Mssql < 2005
I have a complex database with lots of tables, but for now only the patient table and the measurements table matter.
What I need is the number of patients where the most recent value of 'code' matches a certain value. Also, datemeasurement has to be after '2012-04-01'. I have solved this in two different ways:
SELECT
COUNT(P.patid)
FROM T_Patients P
WHERE P.patid IN (SELECT patid
FROM T_Measurements M WHERE (M.code ='xxxx' AND result= 'xx')
AND datemeasurement =
(SELECT MAX(datemeasurement) FROM T_Measurements
WHERE datemeasurement > '2012-01-04' AND patid = M.patid
GROUP BY patid)
GROUP by patid)
AND:
SELECT
COUNT(P.patid)
FROM T_Patients P
WHERE 1 = (SELECT TOP 1 case when result = 'xx' then 1 else 0 end
FROM T_Measurements M
WHERE (M.code ='xxxx') AND datemeasurement > '2012-01-04' AND patid = P.patid
ORDER by datemeasurement DESC
)
This works just fine, but it makes the query incredibly slow because the subquery has to be evaluated against the outer table for every row (if you know what I mean). The query takes 10 seconds without the most recent check, and 3 minutes with the most recent check.
I'm pretty sure this can be done a lot more efficiently, so please enlighten me if you will :).
I tried implementing HAVING datemeasurement=MAX(datemeasurement) but that keeps throwing errors at me.

So my approach would be to write a query just getting all the last patient results since 01-04-2012, and then filtering that for your codes and results. So something like
select
count(1)
from
T_Measurements M
inner join (
SELECT PATID, MAX(datemeasurement) as lastMeasuredDate from
T_Measurements M
where datemeasurement > '01-04-2012'
group by patID
) lastMeasurements
on lastMeasurements.lastmeasuredDate = M.datemeasurement
and lastMeasurements.PatID = M.PatID
where
M.Code = 'Xxxx' and M.result = 'XX'

The fastest way may be to use row_number():
SELECT COUNT(m.patid)
from (select m.*,
ROW_NUMBER() over (partition by patid order by datemeasurement desc) as seqnum
FROM T_Measurements m
where datemeasurement > '2012-01-04'
) m
where seqnum = 1 and code = 'XXX' and result = 'xx'
ROW_NUMBER() enumerates the records for each patient, most recent first, so the most recent gets a value of 1. The outer query then simply keeps those rows and filters on the result.
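
A small way to sanity-check the ROW_NUMBER() counting logic is to run it on a handful of made-up measurements in in-memory SQLite (3.25 or later assumed). All rows below are hypothetical. One deliberate choice, which is an assumption about the intent: the code filter sits inside the subquery, as in the question's TOP 1 version, so "most recent" means the most recent measurement for that code rather than the most recent measurement of any kind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE T_Measurements (patid INTEGER, code TEXT, result TEXT, datemeasurement TEXT)"
)
conn.executemany(
    "INSERT INTO T_Measurements VALUES (?, ?, ?, ?)",
    [
        (1, "xxxx", "yy", "2012-03-01"),
        (1, "xxxx", "xx", "2012-05-01"),  # latest 'xxxx' for patient 1: matches
        (1, "zzzz", "aa", "2012-07-01"),  # different code, ignored by the inner filter
        (2, "xxxx", "xx", "2012-02-01"),
        (2, "xxxx", "yy", "2012-06-01"),  # latest 'xxxx' for patient 2: no match
        (3, "xxxx", "xx", "2011-12-01"),  # before the cutoff date, excluded
        (4, "xxxx", "xx", "2012-04-10"),  # only measurement for patient 4: matches
    ],
)

# Rank each patient's qualifying measurements, newest first,
# then count the patients whose newest one has the wanted result.
query = """
SELECT COUNT(*)
FROM (
    SELECT patid,
           result,
           ROW_NUMBER() OVER (
               PARTITION BY patid
               ORDER BY datemeasurement DESC
           ) AS seqnum
    FROM T_Measurements
    WHERE code = 'xxxx'
      AND datemeasurement > '2012-01-04'
) AS m
WHERE seqnum = 1
  AND result = 'xx'
"""
patient_count = conn.execute(query).fetchone()[0]
print(patient_count)
```

Moving `code = 'xxxx'` to the outer WHERE instead would change the answer for patient 1, whose newest measurement overall has a different code.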

SQL ROW_NUMBER with INNER JOIN

I need to use ROW_NUMBER() in the following Query to return rows 5 to 10 of the result. Can anyone please show me what I need to do? I've been trying to no avail. If anyone can help I'd really appreciate it.
SELECT *
FROM villa_data
INNER JOIN villa_prices
ON villa_prices.starRating = villa_data.starRating
WHERE villa_data.capacity >= 3
AND villa_data.bedrooms >= 1
AND villa_prices.period = 'lowSeason'
ORDER BY villa_prices.price,
villa_data.bedrooms,
villa_data.capacity

You need to stick it in a table expression to filter on ROW_NUMBER. You won't be able to use * as it will complain about the column name starRating appearing more than once so will need to list out the required columns explicitly. This is better practice anyway.
WITH CTE AS
(
SELECT /*TODO: List column names*/
ROW_NUMBER()
OVER (ORDER BY villa_prices.price,
villa_data.bedrooms,
villa_data.capacity) AS RN
FROM villa_data
INNER JOIN villa_prices
ON villa_prices.starRating = villa_data.starRating
WHERE villa_data.capacity >= 3
AND villa_data.bedrooms >= 1
AND villa_prices.period = 'lowSeason'
)
SELECT /*TODO: List column names*/
FROM CTE
WHERE RN BETWEEN 5 AND 10
ORDER BY RN

You can use a with clause. Please try the following
WITH t AS
(
SELECT villa_data.starRating,
villa_data.capacity,
villa_data.bedrooms,
villa_prices.period,
villa_prices.price,
ROW_NUMBER() OVER (ORDER BY villa_prices.price,
villa_data.bedrooms,
villa_data.capacity ) AS 'RowNumber'
FROM villa_data
INNER JOIN villa_prices
ON villa_prices.starRating = villa_data.starRating
WHERE villa_data.capacity >= 3
AND villa_data.bedrooms >= 1
AND villa_prices.period = 'lowSeason'
)
SELECT *
FROM t
WHERE RowNumber BETWEEN 5 AND 10
ORDER BY RowNumber;
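
The paging pattern can be exercised end to end with the sketch below, which builds twelve hypothetical villas and three low-season prices in in-memory SQLite (3.25 or later assumed). The `villa_id` column and all the data are invented for the demonstration; the query shape follows the answers above, with the columns listed explicitly instead of using `*`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE villa_data (villa_id INTEGER, starRating INTEGER, capacity INTEGER, bedrooms INTEGER)"
)
conn.execute("CREATE TABLE villa_prices (starRating INTEGER, period TEXT, price INTEGER)")

# Twelve hypothetical villas: star ratings cycle 1, 2, 3; bedrooms equal the id.
villas = [(i, (i - 1) % 3 + 1, 3 + i, i) for i in range(1, 13)]
conn.executemany("INSERT INTO villa_data VALUES (?, ?, ?, ?)", villas)
conn.executemany(
    "INSERT INTO villa_prices VALUES (?, ?, ?)",
    [
        (1, "lowSeason", 100),
        (2, "lowSeason", 200),
        (3, "lowSeason", 300),
        (1, "highSeason", 150),  # filtered out by the period condition
    ],
)

# Number the joined rows in the requested sort order, then keep rows 5 to 10.
query = """
WITH numbered AS (
    SELECT villa_data.villa_id,
           villa_prices.price,
           ROW_NUMBER() OVER (
               ORDER BY villa_prices.price,
                        villa_data.bedrooms,
                        villa_data.capacity
           ) AS RN
    FROM villa_data
    INNER JOIN villa_prices
            ON villa_prices.starRating = villa_data.starRating
    WHERE villa_data.capacity >= 3
      AND villa_data.bedrooms >= 1
      AND villa_prices.period = 'lowSeason'
)
SELECT villa_id, price, RN
FROM numbered
WHERE RN BETWEEN 5 AND 10
ORDER BY RN
"""
page = conn.execute(query).fetchall()
print(page)
```

If several rows can tie on price, bedrooms and capacity, adding a unique column to the window's ORDER BY keeps the page boundaries stable between runs.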
