query without duplicates SQL - sql

I have a query that returns the rows with duplicate iin values. Now I need a query that returns the rows without duplicates. I have tried, but my attempt takes a long time to run. My query with duplicates:
SELECT
c.*
FROM
Clients c
INNER JOIN
(
SELECT
iin,
COUNT(iin) AS countIIN
FROM
Clients
GROUP BY
iin
HAVING
COUNT(iin) > 1
) cc
ON c.IIN = cc.IIN
ORDER BY
c.last_name DESC
I need the opposite of the query above: the rows whose iin is not duplicated.

You can use the query below to find only the unique records.
WITH CTE AS
(SELECT *, COUNT(IIN) OVER (PARTITION BY IIN) RECORDCOUNT FROM CLIENTS)
SELECT * FROM CTE WHERE RECORDCOUNT =1
Make sure to replace * with the required columns.
Also, if you want to fetch one record from each group of duplicates as well, you can use the query below:
WITH CTE AS
(SELECT *, ROW_NUMBER() OVER (PARTITION BY IIN ORDER BY IIN) RECORDCOUNT FROM CLIENTS)
SELECT * FROM CTE WHERE RECORDCOUNT =1
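The difference between the two queries above is easier to see on a few rows. Here is a minimal sketch using Python's built-in sqlite3 module, assuming an SQLite build with window functions (3.25 or later). The id column and the sample rows are made up for illustration, and the second query orders by id rather than IIN so that the surviving row is deterministic.

```python
import sqlite3

# Hypothetical sample data: iin 100 appears twice, 200 and 300 once each.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Clients (id INTEGER PRIMARY KEY, iin TEXT, last_name TEXT)")
conn.executemany(
    "INSERT INTO Clients (id, iin, last_name) VALUES (?, ?, ?)",
    [(1, "100", "Adams"), (2, "100", "Baker"), (3, "200", "Clark"), (4, "300", "Davis")],
)

# Rows whose iin occurs exactly once (the "anti-query").
only_unique = conn.execute(
    """
    WITH cte AS (
        SELECT id, iin, last_name,
               COUNT(iin) OVER (PARTITION BY iin) AS record_count
        FROM Clients
    )
    SELECT id, iin, last_name FROM cte
    WHERE record_count = 1
    ORDER BY last_name DESC
    """
).fetchall()

# One row per iin, keeping the lowest id within each group of duplicates.
one_per_iin = conn.execute(
    """
    WITH cte AS (
        SELECT id, iin, last_name,
               ROW_NUMBER() OVER (PARTITION BY iin ORDER BY id) AS rn
        FROM Clients
    )
    SELECT id, iin, last_name FROM cte
    WHERE rn = 1
    ORDER BY id
    """
).fetchall()

print(only_unique)   # rows with a non-duplicated iin
print(one_per_iin)   # one representative row per iin
```

The COUNT version drops both rows for iin 100, while the ROW_NUMBER version keeps one of them.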

To find duplicates in SQL, the ROW_NUMBER() function is the best option.
Please check the following query:
WITH [CTE NoDuplicates] AS
(
SELECT
RN = ROW_NUMBER() OVER (PARTITION BY iin ORDER BY last_name DESC),
*
FROM Clients
)
-- RN > 1 marks every row after the first in each iin group
DELETE FROM [CTE NoDuplicates] WHERE RN > 1

Related

How to get the records from inner query results with the MAX value

The results are below. I need to get the records (seller and purchaser) with the max count, grouped by purchaser (marked in yellow).
You can use window functions:
with q as (
<your query here>
)
select q.*
from (select q.*,
row_number() over (order by seller desc) as seqnum_s,
row_number() over (order by purchaser desc) as seqnum_p
from q
) q
where seqnum_s = 1 or seqnum_p = 1;
Try this:
SELECT COUNT,seller,purchaser FROM YourTable ORDER BY seller,purchaser DESC
SELECT T2.MaxCount,T2.purchaser,T1.Seller FROM <Yourtable> T1
Inner JOIN
(
Select Max(Count) as MaxCount, purchaser
FROM <Yourtable>
GROUP BY Purchaser
)T2
On T2.Purchaser=T1.Purchaser AND T2.MaxCount=T1.Count
First you select the Seller from the table, which gives you a list of all 5 sellers. Then you write another query that selects only the Purchaser and the Max(count), grouped by Purchaser, which gives you the two yellow-marked lines. Join the two queries on the fields Purchaser and Max(Count), and add the columns from the joined query to your first query.
I can't think of a faster way, but this works pretty fast even with rather large queries. You can further order the results as needed.
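The join described above can be tried on a small in-memory table. This is only a sketch using Python's sqlite3 module. The table name sales, the column name cnt (used instead of Count to avoid clashing with the function name) and the five sample rows are assumptions, since the original data was only shown as an image.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (seller TEXT, purchaser TEXT, cnt INTEGER)")
# Hypothetical data: five sellers, two purchasers.
conn.executemany(
    "INSERT INTO sales (seller, purchaser, cnt) VALUES (?, ?, ?)",
    [("s1", "p1", 3), ("s2", "p1", 7), ("s3", "p2", 5), ("s4", "p2", 2), ("s5", "p2", 1)],
)

# Inner query: the max count per purchaser.
# Outer join: pick the seller row(s) that carry that max count.
max_per_purchaser = conn.execute(
    """
    SELECT t2.max_count, t2.purchaser, t1.seller
    FROM sales t1
    INNER JOIN (
        SELECT MAX(cnt) AS max_count, purchaser
        FROM sales
        GROUP BY purchaser
    ) t2
      ON t2.purchaser = t1.purchaser
     AND t2.max_count = t1.cnt
    ORDER BY t2.purchaser
    """
).fetchall()

print(max_per_purchaser)
```

Note that if two sellers tie for the max count of a purchaser, this join returns both rows.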

Is there an optimised way in SQL Server to find the 2nd duplicate?

Is there a more optimised way to write this code in SQL Server? I am trying to find the 2nd duplicate:
WITH CTE AS (
SELECT *,
ROW_NUMBER() OVER(PARTITION BY id,AN_KEY ORDER BY [ENTITYID]) AS [rn]
FROM [data].[dbo].[TRANSFER]
)
select *
INTO dbo.#UpSingle
from CTE
where RN=2
UPDATE:
As GurV pointed out, this query doesn't solve the problem. It only gives you the (id, AN_KEY) groups that contain exactly two rows, not the row where the second duplicate lies.
I am just going to leave this here for reference purposes.
Original Answer
Why not try something like this from another SO post: Finding duplicate values in a SQL table
SELECT
id, AN_KEY, COUNT(*)
FROM
[data].[dbo].[TRANSFER]
GROUP BY
id, AN_KEY
HAVING
COUNT(*) = 2
I gather from your original SQL that the columns you want to group by are id and AN_KEY.
Here is another way to get the second duplicate row (in order of increasing ENTITYID, of course):
select *
from [data].[dbo].[TRANSFER] a
where [ENTITYID] = (
select min([ENTITYID])
from [data].[dbo].[TRANSFER] b
where [ENTITYID] > (
select min([ENTITYID])
from [data].[dbo].[TRANSFER] c
where b.id = c.id
and b.an_key = c.an_key
)
and a.id = b.id
and a.an_key = b.an_key
)
Provided there is an index on id, an_key and ENTITYID columns, performance of both your query and this should be acceptable.
Let me assume that this query does what you want:
WITH CTE AS (
SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY id, AN_KEY
ORDER BY [ENTITYID]) AS [rn]
FROM [data].[dbo].[TRANSFER] t
)
SELECT *
INTO dbo.#UpSingle
FROM CTE
WHERE RN = 2;
For performance, you want a composite index on [data].[dbo].[TRANSFER](id, AN_KEY, ENTITYID).
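To compare the ROW_NUMBER approach with the GROUP BY ... HAVING COUNT(*) = 2 attempt, here is a rough sketch in Python's sqlite3 (window functions need SQLite 3.25 or later). The table is simplified to three columns, the rows are invented, and the index name ix_transfer_dup is hypothetical. SQLite's planner is not SQL Server's, so this only illustrates the results, not the performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transfer (entityid INTEGER, id INTEGER, an_key TEXT)")
conn.executemany(
    "INSERT INTO transfer (entityid, id, an_key) VALUES (?, ?, ?)",
    [(10, 1, "A"), (11, 1, "A"), (12, 1, "A"), (13, 2, "B"), (14, 3, "C"), (15, 3, "C")],
)
# Composite index matching PARTITION BY (id, an_key) and ORDER BY entityid.
conn.execute("CREATE INDEX ix_transfer_dup ON transfer (id, an_key, entityid)")

# Second row of each (id, an_key) group, in entityid order.
second_rows = conn.execute(
    """
    WITH cte AS (
        SELECT entityid, id, an_key,
               ROW_NUMBER() OVER (PARTITION BY id, an_key ORDER BY entityid) AS rn
        FROM transfer
    )
    SELECT entityid, id, an_key FROM cte
    WHERE rn = 2
    ORDER BY entityid
    """
).fetchall()

# Groups that contain exactly two rows (misses the group with three).
exactly_two = conn.execute(
    """
    SELECT id, an_key, COUNT(*)
    FROM transfer
    GROUP BY id, an_key
    HAVING COUNT(*) = 2
    """
).fetchall()

print(second_rows)
print(exactly_two)
```

The group (1, 'A') has three rows, so it shows up in the ROW_NUMBER result but not in the HAVING COUNT(*) = 2 result.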

How to use distinct when you select multiple column in SQL

I have used a simple inner join statement and put the result into a CTE. I want to select distinct 'ServicesId' values from the CTE. I have the following query:
SELECT DISTINCT(ServicesId), ServiceNo, ServiceDate, DealerCode FROM CTE_Temp
If there are duplicate entries for a ServicesId in the CTE, I want to select the first entry only and ignore the rest.
You can use ROW_NUMBER() OVER() for this. Just replace the column in the ORDER BY to define what's first.
;WITH AnotherCTE AS(
SELECT
ServicesId, ServiceNo, ServiceDate, DealerCode,
RN = ROW_NUMBER() OVER(PARTITION BY ServicesID ORDER BY ServiceDate DESC)
FROM CTE_Temp
)
SELECT
ServicesId, ServiceNo, ServiceDate, DealerCode
FROM AnotherCTE
WHERE RN = 1

Select the first instance of a record

I have a table, myTable, that has two fields in it: ID and patientID. The same patientID can be in the table more than once with a different ID. How can I make sure that I get only ONE instance of every patientID?
EDIT: I know this isn't perfect design, but I need to get some info out of the database today and then fix it later.
You could use a CTE with ROW_NUMBER function:
WITH CTE AS(
SELECT myTable.*
, RN = ROW_NUMBER()OVER(PARTITION BY patientID ORDER BY ID)
FROM myTable
)
SELECT * FROM CTE
WHERE RN = 1
It sounds like you're looking for DISTINCT:
SELECT DISTINCT patientID FROM myTable
You can get the same effect with GROUP BY:
SELECT patientID FROM myTable GROUP BY patientID
The simple way would be to add LIMIT 1 to the end of your query. This will ensure only a single row is returned in the result set.
WITH CTE AS
(
SELECT tableName.*,ROW_NUMBER() OVER(PARTITION BY patientID ORDER BY patientID) As 'Position' FROM tableName
)
SELECT * FROM CTE
WHERE
Position = 1
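The answers above return different things, which a tiny example makes visible. Below is a sketch using Python's sqlite3 (SQLite 3.25 or later for the window function), with three invented rows. The LIMIT query is included only because SQLite accepts it; SQL Server uses TOP instead of LIMIT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (ID INTEGER, patientID TEXT)")
# Hypothetical rows: patient p1 appears twice, p2 once.
conn.executemany(
    "INSERT INTO myTable (ID, patientID) VALUES (?, ?)",
    [(1, "p1"), (2, "p1"), (3, "p2")],
)

# DISTINCT returns each patientID once, but without the ID column.
distinct_patients = conn.execute(
    "SELECT DISTINCT patientID FROM myTable ORDER BY patientID"
).fetchall()

# ROW_NUMBER keeps the whole row with the lowest ID per patientID.
first_rows = conn.execute(
    """
    WITH cte AS (
        SELECT ID, patientID,
               ROW_NUMBER() OVER (PARTITION BY patientID ORDER BY ID) AS rn
        FROM myTable
    )
    SELECT ID, patientID FROM cte
    WHERE rn = 1
    ORDER BY ID
    """
).fetchall()

# LIMIT 1 returns one row for the whole result set, not one per patient.
limit_one = conn.execute(
    "SELECT ID, patientID FROM myTable ORDER BY ID LIMIT 1"
).fetchall()

print(distinct_patients)
print(first_rows)
print(limit_one)
```

If only the patientID values are needed, DISTINCT is enough; if the matching ID is needed as well, the ROW_NUMBER form is the one that keeps it.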

Column is invalid error when using derived table

I'm using ROW_NUMBER() and a derived table to fetch data from the derived table result.
However, I get the error message telling me I don't have the appropriate columns in the GROUP BY clause.
Here's the error:
Column 'tblCompetition.objID' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause
What column am I missing? Or am I doing something else wrong? Find below the query that is not working, and the (more simple) query that is working.
SQL Server 2008.
Query that isn't working:
SELECT
objID,
objTypeID,
userID,
datAdded,
count,
sno
FROM
(
SELECT scc.objID,scc.objTypeID,scc.userID,scc.datAdded,
COUNT(sci.favID) as count,
ROW_NUMBER() OVER(PARTITION BY scc.userID ORDER BY scc.unqID DESC) as sno
FROM tblCompetition scc
LEFT JOIN tblFavourites sci
ON sci.favID = scc.objID
AND sci.datTimeStamp BETWEEN #datStart AND #datEnd
) as t
WHERE sno <= 2 AND objTypeID = #objTypeID
AND datAdded BETWEEN #datStart AND #datEnd
GROUP BY objID,objTypeID,userID,datAdded,count,sno
Simple query that is working:
SELECT objId,objTypeID,userId,datAdded FROM
(
SELECT objId,objTypeID,userId,datAdded,
ROW_NUMBER() OVER(PARTITION BY userId ORDER BY unqid DESC) as sno
FROM tblRdbCompetition
) as t
WHERE sno<=2 AND objtypeid=#objTypeID
AND datAdded BETWEEN #datStart AND #datEnd
Thank you!
You need the GROUP BY in your subquery, since that's where the aggregate is:
SELECT
objID,
objTypeID,
userID,
datAdded,
count,
sno
FROM
(
SELECT scc.objID,scc.objTypeID,scc.userID,scc.datAdded,
COUNT(sci.favID) as count,
ROW_NUMBER() OVER(PARTITION BY scc.userID ORDER BY scc.unqID DESC) as sno
FROM tblCompetition scc
LEFT JOIN tblFavourites sci
ON sci.favID = scc.objID
AND sci.datTimeStamp BETWEEN #datStart AND #datEnd
-- unqID is grouped as well because ROW_NUMBER() orders by it
GROUP BY scc.objID,scc.objTypeID,scc.userID,scc.datAdded,scc.unqID) as t
WHERE sno <= 2 AND objTypeID = #objTypeID
AND datAdded BETWEEN #datStart AND #datEnd
You cannot have count in a GROUP BY clause. In fact, the count is derived from the other fields in the GROUP BY. Remove count from your GROUP BY.
In the innermost query you are using
COUNT(sci.favID) as count,
which is an aggregate, and you select other non-aggregating columns along with it.
I believe you wanted an analytic COUNT instead:
SELECT objID,
objTypeID,
userID,
datAdded,
count,
sno
FROM (
SELECT scc.objID,scc.objTypeID,scc.userID,scc.datAdded,
COUNT(sci.favID) OVER (PARTITION BY scc.userID ) AS count,
ROW_NUMBER() OVER (PARTITION BY scc.userID ORDER BY scc.unqID DESC) as sno
FROM tblCompetition scc
LEFT JOIN
tblFavourites sci
ON sci.favID = scc.objID
AND sci.datTimeStamp BETWEEN #datStart AND #datEnd
) as t
WHERE sno = 1
AND objTypeID = #objTypeID
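The aggregate COUNT with GROUP BY and the analytic COUNT ... OVER do not return the same shape of result. Here is a small sketch in Python's sqlite3 (SQLite 3.25 or later) with cut-down, hypothetical versions of the two tables: the names comp and fav and all the rows are invented. SQLite is also more lenient than SQL Server about ungrouped columns, so this shows the results only, not the error message.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comp (objID INTEGER, userID TEXT, unqID INTEGER)")
conn.execute("CREATE TABLE fav (favID INTEGER)")
conn.executemany(
    "INSERT INTO comp (objID, userID, unqID) VALUES (?, ?, ?)",
    [(1, "u1", 1), (2, "u1", 2), (3, "u2", 3)],
)
# Object 1 was favourited twice, object 3 once, object 2 never.
conn.executemany("INSERT INTO fav (favID) VALUES (?)", [(1,), (1,), (3,)])

# Aggregate COUNT: one row per object, favourites counted per object.
grouped = conn.execute(
    """
    SELECT c.objID, c.userID, COUNT(f.favID) AS fav_count
    FROM comp c
    LEFT JOIN fav f ON f.favID = c.objID
    GROUP BY c.objID, c.userID
    ORDER BY c.objID
    """
).fetchall()

# Analytic COUNT: rows are not collapsed, favourites counted per user.
windowed = conn.execute(
    """
    SELECT c.objID, c.userID,
           COUNT(f.favID) OVER (PARTITION BY c.userID) AS fav_count
    FROM comp c
    LEFT JOIN fav f ON f.favID = c.objID
    ORDER BY c.objID
    """
).fetchall()

print(grouped)
print(windowed)
```

The windowed version keeps one row per joined favourite, so object 1 appears twice, and its count is the total for the user rather than for the object. Which of the two is wanted depends on what count is meant to measure.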