How to find the first duplicate row in a SQL Server table - sql

I am working on SQL Server. I have a table that contains around 75,000 records, several of which are duplicates. I wrote a query to find out how many times each record is repeated:
SELECT [RETAILERNAME],COUNT([RETAILERNAME]) as Repeated FROM [Stores] GROUP BY [RETAILERNAME]
It gives me a result like this:
RETAILERNAME | Repeated
---------------------------
X | 4
Y | 6
Z | 10
Of the 4 records for X, I need only the first one.
So I want to retrieve all fields from the first row of each set of duplicates. That is, selecting all records where RETAILERNAME='X' returns several duplicate rows, and I need only the first of them.
Please guide me.

You could try using ROW_NUMBER.
Something like
;WITH Vals AS (
    -- select every column, since the question asks for all fields
    SELECT *,
           ROW_NUMBER() OVER(PARTITION BY [RETAILERNAME] ORDER BY [RETAILERNAME]) AS RowID
    FROM [Stores]
)
SELECT *
FROM Vals
WHERE RowID = 1
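Because the window is ordered by the same column it is partitioned by, which row counts as "first" is arbitrary. A minimal sketch of a repeatable variant, assuming a hypothetical StoreID key column and a City column that are not part of the original question:

```sql
-- Hypothetical sample data; StoreID and City are assumed columns.
-- Multi-row VALUES needs SQL Server 2008 or later.
DECLARE @Stores TABLE (StoreID int, RETAILERNAME varchar(50), City varchar(50));
INSERT INTO @Stores VALUES
    (1, 'X', 'Austin'), (2, 'X', 'Boston'),
    (3, 'Y', 'Chicago'), (4, 'Y', 'Denver');

WITH Vals AS (
    SELECT s.*,
           -- ordering by a unique column makes "first" deterministic
           ROW_NUMBER() OVER (PARTITION BY s.RETAILERNAME
                              ORDER BY s.StoreID) AS RowID
    FROM @Stores AS s
)
SELECT StoreID, RETAILERNAME, City
FROM Vals
WHERE RowID = 1;
-- returns the rows with StoreID 1 and StoreID 3
```

Any column that expresses the intended order (an insert date, for example) could take the place of StoreID.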
You can then also remove the duplicates if need be (but be careful: this is permanent):
;WITH Vals AS (
SELECT [RETAILERNAME],
ROW_NUMBER() OVER(PARTITION BY [RETAILERNAME] ORDER BY [RETAILERNAME]) RowID
FROM Stores
)
DELETE
FROM Vals
WHERE RowID > 1;
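Since the delete cannot be undone once committed, one cautious approach is to run it inside a transaction, check how many rows it removed, and roll back. A sketch on a throwaway temp table (#StoresDemo and its rows are made up for illustration):

```sql
-- Hypothetical demo table; a temp table is used because table variables
-- are not affected by ROLLBACK.
CREATE TABLE #StoresDemo (RETAILERNAME varchar(50));
INSERT INTO #StoresDemo VALUES ('X'), ('X'), ('Y');

BEGIN TRANSACTION;

WITH Vals AS (
    SELECT RETAILERNAME,
           ROW_NUMBER() OVER (PARTITION BY RETAILERNAME
                              ORDER BY RETAILERNAME) AS RowID
    FROM #StoresDemo
)
DELETE FROM Vals
WHERE RowID > 1;

SELECT @@ROWCOUNT AS DeletedRows;   -- 1: one of the two X rows

ROLLBACK TRANSACTION;               -- use COMMIT TRANSACTION to keep the change

SELECT COUNT(*) AS RowsAfterRollback FROM #StoresDemo;   -- 3 again
DROP TABLE #StoresDemo;
```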

You can write the query as follows:
SELECT TOP 1 * FROM [Stores]
WHERE [RETAILERNAME] = 'X' -- your condition

WITH cte
AS (SELECT [retailername],
Row_number()
OVER(
partition BY [retailername]
ORDER BY [retailername]) AS RowRank
FROM [Stores])
SELECT *
FROM cte
WHERE RowRank = 1

Related

SQL query looping for each value in a list

New to SQL here. I am trying to get one row from a table that matches particular criteria.
Typically this would look like
SELECT TOP 1 *
FROM myTable
WHERE id = 'abc'
The output may look like
value id
--------------
1 abc
The table has many entries for an 'id', and I am trying to get one entry per 'id'. Now I have a list of 'id's. How would I execute something like
SELECT TOP 1 *
FROM myTable
FOR EACH id
WHERE id IN ('abc', 'edf', 'fgh')
Expecting result like
value id
--------------
1 abc
10 edf
12 fgh
I do not know if it is some sort of union or concat operation, but I would like to learn. I am working on Azure SQL Server.
A typical method is row_number():
select t.*
from (select t.*,
row_number() over (partition by id order by id) as seqnum
from mytable t
) t
where seqnum = 1;
Note: you can filter on particular ids, if you want. It is unclear if that is really required for your question.
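The filter mentioned in the note goes inside the subquery, so row numbers are computed only for the ids of interest. A minimal sketch using a table variable with made-up rows (the sample values and the ORDER BY value tiebreak are assumptions, not part of the question):

```sql
-- Hypothetical sample data standing in for myTable.
DECLARE @myTable TABLE (value int, id varchar(10));
INSERT INTO @myTable VALUES
    (1, 'abc'), (2, 'abc'), (10, 'edf'), (12, 'fgh'), (7, 'xyz');

select t.value, t.id
from (select t.*,
             -- assumed tiebreak: lowest value first within each id
             row_number() over (partition by id order by value) as seqnum
      from @myTable t
      where id in ('abc', 'edf', 'fgh')   -- restrict to the listed ids
     ) t
where seqnum = 1;
-- returns (1, abc), (10, edf), (12, fgh)
```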
If you happen to be using SQL Server (as select top suggests), you can use the more concise, but somewhat less performant:
select top (1) with ties t.*
from mytable t
order by row_number() over (partition by id order by (select null));

Select multiple columns having distinct just in 3 of them

I've got a table from which I need to return about 14 column values, but only one row for each set of duplicates on some of the columns.
The second problem is that, among the duplicates, I need to keep the one with the biggest int in one of the columns that is not required to be unique.
Since the table is fairly big, I am seeking advice on doing this in the most efficient way.
Should I be doing a GROUP BY?
My table is somewhat like this; I will simplify the number of columns.
ID (UniqueIdentifier) | ACCID (UniqueIdentifier) | DateTime (DateTime) | distance (int) | type (int)
28761188-0886-E911-822F-DD1FA635D450 | 1238FD8A-BD00-411A-A81C-0F6F5C026BCC | 2019-06-03 14:04:41.000 | 2 | 3
41761188-0886-E911-822F-DD1FA635D450 | 1238FD8A-BD00-411A-A81C-0F6F5C026BCC | 2019-06-03 14:04:41.000 | 1 | 3
I should select only one row per unique combination of ACCID and DateTime. The ID column is the primary key, so it will never be duplicated, and I need to keep the row with the biggest distance.
You can use the ROW_NUMBER() window function, as in:
select *
from (
select
id,
accid,
datetime,
distance,
type,
row_number() over(partition by accid, datetime order by distance desc) as rn
from t
) x
where rn = 1
If you want to show multiple "ties", then replace ROW_NUMBER() by RANK().
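To illustrate the difference, here is a sketch with RANK(), ordering by distance as the question requires. The table variable, the dt column name and the integer ids are simplifications made up for the example:

```sql
-- Hypothetical sample data: rows 1 and 2 tie on the largest distance.
DECLARE @t TABLE (id int, accid int, dt datetime, distance int);
INSERT INTO @t VALUES
    (1, 100, '2019-06-03T14:04:41', 2),
    (2, 100, '2019-06-03T14:04:41', 2),
    (3, 100, '2019-06-03T14:04:41', 1);

select id, accid, dt, distance
from (
    select t.*,
           -- rank() gives tied rows the same number, so both survive the filter
           rank() over (partition by accid, dt order by distance desc) as rnk
    from @t t
) x
where rnk = 1;
-- returns ids 1 and 2; row_number() would return only one of them
```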
I would suggest a correlated subquery with the right index as the fastest method:
select t.*
from t
where t.id = (select top (1) t2.id
              from t t2
              where t2.ACCID = t.ACCID
                and t2.[DateTime] = t.[DateTime]
              order by t2.distance desc
             ) ;
The best index is on (ACCID, DateTime, distance desc, id).
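If the duplicates are defined by both ACCID and DateTime, as the question states, the index would lead with both of those columns so that the subquery can seek straight to the largest distance. A sketch on a temp table that mirrors the simplified schema (the table name #t and the index name are hypothetical):

```sql
-- Hypothetical stand-in for the real table, using the question's columns.
create table #t (
    id         uniqueidentifier primary key,
    ACCID      uniqueidentifier,
    [DateTime] datetime,
    distance   int,
    [type]     int
);

-- Equality columns first, then the sort column in descending order,
-- then id so the subquery can be answered from the index alone.
create index IX_t_accid_datetime_distance
    on #t (ACCID, [DateTime], distance desc, id);

drop table #t;
```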

Permuting values in SQL

Let's say I have a table with two columns:
id | value
----------
1 | 101
2 | 356
3 | 28
I need to randomly permute the value column so that each id is randomly assigned a new value from the existing set {101,356,28}. How could I do this in Oracle SQL?
It may sound odd but this is a real problem, just with more columns.
You can do this by using row_number() with a random number generator and then joining back to the original rows:
with cte as (
select id, value,
row_number() over (order by id) as i,
row_number() over (order by dbms_random.random) as rand_i
from table t
)
select cte.id, cte1.value
from cte join
cte cte1
on cte.i = cte1.rand_i;
This guarantees a permutation (i.e. no original row has its value used twice).
EDIT:
By the way, if the original ids are sequential from 1 and have no gaps, you could just do:
select row_number() over (order by dbms_random.random) as id, value
from table t;
Another option: select * from x_table where id = trunc(dbms_random.value() * 3) + 1; (Here 3 is the number of rows in your data table, and I am assuming that id is incremental and unique.)
I'll think of other options.
I'm not sure whether this is the right task for an SQL database. Maybe you should implement something like this:
A factoradic permutation in PL/SQL, and then return a cursor via the PIPE ROW construct. Ordering by dbms_random might be slow for large data sets.

SQL Separating Distinct Values using single column

Does anyone happen to know a way of taking the DISTINCT keyword but applying it to a single column only? For lack of a better example, something similar to this:
Select (Distinct ID), Name, Term from Table
So it would get rid of rows with duplicate IDs but still use the other columns' information. I would use DISTINCT on the full query, but the rows all differ because of the data in certain columns. And I would need to output only the topmost term of the two duplicates:
ID Name Term
1 Suzy A
1 Suzy B
2 John A
2 John B
3 Pete A
4 Carl A
5 Sally B
Any suggestions would be helpful.
select t.ID, t.Name, min(t.Term) as Term
from Table t
group by t.ID, t.Name
You can use ROW_NUMBER for this:
Select ID, Name, Term from (
Select ID, Name, Term, ROW_NUMBER()
OVER (PARTITION BY ID order by Term) as rn from Table
) as tbl
Where rn = 1
The ORDER BY inside OVER determines which row is picked first in each group.
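As a quick illustration of that last point, reversing the sort direction changes which row survives. The sketch below loads the question's sample rows into a table variable (the name @Students is made up) and orders by Term descending:

```sql
-- Sample rows from the question; @Students is a hypothetical name.
DECLARE @Students TABLE (ID int, Name varchar(20), Term char(1));
INSERT INTO @Students VALUES
    (1, 'Suzy', 'A'), (1, 'Suzy', 'B'),
    (2, 'John', 'A'), (2, 'John', 'B'),
    (3, 'Pete', 'A'), (4, 'Carl', 'A'), (5, 'Sally', 'B');

select ID, Name, Term
from (
    select ID, Name, Term,
           -- desc puts Term B first within each ID
           row_number() over (partition by ID order by Term desc) as rn
    from @Students
) as tbl
where rn = 1;
-- Suzy and John come back with Term B; with "order by Term" they would have Term A
```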

Deletion of duplicate records using one query only

I am using SQL Server 2005.
I have a table like this:
ID Name
1 a
1 a
1 a
2 b
2 b
3 c
4 d
4 d
In this table, I want to delete all duplicate entries and retain only one instance of each, like this:
ID Name
1 a
2 b
3 c
4 d
I can do this easily by adding another identity column to this table and having unique numbers in it and then deleting the duplicate records. However I want to know if I can delete the duplicate records without adding that additional column to this table.
Additionally, can this be done using only one query statement, i.e. without using stored procedures or temp tables?
Using a ROW_NUMBER in a CTE allows you to delete duplicate values while retaining unique rows.
WITH q AS (
SELECT RN = ROW_NUMBER() OVER (PARTITION BY ID ORDER BY ID )
, ID
, Name
FROM ATable
)
DELETE FROM q WHERE RN > 1
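To see the effect end to end, the same pattern can be tried on the question's sample rows. This is only a demonstration sketch: the table variable @ATable stands in for the real table, and partitioning by both ID and Name is an assumption about what counts as a duplicate.

```sql
-- Sample rows from the question. Multi-row VALUES needs SQL Server 2008
-- or later; on 2005, use one INSERT per row.
DECLARE @ATable TABLE (ID int, Name varchar(10));
INSERT INTO @ATable VALUES
    (1, 'a'), (1, 'a'), (1, 'a'),
    (2, 'b'), (2, 'b'),
    (3, 'c'),
    (4, 'd'), (4, 'd');

WITH q AS (
    SELECT RN = ROW_NUMBER() OVER (PARTITION BY ID, Name ORDER BY ID)
         , ID
         , Name
    FROM @ATable
)
DELETE FROM q WHERE RN > 1;   -- removes every copy after the first

SELECT ID, Name FROM @ATable ORDER BY ID;
-- leaves (1, a), (2, b), (3, c), (4, d)
```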
Lieven is right. However, you may want to tweak Lieven's code by adding a TOP clause to the DELETE statement; note that this removes only one duplicate row per execution:
delete top(1) from q where RN > 1;
Hope this helps
You may use this query:
delete a from
(select id,name, ROW_NUMBER() over (partition by id,name order by id) row_Count
from dup_table) a
where a.row_Count >1
delete from table1
USING table1, table1 as vtable
WHERE (NOT table1.ID=vtable.ID)
AND (table1.Name=vtable.Name)
DELETE FROM tbl
WHERE ID NOT IN (
SELECT MIN(ID)
FROM tbl
GROUP BY Name
)