Removing duplicate rows but retaining unique copy - sql

I got a SQL table of the form:-
Name Age
Kim 5
Tom 8
Jim 12
Kim 5
David 21
Jim 12
In the above scenario, Kim and Jim are duplicated and they exist twice.
If I have a large table of the above form, with about 60000 entries, how do I write a sql query to delete only the duplicates but still retain the unique ones?

With the help of Common Table Expression and a Window Function, you can easily delete duplicate records.
WITH dups
AS
(
SELECT Name, Age,
ROW_NUMBER() OVER (PARTITION BY Name, Age ORDER BY Age DESC) rn
FROM TableName
)
DELETE FROM dups
WHERE rn > 1
SQLFiddle Demo
Ranking Functions (Transact-SQL)

Related

Display DISTINCT value on SQL statement by column condition

i'm introducing you the problem with DISTINCT values by column condition i have dealt with and can't provide
any idea how i can solve it.
So. The problem is i have two Stephen here declared , but i don't want duplicates:
**
The problem:
**
id vehicle_id worker_id user_type user_fullname
9 1 NULL external_users John Dalton
10 1 16 employees Mike
11 1 1 employees Stephen
12 2 173 employee Nicholas
13 2 1 employee Stephen
14 1 NULL external_users Peter
**
The desired output:**
id vehicle_id worker_id user_type user_fullname
9 1 NULL external_users John Dalton
10 1 16 employees Mike
12 2 173 employee Nicholas
13 2 1 employee Stephen
14 1 NULL external_users Peter
I have tried CASE statements but without success. When i group by it by worker_id,
it removes another duplicates, so i figured out it needs to be grouped by some special condition?
If anyone can provide me some hint how i can solve this problem , i will be very grateful.
Thank's!
There are no duplicate rows in this table. Just because Stephen appears twice doesn't make them duplicates because the ID, VEHICLE_ID, and USER_TYPE are different.
What you need to do is decide how you want to identify the Stephen record you wish to see in the output. Is it the one with the highest VEHICLE_ID? The "latest" record, i.e. the one with the highest ID?
You will use that rule in a window function to order the rows within your criteria, and then use that row number to filter down to the results you want. Something like this:
select id, vehicle_id, worker_id, user_type, user_fullname
from (
select id, vehicle_id, worker_id, user_type, user_fullname,
row_number() over (partition by worker_id, user_fullname order by id desc) n
from user_vehicle
) t
where t.n = 1

SQL field default "count(another_field) +1"

I need to create a field COUNT whose default value is the automatically generated count of times NAME has appeared in that table till now, as shown in example below. Since i am adding the field to an existing table, i also need to populate existing rows. How best to go about this please?
ID NAME COUNT
1 peter 1
2 jane 1
3 peter 2
4 peter 3
5 frank 1
6 jane 2
7 peter 4
You would do this when you are querying the table, using the ANSI-standard row-number function:
select id, name, row_number() over (partition by name order by id) as seqnum
from t;

Is it possible to SELECT N first rows from database using ROWNUM if some rows have the same value?

I'm trying to select N first rows from a database using ROWNUM. The problem is that if I want to select 6 first values of age with names of people with this age and some of the have the same value of age not every person will be shown.
For example
name age
Peter 15
Mark 22
Kelly 17
Mike 17
George 17
If I want to show people with 2 biggest values of age ROWNUM will show Mark and Kelly or Mike or George. Not all three of them. The expected result is:
Name age
Mark 22
Kelly 17
Mike 17
George 17
Is it possible to get this result using ROWNUM? Not RANK, DENSE_RANK, JOIN, correlated subquery but ROWNUM?
You can try something like this:
select *
from test
where age in (
select age
from (
select age
from test
group by age
order by age desc
)
where rownum <=2
)
The right solution dense_rank(), but you can do it with just row_number() and some subqueries:
select t.*
from t
where t.age in (select age
from (select age
from t t2
order by age desc
) x
where rownum <= 2
);
In Oracle 12+, you can simplify this:
select t.*
from t
where t.age in (select age
from t t2
order by age desc
fetch first 2 rows only
);

SQLite selecting multiple sets of rows

I'm using SQLite, trying to select multiple sets of rows. Example:
RowID USER Event
------------------
1 Sam eventX
2 Sam eventY
3 Sam eventA
4 John E1
5 John E5
6 Lisa ev3
7 Lisa ev4
8 Lisa ev3
I want to select one entry per USER, the one with the highest rowid.
The result set should look like:
RowID USER Event
------------------
3 Sam eventA
5 John E5
8 Lisa ev3
Please help.
Mean and lean:
SELECT * FROM Table1 WHERE RowID IN (
SELECT MAX(RowID) FROM Table1 GROUP BY User
);
select max(RowID),User,Event from Info group by User order by max(RowID)
Info Table
Result
Please learn Aggregate function for SQL Lite
select max(RowID) as RowID,User,Event
from Table1
group by User
order by max(RowID)
SQL FIDDLE
SELECT rowid, user, event FROM table WHERE rowid IN (SELECT MAX(rowid) FROM table GROUP BY user);

selecting top N rows for each group in a table

I am facing a very common issue regarding "Selecting top N rows for each group in a table".
Consider a table with id, name, hair_colour, score columns.
I want a resultset such that, for each hair colour, get me top 3 scorer names.
To solve this i got exactly what i need on Rick Osborne's blogpost "sql-getting-top-n-rows-for-a-grouped-query"
That solution doesn't work as expected when my scores are equal.
In above example the result as follow.
id name hair score ranknum
---------------------------------
12 Kit Blonde 10 1
9 Becca Blonde 9 2
8 Katie Blonde 8 3
3 Sarah Brunette 10 1
4 Deborah Brunette 9 2 - ------- - - > if
1 Kim Brunette 8 3
Consider the row 4 Deborah Brunette 9 2. If this also has same score (10) same as Sarah, then ranknum will be 2,2,3 for "Brunette" type of hair.
What's the solution to this?
If you're using SQL Server 2005 or newer, you can use the ranking functions and a CTE to achieve this:
;WITH HairColors AS
(SELECT id, name, hair, score,
ROW_NUMBER() OVER(PARTITION BY hair ORDER BY score DESC) as 'RowNum'
)
SELECT id, name, hair, score
FROM HairColors
WHERE RowNum <= 3
This CTE will "partition" your data by the value of the hair column, and each partition is then order by score (descending) and gets a row number; the highest score for each partition is 1, then 2 etc.
So if you want to the TOP 3 of each group, select only those rows from the CTE that have a RowNum of 3 or less (1, 2, 3) --> there you go!
The way the algorithm comes up with the rank, is to count the number of rows in the cross-product with a score equal to or greater than the girl in question, in order to generate rank. Hence in the problem case you're talking about, Sarah's grid would look like
a.name | a.score | b.name | b.score
-------+---------+---------+--------
Sarah | 9 | Sarah | 9
Sarah | 9 | Deborah | 9
and similarly for Deborah, which is why both girls get a rank of 2 here.
The problem is that when there's a tie, all girls take the lowest value in the tied range due to this count, when you'd want them to take the highest value instead. I think a simple change can fix this:
Instead of a greater-than-or-equal comparison, use a strict greater-than comparison to count the number of girls who are strictly better. Then, add one to that and you have your rank (which will deal with ties as appropriate). So the inner select would be:
SELECT a.id, COUNT(*) + 1 AS ranknum
FROM girl AS a
INNER JOIN girl AS b ON (a.hair = b.hair) AND (a.score < b.score)
GROUP BY a.id
HAVING COUNT(*) <= 3
Can anyone see any problems with this approach that have escaped my notice?
Use this compound select which handles OP problem properly
SELECT g.* FROM girls as g
WHERE g.score > IFNULL( (SELECT g2.score FROM girls as g2
WHERE g.hair=g2.hair ORDER BY g2.score DESC LIMIT 3,1), 0)
Note that you need to use IFNULL here to handle case when table girls has less rows for some type of hair then we want to see in sql answer (in OP case it is 3 items).