i'm introducing you the problem with DISTINCT values by column condition i have dealt with and can't provide
any idea how i can solve it.
So. The problem is i have two Stephen here declared , but i don't want duplicates:
**
The problem:
**
id vehicle_id worker_id user_type user_fullname
9 1 NULL external_users John Dalton
10 1 16 employees Mike
11 1 1 employees Stephen
12 2 173 employee Nicholas
13 2 1 employee Stephen
14 1 NULL external_users Peter
**
The desired output:**
id vehicle_id worker_id user_type user_fullname
9 1 NULL external_users John Dalton
10 1 16 employees Mike
12 2 173 employee Nicholas
13 2 1 employee Stephen
14 1 NULL external_users Peter
I have tried CASE statements but without success. When i group by it by worker_id,
it removes another duplicates, so i figured out it needs to be grouped by some special condition?
If anyone can provide me some hint how i can solve this problem , i will be very grateful.
Thank's!
There are no duplicate rows in this table. Just because Stephen appears twice doesn't make them duplicates because the ID, VEHICLE_ID, and USER_TYPE are different.
What you need to do is decide how you want to identify the Stephen record you wish to see in the output. Is it the one with the highest VEHICLE_ID? The "latest" record, i.e. the one with the highest ID?
You will use that rule in a window function to order the rows within your criteria, and then use that row number to filter down to the results you want. Something like this:
select id, vehicle_id, worker_id, user_type, user_fullname
from (
select id, vehicle_id, worker_id, user_type, user_fullname,
row_number() over (partition by worker_id, user_fullname order by id desc) n
from user_vehicle
) t
where t.n = 1
Related
I have 2 tables in the following way
Table 1:
e_id e_name e_salary e_age e_gender e_dept
---------------------------------------------------
1 sam 95000 45 male operations
2 bob 80000 21 male support
3 ann 125000 25 female analyst
Table 2:
d_salary d_age d_gender e_dept
----------------------------------
34000 25 male Admin
56000 41 female Tech
77000 35 female HR
I want the output something like this:
e_id e_name e_salary e_age e_gender e_dept d_salary d_age d_gender e_dept
1 sam 95000 45 male operations 34000 25 male Admin
2 bob 80000 21 male support 56000 41 female Tech
3 ann 125000 25 female analysts 77000 35 female HR
There is no dependency between the tables. No common columns. No primary or foreign key.
I tried using cross join that results in duplicate rows because it works on M X N
I am new to this SQL thing. Can someone help me, please? Thanks in advance
Though I didn't get the reason behind your desired output but you can get that with below query:
select a.e_id ,a.e_name ,a.e_salary ,a.e_age ,a.e_gender ,a.e_dept,b.d_salary ,b.d_age ,b.d_gender ,b.e_dept
from
(select e_id ,e_name ,e_salary ,e_age ,e_gender ,e_dept, row_number()over(order by e_id)rn
from table1)a
inner join
(select d_salary d_age d_gender e_dept,row_number()over(order by d_salary) rn
from table 2) b
on a.rn=b.rn
Generally you can create a row count using the row_number() window function on both tables and use this as join criterion. But this requires a certain order for both tables, which means that you have explicitly tell the query why is the Admin record ordered first and must be joined on the first record of table 1:
SELECT
*
FROM (
SELECT
*,
row_number() OVER (ORDER BY e_id) as row_count -- assuming e_id is your order criterion
FROM table1
) t1
JOIN (
SELECT
*,
row_number() OVER (ORDER BY /*whatever you expect to be ordered*/) as row_count
FROM table2
) t2
ON t1.row_count = t2.row_count
I got the following records where different people have the same name:
id name category_id birthdate family_id
1 joe 2 2014-05-01 1
2 jack 3 2013-04-01 2
3 joe 2 1964-05-01 1
4 jack 5 1982-05-01 2
5 emma 1 2014-05-01 1
6 joe 3 2003-07-06 3
Now I need a query which results to the following. I want only each name once per family_id. I need all values of each record afterwards including the id. In case the table gets further rows down the road I need them too. So the result should include all values.
id name category_id birthdate family_id
1 joe 2 2014-05-01 1
2 jack 3 2013-04-01 2
5 emma 1 2014-05-01 1
6 joe 3 2003-07-06 3
I tried it with several approaches (GROUP BY, DISTINCT, DISTINCT ON etc.) but none was working out for me.
When I use a GROUP BY clause (GROUP BY name) I get a ERROR: column "deals.id" must appear in the GROUP BY clause or be used in an aggregate function. But when I put id inside the clause I get all records back.
Same with distinct. There I have to choose all fields on which the result set should be distinct. But I need all values of the record. And because of the primary each record is distinct when i include the id.
I tried it with a sub clause which has filtered all distinct names. I used them in a where clause. But I got all values back including (of course) the not distinct name/family_id records.
Has anybody a helping hand for me?
You might not of specified all of the fields in your group by and if you included id, then that would make the rows unique.
Try something like:
SELECT
name, category_id, birthdate, family_id
FROM family
GROUP BY
name, category_id, birthdate, family_id;
It worked with DISTINCT ON.
The following worked quite well:
SELECT DISTINCT ON (table.name, table.family_id) table.* FROM table;
The only thing I have to check is the ordering. But I wanted to share my solution so far.
I got a SQL table of the form:-
Name Age
Kim 5
Tom 8
Jim 12
Kim 5
David 21
Jim 12
In the above scenario, Kim and Jim are duplicated and they exist twice.
If I have a large table of the above form, with about 60000 entries, how do I write a sql query to delete only the duplicates but still retain the unique ones?
With the help of Common Table Expression and a Window Function, you can easily delete duplicate records.
WITH dups
AS
(
SELECT Name, Age,
ROW_NUMBER() OVER (PARTITION BY Name, Age ORDER BY Age DESC) rn
FROM TableName
)
DELETE FROM dups
WHERE rn > 1
SQLFiddle Demo
Ranking Functions (Transact-SQL)
How can retrieve that data:
Name Title Profit
Peter CEO 2
Robert A.D 3
Michael Vice 5
Peter CEO 4
Robert Admin 5
Robert CEO 13
Adrin Promotion 8
Michael Vice 21
Peter CEO 3
Robert Admin 15
to get this:
Peter........4
Robert.......15
Michael......21
Adrin........8
I want to get the highest profit value from each name.
If there are multiple equal names always take the highest value.
select name,max(profit) from table group by name
Since this type of request almost always follows with "now can I include the title?" - here is a query that gets the highest profit for each name but can include all the other columns without grouping or applying arbitrary aggregates to those other columns:
;WITH x AS
(
SELECT Name, Title, Profit, rn = ROW_NUMBER()
OVER (PARTITION BY Name ORDER BY Profit DESC)
FROM dbo.table
)
SELECT Name, Title, Profit
FROM x
WHERE rn = 1;
I am facing a very common issue regarding "Selecting top N rows for each group in a table".
Consider a table with id, name, hair_colour, score columns.
I want a resultset such that, for each hair colour, get me top 3 scorer names.
To solve this i got exactly what i need on Rick Osborne's blogpost "sql-getting-top-n-rows-for-a-grouped-query"
That solution doesn't work as expected when my scores are equal.
In above example the result as follow.
id name hair score ranknum
---------------------------------
12 Kit Blonde 10 1
9 Becca Blonde 9 2
8 Katie Blonde 8 3
3 Sarah Brunette 10 1
4 Deborah Brunette 9 2 - ------- - - > if
1 Kim Brunette 8 3
Consider the row 4 Deborah Brunette 9 2. If this also has same score (10) same as Sarah, then ranknum will be 2,2,3 for "Brunette" type of hair.
What's the solution to this?
If you're using SQL Server 2005 or newer, you can use the ranking functions and a CTE to achieve this:
;WITH HairColors AS
(SELECT id, name, hair, score,
ROW_NUMBER() OVER(PARTITION BY hair ORDER BY score DESC) as 'RowNum'
)
SELECT id, name, hair, score
FROM HairColors
WHERE RowNum <= 3
This CTE will "partition" your data by the value of the hair column, and each partition is then order by score (descending) and gets a row number; the highest score for each partition is 1, then 2 etc.
So if you want to the TOP 3 of each group, select only those rows from the CTE that have a RowNum of 3 or less (1, 2, 3) --> there you go!
The way the algorithm comes up with the rank, is to count the number of rows in the cross-product with a score equal to or greater than the girl in question, in order to generate rank. Hence in the problem case you're talking about, Sarah's grid would look like
a.name | a.score | b.name | b.score
-------+---------+---------+--------
Sarah | 9 | Sarah | 9
Sarah | 9 | Deborah | 9
and similarly for Deborah, which is why both girls get a rank of 2 here.
The problem is that when there's a tie, all girls take the lowest value in the tied range due to this count, when you'd want them to take the highest value instead. I think a simple change can fix this:
Instead of a greater-than-or-equal comparison, use a strict greater-than comparison to count the number of girls who are strictly better. Then, add one to that and you have your rank (which will deal with ties as appropriate). So the inner select would be:
SELECT a.id, COUNT(*) + 1 AS ranknum
FROM girl AS a
INNER JOIN girl AS b ON (a.hair = b.hair) AND (a.score < b.score)
GROUP BY a.id
HAVING COUNT(*) <= 3
Can anyone see any problems with this approach that have escaped my notice?
Use this compound select which handles OP problem properly
SELECT g.* FROM girls as g
WHERE g.score > IFNULL( (SELECT g2.score FROM girls as g2
WHERE g.hair=g2.hair ORDER BY g2.score DESC LIMIT 3,1), 0)
Note that you need to use IFNULL here to handle case when table girls has less rows for some type of hair then we want to see in sql answer (in OP case it is 3 items).