Remove duplicate rows using SQL - sql

I want to know whether there is a way to remove duplicate rows from a table. The DISTINCT keyword returns unique rows, but it does not help when the rows differ in even one column, so I would like to know whether this can be achieved some other way. I hope the example below helps.
For example, the table below has two entries for Employee_ID 1234 with two different priorities. My output should keep only the higher-priority row (the one with the lowest priority number). Is that possible?
My table
+-------------+----------+--------+
| Employee_ID | priority | gender |
+-------------+----------+--------+
| 1234        | 1        | F      |
| 1234        | 10       | F      |
| 5678        | 2        | M      |
| 5678        | 25       | M      |
| 9101        | 45       | F      |
+-------------+----------+--------+
Output
+-------------+----------+--------+
| Employee_ID | priority | gender |
+-------------+----------+--------+
| 1234        | 1        | F      |
| 5678        | 2        | M      |
| 9101        | 45       | F      |
+-------------+----------+--------+

DELETE
FROM Table t
WHERE EXISTS (SELECT Employee_ID FROM Table WHERE Employee_ID = t.Employee_ID AND priority < t.priority)
That is if you really want to remove the rows from the table. The same EXISTS test, negated, can also be used in a SELECT query, which leaves the original table untouched.
SELECT *
FROM Table t
WHERE NOT EXISTS (SELECT Employee_ID FROM Table WHERE Employee_ID = t.Employee_ID AND priority < t.priority)

select Employee_ID, min(priority) as priority, gender
from table
group by Employee_ID, gender
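For anyone who wants to try the DELETE ... EXISTS idea end to end, here is a minimal sketch using Python's built-in sqlite3 module. The table name employee and the in-memory SQLite database are my own assumptions for illustration, and other engines may need slightly different DELETE syntax.

```python
import sqlite3

# Assumption: an in-memory SQLite table named "employee" stands in for the asker's table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (employee_id INTEGER, priority INTEGER, gender TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1234, 1, "F"), (1234, 10, "F"), (5678, 2, "M"), (5678, 25, "M"), (9101, 45, "F")],
)

# Delete every row for which the same employee has a row with a smaller priority number.
conn.execute(
    """
    DELETE FROM employee
    WHERE EXISTS (
        SELECT 1
        FROM employee AS e2
        WHERE e2.employee_id = employee.employee_id
          AND e2.priority < employee.priority
    )
    """
)

remaining = conn.execute(
    "SELECT employee_id, priority, gender FROM employee ORDER BY employee_id"
).fetchall()
print(remaining)
```

Note that if two rows for the same employee share the same lowest priority, both survive, so a further tie-breaker would be needed in that case.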

Related

Updating the column that the table was joined on

I need to update a column that two tables were joined on and I'm having a difficult time wrapping my head around it. This is for SQL Server. Loose example below...
User
ID | Name | GroupID |
---------------------
1 | Bob | 100 |
2 | Alex | 300 |
3 | Sara | 300 |
Group
ID | Name |
----------------
100 | Produce |
200 | Cashier |
300 | Stocker |
GroupID is a foreign key to the Group table and they are being joined on that. I HAVE to update the GroupID column in User based on the Name column in Group. For example, I want Alex and Sara to change from 'Stocker' to 'Cashier'. My solution is below, but it doesn't seem to work.
UPDATE User
SET User.GroupID = G.ID
FROM User U
JOIN Group G ON U.GroupID = G.ID
WHERE User = 'Sara' OR User = 'Alex'
Expected Result
User
ID | Name | GroupID |
---------------------
1 | Bob | 100 |
2 | Alex | 200 |
3 | Sara | 200 |
You don't need an UPDATE with a JOIN here.
Instead, you can use a subquery to look up the expected ID in Group:
UPDATE [User]
SET GroupID = (SELECT ID
FROM [Group] WHERE Name = 'Cashier')
WHERE Name = 'Sara' OR Name = 'Alex'
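A small sketch of the subquery-based update, run against the sample data with Python's sqlite3 module. SQLite is only a stand-in for SQL Server here (an assumption on my part), so the tables are quoted with double quotes rather than brackets.

```python
import sqlite3

# Assumption: in-memory SQLite copies of the User and Group tables from the question.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE "Group" (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE "User" (
        ID INTEGER PRIMARY KEY,
        Name TEXT,
        GroupID INTEGER REFERENCES "Group"(ID)
    );
    INSERT INTO "Group" VALUES (100, 'Produce'), (200, 'Cashier'), (300, 'Stocker');
    INSERT INTO "User" VALUES (1, 'Bob', 100), (2, 'Alex', 300), (3, 'Sara', 300);
    """
)

# Look up the target group's ID by name, then assign it to the selected users.
conn.execute(
    """
    UPDATE "User"
    SET GroupID = (SELECT ID FROM "Group" WHERE Name = ?)
    WHERE Name IN (?, ?)
    """,
    ("Cashier", "Sara", "Alex"),
)

users = conn.execute('SELECT ID, Name, GroupID FROM "User" ORDER BY ID').fetchall()
print(users)
```

If the group name does not exist, the subquery yields NULL, so it may be worth checking for that before running the update on real data.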

Selecting the two most common attribute pairings from a Entity-Attribute Table?

I have a simple Entity-Attribute table in my database that records whether an Entity has some Attribute by the existence of a row (Entity, Attribute).
I want to find out, among all the Entities with exactly two Attributes, which Attribute pairs are the most common.
For example, if my table looked like:
+--------+-----------+
| Entity | Attribute |
+--------+-----------+
| Bob | A |
| Sally | B |
| Terry | C |
| Bob | B |
| Sally | A |
| Terry | D |
| Larry | C |
+--------+-----------+
I would want it to return
+-------------+-------------+-------+
| Attribute-1 | Attribute-2 | Count |
+-------------+-------------+-------+
| A | B | 2 |
| C | D | 1 |
+-------------+-------------+-------+
I currently have a short query that looks like:
WITH TwoAtts AS (
SELECT entity
FROM table
GROUP BY entity
HAVING COUNT(att) = 2
)
SELECT t1.att, t2.att, COUNT(t1.entity)
FROM table t1
JOIN table t2
ON t1.entity = t2.entity
WHERE t1.entity IN (SELECT * FROM TwoAtts)
AND t1.att != t2.att
GROUP BY t1.att, t2.att
ORDER BY COUNT(t1.entity) DESC
but is only capable of producing "duplicate" results like
+-------------+-------------+-------+
| Attribute-1 | Attribute-2 | Count |
+-------------+-------------+-------+
| A | B | 2 |
| B | A | 2 |
| D | C | 1 |
| C | D | 1 |
+-------------+-------------+-------+
In a sense, I would like to run an unordered DISTINCT (a set operator) over the two attribute columns, but I am not sure how to achieve this in SQL.
Hmmm, I think you want two levels of aggregation, with some filtering:
select attribute_1, attribute_2, count(*)
from (select min(ea.attribute) as attribute_1, max(ea.attribute) as attribute_2
from entity_attribute ea
group by entity
having count(*) = 2
) aa
group by attribute_1, attribute_2;
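To see the two-level aggregation on the question's sample rows, here is a minimal sketch using Python's sqlite3 module. The in-memory database and the added ORDER BY are my assumptions. The inner query collapses each two-attribute entity into one (min, max) pair, which is what removes the mirrored duplicates.

```python
import sqlite3

# Assumption: an in-memory SQLite table holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entity_attribute (entity TEXT, attribute TEXT)")
conn.executemany(
    "INSERT INTO entity_attribute VALUES (?, ?)",
    [("Bob", "A"), ("Sally", "B"), ("Terry", "C"), ("Bob", "B"),
     ("Sally", "A"), ("Terry", "D"), ("Larry", "C")],
)

# Inner level: one row per entity with exactly two attributes, stored as (min, max).
# Outer level: count how many entities share each ordered pair.
pairs = conn.execute(
    """
    SELECT attribute_1, attribute_2, COUNT(*) AS pair_count
    FROM (
        SELECT MIN(attribute) AS attribute_1, MAX(attribute) AS attribute_2
        FROM entity_attribute
        GROUP BY entity
        HAVING COUNT(*) = 2
    ) AS aa
    GROUP BY attribute_1, attribute_2
    ORDER BY pair_count DESC, attribute_1, attribute_2
    """
).fetchall()
print(pairs)
```

Larry has only one attribute, so the HAVING clause filters him out before the pairs are counted.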

Join Lookup from 1 table to multiple columns

How do I link one table to multiple columns in another table without using multiple JOINs?
Below is my scenario:
I have table User with ID and Name
User
+---------+------------+
| Id | Name |
+---------+------------+
| 1 | John |
| 2 | Mike |
| 3 | Charles |
+---------+------------+
And table Product with multiple columns, but just focus on 2 columns CreateBy And ModifiedBy
+------------+-----------+-------------+
| product_id | CreateBy | ModifiedBy |
+------------+-----------+-------------+
| 1 | 1 | 3 |
| 2 | 1 | 3 |
| 3 | 2 | 3 |
| 4 | 2 | 1 |
| 5 | 2 | 3 |
+------------+-----------+-------------+
With a normal JOIN, I need two joins:
SELECT p.Product_id,
u1.Name AS CreateByName,
u2.Name AS ModifiedByName
FROM Product p
JOIN [User] u1 ON p.CreateBy = u1.Id
JOIN [User] u2 ON p.ModifiedBy = u2.Id
to produce this result:
+------------+---------------+-----------------+
| product_id | CreateByName | ModifiedByName |
+------------+---------------+-----------------+
| 1 | John | Charles |
| 2 | John | Charles |
| 3 | Mike | Charles |
| 4 | Mike | John |
| 5 | Mike | Charles |
+------------+---------------+-----------------+
How do I avoid joining twice?
I'm using MS SQL Server, but I'm open to any SQL approach, out of curiosity.
Your current design/approach is acceptable, I think, and the need for two joins is a function of there being two user ID columns. Each of the two columns requires a separate join.
For fun, here is a table design which you may consider if you really want to have to perform only one join:
+------------+-----------+-------------+
| product_id | user_id | type |
+------------+-----------+-------------+
| 1 | 1 | created |
| 2 | 1 | created |
| 3 | 2 | created |
| 4 | 2 | created |
| 5 | 2 | created |
| 1 | 3 | modified |
| 2 | 3 | modified |
| 3 | 3 | modified |
| 4 | 1 | modified |
| 5 | 3 | modified |
+------------+-----------+-------------+
Now, you can get away with just a single join followed by an aggregation:
SELECT
p.product_id,
MAX(CASE WHEN p.type = 'created' THEN u.Name END) AS CreateByName,
MAX(CASE WHEN p.type = 'modified' THEN u.Name END) AS ModifiedByName
FROM Product p
INNER JOIN [User] u
ON p.user_id = u.Id
GROUP BY
p.product_id;
Note that I don't recommend this approach at all. It is much cleaner to use your current approach and use two joins. Joins can fairly easily be optimized using one or more indices. The above aggregation approach would probably not perform as well as what you already have.
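As a rough check that the two designs return the same names, here is a sketch using Python's sqlite3 module. The long-format table name product_user is hypothetical (the answer reuses the name Product for it), and SQLite stands in for SQL Server.

```python
import sqlite3

# Assumption: in-memory SQLite; "product_user" is a hypothetical name for the long-format table.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE user (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE product (product_id INTEGER PRIMARY KEY, CreateBy INTEGER, ModifiedBy INTEGER);
    CREATE TABLE product_user (product_id INTEGER, user_id INTEGER, type TEXT);
    INSERT INTO user VALUES (1, 'John'), (2, 'Mike'), (3, 'Charles');
    INSERT INTO product VALUES (1, 1, 3), (2, 1, 3), (3, 2, 3), (4, 2, 1), (5, 2, 3);
    -- Build the long format from the wide table: one row per (product, role).
    INSERT INTO product_user SELECT product_id, CreateBy, 'created' FROM product;
    INSERT INTO product_user SELECT product_id, ModifiedBy, 'modified' FROM product;
    """
)

# Original design: one join per user ID column.
two_joins = conn.execute(
    """
    SELECT p.product_id, u1.Name, u2.Name
    FROM product p
    JOIN user u1 ON p.CreateBy = u1.Id
    JOIN user u2 ON p.ModifiedBy = u2.Id
    ORDER BY p.product_id
    """
).fetchall()

# Long-format design: a single join, then pivot the roles back into columns.
one_join = conn.execute(
    """
    SELECT pu.product_id,
           MAX(CASE WHEN pu.type = 'created' THEN u.Name END),
           MAX(CASE WHEN pu.type = 'modified' THEN u.Name END)
    FROM product_user pu
    JOIN user u ON pu.user_id = u.Id
    GROUP BY pu.product_id
    ORDER BY pu.product_id
    """
).fetchall()

print(two_joins == one_join, two_joins)
```

Both queries produce the same rows on this data, which illustrates the trade-off: the single join comes at the cost of an extra GROUP BY and a less direct schema.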
If you use natural keys instead of surrogates, you won't need to join at all.
I don't know how you tell your products apart in the real world, but for the example I will assume you have a UPC
CREATE TABLE User
(Name VARCHAR(20) PRIMARY KEY);
CREATE TABLE Product
(UPC CHAR(12) PRIMARY KEY,
CreatedBy VARCHAR(20) REFERENCES User(Name),
ModifiedBy VARCHAR(20) REFERENCES User(Name)
);
Now your query is a simple select, and you also enforce uniqueness of your user names as a bonus, and don't need additional indexes.
A JOIN is the best approach, but if you are looking for an alternative, you can use inline (correlated) subqueries.
SELECT P.PRODUCT_ID,
(SELECT [NAME] FROM #USER WHERE ID = CREATED_BY) AS CREATED_BY,
(SELECT [NAME] FROM #USER WHERE ID = MODIFIED_BY) AS MODIFIED_BY
FROM #PRODUCT P

SQL convert column headers to row values

I have a table that looks like this:
+--------+----------+----------+----------+
| Group# | Person A | Person B | Person C |
+--------+----------+----------+----------+
| 1      | yes      | no       | no       |
| 2      | no       | yes      | yes      |
| 3      | yes      | yes      | yes      |
+--------+----------+----------+----------+
I want to use a SQL query on this data that will return the Group# in one column and the column header in the second column when the value = yes. The result I want would look like this for the above table:
+-----------+----------+
| Group# | Person |
+-----------+----------+
| 1 | Person A |
| 2 | Person B |
| 2 | Person C |
| 3 | Person A |
| 3 | Person B |
| 3 | Person C |
+-----------+----------+
*Note that in contrast to my example, my actual data has many more columns than rows.
Thank you.
In my opinion, the best approach is a lateral join. But the most general method is simply union all:
select group#, 'personA' as person
from t
where personA = 'yes'
union all
select group#, 'personB' as person
from t
where personB = 'yes'
union all
select group#, 'personC' as person
from t
where personC = 'yes';
In answer to your next question . . . yes, you have to explicitly list the columns. However, you can use a SQL query on the metadata tables to generate the query you really want. And then execute that query.
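The metadata idea can be sketched as follows with Python's sqlite3 module. This assumes SQLite, where PRAGMA table_info plays the role of the metadata tables, and a key column renamed to group_no. Other engines would query their own catalog (for example INFORMATION_SCHEMA.COLUMNS) instead.

```python
import sqlite3

# Assumption: in-memory SQLite; the key column is called group_no instead of Group#.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE t (group_no INTEGER, "Person A" TEXT, "Person B" TEXT, "Person C" TEXT)'
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [(1, "yes", "no", "no"), (2, "no", "yes", "yes"), (3, "yes", "yes", "yes")],
)


def quote_ident(name):
    """Quote a column name for use as an SQL identifier."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(text):
    """Quote a string for use as an SQL string literal."""
    return "'" + text.replace("'", "''") + "'"


# PRAGMA table_info returns one row per column; index 1 holds the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(t)") if row[1] != "group_no"]

# Generate one SELECT per person column and glue them together with UNION ALL.
sql = "\nUNION ALL\n".join(
    f"SELECT group_no, {quote_literal(c)} AS person FROM t WHERE {quote_ident(c)} = 'yes'"
    for c in columns
) + "\nORDER BY group_no, person"

rows = conn.execute(sql).fetchall()
print(rows)
```

Because the column list is read from the table at run time, the same code keeps working when the real table has many more person columns.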

Select rows appearing after a row with a given ID when sorted by criteria unrelated to the ID

Given the data in the table "people":
+----+-------+
| id | name |
+----+-------+
| 1 | Jane |
| 2 | Joe |
| 4 | John |
| 5 | Alice |
| 6 | Bob |
+----+-------+
And the order:
SELECT * FROM people ORDER BY name
... which would return:
+----+-------+
| id | name |
+----+-------+
| 5 | Alice |
| 6 | Bob |
| 1 | Jane |
| 2 | Joe |
| 4 | John |
+----+-------+
How could one write a query--including the order above--which would return only rows after the one with a given id, e.g., if given an id of 1, it would return:
+----+-------+
| id | name |
+----+-------+
| 2 | Joe |
| 4 | John |
+----+-------+
To be clear, the id is variable and not known beforehand.
An approach using commonly supported SQL would be great, but I'm using PostgreSQL 9.2 and ActiveRecord 3.2 if they have anything additional of use, e.g., OVER() and ROW_NUMBER().
[Edit] I'd previously shown the wrong desired result set, which included the row with the given id. The result set, as described in the question, should only include rows after the given ID.
select *
from people
where
name >= (
select name
from people
where id = 1
)
and id != 1
order by name
So far, the simplest approach I've found when precision is needed (e.g., no missing or duplicate results across multiple calls with varying values for ID) is to combine a window function with a CTE, as in:
WITH ordered_people AS (
SELECT *, ROW_NUMBER() OVER (ORDER BY name) AS n
FROM people
ORDER BY name
)
SELECT *
FROM ordered_people
WHERE n > (SELECT n FROM ordered_people WHERE id = 1)
ORDER BY name
;
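For comparison, here is a sketch of a different technique from the two answers above: a keyset-style comparison that uses id as a tie-breaker when names are equal, so it needs no window function. It uses Python's sqlite3 module with an in-memory table. The helper name rows_after and the tie-breaking rule are my own assumptions, not something from the question.

```python
import sqlite3

# Assumption: in-memory SQLite copy of the "people" table from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [(1, "Jane"), (2, "Joe"), (4, "John"), (5, "Alice"), (6, "Bob")],
)


def rows_after(connection, given_id):
    """Return the rows that sort after the row with given_id, ordered by (name, id)."""
    return connection.execute(
        """
        SELECT p.id, p.name
        FROM people AS p
        JOIN people AS anchor ON anchor.id = ?
        WHERE p.name > anchor.name
           OR (p.name = anchor.name AND p.id > anchor.id)
        ORDER BY p.name, p.id
        """,
        (given_id,),
    ).fetchall()


print(rows_after(conn, 1))
```

Ordering by (name, id) rather than name alone makes the position of each row unambiguous even when two people share a name, which is what prevents missing or repeated rows across calls.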