Finding duplicates between two columns same table

I want to find duplicate values between two columns without an id.
Example:
table employees
-----------------------------------
employee_one | employee_two |
-----------------------------------
JOHN SMITH | JACK STEVENS |
MASON LEWIS | JOHN WALKER |
ANDREA YOUNG | MARTINA ROBINSON|
JACK STEVENS | JOHN SMITH |
JOHN WALKER | MASON LEWIS |
MARTINA ROBINSON| ANDREA YOUNG |
and the result I want is:
-----------------------------------
employee_one | employee_two |
-----------------------------------
JOHN SMITH | JACK STEVENS |
MASON LEWIS | JOHN WALKER |
ANDREA YOUNG | MARTINA ROBINSON|
or
-----------------------------------
employee_one | employee_two |
-----------------------------------
JACK STEVENS | JOHN SMITH |
JOHN WALKER | MASON LEWIS |
MARTINA ROBINSON| ANDREA YOUNG |
My problem is that my query always finds all the rows, so I get the same table back. I tried:
SELECT DISTINCT t1.*
FROM employees AS t1
LEFT JOIN employees AS t2
  ON (t1.employee_one = t2.employee_two AND t1.employee_two = t2.employee_one)
  OR (t1.employee_one = t2.employee_one AND t1.employee_two = t2.employee_two)
But I get the same results.

Every pair of names will show up twice, so use a where clause to limit the output to just those where employee_one < employee_two:
select t1.*
from employees t1
where employee_one < employee_two
and exists (
select *
from employees t2
where t2.employee_two = t1.employee_one
and t2.employee_one = t1.employee_two)
Caveat: this assumes that there are no rows where employee_one = employee_two.
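To check the approach against the sample data, here is a minimal sketch that runs the same query through Python's built-in sqlite3 module. The in-memory database and the ORDER BY are assumptions added only to make the result reproducible; the WHERE and EXISTS logic is the one from the answer above.

```python
import sqlite3

# In-memory database holding the sample pairs from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_one TEXT, employee_two TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [
        ("JOHN SMITH", "JACK STEVENS"),
        ("MASON LEWIS", "JOHN WALKER"),
        ("ANDREA YOUNG", "MARTINA ROBINSON"),
        ("JACK STEVENS", "JOHN SMITH"),
        ("JOHN WALKER", "MASON LEWIS"),
        ("MARTINA ROBINSON", "ANDREA YOUNG"),
    ],
)

# Keep a row only if its mirrored pair exists and it is the
# alphabetically ordered member of the two.
query = """
SELECT t1.employee_one, t1.employee_two
FROM employees t1
WHERE t1.employee_one < t1.employee_two
  AND EXISTS (
        SELECT 1
        FROM employees t2
        WHERE t2.employee_two = t1.employee_one
          AND t2.employee_one = t1.employee_two)
ORDER BY t1.employee_one
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Each pair comes back once, in the orientation where employee_one sorts before employee_two.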

Related

Finding an ID not in another column

I'm working on a little SQL exercise, and am scratching my head on this problem.
I am trying to find all the employees to whom no other employee reports.
This is what the employees table looks like:
EmployeeId LastName FirstName Title ReportsTo
1 Adams Andrew General Manager null
2 Edwards Nancy Sales Manager 1
3 Peacock Jane Sales Support Agent 2
4 Park Margaret Sales Support Agent 2
5 Johnson Steve Sales Support Agent 2
6 Mitchell Michael IT Manager 1
7 King Robert IT Staff 6
8 Callahan Laura IT Staff 6
I thought one of these straightforward queries would do it:
SELECT *
FROM employees
WHERE EmployeeId NOT IN (SELECT ReportsTo FROM employees);
SELECT *
FROM employees
WHERE EmployeeId NOT IN (ReportsTo);
But those return the following results, which aren't what I'm looking for:
EmployeeId LastName FirstName Title ReportsTo
2 Edwards Nancy Sales Manager 1
3 Peacock Jane Sales Support Agent 2
4 Park Margaret Sales Support Agent 2
5 Johnson Steve Sales Support Agent 2
6 Mitchell Michael IT Manager 1
7 King Robert IT Staff 6
8 Callahan Laura IT Staff 6
Why is NOT IN returning items that are definitely in that column? How would I go about returning items not in ReportsTo if I am using NOT IN incorrectly?

Use not exists with a correlated subquery (as commented by jarlh):
select *
from employees e
where not exists (
select 1
from employees e1
where e1.ReportsTo = e.EmployeeId
)

The problem with your 1st query is that you use NOT IN with a list that contains a NULL value.
So a check for an EmployeeId such as 5:
5 NOT IN (null, 1, 2, 6)
evaluates to NULL rather than TRUE, because the comparison of 5 with NULL yields NULL. WHERE keeps only rows whose condition is TRUE, so that EmployeeId is not included in the results.
Change to:
SELECT *
FROM employees
Where EmployeeId not in (
select ReportsTo
from employees
where ReportsTo is not null
);
Results:
| EmployeeId | LastName | FirstName | Title | ReportsTo |
| ---------- | -------- | --------- | ------------------- | --------- |
| 3 | Peacock | Jane | Sales Support Agent | 2 |
| 4 | Park | Margaret | Sales Support Agent | 2 |
| 5 | Johnson | Steve | Sales Support Agent | 2 |
| 7 | King | Robert | IT Staff | 6 |
| 8 | Callahan | Laura | IT Staff | 6 |
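The NULL behaviour described above can be reproduced with a short sketch using Python's built-in sqlite3 module. The reduced three-column table and the helper name ids are assumptions made for illustration; SQLite applies the same three-valued logic to NOT IN as the explanation describes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (EmployeeId INTEGER, LastName TEXT, ReportsTo INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        (1, "Adams", None),
        (2, "Edwards", 1),
        (3, "Peacock", 2),
        (4, "Park", 2),
        (5, "Johnson", 2),
        (6, "Mitchell", 1),
        (7, "King", 6),
        (8, "Callahan", 6),
    ],
)


def ids(sql):
    """Hypothetical helper: run a query and return the first column as a list."""
    return [row[0] for row in conn.execute(sql)]


# The bare expression is NULL (None in Python), not true.
null_check = conn.execute("SELECT 5 NOT IN (NULL, 1, 2, 6)").fetchone()[0]

# NOT IN over a subquery that still contains the NULL from employee 1.
naive = ids(
    "SELECT EmployeeId FROM employees "
    "WHERE EmployeeId NOT IN (SELECT ReportsTo FROM employees) "
    "ORDER BY EmployeeId"
)

# Same query with the NULL filtered out of the subquery.
filtered = ids(
    "SELECT EmployeeId FROM employees "
    "WHERE EmployeeId NOT IN "
    "(SELECT ReportsTo FROM employees WHERE ReportsTo IS NOT NULL) "
    "ORDER BY EmployeeId"
)

# NOT EXISTS is not affected by the NULL at all.
not_exists = ids(
    "SELECT e.EmployeeId FROM employees e "
    "WHERE NOT EXISTS "
    "(SELECT 1 FROM employees e1 WHERE e1.ReportsTo = e.EmployeeId) "
    "ORDER BY e.EmployeeId"
)

print(null_check, naive, filtered, not_exists)
```

The unfiltered NOT IN returns no rows, while the filtered NOT IN and the NOT EXISTS variant both return employees 3, 4, 5, 7 and 8.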

You can simply use the query below, as long as the NULL values are filtered out of the subquery:
select * from employees emp where emp.EmployeeId not in (select ReportsTo from employees where ReportsTo is not null)

Multi-Row function to filter out Duplicates

I'm relatively new to SQL, so I would like your help with the following case.
I have the following Table (just a sample):
| id | FName_LVL1 | LName_LVL1 | FName_LVL2 | LName_LVL2 |
|----|-------------|------------|------------|-------------|
| 1 | John | Kennedy | Marc | Guy |
| 2 | John | Kennedy | Olivier | Oslo |
| 3 | Mike | Lanes | Patrick | James |
I would like to isolate the duplicates in FName_LVL1 and LName_LVL1,
so that the table looks like this:
| id | FName_LVL1 | LName_LVL1 | FName_LVL2 | LName_LVL2 |
|----|-------------|------------|------------|-------------|
| 1 | John | Kennedy | Marc | Guy |
| 2 | John | Kennedy | Olivier | Oslo |
My idea was to create a flag column: if FName_LVL1 and LName_LVL1 in the row above or below are the same, put "1", else "0".
That would give a table like this:
| id | FName_LVL1 | LName_LVL1 | FName_LVL2 | LName_LVL2 | Flag |
|----|------------|------------|------------|------------|------|
| 1 | John | Kennedy | Marc | Guy | 1 |
| 2 | John | Kennedy | Olivier | Oslo | 1 |
| 3 | Mike | Lanes | Patrick | James | 0 |
With a table like this, I could just filter to get the result I want.
That's how I'm used to working in Alteryx, but I'm not sure whether this is possible with SQL statements, or even whether it is the best way to tackle this case.

You can use COUNT(*) as a window function:
SELECT t.*
      ,CASE
         WHEN COUNT(*) OVER (
                PARTITION BY fname_lvl1
                            ,lname_lvl1
              ) > 1
         THEN 1
         ELSE 0
       END AS Flag
FROM t
Results:
| ID | FNAME_LVL1 | LNAME_LVL1 | FNAME_LVL2 | LNAME_LVL2 | FLAG |
|----|------------|------------|------------|------------|------|
| 1 | John | Kennedy | Marc | Guy | 1 |
| 2 | John | Kennedy | Olivier | Oslo | 1 |
| 3 | Mike | Lanes | Patrick | James | 0 |
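Since the goal in the question is to filter on the flag, here is a minimal sketch of that last step, run through Python's built-in sqlite3 module. A window function cannot appear directly in WHERE, so the count is computed in a derived table first. The alias names flagged and cnt are assumptions, and window functions need SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER, fname_lvl1 TEXT, lname_lvl1 TEXT, "
    "fname_lvl2 TEXT, lname_lvl2 TEXT)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?)",
    [
        (1, "John", "Kennedy", "Marc", "Guy"),
        (2, "John", "Kennedy", "Olivier", "Oslo"),
        (3, "Mike", "Lanes", "Patrick", "James"),
    ],
)

# Count rows per (fname_lvl1, lname_lvl1) in a derived table,
# then keep only the rows whose combination occurs more than once.
dupes = conn.execute(
    """
    SELECT id, fname_lvl1, lname_lvl1, fname_lvl2, lname_lvl2
    FROM (
        SELECT t.*,
               COUNT(*) OVER (PARTITION BY fname_lvl1, lname_lvl1) AS cnt
        FROM t
    ) AS flagged
    WHERE cnt > 1
    ORDER BY id
    """
).fetchall()

for row in dupes:
    print(row)
```

Only the two John Kennedy rows remain; the Mike Lanes row is filtered out.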

Join the table to a derived table that counts each (FName_LVL1, LName_LVL1) combination. The no_of_records column tells you how many times the combination occurs in the table, i.e. it will be 2 for John Kennedy in your example. Here my_table stands for your table name:
select table1.*
from my_table table1
inner join
(
  select FName_LVL1, LName_LVL1, count(*) as no_of_records
  from my_table
  group by FName_LVL1, LName_LVL1
) table2
on table1.FName_LVL1 = table2.FName_LVL1
and table1.LName_LVL1 = table2.LName_LVL1
and table2.no_of_records > 1

You can use a "semi-join" subquery to get a result like that:
SELECT * FROM Table1 t1
WHERE EXISTS (
SELECT 'Anything' FROM Table1 t2
WHERE t1.FName_LVL1 = t2.FName_LVL1
AND t1.LName_LVL1 = t2.LName_LVL1
AND t1.id <> t2.id
)
Demo: http://sqlfiddle.com/#!4/f9c44/3
| ID | FNAME_LVL1 | LNAME_LVL1 | FNAME_LVL2 | LNAME_LVL2 |
|----|------------|------------|------------|------------|
| 2 | John | Kennedy | Olivier | Oslo |
| 1 | John | Kennedy | Marc | Guy |

An efficient way is to use an analytic function with a PARTITION BY clause, which needs only one table scan.
I have saved the output from LiveSQL:
drop table t1 purge;
create table t1 ( c1 varchar2(20), c2 varchar2(20), c3 varchar2(20), c4 varchar2(20));
insert into t1 values ('John','Kennedy','Marc','Guy');
insert into t1 values ('John','Kennedy','Olivier','Oslo');
insert into t1 values ('not','john','vijay','balebail');
commit;
select t1.*, count(c1||c2) over (partition by c1,c2 order by c1,c2 ) flag from t1;
C1 C2 C3 C4 FLAG
John Kennedy Marc Guy 2
John Kennedy Olivier Oslo 2
not john vijay balebail 1
3 rows selected.
select t1.*, decode (count(c1||c2) over (partition by c1,c2 order by c1,c2 ),1,0,1) flag from t1;
C1 C2 C3 C4 FLAG
John Kennedy Marc Guy 1
John Kennedy Olivier Oslo 1
not john vijay balebail 0

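You could also use the LAG and LEAD analytic functions together with NVL2. A row gets a non-zero flag when a row with the same name precedes or follows it within the partition; in a group of three or more rows a middle row gets 2, so filter on flag > 0: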
select n.*,
nvl2(lag(FName_LVL1||' '||LName_LVL1,1,null) over
(partition by FName_LVL1||' '||LName_LVL1 order by FName_LVL1, LName_LVL1),1,0)+
nvl2(lead(FName_LVL1||' '||LName_LVL1,1,null) over
(partition by FName_LVL1||' '||LName_LVL1 order by FName_LVL1, LName_LVL1),1,0) flag
from names n;
ID FNAME_LVL1 LNAME_LVL1 FNAME_LVL2 LNAME_LVL2 FLAG
-- ---------- ---------- ---------- ---------- -----
1 John Kennedy Marc Guy 1
2 John Kennedy Olivier Oslo 1
3 Mike Lanes Patrick James 0

Well, thank you all! It seems that there are indeed plenty of solutions for this case!
I'll keep digging into it to see which one I prefer, but thanks to you I now have a good insight into SQL logic.
Sorry for the delay in my reply, I was away for work.

Return two random rows without duplicate properties in PostgreSQL

Let's say I have a table of customer addresses:
Name | AddressLine
-------------------------------
John Smith | 123 Nowheresville
Jane Doe | 456 Evergreen Terrace
John Smith | 999 Somewhereelse
Joe Bloggs | 1 Second Ave
I would like to return two random rows from this table, but I do not want to return two rows with the same Name (example of what I don't want):
Name | AddressLine
-------------------------------
John Smith | 123 Nowheresville
John Smith | 999 Somewhereelse
How can I do this in Postgres?

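Here is one method: pick one random row per name with DISTINCT ON, then take two of those rows at random:
select *
from (select distinct on (name) t.*
      from t
      order by name, random()) s
order by random()
limit 2;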
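The idea of "one random row per name, then two random names" can be tried outside PostgreSQL with a short sketch using Python's built-in sqlite3 module. SQLite has no DISTINCT ON, so this sketch swaps in ROW_NUMBER() over a randomly ordered partition as a stand-in; the table name addresses and the column name address_line are assumptions, and window functions need SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE addresses (name TEXT, address_line TEXT)")
conn.executemany(
    "INSERT INTO addresses VALUES (?, ?)",
    [
        ("John Smith", "123 Nowheresville"),
        ("Jane Doe", "456 Evergreen Terrace"),
        ("John Smith", "999 Somewhereelse"),
        ("Joe Bloggs", "1 Second Ave"),
    ],
)

# Inner query: number each name's rows in random order, so rn = 1 is
# one randomly chosen address per name.
# Outer query: shuffle those per-name rows and keep two of them.
rows = conn.execute(
    """
    SELECT name, address_line
    FROM (
        SELECT name, address_line,
               ROW_NUMBER() OVER (PARTITION BY name ORDER BY random()) AS rn
        FROM addresses
    ) AS picked
    WHERE rn = 1
    ORDER BY random()
    LIMIT 2
    """
).fetchall()

for row in rows:
    print(row)
```

The output differs from run to run, but the two rows never share a name, because only one row per name survives the inner query.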

SQL group by and listing

I have the following information in my DB:
id | name | company
---+----------+------
1 | Joe | company_1
2 | Sally | company_2
3 | Marc | company_3
4 | Bob | company_1
Is it possible to do something like this in SQL:
SELECT company, _something_ FROM my_db GROUP BY company
And get :
company | members
---------- | ---------
company_1 | Joe, Bob
company_2 | Sally
company_3 | Marc
Thanks in advance for your help !

MySQL (the separator ', ' matches the requested output):
SELECT company, GROUP_CONCAT(name SEPARATOR ', ') AS members FROM my_db GROUP BY company
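The same grouping can be tried without a MySQL server through Python's built-in sqlite3 module, as a minimal sketch. Note the syntax difference: SQLite takes the separator as a second argument, GROUP_CONCAT(name, ', '), rather than MySQL's SEPARATOR keyword. The table name my_db follows the question; the order of names inside each list is not guaranteed without further work, so the sketch sorts them in Python before comparing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_db (id INTEGER, name TEXT, company TEXT)")
conn.executemany(
    "INSERT INTO my_db VALUES (?, ?, ?)",
    [
        (1, "Joe", "company_1"),
        (2, "Sally", "company_2"),
        (3, "Marc", "company_3"),
        (4, "Bob", "company_1"),
    ],
)

# One row per company, with the member names joined by ', '.
result = conn.execute(
    """
    SELECT company, GROUP_CONCAT(name, ', ') AS members
    FROM my_db
    GROUP BY company
    ORDER BY company
    """
).fetchall()

# Split each list again and sort it, since concatenation order is arbitrary.
members = {company: sorted(names.split(", ")) for company, names in result}

for company, names in result:
    print(company, "|", names)
```

In PostgreSQL the corresponding aggregate is string_agg(name, ', ').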

Make one-to-many relationship look like one-to-one

I'm using postgres 9.3.5.
Given the following data:
select * from department;
id | name
----+-----------
1 | sales
2 | marketing
3 | HR
and
select * from people;
id | department_id | first_name | last_name
----+---------------+------------+-----------
1 | 1 | Tom | Jones
2 | 1 | Bill | Cosby
3 | 2 | Jessica | Biel
4 | 1 | Rachel | Hunter
5 | 2 | John | Barnes
I'd like to return a result set like this:
id | name | first_name-1 | last_name-1 | first_name-2 | last_name-2 | first_name-3 | last_name-3
----+-----------+--------------+-------------+--------------+-------------+--------------+------------
1 | sales | Tom | Jones | Bill | Cosby | Rachel | Hunter
2 | marketing | Jessica | Biel | John | Barnes
3 | HR |
Is this possible?
The answer provided here by Max Shawabkeh using GROUP_CONCAT is close, but it's not returning the names as extra fields in the dataset; it's concatenating them into a single field.

You need a cross-tabulation (sometimes called a pivot). In PostgreSQL, crosstab() is provided by the additional module tablefunc.
It could look like this in your case:
SELECT * FROM crosstab(
   $$SELECT d.id, d.name, 'p' AS dummy_cat
          , concat_ws(' ', p.first_name, p.last_name) AS person
     FROM department d
     LEFT JOIN people p ON p.department_id = d.id
     ORDER BY d.id, p.id$$
   )
AS ct (id int, department text, person_1 text, person_2 text, person_3 text);
Returns:
id department person_1 person_2 person_3
--------------------------------------------------------
1 sales Tom Jones Bill Cosby Rachel Hunter
2 marketing Jessica Biel John Barnes <NULL>
3 HR <NULL> <NULL> <NULL>
Very similar to this related case (explanation for special difficulties there):
Postgres - Transpose Rows to Columns
But this case is simpler: since you do not seem to care about the order in which persons are listed, you can use the basic one-parameter form of crosstab().
Also, according to your comment, you want all departments, even if no people are assigned. Adjusted the LEFT JOIN accordingly.
Basic details in this related answer:
PostgreSQL Crosstab Query
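For readers who want to see the target shape without the tablefunc module, here is a minimal sketch using Python's built-in sqlite3 module. It does not use crosstab(); it swaps in a different pivot technique, ROW_NUMBER() plus conditional aggregation, and like the crosstab() call it assumes at most three people per department. Window functions need SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE department (id INTEGER, name TEXT)")
conn.execute(
    "CREATE TABLE people (id INTEGER, department_id INTEGER, "
    "first_name TEXT, last_name TEXT)"
)
conn.executemany(
    "INSERT INTO department VALUES (?, ?)",
    [(1, "sales"), (2, "marketing"), (3, "HR")],
)
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?, ?)",
    [
        (1, 1, "Tom", "Jones"),
        (2, 1, "Bill", "Cosby"),
        (3, 2, "Jessica", "Biel"),
        (4, 1, "Rachel", "Hunter"),
        (5, 2, "John", "Barnes"),
    ],
)

# Number the people inside each department, then spread positions 1..3
# into separate columns. The LEFT JOIN keeps departments without people.
rows = conn.execute(
    """
    SELECT d.id, d.name,
           MAX(CASE WHEN p.rn = 1 THEN p.person END) AS person_1,
           MAX(CASE WHEN p.rn = 2 THEN p.person END) AS person_2,
           MAX(CASE WHEN p.rn = 3 THEN p.person END) AS person_3
    FROM department d
    LEFT JOIN (
        SELECT department_id,
               first_name || ' ' || last_name AS person,
               ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY id) AS rn
        FROM people
    ) p ON p.department_id = d.id
    GROUP BY d.id, d.name
    ORDER BY d.id
    """
).fetchall()

for row in rows:
    print(row)
```

Missing people show up as None (NULL), and a fourth person in a department would be silently dropped, which mirrors the fixed column list in the crosstab() version.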
