Finding duplicated column value pairs in sql table

Finding duplicated column value pairs in sql table - sql

I have a table containing pairs for matches, and the table looks like this:
|pairing_id|player1_id|player2_id|number_of_round|
| 132 | Thomas | Brian | 1 |
I try to write an sql query, which shows me all the redundant pairs, so the pairing is the same if the 2 player names are the same, but for the second time, Brian is player1 and Thomas is the player2.
So this 2 matchups are considered the same pairs, as the player names are the same:
|pairing_id|player1_id|player2_id|number_of_round|
| 132 | Thomas | Brian | 1 |
| 458 | Brian | Thomas | 4 |
I need to find all the redundant pairings in the table, but sadly I dont know how to query for this.

Can be done with EXISTS
select t1.pairing_id, t1.player1_id, t1.player2_id, t1.number_of_round
from myTable t1
where exists (select null
from myTable t2
where t2.player1_id = t1.player2_id and t2.player2_id = t1.player1_id)
order by case when t1.player1_id > t1.player2_id then t1.player2_id else t1.player1_id end

You can do it with exists:
select t.* from tablename t
where exists (
select 1 from tablename
where pairing_id <> t.pairing_id and player1_id = t.player2_id and player2_id = t.player1_id
)

Related

In a query (no editing of tables) how do I join data without any similarities?

I Have a query that finds a table, here's an example one.
Name |Age |Hair |Happy | Sad |
Jon | 15 | Black |NULL | NULL|
Kyle | 18 |Blonde |YES |NULL |
Brad | 17 | Blue |NULL |YES |
Name and age come from one table in a database, hair color comes from a second which is joined, and happy and sad come from a third table.My goal would be to make the first line of the chart like this:
Name |Age |Hair |Happy |Sad |
Jon | 15 |Black |Yes |Yes |
Basically I want to get rid of the rows under the first and get the non NULL data joined to the right. The problem is that there is no column where the Yes values are on the Jon row, so I have no idea how to get them there. Any suggestions?
PS. With the data I am using I can't just put a 'YES' in the 'Jon' row and call it a day, I would need to find the specific value from the lower rows and somehow get that value in the boxes that are NULL.

Do you just want COALESCE()?
COALESCE(Happy, 'Yes') as happy
COALESCE() replaces a NULL value with another value.

If you want to join on a NULL value work with nested selects. The inner select gets an Id for NULLs, the outer select joins
select COALESCE(x.Happy, yn_table.description) as happy, ...
from
(select
t1.Happy,
CASE WHEN t1.Happy is null THEN 1 END as happy_id
from t1 ...) x
left join yn_table
on x.xhappy_id = yn_table.id
If you apply an ORDER BY to the query, you can then select the first row relative to this order with WHERE rownum = 1. If you don't apply an ORDER BY, then the order is random.
After reading your new comment...
the sense is that in my real data the yes under the other names will be a number of a piece of equipment. I want the numbers of the equipment in one row instead of having like 8 rows with only 4 ' yes' values and the rest null.
... I come to the conclusion that this a XY problem.
You are asking about a detail you think will solve your problem, instead of explaining the problem and asking how to solve it.
If you want to store several pieces of equipment per person, you need three tables.
You need a Person table, an Article table and a junction table relating articles to persons to equip them. Let's call this table Equipment.
Person
------
PersonId (Primary Key)
Name
optional attributes like age, hair color
Article
-------
ArticleId (Primary Key)
Description
optional attributes like weight, color etc.
Equipment
---------
PersonId (Primary Key, Foreign Key to table Person)
ArticleId (Primary Key, Foreign Key to table Article)
Quantity (optional, if each person can have only one of each article, we don't need this)
Let's say we have
Person: PersonId | Name
1 | Jon
2 | Kyle
3 | Brad
Article: ArticleId | Description
1 | Hat
2 | Bottle
3 | Bag
4 | Camera
5 | Shoes
Equipment: PersonId | ArticleId | Quantity
1 | 1 | 1
1 | 4 | 1
1 | 5 | 1
2 | 3 | 2
2 | 4 | 1
Now Jon has a hat, a camera and shoes. Kyle has 2 bags and one camera. Brad has nothing.
You can query the persons and their equipment like this
SELECT
p.PersonId, p.Name, a.ArticleId, a.Description AS Equipment, e.Quantity
FROM
Person p
LEFT JOIN Equipment e
ON p.PersonId = e.PersonId
LEFT JOIN Article a
ON e.ArticleId = a.ArticleId
ORDER BY p.Name, a.Description
The result will be
PersonId | Name | ArticleId | Equipment | Quantity
---------+------+-----------+-----------+---------
3 | Brad | NULL | NULL | NULL
1 | Jon | 4 | Camera | 1
1 | Jon | 1 | Hat | 1
1 | Jon | 5 | Shoes | 1
2 | Kyle | 3 | Bag | 2
2 | Kyle | 4 | Camera | 1
See example: http://sqlfiddle.com/#!4/7e05d/2/0

Since you tagged the question with the oracle tag, you could just use NVL(), which allows you to specify a value that would replace a NULL value in the column you select from.

Assuming that you want the 1st row because it contains the smallest age:
- wrap your query inside a CTE
- in another CTE get the 1st row of the query
- in another CTE get the max values of Happy and Sad of your query (for your sample data they both are 'YES')
- cross join the last 2 CTEs.
with
cte as (
<your query here>
),
firstrow as (
select name, age, hair from cte
order by age
fetch first row only
),
maxs as (
select max(happy) happy, max(sad) sad
from cte
)
select f.*, m.*
from firstrow f cross join maxs m

You can try this:
SELECT A.Name,
A.Age,
B.Hair,
C.Happy,
C.Sad
FROM A
INNER JOIN B
ON A.Name = B.Name
INNER JOIN C
ON A.Name = B.Name
(Assuming that Name is the key columns in the 3 tables)

SQL Server stored procedure inserting duplicate rows

I have a table with column GetDup and I'd like to the duplicate records based on the value of this column. For example, if value on is 1 in GetDup, then duplicate the record once. If value in the column is 2, then duplicate the record twice and so on and the statement has to be in looping statement.
What will be a good way to write a stored procedures for this? Please help.
Input:
+--------+--------------+---------------+
| Getdup | CustomerName | CustomerAdd |
+--------+--------------+---------------+
| 1 | John | 123 SomeWhere |
| 2 | Bob | 987 SomeWhere |
+--------+--------------+---------------+
What I want:
+--------+--------------+---------------+
| Getdup | CustomerName | CustomerAdd |
+--------+--------------+---------------+
| 1 | John | 123 SomeWhere |
| 1 | John | 123 SomeWhere |
| 2 | Bob | 987 SomeWhere |
| 2 | Bob | 987 SomeWhere |
| 2 | Bob | 987 SomeWhere |
+--------+--------------+---------------+
picture of data

Answer #2 After Clarification
Number Table to the Rescue!
The number table in my example (or tally table, if you want to call it that), is both temporary and very small. To make it bigger, just add more values to z and add more CROSS JOINs. In my opinion, a number table and a calendar table are both things that should be in every database you have. They are extremely useful.
SQL Fiddle
MS SQL Server 2017 Schema Setup:
CREATE TABLE mytable ( Getdup int, CustomerName varchar(10), CustomerAdd varchar(20) ) ;
INSERT INTO mytable (Getdup, CustomerName, CustomerAdd)
VALUES (1,'John','123 SomeWhere'), (2,'Bob','987 SomeWhere')
;
Query 1:
;WITH z AS (
SELECT *
FROM ( VALUES(0),(0),(0),(0) ) v(x)
)
, numTable AS (
SELECT num
FROM (
SELECT ROW_NUMBER() OVER (ORDER BY z1.x)-1 num
FROM z z1
CROSS JOIN z z2
) s1
)
SELECT t1.Getdup, t1.CustomerName, t1.CustomerAdd
FROM mytable t1
INNER JOIN numTable ON t1.getdup >= numTable.num
ORDER BY CustomerName, CustomerAdd
Results:
| Getdup | CustomerName | CustomerAdd |
|--------|--------------|---------------|
| 2 | Bob | 987 SomeWhere |
| 2 | Bob | 987 SomeWhere |
| 2 | Bob | 987 SomeWhere |
| 1 | John | 123 SomeWhere |
| 1 | John | 123 SomeWhere |
--------------------------------------------------------------------------
ORIGINAL ANSWER
EDIT: After further clarification of the problem, this won't duplicate rows, this will only duplicate the data in a column.
Something like one of these might work.
T-SQL
SELECT replicate(mycolumn,getdup) AS x
FROM mytable
MySQL
SELECT repeat(mycolumn,getdup) AS x
FROM mytable
Oracle SQL
SELECT rpad(mycolumn,getdup*length(mycolumn),mycolumn) AS x
FROM mytable
PostgreSQL
SELECT repeat(mycolumn,getdup+1) AS x
FROM mytable
If you can provide more details for exactly what you want and what you're working with, we might be able to help you better.
NOTE 2: Depending on what you need, you may need to do some math magic. You say above if GetDup is 1 then you want one duplicate. If that means that your output should be GetDup``GetDup, then you'll want to add one in the repeat(),replicate() or rpad() functions. ie replicate(mycolumn,getdup+1). Oracle SQL will be a little different, since it uses rpad().

In standard SQL you can use a recursive CTE:
with recursive cte as (
select t.dup, . . .
from t
union all
select cte.dup - 1, . . .
from cte
where cte.dup > 1
)
select *
from cte;
Of course, not all databases support recursive CTEs (and the recursive keyword is not used in some of them).

So, you want recursive solution :
with t as (
select Getdup, CustomerName, CustomerAdd, 0 as id
from table
union all
select Getdup, CustomerName, CustomerAdd, id + 1
from t
where id < getdup
)
insert into table (col1, col2, col3)
select Getdup, CustomerName, CustomerAdd
from t
order by getdup
option (maxrecursion 0);

Why intermediate relation created in postgresql query can not be referenced in where clause filter?

I am trying to remove duplicate data from a table in postgres. In my table, there's not primary key.
postgres=# select * from customer_temp;
id | firstname | country | phonenumber
----+-----------+-----------+-------------
1 | Sachin | India | 3454
2 | Viru | India | 3454
3 | Saurav | India | 3454
4 | Ponting | Australia | 3454
5 | Warne | Australia | 3454
7 | Be;; | England | 3454
8 | Cook | England | 3454
8 | Cook | England | 3454
8 | Cook | England | 3454
(9 rows)
I am using following query to delete duplicate records.
delete from customer_temp temp
using (select out1.id, out1.firstname
from customer_temp out1
where (select count(out2.id)
from customer_temp out2
where out1.firstname=out2.firstname group by out2.firstname
) > 1
) temp1
where temp.id in (select id
from temp1
where id not in(select id
from temp1
LIMIT 1 OFFSET 0));
But I am getting following error:-
ERROR: relation "temp1" does not exist
LINE 1: ...name) > 1) temp1 where temp.id in (select id from temp1 wher...
Although relation temp1 is created as part of using, then why I can't use them in where clause filter.
As per How Select SQL gets executed, FROM is executed first and the result of row is available to next stages of query execution. Then, why temp1 is not available to subqueries in where section.

Hmmm . . . Assuming that id uniquely identifies each row, this is a simple way to write the logic:
delete from customer_temp
where id not in (select min(ct2.id)
from customer_temp ct2
where ct2.id is not null
group by ct2.firstname, ct2.country, ct2.phonenumber
);
I do note that I'm using not in with a subquery. I do usually warn against this (although this is safe because of the where). You can do something similar with exists or using > and a correlated subquery.
EDIT:
If id is not unique, then it is a really bad name for a column. But apart from that, you can use oid:
delete from customer_temp
where oid not in (select min(oid)
from customer_temp ct2
group by ct2.firstname, ct2.country, ct2.phonenumber
);
This is a built-in identifier.
However, the best approach is probably just to rebuild the table:
create table customer_temp_temp as
select distinct on (firstname, country, phone_number) t.*
from customer_temp t
order by firstname, country, phone_number;

Rows which do not exist in a table

I have a lists of names John, Rupert, Cassandra, Amy, and I want to get names which are not exists in table: Cassandra, Amy
How should I write such query?
My table:
+----+--------+-----------+------+
| id | name | address | tele |
+----+--------+-----------+------+
| 1 | Rupert | Somewhere | 022 |
| 2 | John | Doe | 029 |
| 3 | Donald | Armstrong | 021 |
| 4 | Bob | Gates | 022 |
+----+--------+-----------+------+

Think in sets. You add names to a the result set with UNION ALL, you remove names from the result set with EXCEPT.
select 'John'
union all
select 'Rupert'
union all
select 'Cassandra'
union all
select 'Amy'
except
select name from mytable;

Build up a list of your names to check and do a left join to the users table:
with to_check (name) as (
values
('John'), ('Rupert'), ('Cassandra'), ('Amy')
)
select tc.name as missing_name
from to_check tc
left join the_table tt on tt.name = tc.name
where tt.name is null;
SQLFiddle example: http://sqlfiddle.com/#!15/5c4f5/1

Hope your list is in form of table lets its be table b and your original table as a
now SQL query goes like it
Select name from a where name not in (select name from b);
Think this will give you solution as per my understanding. Also if further details are required please comment.
Also its more important to search for an answer as it look like its a question from a book/Class. Please try out to find solution could have got much more information like link below
How to write "not in ()" sql query using join

Splitting a string column in BigQuery

Let's say I have a table in BigQuery containing 2 columns. The first column represents a name, and the second is a delimited list of values, of arbitrary length. Example:
Name | Scores
-----+-------
Bob |10;20;20
Sue |14;12;19;90
Joe |30;15
I want to transform into columns where the first is the name, and the second is a single score value, like so:
Name,Score
Bob,10
Bob,20
Bob,20
Sue,14
Sue,12
Sue,19
Sue,90
Joe,30
Joe,15
Can this be done in BigQuery alone?

Good news everyone! BigQuery can now SPLIT()!
Look at "find all two word phrases that appear in more than one row in a dataset".
There is no current way to split() a value in BigQuery to generate multiple rows from a string, but you could use a regular expression to look for the commas and find the first value. Then run a similar query to find the 2nd value, and so on. They can all be merged into only one query, using the pattern presented in the above example (UNION through commas).

Trying to rewrite Elad Ben Akoune's answer in Standart SQL, the query becomes like this;
WITH name_score AS (
SELECT Name, split(Scores,';') AS Score
FROM (
(SELECT * FROM (SELECT 'Bob' AS Name ,'10;20;20' AS Scores))
UNION ALL
(SELECT * FROM (SELECT 'Sue' AS Name ,'14;12;19;90' AS Scores))
UNION ALL
(SELECT * FROM (SELECT 'Joe' AS Name ,'30;15' AS Scores))
))
SELECT name, score
FROM name_score
CROSS JOIN UNNEST(name_score.score) AS score;
And this outputs;
+------+-------+
| name | score |
+------+-------+
| Bob | 10 |
| Bob | 20 |
| Bob | 20 |
| Sue | 14 |
| Sue | 12 |
| Sue | 19 |
| Sue | 90 |
| Joe | 30 |
| Joe | 15 |
+------+-------+

If someone is still looking for an answer
select Name,split(Scores,';') as Score
from (
# replace the inner custome select with your source table
select *
from
(select 'Bob' as Name ,'10;20;20' as Scores),
(select 'Sue' as Name ,'14;12;19;90' as Scores),
(select 'Joe' as Name ,'30;15' as Scores)
);

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Finding duplicated column value pairs in sql table - sql

You can do it with exists: select t.* from tablename t where exists ( select 1 from tablename where pairing_id <> t.pairing_id and player1_id = t.player2_id and player2_id = t.player1_id )

Related

In a query (no editing of tables) how do I join data without any similarities?

SQL Server stored procedure inserting duplicate rows

Why intermediate relation created in postgresql query can not be referenced in where clause filter?

Rows which do not exist in a table

Splitting a string column in BigQuery

Categories

Resources