how to perform sql actions/query for duplicate rows

I have 2 tables:
1- brokers (a company that can have multiple broker individuals)
2- brokerIndividuals (a person/individuals table that holds each individual's details and a foreign key to the broker company they belong to)
I'm trying to create a unique index on the brokers table so that companyName is unique among the rows where isDeleted is NULL. The table is already populated, so I want to write an SQL query that finds duplicate rows and, whenever several rows have the same companyName and isDeleted = NULL, performs 2 actions/queries:
1- Keep the first row as it is and set isDeleted to true on the other duplicates (the rows following the first occurrence).
2- Change the foreign key in brokerIndividuals from each duplicate row to the first row.
In words: soft delete the duplicate rows and re-associate their brokerIndividuals with the first occurrence, so that the table ends up with exactly 1 row per companyName where isDeleted is NULL.
I am using the knex.js query builder, so a solution using knex functions is welcome too, but knex doesn't support partial indexes yet (Knex.js - How to create unique index with 'where' clause?), so I have to use raw SQL. The database is MSSQL (mssql version: 6.0.1).

Here's a full test case (commented), with link to the fiddle:
Working test case, tested with MySQL 5.5, 5.6, 5.7, 8.0 and MariaDB up to 10.6
Create the tables and insert initial data with duplicate company_name entries:
CREATE TABLE brokers (
id int primary key auto_increment
, company_name VARCHAR(30)
, isDeleted boolean default null
);
CREATE TABLE brokerIndividuals (
id int primary key auto_increment
, broker_id int references brokers (id)
);
INSERT INTO brokers (company_name) VALUES
('name1')
, ('name1')
, ('name1')
, ('name1')
, ('name123')
, ('name123')
, ('name123')
, ('name123')
;
INSERT INTO brokerIndividuals (broker_id) VALUES
(2)
, (7)
;
SELECT * FROM brokers;
+----+--------------+-----------+
| id | company_name | isDeleted |
+----+--------------+-----------+
| 1 | name1 | null |
| 2 | name1 | null |
| 3 | name1 | null |
| 4 | name1 | null |
| 5 | name123 | null |
| 6 | name123 | null |
| 7 | name123 | null |
| 8 | name123 | null |
+----+--------------+-----------+
SELECT * FROM brokerIndividuals;
+----+-----------+
| id | broker_id |
+----+-----------+
| 1 | 2 |
| 2 | 7 |
+----+-----------+
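Adjust brokers to determine isDeleted based on the MIN(id) per company_name among the rows that are not yet deleted:
UPDATE brokers
JOIN (
SELECT company_name, MIN(id) AS id
FROM brokers
WHERE isDeleted IS NULL -- ignore rows that are already soft-deleted
GROUP BY company_name
) AS x
ON x.company_name = brokers.company_name
AND brokers.isDeleted IS NULL
SET brokers.isDeleted = CASE WHEN (x.id <> brokers.id) THEN 1 END -- stays NULL for the MIN(id) row
;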
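To try the soft-delete step without a MySQL server, here is a minimal sketch using Python's built-in sqlite3 module, assuming SQLite as a stand-in database. It expresses the same "keep MIN(id) among active rows, flag the rest" rule with a correlated subquery instead of UPDATE ... JOIN, because that form runs on any SQLite version. SQL Server has its own UPDATE ... SET ... FROM syntax, which this sketch does not cover.

```python
import sqlite3

# SQLite in memory as a stand-in for MySQL / SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE brokers (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    company_name TEXT,
    isDeleted INTEGER DEFAULT NULL  -- NULL = active, 1 = soft-deleted
);
INSERT INTO brokers (company_name) VALUES
    ('name1'), ('name1'), ('name1'), ('name1'),
    ('name123'), ('name123'), ('name123'), ('name123');
""")

# Flag every active row whose id is not the smallest active id for its name.
conn.execute("""
UPDATE brokers
SET isDeleted = 1
WHERE isDeleted IS NULL
  AND id > (SELECT MIN(b.id)
            FROM brokers AS b
            WHERE b.company_name = brokers.company_name
              AND b.isDeleted IS NULL)
""")

rows = conn.execute("SELECT id, isDeleted FROM brokers ORDER BY id").fetchall()
print(rows)
```

The row with the smallest id per name is never updated, so it stays the reference point for every other row of that name.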
The updated brokers contents:
SELECT * FROM brokers;
+----+--------------+-----------+
| id | company_name | isDeleted |
+----+--------------+-----------+
| 1 | name1 | null |
| 2 | name1 | 1 |
| 3 | name1 | 1 |
| 4 | name1 | 1 |
| 5 | name123 | null |
| 6 | name123 | 1 |
| 7 | name123 | 1 |
| 8 | name123 | 1 |
+----+--------------+-----------+
For brokerIndividuals, find / set the correct broker_id:
UPDATE brokerIndividuals
JOIN brokers AS b1
ON b1.id = brokerIndividuals.broker_id
JOIN brokers AS b2
ON b1.company_name = b2.company_name
AND b2.isDeleted IS NULL
SET brokerIndividuals.broker_id = b2.id
;
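The re-pointing step can be sketched the same way, again assuming SQLite via Python's sqlite3 as a stand-in and starting from the brokers state shown after the first UPDATE. The correlated subquery looks up the active row with the same company_name; the outer WHERE limits the update to individuals whose broker is soft-deleted, so individuals already pointing at an active broker are left alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE brokers (
    id INTEGER PRIMARY KEY,
    company_name TEXT,
    isDeleted INTEGER DEFAULT NULL
);
CREATE TABLE brokerIndividuals (
    id INTEGER PRIMARY KEY,
    broker_id INTEGER REFERENCES brokers (id)
);
-- brokers as they look after the soft-delete step.
INSERT INTO brokers (id, company_name, isDeleted) VALUES
    (1, 'name1', NULL), (2, 'name1', 1), (3, 'name1', 1), (4, 'name1', 1),
    (5, 'name123', NULL), (6, 'name123', 1), (7, 'name123', 1), (8, 'name123', 1);
INSERT INTO brokerIndividuals (id, broker_id) VALUES (1, 2), (2, 7);
""")

# Point each individual at the active broker sharing its company_name.
conn.execute("""
UPDATE brokerIndividuals
SET broker_id = (
    SELECT b2.id
    FROM brokers AS b1
    JOIN brokers AS b2
      ON b2.company_name = b1.company_name
     AND b2.isDeleted IS NULL
    WHERE b1.id = brokerIndividuals.broker_id
)
WHERE broker_id IN (SELECT id FROM brokers WHERE isDeleted = 1)
""")

rows = conn.execute("SELECT id, broker_id FROM brokerIndividuals ORDER BY id").fetchall()
print(rows)  # [(1, 1), (2, 5)]
```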
New contents:
SELECT * FROM brokerIndividuals;
+----+-----------+
| id | broker_id |
+----+-----------+
| 1 | 1 |
| 2 | 5 |
+----+-----------+
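Once every company_name has a single active row, the partial unique index the question is after can be created. A minimal sketch, again in SQLite through Python's sqlite3 (SQLite supports partial indexes via CREATE UNIQUE INDEX ... WHERE); the index name brokers_company_active_uq is made up. SQL Server calls these filtered indexes and accepts a similar CREATE UNIQUE INDEX ... WHERE isDeleted IS NULL statement, which from knex would have to be sent through knex.raw().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE brokers (
    id INTEGER PRIMARY KEY,
    company_name TEXT,
    isDeleted INTEGER DEFAULT NULL
);
-- State after the cleanup: one active row per company_name.
INSERT INTO brokers (id, company_name, isDeleted) VALUES
    (1, 'name1', NULL), (2, 'name1', 1), (3, 'name1', 1), (4, 'name1', 1),
    (5, 'name123', NULL), (6, 'name123', 1), (7, 'name123', 1), (8, 'name123', 1);
-- Partial unique index: uniqueness is enforced only for active rows.
CREATE UNIQUE INDEX brokers_company_active_uq
    ON brokers (company_name)
    WHERE isDeleted IS NULL;
""")

# Another soft-deleted 'name1' row is allowed ...
conn.execute("INSERT INTO brokers (company_name, isDeleted) VALUES ('name1', 1)")

# ... but a second active 'name1' row is rejected.
try:
    conn.execute("INSERT INTO brokers (company_name) VALUES ('name1')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)  # True
```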

Related

Generate matrix of access in source-target table

I have two tables in PostgreSQL, class and inheritance.
Each row in inheritance has 2 class IDs source_id and target_id:
CREATE TABLE public.class (
id bigint NOT NULL DEFAULT nextval('class_id_seq'::regclass),
name character varying(500) COLLATE pg_catalog."default" NOT NULL,
CONSTRAINT class_pkey PRIMARY KEY (id)
);
CREATE TABLE public.inheritance (
id bigint NOT NULL DEFAULT nextval('inherited_id_seq'::regclass),
source_id bigint NOT NULL,
target_id bigint NOT NULL,
CONSTRAINT inherited_pkey PRIMARY KEY (id),
CONSTRAINT inherited_source_id_fkey FOREIGN KEY (source_id)
REFERENCES public.class (id),
CONSTRAINT inherited_target_id_fkey FOREIGN KEY (target_id)
REFERENCES public.class (id)
);
I want to create Access Matrix between all classes based in inheritance relationship in inheritance table.
I try this code:
select * , case when id in (select target_id from inheritance where source_id=1) then 1 else 0 end as "1"
, case when id in (select target_id from inheritance where source_id=2) then 1 else 0 end as "2"
, case when id in (select target_id from inheritance where source_id=3) then 1 else 0 end as "3"
, case when id in (select target_id from inheritance where source_id=4) then 1 else 0 end as "4"
, case when id in (select target_id from inheritance where source_id=5) then 1 else 0 end as "5"
, case when id in (select target_id from inheritance where source_id=6) then 1 else 0 end as "6"
, case when id in (select target_id from inheritance where source_id=7) then 1 else 0 end as "7"
, case when id in (select target_id from inheritance where source_id=8) then 1 else 0 end as "8"
, case when id in (select target_id from inheritance where source_id=9) then 1 else 0 end as "9"
from class
This gives the right answer, but only for the 9 rows currently in class, which are hard-coded.
How can I produce a column for every row in class using dynamic SQL?
If it can't be done in plain SQL, how can it be done with PL/pgSQL?

Static solution
SQL demands to know name and type of each result column (and consequently their number) at call time. You cannot derive result columns from data dynamically with plain SQL. You can use an array or a document type instead of separate columns:
SELECT *
FROM class c
LEFT JOIN (
SELECT target_id AS id, array_agg(source_id) AS sources
FROM (SELECT source_id, target_id FROM inheritance i ORDER BY 1,2) sub
GROUP BY 1
) i USING (id);
 id | name | sources
----+------+---------
  1 | c1   | {2,3,4}
  2 | c2   | {5}
  3 | c3   | {5,6,7}
  4 | c4   | {7}
  5 | c5   | {8}
  6 | c6   | {9}
  7 | c7   | {9}
  8 | c8   | null
  9 | c9   | null
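If the matrix is only needed in application code, the array column from this static query can be expanded client-side. A minimal Python sketch; the rows are hard-coded from the result above and assume the driver returns sources as a Python list (psycopg2, for example, maps PostgreSQL arrays to lists) or None for the NULL rows. The helper name access_matrix is hypothetical.

```python
# Rows as the static query returns them: (id, name, sources); None where no row matched.
rows = [
    (1, "c1", [2, 3, 4]),
    (2, "c2", [5]),
    (3, "c3", [5, 6, 7]),
    (4, "c4", [7]),
    (5, "c5", [8]),
    (6, "c6", [9]),
    (7, "c7", [9]),
    (8, "c8", None),
    (9, "c9", None),
]


def access_matrix(rows):
    """Expand (id, name, sources) rows into a 0/1 matrix keyed by class id."""
    ids = [r[0] for r in rows]
    matrix = {}
    for cid, _name, sources in rows:
        present = set(sources or [])
        # One column per class id, 1 if that class is a source for this row.
        matrix[cid] = [1 if other in present else 0 for other in ids]
    return ids, matrix


ids, matrix = access_matrix(rows)
for cid in ids:
    print(cid, matrix[cid])
```

The number of columns now follows the data, which is exactly what plain SQL cannot do without a second round-trip.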
Dynamic solution
If that's not good enough, you need dynamic SQL with two round-trips to the DB server: 1. generate the SQL, 2. execute it. The example below uses the crosstab() function from the additional module tablefunc. If you are unfamiliar with it, read this first:
PostgreSQL Crosstab Query
1. Generate SQL:
SELECT format(
$q$SELECT *
FROM class c
LEFT JOIN crosstab(
'SELECT target_id, source_id, 1 FROM inheritance ORDER BY 1,2'
, 'VALUES (%s)'
) AS ct (id int, %s int)
USING (id)
ORDER BY id;
$q$
, string_agg(c.id::text, '), (')
, string_agg('"' || c.id || '"', ' int, ')
)
FROM (SELECT id FROM class ORDER BY 1) c;
Returns a query of this form, which we ...
2. Execute:
SELECT *
FROM class c
LEFT JOIN crosstab(
'SELECT target_id, source_id, 1 FROM inheritance ORDER BY 1,2'
, 'VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9)'
) AS ct (id int, "1" int, "2" int, "3" int, "4" int, "5" int, "6" int, "7" int, "8" int, "9" int)
USING (id)
ORDER BY id;
... to get:
id | name | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
----+------+---+---+---+---+---+---+---+---+---
1 | c1 | | 1 | 1 | 1 | | | | |
2 | c2 | | | | | 1 | | | |
3 | c3 | | | | | 1 | 1 | 1 | |
4 | c4 | | | | | | | 1 | |
5 | c5 | | | | | | | | 1 |
6 | c6 | | | | | | | | | 1
7 | c7 | | | | | | | | | 1
8 | c8 | | | | | | | | |
9 | c9 | | | | | | | | |
db<>fiddle here
See:
Dynamic alternative to pivot with CASE and GROUP BY
Dynamically convert hstore keys into columns for an unknown set of keys
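The "generate SQL" step can also run in application code instead of a server-side format() call. A minimal Python sketch that builds the same query text from a list of class ids; build_crosstab_query is a hypothetical helper, and it assumes the ids are integers fetched with something like SELECT id FROM class ORDER BY 1. They are interpolated into the SQL string, so this is only safe for trusted integer values.

```python
def build_crosstab_query(class_ids):
    """Build the crosstab() query text for the given integer class ids."""
    ids = sorted(int(i) for i in class_ids)
    values = "), (".join(str(i) for i in ids)        # like string_agg(c.id::text, '), (')
    columns = " int, ".join(f'"{i}"' for i in ids)   # like string_agg('"' || c.id || '"', ' int, ')
    return (
        "SELECT *\n"
        "FROM class c\n"
        "LEFT JOIN crosstab(\n"
        "   'SELECT target_id, source_id, 1 FROM inheritance ORDER BY 1,2'\n"
        f" , 'VALUES ({values})'\n"
        f"   ) AS ct (id int, {columns} int)\n"
        "USING (id)\n"
        "ORDER BY id;"
    )


print(build_crosstab_query(range(1, 10)))
```

With ids 1 to 9 this prints the same statement as the "2. Execute" query above; it is still two round-trips, just with the string building moved to the client.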
Dynamic execution with psql
Still two round-trips to the server, but only a single command.
Both of the following solutions use psql meta-commands and only work from within psql!
With \gexec
Using the standard interactive terminal, you can feed the generated SQL back to the Postgres server for execution directly with \gexec:
test=> SELECT format(
$q$SELECT *
FROM class c
LEFT JOIN crosstab(
'SELECT target_id, source_id, 1 FROM inheritance ORDER BY 1,2'
, 'VALUES (%s)'
) AS ct (id int, %s int)
USING (id)
ORDER BY id;
$q$
, string_agg(c.id::text, '), (')
, string_agg('c' || c.id, ' int, ')
)
FROM (SELECT id FROM class ORDER BY 1) c\gexec
Same result.
With \crosstabview
test=> SELECT *
test-> FROM class c
test-> LEFT JOIN (
test(> SELECT target_id AS id, source_id, 1 AS val
test(> FROM inheritance
test(> ) i USING (id)
test-> \crosstabview id source_id val
id | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
----+---+---+---+---+---+---+---+---+--
1 | 1 | 1 | 1 | | | | | |
2 | | | | 1 | | | | |
3 | | | | 1 | 1 | 1 | | |
4 | | | | | | 1 | | |
5 | | | | | | | 1 | |
6 | | | | | | | | 1 |
7 | | | | | | | | 1 |
8 | | | | | | | | |
9 | | | | | | | | |
(9 rows)
See (with related answers for both):
How do I generate a pivoted CROSS JOIN where the resulting table definition is unknown?
There are lots of subtleties in these solutions ...
Aside
This assumes there are mechanisms in place to disallow duplicates and directly contradicting relationships, like:
CREATE UNIQUE INDEX inheritance_uni_idx
ON inheritance (GREATEST(source_id, target_id), LEAST(source_id, target_id));
See:
How to create a Postgres table with unique combined primary key?
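The effect of that expression index can be tried without PostgreSQL. A minimal sketch in SQLite via Python's sqlite3, assuming SQLite's multi-argument max()/min() as stand-ins for GREATEST()/LEAST() (SQLite allows deterministic built-in functions in index expressions). The helper try_insert is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE inheritance (
    id INTEGER PRIMARY KEY,
    source_id INTEGER NOT NULL,
    target_id INTEGER NOT NULL
);
-- max()/min() with two arguments play the role of GREATEST()/LEAST().
CREATE UNIQUE INDEX inheritance_uni_idx
    ON inheritance (max(source_id, target_id), min(source_id, target_id));
""")

conn.execute("INSERT INTO inheritance (source_id, target_id) VALUES (2, 1)")


def try_insert(source_id, target_id):
    """Return True if the pair was accepted, False if the unique index rejected it."""
    try:
        conn.execute(
            "INSERT INTO inheritance (source_id, target_id) VALUES (?, ?)",
            (source_id, target_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert(2, 1))  # exact duplicate -> False
print(try_insert(1, 2))  # reversed, contradicting pair -> False
print(try_insert(3, 1))  # new relationship -> True
```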

Update statement to update two columns of a table using the next value of a column in same table

I have a table in which I want to update two columns. Below is an example:
Actual table:
Team | PLAYERNUM| NAME
--------+----------+--------
A | 1 | ONE |
A | 2 | TWO |
A | 3 | THREE|
B | 1 | FOUR |
B | 2 | FIVE |
B | 3 | SIX |
Expected result:
Team |PLAYERNUM | NAME
--------+----------+--------
A | 1 | ONE |
A | 2 | TWO |
A | 3 | THREE|
A | 4 | FOUR |
A | 5 | FIVE |
A | 6 | SIX |
A unique constraint is enabled on the columns (Team, PLAYERNUM). Now I want to update all the rows with Team 'B' to 'A', but I get a unique constraint violation because the (Team, PLAYERNUM) combinations would no longer be unique. Any clue how to move Team B to A while renumbering PLAYERNUM to 4, 5, 6?

You can use a common table expression with the row_number() window function to perform the update:
update tab t
set playernum = (
with cte as (
select t.*, row_number() over (order by Team,playernum) as rn
from tab t
)
select rn
from cte
where cte.team = t.team and cte.playernum = t.playernum
), team = 'A';
Demo
If you only update the team column (team = 'A'), you get a unique constraint violation (try it in the demo), but the statement above yields no error because every row also receives a playernum that is unique within team A.
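A minimal sketch of the same idea in SQLite via Python's sqlite3 (window functions need SQLite 3.25 or newer). It is a variation, not the answer's exact statement: the row_number() values are first materialized into a temp table keyed by rowid, so the UPDATE does not depend on the order in which rows are processed. The table name tab comes from the answer; the temp table renumber is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tab (
    team TEXT NOT NULL,
    playernum INTEGER NOT NULL,
    name TEXT NOT NULL,
    UNIQUE (team, playernum)
);
INSERT INTO tab VALUES
    ('A', 1, 'ONE'), ('A', 2, 'TWO'), ('A', 3, 'THREE'),
    ('B', 1, 'FOUR'), ('B', 2, 'FIVE'), ('B', 3, 'SIX');

-- 1. Number every row once, in (team, playernum) order.
CREATE TEMP TABLE renumber AS
SELECT rowid AS rid,
       row_number() OVER (ORDER BY team, playernum) AS rn
FROM tab;

-- 2. Apply the new numbers and the new team in a single UPDATE.
UPDATE tab
SET playernum = (SELECT rn FROM renumber WHERE renumber.rid = tab.rowid),
    team = 'A';
""")

rows = conn.execute("SELECT team, playernum, name FROM tab ORDER BY playernum").fetchall()
print(rows)
```

Team A rows keep their numbers and team B rows move to 4, 5, 6, none of which exist beforehand, so no intermediate state violates the unique constraint.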

SQL query by compound index

Let's say I have a table items with columns id, type, number, room; id is the primary key and (type, number) is a unique compound key. And a table inventory with columns id, item_type, item_number, owner; id is the primary key and (item_type, item_number) is a unique compound key.
Example:
items
| id | type | number | room |
+----+---------+--------+------+
| 1 | laptop | 1 | 12 |
| 2 | laptop | 2 | 13 |
| 3 | desktop | 1 | 13 |
inventory
| id | item_type | item_number | owner |
+----+-----------+-------------+-------+
| 1 | laptop | 1 | Joe |
| 2 | laptop | 2 | Joe |
| 3 | desktop | 1 | Susan |
How do I query all items owned by Joe? If I do
SELECT *
FROM items
WHERE (type, number) IN (
SELECT item_type, item_number FROM inventory WHERE owner = 'Joe'
)
I only get one row in the result, even though the subquery returns multiple rows. I can't seem to join on multiple columns either, like:
SELECT *
FROM items
JOIN inventory ON inventory.item_type = items.type,
inventory.item_number = items.number
WHERE inventory.owner = 'Joe'

You ought to combine the join conditions with AND, not with a comma.
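A minimal sketch of the corrected join, run against the example data in SQLite through Python's sqlite3 (an assumption; the question doesn't name its database). The two join conditions are combined with AND:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (
    id INTEGER PRIMARY KEY,
    type TEXT NOT NULL,
    number INTEGER NOT NULL,
    room INTEGER,
    UNIQUE (type, number)
);
CREATE TABLE inventory (
    id INTEGER PRIMARY KEY,
    item_type TEXT NOT NULL,
    item_number INTEGER NOT NULL,
    owner TEXT,
    UNIQUE (item_type, item_number)
);
INSERT INTO items VALUES (1, 'laptop', 1, 12), (2, 'laptop', 2, 13), (3, 'desktop', 1, 13);
INSERT INTO inventory VALUES (1, 'laptop', 1, 'Joe'), (2, 'laptop', 2, 'Joe'), (3, 'desktop', 1, 'Susan');
""")

rows = conn.execute("""
SELECT items.*
FROM items
JOIN inventory
  ON inventory.item_type = items.type
 AND inventory.item_number = items.number  -- AND, not a comma
WHERE inventory.owner = ?
ORDER BY items.id
""", ("Joe",)).fetchall()
print(rows)  # [(1, 'laptop', 1, 12), (2, 'laptop', 2, 13)]
```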

find set of row, duplicate list, before insert

I have a table that stores lists (each row is a struct of 4 integers; the first one, idL, is the list id):
id | idL | idA(null) | idB(null) | idC
1 | 1 | 2 | null | 1
2 | 1 | 4 | null | 1
3 | 1 | null | 1 | 1
4 | 2 | 2 | null | 1
5 | 2 | 4 | null | 1
6 | 3 | 6 | null | 1
7 | 3 | null | 4 | 1
Now I need to insert a 4th list into this table:
idA | idB | idC
2 | null | 1
4 | null | 1
null | 1 | 1
but it already exists (list id = 1).
Likewise, the list
idA | idB | idC
2 | null | 1
4 | null | 1
also exists (idL = 2), whereas
idA | idB | idC
2 | null | 1
4 | null | 1
null | 7 | 1
does not exist.
How can I find such a duplicate list before inserting it into the table?

It appears to be just a matter of INSERT ... SELECT with a NOT IN filter.
Try this example:
SQLFiddle
Disclaimer: in the example data you provided, rows 2 and 4 have an identical (idA, idB, idC) set.
If those columns cannot form a unique key, you already have that tuple in the copy table, and you need one row in the copy table for each row in the original table, it will be a lot harder, because for such a row in the copy there's no way to tell which row in the original it relates to.

If the new values are in a temp table and you know the list id, you can use EXCEPT, e.g.:
insert into list (idL, idA, idB, idC)
select #list_id, t.idA, t.idB, t.idC
from
(
select idA, idB, idC
from #new_values
except
select idA, idB, idC
from list
) t
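A runnable sketch of that EXCEPT approach in SQLite via Python's sqlite3; the table name list comes from the answer, new_values stands in for #new_values, and the ? parameter replaces #list_id. One detail matters here: EXCEPT treats two NULLs as equal, so rows with a NULL idA or idB are matched, whereas = comparisons (and therefore NOT IN) evaluate to unknown when a NULL is involved. Note that it checks each row against the whole table, not whole lists against each other, so only (null, 7, 1) from the new list gets inserted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE list (
    id INTEGER PRIMARY KEY,
    idL INTEGER NOT NULL,
    idA INTEGER,
    idB INTEGER,
    idC INTEGER NOT NULL
);
INSERT INTO list (idL, idA, idB, idC) VALUES
    (1, 2, NULL, 1), (1, 4, NULL, 1), (1, NULL, 1, 1),
    (2, 2, NULL, 1), (2, 4, NULL, 1),
    (3, 6, NULL, 1), (3, NULL, 4, 1);
-- Stand-in for the #new_values temp table.
CREATE TEMP TABLE new_values (idA INTEGER, idB INTEGER, idC INTEGER);
INSERT INTO new_values VALUES (2, NULL, 1), (4, NULL, 1), (NULL, 7, 1);
""")

new_list_id = 4  # stand-in for #list_id
conn.execute("""
INSERT INTO list (idL, idA, idB, idC)
SELECT ?, t.idA, t.idB, t.idC
FROM (
    SELECT idA, idB, idC FROM new_values
    EXCEPT
    SELECT idA, idB, idC FROM list
) AS t
""", (new_list_id,))

inserted = conn.execute(
    "SELECT idA, idB, idC FROM list WHERE idL = ?", (new_list_id,)
).fetchall()
print(inserted)  # [(None, 7, 1)]
```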

SQL Performance: Using Union and Subqueries

Hi Stack Overflow (my first question!),
We're doing something like an SNS, and got a question about optimizing queries.
Using mysql 5.1, the current table was created with:
CREATE TABLE friends(
user_id BIGINT NOT NULL,
friend_id BIGINT NOT NULL,
PRIMARY KEY (user_id, friend_id)
) ENGINE INNODB;
Sample data is populated like:
INSERT INTO friends VALUES
(1,2),
(1,3),
(1,4),
(1,5),
(2,1),
(2,3),
(2,4),
(3,1),
(3,2),
(4,1),
(4,2),
(5,1),
(5,6),
(6,5),
(7,8),
(8,7);
The business logic: we need to figure out which users are friends or friends of friends for a given user.
The current query for this for a user with user_id=1 is:
SELECT friend_id FROM friends WHERE user_id = 1
UNION
SELECT DISTINCT friend_id FROM friends WHERE user_id IN (
SELECT friend_id FROM friends WHERE user_id = 1
);
The expected result is(order doesn't matter):
2
3
4
5
1
6
As you can see, the above query performs the subquery "SELECT friend_id FROM friends WHERE user_id = 1" twice.
So, here is the question. If performance is your primary concern, how would you change the above query or schema?
Thanks in advance.

In this particular case, you can use a JOIN:
SELECT DISTINCT f2.friend_id
FROM friends AS f1
JOIN friends AS f2 ON f1.friend_id=f2.user_id OR f2.user_id=1
WHERE f1.user_id=1;
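A quick way to confirm the JOIN returns the same set as the original UNION is to run both on the sample data. A minimal sketch in SQLite through Python's sqlite3, standing in for MySQL 5.1; it checks results only, not performance. result_set is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE friends (user_id INTEGER NOT NULL, friend_id INTEGER NOT NULL,"
    " PRIMARY KEY (user_id, friend_id))"
)
conn.executemany("INSERT INTO friends VALUES (?, ?)", [
    (1, 2), (1, 3), (1, 4), (1, 5), (2, 1), (2, 3), (2, 4), (3, 1),
    (3, 2), (4, 1), (4, 2), (5, 1), (5, 6), (6, 5), (7, 8), (8, 7),
])

union_sql = """
SELECT friend_id FROM friends WHERE user_id = :uid
UNION
SELECT DISTINCT friend_id FROM friends WHERE user_id IN (
    SELECT friend_id FROM friends WHERE user_id = :uid)
"""
join_sql = """
SELECT DISTINCT f2.friend_id
FROM friends AS f1
JOIN friends AS f2 ON f1.friend_id = f2.user_id OR f2.user_id = :uid
WHERE f1.user_id = :uid
"""


def result_set(sql, uid):
    """Run a query for one user and return its friend_ids as a set."""
    return {row[0] for row in conn.execute(sql, {"uid": uid})}


print(sorted(result_set(join_sql, 1)))  # [1, 2, 3, 4, 5, 6]
```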
Examining each query's plan suggests the JOIN will be about as performant as the UNION in a big-O sense, though perhaps faster by a constant factor. Jasie's query looks like it might be big-O faster.
EXPLAIN SELECT friend_id FROM friends WHERE user_id = 1
UNION
SELECT DISTINCT friend_id FROM friends WHERE user_id IN (
SELECT friend_id FROM friends WHERE user_id = 1
);
+----+--------------------+------------+--------+---------------+---------+---------+------------+------+-------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+------------+--------+---------------+---------+---------+------------+------+-------------------------------------------+
| 1 | PRIMARY | friends | ref | PRIMARY | PRIMARY | 8 | const | 4 | Using index |
| 2 | UNION | friends | index | NULL | PRIMARY | 16 | NULL | 16 | Using where; Using index; Using temporary |
| 3 | DEPENDENT SUBQUERY | friends | eq_ref | PRIMARY | PRIMARY | 16 | const,func | 1 | Using index |
| NULL | UNION RESULT | <union1,2> | ALL | NULL | NULL | NULL | NULL | NULL | |
+----+--------------------+------------+--------+---------------+---------+---------+------------+------+-------------------------------------------+
EXPLAIN SELECT DISTINCT f2.friend_id
FROM friends AS f1
JOIN friends AS f2
ON f1.friend_id=f2.user_id OR f2.user_id=1
WHERE f1.user_id=1;
+----+-------------+-------+-------+---------------+---------+---------+-------+------+---------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+-------+------+---------------------------------------------+
| 1 | SIMPLE | f1 | ref | PRIMARY | PRIMARY | 8 | const | 4 | Using index; Using temporary |
| 1 | SIMPLE | f2 | index | PRIMARY | PRIMARY | 16 | NULL | 16 | Using where; Using index; Using join buffer |
+----+-------------+-------+-------+---------------+---------+---------+-------+------+---------------------------------------------+
EXPLAIN SELECT DISTINCT friend_id FROM friends WHERE user_id IN (
SELECT friend_id FROM friends WHERE user_id = 1
) OR user_id = 1;
+----+--------------------+---------+--------+---------------+---------+---------+------------+------+-------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+---------+--------+---------------+---------+---------+------------+------+-------------------------------------------+
| 1 | PRIMARY | friends | index | PRIMARY | PRIMARY | 16 | NULL | 16 | Using where; Using index; Using temporary |
| 2 | DEPENDENT SUBQUERY | friends | eq_ref | PRIMARY | PRIMARY | 16 | const,func | 1 | Using index |
+----+--------------------+---------+--------+---------------+---------+---------+------------+------+-------------------------------------------+

No need for the UNION. Just include an OR with the user_id of the starting user:
SELECT DISTINCT friend_id FROM friends WHERE user_id IN (
SELECT friend_id FROM friends WHERE user_id = 1
) OR user_id = 1;
+-----------+
| friend_id |
+-----------+
| 2 |
| 3 |
| 4 |
| 5 |
| 1 |
| 6 |
+-----------+
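The same kind of check for the OR form, again a minimal sketch in SQLite via Python's sqlite3 standing in for MySQL (results only, not timings); friends_of_friends is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE friends (user_id INTEGER NOT NULL, friend_id INTEGER NOT NULL,"
    " PRIMARY KEY (user_id, friend_id))"
)
conn.executemany("INSERT INTO friends VALUES (?, ?)", [
    (1, 2), (1, 3), (1, 4), (1, 5), (2, 1), (2, 3), (2, 4), (3, 1),
    (3, 2), (4, 1), (4, 2), (5, 1), (5, 6), (6, 5), (7, 8), (8, 7),
])


def friends_of_friends(uid):
    """Friends and friends-of-friends of uid, using OR instead of UNION."""
    rows = conn.execute(
        """
        SELECT DISTINCT friend_id FROM friends WHERE user_id IN (
            SELECT friend_id FROM friends WHERE user_id = :uid
        ) OR user_id = :uid
        """,
        {"uid": uid},
    )
    return {row[0] for row in rows}


print(sorted(friends_of_friends(1)))  # [1, 2, 3, 4, 5, 6]
```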
