Replacing set of rows with another set in sqlite - sql

I've a table values with columns like this:
id: integer primary key
value: varchar(128)
type_id: integer (foreign key)
owner_id: integer (foreign key)
and some sample data:
id value type_id owner_id
...
5 aaa 0 1
6 bbb 0 2 // Rows
7 ccc 1 2 // to
8 ddd 1 2 // be
9 eee 2 2 // replaced
10 fff 0 3
...
Now I would like to replace all rows where owner_id == 2 with a new set of data. The simple approach is to DELETE all rows for owner_id == 2 and INSERT the new ones. However, I wonder whether there is another solution.
In my case:
New set may contain exactly the same data (no action needed).
Or it could contain the same data minus one row (deletion needed). Example: no more bbb with type_id == 0
Or there is one more row (insertion needed). Example: bbb, ccc, ddd and eee with exactly the same values for type_id plus ggg with type_id = 1
Or one of the values in values column changed (update needed). Example: exactly the same data but instead of ccc with type_id == 1 there is ggg with type_id == 1
It can be also any combination of operations above.
The reason I try to avoid DELETE + INSERT is that I'll have many such updates, and with that approach id will start growing fast.

As you don't seem to be around to respond to comments, let's get started.
In line with my comments above, I ironed out what appear to me to be some wrinkles:
in your request: "no more bbb with type_id == 1", which is not part of your sample data (going for type_id 0 instead), and
in your sample data: values "ccc" and "ddd" both have type_id 1 for the same owner_id (going for unique owner_id / type_id combinations instead).
If applicable, you might enforce the latter by:
CREATE UNIQUE INDEX ValuesTable_TypeOwner ON ValuesTable(owner_id, type_id);
NB: I changed the tablename as VALUES is a SQL reserved word.
You might want to try something along these lines (pulling the modifications to be applied from a table called Changes):
Delete no longer existing owner_id type_id combinations:
WITH
To_Delete (id) AS (
SELECT
id
FROM ValuesTable V
JOIN Changes C
ON V.owner_id = C.owner_id
AND V.type_id
NOT IN (SELECT type_id
FROM Changes
WHERE owner_id = C.owner_id)
)
DELETE FROM ValuesTable
WHERE id IN (SELECT id FROM To_Delete)
;
Update deviating values:
WITH
To_Update (id) AS (
SELECT
id
FROM ValuesTable V
JOIN Changes C
ON V.owner_id = C.owner_id
AND V.type_id = C.type_id
AND V.value <> C.value
)
UPDATE ValuesTable
SET value = (SELECT value
FROM Changes
WHERE ValuesTable.owner_id = owner_id
AND ValuesTable.type_id = type_id
)
WHERE id IN (SELECT id FROM To_Update)
;
Insert new owner_id type_id combinations:
WITH
To_Insert (value, type_id, owner_id) AS (
SELECT
value
, type_id
, owner_id
FROM Changes
WHERE NOT EXISTS
(SELECT 1
FROM ValuesTable
WHERE Changes.owner_id = owner_id
AND Changes.type_id = type_id
)
)
INSERT INTO ValuesTable (value, type_id, owner_id)
SELECT value, type_id, owner_id FROM To_Insert
;
Starting from
ValuesTable Changes
| id | value | type_id | owner_id | | value | type_id | owner_id |
|----|-------|---------|----------| |-------|---------|----------|
| 5 | aaa | 0 | 1 | | ccc | 1 | 2 |
| 6 | bbb | 0 | 2 | | ddd | 2 | 2 |
| 7 | ccc | 1 | 2 | | xxx | 3 | 2 |
| 8 | ddd | 2 | 2 | | yyy | 4 | 2 |
| 9 | eee | 3 | 2 |
| 10 | fff | 0 | 3 |
it returns:
| id | value | type_id | owner_id |
|----|-------|---------|----------|
| 5 | aaa | 0 | 1 |
| 7 | ccc | 1 | 2 |
| 8 | ddd | 2 | 2 |
| 9 | xxx | 3 | 2 |
| 11 | yyy | 4 | 2 |
| 10 | fff | 0 | 3 |
See it in action: SQL Fiddle.
NB: Instead of using a Changes table, the WITH clause could, of course, be extended accordingly.
Please comment if and as this requires adjustment / further detail.
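The three statements above can be exercised end to end with a small script. This is only a sketch: it assumes Python's standard sqlite3 module and an in-memory database, uses the answer's adjusted sample data, and wraps the statements in one explicit transaction (the function name sync_demo is made up for illustration).

```python
import sqlite3

# The answer's three statements, condensed; names are those used in the answer.
SYNC_STATEMENTS = (
    # 1. Delete owner_id/type_id combinations that no longer appear in Changes.
    """
    WITH To_Delete (id) AS (
        SELECT id
        FROM ValuesTable V
        JOIN Changes C
          ON V.owner_id = C.owner_id
         AND V.type_id NOT IN (SELECT type_id FROM Changes
                               WHERE owner_id = C.owner_id)
    )
    DELETE FROM ValuesTable WHERE id IN (SELECT id FROM To_Delete)
    """,
    # 2. Update rows whose value deviates from Changes.
    """
    WITH To_Update (id) AS (
        SELECT id
        FROM ValuesTable V
        JOIN Changes C
          ON V.owner_id = C.owner_id
         AND V.type_id = C.type_id
         AND V.value <> C.value
    )
    UPDATE ValuesTable
    SET value = (SELECT value FROM Changes
                 WHERE ValuesTable.owner_id = owner_id
                   AND ValuesTable.type_id = type_id)
    WHERE id IN (SELECT id FROM To_Update)
    """,
    # 3. Insert combinations that exist only in Changes.
    """
    INSERT INTO ValuesTable (value, type_id, owner_id)
    SELECT value, type_id, owner_id
    FROM Changes
    WHERE NOT EXISTS (SELECT 1 FROM ValuesTable
                      WHERE Changes.owner_id = owner_id
                        AND Changes.type_id = type_id)
    """,
)


def sync_demo():
    """Build the sample tables, apply the three steps, return the final rows."""
    # isolation_level=None: we issue BEGIN/COMMIT ourselves.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.executescript("""
        CREATE TABLE ValuesTable (
            id INTEGER PRIMARY KEY,
            value VARCHAR(128),
            type_id INTEGER,
            owner_id INTEGER
        );
        CREATE UNIQUE INDEX ValuesTable_TypeOwner
            ON ValuesTable(owner_id, type_id);
        CREATE TABLE Changes (value VARCHAR(128), type_id INTEGER, owner_id INTEGER);
        INSERT INTO ValuesTable VALUES
            (5, 'aaa', 0, 1), (6, 'bbb', 0, 2), (7, 'ccc', 1, 2),
            (8, 'ddd', 2, 2), (9, 'eee', 3, 2), (10, 'fff', 0, 3);
        INSERT INTO Changes VALUES
            ('ccc', 1, 2), ('ddd', 2, 2), ('xxx', 3, 2), ('yyy', 4, 2);
    """)
    conn.execute("BEGIN")
    for statement in SYNC_STATEMENTS:
        conn.execute(statement)
    conn.execute("COMMIT")
    rows = conn.execute(
        "SELECT id, value, type_id, owner_id FROM ValuesTable ORDER BY id"
    ).fetchall()
    conn.close()
    return rows
```

Only one new id (11) is consumed, and rows 7 and 8 are left untouched. Note that an owner whose rows should all disappear has no row in Changes, so the delete step as written never sees that owner; such a case would need separate handling.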

Related

how to perform sql actions/query for duplicate rows

I have 2 tables:
1-brokers(this is a company that could have multiple broker individuals)
and
2-brokerIndividuals (A person/individuals table that has a foreign key of broker company it belongs to and the individuals details)
I'm trying to create a unique index on the brokers table so that companyName is unique among the rows where isDeleted is NULL. The table is already populated, so I want to write an SQL query that finds duplicate rows; whenever there are rows with the same companyName and isDeleted being NULL, I would like to perform 2 actions/queries:
1- keep the first row as it is and change the isDeleted column of the other duplicates (the rows following the first one) to true.
2- re-associate the brokerIndividuals of the duplicate rows with the first row, i.e. change their foreign key to point to it.
The verbal description of what I am trying to do is: soft delete the duplicate rows and associate their corresponding brokerIndividuals with the first occurrence of the duplicates. The table needs to have exactly 1 occurrence of each companyName where isDeleted is NULL.
I am using the knex.js ORM, so if that helps you can also suggest a solution using knex functions, but knex doesn't support partial indexes yet (Knex.js - How to create unique index with 'where' clause?), so I have to use the raw SQL method. Also, the DB I'm using is mssql (version: 6.0.1).
Here's a full test case (commented), with link to the fiddle:
Working test case, tested with MySQL 5.5, 5.6, 5.7, 8.0 and MariaDB up to 10.6
Create the tables and insert initial data with duplicate company_name entries:
CREATE TABLE brokers (
id int primary key auto_increment
, company_name VARCHAR(30)
, isDeleted boolean default null
);
CREATE TABLE brokerIndividuals (
id int primary key auto_increment
, broker_id int references brokers (id)
);
INSERT INTO brokers (company_name) VALUES
('name1')
, ('name1')
, ('name1')
, ('name1')
, ('name123')
, ('name123')
, ('name123')
, ('name123')
;
INSERT INTO brokerIndividuals (broker_id) VALUES
(2)
, (7)
;
SELECT * FROM brokers;
+----+--------------+-----------+
| id | company_name | isDeleted |
+----+--------------+-----------+
| 1 | name1 | null |
| 2 | name1 | null |
| 3 | name1 | null |
| 4 | name1 | null |
| 5 | name123 | null |
| 6 | name123 | null |
| 7 | name123 | null |
| 8 | name123 | null |
+----+--------------+-----------+
SELECT * FROM brokerIndividuals;
+----+-----------+
| id | broker_id |
+----+-----------+
| 1 | 2 |
| 2 | 7 |
+----+-----------+
Adjust brokers to determine isDeleted based on the MIN(id) per company_name:
UPDATE brokers
JOIN (
SELECT company_name, MIN(id) AS id
FROM brokers
WHERE isDeleted IS NULL -- ignore rows that are already soft-deleted
GROUP BY company_name
) AS x
ON x.company_name = brokers.company_name
AND isDeleted IS NULL
SET isDeleted = CASE WHEN (x.id <> brokers.id) THEN 1 END
;
The updated brokers contents:
SELECT * FROM brokers;
+----+--------------+-----------+
| id | company_name | isDeleted |
+----+--------------+-----------+
| 1 | name1 | null |
| 2 | name1 | 1 |
| 3 | name1 | 1 |
| 4 | name1 | 1 |
| 5 | name123 | null |
| 6 | name123 | 1 |
| 7 | name123 | 1 |
| 8 | name123 | 1 |
+----+--------------+-----------+
For brokerIndividuals, find / set the correct broker_id:
UPDATE brokerIndividuals
JOIN brokers AS b1
ON b1.id = brokerIndividuals.broker_id
JOIN brokers AS b2
ON b1.company_name = b2.company_name
AND b2.isDeleted IS NULL
SET brokerIndividuals.broker_id = b2.id
;
New contents:
SELECT * FROM brokerIndividuals;
+----+-----------+
| id | broker_id |
+----+-----------+
| 1 | 1 |
| 2 | 5 |
+----+-----------+
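The same two steps, followed by the partial unique index the question is after, can be sketched without UPDATE ... JOIN. The example below is an adaptation, not the answer's tested code: it assumes SQLite via Python's standard sqlite3 module (neither MySQL nor SQL Server), correlated subqueries in place of the joins, and a made-up index name brokers_active_name.

```python
import sqlite3


def build_brokers_demo():
    """Create the answer's sample tables in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE brokers (
            id INTEGER PRIMARY KEY,
            company_name VARCHAR(30),
            isDeleted BOOLEAN DEFAULT NULL
        );
        CREATE TABLE brokerIndividuals (
            id INTEGER PRIMARY KEY,
            broker_id INTEGER REFERENCES brokers (id)
        );
        INSERT INTO brokers (company_name) VALUES
            ('name1'), ('name1'), ('name1'), ('name1'),
            ('name123'), ('name123'), ('name123'), ('name123');
        INSERT INTO brokerIndividuals (broker_id) VALUES (2), (7);
    """)
    return conn


def dedupe_brokers(conn):
    """Soft-delete duplicates, re-point individuals, then enforce uniqueness."""
    with conn:  # commit on success, roll back on error
        # Step 1: keep the lowest active id per company_name, flag the rest.
        conn.execute("""
            UPDATE brokers
            SET isDeleted = 1
            WHERE isDeleted IS NULL
              AND id > (SELECT MIN(b.id)
                        FROM brokers AS b
                        WHERE b.company_name = brokers.company_name
                          AND b.isDeleted IS NULL)
        """)
        # Step 2: point each individual at the surviving broker of the same
        # company; COALESCE keeps the old value if no survivor exists.
        conn.execute("""
            UPDATE brokerIndividuals
            SET broker_id = COALESCE(
                (SELECT keep.id
                 FROM brokers AS dup
                 JOIN brokers AS keep
                   ON keep.company_name = dup.company_name
                  AND keep.isDeleted IS NULL
                 WHERE dup.id = brokerIndividuals.broker_id),
                broker_id)
        """)
        # Step 3: partial unique index over the active rows only.
        conn.execute("""
            CREATE UNIQUE INDEX brokers_active_name
                ON brokers (company_name)
                WHERE isDeleted IS NULL
        """)
```

Once the index exists, inserting another active row with an existing company_name fails with an integrity error, while soft-deleted duplicates remain allowed.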

Select from cross-reference based on inclusion (column values being superset)

Given a cross-reference table t relating table a with b:
| id | a_id | b_id |
--------------------
| 1 | 1 | 1 |
| 2 | 1 | 2 |
| 3 | 1 | 3 |
| 4 | 2 | 7 |
| 5 | 2 | 3 |
| 6 | 3 | 2 |
| 7 | 3 | 3 |
What would be the conventional way of selecting all a_id whose b_id is a superset of a given set?
For example, for the set (2,3), I would expect the result:
| a_id |
--------
| 1 |
| 3 |
Since a_id 1 and 3 are the only ones whose set of b_id values is a superset of (2,3).
The best solution I've found so far (thanks to this answer):
select id
from a
where 2 = (select count(*)
from t
where t.a_id = a.id and t.b_id in (2,3)
);
But I'd prefer to avoid calculating stuff like cardinality before running the query.
You can simply adapt the query as:
select id
from a cross join
(select count(*) as cnt
from t
where . . .
) x
where x.cnt = (select count(*)
from t
where t.a_id = a.id and t.b_id in (2,3)
);
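One way to let the database work out the cardinality itself is to put the wanted values into a CTE and compare counts inside the query. The sketch below is a possible variant, not the elided query above; it assumes SQLite through Python's standard sqlite3 module, and the names supersets_of, wanted and build_xref_demo are made up for illustration.

```python
import sqlite3


def build_xref_demo():
    """Create the cross-reference table t from the question."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE t (id INTEGER PRIMARY KEY, a_id INTEGER, b_id INTEGER);
        INSERT INTO t (id, a_id, b_id) VALUES
            (1, 1, 1), (2, 1, 2), (3, 1, 3),
            (4, 2, 7), (5, 2, 3),
            (6, 3, 2), (7, 3, 3);
    """)
    return conn


def supersets_of(conn, wanted):
    """Return every a_id whose b_id values include all values in `wanted`."""
    values = sorted(set(wanted))
    if not values:
        raise ValueError("wanted must contain at least one value")
    # One "(?)" row per wanted value, bound as parameters.
    rows = ", ".join("(?)" for _ in values)
    sql = f"""
        WITH wanted (b_id) AS (VALUES {rows})
        SELECT t.a_id
        FROM t
        JOIN wanted ON wanted.b_id = t.b_id
        GROUP BY t.a_id
        -- the size of the wanted set is computed by the query itself
        HAVING COUNT(DISTINCT t.b_id) = (SELECT COUNT(*) FROM wanted)
        ORDER BY t.a_id
    """
    return [a_id for (a_id,) in conn.execute(sql, values)]
```

For the set (2,3) this yields a_id 1 and 3, matching the expected result in the question.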

Select from cross-reference based on inclusion (column values being subset)

Suppose I have a cross-reference table t with the following data:
| id | a_id | b_id |
--------------------
| 1 | 1 | 1 |
| 2 | 1 | 2 |
| 3 | 1 | 3 |
| 4 | 2 | 7 |
| 5 | 2 | 3 |
| 6 | 3 | 2 |
| 7 | 3 | 3 |
What would be the conventional way of selecting all a_id whose b_id is a subset of a given set?
For example, for some set (1,2,3,4,5), I would expect the result:
| a_id |
--------
| 1 |
| 3 |
Since a_id 1 and 3 are the only ones whose set of b_id values is a subset of (1,2,3,4,5).
Hmmm . . . One way uses aggregation:
select a_id
from t
group by a_id
having sum(case when b_id not in (1, 2, 3, 4, 5) then 1 else 0 end) = 0;
However, assuming you have an a table, then I prefer this method:
select a_id
from a
where not exists (select 1
from t
where t.a_id = a.a_id and t.b_id not in (1, 2, 3, 4, 5)
);
This saves the expense of aggregation and the lookup can take advantage of an appropriate index (on t(a_id, b_id)) so this should have better performance.
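The NOT EXISTS form can be tried out as follows. This is a sketch under assumptions: SQLite via Python's standard sqlite3 module, and, since the question defines no a table, the distinct a_id values of t stand in for it. The names subsets_of and build_xref_demo are illustrative only.

```python
import sqlite3


def build_xref_demo():
    """Create the cross-reference table t from the question."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE t (id INTEGER PRIMARY KEY, a_id INTEGER, b_id INTEGER);
        -- index suggested in the answer for the correlated lookup
        CREATE INDEX t_a_b ON t (a_id, b_id);
        INSERT INTO t (id, a_id, b_id) VALUES
            (1, 1, 1), (2, 1, 2), (3, 1, 3),
            (4, 2, 7), (5, 2, 3),
            (6, 3, 2), (7, 3, 3);
    """)
    return conn


def subsets_of(conn, allowed):
    """Return every a_id whose b_id values all lie inside `allowed`."""
    values = list(allowed)
    placeholders = ", ".join("?" for _ in values)
    sql = f"""
        SELECT DISTINCT candidate.a_id
        FROM t AS candidate
        WHERE NOT EXISTS (SELECT 1
                          FROM t
                          WHERE t.a_id = candidate.a_id
                            AND t.b_id NOT IN ({placeholders}))
        ORDER BY candidate.a_id
    """
    return [a_id for (a_id,) in conn.execute(sql, values)]
```

With the allowed set (1, 2, 3, 4, 5), a_id 2 is rejected because of its b_id 7, leaving 1 and 3.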

find set of row, duplicate list, before insert

I have a table (it's a list of structs with 4 integers, first id is list id)
id | idL | idA(null) | idB(null) | idC
1 | 1 | 2 | null | 1
2 | 1 | 4 | null | 1
3 | 1 | null | 1 | 1
4 | 2 | 2 | null | 1
5 | 2 | 4 | null | 1
6 | 3 | 6 | null | 1
7 | 3 | null | 4 | 1
Now I need to insert a 4th list into this table
idA | idB | idC
2 | null | 1
4 | null | 1
null | 1 | 1
but it already exists (list id = 1)
idA | idB | idC
2 | null | 1
4 | null | 1
also exists (idL = 2)
idA | idB | idC
2 | null | 1
4 | null | 1
null | 7 | 1
does not exist.
How can I find a duplicate list before inserting it into the table?
It appears to be just a matter of an INSERT from a SELECT with NOT IN.
Try this example:
SQLFiddle
Disclaimer: In the example data you provided, rows 2 and 4 have an identical idA, idB, idC set.
If those columns cannot form a unique key, you already have that tuple in the copy table, and you need one row in the copy table for each row in the original table, it will be a lot harder, because for such a row in the copy there is no way to tell which row in the original it relates to.
If the values are in a temp table and you know the list id,
you can use EXCEPT,
e.g.:
insert into list (idL, idA, idB, idC)
select @list_id, t.idA, t.idB, t.idC
from
(
select idA, idB, idC
from #new_values
except
select idA, idB, idC
from list
) t
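To check whether a whole list already exists, rather than individual rows, the EXCEPT idea can be applied in both directions per existing list: a stored list matches when neither difference returns a row. The sketch below assumes SQLite through Python's standard sqlite3 module; the helper names and the temp table new_values are illustrative, and because EXCEPT works on sets, repeated rows within a list are not distinguished.

```python
import sqlite3


def build_list_demo():
    """Create the question's table (named `list`, as in the answer above)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE list (
            id INTEGER PRIMARY KEY,
            idL INTEGER, idA INTEGER, idB INTEGER, idC INTEGER
        );
        INSERT INTO list (idL, idA, idB, idC) VALUES
            (1, 2, NULL, 1), (1, 4, NULL, 1), (1, NULL, 1, 1),
            (2, 2, NULL, 1), (2, 4, NULL, 1),
            (3, 6, NULL, 1), (3, NULL, 4, 1);
    """)
    return conn


def find_identical_list(conn, new_rows):
    """Return the idL of a stored list equal to new_rows, or None."""
    conn.execute(
        "CREATE TEMP TABLE IF NOT EXISTS new_values "
        "(idA INTEGER, idB INTEGER, idC INTEGER)"
    )
    conn.execute("DELETE FROM new_values")
    conn.executemany("INSERT INTO new_values VALUES (?, ?, ?)", new_rows)

    list_ids = [row[0] for row in
                conn.execute("SELECT DISTINCT idL FROM list ORDER BY idL")]
    for list_id in list_ids:
        # Rows of the candidate that the stored list lacks.
        missing = conn.execute(
            "SELECT idA, idB, idC FROM new_values "
            "EXCEPT SELECT idA, idB, idC FROM list WHERE idL = ?",
            (list_id,),
        ).fetchone()
        # Rows of the stored list that the candidate lacks.
        extra = conn.execute(
            "SELECT idA, idB, idC FROM list WHERE idL = ? "
            "EXCEPT SELECT idA, idB, idC FROM new_values",
            (list_id,),
        ).fetchone()
        if missing is None and extra is None:
            return list_id
    return None
```

EXCEPT treats NULLs as equal to each other when comparing rows, which is what makes the nullable idA and idB columns comparable here; a plain = join would not match them.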

Set-based way to calculate family ranges in SQL?

I have a table that contains parents and 0 or more children for each parent, with a flag indicating which records are parents. All of the members of a given family have the same parent id, and the parent always has the lowest id in a given family. Also, each child has a value associated with it. (Specifically, this is a database of emails and attachments, where each parent is an email and the children are the attachments.)
I have two fields I need to calculate:
Range = {lowest id in family} - {highest id in family} [populated for all members]
Value-list = {delimited list of the values of each child, in id order} [only for parent]
So, given this:
Id | Parent| HasChildren| Value | Range | Value-list
----------------------------------------|-----------
1 | 1 | 1 | | |
2 | 1 | 0 | a | |
3 | 1 | 0 | b | |
4 | 4 | 1 | | |
5 | 4 | 0 | c | |
6 | 6 | 0 | | |
I would like to end up with this:
Id | Parent| HasChildren| Value | Range | Value-list
----------------------------------------|-----------
1 | 1 | 1 | | 1-3 | a;b
2 | 1 | 0 | a | 1-3 |
3 | 1 | 0 | b | 1-3 |
4 | 4 | 1 | | 4-5 | c
5 | 4 | 0 | c | 4-5 |
6 | 6 | 0 | | 6-6 |
How can I do this efficiently? Ideally, I'd like to do this with just set-based logic, without cursors, or even stored procedures. Temporary tables are fine.
I'm working in T-SQL, if that makes a difference, though I'd be curious to see platform agnostic answers.
The following SQLFiddle solution should do the job for you; however, as @Allan mentioned, you might want to revise your database structure.
Using CTEs:
Note: my query uses table1 as the name of your table
with cte as(
select parent
,ValueList= stuff(( select ';' +isnull(t2.Value, '')
from table1 t2
where t1.parent=t2.parent
order by t2.id -- the question asks for the values in id order
FOR XML PATH(''), TYPE
).value('.', 'NVARCHAR(MAX)'), 1, 2, '')
from table1 t1
group by parent
),
cte2 as (select parent
, min(id) as firstID
, max(id) as LastID
from table1
group by parent)
select *
,cast((select FirstID from cte2 t2 where t2.parent=t1.parent) as varchar(10))+'-'+cast((select LastID from cte2 t2 where t2.parent=t1.parent) as varchar(10)) as [Range] -- cast so that + concatenates instead of adding numeric ids
,(select ValueList from cte t2 where t1.parent=t2.parent and t1.[haschildren]='1') as [Value-list]
from table1 t1
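For the platform-agnostic side of the question, the same two CTEs can be sketched in SQLite. This is an illustration under assumptions, not a port of the T-SQL above: it uses Python's standard sqlite3 module, snake_case column names of my own choosing, and GROUP_CONCAT in place of FOR XML PATH. SQLite does not promise the concatenation order of GROUP_CONCAT fed from an ordered subquery, so the id ordering of the value list should be treated as likely rather than guaranteed.

```python
import sqlite3


def build_family_demo():
    """Create the question's sample data (column names are assumptions)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE table1 (
            id INTEGER PRIMARY KEY,
            parent INTEGER,
            has_children INTEGER,
            value TEXT
        );
        INSERT INTO table1 VALUES
            (1, 1, 1, NULL), (2, 1, 0, 'a'), (3, 1, 0, 'b'),
            (4, 4, 1, NULL), (5, 4, 0, 'c'),
            (6, 6, 0, NULL);
    """)
    return conn


def family_report(conn):
    """Return (id, id_range, value_list) for every row, ordered by id."""
    sql = """
        WITH ranges AS (
            -- lowest and highest id per family
            SELECT parent, MIN(id) AS first_id, MAX(id) AS last_id
            FROM table1
            GROUP BY parent
        ),
        lists AS (
            -- delimited child values per family (children have id <> parent)
            SELECT parent, GROUP_CONCAT(value, ';') AS value_list
            FROM (SELECT parent, value
                  FROM table1
                  WHERE id <> parent
                  ORDER BY parent, id)
            GROUP BY parent
        )
        SELECT t.id,
               r.first_id || '-' || r.last_id AS id_range,
               CASE WHEN t.has_children = 1 THEN l.value_list END AS value_list
        FROM table1 AS t
        JOIN ranges AS r ON r.parent = t.parent
        LEFT JOIN lists AS l ON l.parent = t.parent
        ORDER BY t.id
    """
    return conn.execute(sql).fetchall()
```

Joining the per-family aggregates back to the base table keeps the whole computation set-based, with no cursor or procedure involved; the value list is filled in only on rows flagged as parents.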