Select from cross-reference based on inclusion (column values being subset)


Suppose I have a cross-reference table t with the following data:
| id | a_id | b_id |
--------------------
| 1 | 1 | 1 |
| 2 | 1 | 2 |
| 3 | 1 | 3 |
| 4 | 2 | 7 |
| 5 | 2 | 3 |
| 6 | 3 | 2 |
| 7 | 3 | 3 |
What would be the conventional way of selecting all a_id whose b_id is a subset of a given set?
For example, for some set (1,2,3,4,5), I would expect the result:
| a_id |
--------
| 1 |
| 3 |
since a_id 1 and 3 are the only ones whose set of b_id values is a subset of (1,2,3,4,5).

Hmmm . . . One way uses aggregation:
select a_id
from t
group by a_id
having sum(case when b_id not in (1, 2, 3, 4, 5) then 1 else 0 end) = 0;
However, assuming you have an a table, then I prefer this method:
select a_id
from a
where not exists (select 1
from t
where t.a_id = a.a_id and t.b_id not in (1, 2, 3, 4, 5)
);
This saves the expense of aggregation and the lookup can take advantage of an appropriate index (on t(a_id, b_id)) so this should have better performance.
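To try both variants against the sample data, here is a minimal sketch using Python's built-in sqlite3 module. The table a and the index name t_a_b are assumptions made for the demo; they are not given in the question.

```python
import sqlite3

# In-memory database with the sample data; table "a" and index "t_a_b" are assumed for the demo.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (a_id INTEGER PRIMARY KEY);
    CREATE TABLE t (id INTEGER PRIMARY KEY, a_id INTEGER, b_id INTEGER);
    INSERT INTO a (a_id) VALUES (1), (2), (3);
    INSERT INTO t (id, a_id, b_id) VALUES
        (1, 1, 1), (2, 1, 2), (3, 1, 3),
        (4, 2, 7), (5, 2, 3),
        (6, 3, 2), (7, 3, 3);
    CREATE INDEX t_a_b ON t (a_id, b_id);
""")

# Variant 1: aggregation, counting the b_id values that fall outside the given set.
aggregated = [row[0] for row in conn.execute("""
    SELECT a_id
    FROM t
    GROUP BY a_id
    HAVING SUM(CASE WHEN b_id NOT IN (1, 2, 3, 4, 5) THEN 1 ELSE 0 END) = 0
    ORDER BY a_id
""")]

# Variant 2: NOT EXISTS, rejecting any a_id that has a b_id outside the given set.
anti_joined = [row[0] for row in conn.execute("""
    SELECT a_id
    FROM a
    WHERE NOT EXISTS (SELECT 1
                      FROM t
                      WHERE t.a_id = a.a_id AND t.b_id NOT IN (1, 2, 3, 4, 5))
    ORDER BY a_id
""")]

print(aggregated, anti_joined)
```

One difference worth keeping in mind: the NOT EXISTS variant also returns rows of a that have no entries in t at all (the empty set is a subset of anything), while the aggregation variant only sees a_id values that appear in t.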

Related

how to "deepcopy" rows

My question is similar to this one but more involved. Suppose I have a table A with id idA, and another table B with idB and foreign key idA. I would like to duplicate all entries of A, including corresponding entries in B. For example, if I have the following tables at the start:
A
|---|
|idA|
|---|
| 1 |
| 2 |
| 3 |
|---|
B
|---|---|
|idB|idA|
|---|---|
| 1 | 1 |
| 2 | 1 |
| 3 | 2 |
|---|---|
Then the result should be:
A
|---|
|idA|
|---|
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 6 |
|---|
B
|---|---|
|idB|idA|
|---|---|
| 1 | 1 |
| 2 | 1 |
| 3 | 2 |
| 4 | 4 |
| 5 | 4 |
| 6 | 5 |
|---|---|

This is quite tricky. You need to insert the new rows into a, but then be able to match the new ids back to the existing ids in order to insert the right values into b.
A generic solution looks like this:
with i as (
insert into a
select . . . -- the other columns you want
from a
order by idA
returning *
),
a_mapping as (
select a.idA, i.idA as new_idA
from (select a.*, row_number() over (order by idA) as seqnum
from a
) a join
(select i.*, row_number() over (order by idA) as seqnum
from i
) i
on a.seqnum = i.seqnum
)
insert into b (idA)
select am.new_idA
from b join
a_mapping am
on b.idA = am.idA;
Note: If you have another unique column or columns in the row, then the mapping is a little easier to generate. Of course, if you are copying all the columns, then nothing else is unique, so you do need the row_number().
Of course, for your very simple example, you don't need a mapping table. You can just use:
with i as (
insert into a
select . . . -- the other columns you want
from a
order by idA
returning *
)
insert into b (idA)
select i.idA
from i;

I have an approach which may be equivalent to what Gordon Linoff suggests. I would be grateful if you could point out any flaws!
Let's set up the tables:
CREATE TABLE A(
idA SERIAL PRIMARY KEY,
txt varchar);
INSERT INTO A(txt)
VALUES ('A1'), ('A2'),('A3');
CREATE TABLE B(
idB SERIAL PRIMARY KEY,
idA int REFERENCES A(idA),
txt varchar);
INSERT INTO B(idA, txt)
VALUES (1, 'A1.B1'), (1, 'A1.B2'), (2, 'A2.B1');
so the initial data looks as follows:
SELECT * FROM (A LEFT JOIN B ON A.idA=B.idA) ORDER BY A.idA, B.idB;
ida | txt | idb | ida | txt
-----+-----+-----+-----+-------
1 | A1 | 1 | 1 | A1.B1
1 | A1 | 2 | 1 | A1.B2
2 | A2 | 3 | 2 | A2.B1
3 | A3 | | |
(4 rows)
Now, we can use the NEXTVAL function to generate the mappings directly:
CREATE TEMP TABLE tmp_A_new AS (
SELECT *, NEXTVAL('A_idA_seq') as newidA
FROM A ORDER BY idA -- order probably not needed
);
INSERT INTO A(idA, txt) (SELECT newidA, txt FROM tmp_A_new);
CREATE TEMP TABLE tmp_B_new AS (
SELECT B.idB, newidA, B.txt, NEXTVAL('B_idB_seq') as newidB
FROM B, tmp_A_new WHERE B.idA=tmp_A_new.idA ORDER BY idB
);
INSERT INTO B(idB, idA, txt) (SELECT newidB, newidA, txt FROM tmp_B_new);
The results look correct:
SELECT * FROM (A LEFT JOIN B ON A.idA=B.idA) ORDER BY A.idA, B.idB;
ida | txt | idb | ida | txt
-----+-----+-----+-----+-------
1 | A1 | 1 | 1 | A1.B1
1 | A1 | 2 | 1 | A1.B2
2 | A2 | 3 | 2 | A2.B1
3 | A3 | | |
4 | A1 | 4 | 4 | A1.B1
4 | A1 | 5 | 4 | A1.B2
5 | A2 | 6 | 5 | A2.B1
6 | A3 | | |
(8 rows)
Note that this could be continued further down to C, D, etc.
I would be glad for any comments :)
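For comparison, the same old-id to new-id mapping can be built on the client side. The following is only a sketch in Python with sqlite3, not the PostgreSQL approach from the answers above: it assumes the A/B layout from this answer, and the helper name deep_copy is made up for the example.

```python
import sqlite3

def deep_copy(conn):
    """Duplicate every row of A, then the B rows pointing at them (hypothetical helper)."""
    mapping = {}  # old idA -> new idA
    # Fetch everything first so we never iterate over a table while inserting into it.
    for old_id, txt in conn.execute("SELECT idA, txt FROM A ORDER BY idA").fetchall():
        cur = conn.execute("INSERT INTO A (txt) VALUES (?)", (txt,))
        mapping[old_id] = cur.lastrowid
    for old_id, txt in conn.execute("SELECT idA, txt FROM B ORDER BY idB").fetchall():
        conn.execute("INSERT INTO B (idA, txt) VALUES (?, ?)", (mapping[old_id], txt))
    return mapping

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (idA INTEGER PRIMARY KEY, txt TEXT);
    CREATE TABLE B (idB INTEGER PRIMARY KEY, idA INTEGER REFERENCES A(idA), txt TEXT);
    INSERT INTO A (txt) VALUES ('A1'), ('A2'), ('A3');
    INSERT INTO B (idA, txt) VALUES (1, 'A1.B1'), (1, 'A1.B2'), (2, 'A2.B1');
""")

mapping = deep_copy(conn)
copied_b = conn.execute("SELECT idB, idA, txt FROM B WHERE idB > 3 ORDER BY idB").fetchall()
print(mapping, copied_b)
```

This trades one round trip per row for a mapping that does not depend on row ordering, so it is probably only reasonable for small tables.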

Replacing set of rows with another set in sqlite

I've a table values with columns like this:
id: integer primary key
value: varchar(128)
type_id: integer (foreign key)
owner_id: integer (foreign key)
and some sample data:
id value type_id owner_id
...
5 aaa 0 1
6 bbb 0 2 // Rows
7 ccc 1 2 // to
8 ddd 1 2 // be
9 eee 2 2 // replaced
10 fff 0 3
...
Now I would like to replace all rows where owner_id == 2 with a new set of data. Simple approach is to DELETE all rows for owner_id == 2 and INSERT new ones. However I wonder if there is another solution?
In my case:
New set may contain exactly the same data (no action needed).
Or it could contain the same data but one row (deletion needed). Example: no more bbb with type_id == 0
Or there is one more row (insertion needed). Example: bbb, ccc, ddd and eee with exactly the same values for type_id plus ggg with type_id = 1
Or one of the values in values column changed (update needed). Example: exactly the same data but instead of ccc with type_id == 1 there is ggg with type_id == 1
It can be also any combination of operations above.
The reason I try to avoid DELETE + INSERT is that I'll have many such updates, and with that approach id will start growing fast.

As you don't seem to be around to respond to comments, let's get started.
In line with my above comments, I did iron out (what appears to me as) some wrinkles:
in your request: "no more bbb with type_id == 1" (which is not part of your sample data; going for type_id 0), and
in your sample data: values "ccc" and "ddd" both for type_id 1 and owner_id 2 (going for unique owner_id / type_id combinations).
If applicable, you might enforce the latter by:
CREATE UNIQUE INDEX ValuesTable_TypeOwner ON ValuesTable(owner_id, type_id);
NB: I changed the tablename as VALUES is a SQL reserved word.
You might want to try something along these lines (pulling the modifications to be applied from a table called Changes):
Delete no longer existing owner_id type_id combinations:
WITH
To_Delete (id) AS (
SELECT
id
FROM ValuesTable V
JOIN Changes C
ON V.owner_id = C.owner_id
AND V.type_id
NOT IN (SELECT type_id
FROM Changes
WHERE owner_id = C.owner_id)
)
DELETE FROM ValuesTable
WHERE id IN (SELECT id FROM To_Delete)
;
Update deviating values:
WITH
To_Update (id) AS (
SELECT
id
FROM ValuesTable V
JOIN Changes C
ON V.owner_id = C.owner_id
AND V.type_id = C.type_id
AND V.value <> C.value
)
UPDATE ValuesTable
SET value = (SELECT value
FROM Changes
WHERE ValuesTable.owner_id = owner_id
AND ValuesTable.type_id = type_id
)
WHERE id IN (SELECT id FROM To_Update)
;
Insert new owner_id type_id combinations:
WITH
To_Insert (value, type_id, owner_id) AS (
SELECT
value
, type_id
, owner_id
FROM Changes
WHERE NOT EXISTS
(SELECT 1
FROM ValuesTable
WHERE Changes.owner_id = owner_id
AND Changes.type_id = type_id
)
)
INSERT INTO ValuesTable (value, type_id, owner_id)
SELECT value, type_id, owner_id FROM To_Insert
;
Starting from
ValuesTable Changes
| id | value | type_id | owner_id | | value | type_id | owner_id |
|----|-------|---------|----------| |-------|---------|----------|
| 5 | aaa | 0 | 1 | | ccc | 1 | 2 |
| 6 | bbb | 0 | 2 | | ddd | 2 | 2 |
| 7 | ccc | 1 | 2 | | xxx | 3 | 2 |
| 8 | ddd | 2 | 2 | | yyy | 4 | 2 |
| 9 | eee | 3 | 2 |
| 10 | fff | 0 | 3 |
it returns:
| id | value | type_id | owner_id |
|----|-------|---------|----------|
| 5 | aaa | 0 | 1 |
| 7 | ccc | 1 | 2 |
| 8 | ddd | 2 | 2 |
| 9 | xxx | 3 | 2 |
| 11 | yyy | 4 | 2 |
| 10 | fff | 0 | 3 |
See it in action: SQL Fiddle.
NB: Instead of using a Changes table, the WITH clause could, of course, be extended accordingly.
Please comment if and as this requires adjustment / further detail.
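If the SQL Fiddle is not reachable, the three steps can be replayed locally. The sketch below uses Python's sqlite3 module with the sample data above; the statements are condensed rewrites without the CTEs (so not the exact text of the answer), and they assume owner_id / type_id pairs are unique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ValuesTable (id INTEGER PRIMARY KEY, value TEXT, type_id INTEGER, owner_id INTEGER);
    CREATE UNIQUE INDEX ValuesTable_TypeOwner ON ValuesTable (owner_id, type_id);
    CREATE TABLE Changes (value TEXT, type_id INTEGER, owner_id INTEGER);
    INSERT INTO ValuesTable VALUES
        (5, 'aaa', 0, 1), (6, 'bbb', 0, 2), (7, 'ccc', 1, 2),
        (8, 'ddd', 2, 2), (9, 'eee', 3, 2), (10, 'fff', 0, 3);
    INSERT INTO Changes VALUES ('ccc', 1, 2), ('ddd', 2, 2), ('xxx', 3, 2), ('yyy', 4, 2);
""")

with conn:  # run all three steps in one transaction
    # 1. Delete combinations that no longer exist for the owners being replaced.
    conn.execute("""
        DELETE FROM ValuesTable
        WHERE owner_id IN (SELECT owner_id FROM Changes)
          AND NOT EXISTS (SELECT 1 FROM Changes c
                          WHERE c.owner_id = ValuesTable.owner_id
                            AND c.type_id = ValuesTable.type_id)
    """)
    # 2. Update rows whose value deviates from the new set.
    conn.execute("""
        UPDATE ValuesTable
        SET value = (SELECT c.value FROM Changes c
                     WHERE c.owner_id = ValuesTable.owner_id
                       AND c.type_id = ValuesTable.type_id)
        WHERE EXISTS (SELECT 1 FROM Changes c
                      WHERE c.owner_id = ValuesTable.owner_id
                        AND c.type_id = ValuesTable.type_id
                        AND c.value <> ValuesTable.value)
    """)
    # 3. Insert combinations that are new.
    conn.execute("""
        INSERT INTO ValuesTable (value, type_id, owner_id)
        SELECT c.value, c.type_id, c.owner_id
        FROM Changes c
        WHERE NOT EXISTS (SELECT 1 FROM ValuesTable v
                          WHERE v.owner_id = c.owner_id AND v.type_id = c.type_id)
    """)

rows = conn.execute(
    "SELECT id, value, type_id, owner_id FROM ValuesTable ORDER BY owner_id, type_id"
).fetchall()
print(rows)
```

Only the genuinely new row ("yyy") consumes a fresh id; the changed row keeps id 9, which is the point of avoiding DELETE + INSERT.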

Select from cross-reference based on inclusion (column values being superset)

Given a cross-reference table t relating table a with b:
| id | a_id | b_id |
--------------------
| 1 | 1 | 1 |
| 2 | 1 | 2 |
| 3 | 1 | 3 |
| 4 | 2 | 7 |
| 5 | 2 | 3 |
| 6 | 3 | 2 |
| 7 | 3 | 3 |
What would be the conventional way of selecting all a_id whose b_id is a superset of a given set?
For example, for the set (2,3), I would expect the result:
| a_id |
--------
| 1 |
| 3 |
since a_id 1 and 3 are the only ones whose set of b_id values is a superset of (2,3).
The best solution I've found so far (thanks to this answer):
select id
from a
where 2 = (select count(*)
from t
where t.a_id = a.id and t.b_id in (2,3)
);
But I'd prefer to avoid calculating stuff like cardinality before running the query.

You can simply adapt the query as:
select id
from a cross join
(select count(*) as cnt
from t
where . . .
) x
where x.cnt = (select count(*)
from t
where t.a_id = a.id and t.b_id in (2,3)
);
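One way to avoid hard-coding the cardinality is to keep the given set in a table and count it inside the query. The sketch below (Python with sqlite3) is only one possible reading of the idea; the table wanted(b_id) is hypothetical and not part of the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY);
    CREATE TABLE t (id INTEGER PRIMARY KEY, a_id INTEGER, b_id INTEGER);
    CREATE TABLE wanted (b_id INTEGER PRIMARY KEY);  -- hypothetical table holding the given set
    INSERT INTO a (id) VALUES (1), (2), (3);
    INSERT INTO t (id, a_id, b_id) VALUES
        (1, 1, 1), (2, 1, 2), (3, 1, 3),
        (4, 2, 7), (5, 2, 3),
        (6, 3, 2), (7, 3, 3);
    INSERT INTO wanted (b_id) VALUES (2), (3);
""")

# An a.id qualifies when it matches as many distinct wanted values as the set contains.
superset_ids = [row[0] for row in conn.execute("""
    SELECT a.id
    FROM a
    WHERE (SELECT COUNT(*) FROM wanted)
        = (SELECT COUNT(DISTINCT t.b_id)
           FROM t
           WHERE t.a_id = a.id
             AND t.b_id IN (SELECT b_id FROM wanted))
    ORDER BY a.id
""")]
print(superset_ids)
```

COUNT(DISTINCT ...) guards against duplicate (a_id, b_id) pairs in t; if the pair is known to be unique, a plain COUNT(*) would do.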

find set of row, duplicate list, before insert

I have a table (it's a list of structs with 4 integers; idL is the list id):
id | idL | idA(null) | idB(null) | idC
1 | 1 | 2 | null | 1
2 | 1 | 4 | null | 1
3 | 1 | null | 1 | 1
4 | 2 | 2 | null | 1
5 | 2 | 4 | null | 1
6 | 3 | 6 | null | 1
7 | 3 | null | 4 | 1
Now I need to insert a 4th list into this table:
idA | idB | idC
2 | null | 1
4 | null | 1
null | 1 | 1
but it already exists (idL = 1).
idA | idB | idC
2 | null | 1
4 | null | 1
also exists (idL = 2).
idA | idB | idC
2 | null | 1
4 | null | 1
null | 7 | 1
does not exist.
How can I find such a duplicate list before inserting it into the table?

It appears to be just a matter of insert from (select not in).
Try this example:
SQLFiddle
Disclaimer: In the example data you provided, rows 2 and 4 have an identical idA, idB, idC set.
If those columns cannot form a unique key, the tuple is already in the copy table, and you need one row in the copy table for each row in the original table, it will be a lot harder, because for such a row in the copy there is no way to tell which row in the original it relates to.

If the values are in a temp table and you know the list id, you can use EXCEPT, e.g.:
insert into list (idL, idA, idB, idC)
select @list_id, t.idA, t.idB, t.idC
from
(
select idA, idB, idC
from #new_values
except
select idA, idB, idC
from list
) t
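The EXCEPT idea can also be turned around to answer the original question, "does this exact list already exist?". The following is a sketch in Python with sqlite3 rather than T-SQL; the helper name matching_lists and the temp table new_values are made up for the example. It relies on set operators treating NULLs as equal, and it compares lists as sets, so duplicate rows inside one list are ignored.

```python
import sqlite3

def matching_lists(conn, rows):
    """Return the idL of every stored list whose rows equal `rows` as a set (hypothetical helper)."""
    conn.execute("DROP TABLE IF EXISTS new_values")
    conn.execute("CREATE TEMP TABLE new_values (idA INTEGER, idB INTEGER, idC INTEGER)")
    conn.executemany("INSERT INTO new_values VALUES (?, ?, ?)", rows)
    # A list matches when neither side has a row the other side lacks.
    query = """
        SELECT DISTINCT l.idL
        FROM list l
        WHERE NOT EXISTS (SELECT idA, idB, idC FROM new_values
                          EXCEPT
                          SELECT x.idA, x.idB, x.idC FROM list x WHERE x.idL = l.idL)
          AND NOT EXISTS (SELECT x.idA, x.idB, x.idC FROM list x WHERE x.idL = l.idL
                          EXCEPT
                          SELECT idA, idB, idC FROM new_values)
        ORDER BY l.idL
    """
    return [row[0] for row in conn.execute(query)]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE list (id INTEGER PRIMARY KEY, idL INTEGER, idA INTEGER, idB INTEGER, idC INTEGER);
    INSERT INTO list (idL, idA, idB, idC) VALUES
        (1, 2, NULL, 1), (1, 4, NULL, 1), (1, NULL, 1, 1),
        (2, 2, NULL, 1), (2, 4, NULL, 1),
        (3, 6, NULL, 1), (3, NULL, 4, 1);
""")

same_as_list_1 = matching_lists(conn, [(2, None, 1), (4, None, 1), (None, 1, 1)])
same_as_list_2 = matching_lists(conn, [(2, None, 1), (4, None, 1)])
no_match = matching_lists(conn, [(2, None, 1), (4, None, 1), (None, 7, 1)])
print(same_as_list_1, same_as_list_2, no_match)
```

An empty result means the list is new and can be inserted under a fresh idL.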

Set-based way to calculate family ranges in SQL?

I have a table that contains parents and 0 or more children for each parent, with a flag indicating which records are parents. All of the members of a given family have the same parent id, and the parent always has the lowest id in a given family. Also, each child has a value associated with it. (Specifically, this is a database of emails and attachments, where each parent is an email and the children are the attachments.)
I have two fields I need to calculate:
Range = {lowest id in family} - {highest id in family} [populated for all members]
Value-list = {delimited list of the values of each child, in id order} [only for parent]
So, given this:
Id | Parent| HasChildren| Value | Range | Value-list
----------------------------------------|-----------
1 | 1 | 1 | | |
2 | 1 | 0 | a | |
3 | 1 | 0 | b | |
4 | 4 | 1 | | |
5 | 4 | 0 | c | |
6 | 6 | 0 | | |
I would like to end up with this:
Id | Parent| HasChildren| Value | Range | Value-list
----------------------------------------|-----------
1 | 1 | 1 | | 1-3 | a;b
2 | 1 | 0 | a | 1-3 |
3 | 1 | 0 | b | 1-3 |
4 | 4 | 1 | | 4-5 | c
5 | 4 | 0 | c | 4-5 |
6 | 6 | 0 | | 6-6 |
How can I do this efficiently? Ideally, I'd like to do this with just set-based logic, without cursors, or even stored procedures. Temporary tables are fine.
I'm working in T-SQL, if that makes a difference, though I'd be curious to see platform agnostic answers.

The following SQLFiddle solution should do the job for you; however, as @Allan mentioned, you might want to revise your database structure.
Using CTEs:
Note: my query uses table1 as the name of your table.
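with cte as (
-- delimited list of the values per family, in id order
select parent
,ValueList= stuff(( select ';' +isnull(t2.Value, '')
from table1 t2
where t1.parent=t2.parent
order by t2.id
FOR XML PATH(''), TYPE
).value('.', 'NVARCHAR(MAX)'), 1, 2, '') -- removes the two leading ';' (the parent's empty value comes first)
from table1 t1
group by parent
),
cte2 as (
-- lowest and highest id per family
select parent
, min(id) as FirstID
, max(id) as LastID
from table1
group by parent)
select *
-- cast the ids (assuming id is an integer column) so that + concatenates instead of adding
,(select cast(FirstID as varchar(20)) from cte2 t2 where t2.parent=t1.parent)+'-'+(select cast(LastID as varchar(20)) from cte2 t2 where t2.parent=t1.parent) as [Range]
,(select ValueList from cte t2 where t1.parent=t2.parent and t1.[haschildren]='1') as [Value-List]
from table1 t1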
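For a platform-agnostic cross-check of the expected output, here is a small sketch in Python with sqlite3. It is not the T-SQL approach above: the range comes from a grouped subquery, and the value list is assembled on the client so the id order is explicit. It assumes the table1 layout from the question, with empty values stored as NULL.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, parent INTEGER, haschildren INTEGER, value TEXT);
    INSERT INTO table1 VALUES
        (1, 1, 1, NULL), (2, 1, 0, 'a'), (3, 1, 0, 'b'),
        (4, 4, 1, NULL), (5, 4, 0, 'c'),
        (6, 6, 0, NULL);
""")

# Collect each family's child values in id order.
value_lists = defaultdict(list)
for parent, value in conn.execute(
    "SELECT parent, value FROM table1 WHERE id <> parent ORDER BY parent, id"
):
    value_lists[parent].append(value)

# Lowest and highest id per family, joined back to every member.
range_query = """
    SELECT t.id, t.parent, t.haschildren, r.first_id || '-' || r.last_id
    FROM table1 t
    JOIN (SELECT parent, MIN(id) AS first_id, MAX(id) AS last_id
          FROM table1
          GROUP BY parent) r ON r.parent = t.parent
    ORDER BY t.id
"""
result = []
for row_id, parent, has_children, id_range in conn.execute(range_query).fetchall():
    # Only rows flagged as having children get the delimited list.
    value_list = ";".join(value_lists.get(parent, [])) if has_children else ""
    result.append((row_id, id_range, value_list))
print(result)
```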
