Insert multiple references in a nested table - sql

I have the table customer_table containing a list (nested table) of references to rows of the account_table.
Here are my declarations:
Customer type:
CREATE TYPE customer as object(
custid integer,
infos ref type_person,
accounts accounts_list
);
accounts_list type:
CREATE TYPE accounts_list AS table of ref account;
Table:
CREATE TABLE customer_table OF customer(
custid primary key,
constraint c_inf check(infos is not null),
constraint c_acc check(accounts is not null)
)
NESTED TABLE accounts STORE AS accounts_refs_nt_table;
So I would like to insert multiple refs into my nested table when I create a customer, as an account can be shared.
I can't figure out how to do that.
I tried:
INSERT INTO customer_table(
SELECT 0,
ref(p),
accounts_list(
SELECT ref(a) FROM account_table a WHERE a.accid = 0
UNION ALL
SELECT ref(a) FROM account_table a WHERE a.accid = 1
)
FROM DUAL
FROM person_table p
WHERE p.personid = 0
);
With no success.
Thank you

You can use the collect() function, e.g. in a subquery:
INSERT INTO customer_table(
SELECT 0,
ref(p),
(
SELECT CAST(COLLECT(ref(a)) AS accounts_list)
FROM account_table a
WHERE accid IN (0, 1)
)
FROM person_table p
WHERE p.personid = 0
);
As the documentation says, "To get accurate results from this function you must use it within a CAST function", so I've explicitly cast it to your accounts_list type.
If you don't want a subquery you could instead do:
INSERT INTO customer_table(
SELECT 0,
ref(p),
CAST(COLLECT(a.r) AS accounts_list)
FROM person_table p
CROSS JOIN (SELECT ref(a) AS r FROM account_table a WHERE accid IN (0, 1)) a
WHERE p.personid = 0
GROUP BY ref(p)
);
but I think that's a bit messier; check the performance of both though...
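To sanity-check the result you can unnest the collection again with TABLE(); a minimal sketch, assuming the table, type and attribute names from the question:
SELECT c.custid,
       DEREF(t.COLUMN_VALUE) AS acct   -- dereference each stored account ref
FROM   customer_table c,
       TABLE(c.accounts) t
WHERE  c.custid = 0;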

Select records that do not have at least one child element

How can I make an SQL query to select records that do not have at least one child element?
I have 3 tables: article (~40K rows), calendar (~450K rows) and calendar_cost (~500K rows).
I need to select the entries of the article table for which:
there are no entries in the calendar table, or
if there are entries in the calendar table, none of them has any entries in the calendar_cost table.
create table article (
id int PRIMARY KEY,
name varchar
);
create table calendar (
id int PRIMARY KEY,
article_id int REFERENCES article (id) ON DELETE CASCADE,
number varchar
);
create table calendar_cost (
id int PRIMARY KEY,
calendar_id int REFERENCES calendar (id) ON DELETE CASCADE,
cost_value numeric
);
insert into article (id, name) values
(1, 'Article 1'),
(2, 'Article 2'),
(3, 'Article 3');
insert into calendar (id, article_id, number) values
(101, 1, 'Point 1-1'),
(102, 1, 'Point 1-2'),
(103, 2, 'Point 2');
insert into calendar_cost (id, calendar_id, cost_value) values
(400, 101, 100.123),
(401, 101, 400.567);
As a result, "Article 2" (condition 2) and "Article 3" (condition 1) should be returned.
My SQL query is very slow (the second condition part); how can I do it optimally? Is it possible to do this without the "union all" operator?
-- First condition
select a.id from article a
left join calendar c on a.id = c.article_id
where c.id is null
union all
-- Second condition
select a.id from article a
where id not in(
select aa.id from article aa
join calendar c on aa.id = c.article_id
join calendar_cost cost on c.id = cost.calendar_id
where aa.id = a.id limit 1
)
UPDATE
This is how you can fill my tables with random data to get about the same amount of data. The #Bohemian query was very fast, and the rest were very slow. But as soon as I applied the 2 indexes, as #nbk advised, all queries began to execute very, very quickly!
do $$
declare
article_id int;
calendar_id bigint;
i int; j int;
begin
create table article (
id int PRIMARY KEY,
name varchar
);
create table calendar (
id serial PRIMARY KEY,
article_id int REFERENCES article (id) ON DELETE CASCADE,
number varchar
);
create INDEX ON calendar(article_id);
create table calendar_cost (
id serial PRIMARY KEY,
calendar_id bigint REFERENCES calendar (id) ON DELETE CASCADE,
cost_value numeric
);
create INDEX ON calendar_cost(calendar_id);
for article_id in 1..45000 loop
insert into article (id, name) values (article_id, 'Article ' || article_id);
for i in 0..floor(random() * 25) loop
insert into calendar (article_id, number) values (article_id, 'Number ' || article_id || '-' || i) returning id into calendar_id;
for j in 0..floor(random() * 2) loop
insert into calendar_cost (calendar_id, cost_value) values (calendar_id, round((random() * 100)::numeric, 3));
end loop;
end loop;
end loop;
end $$;
#Bohemian: Planning Time: 0.405 ms, Execution Time: 1196.082 ms
#nbk: Planning Time: 0.702 ms, Execution Time: 165.129 ms
#Chris Maurer: Planning Time: 0.803 ms, Execution Time: 800.000 ms
#Stu: Planning Time: 0.446 ms, Execution Time: 280.842 ms
So which query to choose now as the right one is a matter of taste.
No need to split the conditions: The only condition you need to check for is that there are no calendar_cost rows whatsoever, which is the case if there are no calendar rows.
The trick is to use outer joins, which still return the parent table but have all null values when there is no join. Further, count() does not count null values, so requiring that the count of calendar_cost is zero is all you need.
select a.id
from article a
left join calendar c on c.article_id = a.id
left join calendar_cost cost on cost.calendar_id = c.id
group by a.id
having count(cost.calendar_id) = 0
If there are indexes on the id columns (the usual case), this query will perform quite well given the small table sizes.
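For reference, the two supporting indexes are the ones added in the question's update (PostgreSQL does not index foreign-key columns automatically); a minimal sketch:
create index on calendar (article_id);       -- speeds up the join calendar -> article
create index on calendar_cost (calendar_id); -- speeds up the join calendar_cost -> calendar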
Your second condition should start just like your first one: find all the calendar entries without a calendar_cost, and only afterwards join them to article.
select a.id
from article a
inner join (
    select article_id
    from calendar c
    left join calendar_cost cc on c.id = cc.calendar_id
    where cc.calendar_id is null
) cnone on a.id = cnone.article_id
This approach is based on the thought that calendar entries without a calendar_cost are relatively rare compared to all the calendar entries.
Your query is not valid, as IN clauses don't support LIMIT.
Adding some indexes on article_id and calendar_id will help the performance, as you can see in the query plan.
create table article (
id int PRIMARY KEY,
name varchar(100)
);
create table calendar (
id int PRIMARY KEY,
article_id int REFERENCES article (id) ON DELETE CASCADE,
number varchar(100)
,index(article_id)
);
create table calendar_cost (
id int PRIMARY KEY,
calendar_id int REFERENCES calendar (id) ON DELETE CASCADE,
cost_value numeric
,INDEX(calendar_id)
);
insert into article (id, name) values
(1, 'Article 1'),
(2, 'Article 2'),
(3, 'Article 3');
insert into calendar (id, article_id, number) values
(101, 1, 'Point 1-1'),
(102, 1, 'Point 1-2'),
(103, 2, 'Point 2');
insert into calendar_cost (id, calendar_id, cost_value) values
(400, 101, 100.123),
(401, 101, 400.567);
Records: 3 Duplicates: 0 Warnings: 0
Records: 3 Duplicates: 0 Warnings: 0
Records: 2 Duplicates: 0 Warnings: 2
select a.id from article a
left join calendar c on a.id = c.article_id
where c.id is null
id
3
-- First condition
EXPLAIN
select a.id from article a
left join calendar c on a.id = c.article_id
where c.id is null
union all
-- Second condition
select a.id from article a
JOIN (
select aa.id from article aa
join calendar c on aa.id = c.article_id
join calendar_cost cost on c.id = cost.calendar_id
LIMIT 1
) t1 ON t1.id <> a.id
id | select_type | table      | partitions | type   | possible_keys      | key         | key_len | ref                     | rows | filtered | Extra
1  | PRIMARY     | a          | null       | index  | null               | PRIMARY     | 4       | null                    | 3    | 100.00   | Using index
1  | PRIMARY     | c          | null       | ref    | article_id         | article_id  | 5       | fiddle.a.id             | 3    | 33.33    | Using where; Not exists; Using index
2  | UNION       | <derived3> | null       | system | null               | null        | null    | null                    | 1    | 100.00   | null
2  | UNION       | a          | null       | index  | null               | PRIMARY     | 4       | null                    | 3    | 66.67    | Using where; Using index
3  | DERIVED     | cost       | null       | index  | calendar_id        | calendar_id | 5       | null                    | 2    | 100.00   | Using where; Using index
3  | DERIVED     | c          | null       | eq_ref | PRIMARY,article_id | PRIMARY     | 4       | fiddle.cost.calendar_id | 1    | 100.00   | Using where
3  | DERIVED     | aa         | null       | eq_ref | PRIMARY            | PRIMARY     | 4       | fiddle.c.article_id     | 1    | 100.00   | Using index
Try the following, using a combination of EXISTS criteria.
Usually, with supporting indexes, this is more performant than simply joining tables, as it offers a short-circuit to get out as soon as a match is found, whereas joining typically filters after all rows are joined.
select a.id
from article a
where not exists (
select * from calendar c
where c.article_id = a.id
)
or (exists (
select * from calendar c
where c.article_id = a.id
)
and not exists (
select * from calendar_cost cc
where cc.calendar_id in (select id from calendar c where c.article_id = a.id)
)
);

Aggregate SQLite query across multiple tables using JSON1

I can't get my head around the following problem. The other day I learned how to use the JSON1 family of functions, but this time it seems to be more of an SQL issue.
This is my database setup:
CREATE TABLE persons(id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT UNIQUE)
CREATE TABLE interests(id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT UNIQUE)
CREATE TABLE persons_interests(person INTEGER, interest INTEGER, FOREIGN KEY(person) REFERENCES persons(id), FOREIGN KEY(interest) REFERENCES interests(id))
INSERT INTO persons(name) VALUES('John')
INSERT INTO persons(name) VALUES('Jane')
INSERT INTO interests(name) VALUES('Cooking')
INSERT INTO interests(name) VALUES('Gardening')
INSERT INTO interests(name) VALUES('Relaxing')
INSERT INTO persons_interests VALUES(1, 1)
INSERT INTO persons_interests VALUES(1, 2)
INSERT INTO persons_interests VALUES(2, 3)
Based on this data I'd like to get the following output, which is all interests of all persons aggregated into a single JSON array:
[{name: John, interests:[{name: Cooking},{name: Gardening}]}, {name: Jane, interests:[{name: Relaxing}]}]
Now the following is what I tried to do. Needless to say, this doesn't give me what I want:
SELECT p.name, json_object('interests', json_group_array(json_object('name', i.name))) interests
FROM persons p, interests i
JOIN persons_interests pi ON pi.person = p.id AND pi.interest = i.id
The undesired output is:
John|{"interests":[{"name":"Cooking"},{"name":"Gardening"},{"name":"Relaxing"}]}
Any help is highly appreciated!
To use json_group_array you must group rows, in your case by person, unless you want only one row with all your results.
Example 1)
This first version will give you one JSON object per person, so the result will be N rows for N persons:
SELECT json_object('name',
p.name,
'interests',
json_group_array(json_object('name', i.name))) jsobjects
FROM persons p, interests i
JOIN persons_interests pi ON pi.person = p.id AND pi.interest = i.id
group by p.id ;
Example 2)
This second version will return one big JSON array that contains all persons, so you fetch only one row:
SELECT json_group_array(jsobjects)
FROM (
SELECT json_object('name',
p.name,
'interests',
json_group_array(json_object('name', i.name))) jsobjects
FROM persons p, interests i
JOIN persons_interests pi ON pi.person = p.id AND pi.interest = i.id
group by p.id
) jo ;
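With the sample data from the question, the second query should return a single row shaped roughly like the desired output, e.g.:
[{"name":"John","interests":[{"name":"Cooking"},{"name":"Gardening"}]},{"name":"Jane","interests":[{"name":"Relaxing"}]}]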

Select rows that have a specific set of items associated with them through a junction table

Suppose we have the following schema:
CREATE TABLE customers(
id INTEGER PRIMARY KEY,
name TEXT
);
CREATE TABLE items(
id INTEGER PRIMARY KEY,
name TEXT
);
CREATE TABLE customers_items(
customerid INTEGER,
itemid INTEGER,
FOREIGN KEY(customerid) REFERENCES customers(id),
FOREIGN KEY(itemid) REFERENCES items(id)
);
Now we insert some example data:
INSERT INTO customers(name) VALUES ('John');
INSERT INTO customers(name) VALUES ('Jane');
INSERT INTO items(name) VALUES ('duck');
INSERT INTO items(name) VALUES ('cake');
Let's assume that John and Jane have ids of 1 and 2, and duck and cake also have ids of 1 and 2.
Let's give a duck to John and both a duck and a cake to Jane.
INSERT INTO customers_items(customerid, itemid) VALUES (1, 1);
INSERT INTO customers_items(customerid, itemid) VALUES (2, 1);
INSERT INTO customers_items(customerid, itemid) VALUES (2, 2);
Now, what I want to do is to run two types of queries:
Select names of customers who have BOTH a duck and a cake (should return 'Jane' only).
Select names of customers that have a duck and DON'T have a cake (should return 'John' only).
For the two types of queries listed, you could use the EXISTS clause. Below is an example query using EXISTS for the second case (a duck and no cake):
SELECT cust.name
from customers AS cust
WHERE EXISTS (
SELECT 1
FROM items
INNER JOIN customers_items ON items.id = customers_items.itemid
INNER JOIN customers on customers_items.customerid = cust.id
WHERE items.name = 'duck')
AND NOT EXISTS (
SELECT 1
FROM items
INNER JOIN customers_items ON items.id = customers_items.itemid
INNER JOIN customers on customers_items.customerid = cust.id
WHERE items.name = 'cake')
Here is a working example: http://sqlfiddle.com/#!6/3d362/2
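For the first case (customers who have both a duck and a cake), the same pattern works with two EXISTS clauses; a sketch against the schema above, correlating directly on the junction table (the ci alias is mine):
SELECT cust.name
FROM customers AS cust
WHERE EXISTS (
    SELECT 1
    FROM customers_items ci
    INNER JOIN items ON items.id = ci.itemid
    WHERE ci.customerid = cust.id
      AND items.name = 'duck')
AND EXISTS (
    SELECT 1
    FROM customers_items ci
    INNER JOIN items ON items.id = ci.itemid
    WHERE ci.customerid = cust.id
      AND items.name = 'cake');
-- should return only 'Jane' with the sample data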

Identify Duplicate Xml Nodes

I have a set of tables (with several one-many relationships) that form a single "unit". I need to ensure that we weed out duplicates, but determining duplicates requires consideration of all the data.
To make matters worse, the DB in question is still in Sql 2000 compatibility mode, so it can't use any newer features.
Create Table UnitType
(
Id int IDENTITY Primary Key,
Action int not null,
TriggerType varchar(25) not null
)
Create Table Unit
(
Id int IDENTITY Primary Key,
TypeId int Not Null,
Message varchar(100),
Constraint FK_Unit_Type Foreign Key (TypeId) References UnitType(Id)
)
Create Table Item
(
Id int IDENTITY Primary Key,
QuestionId int not null,
Sequence int not null
)
Create Table UnitCondition
(
Id int IDENTITY Primary Key,
UnitId int not null,
Value varchar(10),
ItemId int not null,
Constraint FK_UnitCondition_Unit Foreign Key (UnitId) References Unit(Id),
Constraint FK_UnitCondition_Item Foreign Key (ItemId) References Item(Id)
)
Insert into Item (QuestionId, Sequence)
Values (1, 1),
(1, 2)
Insert into UnitType(Action, TriggerType)
Values (1, 'Changed')
Insert into Unit (TypeId, Message)
Values (1, 'Hello World'),
(1, 'Hello World')
Insert into UnitCondition(UnitId, Value, ItemId)
Values (1, 'Test', 1),
(1, 'Hello', 2),
(2, 'Test', 1),
(2, 'Hello', 2)
I've created a SqlFiddle demonstrating a simple form of this issue.
A Unit is considered a duplicate when all (non-Id) fields on the Unit, and all conditions on that Unit combined, are exactly matched in every detail. Considering it like XML: a Unit node (containing the Unit info and a Conditions sub-collection) is unique if no other Unit node exists that is an exact string copy.
Select
Action,
TriggerType,
U.TypeId,
U.Message,
(
Select C.Value, C.ItemId, I.QuestionId, I.Sequence
From UnitCondition C
Inner Join Item I on C.ItemId = I.Id
Where C.UnitId = U.Id
For XML RAW('Condition')
) as Conditions
from UnitType T
Inner Join Unit U on T.Id = U.TypeId
For XML RAW ('Unit'), ELEMENTS
But the issue I have is that I can't seem to get the XML for each Unit to appear as a new record, and I'm not sure how to compare the Unit Nodes to look for Duplicates.
How Can I run this query to determine if there are duplicate Xml Unit nodes within the collection?
If you want to determine whether a record is a duplicate or not, you don't need to combine all values into one string. You can do this with the ROW_NUMBER function, like this:
SELECT
Action,
TriggerType,
U.Id,
U.TypeId,
U.Message,
C.Value,
I.QuestionId,
I.Sequence,
ROW_NUMBER () OVER (PARTITION BY <LIST OF FIELDS THAT SHOULD BE UNIQUE>
ORDER BY <LIST OF FIELDS>) as DupeNumber
FROM UnitType T
Inner Join Unit U on T.Id = U.TypeId
Inner Join UnitCondition C on U.Id = C.UnitId
Inner Join Item I on C.ItemId = I.Id;
If DupeNumber is greater than 1, then the record is a duplicate.
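Purely as an illustration of this approach, the placeholders might be filled with the non-Id columns from the question's schema (the exact column choice is an assumption):
SELECT
    T.Action,
    T.TriggerType,
    U.Id,
    U.TypeId,
    U.Message,
    C.Value,
    I.QuestionId,
    I.Sequence,
    ROW_NUMBER() OVER (PARTITION BY T.Action, T.TriggerType, U.TypeId, U.Message,
                                    C.Value, I.QuestionId, I.Sequence
                       ORDER BY U.Id) AS DupeNumber
FROM UnitType T
INNER JOIN Unit U ON T.Id = U.TypeId
INNER JOIN UnitCondition C ON U.Id = C.UnitId
INNER JOIN Item I ON C.ItemId = I.Id;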
Give this a try.
This would find the pairs that are not unique; how to build that into your final answer I'm not sure, but it's possibly a start.
select u1.id, u2.id
from unit as u1
join unit as u2
on u1.id < u2.id
join UnitCondition uc1
on uc1.unitID = u1.id
full outer join UnitCondition uc2
on uc2.unitID = u2.id
and uc2.itemID = uc1.itemID
where uc2.itemID is null or uc1.itemID is null
So, I managed to figure out what I needed to do. It's a little clunky though.
First, you need to wrap the Xml Select statement in another select against the Unit table, in order to ensure that we end up with xml representing only that unit.
Select
Id,
(
Select
Action,
TriggerType,
IU.TypeId,
IU.Message,
(
Select C.Value, I.QuestionId, I.Sequence
From UnitCondition C
Inner Join Item I on C.ItemId = I.Id
Where C.UnitId = IU.Id
Order by C.Value, I.QuestionId, I.Sequence
For XML RAW('Condition'), TYPE
) as Conditions
from UnitType T
Inner Join Unit IU on T.Id = IU.TypeId
WHERE IU.Id = U.Id
For XML RAW ('Unit')
)
From Unit U
Then, you can wrap this in another select, grouping the xml up by content.
Select content, count(*) as cnt
From
(
Select
Id,
(
Select
Action,
TriggerType,
IU.TypeId,
IU.Message,
(
Select C.Value, C.ItemId, I.QuestionId, I.Sequence
From UnitCondition C
Inner Join Item I on C.ItemId = I.Id
Where C.UnitId = IU.Id
Order by C.Value, I.QuestionId, I.Sequence
For XML RAW('Condition'), TYPE
) as Conditions
from UnitType T
Inner Join Unit IU on T.Id = IU.TypeId
WHERE IU.Id = U.Id
For XML RAW ('Unit')
) as content
From Unit U
) as data
group by content
having count(*) > 1
This will allow you to group entire units where the whole content is identical.
One thing to watch out for though, is that to test "uniqueness", you need to guarantee that the data on the inner Xml selection(s) is always the same. To that end, you should apply ordering on the relevant data (i.e. the data in the xml) to ensure consistency. What order you apply doesn't really matter, so long as two identical collections will output in the same order.

SQL - Find duplicates with equivalencies

I'm having trouble wrapping my mind around developing this SQL query. Given the following two tables:
ACADEMIC_HISTORY ( STUDENT_ID, TERM, COURSE_ID, COURSE_GRADE )
COURSE_EQUIVALENCIES ( COURSE_ID, COURSE_ID_EQUIVALENT )
What would be the best way to detect if students have taken the same (or an equivalent) course in the past with a passing grade (C or better)?
Example
Student #1 took the course ABC001 and received a grade of C. Ten years later, the course was renamed ABC011 and the appropriate entry was made in COURSE_EQUIVALENCIES. The student retook the course under this new name and received a grade of B. How can I construct a SQL query that will detect the duplicate courses and only count the first passing grade?
(The actual case is significantly more complicated, but this should get me started.)
Thanks in advance.
EDIT:
It's not even necessary to keep or discard any information. A query that simply shows classes with duplicates will be sufficient.
you could use something like:
SELECT
STUDENT_ID
,MIN (COURSE_GRADE)
FROM (
SELECT * FROM
ACADEMIC_HISTORY
WHERE COURSE_ID =1
UNION
SELECT
h.STUDENT_ID
,h2.COURSE_ID
,h2.COURSE_GRADE
FROM
ACADEMIC_HISTORY AS h
LEFT OUTER JOIN COURSE_EQUIVELANCIES as e
ON e.COURSE_ID = h.COURSE_ID
LEFT OUTER JOIN ACADEMIC_HISTORY as h2
ON h.STUDENT_ID = h2.STUDENT_ID
AND h2.COURSE_ID = e.COURSE_ID_EQUIVELANT
WHERE
h.COURSE_ID =1
) AS t
WHERE STUDENT_ID =1
GROUP BY STUDENT_ID
http://sqlfiddle.com/#!3/d608f/20
Sorry, I posted this with a bug: it preferred the score of the actual course requested over any equivalencies. Fixed now.
This only looks for one level of equivalencies, but maybe you want to enforce that and make it part of the data entry process: review all possible equivalencies and enter the valid ones.
EDIT: for the first pass of the qualifying course (using numbered terms):
SELECT TOP 1
STUDENT_ID
,MIN (COURSE_GRADE)
FROM (
SELECT * FROM
ACADEMIC_HISTORY
WHERE COURSE_ID =1
UNION
SELECT
h.STUDENT_ID
,h2.COURSE_ID
,h2.TERM
,h2.COURSE_GRADE
FROM
ACADEMIC_HISTORY AS h
LEFT OUTER JOIN COURSE_EQUIVELANCIES as e
ON e.COURSE_ID = h.COURSE_ID
LEFT OUTER JOIN ACADEMIC_HISTORY as h2
ON h.STUDENT_ID = h2.STUDENT_ID
AND h2.COURSE_ID = e.COURSE_ID_EQUIVELANT
WHERE
h.COURSE_ID =1
) AS t
WHERE STUDENT_ID =1
GROUP BY STUDENT_ID, TERM
ORDER BY TERM ASC
http://sqlfiddle.com/#!3/fdded/6
(Note that TOP is a T-SQL keyword; for MySQL you need LIMIT.)
The data (in LOWERCASE)
DROP SCHEMA tmp CASCADE;
CREATE SCHEMA tmp;
SET search_path='tmp';
CREATE TABLE academic_history
( student_id INTEGER NOT NULL
, course_id CHAR(6)
, course_grade CHAR(1)
, PRIMARY KEY(student_id,course_id)
);
INSERT INTO academic_history ( student_id,course_id,course_grade) VALUES
(1, 'ABC001' , 'C' )
, (1, 'ABC011' , 'B' )
, (2, 'ABC011' , 'A' )
;
CREATE TABLE course_equivalencies
( course_id CHAR(6)
, course_id_equivalent CHAR(6)
);
INSERT INTO course_equivalencies(course_id,course_id_equivalent) VALUES
( 'ABC011' , 'ABC001' )
;
The query:
-- EXPLAIN ANALYZE
WITH canon AS (
SELECT ah.student_id AS student_id
, ah.course_id AS course_id
, COALESCE (eq.course_id_equivalent,ah.course_id) AS course_id_equivalent
FROM academic_history ah
LEFT JOIN course_equivalencies eq ON eq.course_id = ah.course_id
)
SELECT h.student_id
, c.course_id_equivalent
, MIN(h.course_grade) AS the_grade
FROM academic_history h
JOIN canon c ON c.student_id = h.student_id AND c.course_id = h.course_id
GROUP BY h.student_id, c.course_id_equivalent
ORDER BY h.student_id, c.course_id_equivalent
;
The output:
NOTICE: drop cascades to 2 other objects
DETAIL: drop cascades to table tmp.academic_history
drop cascades to table tmp.course_equivalencies
DROP SCHEMA
CREATE SCHEMA
SET
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "academic_history_pkey" for table "academic_history"
CREATE TABLE
INSERT 0 3
CREATE TABLE
INSERT 0 1
 student_id | course_id_equivalent | the_grade
------------+----------------------+-----------
          1 | ABC001               | B
          2 | ABC001               | A
(2 rows)
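If you only want to consider passing attempts (C or better) and only report courses that were actually taken more than once, as the question's edit suggests, a possible tweak on top of the query above (assuming single-letter grades, so 'A' through 'C' sort before 'D' and 'F'):
WITH canon AS (
    SELECT ah.student_id
         , ah.course_id
         , COALESCE(eq.course_id_equivalent, ah.course_id) AS course_id_equivalent
    FROM academic_history ah
    LEFT JOIN course_equivalencies eq ON eq.course_id = ah.course_id
)
SELECT h.student_id
     , c.course_id_equivalent
     , MIN(h.course_grade) AS the_grade
FROM academic_history h
JOIN canon c ON c.student_id = h.student_id AND c.course_id = h.course_id
WHERE h.course_grade <= 'C'        -- passing grades only
GROUP BY h.student_id, c.course_id_equivalent
HAVING COUNT(*) > 1                -- the course (or an equivalent) was taken more than once
ORDER BY h.student_id, c.course_id_equivalent;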