Select rows that have a specific set of items associated with them through a junction table - sql

Suppose we have the following schema:
CREATE TABLE customers(
id INTEGER PRIMARY KEY,
name TEXT
);
CREATE TABLE items(
id INTEGER PRIMARY KEY,
name TEXT
);
CREATE TABLE customers_items(
customerid INTEGER,
itemid INTEGER,
FOREIGN KEY(customerid) REFERENCES customers(id),
FOREIGN KEY(itemid) REFERENCES items(id)
);
Now we insert some example data:
INSERT INTO customers(name) VALUES ('John');
INSERT INTO customers(name) VALUES ('Jane');
INSERT INTO items(name) VALUES ('duck');
INSERT INTO items(name) VALUES ('cake');
Let's assume that John and Jane have ids 1 and 2, and that duck and cake also have ids 1 and 2.
Let's give a duck to John and both a duck and a cake to Jane.
INSERT INTO customers_items(customerid, itemid) VALUES (1, 1);
INSERT INTO customers_items(customerid, itemid) VALUES (2, 1);
INSERT INTO customers_items(customerid, itemid) VALUES (2, 2);
Now, what I want to do is to run two types of queries:
Select names of customers who have BOTH a duck and a cake (should return 'Jane' only).
Select names of customers that have a duck and DON'T have a cake (should return 'John' only).

For the two types of queries listed, you could use the EXISTS clause. Below is an example query for the second case (has a duck but no cake); for the first case, change NOT EXISTS to EXISTS:
SELECT cust.name
FROM customers AS cust
WHERE EXISTS (
SELECT 1
FROM items
INNER JOIN customers_items ON items.id = customers_items.itemid
WHERE customers_items.customerid = cust.id
AND items.name = 'duck')
AND NOT EXISTS (
SELECT 1
FROM items
INNER JOIN customers_items ON items.id = customers_items.itemid
WHERE customers_items.customerid = cust.id
AND items.name = 'cake')
Here is a working example: http://sqlfiddle.com/#!6/3d362/2
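To check both requested queries against the sample data, here is a minimal sketch that runs them through Python's built-in sqlite3 module. The in-memory database, the OWNS template and the variable names are assumptions made for illustration; the EXISTS / NOT EXISTS pattern itself is the one described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers(id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE items(id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE customers_items(
    customerid INTEGER REFERENCES customers(id),
    itemid INTEGER REFERENCES items(id)
);
INSERT INTO customers(name) VALUES ('John'), ('Jane');
INSERT INTO items(name) VALUES ('duck'), ('cake');
INSERT INTO customers_items(customerid, itemid) VALUES (1, 1), (2, 1), (2, 2);
""")

# Hypothetical helper: correlated subquery asking whether the outer
# customer (alias cust) owns the item whose name is bound to "?".
OWNS = """
    SELECT 1
    FROM customers_items AS ci
    JOIN items AS i ON i.id = ci.itemid
    WHERE ci.customerid = cust.id AND i.name = ?
"""

# Customers who have BOTH a duck and a cake.
both = conn.execute(
    f"SELECT cust.name FROM customers AS cust "
    f"WHERE EXISTS ({OWNS}) AND EXISTS ({OWNS})",
    ("duck", "cake"),
).fetchall()

# Customers who have a duck and DON'T have a cake.
duck_no_cake = conn.execute(
    f"SELECT cust.name FROM customers AS cust "
    f"WHERE EXISTS ({OWNS}) AND NOT EXISTS ({OWNS})",
    ("duck", "cake"),
).fetchall()

print(both)          # rows for the first query
print(duck_no_cake)  # rows for the second query
```

Each additional required or forbidden item is just one more EXISTS or NOT EXISTS term, so the same template extends to larger item sets.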

Related

Select records that do not have at least one child element

How can I make an SQL query to select records that do not have at least one child element?
I have 3 tables: article (~40K rows), calendar (~450K rows) and calendar_cost (~500K rows).
I need to select those entries of the article table for which either:
there are no entries in the calendar table, or
there are entries in the calendar table, but none of them has any entries in the calendar_cost table.
create table article (
id int PRIMARY KEY,
name varchar
);
create table calendar (
id int PRIMARY KEY,
article_id int REFERENCES article (id) ON DELETE CASCADE,
number varchar
);
create table calendar_cost (
id int PRIMARY KEY,
calendar_id int REFERENCES calendar (id) ON DELETE CASCADE,
cost_value numeric
);
insert into article (id, name) values
(1, 'Article 1'),
(2, 'Article 2'),
(3, 'Article 3');
insert into calendar (id, article_id, number) values
(101, 1, 'Point 1-1'),
(102, 1, 'Point 1-2'),
(103, 2, 'Point 2');
insert into calendar_cost (id, calendar_id, cost_value) values
(400, 101, 100.123),
(401, 101, 400.567);
As a result, "Article 2" (condition 2) and "Article 3" (condition 1) will suit us.
My SQL query is very slow (especially the second condition). How can I write it optimally? Is it possible to do it without the "union all" operator?
-- First condition
select a.id from article a
left join calendar c on a.id = c.article_id
where c.id is null
union all
-- Second condition
select a.id from article a
where id not in(
select aa.id from article aa
join calendar c on aa.id = c.article_id
join calendar_cost cost on c.id = cost.calendar_id
where aa.id = a.id limit 1
)
UPDATE
This is how you can fill my tables with random data of about the same volume. The @Bohemian query is very fast, and the rest are very slow. But as soon as I added 2 indexes, as @nik advised, all queries began to execute very quickly!
do $$
declare
article_id int;
calendar_id bigint;
i int; j int;
begin
create table article (
id int PRIMARY KEY,
name varchar
);
create table calendar (
id serial PRIMARY KEY,
article_id int REFERENCES article (id) ON DELETE CASCADE,
number varchar
);
create INDEX ON calendar(article_id);
create table calendar_cost (
id serial PRIMARY KEY,
calendar_id bigint REFERENCES calendar (id) ON DELETE CASCADE,
cost_value numeric
);
create INDEX ON calendar_cost(calendar_id);
for article_id in 1..45000 loop
insert into article (id, name) values (article_id, 'Article ' || article_id);
for i in 0..floor(random() * 25) loop
insert into calendar (article_id, number) values (article_id, 'Number ' || article_id || '-' || i) returning id into calendar_id;
for j in 0..floor(random() * 2) loop
insert into calendar_cost (calendar_id, cost_value) values (calendar_id, round((random() * 100)::numeric, 3));
end loop;
end loop;
end loop;
end $$;
@Bohemian
Planning Time: 0.405 ms
Execution Time: 1196.082 ms
@nbk
Planning Time: 0.702 ms
Execution Time: 165.129 ms
@Chris Maurer
Planning Time: 0.803 ms
Execution Time: 800.000 ms
@Stu
Planning Time: 0.446 ms
Execution Time: 280.842 ms
So which query to choose now as the right one is a matter of taste.
No need to split the conditions: The only condition you need to check for is that there are no calendar_cost rows whatsoever, which is the case if there are no calendar rows.
The trick is to use outer joins, which still return the parent table but have all null values when there is no join. Further, count() does not count null values, so requiring that the count of calendar_cost is zero is all you need.
select a.id
from article a
left join calendar c on c.article_id = a.id
left join calendar_cost cost on cost.calendar_id = c.id
group by a.id
having count(cost.calendar_id) = 0
See live demo.
If there are indexes on the id columns (the usual case), this query will perform quite well given the small table sizes.
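The point that count() skips the NULLs produced by the outer joins can be seen on the question's sample data. The following is a minimal sketch using Python's built-in sqlite3 module (an assumption; the question targets PostgreSQL, and the variable names are illustrative only):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE article(id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE calendar(
    id INTEGER PRIMARY KEY,
    article_id INTEGER REFERENCES article(id),
    number TEXT
);
CREATE TABLE calendar_cost(
    id INTEGER PRIMARY KEY,
    calendar_id INTEGER REFERENCES calendar(id),
    cost_value NUMERIC
);
INSERT INTO article VALUES (1, 'Article 1'), (2, 'Article 2'), (3, 'Article 3');
INSERT INTO calendar VALUES
    (101, 1, 'Point 1-1'), (102, 1, 'Point 1-2'), (103, 2, 'Point 2');
INSERT INTO calendar_cost VALUES (400, 101, 100.123), (401, 101, 400.567);
""")

# COUNT(*) counts every joined row, including the NULL-padded ones;
# COUNT(cost.calendar_id) counts only rows where a cost really exists.
counts = conn.execute("""
    SELECT a.id, COUNT(*) AS joined_rows, COUNT(cost.calendar_id) AS cost_rows
    FROM article a
    LEFT JOIN calendar c ON c.article_id = a.id
    LEFT JOIN calendar_cost cost ON cost.calendar_id = c.id
    GROUP BY a.id
    ORDER BY a.id
""").fetchall()

# The query from the answer: keep articles with zero cost rows.
without_cost = conn.execute("""
    SELECT a.id
    FROM article a
    LEFT JOIN calendar c ON c.article_id = a.id
    LEFT JOIN calendar_cost cost ON cost.calendar_id = c.id
    GROUP BY a.id
    HAVING COUNT(cost.calendar_id) = 0
    ORDER BY a.id
""").fetchall()

print(counts)
print(without_cost)
```

Articles 2 and 3 each still produce one joined row, but that row carries no cost, so only they pass the HAVING filter.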
Your second condition should start just like your first one: find all the calendar entries without calendar cost and only afterwards join it to article.
select a.id
from article a
Inner Join (
Select article_id
From calendar c left join calendar_cost cc
On c.id=cc.calendar_id
Where cc.calendar_id is null
) cnone
On a.id = cnone.article_id
This approach is based on the assumption that calendar entries without calendar_cost are relatively rare compared to all calendar entries.
Your query is not valid in MySQL, as IN subqueries don't support LIMIT there.
Adding indexes on article_id and calendar_id will help performance, as you can see in the query plan below.
create table article (
id int PRIMARY KEY,
name varchar(100)
);
create table calendar (
id int PRIMARY KEY,
article_id int REFERENCES article (id) ON DELETE CASCADE,
number varchar(100)
,index(article_id)
);
create table calendar_cost (
id int PRIMARY KEY,
calendar_id int REFERENCES calendar (id) ON DELETE CASCADE,
cost_value numeric
,INDEX(calendar_id)
);
insert into article (id, name) values
(1, 'Article 1'),
(2, 'Article 2'),
(3, 'Article 3');
insert into calendar (id, article_id, number) values
(101, 1, 'Point 1-1'),
(102, 1, 'Point 1-2'),
(103, 2, 'Point 2');
insert into calendar_cost (id, calendar_id, cost_value) values
(400, 101, 100.123),
(401, 101, 400.567);
Records: 3 Duplicates: 0 Warnings: 0
Records: 3 Duplicates: 0 Warnings: 0
Records: 2 Duplicates: 0 Warnings: 2
select a.id from article a
left join calendar c on a.id = c.article_id
where c.id is null
id
3
-- First condition
EXPLAIN
select a.id from article a
left join calendar c on a.id = c.article_id
where c.id is null
union all
-- Second condition
select a.id from article a
JOIN (
select aa.id from article aa
join calendar c on aa.id = c.article_id
join calendar_cost cost on c.id = cost.calendar_id
LIMIT 1
) t1 ON t1.id <> a.id
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra
-: | :---------- | :---- | :--------- | :--- | :------------ | :-- | :------ | :-- | ---: | -------: | :----
1 | PRIMARY | a | null | index | null | PRIMARY | 4 | null | 3 | 100.00 | Using index
1 | PRIMARY | c | null | ref | article_id | article_id | 5 | fiddle.a.id | 3 | 33.33 | Using where; Not exists; Using index
2 | UNION | <derived3> | null | system | null | null | null | null | 1 | 100.00 | null
2 | UNION | a | null | index | null | PRIMARY | 4 | null | 3 | 66.67 | Using where; Using index
3 | DERIVED | cost | null | index | calendar_id | calendar_id | 5 | null | 2 | 100.00 | Using where; Using index
3 | DERIVED | c | null | eq_ref | PRIMARY,article_id | PRIMARY | 4 | fiddle.cost.calendar_id | 1 | 100.00 | Using where
3 | DERIVED | aa | null | eq_ref | PRIMARY | PRIMARY | 4 | fiddle.c.article_id | 1 | 100.00 | Using index
fiddle
Try the following, which uses a combination of exists criteria.
Usually, with supporting indexes, this performs better than simply joining tables, because exists can short-circuit as soon as a match is found, whereas a join typically filters only after all rows are joined.
select a.id
from article a
where not exists (
select * from calendar c
where c.article_id = a.id
)
or (exists (
select * from calendar c
where c.article_id = a.id
)
and not exists (
select * from calendar_cost cc
where cc.calendar_id in (select id from calendar c where c.article_id = a.id)
)
);
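To check this query against the sample data, here is a minimal sketch using Python's built-in sqlite3 module (an assumption; the question targets PostgreSQL). It also runs a shorter single NOT EXISTS form, which is my own simplification for comparison and not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE article(id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE calendar(
    id INTEGER PRIMARY KEY,
    article_id INTEGER REFERENCES article(id),
    number TEXT
);
CREATE TABLE calendar_cost(
    id INTEGER PRIMARY KEY,
    calendar_id INTEGER REFERENCES calendar(id),
    cost_value NUMERIC
);
-- Supporting indexes on the foreign key columns.
CREATE INDEX calendar_article_idx ON calendar(article_id);
CREATE INDEX cost_calendar_idx ON calendar_cost(calendar_id);
INSERT INTO article VALUES (1, 'Article 1'), (2, 'Article 2'), (3, 'Article 3');
INSERT INTO calendar VALUES
    (101, 1, 'Point 1-1'), (102, 1, 'Point 1-2'), (103, 2, 'Point 2');
INSERT INTO calendar_cost VALUES (400, 101, 100.123), (401, 101, 400.567);
""")

# The query from the answer, with an ORDER BY added for stable output.
as_posted = conn.execute("""
    SELECT a.id
    FROM article a
    WHERE NOT EXISTS (
        SELECT * FROM calendar c WHERE c.article_id = a.id
    )
    OR (EXISTS (
            SELECT * FROM calendar c WHERE c.article_id = a.id
        )
        AND NOT EXISTS (
            SELECT * FROM calendar_cost cc
            WHERE cc.calendar_id IN (
                SELECT id FROM calendar c WHERE c.article_id = a.id
            )
        )
    )
    ORDER BY a.id
""").fetchall()

# Simplified variant (assumption): an article qualifies when no
# calendar_cost row is reachable from it through calendar at all.
simplified = conn.execute("""
    SELECT a.id
    FROM article a
    WHERE NOT EXISTS (
        SELECT 1
        FROM calendar c
        JOIN calendar_cost cc ON cc.calendar_id = c.id
        WHERE c.article_id = a.id
    )
    ORDER BY a.id
""").fetchall()

print(as_posted)
print(simplified)
```

On this data both forms return the same articles, since an article without calendar rows trivially has no reachable cost rows.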

SQL insert multiple rows depending on number of rows returned from subquery

I have 3 SQL tables, Companies, Materials and Suppliers, as follows.
Tables
I need to insert values into Suppliers from a list which contains Company Name and Material Name as headers. However, I have multiple companies with the same name in the database, and I need to add a new row to Suppliers for each one of those companies.
For example, my list contains the values ['Wickes','Bricks']. I have the SQL below to add a new entry to the Suppliers table, but since I have multiple companies called 'Wickes', I get an error because the subquery returns more than one value.
INSERT INTO Suppliers(Id,CompanyId,MaterialId) VALUES (NEWID(), (SELECT Id FROM Companies WHERE Name = 'Wickes'),(SELECT Id FROM Materials WHERE Name = 'Bricks'))
What's the best way to get the Ids of all the companies that are called 'Wickes' and then add rows to the Suppliers table with each of those Ids and the relevant material Id of 'Bricks'?
You can use INSERT ... SELECT rather than INSERT ... VALUES, e.g.
INSERT INTO Suppliers (Id, CompanyId, MaterialId)
SELECT NEWID(), c.Id, m.Id
FROM Companies AS c
CROSS JOIN Materials AS m
WHERE c.Name = 'Wickes'
AND m.Name = 'Bricks';
This will ensure that if you have multiple companies/materials with the same name, all permutations are inserted. Example on db<>fiddle
However, based on your image, Suppliers.Id is an integer, so I think NEWID() is not doing what you think it is here. You probably just want:
INSERT INTO Suppliers (CompanyId, MaterialId)
SELECT c.Id, m.Id
FROM Companies AS c
CROSS JOIN Materials AS m
WHERE c.Name = 'Wickes'
AND m.Name = 'Bricks';
And let IDENTITY take care of the Id column in Suppliers.
As a further aside, I've also just noted that MaterialId is VARCHAR in your Suppliers table, that looks like an error if it is supposed to reference the integer Id column in Materials.
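To see that INSERT ... SELECT adds one row per matching company, here is a minimal sketch using Python's built-in sqlite3 module. It is an assumption-laden stand-in for SQL Server: an INTEGER PRIMARY KEY plays the role of the IDENTITY column, and the extra 'Other' and 'Sand' rows are invented test data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Companies(Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Materials(Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Suppliers(
    Id INTEGER PRIMARY KEY,  -- auto-assigned, standing in for IDENTITY
    CompanyId INTEGER REFERENCES Companies(Id),
    MaterialId INTEGER REFERENCES Materials(Id)
);
-- Two companies share the name 'Wickes' (ids 1 and 2).
INSERT INTO Companies(Name) VALUES ('Wickes'), ('Wickes'), ('Other');
INSERT INTO Materials(Name) VALUES ('Bricks'), ('Sand');
""")

# One Suppliers row is inserted for every (company, material) pair
# that matches the two names.
conn.execute(
    """
    INSERT INTO Suppliers(CompanyId, MaterialId)
    SELECT c.Id, m.Id
    FROM Companies AS c
    CROSS JOIN Materials AS m
    WHERE c.Name = ? AND m.Name = ?
    """,
    ("Wickes", "Bricks"),
)

suppliers = conn.execute(
    "SELECT CompanyId, MaterialId FROM Suppliers ORDER BY CompanyId"
).fetchall()
print(suppliers)
```

Binding the names as parameters also makes it straightforward to loop over the whole company/material list from application code.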
INSERT INTO Suppliers(Id,CompanyId,MaterialId) VALUES (NEWID(), (SELECT DISTINCT Id FROM Companies WHERE Name = 'Wickes'),(SELECT DISTINCT Id FROM Materials WHERE Name = 'Bricks'));
If I understand rightly Companies are the suppliers and the Suppliers table is the one that says where you can buy each material from.
Why do you have duplicates? Do you have an account for each branch of Wickes, for example? If they really are duplicates and you don't care which one you use, a function like MIN() will ensure that only one value is returned. If you have duplicates, it would be a good idea to find a way of deactivating all except one. This will make it simpler for you every time you deal with the supplier: minimum orders, chasing overdue orders, payments etc.
Also, Suppliers.CompanyId and Suppliers.MaterialId should be foreign keys referencing Companies.ID and Materials.ID. It is also a good idea for the ID column to be auto-incrementing, which makes it easier to add new rows, as you do not need to specify the ID column.
If you cannot or do not want to modify the id column to auto-incrementing IDENTITY you can continue to use NEWID().
create table Companies(
id INT PRIMARY KEY NOT NULL IDENTITY,
name VARCHAR(25));
create table Materials(
id INT PRIMARY KEY NOT NULL IDENTITY,
name VARCHAR(25));
create table Suppliers(
id INT PRIMARY KEY NOT NULL IDENTITY,
CompanyId INT FOREIGN KEY REFERENCES Companies(id),
MaterialId INT FOREIGN KEY REFERENCES Materials(id)
);
INSERT INTO Companies (name) VALUES ('Wickes');
INSERT INTO Materials (name) VALUES ('Bricks');
INSERT INTO Suppliers ( CompanyId, MaterialId)
SELECT c.Id, M.Id
FROM Companies AS c
CROSS JOIN Materials AS m
WHERE c.Name = 'Wickes'
AND m.Name = 'Bricks';
SELECT * FROM Companies;
SELECT * FROM Materials;
SELECT * FROM Suppliers;
GO
id | name
-: | :-----
1 | Wickes
id | name
-: | :-----
1 | Bricks
id | CompanyId | MaterialId
-: | --------: | ---------:
1 | 1 | 1
db<>fiddle here
INSERT INTO SUPPLIERS
(ID, COMPANYID, MATERIALID)
VALUES (NEWID(),
(SELECT DISTINCT ID FROM COMPANIES WHERE NAME = 'Wickes'), (SELECT DISTINCT ID FROM MATERIALS WHERE NAME = 'Bricks'))

SQL Join Table as JSON data

I am trying to join reviews and likes onto products, but for some reason the output of the "reviews" column is duplicated by the number of rows in another foreign table, likes. The length of the "reviews" output is
amount of likes * amount of reviews
I have no idea why this is happening.
My desired output is for the "reviews" column to contain an array of JSON objects, one per related review row.
Products
Title Image
----------------------
Photo photo.jpg
Book book.jpg
Table table.jpg
Users
Username
--------
Admin
John
Jane
Product Likes
product_id user_id
---------------------
1 1
1 2
2 1
2 3
Product Reviews
product_id user_id review
-------------------------------------
1 1 Great Product!
1 2 Looks Great
2 1 Could be better
This is the query
SELECT "products".*,
array_to_json(array_agg("product_review".*)) as reviews,
EXISTS(SELECT * FROM product_like lk
JOIN users u ON u.id = "lk"."user_id" WHERE u.id = 4
AND "lk"."product_id" = products.id) AS liked,
COUNT("product_like"."product_id") AS totalLikes from "products"
LEFT JOIN "product_review" on "product_review"."product_id" = "products"."id"
LEFT JOIN "product_like" on "product_like"."product_id" = "products"."id"
group by "products"."id"
Query to create schema and insert data
CREATE TABLE products
(id SERIAL, title varchar(50), image varchar(50), PRIMARY KEY(id))
;
CREATE TABLE users
(id SERIAL, username varchar(50), PRIMARY KEY(id))
;
INSERT INTO products
(title,image)
VALUES
('Photo', 'photo.jpg'),
('Book', 'book.jpg'),
('Table', 'table.jpg')
;
INSERT INTO users
(username)
VALUES
('Admin'),
('John'),
('Jane')
;
CREATE TABLE product_review
(id SERIAL, product_id int NOT NULL, user_id int NOT NULL, review varchar(50), PRIMARY KEY(id), FOREIGN KEY (product_id) references products, FOREIGN KEY (user_id) references users)
;
INSERT INTO product_review
(product_id, user_id, review)
VALUES
(1, 1, 'Great Product!'),
(1, 2, 'Looks Great'),
(2, 1, 'Could be better')
;
CREATE TABLE product_like
(id SERIAL, product_id int NOT NULL, user_id int NOT NULL, PRIMARY KEY(id), FOREIGN KEY (product_id) references products, FOREIGN KEY (user_id) references users)
;
INSERT INTO product_like
(product_id, user_id)
VALUES
(1, 1),
(1, 2),
(2, 1),
(2, 3)
fiddle with the schema and query:
http://sqlfiddle.com/#!15/dff2c/1
Thanks in advance
You are getting multiple results because products has a one-to-many relationship with both product_review and product_like, so joining both tables multiplies the rows for each product before aggregation. To work around that, you need to aggregate those tables in subqueries and join the derived tables instead:
SELECT "products".*,
"pr"."reviews",
EXISTS(SELECT * FROM product_like lk
JOIN users u ON u.id = "lk"."user_id" WHERE u.id = 4
AND "lk"."product_id" = products.id) AS liked,
COALESCE("pl"."totalLikes", 0) AS totalLikes
FROM "products"
LEFT JOIN (SELECT product_id, array_to_json(array_agg("product_review".*)) AS reviews
FROM "product_review"
GROUP BY product_id) "pr" on "pr"."product_id" = "products"."id"
LEFT JOIN (SELECT product_id, COUNT(*) AS "totalLikes"
FROM "product_like"
GROUP BY product_id) "pl" on "pl"."product_id" = "products"."id"
Output:
id title image reviews liked totallikes
1 Photo photo.jpg [{"id":1,"product_id":1,"user_id":1,"review":"Great Product!"},{"id":2,"product_id":1,"user_id":2,"review":"Looks Great"}] f 2
2 Book book.jpg [{"id":3,"product_id":2,"user_id":1,"review":"Could be better"}] f 2
3 Table table.jpg f 0
Demo on dbfiddle
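The row multiplication is easy to reproduce with plain counts. The following is a minimal sketch using Python's built-in sqlite3 module; COUNT stands in for array_agg so that it runs on SQLite, and the users table is left out, both of which are simplifications of the PostgreSQL schema above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products(id INTEGER PRIMARY KEY, title TEXT, image TEXT);
CREATE TABLE product_review(
    id INTEGER PRIMARY KEY,
    product_id INTEGER NOT NULL REFERENCES products(id),
    user_id INTEGER NOT NULL,
    review TEXT
);
CREATE TABLE product_like(
    id INTEGER PRIMARY KEY,
    product_id INTEGER NOT NULL REFERENCES products(id),
    user_id INTEGER NOT NULL
);
INSERT INTO products(title, image) VALUES
    ('Photo', 'photo.jpg'), ('Book', 'book.jpg'), ('Table', 'table.jpg');
INSERT INTO product_review(product_id, user_id, review) VALUES
    (1, 1, 'Great Product!'), (1, 2, 'Looks Great'), (2, 1, 'Could be better');
INSERT INTO product_like(product_id, user_id) VALUES
    (1, 1), (1, 2), (2, 1), (2, 3);
""")

# Joining both child tables first pairs every review with every like,
# so the aggregates see reviews * likes rows per product.
naive = conn.execute("""
    SELECT p.id, COUNT(r.id) AS reviews, COUNT(l.id) AS likes
    FROM products p
    LEFT JOIN product_review r ON r.product_id = p.id
    LEFT JOIN product_like l ON l.product_id = p.id
    GROUP BY p.id
    ORDER BY p.id
""").fetchall()

# Aggregating each child table on its own keeps one row per product
# in each derived table, so nothing is multiplied.
pre_aggregated = conn.execute("""
    SELECT p.id, COALESCE(pr.reviews, 0), COALESCE(pl.likes, 0)
    FROM products p
    LEFT JOIN (SELECT product_id, COUNT(*) AS reviews
               FROM product_review GROUP BY product_id) pr
           ON pr.product_id = p.id
    LEFT JOIN (SELECT product_id, COUNT(*) AS likes
               FROM product_like GROUP BY product_id) pl
           ON pl.product_id = p.id
    ORDER BY p.id
""").fetchall()

print(naive)
print(pre_aggregated)
```

Product 1 has two reviews and two likes, so the naive join reports four of each, while the pre-aggregated version reports two.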

How to join tables , query sql

I have the following recipe database tables and their data.
How can I find the total number of recipes and the number of categories for each ingredient? I have tried many join methods, but I couldn't write the query I want.
I need as output:
the ingredient id, how many recipes this ingredient appears in, and how many categories this ingredient appears in.
This is my attempt.
The problem with my attempt is that if I have an ingredient that is in one recipe and in two categories,
the results will show that this ingredient is in 2 recipes and 2 categories.
SELECT
I.idIng, COUNT(CI.idcat) AS "CAT FOR ING", COUNT(RI.idRecipe) AS "RECETTE FOR ING"
FROM
INGREDIENT I
LEFT JOIN
Ingredient_Recipe RI ON I.idIng = ri.idIng
RIGHT JOIN
Ingredient_Catigory CI ON I.idIng = CI.idIng
GROUP BY
I.idIng
ORDER BY
I.idIng;
Below is some test data:
Here I created my category table; it will have a many-to-many relation with Ingredient.
-- creating table cat category
CREATE TABLE category (
idCat INT NOT NULL PRIMARY KEY,
nomCat VARCHAR2(30) NOT NULL
);
Here I created my Recipe table; it will have a many-to-many relation with Ingredient.
-- creating table cat Recipe
CREATE TABLE Recipe(
idRecipe INT NOT NULL PRIMARY KEY,
nameRecipe VARCHAR2(30) NOT NULL
);
This is my Ingredient table, which will be linked with both Recipe and category.
-- creating table Ingredient
CREATE TABLE Ingredient(
idIng INT NOT NULL PRIMARY KEY ,
nameIng VARCHAR2(30) NOT NULL
);
This is the intermediate table between Ingredient and category, because the relation is many-to-many.
-- creating table Ingredient_category
CREATE TABLE Ingredient_category (
idIng INT NOT NULL,
idCat INT NOT NULL,
CONSTRAINT idIng_FK FOREIGN KEY (idIng) REFERENCES Ingredient(idIng),
CONSTRAINT idCat_FK FOREIGN KEY (idCat) REFERENCES category(idCat)
);
This is the intermediate table between Ingredient and Recipe, because the relation is many-to-many.
-- creating table Ingredient_Recipe
CREATE TABLE Ingredient_Recipe(
idIng INT NOT NULL,
idRecipe INT NOT NULL,
CONSTRAINT idIngRecipe_FK FOREIGN KEY (idIng) REFERENCES
Ingredient(idIng),
CONSTRAINT idRecipe_FK FOREIGN KEY (idRecipe) REFERENCES Recipe(idRecipe)
);
Here we insert the data for testing.
-- insert data into Recipe
INSERT INTO Recipe VALUES(1,'SOUP');
INSERT INTO Recipe VALUES(2,'FRIED');
INSERT INTO Recipe VALUES(3,'BURGER');
-- insert data into category
INSERT INTO category VALUES(1,'VEGES');
INSERT INTO category VALUES(2,'DAIRY');
INSERT INTO category VALUES(3,'MEAT');
INSERT INTO category VALUES(4,'ANIMAL PRODUCT');
-- insert data into Ingredient
INSERT INTO Ingredient VALUES (1,'Eggs');
INSERT INTO Ingredient VALUES (2,'milk');
INSERT INTO Ingredient VALUES (3,'Beef');
INSERT INTO Ingredient VALUES (4,'chess');
-- insert data into Ingredient_Catigory
INSERT INTO Ingredient_Catigory VALUES(1,4);
INSERT INTO Ingredient_Catigory VALUES(2,2);
INSERT INTO Ingredient_Catigory VALUES(2,4);
INSERT INTO Ingredient_Catigory VALUES(3,3);
INSERT INTO Ingredient_Catigory VALUES(3,4);
INSERT INTO Ingredient_Catigory VALUES(4,2);
INSERT INTO Ingredient_Catigory VALUES(4,4);
-- insert data into Ingredient_Recip
INSERT INTO Ingredient_Recipe VALUES (1,2);
INSERT INTO Ingredient_Recipe VALUES (1,3);
INSERT INTO Ingredient_Recipe VALUES (2,1);
INSERT INTO Ingredient_Recipe VALUES (3,3);
INSERT INTO Ingredient_Recipe VALUES (3,2);
I think you're looking for COUNT(DISTINCT value); you're almost there. Try something like this:
SELECT
I.idIng,
COUNT(DISTINCT CI.idCat)AS [Categories],
COUNT(DISTINCT RI.idRecipe)AS [Recipes]
FROM #INGREDIENT I
LEFT JOIN #Ingredient_Recipe RI ON I.idIng = ri.idIng
LEFT JOIN #Ingredient_Category CI ON I.idIng = CI.idIng
GROUP BY
I.idIng
ORDER BY
I.idIng
The results look like this:
idIng Categories Recipes
1 1 2
2 2 1
3 2 2
4 2 0
Please note, I think a bit of the spelling was incorrect in the sample data but I've corrected it on my test system (and I've used #TempTables). I've changed your RIGHT JOIN to a LEFT JOIN (as a note, I've never seen a need to use RIGHT JOIN in production code, try to avoid them).
Edit: I've just noticed that this is now an Oracle question. The query above has only been tested on SQL Server, although a cursory glance at the documentation suggests that the COUNT(DISTINCT ...) syntax should be the same for Oracle too.
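To compare plain COUNT with COUNT(DISTINCT ...) on the question's data, here is a minimal sketch using Python's built-in sqlite3 module (an assumption, as the question targets Oracle). It uses the corrected table name Ingredient_category, stores category names as text, and loads only the columns the counts need.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Ingredient(idIng INTEGER PRIMARY KEY, nameIng TEXT NOT NULL);
CREATE TABLE Ingredient_category(idIng INTEGER NOT NULL, idCat INTEGER NOT NULL);
CREATE TABLE Ingredient_Recipe(idIng INTEGER NOT NULL, idRecipe INTEGER NOT NULL);
INSERT INTO Ingredient VALUES (1, 'Eggs'), (2, 'milk'), (3, 'Beef'), (4, 'chess');
INSERT INTO Ingredient_category VALUES
    (1, 4), (2, 2), (2, 4), (3, 3), (3, 4), (4, 2), (4, 4);
INSERT INTO Ingredient_Recipe VALUES (1, 2), (1, 3), (2, 1), (3, 3), (3, 2);
""")

# Hypothetical template: the two joins pair every category of an
# ingredient with every one of its recipes before grouping.
QUERY = """
    SELECT i.idIng, {cat_count} AS categories, {rec_count} AS recipes
    FROM Ingredient i
    LEFT JOIN Ingredient_Recipe ri ON ri.idIng = i.idIng
    LEFT JOIN Ingredient_category ci ON ci.idIng = i.idIng
    GROUP BY i.idIng
    ORDER BY i.idIng
"""

# Plain COUNT counts the multiplied rows.
plain = conn.execute(QUERY.format(
    cat_count="COUNT(ci.idCat)",
    rec_count="COUNT(ri.idRecipe)",
)).fetchall()

# COUNT(DISTINCT ...) collapses the repeats back to the real totals.
distinct = conn.execute(QUERY.format(
    cat_count="COUNT(DISTINCT ci.idCat)",
    rec_count="COUNT(DISTINCT ri.idRecipe)",
)).fetchall()

print(plain)
print(distinct)
```

Ingredient 3 shows the effect most clearly: two categories times two recipes gives four joined rows, which plain COUNT reports for both columns.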
select * from Ingredient_Recipe a join Ingredient_category b on a.idIng=b.idIng join Ingredient c on a.idIng=c.idIng join Recipe d on a.idRecipe=d.idRecipe join category e on b.idCat=e.idCat
try this out

SQL Simple SELECT Query

create table Person(
SSN INT,
Name VARCHAR(20),
primary key(SSN)
);
create table Car(
PlateNr INT,
Model VARCHAR(20),
primary key(PlateNr)
);
create table CarOwner(
SSN INT,
PlateNr INT,
primary key(SSN, PlateNr),
foreign key(SSN) references Person (SSN),
foreign key(PlateNr) references Car (PlateNr)
);
Insert into Person(SSN, Name) VALUES ('123456789','Max');
Insert into Person(SSN, Name) VALUES ('123456787','John');
Insert into Person(SSN, Name) VALUES ('123456788','Tom');
Insert into Car(PlateNr, Model) VALUES ('123ABC','Volvo');
Insert into Car(PlateNr, Model) VALUES ('321CBA','Toyota');
Insert into Car(PlateNr, Model) VALUES ('333AAA','Honda');
Insert into CarOwner(SSN, PlateNr) VALUES ('123456789','123ABC');
Insert into CarOwner(SSN, PlateNr) VALUES ('123456787','333AAA');
The problem I'm having is with the SELECT query I want to write. I want to be able to SELECT everything from Person and include the PlateNr of the car that person owns, for example:
PERSON
---------------------------------
SSN NAME Car
123456789 Max 123ABC
123456787 John 333AAA
123456788 Tom
----------------------------------
So, I want to be able to show everything from the Person table and display the content of CarOwner as well if the person is in fact a car owner. What I have so far is: "SELECT * from Person, CarOwner WHERE Person.SSN = CarOwner.SSN;". But this obviously results in only showing the persons that are car owners.
I hope I explained myself well enough. Thanks.
Try this:
SELECT p.*, c.*
FROM Person p
LEFT OUTER JOIN CarOwner co
ON p.SSN = co.SSN
LEFT OUTER JOIN Car c
ON co.PlateNr = c.PlateNr
Show SQLFiddle
P.S. I've changed the type of your primary key PlateNr (to varchar instead of int).
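To see the difference between the inner join from the question and a LEFT OUTER JOIN, here is a minimal sketch using Python's built-in sqlite3 module (an assumption; the question does not name its database). PlateNr is stored as text, in line with the P.S. above, and the Car table is omitted because the plate number already lives in CarOwner.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person(SSN INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE CarOwner(
    SSN INTEGER REFERENCES Person(SSN),
    PlateNr TEXT,
    PRIMARY KEY (SSN, PlateNr)
);
INSERT INTO Person(SSN, Name) VALUES
    (123456789, 'Max'), (123456787, 'John'), (123456788, 'Tom');
INSERT INTO CarOwner(SSN, PlateNr) VALUES
    (123456789, '123ABC'), (123456787, '333AAA');
""")

# Inner join: persons without a CarOwner row are dropped.
inner = conn.execute("""
    SELECT p.SSN, p.Name, co.PlateNr
    FROM Person p
    JOIN CarOwner co ON co.SSN = p.SSN
    ORDER BY p.SSN
""").fetchall()

# Left outer join: every person is kept; PlateNr is NULL (None in
# Python) when the person owns no car.
left = conn.execute("""
    SELECT p.SSN, p.Name, co.PlateNr
    FROM Person p
    LEFT OUTER JOIN CarOwner co ON co.SSN = p.SSN
    ORDER BY p.SSN
""").fetchall()

print(inner)
print(left)
```

Tom appears only in the left join result, with an empty plate number.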
select p.SSN, p.Name, co.PlateNr as Car
from Person p
LEFT OUTER JOIN CarOwner co
ON p.SSN = co.SSN
LEFT OUTER JOIN Car c
ON co.PlateNr = c.PlateNr