Select all products with prices in EAV scheme - sql

It's a generic SQL question. I have a query that selects all rows from Products with extra information from other tables. The problem is that it's an EAV schema: the last relation is effectively reversed, and the joins break.
The requirements are:
list all Products with Groups
if 'Price' in Values table is available, add this information
explicitly: if 'Price' is not available, there should be Product row without Price information
Products can't be repeated
Additionally: DISTINCT is out of question
I have a working query (below) that uses a subquery to filter values, but I need to get rid of it. I can only use joins.
SQL Fiddle : http://sqlfiddle.com/#!15/0576b/8
create table Products (
id int primary key,
groupId int,
code varchar(100)
);
create table Groups (
id int primary key,
code varchar(100)
);
create table Values (
id int primary key,
productId int,
typeId int,
value varchar(100)
);
create table ValueTypes (
id int primary key,
name varchar(100)
);
insert into Products values (1, 1, 'P1');
insert into Products values (2, 2, 'P2');
insert into Groups values (1, 'C1');
insert into Groups values (2, 'C2');
insert into Values values (1, 1, 1, 'Aqua');
insert into Values values (2, 1, 2, '$5');
insert into ValueTypes values (1, 'Name');
insert into ValueTypes values (2, 'Price');
My query that works:
SELECT *
FROM Products p
INNER JOIN Groups g ON p.groupId = g.id
LEFT JOIN Values v ON v.productId = p.id AND v.typeId = (SELECT id FROM ValueTypes WHERE name = 'Price')
The question is, how to rewrite it to use joins instead of subquery?
I tried:
SELECT *
FROM Products p
INNER JOIN Groups g ON p.groupId = g.id
LEFT JOIN Values v ON v.productId = p.id
LEFT JOIN ValueTypes vt ON vt.id = v.typeId AND vt.name = 'Price'
But it returns repeated product P1. INNER JOIN on the other hand omits Products without a 'Price' value.

Define JOIN order explicitly
SELECT *
FROM Products p
INNER JOIN Groups g ON p.groupId = g.id
LEFT JOIN ( --product price
SELECT productId, value
FROM Values v2
JOIN ValueTypes vt ON vt.id = v2.typeId AND vt.name = 'Price'
) v ON v.productId = p.id;
EDIT
Two more JOIN versions. The optimizer produces a different plan for these than for the version above.
SELECT *
FROM ValueTypes vt
INNER JOIN Products p ON vt.name = 'Price'
INNER JOIN Groups g ON p.groupId = g.id
LEFT JOIN Values v ON v.productId = p.id AND v.typeId = vt.id;
or slightly different v3
SELECT *
FROM (SELECT id FROM ValueTypes WHERE name = 'Price') vt
CROSS JOIN Products p
INNER JOIN Groups g ON p.groupId = g.id
LEFT JOIN Values v ON v.productId = p.id AND v.typeId = vt.id
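A minimal sketch of why the derived-table join behaves differently from the attempt in the question, run against SQLite through Python's sqlite3 module (an assumption; the question targets PostgreSQL). The table names AttrValues and ProductGroups are hypothetical stand-ins for Values and Groups, chosen to avoid keyword clashes; the data is the question's sample data.

```python
import sqlite3

# Hypothetical names: AttrValues / ProductGroups replace the question's
# Values / Groups to stay clear of SQL keywords.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Products      (id INTEGER PRIMARY KEY, groupId INTEGER, code TEXT);
    CREATE TABLE ProductGroups (id INTEGER PRIMARY KEY, code TEXT);
    CREATE TABLE AttrValues    (id INTEGER PRIMARY KEY, productId INTEGER,
                                typeId INTEGER, value TEXT);
    CREATE TABLE ValueTypes    (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO Products      VALUES (1, 1, 'P1'), (2, 2, 'P2');
    INSERT INTO ProductGroups VALUES (1, 'C1'), (2, 'C2');
    INSERT INTO AttrValues    VALUES (1, 1, 1, 'Aqua'), (2, 1, 2, '$5');
    INSERT INTO ValueTypes    VALUES (1, 'Name'), (2, 'Price');
""")

# The attempt from the question: every value row of a product survives the
# first LEFT JOIN, so P1 appears once per attribute.
naive_sql = """
    SELECT p.code, g.code, v.value, vt.name
    FROM Products p
    INNER JOIN ProductGroups g ON p.groupId = g.id
    LEFT JOIN AttrValues v ON v.productId = p.id
    LEFT JOIN ValueTypes vt ON vt.id = v.typeId AND vt.name = 'Price'
    ORDER BY p.id, v.id
"""

# The derived table keeps only 'Price' rows before the outer LEFT JOIN,
# so each product matches at most one row.
derived_sql = """
    SELECT p.code, g.code, v.value
    FROM Products p
    INNER JOIN ProductGroups g ON p.groupId = g.id
    LEFT JOIN (
        SELECT v2.productId, v2.value
        FROM AttrValues v2
        JOIN ValueTypes vt ON vt.id = v2.typeId AND vt.name = 'Price'
    ) v ON v.productId = p.id
    ORDER BY p.id
"""

naive_rows = conn.execute(naive_sql).fetchall()
derived_rows = conn.execute(derived_sql).fetchall()
print(naive_rows)    # P1 appears twice
print(derived_rows)  # one row per product, price is None for P2
```

The point of the sketch is the order of operations: the filter on the value type has to be applied before the outer join to Products, not after it.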

You can do it like this:
SELECT p.id, p.code, g.id, g.code,
max(case when vt.name='Price'
then v.value
else null end) as price
FROM Products p
LEFT JOIN Groups g ON p.groupId = g.id
LEFT JOIN Values v ON v.productId = p.id
LEFT join ValueTypes vt ON v.typeId = vt.id
group by p.id, p.code, g.id, g.code
See it working here: http://sqlfiddle.com/#!15/0576b/36

Related

How to select passengers that never flew to a city

I will send the Database Description in an Image.
I tried this Select but I'm afraid that this isn't right
SELECT t.type , a.ICAOId , a.name , ci.id , c.ISOAlpha2ID , p.docReference , ti.docReference , ti.number , p.name , p.surname
FROM dbo.AirportType t
INNER JOIN dbo.Airport a ON t.type = a.type
INNER JOIN dbo.City ci ON a.city = ci.id
INNER JOIN dbo.Country c ON ci.ISOalpha2Id = c.ISOalpha2Id
INNER JOIN dbo.Passenger p ON c.ISOalpha2Id = p.nationality
INNER JOIN dbo.Ticket ti ON p.docReference = ti.docReference
WHERE NOT ci.id = 'Tokyo'
Can you please help to get this right?
You could make a list of the passengers that HAVE flown to the city then use that as a subquery to select the ones not in the list
I am just going to make an example of how it should be done
Subquery:
SELECT p.id FROM passengers p
JOIN tickets t ON p.id = t.passengerID
JOIN city c ON c.id = t.cityID
WHERE c.name = 'tokyo'
Now you just put that into another query that selects the elements not in it
SELECT * FROM passengers
WHERE id NOT IN (
SELECT p.id FROM passengers p
JOIN tickets t ON p.id = t.passengerID
JOIN city c ON c.id = t.cityID
WHERE c.name = 'tokyo'
)
Notice I didn't use your attribute names, you will have to change those.
This was a bit simplified version of what you will have to do because the city is not directly in your tickets table. So you will also have to join tickets, with coupons, and flights to get the people that have flown to a city. But from there it is the same.
Overall I believe this should help you get what you have to do.
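One caveat with the NOT IN pattern: if the subquery can return a NULL, NOT IN matches nothing. A small sketch of that, and of NOT EXISTS as an alternative that is not affected, using SQLite through Python's sqlite3 module. The passengers, cities and tickets tables and their columns are hypothetical, following the simplified naming used above rather than the real schema from the image.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE passengers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE cities     (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tickets    (passengerID INTEGER, cityID INTEGER);
    INSERT INTO passengers VALUES (1, 'Anna'), (2, 'Paul'), (3, 'Cara');
    INSERT INTO cities     VALUES (1, 'Miami'), (3, 'Tokyo');
    -- The NULL passengerID models a ticket with no recorded passenger.
    INSERT INTO tickets    VALUES (1, 1), (2, 3), (NULL, 3);
""")

# NOT IN: the subquery yields {2, NULL}; "id NOT IN (2, NULL)" is never true.
not_in_sql = """
    SELECT name FROM passengers
    WHERE id NOT IN (
        SELECT t.passengerID
        FROM tickets t
        JOIN cities c ON c.id = t.cityID
        WHERE c.name = 'Tokyo'
    )
    ORDER BY id
"""

# NOT EXISTS: correlated on the passenger, so the NULL row simply never matches.
not_exists_sql = """
    SELECT p.name FROM passengers p
    WHERE NOT EXISTS (
        SELECT 1
        FROM tickets t
        JOIN cities c ON c.id = t.cityID
        WHERE t.passengerID = p.id AND c.name = 'Tokyo'
    )
    ORDER BY p.id
"""

not_in_rows = conn.execute(not_in_sql).fetchall()
not_exists_rows = conn.execute(not_exists_sql).fetchall()
print(not_in_rows)      # empty because of the NULL
print(not_exists_rows)  # Anna and Cara
```

If the joined column is declared NOT NULL, both forms return the same passengers.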
A minimal reproducible example is not provided.
Here is a conceptual example, that could be easily extended to a real scenario.
SQL
-- DDL and sample data population, start
DECLARE @passenger TABLE (passengerID INT PRIMARY KEY, passenger_name VARCHAR(20));
INSERT @passenger (passengerID, passenger_name) VALUES
(1, 'Anna'),
(2, 'Paul');
DECLARE @city TABLE (cityID INT PRIMARY KEY, city_name VARCHAR(20));
INSERT @city (cityID, city_name) VALUES
(1, 'Miami'),
(2, 'Orldando'),
(3, 'Tokyo');
-- Already visited cities
DECLARE @passenger_city TABLE (passengerID INT, cityID INT);
INSERT @passenger_city (passengerID, cityID) VALUES
(1, 1),
(2, 3);
-- DDL and sample data population, end
SELECT * FROM @passenger;
SELECT * FROM @city;
SELECT * FROM @passenger_city;
;WITH rs AS
(
SELECT c.passengerID, b.cityID
FROM @passenger AS c
CROSS JOIN @city AS b -- get all possible combinations of passengers and cities
EXCEPT -- filter out already visited cities
SELECT passengerID, cityID FROM @passenger_city
)
SELECT c.*, b.city_name
FROM rs
INNER JOIN @passenger AS c ON c.passengerID = rs.passengerID
INNER JOIN @city AS b ON b.cityID = rs.cityID
ORDER BY c.passenger_name, b.city_name;
Output
passengerID | passenger_name | city_name
1           | Anna           | Orldando
1           | Anna           | Tokyo
2           | Paul           | Miami
2           | Paul           | Orldando

How to use a where clause with a join

Table Student
Id | Name
1  | Name
2  | Name
Table Classes
Id | Name   | Number
1  | class1 | 2
2  | class1 | 3
and a join table
StudentId | ClassId | Income
1         | 1       | 5
2         | 2       | 6
and the query
select sum(scr.Income)
from Student s
left join StudentClassRelation scr on scr.classId = s.Id
left join Class c on c.Id = scr.classId and c.Number > 4
I want to move the c.Number > 4 condition to the second line (the join to StudentClassRelation) so that I only receive the income from classes with a number greater than 4. I cannot change the query significantly because it is part of a bigger one. I need to filter StudentClassRelation somehow.
CREATE TABLE Student
(
Id INT PRIMARY KEY,
Name VARCHAR(100),
);
CREATE TABLE Class
(
Id INT PRIMARY KEY,
Name VARCHAR(100),
Number int
);
CREATE TABLE StudentClassRelation
(
Income int,
StudentID INT NOT NULL,
ClassID INT NOT NULL,
FOREIGN KEY (StudentID) REFERENCES Student(Id),
FOREIGN KEY (ClassID) REFERENCES Class(Id),
UNIQUE (StudentID, ClassID)
);
INSERT INTO Class (Id, Name, Number)
VALUES (1, '1', 1), (2, '5', 5)
INSERT INTO Student (Id, Name)
VALUES (1, '1'), (2, '5')
INSERT INTO StudentClassRelation (StudentID, ClassID, Income)
VALUES (1, 1, 10), (2, 1, 20), (2, 2, 5)
To keep the join to StudentClassRelation as a LEFT JOIN and apply a filter based on a column in the class table, you could use LEFT JOIN to a subquery that uses an INNER JOIN on Class, e.g.
SELECT SUM(scr.Income)
FROM Student AS s
LEFT JOIN
( SELECT scr.StudentId, scr.Income
FROM StudentClassRelation AS scr
INNER JOIN Class AS c
ON c.Id = scr.ClassId
WHERE c.Number > 4
) AS scr
ON scr.StudentId = s.Id;
You can however rewrite this in a less verbose way as follows:
SELECT SUM(scr.Income)
FROM Student AS s
LEFT JOIN (StudentClassRelation AS scr
INNER JOIN Class AS c
ON c.Id = scr.ClassId
AND c.Number > 4)
ON scr.StudentId = s.Id;
The execution plans of the two are exactly the same, so which is "better" is entirely a matter of personal preference as to which you find more readable. You spend more time reading code than writing it, so the least verbose method is not necessarily the best one.
Also worth noting, that if this was the entire query, there is no difference at all from simply using INNER JOIN throughout
SELECT SUM(scr.Income)
FROM Student AS s
INNER JOIN StudentClassRelation AS scr
ON scr.StudentId = s.Id
INNER JOIN Class AS c
ON c.Id = scr.classId
AND c.Number > 4;
But since you have mentioned that this is part of a larger query, I will assume that there is more to it than just the posted sample, and there is in fact a need for the LEFT JOIN to StudentClassRelation, e.g. If you were to do something like:
SELECT s.Id, Income = SUM(scr.Income)
FROM Student AS s
LEFT JOIN (StudentClassRelation AS scr
INNER JOIN Class AS c
ON c.Id = scr.ClassId
AND c.Number > 4)
ON scr.StudentId = s.Id
GROUP BY s.Id;
This would yield different results to the version with an INNER JOIN to both tables
SELECT s.Id, Income = SUM(scr.Income)
FROM Student AS s
INNER JOIN StudentClassRelation AS scr
ON scr.StudentId = s.Id
INNER JOIN Class AS c
ON c.Id = scr.classId
AND c.Number > 4
GROUP BY s.Id;
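To make the difference concrete, here is a small sketch that runs both grouped shapes on the sample data from the question, using SQLite through Python's sqlite3 module (an assumption; the thread is about SQL Server). It uses the derived-table form of the LEFT JOIN, and both queries join on StudentID, which is assumed to be the intended key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Student (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Class   (Id INTEGER PRIMARY KEY, Name TEXT, Number INTEGER);
    CREATE TABLE StudentClassRelation (
        Income INTEGER,
        StudentID INTEGER NOT NULL REFERENCES Student(Id),
        ClassID   INTEGER NOT NULL REFERENCES Class(Id),
        UNIQUE (StudentID, ClassID)
    );
    INSERT INTO Class   VALUES (1, '1', 1), (2, '5', 5);
    INSERT INTO Student VALUES (1, '1'), (2, '5');
    INSERT INTO StudentClassRelation (StudentID, ClassID, Income)
    VALUES (1, 1, 10), (2, 1, 20), (2, 2, 5);
""")

# LEFT JOIN to a pre-filtered derived table: every student is kept,
# students without a qualifying class get a NULL sum.
left_sql = """
    SELECT s.Id, SUM(scr.Income) AS Income
    FROM Student AS s
    LEFT JOIN (
        SELECT r.StudentID, r.Income
        FROM StudentClassRelation AS r
        INNER JOIN Class AS c ON c.Id = r.ClassID
        WHERE c.Number > 4
    ) AS scr ON scr.StudentID = s.Id
    GROUP BY s.Id
    ORDER BY s.Id
"""

# INNER JOIN throughout: students without a qualifying class disappear.
inner_sql = """
    SELECT s.Id, SUM(scr.Income) AS Income
    FROM Student AS s
    INNER JOIN StudentClassRelation AS scr ON scr.StudentID = s.Id
    INNER JOIN Class AS c ON c.Id = scr.ClassID AND c.Number > 4
    GROUP BY s.Id
    ORDER BY s.Id
"""

left_rows = conn.execute(left_sql).fetchall()
inner_rows = conn.execute(inner_sql).fetchall()
print(left_rows)   # student 1 is present with a NULL income
print(inner_rows)  # only student 2
```

Without the GROUP BY, both shapes collapse to the same single total, which is why the distinction only matters once the query reports something per student.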
Here is a workaround (two options) if you don't want to change the rest of your query:
select sum(scr.Income) from Student s
left join StudentClassRelation scr on scr.classId = s.Id
and EXISTS(select top 1 1 from Class where Class.Id = scr.classId and Class.Number > 4)
--> 5
select sum(scr.Income) from Student s
left join StudentClassRelation scr on scr.classId = s.Id
and (select top 1 Class.Number from Class where Class.Id = scr.classId) > 4
--> 5
Is it working as expected?
EDIT: my bad, the solution provided by @GarethD is way better (using inner join instead of left join)

Figure out the total number of people in an overlapping er database

I am trying to find:
the total number of doctors which aren't patients
the total number of patients which aren't doctors
the total number of people who are both patients and doctors
I can't seem to get the correct answer.
SQL:
CREATE TABLE persons (
id integer primary key,
name text
);
CREATE TABLE doctors (
id integer primary key,
type text,
FOREIGN KEY (id) REFERENCES persons(id)
);
CREATE TABLE patients (
id integer primary key,
suffering_from text,
FOREIGN KEY (id) REFERENCES persons(id)
);
INSERT INTO persons (id, name) VALUES
(1, 'bob'), (2, 'james'), (3, 'bill'), (4, 'mark'), (5, 'chloe');
INSERT INTO doctors (id, type) VALUES
(2, 'family doctor'), (3, 'eye doctor'), (5, 'family doctor');
INSERT INTO patients (id, suffering_from) VALUES
(1, 'flu'), (2, 'diabetes');
Select statement:
select count(d.id) as total_doctors, count(pa.id) as total_patients, count(d.id) + count(pa.id) as both_doctor_and_patient
from persons p
JOIN doctors d
ON p.id = d.id
JOIN patients pa
ON p.id = pa.id;
http://www.sqlfiddle.com/#!17/98ae9/2
One option uses left joins from persons and conditional aggregation:
select
count(dr.id) filter(where pa.id is null) cnt_doctor_not_patient,
count(pa.id) filter(where dr.id is null) cnt_patient_not_doctor,
count(pa.id) filter(where dr.id is not null) cnt_patient_and_doctor,
count(*) filter(where dr.id is null and pa.id is null) cnt_persons_not_doctor_nor_patient
from persons pe
left join doctors dr on dr.id = pe.id
left join patients pa on pa.id = pe.id
As a bonus, this gives you an opportunity to count the persons that are neither patient nor doctor. If you don't need that information, then a full join is simpler, and does not require bringing the persons table:
select
count(dr.id) filter(where pa.id is null) cnt_doctor_not_patient,
count(pa.id) filter(where dr.id is null) cnt_patient_not_doctor,
count(pa.id) filter(where dr.id is not null) cnt_patient_and_doctor
from doctors dr
full join patients pa using (id)
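The FILTER clause and FULL JOIN are not available in every engine. As a rough portable sketch of the same conditional aggregation, the counts can be written with COUNT(CASE ...) over the two left joins. This runs on SQLite through Python's sqlite3 module (an assumption; the question uses PostgreSQL) with the sample data from the question; the output column aliases are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE persons  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE doctors  (id INTEGER PRIMARY KEY, type TEXT,
                           FOREIGN KEY (id) REFERENCES persons(id));
    CREATE TABLE patients (id INTEGER PRIMARY KEY, suffering_from TEXT,
                           FOREIGN KEY (id) REFERENCES persons(id));
    INSERT INTO persons (id, name) VALUES
        (1, 'bob'), (2, 'james'), (3, 'bill'), (4, 'mark'), (5, 'chloe');
    INSERT INTO doctors (id, type) VALUES
        (2, 'family doctor'), (3, 'eye doctor'), (5, 'family doctor');
    INSERT INTO patients (id, suffering_from) VALUES
        (1, 'flu'), (2, 'diabetes');
""")

# COUNT ignores NULLs, and a CASE without ELSE yields NULL, so each
# expression counts only the persons matching its condition.
sql = """
    SELECT
        COUNT(CASE WHEN dr.id IS NOT NULL AND pa.id IS NULL     THEN 1 END) AS doctor_not_patient,
        COUNT(CASE WHEN pa.id IS NOT NULL AND dr.id IS NULL     THEN 1 END) AS patient_not_doctor,
        COUNT(CASE WHEN pa.id IS NOT NULL AND dr.id IS NOT NULL THEN 1 END) AS patient_and_doctor,
        COUNT(CASE WHEN pa.id IS NULL     AND dr.id IS NULL     THEN 1 END) AS neither
    FROM persons pe
    LEFT JOIN doctors  dr ON dr.id = pe.id
    LEFT JOIN patients pa ON pa.id = pe.id
"""

counts = conn.execute(sql).fetchone()
print(counts)
```

Because doctors.id and patients.id are primary keys, each person matches at most one row on either side, so the joins do not multiply rows before counting.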
You can simply solve this using joins like:
--Doctors who aren't patients:
SELECT count(*) from doctors as A left join patients as B on A.id=B.id where B.id is null
--Patients who aren't doctors:
SELECT count(*) from patients as A left join doctors as B on A.id=B.id where B.id is null
--Both:
SELECT count(*) from doctors as A inner join patients as B on A.id=B.id
Here is a CTE alternative:
with doc_not_pat
as(
select count(*) as Doc_Not_Pat
from doctors d
where not exists (select 1 from patients p where p.id = d.id)
),
pat_not_doc as(
select count(*) as Pat_Not_Doc
from patients p
where not exists ( select 1 from doctors d where d.id = p.id)
),
pat_and_doc as(
select count(*) as Pat_And_Doc
from patients p
where exists (select 1 from doctors d where d.id = p.id)
)
select (select Doc_Not_Pat
from doc_not_pat dcp) as Doc_Not_Pat,
(select Pat_Not_Doc
from pat_not_doc) as Pat_Not_Doc,
(select Pat_And_Doc
from pat_and_doc) as Pat_And_Doc

SQL - identifying rows for a value in one table, where all joined rows only has a specific value

In SQL Server, I have a result set from a joined many-to-many relationship.
Consider Products linked to Orders via a link table:
Table - Products (ID, ProductName)
Table - Orders (ID, OrderCountry)
LinkTable OrderLines (columns not shown)
I'd like to be able to filter these results to show only the results where for an entity from one table, all the values in the other table only have a given value in a particular column. In terms of my example, for each product, I want to return only the joined rows when all the orders they're linked to are for country 'uk'
So if my linked result set is
productid, product, orderid, ordercountry
1, Chocolate, 1, uk
2, Banana, 2, uk
2, Banana, 3, usa
3, Strawberry, 4, usa
I want to filter so that only those products that have only been ordered in the UK are shown (i.e. Chocolate). I'm sure this should be straightforward, but it's Friday afternoon and the SQL part of my brain has given up for the day...
You could do something like this, where first you get all products only sold in one country, then you proceed to get all orders for those products
with distinctProducts as
(
select LinkTable.ProductID
from Orders
inner join LinkTable on LinkTable.OrderID = Orders.ID
group by LinkTable.ProductID
having count(distinct Orders.OrderCountry) = 1
)
select pr.ID as ProductID
,pr.ProductName
,o.ID as OrderID
,o.OrderCountry
from Products pr
inner join LinkTable lt on lt.ProductID = pr.ID
inner join Orders o on o.ID = lt.OrderID
inner join distinctProducts dp on dp.ProductID = pr.ID
where o.OrderCountry = 'UK'
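A small runnable sketch of this two-step idea on the sample rows from the question, using SQLite through Python's sqlite3 module (an assumption; the question is about SQL Server). The LinkTable columns ProductID and OrderID are assumed, since the question does not show them. SQLite compares strings case-sensitively by default, so the filter uses 'uk' exactly as stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Products  (ID INTEGER PRIMARY KEY, ProductName TEXT);
    CREATE TABLE Orders    (ID INTEGER PRIMARY KEY, OrderCountry TEXT);
    -- Assumed link-table columns; the question does not list them.
    CREATE TABLE LinkTable (ProductID INTEGER, OrderID INTEGER);
    INSERT INTO Products  VALUES (1, 'Chocolate'), (2, 'Banana'), (3, 'Strawberry');
    INSERT INTO Orders    VALUES (1, 'uk'), (2, 'uk'), (3, 'usa'), (4, 'usa');
    INSERT INTO LinkTable VALUES (1, 1), (2, 2), (2, 3), (3, 4);
""")

sql = """
    WITH singleCountryProducts AS (
        -- Step 1: products whose orders all come from one country.
        SELECT lt.ProductID
        FROM Orders o
        INNER JOIN LinkTable lt ON lt.OrderID = o.ID
        GROUP BY lt.ProductID
        HAVING COUNT(DISTINCT o.OrderCountry) = 1
    )
    -- Step 2: the joined rows for those products, restricted to the UK.
    SELECT pr.ID, pr.ProductName, o.ID, o.OrderCountry
    FROM Products pr
    INNER JOIN LinkTable lt ON lt.ProductID = pr.ID
    INNER JOIN Orders o ON o.ID = lt.OrderID
    INNER JOIN singleCountryProducts sc ON sc.ProductID = pr.ID
    WHERE o.OrderCountry = 'uk'
    ORDER BY pr.ID, o.ID
"""

uk_only_rows = conn.execute(sql).fetchall()
print(uk_only_rows)
```

Banana drops out in step 1 because it has two distinct countries, and Strawberry drops out in step 2 because its single country is not the UK.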
In the hope that some of this may be generally reusable:
;with startingRS (productid, product, orderid, ordercountry) as (
select 1, 'Chocolate', 1, 'uk' union all
select 2, 'Banana', 2, 'uk' union all
select 2, 'Banana', 3, 'usa' union all
select 3, 'Strawberry', 4, 'usa'
), countryRankings as (
select productid,product,orderid,ordercountry,
RANK() over (PARTITION by productid ORDER by ordercountry) as FirstCountry,
RANK() over (PARTITION by productid ORDER by ordercountry desc) as LastCountry
from
startingRS
), singleCountry as (
select productid,product,orderid,ordercountry
from countryRankings
where FirstCountry = 1 and LastCountry = 1
)
select * from singleCountry where ordercountry='uk'
In the startingRS, you put whatever query you currently have to generate the intermediate results you've shown. The countryRankings CTE adds two new columns, that ranks the countries within each productid.
The singleCountry CTE reduces the result set back down to those results where country ranks as both the first and last country within the productid (i.e. there's only a single country for this productid). Finally, we query for those results which are just from the uk.
If you want, for example, all productid rows with a single country of origin, you just skip this last where clause (and you'd get 3,strawberry,4,usa in your results also)
So if you've got a current query that looks like:
select p.productid,p.product,o.orderid,o.ordercountry
from product p inner join order o on p.productid = o.productid --(or however these joins work for your tables)
Then you'd rewrite the first CTE as:
;with startingRS (productid, product, orderid, ordercountry) as (
select p.productid,p.product,o.orderid,o.ordercountry
from product p inner join order o on p.productid = o.productid
), /* rest of query */
Hmm. Based on Philip's earlier approach, try adding something like this to exclude rows where there's been the same product ordered in another country:
SELECT pr.Id, pr.ProductName, od.Id, od.OrderCountry
from Products pr
inner join LinkTable lt
on lt.ProductId = pr.ID
inner join Orders od
on od.ID = lt.OrderId
where
od.OrderCountry = 'UK'
AND NOT EXISTS
(
SELECT
*
FROM
Products MatchingProducts
inner join LinkTable lt
on lt.ProductId = MatchingProducts.ID
inner join Orders OrdersFromOtherCountries
on OrdersFromOtherCountries.ID = lt.OrderId
WHERE
MatchingProducts.ID = Pr.ID AND
OrdersFromOtherCountries.OrderCountry != od.OrderCountry
)
;WITH mytable (productid,ordercountry)
AS
(SELECT productid, ordercountry
FROM Orders od INNER JOIN LinkTable lt ON od.orderid = lt.OrderId)
SELECT * FROM mytable
INNER JOIN dbo.Products pr ON pr.productid = mytable.productid
WHERE pr.productid NOT IN (SELECT productid FROM mytable
GROUP BY productid
HAVING COUNT(DISTINCT ordercountry) > 1)
AND ordercountry = 'uk'
SELECT pr.Id, pr.ProductName, od.Id, od.OrderCountry
from Products pr
inner join LinkTable lt
on lt.ProductId = pr.ID
inner join Orders od
on od.ID = lt.OrderId
where od.OrderCountry = 'UK'
This probably isn't the most efficient way to do this, but ...
SELECT p.ProductName
FROM Product p
WHERE p.ProductId IN
(
SELECT DISTINCT ol.ProductId
FROM OrderLines ol
INNER JOIN [Order] o
ON ol.OrderId = o.OrderId
WHERE o.OrderCountry = 'uk'
)
AND p.ProductId NOT IN
(
SELECT DISTINCT ol.ProductId
FROM OrderLines ol
INNER JOIN [Order] o
ON ol.OrderId = o.OrderId
WHERE o.OrderCountry != 'uk'
)
TestData
create table product
(
ProductId int,
ProductName nvarchar(50)
)
go
create table [order]
(
OrderId int,
OrderCountry nvarchar(50)
)
go
create table OrderLines
(
OrderId int,
ProductId int
)
go
insert into Product VALUES (1, 'Chocolate')
insert into Product VALUES (2, 'Banana')
insert into Product VALUES (3, 'Strawberry')
insert into [order] values (1, 'uk')
insert into [order] values (2, 'uk')
insert into [order] values (3, 'usa')
insert into [order] values (4, 'usa')
insert into [orderlines] values (1, 1)
insert into [orderlines] values (2, 2)
insert into [orderlines] values (3, 2)
insert into [orderlines] values (4, 3)
insert into [orderlines] values (3, 2)
insert into [orderlines] values (3, 3)

Optimize JOIN SQL query with additional SELECT

I need a query which will select just one image (GROUP BY phi.id_product) for each product, and this image has to be the one with the highest priority (inner SELECT with ORDER BY statement).
The priority is stored in an N:M relation table called product_has_image.
I've created a query, but it takes about 3 seconds to execute and I need to optimize it. Here it is:
SELECT p.*, i.id AS imageid
FROM `product` p JOIN `category` c on c.`id` = p.`id_category`
LEFT OUTER JOIN (SELECT id_product, id_image FROM
`product_has_image` ORDER BY priority DESC) phi ON p.id = phi.id_product
LEFT OUTER JOIN `image` i ON phi.id_image = i.id
WHERE (c.`id_parent` = 2 OR c.`id` = 2)
GROUP BY phi.id_product
Indexes which I find to be important in this query are:
image (PRIMARY id)
product_has_image (PRIMARY id_product, id_image; INDEX id_product; INDEX id_image)
product (PRIMARY id, id_category; INDEX id_category)
category (PRIMARY id; INDEX id_parent)
Most of the time is spent joining the tables through the inner SELECT statement, which is required for the sorting.
Joining with LEFT JOIN [product_has_image] phi ON p.id = phi.id_product is much faster, but doesn't assign the image with the highest priority.
Any help would be appreciated.
Reformatted for sensibility . . .
SELECT p.*, i.id AS imageid
FROM `product` p
INNER JOIN `category` c on (c.`id` = p.`id_category`)
LEFT OUTER JOIN (SELECT id_product, id_image
FROM `product_has_image`
ORDER BY priority DESC) phi
ON (p.id = phi.id_product)
LEFT OUTER JOIN `image` i
ON (phi.id_image = i.id)
WHERE (c.`id_parent` = 2 OR c.`id` = 2)
GROUP BY phi.id_product
Without seeing an execution plan or DDL, I'd guess (shudder) that the problem is likely to be the inner select/sort. If you create a view
create view highest_priority_images as
select id_product, max(priority) as max_priority
from product_has_image
group by id_product
Then you can replace that inner SELECT...ORDER BY with a SELECT...INNER JOIN on that view. That would reduce the cardinality, so I'd expect it to run faster.
Posting DDL would help.
I would probably try to do it like this:
SELECT p.*, i.id AS imageid
FROM `product` p
INNER JOIN `category` c ON c.id = p.id_category
/* a list of `id_product`s with their highest priorities
from `product_has_image` */
LEFT OUTER JOIN (
SELECT id_product, MAX(priority) AS max_priority
FROM `product_has_image`
GROUP BY id_product
) m ON p.id = m.id_product
/* now joining `product_has_image` again, using
m.`max_priority` for additional filtering */
LEFT OUTER JOIN `product_has_image` phi
ON p.id = phi.id_product AND m.max_priority = phi.priority
/* if you only select `id` from `image`, you can use
phi.`id_image` instead and remove this join */
LEFT OUTER JOIN `image` i ON phi.id_image = i.id
WHERE c.id_parent = 2 OR c.id = 2
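A cut-down sketch of this join-back-on-MAX pattern, run on SQLite through Python's sqlite3 module (an assumption; the question is about MySQL). The category and image tables are left out, and the product names and rows are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE product_has_image (
        id_product INTEGER,
        id_image   INTEGER,
        priority   INTEGER,
        PRIMARY KEY (id_product, id_image)
    );
    -- Invented sample data: product 3 has no image at all.
    INSERT INTO product VALUES (1, 'lamp'), (2, 'desk'), (3, 'chair');
    INSERT INTO product_has_image VALUES (1, 10, 1), (1, 11, 5), (2, 12, 2);
""")

sql = """
    SELECT p.id, phi.id_image
    FROM product p
    -- highest priority per product
    LEFT JOIN (
        SELECT id_product, MAX(priority) AS max_priority
        FROM product_has_image
        GROUP BY id_product
    ) m ON m.id_product = p.id
    -- join back to pick the image that carries that priority
    LEFT JOIN product_has_image phi
           ON phi.id_product = p.id AND phi.priority = m.max_priority
    ORDER BY p.id
"""

top_images = conn.execute(sql).fetchall()
print(top_images)
```

One thing to keep in mind: if two images of the same product share the highest priority, this shape returns both rows for that product, so a tie-breaker is needed when exactly one image per product is required.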
Can't test it now, but wouldn't it be possible to do this?
SELECT p.*, i.id AS imageid
FROM `product` p JOIN `category` c on c.`id` = p.`id_category`
LEFT JOIN `product_has_image` phi ON p.id = phi.id_product
LEFT OUTER JOIN `image` i ON phi.id_image = i.id
WHERE (c.`id_parent` = 2 OR c.`id` = 2)
GROUP BY phi.id_product
ORDER BY phi.priority DESC
Do it in a regular join and order by phi.priority.