Postgres array query - sql

(The following is a highly simplified description of my problem. The company policy does not allow me to describe the actual scenario in any detail.)
The DB tables involved are:
PRODUCTS:
ID Name
---------
1 Ferrari
2 Lamborghini
3 Volvo
CATEGORIES:
ID Name
----------
10 Sports cars
20 Safe cars
30 Red cars
PRODUCTS_CATEGORIES
ProductID CategoryID
-----------------------
1 10
1 30
2 10
3 20
LOCATIONS:
ID Name
------------
100 Sports car store
200 Safe car store
300 Red car store
400 All cars r us
LOCATIONS_CATEGORIES:
LocationID CategoryID
------------------------
100 10
200 20
300 30
400 10
400 20
400 30
Note that the locations are not directly connected to the products, only the categories. The customer should be able to see a list of locations that can provide all the product categories that the products they want to buy belong to. So, for example:
A customer wants to buy a Ferrari. This would be available from stores in categories 10 or 30. This gives us stores 100, 300 and 400 but not 200.
However, if a customer wants to buy a Volvo and a Lamborghini this would be available from stores in categories 10 and 20. Which only gives us store 400.
Another customer wants to buy a Ferrari and a Volvo. This they could get from a store in either categories 10 + 20 (sporty and safe) or categories 30 + 20 (red and safe).
What I need is a postgres query that takes a number of products and returns the locations where all of them can be found. I got started with arrays and the <# operator but got lost quickly. Here follows some example SQL that attempts to get stores where a Ferrari and a Lamborghini can be bought. It does not work correctly since it requires the locations to satisfy all the categories that all the selected cars belong to. It returns location 400 only but should return locations 400 and 100.
SELECT l.* FROM locations l
WHERE
(SELECT array_agg(DISTINCT(categoryid)) FROM products_categories WHERE productid IN (1,2))
<#
(SELECT array_agg(categoryid) FROM locations_categories WHERE locationid = l.id);
I hope my description makes sense.

Here is the query. You should insert a list of selected cars Ids pc.ProductId in (1,3) and in the end you should correct condition to selected cars count so if you select 1 and 3 you should write HAVING COUNT(DISTINCT pc.ProductId) = 2 if you select 3 cars then there have to be 3. This condition in HAVING give you condition that ALL cars are in these Locations:
SELECT Id FROM Locations l
JOIN Locations_Categories lc on l.Id=lc.LocationId
JOIN Products_Categories pc on lc.CategoryId=pc.CategoryID
where pc.ProductId in (1,3)
GROUP BY l.id
HAVING COUNT(DISTINCT pc.ProductId) = 2
Sqlfiddle demo
For example for one car it will be:
SELECT Id FROM Locations l
JOIN Locations_Categories lc on l.Id=lc.LocationId
JOIN Products_Categories pc on lc.CategoryId=pc.CategoryID
where pc.ProductId in (1)
GROUP BY l.id
HAVING COUNT(DISTINCT pc.ProductId) = 1
Only Ferrary demo
Volvo and a Lamborghini demo

(This basically elaborates on #valex's answer, though I didn't realise that until I posted; please accept #valex's not this one).
This can be done using only joins and aggregation.
Build a join tree mapping locations to products, as normal. Then join it with the list of desired products (one-column values rows) and filter the join to only matching product names. You now have one row with the location of a product wherever that product can be found.
Now group by location and return locations where the number of products present is equal to the number we're looking for (for ALL). For ANY we omit the HAVING filter because any location row returned by the join is what we want.
So:
WITH wantedproducts(productname) AS (VALUES('Volvo'), ('Lamborghini'))
SELECT l."ID"
FROM locations l
INNER JOIN locations_categories lc ON (l."ID" = lc."LocationID")
INNER JOIN categories c ON (c."ID" = lc."CategoryID")
INNER JOIN products_categories pc ON (pc."CategoryID" = c."ID")
INNER JOIN products p ON (p."ID" = pc."ProductID")
INNER JOIN wantedproducts wp ON (wp.productname = p."Name")
GROUP BY l."ID"
HAVING count(DISTINCT p."ID") = (SELECT count(*) FROM wantedproducts);
is what you want, basically.
For "stores with any of the wanted products" queries, drop the HAVING clause.
You an also ORDER BY the aggregate if you want to show stores with any match but sort based on number of matches.
You can also add a string_agg(p."Name") to the SELECT values-list if you want to list products that can be found at that store.
If you want your input to be an array rather than a values-list, just replace the VALUES (...) with SELECT unnest($1) and pass your array as the parameter $1, or write it literally in place of $1.

ANSWER IN PROGRESS: (I will add answers as I get the required result)
For your first question:
A customer wants to buy a Ferrari. This would be available from stores
in categories 10 or 30. This gives us stores 100, 300 and 400 but not
200.
SELECT DISTINCT l.id, l.name
FROM Products p
LEFT JOIN Product_Categories p_c
ON p.id = p_c.ProductId
LEFT JOIN Categories c
ON p_c.CategoryId = c.id
LEFT JOIN Locations_Categories l_c
ON c.id = l_c.CategoryId
LEFT JOIN Locations l
ON l_c.LocationId = l.id
WHERE p.id = 1
Second question:
However, if a customer wants to buy a Volvo and a Lamborghini this
would be available from stores in categories 10 and 20. Which only
gives us store 400.
SELECT DISTINCT l.id, l.name
FROM Products p
LEFT JOIN Product_Categories p_c
ON p.id = p_c.ProductId
LEFT JOIN Categories c
ON p_c.CategoryId = c.id
LEFT JOIN Locations_Categories l_c
ON c.id = l_c.CategoryId
LEFT JOIN Locations l
ON l_c.LocationId = l.id
WHERE l.id in (select id
from locations loc
join locations_categories locat1
on loc.id = locat1.LocationId
join locations_categories locat2
on loc.id = locat2.LocationId
where locat1.CategoryId = 10
AND locat2.categoryId = 20)
RESULT FOR SECOND QUESTION USING INTERSECT:
intersect will cross reference all the stores where 1 product can be found each time:
SELECT DISTINCT l.id, l.name
FROM Products p
LEFT JOIN Product_Categories p_c
ON p.id = p_c.ProductId
LEFT JOIN Categories c
ON p_c.CategoryId = c.id
LEFT JOIN Locations_Categories l_c
ON c.id = l_c.CategoryId
LEFT JOIN Locations l
ON l_c.LocationId = l.id
WHERE p.id = 2
INTERSECT
SELECT DISTINCT l.id, l.name
FROM Products p
LEFT JOIN Product_Categories p_c
ON p.id = p_c.ProductId
LEFT JOIN Categories c
ON p_c.CategoryId = c.id
LEFT JOIN Locations_Categories l_c
ON c.id = l_c.CategoryId
LEFT JOIN Locations l
ON l_c.LocationId = l.id
WHERE p.id = 3
For every new product you add a new INTERSECT statement and create a new select with the wanted product id
SQLFIDDLE: http://sqlfiddle.com/#!15/ce97d/15

Well, it's hard to totally avoid arrays here but I think I found a solution with less array functions.
Instead of selecting needed locations, I excluded non valid ones.
WITH needed_categories AS (
SELECT p."ID", array_agg(pc."CategoryID") AS at_least_one_should_match
FROM Products p
JOIN Products_Categories pc ON p."ID" = pc."ProductID"
WHERE p."ID" IN (1, 3)
GROUP BY p."ID"
),
not_valid_locations AS (
SELECT DISTINCT lc."LocationID", unnest(nc.at_least_one_should_match)
FROM Locations_Categories lc
JOIN needed_categories nc ON NOT ARRAY[lc."CategoryID"] && nc.at_least_one_should_match
EXCEPT
SELECT * FROM Locations_Categories
)
SELECT *
FROM Locations
WHERE "ID" NOT IN (
SELECT "LocationID" FROM not_valid_locations
);
Here is the SQLFiddle: http://sqlfiddle.com/#!15/e138d/78
This works but I'm still trying to avoid double seq scan of the Location_Categories.
The fact that cars can belong to multiple categories is a bit tricky, I solved this using arrays but I'm trying to get rid of these too.

Related

Left outer join with count, on 3 tables not returning all rows from left table

I have these 3 tables:
Areas - id, name
Persons - id, area_id
Special_Persons - id_person, date
I'd like to produce a list of all Areas, followed by a count of Special Persons in each area, including Areas with no Special Persons.
If I do a left join of Areas and Persons, like this:
select a.id as idArea, count(p.id) as count
from areas a
left join persons p on p.area_id = a.id
group by a.id;
This works just fine; Areas that have no Persons show up, and have a count of 0.
What I am not clear on is how to do the same thing with the special_persons table, which currently only has 2 entries, both in the same Area.
I have tried the following:
select a.id as idArea, count(sp.id_person) as count
from special_persons sp, areas a
left join persons p on p.area_id = a.id
where p.area_id = a.id
and sp.id_person = p.id
group by a.id;
And it only returns 1 row, with the Area that happens to have 2 Special Persons in it, and a count of 2.
To continue getting a list of all areas, do I need to use a sub-query? Another join? I'm not sure how to go about it.
You can add another left join to the Special_Persons table:
select a.id as idArea, count(p.id), count(sp.id_person)
from areas a
left join persons p on p.area_id = a.id
left join special_persons sp on sp.id_person = p.id
group by a.id;

SQL to provide characteristics mandatory for each entity

I need a SQL for the following requirement.
Table A:
Product Charactersitic Value
A Age 100
A Number 100
B Age 100
Table B:
Charactersitic
Age
Number
Output Should be:
Product Charactersitic Value
A Age 100
A Number 100
B Age 100
**B Number Null**
This means each characteristic is Mandatory for each Product.
You need a cross join and a left join:
select p.product,
c.characteristic,
p2.value
from (
select distinct product
from products
) p
cross join characteristics c
left join products p2 on p2.product = p.product
and p2.characteristic = c.characteristic;
CROSS JOIN produces every single possible combination of product and characteristic and then you can LEFT JOIN the products with the result of the cross join to get the desired result.
Demo

WHERE Clause for One-To-Many Association

I have two tables Products and ProductProperties.
Products
name - string
description - text
etc etc
ProductProperties
product_id - integer
property_id - integer
There is also a table Properties which basically stores the list of property names and their attributes
How can I implement a SQL command that finds a product with the property_ids (A or B or C) AND (X or Y or Z)
I've got upto here:
SELECT DISTINCT "products".*
FROM "products"
INNER JOIN "product_properties" ON "product_properties"."product_id" = "products"."id" AND "product_properties"."deleted_at" IS NULL
WHERE "products"."deleted_at" IS NULL
AND (product_properties.property_id IN ('504, 506, 403'))
AND (product_properties.property_id IN ('520, 501, 502'))
But it doesn't really work since it's looking for a Product Property which has both values 504 and 520, which will never exist.
Would appreciate some help!
You need to define intermediate resultsets on a property group basis:
SELECT DISTINCT p.*
FROM products p
JOIN product_properties groupA ON groupA.product_id = p.id AND groupA.deleted_at IS NULL AND groupA.property_id IN ('504')
JOIN product_properties groupB ON groupB.product_id = p.id AND groupB.deleted_at IS NULL AND groupB.property_id IN ('520')
WHERE p.deleted_at IS NULL
You see, you detected the problem yourself very nicely: "But it doesn't really work since it's looking for a Product Property which has both values 504 and 520, which will never exist."
Indeed, recordsets are immutable within a query, all single criteria applied to them are applied all at once. You need to duplicate each table and apply individual criteria to them.
One method uses exists or in:
select p.*
from products p
where p.id in (select pp.product_id
from product_properties pp
where pp.propertyid in ('504', '520')
);
This saves you from having to use distinct in the outer query.
If, perchance, you really mean finding the products that have all the properties, then a join and group by work:
select p.*
from products p join
product_properties pp
on p.id = pp.product_id
where pp.propertyid in ('504', '520')
group by p.id -- yes, this is allowed in Postgres
having count(*) = 2;
Hi try this queries i just thinking about it so i didn't try any of them check i got the idea i want to do
SELECT DISTINCT "products".*
FROM products pr
WHERE id IN
(
SELECT product_id FROM ProductProperties WHERE property_id IN (504,520)
GROUP BY product_id
HAVING Count(*) = 2
) AND "products"."deleted_at" IS NULL
SELECT DISTINCT "products".*
FROM products pr, INNER JOIN (
SELECT product_id,count(*) as nbr FROM ProductProperties WHERE property_id IN (504,520)
GROUP BY product_id
) as temp ON temp.product_id = pr.id
WHERE "products"."deleted_at" IS NULL AND temp.nbr = 2
and also you can check this one as well ( you can use also the join in where clause instead of using INNER JOIN)
SELECT DISTINCT products.* FROM products as p
INNER JOIN product_properties as p1 ON p1.product_id = p.id
INNER JOIN product_properties as p2 ON p2.product_id = p.id
WHERE p.deleted_at IS NULL
AND p1.property_id = '504' AND p1.deleted_at IS NULL
AND p2.property_id = '520' AND p2.deleted_at IS NULL

Get correct result from mysql query

I have the following tables:
**products** which has these fields: id,product,price,added_date
**products_to_categories** which has these fields: id,product_id,category_id
**adverts_to_categories** -> id,advert_id,category_id
**adverts** which has these fields: id,advert_name,added_date,location
I can not execute sql that will return to me all products that are from category 14 and that are owned by advert located in London. So I have 4 tables and 2 conditions - to be from category 14 and the owner of the product to be from London. I tried many variants to execute sql but none of the results were correct.. Do I need to use Join and which Join - left, right, full? How the correct sql will look like? thank you in advance for your help and sorry for boring you :)
This is what I have tried so far:
SELECT p.id, product, price, category_id,
p.added_date, adverts.location, adverts.id
FROM products p,
products_to_categories ptc,
adverts,
adverts_to_categories ac
WHERE ptc.category_id = "14"
AND ptc.product_id=p.id
AND ac.advert_id=adverts.id
AND adverts.location= "London"
pretty basic logic
Select * from Products P
INNER JOIN Products_To_Categories PTC ON P.ID = PTC.Product_ID
INNER JOIN Adverts_to_Categories ATC ON ATC.Category_Id = PTC.Category_ID
INNER JOIN Adverts AD on AD.ID = ATC.Advert_ID
WHERE PTC.Category_ID = 14 and AD.Location = 'LONDON'
you would only need a LEFT or right join IF you wanted records from a table which didn't exist in other tables.
so for example, if you wanted all products even if a records even those without a category, then you would use a LEFT Join instead of inner.
The following statement should return all columns from the product table in category with id 14 and all adverts located in London:
select p.* from products p
inner join products_to_categories pc on p.id = pc.product_id
inner join adverts_to_categories ac on pc.category_id = ac.category_id
inner join adverts a on a.id = ac.advert_id
where pc.category_id = 14
and ac.location = 'London';
You should remember to add an index to the column location if you are doing these string-based queries very often.

Join two tables where all child records of first table match all child records of second table

I have four tables: Customer, CustomerCategory, Limit, and LimitCategory. A customer can be in multiple categories and a limit can also have multiple categories. I need to write a query that will return the customer name and limit amount where ALL the customers categories match ALL the limit categories.
I'm guessing it would be similar to the answer here, but I can't seem to get it right. Thanks!
Edit - Here's what the tables look like:
tblCustomer
customerId
name
tblCustomerCategory
customerId
categoryId
tblLimit
limitId
limit
tblLimitCategory
limitId
categoryId
I THINK you're looking for:
SELECT *
FROM CustomerCategory
LEFT OUTER JOIN Customer
ON CustomerCategory.CustomerId = Customer.Id
INNER JOIN LimitCategory
ON CustomerCategory.CategoryId = LimitCategory.CategoryId
LEFT OUTER JOIN Limit
ON Limit.Id = LimitCategory.LimitId
Updated!
Thanks to Felix for pointing out a flaw in my existing solution (3 years after I originally posted it, hehe). After looking at it again, I think this might be correct. Here I'm getting (1) the customers and limits with matching categories, plus the number of matching categories, (2) the number of categories per customer, (3) the number of categories per limit, (4) I then ensure the number of categories for customer and limits is the same as the number of the matches between the customers and limits:
UNTESTED!
select
matches.name,
matches.limit
from (
select
c.name,
c.customerId,
l.limit,
l.limitId,
count(*) over(partition by cc.customerId, lc.limitId) as matchCount
from tblCustomer c
join tblCustomerCategory cc on c.customerId = cc.customerId
join tblLimitCategory lc on cc.categoryId = lc.categoryId
join tblLimit l on lc.limitId = l.limitId
) as matches
join (
select
cc.customerId,
count(*) as categoryCount
from tblCustomerCategory cc
group by cc.customerId
) as customerCategories
on matches.customerId = customerCategories.customerId
join (
select
lc.limitId,
count(*) as categoryCount
from tblLimitCategory lc
group by lc.limitId
) as limitCategories
on matches.limitId = limitCategories.limitId
where matches.matchCount = customerCategories.categoryCount
and matches.matchCount = limitCategories.categoryCount
I don't know if this will work or not, just a thought i had and i can't test it, I'm sures theres a nicer way! don't be too harsh :)
SELECT
c.customerId
, l.limitId
FROM
tblCustomer c
CROSS JOIN
tblLimit l
WHERE NOT EXISTS
(
SELECT
lc.limitId
FROM
tblLimitCategory lc
WHERE
lc.limitId = l.id
EXCEPT
SELECT
cc.categoryId
FROM
tblCustomerCategory cc
WHERE
cc.customerId = l.id
)