SQL Join IN Query with AND? - sql

I have the following tables:
Option
-------
id - int
name - varchar
Product
---------
id - int
name -varchar
ProductOptions
------------------
id - int
product_id - int
option_id - int
If I have a list of option ids, how can I retrieve all Products that have all the options with the list of ids that I have? I know that SQL "IN" will use an "OR" i need an "AND". Thank you!

If the ids are not repeated, you can retrieve the ids of the options you need and count how many they are. Then, you just
SELECT product_id FROM ProductOptions
WHERE option_id IN ( OPTIONS )
GROUP BY product_id
HAVING COUNT(product_id) = NEEDED;
Without the GROUP BY, if you had five option ids, and product 27 had fifteen options among which there were those five, you'd get five rows with the same product_id. The GROUP BY joins those rows. Since you want ALL options, and options have all different IDs, asking "rows with all of them" is equivalent to asking "rows with as many options as the desired option set size".
Plus, you run the big query on ProductOptions only, which should be really fast.

One way to approach queries like this is with a group by and having clause. It is best if you start with your list of required options in a list:
with list as (
select <optionname1> as optionname union all
select <optionname2 union all . . .
)
select ProductId
from list l left outer join
Options o
on l.optionname = o.name
ProductOptions po join
on po.option_id = o.option_id left outer join
group by ProductId
having count(distinct o.optionname) = count(distinct l.optionname)
This guarantees that all are in the list. By the way, I used SQL Server syntax to generate the list.
If you have the list in other formats, such as a delimited string, there are other options. There are other possibilities depending on the database you are using. However, the above idea should work on any database, with two caveats:
The with statement might just become a subquery in the FROM clause where "list" is.
The method for creating the list (a table of constants) varies among databases

If you have list of Id's you have basically only 2 options.
- Either to call as many selects as many id's you have
- or you have to use IN () or OR.
The usage of IN would be recommended however, as calling one statement is usually more performant (moreover in case you have index on all your id columns, no table scan should be required).
I'd use following statement:
select Product.* from Product, Option, ProductOption where Option.id IN ( 1, 2, ... ) and option.id = ProductOption.option_id and Product.product_id = Product.id
One more remark, why do you have id column in ProductOptions table? It's useless from my point of view, you should rather have composite primary key from columns product_id and option_id (as this couple is unique).

Will this work?:
select p.id, p.name
from Product as p inner join
ProductOptions as po on p.id=po.product_id
where po.option_id in (1,2,3,4)

Related

Return all data when grouping on a field

I have the following 2 tables (there are more fields in the real tables):
create table publisher(id serial not null primary key,
name text not null);
create table product(id serial not null primary key,
name text not null,
publisherRef int not null references publisher(id));
Sample data:
insert into publisher (id,name) values (1,'pub1'),(2,'pub2'),(3,'pub3');
insert into product (name,publisherRef) values('p1',1),('p2',2),('p3',2),('p4',2),('p5',3),('p6',3);
And I would like the query to return:
name, numProducts
pub2, 3
pub3, 2
pub1, 1
A product is published by a publisher. Now I need a list of name, id of all publishers which have at least one product, ordered by the total number of products each publisher has.
I can get the id of the publishers ordered by number of products with:
select publisherRef AS id, count(*)
from product
order by count(*) desc;
But I also need the name of each publisher in the result. I thought I could use a subquery like:
select *
from publisher
where id in (
select publisherRef
from product
order by count(*) desc)
But the order of rows in the subquery is lost in the outer SELECT.
Is there any way to do this with a single sql query?
SELECT pub.name, pro.num_products
FROM (
SELECT publisherref AS id, count(*) AS num_products
FROM product
GROUP BY 1
) pro
JOIN publisher pub USING (id)
ORDER BY 2 DESC;
db<>fiddle here
Or (since the title mentions "all data") return all columns of the publisher with pub.*. After products have been aggregated in the subquery, you are free to list anything in the outer SELECT.
This only lists publisher which
have at least one product
And the result is ordered by
the total number of products each publisher has
It's typically faster to aggregate the "n"-table before joining to the "1"-table. Then use an [INNER] JOIN (not a LEFT JOIN) to exclude publishers without products.
Note that the order of rows in an IN expression (or items in the given list - there are two syntax variants) is insignificant.
The column alias in publisherref AS id is totally optional to use the simpler USING clause for identical column names in the following join condition.
Aside: avoid CaMeL-case names in Postgres. Use unquoted, legal, lowercase names exclusively to make your life easier.
Are PostgreSQL column names case-sensitive?

SQL Server question - subqueries in column result with a join?

I have a distinct list of part numbers from one table. It is basically a table that contains a record of all the company's part numbers. I want to add columns that will pull data from different tables but only pertaining to the part number on that row of the distinct part list.
For example: if I have part A, B, C from the unique part list I want to add columns for Purchase quantity, repair quantity, loan quantity, etc... from three totally unique tables.
So it's almost like I need 3 subqueries that will sum of that data from the different tables for each part.
Can anybody steer me in the direction of how to do this? Please and thank you so much!
One method is correlated subqueries. Something like this:
select p.*,
(select count(*)
from purchases pu
where pu.part_id = p.part_id
) as num_purchases,
(select count(*)
from repairs r
where r.part_id = p.part_id
) as num_repairs,
(select count(*)
from loans l
where l.part_id = p.part_id
) as num_loans
from parts p;
Another option is joins with aggregation before the join. Or lateral joins (which are quite similar to correlated subqueries).

How do I write an SQL query whose where clause is contained in another query?

I have two tables, one for Customer and one for Item.
In Customer, I have a column called "preference", which stores a list of hard criteria expressed as a WHERE clause in SQL e.g. "item.price<20 and item.category='household'".
I'd like a query that works like this:
SELECT * FROM item WHERE interpret('SELECT preference FROM customer WHERE id = 1')
Which gets translated to this:
SELECT * FROM item WHERE item.price < 20 and item.category = 'household'
Example data model:
CREATE TABLE customer (
cust_id INT
preference VARCHAR
);
CREATE TABLE item (
item_id INT
price DECIMAL(19,4)
category VARCHAR
);
# Additional columns omitted for brevity
I've looked up casting and dynamic SQL but I haven't been able to figure out how I should do this.
I'm using PostgreSQL 9.5.1
I'm going to assume that preference is the same as my made up item_id column. You may need to tweak it to fit your case. For future questions like this it may pay to give us the table structures you are working with, it really helps us out!
What you are asking for is a subquery:
select *
from item
where item_id in (select
preference
from
customer
where id = 1)
What I would suggest though is a join:
select item.*
from item
join customer on item.item_id = customer.preference
where item.price<20 and
item.category='household'
customer.id = 1
I decided to change my schema instead, as it was getting pretty messy to store the criteria in preferences in that manner.
I restricted the kinds of preferences that could be specified, then stored them as columns in Customer.
After that, all the queries I wanted could be expressed as joins.

Best way to filter union of data from 2 tables by value in shared 3rd table

For sake of example, let's assume 3 tables:
PHYSICAL_ITEM
ID
SELLER_ID
NAME
COST
DIMENSIONS
WEIGHT
DIGITAL_ITEM
ID
SELLER_ID
NAME
COST
DOWNLOAD_PATH
SELLER
ID
NAME
Item IDs are guaranteed unique across both item tables. I want to select, in order, with a type label, all item IDs for a given seller. I've come up with:
Query A
SELECT PI.ID AS ID, 'PHYSICAL' AS TYPE
FROM PHYSICAL_ITEM PI
JOIN SELLER S ON PI.SELLER_ID = S.ID
WHERE S.NAME = 'name'
UNION
SELECT DI.ID AS ID, 'DIGITAL' AS TYPE
FROM DIGITAL_ITEM DI
JOIN SELLER S ON DI.SELLER_ID = S.ID
WHERE S.NAME = 'name'
ORDER BY ID
Query B
SELECT ITEM.ID, ITEM.TYPE
FROM (SELECT ID, SELLER_ID, 'PHYSICAL' AS TYPE
FROM PHYSICAL_ITEM
UNION
SELECT ID, SELLER_ID, 'DIGITAL' AS TYPE
FROM DIGITAL_ITEM) AS ITEM
JOIN SELLER ON ITEM.SELLER_ID = SELLER.ID
WHERE SELLER.NAME = 'name'
ORDER BY ITEM.ID
Query A seems like it would be the most efficient, but it also looks unnecessarily duplicative (2 table joins to the same table, 2 where clauses on the same table column). Query B looks cleaner in a way to me (no duplication), but it also looks much less efficient, since it has a subquery. Is there a way to get the best of both worlds, so to speak?
In both cases, replace the union with union all. Union unnecessarily removes duplicates.
I would expect Query A to be more efficient, because the optimizer has more information when doing the join (although I think Oracle is pretty good with using indexes even after a union). In addition, the first query reduces the amount of data before the union.
This is, however, only an opinion. The real test is to time the two queries -- multiple times to avoid cache fill delays -- to see which is better.

SQL query to search an unique ID that can be in three different tables

I have three tables that control products, colors and sizes. Products can have or not colors and sizes. Colors can or not have sizes.
product color size
------- ------- -------
id id id
unique_id id_product (FK from product) id_product (FK from version)
stock unique_id id_version (FK from version)
title stock unique_id
stock
The unique_id column, that is present in all tables, is a serial type (autoincrement) and its counter is shared with the three tables, basically it works as a global unique ID between them.
It works fine, but i am trying to increase the query performance when i have to select some fields based in the unique_id.
As i don't know where is the unique_id that i am looking for, i am using UNION, like below:
select title, stock
from product
where unique_id = 10
UNION
select p.title, c.stock
from color c
join product p on c.id_product = p.id
where c.unique_id = 10
UNION
select p.title, s.stock
from size s
join product p on s.id_product = p.id
where s.unique_id = 10;
Is there a better way to do this? Thanks for any suggestion!
EDIT 1
Based on #ErwinBrandstetter and #ErikE answers i decided to use the below query. The main reasons is:
1) As unique_id has indexes in all tables, i will get a good performance
2) Using the unique_id i will find the product code, so i can get all columns i need using a another simple join
SELECT
p.title,
ps.stock
FROM (
select id as id_product, stock
from product
where unique_id = 10
UNION
select id_product, stock
from color
where unique_id = 10
UNION
select id_product, stock
from size
where unique_id = 10
) AS ps
JOIN product p ON ps.id_product = p.id;
PL/pgSQL function
To solve the problem at hand, a plpgsql function like the following should be faster:
CREATE OR REPLACE FUNCTION func(int)
RETURNS TABLE (title text, stock int) LANGUAGE plpgsql AS
$BODY$
BEGIN
RETURN QUERY
SELECT p.title, p.stock
FROM product p
WHERE p.unique_id = $1; -- Put the most likely table first.
IF NOT FOUND THEN
RETURN QUERY
SELECT p.title, c.stock
FROM color c
JOIN product p ON c.id_product = p.id
WHERE c.unique_id = $1;
END;
IF NOT FOUND THEN
RETURN QUERY
SELECT p.title, s.stock
FROM size s
JOIN product p ON s.id_product = p.id
WHERE s.unique_id = $1;
END IF;
END;
$BODY$;
Updated function with table-qualified column names to avoid naming conflicts with OUT parameters.
RETURNS TABLE requires PostgreSQL 8.4, RETURN QUERY requires version 8.2. You can substitute both for older versions.
It goes without saying that you need to index the columns unique_id of every involved table. id should be indexed automatically, being the primary key.
Redesign
Ideally, you can tell which table from the ID alone. You could keep using one common sequence, but add 100000000 for the first table, 200000000 for the second and 300000000 for the third - or whatever suits your needs. This way, the least significant part of the number is easily distinguishable.
A plain integer spans numbers from -2147483648 to +2147483647, move to bigint if that's not enough for you. I would stick to integer IDs, though, if possible. They are smaller and faster than bigint or text.
CTEs (experimental!)
If you cannot create a function for some reason, this pure SQL solution might do a similar trick:
WITH x(uid) AS (SELECT 10) -- provide unique_id here
, a AS (
SELECT title, stock
FROM x, product
WHERE unique_id = x.uid
)
, b AS (
SELECT p.title, c.stock
FROM x, color c
JOIN product p ON c.id_product = p.id
WHERE NOT EXISTS (SELECT 1 FROM a)
AND c.unique_id = x.uid
)
, c AS (
SELECT p.title, s.stock
FROM x, size s
JOIN product p ON s.id_product = p.id
WHERE NOT EXISTS (SELECT 1 FROM b)
AND s.unique_id = x.uid
)
SELECT * FROM a
UNION ALL
SELECT * FROM b
UNION ALL
SELECT * FROM c;
I am not sure whether it avoids additional scans like I hope. Would have to be tested. This query requires at least PostgreSQL 8.4.
Upgrade!
As I just learned, the OP runs on PostgreSQL 8.1.
Upgrading alone would speed up the operation a lot.
Query for PostgreSQL 8.1
As you are limited in your options, and a plpgsql function is not possible, this function should perform better than the one you have. Test with EXPLAIN ANALYZE - available in v8.1.
SELECT title, stock
FROM product
WHERE unique_id = 10
UNION ALL
SELECT p.title, ps.stock
FROM product p
JOIN (
SELECT id_product, stock
FROM color
WHERE unique_id = 10
UNION ALL
SELECT id_product, stock
FROM size
WHERE unique_id = 10
) ps ON ps.id_product = p.id;
I think it's time for a redesign.
You have things that you're using as bar codes for items that are basically all the same in one respect (they are SerialNumberItems), but have been split into multiple tables because they are different in other respects.
I have several ideas for you:
Change the Defaults
Just make each product required to have one color "no color" and one size "no size". Then you can query any table you want to find the info you need.
SuperType/SubType
Without too much modification you could use the supertype/subtype database design pattern.
In it, there is a parent table where all the distinct detail-level identifiers live, and the shared columns of the subtype tables go in the supertype table (the ways that all the items are the same). There is one subtype table for each different way that the items are distinct. If mutual exclusivity of the subtype is required (you can have a Color or a Size but not both), then the parent table is given a TypeID column and the subtype tables have an FK to both the ParentID and the TypeID. Looking at your design, in fact you would not use mutual exclusivity.
If you use the pattern of a supertype table, you do have the issue of having to insert in two parts, first to the supertype, then the subtype. Deleting also requires deleting in reverse order. But you get a great benefit of being able to get basic information such as Title and Stock out of the supertype table with a single query.
You could even create schema-bound views for each subtype, with instead-of triggers that convert inserts, updates, and deletes into operations on the base table + child table.
A Bigger Redesign
You could completely change how Colors and Sizes are related to products.
First, your patterns of "has-a" are these:
Product (has nothing)
Product->Color
Product->Size
Product->Color->Size
There is a problem here. Clearly Product is the main item that has other things (colors and sizes) but colors don't have sizes! That is an arbitrary assignment. You may as well have said that Sizes have Colors--it doesn't make a difference. This reveals that your table design may not be best, as you're trying to model orthogonal data in a parent-child type of relationship. Really, products have a ColorAndSize.
Furthermore, when a product comes in colors and sizes, what does the uniqueid in the Color table mean? Can such a product be ordered without a size, having only a color? This design is assigning a unique ID to something that (it seems to me) should never be allowed to be ordered--but you can't find this information out from the Color table, you have to compare the Color and Size tables first. It is a problem.
I would design this as: Table Product. Table Size listing all distinct sizes possible for any product ever. Table Color listing all distinct colors possible for any product ever. And table OrderableProduct that has columns ProductId, ColorID, SizeID, and UniqueID (your bar code value). Additionally, each product must have one color and one size or it doesn't exist.
Basically, Color and Size are like X and Y coordinates into a grid; you are filling in the boxes that are allowable combinations. Which one is the row and which the column is irrelevant. Certainly, one is not a child of the other.
If there are any reasonable rules, in general, about what colors or sizes can be applied to various sub-groups of products, there might be utility in a ProductType table and a ProductTypeOrderables table that, when creating a new product, could populate the OrderableProduct table with the standard set—it could still be customized but might be easier to modify than to create anew. Or, it could define the range of colors and sizes that are allowable. You might need separate ProductTypeAllowedColor and ProductTypeAllowedSize tables. For example, if you are selling T-shirts, you'd want to allow XXXS, XXS, XS, S, M, L, XL, XXL, XXXL, and XXXXL, even if most products never use all those sizes. But for soft drinks, the sizes might be 6-pack 8oz, 24-pack 8oz, 2 liter, and so on, even if each soft drink is not offered in that size (and soft drinks don't have colors).
In this new scheme, you only have one table to query to find the correct orderable product. With proper indexes, it should be blazing fast.
Your Question
You asked:
in PostgreSQL, so do you think if i use indexes on unique_id i will get a satisfactory performance?
Any column or set of columns that you use to repeatedly look up data must have an index! Any other pattern will result in a full table scan each time, which will be awful performance. I am sure that these indexes will make your queries lightning fast as it will take only one leaf-level read per table.
There's an easier way to generate unique IDs using three separate auto_increment columns. Just prepend a letter to the ID to uniquify it:
Colors:
C0000001
C0000002
C0000003
Sizes:
S0000001
S0000002
S0000003
...
Products:
P0000001
P0000002
P0000003
...
A few advantages:
You don't need to serialize creation of ids across tables to ensure uniqueness. This will give better performance.
You don't actually need to store the letter in the table. All IDs in the same table start with the same letter, so you only need to store the number. This means that you can use an ordinary auto_increment column to generate your IDs.
If you have an ID you only need to check the first character to see which table it can be found in. You don't even need to make a query to the database if you just want to know whether it's a product ID or a size ID.
A disadvantage:
It's no longer a number. But you can get around that by using 1,2,3 instead of C,S,P.
Your query will be pretty much efficient, as long as you have an index on unique_id, on every table and indices on the joining columns.
You could turn those UNION into UNION ALL but the won't be any differnce on performance, for this query.
This is a bit different. I don't understand the intended behaviour if stocks exists in more than one of the {product,color,zsize} tables. (UNION will remove duplicates, but for the row-as-a-whole, eg the {product_id,stock} tuples. That makes no sense to me. I just take the first. (Note the funky self-join!!)
SELECT p.title
, COALESCE (p2.stock, c.stock, s.stock) AS stock
FROM product p
LEFT JOIN product p2 on p2.id = p.id AND p2.unique_id = 10
LEFT JOIN color c on c.id_product = p.id AND c.unique_id = 10
LEFT JOIN zsize s on s.id_product = p.id AND s.unique_id = 10
WHERE COALESCE (p2.stock, c.stock, s.stock) IS NOT NULL
;