SQL query to search a unique ID that can be in three different tables - sql

I have three tables that control products, colors and sizes. Products may or may not have colors and sizes. Colors may or may not have sizes.
product         color                           size
-------         -------                         -------
id              id                              id
unique_id       id_product (FK from product)    id_product (FK from product)
stock           unique_id                       id_color (FK from color)
title           stock                           unique_id
                                                stock
The unique_id column, which is present in all tables, is a serial type (autoincrement) whose counter is shared by the three tables; basically it works as a global unique ID across them.
It works fine, but I am trying to improve query performance when I have to select some fields based on the unique_id.
As I don't know which table holds the unique_id I am looking for, I am using UNION, like below:
select title, stock
from product
where unique_id = 10
UNION
select p.title, c.stock
from color c
join product p on c.id_product = p.id
where c.unique_id = 10
UNION
select p.title, s.stock
from size s
join product p on s.id_product = p.id
where s.unique_id = 10;
Is there a better way to do this? Thanks for any suggestion!
EDIT 1
Based on @ErwinBrandstetter's and @ErikE's answers I decided to use the query below. The main reasons are:
1) As unique_id is indexed in all tables, I will get good performance
2) Using the unique_id I will find the product id, so I can get all the columns I need with another simple join
SELECT
    p.title,
    ps.stock
FROM (
    select id as id_product, stock
    from product
    where unique_id = 10
    UNION
    select id_product, stock
    from color
    where unique_id = 10
    UNION
    select id_product, stock
    from size
    where unique_id = 10
) AS ps
JOIN product p ON ps.id_product = p.id;

PL/pgSQL function
To solve the problem at hand, a plpgsql function like the following should be faster:
CREATE OR REPLACE FUNCTION func(int)
  RETURNS TABLE (title text, stock int) LANGUAGE plpgsql AS
$BODY$
BEGIN
   RETURN QUERY
   SELECT p.title, p.stock
   FROM   product p
   WHERE  p.unique_id = $1;   -- Put the most likely table first.
   IF NOT FOUND THEN
      RETURN QUERY
      SELECT p.title, c.stock
      FROM   color c
      JOIN   product p ON c.id_product = p.id
      WHERE  c.unique_id = $1;
   END IF;
   IF NOT FOUND THEN
      RETURN QUERY
      SELECT p.title, s.stock
      FROM   size s
      JOIN   product p ON s.id_product = p.id
      WHERE  s.unique_id = $1;
   END IF;
END;
$BODY$;
Updated function with table-qualified column names to avoid naming conflicts with OUT parameters.
RETURNS TABLE requires PostgreSQL 8.4 and RETURN QUERY requires version 8.2. Both can be worked around on older versions.
It goes without saying that you need to index the columns unique_id of every involved table. id should be indexed automatically, being the primary key.
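For example (the index names here are just placeholders; only the table and column names come from the question):
CREATE INDEX product_unique_id_idx ON product (unique_id);
CREATE INDEX color_unique_id_idx   ON color   (unique_id);
CREATE INDEX size_unique_id_idx    ON size    (unique_id);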
Redesign
Ideally, you could tell which table an ID belongs to from the number alone. You could keep using one common sequence, but add 100000000 for the first table, 200000000 for the second and 300000000 for the third - or whatever suits your needs. This way, the most significant digit tells you which table the number belongs to, while the least significant part remains the shared counter.
A plain integer spans numbers from -2147483648 to +2147483647; move to bigint if that's not enough for you. I would stick to integer IDs, though, if possible. They are smaller and faster than bigint or text.
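A minimal sketch of how those offsets could be wired up, assuming one shared sequence; the sequence name and the product columns shown are only illustrative:
CREATE SEQUENCE shared_uid_seq;
CREATE TABLE product (
  id        serial PRIMARY KEY,
  unique_id integer NOT NULL DEFAULT 100000000 + nextval('shared_uid_seq'),
  title     text,
  stock     integer
);
-- color and size would use DEFAULT 200000000 + nextval('shared_uid_seq')
-- and DEFAULT 300000000 + nextval('shared_uid_seq') respectively.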
CTEs (experimental!)
If you cannot create a function for some reason, this pure SQL solution might do a similar trick:
WITH x(uid) AS (SELECT 10)  -- provide unique_id here
, a AS (
    SELECT title, stock
    FROM   x, product
    WHERE  unique_id = x.uid
    )
, b AS (
    SELECT p.title, c.stock
    FROM   x, color c
    JOIN   product p ON c.id_product = p.id
    WHERE  NOT EXISTS (SELECT 1 FROM a)
    AND    c.unique_id = x.uid
    )
, c AS (
    SELECT p.title, s.stock
    FROM   x, size s
    JOIN   product p ON s.id_product = p.id
    WHERE  NOT EXISTS (SELECT 1 FROM b)
    AND    s.unique_id = x.uid
    )
SELECT * FROM a
UNION ALL
SELECT * FROM b
UNION ALL
SELECT * FROM c;
I am not sure whether it avoids additional scans like I hope. Would have to be tested. This query requires at least PostgreSQL 8.4.
Upgrade!
As I just learned, the OP runs on PostgreSQL 8.1.
Upgrading alone would speed up the operation a lot.
Query for PostgreSQL 8.1
As you are limited in your options and a plpgsql function is not possible, this query should perform better than the one you have. Test with EXPLAIN ANALYZE - available in v8.1.
SELECT title, stock
FROM   product
WHERE  unique_id = 10
UNION ALL
SELECT p.title, ps.stock
FROM   product p
JOIN  (
    SELECT id_product, stock
    FROM   color
    WHERE  unique_id = 10
    UNION ALL
    SELECT id_product, stock
    FROM   size
    WHERE  unique_id = 10
    ) ps ON ps.id_product = p.id;

I think it's time for a redesign.
You have things that you're using as bar codes for items that are basically all the same in one respect (they are SerialNumberItems), but have been split into multiple tables because they are different in other respects.
I have several ideas for you:
Change the Defaults
Just make each product required to have one color "no color" and one size "no size". Then you can query any table you want to find the info you need.
SuperType/SubType
Without too much modification you could use the supertype/subtype database design pattern.
In it, there is a parent table where all the distinct detail-level identifiers live, and the shared columns of the subtype tables go in the supertype table (the ways that all the items are the same). There is one subtype table for each different way that the items are distinct. If mutual exclusivity of the subtype is required (you can have a Color or a Size but not both), then the parent table is given a TypeID column and the subtype tables have an FK to both the ParentID and the TypeID. Looking at your design, in fact you would not use mutual exclusivity.
If you use the pattern of a supertype table, you do have the issue of having to insert in two parts, first to the supertype, then the subtype. Deleting also requires deleting in reverse order. But you get a great benefit of being able to get basic information such as Title and Stock out of the supertype table with a single query.
You could even create schema-bound views for each subtype, with instead-of triggers that convert inserts, updates, and deletes into operations on the base table + child table.
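A minimal sketch of the supertype/subtype layout described above, with purely illustrative table and column names:
CREATE TABLE serial_number_item (   -- supertype: one row per bar-coded item
  unique_id int  PRIMARY KEY,
  title     text NOT NULL,
  stock     int  NOT NULL
);
CREATE TABLE item_color (           -- subtype: items that carry a color
  unique_id int PRIMARY KEY REFERENCES serial_number_item (unique_id),
  color     text NOT NULL
);
CREATE TABLE item_size (            -- subtype: items that carry a size
  unique_id int PRIMARY KEY REFERENCES serial_number_item (unique_id),
  size_name text NOT NULL
);
-- Basic info such as title and stock now comes from a single table:
SELECT title, stock FROM serial_number_item WHERE unique_id = 10;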
A Bigger Redesign
You could completely change how Colors and Sizes are related to products.
First, your patterns of "has-a" are these:
Product (has nothing)
Product->Color
Product->Size
Product->Color->Size
There is a problem here. Clearly Product is the main item that has other things (colors and sizes) but colors don't have sizes! That is an arbitrary assignment. You may as well have said that Sizes have Colors--it doesn't make a difference. This reveals that your table design may not be best, as you're trying to model orthogonal data in a parent-child type of relationship. Really, products have a ColorAndSize.
Furthermore, when a product comes in colors and sizes, what does the uniqueid in the Color table mean? Can such a product be ordered without a size, having only a color? This design is assigning a unique ID to something that (it seems to me) should never be allowed to be ordered--but you can't find this information out from the Color table, you have to compare the Color and Size tables first. It is a problem.
I would design this as: Table Product. Table Size listing all distinct sizes possible for any product ever. Table Color listing all distinct colors possible for any product ever. And table OrderableProduct that has columns ProductId, ColorID, SizeID, and UniqueID (your bar code value). Additionally, each product must have one color and one size or it doesn't exist.
Basically, Color and Size are like X and Y coordinates into a grid; you are filling in the boxes that are allowable combinations. Which one is the row and which the column is irrelevant. Certainly, one is not a child of the other.
If there are any reasonable rules, in general, about what colors or sizes can be applied to various sub-groups of products, there might be utility in a ProductType table and a ProductTypeOrderables table that, when creating a new product, could populate the OrderableProduct table with the standard set—it could still be customized but might be easier to modify than to create anew. Or, it could define the range of colors and sizes that are allowable. You might need separate ProductTypeAllowedColor and ProductTypeAllowedSize tables. For example, if you are selling T-shirts, you'd want to allow XXXS, XXS, XS, S, M, L, XL, XXL, XXXL, and XXXXL, even if most products never use all those sizes. But for soft drinks, the sizes might be 6-pack 8oz, 24-pack 8oz, 2 liter, and so on, even if each soft drink is not offered in that size (and soft drinks don't have colors).
In this new scheme, you only have one table to query to find the correct orderable product. With proper indexes, it should be blazing fast.
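A rough sketch of that single-table lookup, assuming product, color and size have become plain lookup tables; all names here are illustrative:
CREATE TABLE orderable_product (
  unique_id  int PRIMARY KEY,                  -- the bar code value
  product_id int NOT NULL REFERENCES product (id),
  color_id   int NOT NULL REFERENCES color (id),
  size_id    int NOT NULL REFERENCES size (id),
  stock      int NOT NULL,
  UNIQUE (product_id, color_id, size_id)       -- one row per allowed combination
);
SELECT p.title, op.stock
FROM   orderable_product op
JOIN   product p ON p.id = op.product_id
WHERE  op.unique_id = 10;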
Your Question
You asked:
in PostgreSQL, so do you think if I use indexes on unique_id I will get satisfactory performance?
Any column or set of columns that you use to repeatedly look up data must have an index! Any other pattern will result in a full table scan each time, which means awful performance. I am sure that these indexes will make your queries lightning fast, as they will take only one leaf-level read per table.

There's an easier way to generate unique IDs using three separate auto_increment columns. Just prepend a letter to the ID to uniquify it:
Colors:
C0000001
C0000002
C0000003
Sizes:
S0000001
S0000002
S0000003
...
Products:
P0000001
P0000002
P0000003
...
A few advantages:
You don't need to serialize creation of ids across tables to ensure uniqueness. This will give better performance.
You don't actually need to store the letter in the table. All IDs in the same table start with the same letter, so you only need to store the number. This means that you can use an ordinary auto_increment column to generate your IDs.
If you have an ID you only need to check the first character to see which table it can be found in. You don't even need to make a query to the database if you just want to know whether it's a product ID or a size ID.
A disadvantage:
It's no longer a number. But you can get around that by using 1,2,3 instead of C,S,P.
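A hedged sketch in PostgreSQL syntax of how the prefix could be applied on output and interpreted on input; only the number is stored, and the padding width is just an example:
-- Present prefixed codes without storing the letter:
SELECT 'P' || lpad(id::text, 7, '0') AS code, title FROM product;
SELECT 'C' || lpad(id::text, 7, '0') AS code        FROM color;
SELECT 'S' || lpad(id::text, 7, '0') AS code        FROM size;
-- Given a scanned code such as 'C0000002', substr(code, 1, 1) identifies the
-- table and substr(code, 2)::int is the plain auto-increment value.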

Your query will be efficient enough, as long as you have an index on unique_id in every table and indexes on the joining columns.
You could turn those UNIONs into UNION ALL, but there won't be any noticeable difference in performance for this query.

This is a bit different. I don't understand the intended behaviour if stock exists in more than one of the {product, color, size} tables. (UNION will remove duplicates, but only for the row as a whole, e.g. the {product_id, stock} tuples; that makes no sense to me.) I just take the first non-null value. (Note the funky self-join!)
SELECT p.title
     , COALESCE (p2.stock, c.stock, s.stock) AS stock
FROM product p
LEFT JOIN product p2 ON p2.id = p.id AND p2.unique_id = 10
LEFT JOIN color c ON c.id_product = p.id AND c.unique_id = 10
LEFT JOIN size s ON s.id_product = p.id AND s.unique_id = 10
WHERE COALESCE (p2.stock, c.stock, s.stock) IS NOT NULL
;

Related

How to design a query whose WHERE clause applies to all columns that contain the same data value?

I have a table whose columns are:
Respondent_ID, classical, gospel, pop, kpop, country, folk, rock, metal ... (all genres of music)
There are 16 columns for different genres of music,
and the data values are Never, Rarely, Sometimes or Very frequently.
SELECT *
FROM genre_frequency
WHERE
I want to design a query which shows the results of all columns in the table that have the value 'Very frequently'. Can anyone lend me a hand here? I'm still new to this, please help...
Could put the same criteria under every genre field with OR operator - very messy. Or could use a VBA custom function.
Or could normalize data structure so you have fields: RespondentID, Genre, Frequency. A UNION query can rearrange data to this normalized structure (unpivot). There is a limit of 50 SELECT lines and there is no builder or wizard for UNION - must type or copy/paste in SQL View.
SELECT Respondent_ID, "classical" AS Genre, classical AS Frequency FROM genre_frequency
UNION SELECT Respondent_ID, "gospel", gospel FROM genre_frequency
... {continue for additional genre columns};
Now use that query like a table in subsequent queries. Just cannot edit data.
SELECT * FROM qryUNION WHERE Frequency="Very frequently";
UNION query can perform slowly with very large dataset. Probably would be best to redesign table. Could save this rearranged data to a table. If you want to utilize lookup tables for Genre and Frequency in order to save ID keys instead of full descriptive text, that can also be accommodated in redesign.
You should normalize your schema. This one has the problem that it requires you to alter the table whenever you want to add or remove a genre.
You must have at least three tables:
Table Respondent: Respondent_ID (PK), Name, etc.
Table Genre: Genre_ID (PK), Name
Table Respondent_Genre: Respondent_ID (PK, FK), Genre_ID (PK, FK), Frequency
This also easily allows you to alter the name of a genre or to add additional attributes to a genre like sub-genre or an annotation like (1930–present).
Optionally, you could also have a lookup table for Frequencies and then store the Frequency_ID in Respondent_Genre instead of the Frequency as text.
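A minimal sketch of that structure in generic SQL; the column types and sizes are assumptions:
CREATE TABLE Respondent (
  Respondent_ID INT PRIMARY KEY,
  Name          VARCHAR(100)
);
CREATE TABLE Genre (
  Genre_ID INT PRIMARY KEY,
  Name     VARCHAR(50)
);
CREATE TABLE Respondent_Genre (
  Respondent_ID INT,
  Genre_ID      INT,
  Frequency     VARCHAR(20),
  PRIMARY KEY (Respondent_ID, Genre_ID),
  FOREIGN KEY (Respondent_ID) REFERENCES Respondent (Respondent_ID),
  FOREIGN KEY (Genre_ID) REFERENCES Genre (Genre_ID)
);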
Then you can write a query like this
SELECT r.Name, g.Name, rg.Frequency
FROM
(Respondent r
INNER JOIN Respondent_Genre rg
ON r.Respondent_ID = rg.Respondent_ID)
INNER JOIN Genre g
ON rg.Genre_ID = g.Genre_ID
WHERE
rg.Frequency = 'Very Frequently'

Implementing LIMIT against rows of single table while using LEFT JOIN

Given a fairly stereotypical scenario with an item table referenced by an images table holding multiple images for the same item, I'm trying to figure out how to retrieve a specific number of items, while collecting all of the image rows.
The setup is trivial, and looks like:
CREATE TABLE items (
id INTEGER PRIMARY KEY,
...
...
)
CREATE TABLE images (
item_id INTEGER PRIMARY KEY,
url STRING,
...
...
)
LIMIT 20 clamps the maximum number of rows in the entire result set, and incidentally truncates mid-way through an item record.
I'm currently doing two queries, but besides being architecturally not at all ideal, it's practically proving quite awkward to coordinate (which makes a lot of sense since I'm definitely doing it wrong). I'm not finding any info on how to coordinate LIMIT with LEFT JOINs, so I thought I'd ask. Thanks!
NB. Similar questions to this do indeed exist, but are asking how to do things like retrieving eg the first (say) 5 images for each item. I'm looking to retrieve (say) 10 items and get all the images for each.
A query like:
SELECT * FROM items ORDER BY ? LIMIT 10
will return 10 rows (at most) from items.
You need to provide the column(s) for the ORDER BY clause if you have some specific sorting condition in mind.
If not, then remove the ORDER BY clause, but in that case nothing guarantees which rows you will get.
So all you have to do is LEFT join the above query to images:
SELECT it.*, im.*
FROM (SELECT * FROM items ORDER BY ? LIMIT 10) it
LEFT JOIN images im
ON im.item_id = it.id
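For example, assuming you want the 10 items with the highest id (the ordering column here is only an illustration; use whatever sorting you actually need):
SELECT it.*, im.*
FROM  (SELECT * FROM items ORDER BY id DESC LIMIT 10) it
LEFT JOIN images im
       ON im.item_id = it.id
ORDER BY it.id DESC;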

How do I write an SQL query whose where clause is contained in another query?

I have two tables, one for Customer and one for Item.
In Customer, I have a column called "preference", which stores a list of hard criteria expressed as a WHERE clause in SQL e.g. "item.price<20 and item.category='household'".
I'd like a query that works like this:
SELECT * FROM item WHERE interpret('SELECT preference FROM customer WHERE id = 1')
Which gets translated to this:
SELECT * FROM item WHERE item.price < 20 and item.category = 'household'
Example data model:
CREATE TABLE customer (
  cust_id INT,
  preference VARCHAR
);
CREATE TABLE item (
  item_id INT,
  price DECIMAL(19,4),
  category VARCHAR
);
-- Additional columns omitted for brevity
I've looked up casting and dynamic SQL but I haven't been able to figure out how I should do this.
I'm using PostgreSQL 9.5.1
I'm going to assume that preference is the same as my made-up item_id column. You may need to tweak it to fit your case. For future questions like this it may pay to give us the table structures you are working with; it really helps us out!
What you are asking for is a subquery:
select *
from item
where item_id in (select preference
                  from customer
                  where id = 1)
What I would suggest though is a join:
select item.*
from item
join customer on item.item_id = customer.preference
where item.price < 20 and
      item.category = 'household' and
      customer.id = 1
I decided to change my schema instead, as it was getting pretty messy to store the criteria in preferences in that manner.
I restricted the kinds of preferences that could be specified, then stored them as columns in Customer.
After that, all the queries I wanted could be expressed as joins.

one-to-one or one-to-many: one table with hundred columns or moving columns to an extra table

I am trying to find the best DB design for the following problem.
I have 20000 data sets which look like this:
1. id, name, color, width, xxxx ... 150 attributes
2. id, name, color, width, xxxx ... 150 attributes
3. ...
which means that I have 20000 entities and 150 attributes like color, width etc. for each one of them.
I need all these attributes, and maybe 15 are used more than the others. This is being used in a web application and it has to perform.
Solutions I thought about:
1. Normalized two-table approach (one-to-one):
id, name and a few "more important" attributes in one main table;
in another table (one-to-one relation): id and the other, less important attributes, each one of them in a different column.
2. Everything in one monster table:
id, name, color, width ...
3. Normalized two-table approach (one-to-many):
main table with: id and name;
in another table (one-to-many relation): id, attr_name, value.
I like [3] most, but I am not sure whether it is going to perform well if I need a lot of data, because every "id" has 150 values, and I would have to do things like
SELECT mt.id, mt.name, at.attr_name, at.value
FROM main_table mt
INNER JOIN attr_table at ON at.id = mt.id
AND at.attr_name IN ('width', 'color', 'a', 'b', 'c' .....)
AND at.id IN (1,3,9...)
ORDER BY 1
Having maybe 15-20 different values in "attr_name IN (...)" does not look optimal, and if I need 10-30 different datasets (which I usually do), it looks even less appealing.
The output of this would probably be 200-300 rows, and I would have to normalize this output in the code.
[2] is pretty dirty and simple, but I am not sure how it performs. And having 150 columns in one monster table also does not look optimal.
What I like about this approach is that I can do a lot of the work in SQL rather than later in code, like attr1 - attr2 ... (e.g. "max_width - width" or weight - max_weight/4).
[1] I don't like this one because it does not look clean to have "some" attributes in one table and all other attributes of the same type in another.
What is the best solution for this specific problem?
I found some similar, but not identical, questions:
Best to have hundreds of columns or split into multiple tables?
Is it better to have many columns, or many tables?
With as few rows as 20,000 I would have no doubt about going fully normalized. In my opinion, even with many more rows I would still do it instead of taking on the whole new set of problems and weaknesses that JSON brings.
The output of this would probably be 200-300 rows, and I would have to normalize this output in the code.
Create a view so you don't have to write the join in every query:
create view the_view as
select id, mt.name, at.attr_name, at.value
from main_table mt
inner join attr_table at using (id)
Filter when selecting
select id
from the_view
where id in (1, 3, 9)
  and ((attr_name = 'color' and "value" = 'red')
    or (attr_name = 'width' and "value" = '30'))
group by id
having count(distinct attr_name) = 2  -- must match both attribute/value pairs
150 columns suggest that the attribute list is not stable. If you create a table with 150 columns you will always be altering the table to add new columns. The normalized approach is flexible: you create attributes at will simply by adding a row to a table.
There should be 3 tables: main_table, main_table_attr_table and attr_table. The main_table_attr_table is an n-to-n connection between the other two.
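A rough sketch of those three tables; names and types are only illustrative:
create table main_table (
  id   int primary key,
  name text not null
);
create table attr_table (
  attr_id   int primary key,
  attr_name text not null unique
);
create table main_table_attr_table (
  id      int references main_table (id),
  attr_id int references attr_table (attr_id),
  value   text,
  primary key (id, attr_id)
);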

What is the best way to implement this SQL query?

I have a PRODUCTS table, and each product can have multiple attributes so I have an ATTRIBUTES table, and another table called ATTRIBPRODUCTS which sits in the middle. The attributes are grouped into classes (type, brand, material, colour, etc), so people might want a product of a particular type, from a certain brand.
PRODUCTS
product_id
product_name
ATTRIBUTES
attribute_id
attribute_name
attribute_class
ATTRIBPRODUCTS
attribute_id
product_id
When someone is looking for a product they can select one or many of the attributes. The problem I'm having is returning a single product that has multiple attributes. This should be really simple I know but SQL really isn't my thing and past a certain point I get a bit lost in the logic. The problem is I'm trying to check each attribute class separately so I want to end up with something like:
SELECT DISTINCT products.product_id
FROM attribproducts
INNER JOIN products ON attribproducts.product_id = products.product_id
WHERE (attribproducts.attribute_id IN (9,10,11)
AND attribproducts.attribute_id IN (60,61))
I've used IN to separate the blocks of attributes of different classes, so I end up with the products which are of certain types, but also of certain brands. From the results I've had it seems to be that AND between the IN statements that's causing the problem.
Can anyone help a little? I don't have the luxury of completely refactoring the database unfortunately, there is a lot more to it than this bit, so any suggestions how to work with what I have will be gratefully received.
Take a look at the answers to the question SQL: Many-To-Many table AND query. It's the exact same problem. Cletus gave 2 possible solutions there, neither of which is very trivial (but then again, there simply is no trivial solution).
SELECT DISTINCT p.product_id
FROM products p
INNER JOIN attribproducts ptype on p.product_id = ptype.product_id
INNER JOIN attribproducts pbrand on p.product_id = pbrand.product_id
WHERE ptype.attribute_id IN (9,10,11)
AND pbrand.attribute_id IN (60,61)
Try this:
select * from products p, attribproducts a1, attribproducts a2
where p.product_id = a1.product_id
and p.product_id = a2.product_id
and a1.attribute_id in (9,10,11)
and a2.attribute_id in (60,61);
This will return no rows because you're only counting rows that have a number that's (either 9, 10, 11) AND (either 60, 61).
Because those sets don't intersect, you'll get no rows.
If you use OR instead, it'll give products with attributes that are in the set 9, 10, 11, 60, 61, which isn't what you want either, although you'll then get multiple rows for each product.
You could use that select as a subquery in a GROUP BY statement, grouping by product and ordering the groups by the number of matched attributes. That will give you the closest matches first.
Alternatively (as another answer shows), you could join with a new copy of the table for each attribute set, giving you only those products that match all attribute sets.
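A hedged sketch of that GROUP BY idea, using the attribute IDs from the question: each HAVING condition requires at least one match from one attribute class, and the ORDER BY lists products matching the most attributes first.
SELECT ap.product_id
FROM attribproducts ap
WHERE ap.attribute_id IN (9,10,11,60,61)
GROUP BY ap.product_id
HAVING SUM(CASE WHEN ap.attribute_id IN (9,10,11) THEN 1 ELSE 0 END) > 0
   AND SUM(CASE WHEN ap.attribute_id IN (60,61)  THEN 1 ELSE 0 END) > 0
ORDER BY COUNT(DISTINCT ap.attribute_id) DESC;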
It sounds like you have a data schema that is GREAT for storage but terrible for selecting/reporting. When you have a data structure of OBJECT, ATTRIBUTE, OBJECT-ATTRIBUTE and OBJECT-ATTRIBUTE-VALUE you can store many objects with many different attributes per object. This is sometimes referred to as "Vertical Storage".
However, when you want to retrieve a list of objects with all of their attribute values, you have to make a variable number of joins. It is much easier to retrieve data when it is stored horizontally (defined columns of data).
I have run into this scenario several times. Since you cannot change the existing data structure, my suggestion would be to write a "layer" of tables on top. Dynamically create a table for each object/product you have. Then dynamically create static columns in those new tables for each attribute. Pretty much you need to "flatten" your vertically stored attributes/values into static columns. Convert from a vertical architecture into a horizontal one.
Use the "flattened" tables for reporting, and use the vertical tables for storage.
If you need sample code or more details, just ask me.
I hope this is clear. I have not had much coffee yet :)
Thanks,
- Mark
You can use multiple inner joins -- I think this would work:
select distinct p.product_id
from products p
inner join attribproducts a1 on a1.product_id=p.product_id
inner join attribproducts a2 on a2.product_id=p.product_id
where a1.attribute_id in (9,10,11)
and a2.attribute_id in (60,61)