Check if multiple values ALL EXIST in a table - sql

SQL, SQL Server 2016
I've got a table "Characteristics" (from a catalog) and a product that comes with a list of characteristics. I need to check whether every item of the list is contained in Characteristics.
The catalog is considered valid only if all items of the list are present in the table.
The list of characteristics is simply a table with:
ID CHARACTERISTIC
1 Blue
1 Yellow
1 Big
2 Pointy
...
For one item I can do a query like:
SELECT CatalogNumber FROM CHARACTERISTICS
WHERE EXISTS (SELECT * FROM CHARACTERISTICS WHERE Item = ID AND CHARACTERISTIC = 'Characteristic1')
AND EXISTS (SELECT * FROM CHARACTERISTICS WHERE Item = ID AND CHARACTERISTIC = 'Characteristic2')
But since the number of characteristics differs from item to item, this approach doesn't work.
Is there a way to check whether all characteristics are in the catalog without resorting to a cursor and a loop?
Thank you in advance
Wolfgang

Select id from Characteristics
group by id
having count(distinct Characteristic) = (select count(distinct Characteristic) from Characteristics);
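Below is a minimal sketch of the same "all of them must match" check, run against SQLite so that it is self-contained. It assumes two hypothetical tables: catalog (one row per characteristic the catalog offers) and product_list (the ID/CHARACTERISTIC list from the question). A product ID is valid when every one of its characteristics finds a match, meaning the LEFT JOIN produces no NULLs for that ID. It also assumes catalog.characteristic is unique, so the join cannot multiply rows. The SQL is plain enough that it should carry over to SQL Server 2016, but I have only sketched it in SQLite.

```python
import sqlite3

def valid_ids(conn):
    # COUNT(*) counts every listed characteristic; COUNT(c.characteristic)
    # counts only those that found a match in the catalog (NULLs are skipped).
    rows = conn.execute("""
        SELECT p.id
        FROM product_list AS p
        LEFT JOIN catalog AS c ON c.characteristic = p.characteristic
        GROUP BY p.id
        HAVING COUNT(c.characteristic) = COUNT(*)
        ORDER BY p.id
    """).fetchall()
    return [r[0] for r in rows]

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE catalog (characteristic TEXT PRIMARY KEY);
CREATE TABLE product_list (id INTEGER, characteristic TEXT);
INSERT INTO catalog VALUES ('Blue'), ('Yellow'), ('Big'), ('Round');
INSERT INTO product_list VALUES
  (1, 'Blue'), (1, 'Yellow'), (1, 'Big'),  -- all present in the catalog
  (2, 'Pointy');                           -- missing from the catalog
""")

print(valid_ids(conn))  # [1]
```

No cursor or loop is needed: the grouping does the per-ID comparison in one set-based query.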


SQL query on one to many relationship with multiple filters

I have some experience with SQL, but I still couldn't figure out how to make the following query perform efficiently.
I have 2 tables, Box and Item. Box has an id attribute, which is the primary key (plus some other columns), and Item has box_id, type and name. Each table has billions of records, and each box has on average 10 items.
I want to query all the boxes that have at least one item with a given type, and at least one item with a given name (could be the same item or different ones). The query should be paginated with a page size of 10.
I created single-column indexes on all Item attributes. The following query (for the first page) takes a very long time (more than a minute):
SELECT Box.id FROM Box WHERE
(EXISTS (SELECT 1 FROM Item WHERE Item.box_id = Box.id AND Item.type = 'my_type')) AND
(EXISTS (SELECT 1 FROM Item WHERE Item.box_id = Box.id AND Item.name = 'my_name'))
LIMIT 10
I think the problem is intersecting the sets of boxes filtered by each part of the query (querying with just one of the constraints returns about a million records). I am using Aurora PostgreSQL 9.6.6.
You haven't responded to the requests for clarification, so I will assume a few things:
You want ALL the boxes, not just 10 of them.
There's a typo when comparing by name. Should be: Item.name = 'my_name'
You said "I have indexed all Item attributes." I assume you have single-column indexes on all the columns of the Item table.
The column id of Box is the primary key, and therefore it already has an index on it.
Now, my take is that the indexes you are using are not optimal for this query, since each of them covers only a single column. If you don't already have them, please try creating the following indexes:
create index ix1 on Item (box_id, type);
create index ix2 on Item (box_id, name);
Yes, both of them. Try the query again and see how long it takes.
If it is still slow, please post the execution plan, using:
EXPLAIN ANALYZE
SELECT Box.id
FROM Box
WHERE
(EXISTS (SELECT 1 FROM Item WHERE Item.box_id = Box.id AND Item.type = 'my_type'))
AND
(EXISTS (SELECT 1 FROM Item WHERE Item.box_id = Box.id AND Item.name = 'my_name'))
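To see the two composite indexes in action without a billion-row table, here is a small sketch that uses Python's built-in sqlite3 module as a stand-in for PostgreSQL. The schema and rows are made up; the query is the one above, with the type and name bound as parameters. SQLite's EXPLAIN QUERY PLAN plays the role of EXPLAIN ANALYZE here. Its wording varies by SQLite version, and PostgreSQL's planner behaves differently anyway, so the plan is only printed, not checked.

```python
import sqlite3

# Hypothetical miniature of the Box/Item schema from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Box  (id INTEGER PRIMARY KEY);
CREATE TABLE Item (box_id INTEGER, type TEXT, name TEXT);
-- The two composite indexes suggested above: each EXISTS probe can be
-- answered by a seek on (box_id, type) or on (box_id, name).
CREATE INDEX ix1 ON Item (box_id, type);
CREATE INDEX ix2 ON Item (box_id, name);
INSERT INTO Box VALUES (1), (2), (3);
INSERT INTO Item VALUES
  (1, 'my_type', 'other'),   -- box 1: the type matches on one item
  (1, 'x',       'my_name'), -- and the name matches on another
  (2, 'my_type', 'other'),   -- box 2: only the type matches
  (3, 'my_type', 'my_name'); -- box 3: a single item matches both
""")

QUERY = """
SELECT Box.id FROM Box
WHERE EXISTS (SELECT 1 FROM Item WHERE Item.box_id = Box.id AND Item.type = ?)
  AND EXISTS (SELECT 1 FROM Item WHERE Item.box_id = Box.id AND Item.name = ?)
ORDER BY Box.id
LIMIT 10
"""

def matching_boxes(type_, name):
    return [row[0] for row in conn.execute(QUERY, (type_, name))]

print(matching_boxes("my_type", "my_name"))  # [1, 3]

# Print SQLite's plan for the same query (format is version-dependent).
for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY, ("my_type", "my_name")):
    print(row)
```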
INTERSECT is another option.
SELECT Box_id FROM Item
WHERE Item.type = 'my_type'
INTERSECT
SELECT Box_id FROM Item
WHERE Item.name = 'my_name'
Note: INTERSECT returns distinct values, so there's no need for an outer query to deduplicate the Box_id values that meet your criteria. However, this query can also return the box_id of orphan items (items whose box_id is not in the Box table), so an outer query might be required if such rows exist.
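A small sketch of the INTERSECT approach and of the orphan caveat, again using SQLite as a runnable stand-in with made-up rows. Box 9 below is a hypothetical orphan: it has matching items but no row in Box. The first function is the INTERSECT query as written. The second wraps it in an outer query against Box, which is one possible way to drop orphans.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Box  (id INTEGER PRIMARY KEY);
CREATE TABLE Item (box_id INTEGER, type TEXT, name TEXT);
INSERT INTO Box VALUES (1), (2);
INSERT INTO Item VALUES
  (1, 'my_type', 'a'), (1, 'b', 'my_name'),  -- box 1 qualifies
  (2, 'my_type', 'a'),                       -- box 2 lacks the name
  (9, 'my_type', 'my_name');                 -- orphan: box 9 is not in Box
""")

def intersect_ids():
    # INTERSECT already returns distinct box_id values.
    sql = """
        SELECT box_id FROM Item WHERE type = 'my_type'
        INTERSECT
        SELECT box_id FROM Item WHERE name = 'my_name'
        ORDER BY box_id
    """
    return [r[0] for r in conn.execute(sql)]

def intersect_ids_existing_boxes():
    # Outer query against Box filters out the box_id values of orphan items.
    sql = """
        SELECT id FROM Box
        WHERE id IN (
            SELECT box_id FROM Item WHERE type = 'my_type'
            INTERSECT
            SELECT box_id FROM Item WHERE name = 'my_name'
        )
        ORDER BY id
    """
    return [r[0] for r in conn.execute(sql)]

print(intersect_ids())                 # [1, 9]
print(intersect_ids_existing_boxes())  # [1]
```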
Something like this?
SELECT DISTINCT ON (Box.id) Box.*
FROM Box
JOIN Item I1 ON I1.box_id = Box.id AND I1.type = 'my_type'
JOIN Item I2 ON I2.box_id = Box.id AND I2.name = 'my_name'
ORDER BY Box.id;
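The JOINs filter the results by the items' type and name, and DISTINCT ON (Box.id) removes the duplicate rows produced when a box has more than one matching item.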
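Why the DISTINCT is needed: joining Item twice multiplies rows, one for each (type match, name match) pair within a box. The sketch below shows that fan-out on made-up data in SQLite. SQLite has no DISTINCT ON, so plain SELECT DISTINCT Box.id is used as the closest equivalent when only the id is selected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Box  (id INTEGER PRIMARY KEY);
CREATE TABLE Item (box_id INTEGER, type TEXT, name TEXT);
INSERT INTO Box VALUES (1);
INSERT INTO Item VALUES
  (1, 'my_type', 'a'), (1, 'my_type', 'b'),  -- two items of the type
  (1, 'c', 'my_name'), (1, 'd', 'my_name');  -- two items with the name
""")

JOINS = """
FROM Box
JOIN Item I1 ON I1.box_id = Box.id AND I1.type = 'my_type'
JOIN Item I2 ON I2.box_id = Box.id AND I2.name = 'my_name'
"""

raw = conn.execute("SELECT Box.id " + JOINS).fetchall()
deduped = conn.execute("SELECT DISTINCT Box.id " + JOINS + " ORDER BY Box.id").fetchall()

print(len(raw))  # 4  (2 type matches x 2 name matches, all for box 1)
print(deduped)   # [(1,)]
```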

How do I write an SQL query whose where clause is contained in another query?

I have two tables, one for Customer and one for Item.
In Customer, I have a column called "preference", which stores a list of hard criteria expressed as a WHERE clause in SQL e.g. "item.price<20 and item.category='household'".
I'd like a query that works like this:
SELECT * FROM item WHERE interpret('SELECT preference FROM customer WHERE id = 1')
Which gets translated to this:
SELECT * FROM item WHERE item.price < 20 and item.category = 'household'
Example data model:
CREATE TABLE customer (
cust_id INT,
preference VARCHAR
);
CREATE TABLE item (
item_id INT,
price DECIMAL(19,4),
category VARCHAR
);
-- Additional columns omitted for brevity
I've looked up casting and dynamic SQL but I haven't been able to figure out how I should do this.
I'm using PostgreSQL 9.5.1
I'm going to assume that preference is the same as my made-up item_id column. You may need to tweak it to fit your case. For future questions like this, it pays to give us the table structures you are working with; it really helps us out!
What you are asking for is a subquery:
select *
from item
where item_id in (select
preference
from
customer
where cust_id = 1)
What I would suggest, though, is a join:
select item.*
from item
join customer on item.item_id = customer.preference
where item.price < 20
and item.category = 'household'
and customer.cust_id = 1
I decided to change my schema instead, as it was getting pretty messy to store the criteria in preferences in that manner.
I restricted the kinds of preferences that could be specified, then stored them as columns in Customer.
After that, all the queries I wanted could be expressed as joins.
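A sketch of what that redesign might look like, using SQLite so it runs as-is. The column names max_price and preferred_category are hypothetical stand-ins for the restricted preferences; the original answer does not name them. Each former WHERE-clause fragment becomes an ordinary join condition, so no dynamic SQL is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    cust_id            INTEGER PRIMARY KEY,
    max_price          NUMERIC,  -- hypothetical: replaces "item.price < 20"
    preferred_category TEXT      -- hypothetical: replaces "item.category = 'household'"
);
CREATE TABLE item (
    item_id  INTEGER PRIMARY KEY,
    price    NUMERIC,
    category TEXT
);
INSERT INTO customer VALUES (1, 20, 'household');
INSERT INTO item VALUES
  (10, 15.00, 'household'),
  (11, 25.00, 'household'),
  (12,  5.00, 'garden');
""")

def items_for_customer(cust_id):
    # The stored preferences become join predicates against the customer row.
    sql = """
        SELECT item.item_id
        FROM item
        JOIN customer
          ON item.price < customer.max_price
         AND item.category = customer.preferred_category
        WHERE customer.cust_id = ?
        ORDER BY item.item_id
    """
    return [r[0] for r in conn.execute(sql, (cust_id,))]

print(items_for_customer(1))  # [10]
```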

SQL Selecting rows with same value of foreign key

Does anyone know how to select all rows from a table that share the same FK value, without specifying that value? I have a warehouse database. It has sectors, and each sector holds items of certain values. I want to select, with a single query, the sectors where the overall value of the items is bigger than a certain number. And I want the query to be universal: it should sum up the overall value of the items in every sector of the warehouse (without specifying sector numbers or how many sectors there are). Does anyone know how to do it? I don't need a full query, just a way to tell my database to sum up all values per sector. SectorID is the foreign key and Item is the table (with ItemID as the primary key and Value as the item's value).
I would make use of a combination of queries. Basically, this problem can be solved as below:
Assume both the Item and Sector tables have ID columns, and let T be the threshold value (the "certain number" from the question):
Use an inner query to select Sector.SectorID, Item.ItemID and Item.Value by joining the Sector and Item tables on Item.SectorID = Sector.SectorID.
Apply SUM(Item.Value) to the result of the inner query, GROUP BY SectorID, and keep only the sectors with HAVING SUM(Item.Value) > T.
You seem to want a group by query. This is pretty basic, so I assume you are pretty new to SQL:
select SectorId, sum(itemValue) as TotalItemValue
from warehouse w
group by SectorId
having sum(itemValue) > YOURVALUEHERE;
If you want the items in those sectors, you can get them with a join or with IN:
select *
from warehouse w
where SectorId in (select SectorId
from warehouse
group by SectorId
having sum(itemValue) > YOURVALUEHERE
)
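Both queries from the answer above can be tried on a toy warehouse table in SQLite. The ItemId column, the rows, and the threshold of 100 (standing in for YOURVALUEHERE) are made up for illustration. The GROUP BY/HAVING query returns the qualifying sectors with their totals, and the IN version returns the items that live in those sectors.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE warehouse (ItemId INTEGER PRIMARY KEY, SectorId INTEGER, itemValue INTEGER);
INSERT INTO warehouse VALUES
  (1, 1, 40), (2, 1, 70),  -- sector 1 totals 110
  (3, 2, 30), (4, 2, 20),  -- sector 2 totals 50
  (5, 3, 200);             -- sector 3 totals 200
""")

THRESHOLD = 100  # stands in for YOURVALUEHERE

# One row per sector whose summed item value exceeds the threshold.
totals = conn.execute("""
    SELECT SectorId, SUM(itemValue) AS TotalItemValue
    FROM warehouse
    GROUP BY SectorId
    HAVING SUM(itemValue) > ?
    ORDER BY SectorId
""", (THRESHOLD,)).fetchall()

# The individual items stored in those sectors.
items = conn.execute("""
    SELECT ItemId FROM warehouse
    WHERE SectorId IN (SELECT SectorId FROM warehouse
                       GROUP BY SectorId
                       HAVING SUM(itemValue) > ?)
    ORDER BY ItemId
""", (THRESHOLD,)).fetchall()

print(totals)  # [(1, 110), (3, 200)]
print(items)   # [(1,), (2,), (5,)]
```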

SQL query to search for a unique ID that can be in any of three different tables

I have three tables that control products, colors and sizes. Products may or may not have colors and sizes. Colors may or may not have sizes.
product      color                          size
-------      ----------------------------   ----------------------------
id           id                             id
unique_id    id_product (FK from product)   id_product (FK from version)
stock        unique_id                      id_version (FK from version)
title        stock                          unique_id
                                            stock
The unique_id column, which is present in all tables, is a serial (auto-increment) column whose counter is shared by the three tables; basically, it works as a global unique ID across them.
It works fine, but I am trying to improve query performance when I have to select some fields based on the unique_id.
As I don't know which table holds the unique_id I am looking for, I am using UNION, like below:
select title, stock
from product
where unique_id = 10
UNION
select p.title, c.stock
from color c
join product p on c.id_product = p.id
where c.unique_id = 10
UNION
select p.title, s.stock
from size s
join product p on s.id_product = p.id
where s.unique_id = 10;
Is there a better way to do this? Thanks for any suggestion!
EDIT 1
Based on @ErwinBrandstetter's and @ErikE's answers, I decided to use the query below. The main reasons are:
1) As unique_id is indexed in all tables, I will get good performance.
2) Using the unique_id I will find the product code, so I can get all the columns I need with another simple join.
SELECT
p.title,
ps.stock
FROM (
select id as id_product, stock
from product
where unique_id = 10
UNION
select id_product, stock
from color
where unique_id = 10
UNION
select id_product, stock
from size
where unique_id = 10
) AS ps
JOIN product p ON ps.id_product = p.id;
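A runnable sketch of this final query, using SQLite in place of PostgreSQL and made-up rows. The UNIQUE constraints give every unique_id column an index, as the reasoning above requires. The query text is the same except that the value 10 is bound as a named parameter reused in all three branches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (id INTEGER PRIMARY KEY, unique_id INTEGER UNIQUE, stock INTEGER, title TEXT);
CREATE TABLE color   (id INTEGER PRIMARY KEY, id_product INTEGER,
                      unique_id INTEGER UNIQUE, stock INTEGER);
CREATE TABLE size    (id INTEGER PRIMARY KEY, id_product INTEGER, id_version INTEGER,
                      unique_id INTEGER UNIQUE, stock INTEGER);
INSERT INTO product VALUES (1, 1, 5, 'T-shirt');
INSERT INTO color   VALUES (1, 1, 2, 3);     -- a color of product 1
INSERT INTO size    VALUES (1, 1, 1, 3, 7);  -- a size of that color
""")

def lookup(uid):
    # Find (id_product, stock) in whichever table owns uid, then join product once.
    sql = """
        SELECT p.title, ps.stock
        FROM (
            SELECT id AS id_product, stock FROM product WHERE unique_id = :uid
            UNION
            SELECT id_product, stock FROM color WHERE unique_id = :uid
            UNION
            SELECT id_product, stock FROM size  WHERE unique_id = :uid
        ) AS ps
        JOIN product p ON ps.id_product = p.id
    """
    return conn.execute(sql, {"uid": uid}).fetchall()

print(lookup(1))  # [('T-shirt', 5)]
print(lookup(3))  # [('T-shirt', 7)]
```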
PL/pgSQL function
To solve the problem at hand, a plpgsql function like the following should be faster:
CREATE OR REPLACE FUNCTION func(int)
RETURNS TABLE (title text, stock int) LANGUAGE plpgsql AS
$BODY$
BEGIN
RETURN QUERY
SELECT p.title, p.stock
FROM product p
WHERE p.unique_id = $1; -- Put the most likely table first.
IF NOT FOUND THEN
RETURN QUERY
SELECT p.title, c.stock
FROM color c
JOIN product p ON c.id_product = p.id
WHERE c.unique_id = $1;
END IF;
IF NOT FOUND THEN
RETURN QUERY
SELECT p.title, s.stock
FROM size s
JOIN product p ON s.id_product = p.id
WHERE s.unique_id = $1;
END IF;
END;
$BODY$;
Updated function with table-qualified column names to avoid naming conflicts with OUT parameters.
RETURNS TABLE requires PostgreSQL 8.4 and RETURN QUERY requires version 8.2; for older versions, you would have to substitute other constructs for both.
It goes without saying that you need to index the unique_id column of every involved table. id is indexed automatically, being the primary key.
Redesign
Ideally, you can tell which table an ID belongs to from the ID alone. You could keep using one common sequence, but add 100000000 for the first table, 200000000 for the second and 300000000 for the third, or whatever suits your needs. This way, the most significant part of the number tells you the table.
A plain integer spans numbers from -2147483648 to +2147483647; move to bigint if that's not enough for you. I would stick to integer IDs, though, if possible. They are smaller and faster than bigint or text.
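A minimal sketch of the offset idea, assuming the 100000000 step and the table order (product, color, size) given above. The helper names make_id and table_for_id are hypothetical. Integer division by the offset recovers the table, and as long as the shared sequence stays below the offset, 3 * 100000000 plus the sequence value remains well inside the plain integer range.

```python
# Hypothetical helpers for the offset scheme described above.
OFFSET = 100_000_000
PREFIX_BY_TABLE = {"product": 1, "color": 2, "size": 3}
TABLE_BY_PREFIX = {v: k for k, v in PREFIX_BY_TABLE.items()}

def make_id(table, seq):
    """Combine a value from the shared sequence with the table's offset."""
    if not 0 < seq < OFFSET:
        raise ValueError("sequence value no longer fits below the offset")
    return PREFIX_BY_TABLE[table] * OFFSET + seq

def table_for_id(uid):
    """Recover the table from the most significant part of the ID."""
    return TABLE_BY_PREFIX[uid // OFFSET]

uid = make_id("color", 42)
print(uid)                # 200000042
print(table_for_id(uid))  # color
```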
CTEs (experimental!)
If you cannot create a function for some reason, this pure SQL solution might do a similar trick:
WITH x(uid) AS (SELECT 10) -- provide unique_id here
, a AS (
SELECT title, stock
FROM x, product
WHERE unique_id = x.uid
)
, b AS (
SELECT p.title, c.stock
FROM x, color c
JOIN product p ON c.id_product = p.id
WHERE NOT EXISTS (SELECT 1 FROM a)
AND c.unique_id = x.uid
)
, c AS (
SELECT p.title, s.stock
FROM x, size s
JOIN product p ON s.id_product = p.id
WHERE NOT EXISTS (SELECT 1 FROM a)
AND NOT EXISTS (SELECT 1 FROM b)
AND s.unique_id = x.uid
)
SELECT * FROM a
UNION ALL
SELECT * FROM b
UNION ALL
SELECT * FROM c;
I am not sure whether it avoids additional scans as I hope; that would have to be tested. This query requires at least PostgreSQL 8.4.
Upgrade!
As I just learned, the OP runs on PostgreSQL 8.1.
Upgrading alone would speed up the operation a lot.
Query for PostgreSQL 8.1
As you are limited in your options and a plpgsql function is not possible, this query should perform better than the one you have. Test it with EXPLAIN ANALYZE, which is available in v8.1.
SELECT title, stock
FROM product
WHERE unique_id = 10
UNION ALL
SELECT p.title, ps.stock
FROM product p
JOIN (
SELECT id_product, stock
FROM color
WHERE unique_id = 10
UNION ALL
SELECT id_product, stock
FROM size
WHERE unique_id = 10
) ps ON ps.id_product = p.id;
I think it's time for a redesign.
You have things that you're using as bar codes for items that are basically all the same in one respect (they are SerialNumberItems), but have been split into multiple tables because they are different in other respects.
I have several ideas for you:
Change the Defaults
Just make each product required to have one color "no color" and one size "no size". Then you can query any table you want to find the info you need.
SuperType/SubType
Without too much modification you could use the supertype/subtype database design pattern.
In it, there is a parent table where all the distinct detail-level identifiers live, and the shared columns of the subtype tables go in the supertype table (the ways that all the items are the same). There is one subtype table for each different way that the items are distinct. If mutual exclusivity of the subtype is required (you can have a Color or a Size but not both), then the parent table is given a TypeID column and the subtype tables have an FK to both the ParentID and the TypeID. Looking at your design, in fact you would not use mutual exclusivity.
If you use the pattern of a supertype table, you do have the issue of having to insert in two parts, first to the supertype, then the subtype. Deleting also requires deleting in reverse order. But you get a great benefit of being able to get basic information such as Title and Stock out of the supertype table with a single query.
You could even create schema-bound views for each subtype, with instead-of triggers that convert inserts, updates, and deletes into operations on the base table + child table.
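A small sketch of the supertype/subtype pattern, run in SQLite for convenience. All table and column names are hypothetical: SerialNumberItem echoes the answer's wording, and the color and size subtypes hold only what differs. It shows the two-step insert (supertype first, then subtype) and the single-table lookup of Title and Stock. The views with instead-of triggers mentioned above are not included.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Supertype: one row per orderable thing, holding the shared columns.
CREATE TABLE SerialNumberItem (
    UniqueID INTEGER PRIMARY KEY,
    Title    TEXT NOT NULL,
    Stock    INTEGER NOT NULL
);
-- Subtypes: only the columns that differ, keyed by the supertype's ID.
CREATE TABLE ColorItem (
    UniqueID  INTEGER PRIMARY KEY REFERENCES SerialNumberItem (UniqueID),
    ColorName TEXT NOT NULL
);
CREATE TABLE SizeItem (
    UniqueID INTEGER PRIMARY KEY REFERENCES SerialNumberItem (UniqueID),
    SizeName TEXT NOT NULL
);
""")

# Inserting is a two-step operation inside one transaction: supertype, then subtype.
with conn:
    cur = conn.execute("INSERT INTO SerialNumberItem (Title, Stock) VALUES (?, ?)",
                       ("T-shirt red", 3))
    conn.execute("INSERT INTO ColorItem VALUES (?, ?)", (cur.lastrowid, "red"))

def basic_info(uid):
    # Title and Stock come from the supertype alone: a single lookup.
    return conn.execute("SELECT Title, Stock FROM SerialNumberItem WHERE UniqueID = ?",
                        (uid,)).fetchone()

print(basic_info(1))  # ('T-shirt red', 3)
```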
A Bigger Redesign
You could completely change how Colors and Sizes are related to products.
First, your patterns of "has-a" are these:
Product (has nothing)
Product->Color
Product->Size
Product->Color->Size
There is a problem here. Clearly Product is the main item that has other things (colors and sizes) but colors don't have sizes! That is an arbitrary assignment. You may as well have said that Sizes have Colors--it doesn't make a difference. This reveals that your table design may not be best, as you're trying to model orthogonal data in a parent-child type of relationship. Really, products have a ColorAndSize.
Furthermore, when a product comes in colors and sizes, what does the unique_id in the Color table mean? Can such a product be ordered without a size, having only a color? This design is assigning a unique ID to something that (it seems to me) should never be allowed to be ordered--but you can't find this information out from the Color table, you have to compare the Color and Size tables first. It is a problem.
I would design this as: Table Product. Table Size listing all distinct sizes possible for any product ever. Table Color listing all distinct colors possible for any product ever. And table OrderableProduct that has columns ProductId, ColorID, SizeID, and UniqueID (your bar code value). Additionally, each product must have one color and one size or it doesn't exist.
Basically, Color and Size are like X and Y coordinates into a grid; you are filling in the boxes that are allowable combinations. Which one is the row and which the column is irrelevant. Certainly, one is not a child of the other.
If there are any reasonable rules, in general, about what colors or sizes can be applied to various sub-groups of products, there might be utility in a ProductType table and a ProductTypeOrderables table that, when creating a new product, could populate the OrderableProduct table with the standard set—it could still be customized but might be easier to modify than to create anew. Or, it could define the range of colors and sizes that are allowable. You might need separate ProductTypeAllowedColor and ProductTypeAllowedSize tables. For example, if you are selling T-shirts, you'd want to allow XXXS, XXS, XS, S, M, L, XL, XXL, XXXL, and XXXXL, even if most products never use all those sizes. But for soft drinks, the sizes might be 6-pack 8oz, 24-pack 8oz, 2 liter, and so on, even if each soft drink is not offered in that size (and soft drinks don't have colors).
In this new scheme, you only have one table to query to find the correct orderable product. With proper indexes, it should be blazing fast.
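A sketch of the OrderableProduct design in SQLite, with made-up rows. The answer lists ProductId, ColorID, SizeID and UniqueID; the Stock column and the "no color" default row are illustrative additions. The latter borrows the "Change the Defaults" idea so that every product has exactly one color and one size. A bar code lookup touches only OrderableProduct, and the rest is plain joins on primary keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (ProductID INTEGER PRIMARY KEY, Title TEXT NOT NULL);
CREATE TABLE Color   (ColorID   INTEGER PRIMARY KEY, Name  TEXT NOT NULL);
CREATE TABLE Size    (SizeID    INTEGER PRIMARY KEY, Name  TEXT NOT NULL);
-- One row per allowed (product, color, size) cell of the grid.
CREATE TABLE OrderableProduct (
    UniqueID  INTEGER PRIMARY KEY,  -- the bar code value
    ProductID INTEGER NOT NULL REFERENCES Product (ProductID),
    ColorID   INTEGER NOT NULL REFERENCES Color (ColorID),
    SizeID    INTEGER NOT NULL REFERENCES Size (SizeID),
    Stock     INTEGER NOT NULL,     -- illustrative addition
    UNIQUE (ProductID, ColorID, SizeID)
);
INSERT INTO Product VALUES (1, 'T-shirt'), (2, 'Soft drink');
INSERT INTO Color   VALUES (1, 'no color'), (2, 'red');
INSERT INTO Size    VALUES (1, 'M'), (2, '2 liter');
INSERT INTO OrderableProduct VALUES
  (1001, 1, 2, 1, 4),  -- red T-shirt, size M
  (1002, 2, 1, 2, 9);  -- soft drink, no color, 2 liter
""")

def find(unique_id):
    # Only OrderableProduct is searched by bar code; the joins are by primary key.
    return conn.execute("""
        SELECT p.Title, c.Name, s.Name, o.Stock
        FROM OrderableProduct o
        JOIN Product p ON p.ProductID = o.ProductID
        JOIN Color   c ON c.ColorID   = o.ColorID
        JOIN Size    s ON s.SizeID    = o.SizeID
        WHERE o.UniqueID = ?
    """, (unique_id,)).fetchone()

print(find(1002))  # ('Soft drink', 'no color', '2 liter', 9)
```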
Your Question
You asked:
in PostgreSQL, so do you think if i use indexes on unique_id i will get a satisfactory performance?
Any column or set of columns that you use to repeatedly look up data must have an index! Any other pattern will result in a full table scan each time, which will be awful performance. I am sure that these indexes will make your queries lightning fast as it will take only one leaf-level read per table.
There's an easier way to generate unique IDs using three separate auto_increment columns. Just prepend a letter to the ID to uniquify it:
Colors:
C0000001
C0000002
C0000003
Sizes:
S0000001
S0000002
S0000003
...
Products:
P0000001
P0000002
P0000003
...
A few advantages:
You don't need to serialize creation of ids across tables to ensure uniqueness. This will give better performance.
You don't actually need to store the letter in the table. All IDs in the same table start with the same letter, so you only need to store the number. This means that you can use an ordinary auto_increment column to generate your IDs.
If you have an ID you only need to check the first character to see which table it can be found in. You don't even need to make a query to the database if you just want to know whether it's a product ID or a size ID.
A disadvantage:
It's no longer a number. But you can get around that by using 1,2,3 instead of C,S,P.
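A minimal sketch of the letter-prefix scheme, assuming the C/S/P letters and the 7-digit width shown above. The helpers format_id and parse_id are hypothetical. Each table stores only its own auto_increment number; the letter is added on output, and reading the first character is enough to pick the table without touching the database.

```python
# Hypothetical helpers: the table stores only the number; the letter is
# derived from which table the row lives in.
PREFIX_BY_TABLE = {"color": "C", "size": "S", "product": "P"}
TABLE_BY_PREFIX = {v: k for k, v in PREFIX_BY_TABLE.items()}

def format_id(table, number, width=7):
    """Render a stored auto_increment number as a prefixed ID."""
    return f"{PREFIX_BY_TABLE[table]}{number:0{width}d}"

def parse_id(code):
    """Return (table, number); only the first character decides the table."""
    return TABLE_BY_PREFIX[code[0]], int(code[1:])

print(format_id("size", 2))  # S0000002
print(parse_id("C0000003"))  # ('color', 3)
```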
Your query will be reasonably efficient as long as you have an index on unique_id in every table and indexes on the join columns.
You could turn those UNIONs into UNION ALL, but there won't be any difference in performance for this query.
This is a bit different. I don't understand the intended behaviour if stock exists in more than one of the {product, color, zsize} tables. (UNION will remove duplicates, but only for the row as a whole, e.g. the {product_id, stock} tuples.) That makes no sense to me, so I just take the first one. (Note the funky self-join!)
SELECT p.title
, COALESCE (p2.stock, c.stock, s.stock) AS stock
FROM product p
LEFT JOIN product p2 on p2.id = p.id AND p2.unique_id = 10
LEFT JOIN color c on c.id_product = p.id AND c.unique_id = 10
LEFT JOIN zsize s on s.id_product = p.id AND s.unique_id = 10
WHERE COALESCE (p2.stock, c.stock, s.stock) IS NOT NULL
;
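The COALESCE query above can be exercised in SQLite with made-up rows. The tables are simplified (zsize's id_version column is left out), and the value 10 becomes a named parameter. Each LEFT JOIN only matches in the table that owns the unique_id, and COALESCE picks the first non-NULL stock in product, color, zsize order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (id INTEGER PRIMARY KEY, unique_id INTEGER, stock INTEGER, title TEXT);
CREATE TABLE color   (id INTEGER PRIMARY KEY, id_product INTEGER, unique_id INTEGER, stock INTEGER);
CREATE TABLE zsize   (id INTEGER PRIMARY KEY, id_product INTEGER, unique_id INTEGER, stock INTEGER);
INSERT INTO product VALUES (1, 1, 5, 'T-shirt'), (2, 4, 8, 'Mug');
INSERT INTO color   VALUES (1, 1, 2, 3);
INSERT INTO zsize   VALUES (1, 1, 3, 7);
""")

SQL = """
SELECT p.title
     , COALESCE(p2.stock, c.stock, s.stock) AS stock
FROM product p
LEFT JOIN product p2 ON p2.id = p.id AND p2.unique_id = :uid          -- the self-join
LEFT JOIN color   c  ON c.id_product = p.id AND c.unique_id = :uid
LEFT JOIN zsize   s  ON s.id_product = p.id AND s.unique_id = :uid
WHERE COALESCE(p2.stock, c.stock, s.stock) IS NOT NULL
"""

def lookup(uid):
    return conn.execute(SQL, {"uid": uid}).fetchall()

print(lookup(2))  # [('T-shirt', 3)]
print(lookup(4))  # [('Mug', 8)]
```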

Distinct Values in SQL Query - Advanced

I have searched high and low and have tried for hours to manipulate the various other queries that seemed to fit but I've had no joy.
I have several tables in Microsoft SQL Server 2005 that I'm trying to join, for example:
Company Table (Comp_CompanyId, Comp_Name)
GroupCode_Link Table (gcl_c_groupcodelinkid, gcl_c_groupcodeid, gcl_c_companyid)
GroupCode Table (grp_c_groupcodeid, grp_c_groupcode, grp_c_name)
ItemCode Table (itm_c_itemcodeid, itm_c_name, itm_c_itemcode, itm_c_group)
ItemCode_Link Table (icl_c_itemcodelinkid, icl_c_companyid, icl_c_groupcodeid, icl_c_itemcodeid)
I'm using Link Tables to associate a Group to a Company, and an Item to a Group, so a Company could have multiple groups, with multiple items in each group.
Now, I'm trying to create an Advanced Find Function that will allow a user to enter, for example, an Item Code and the result should display those companies that have that item, sounds nice and simple!
However, I haven't done something right: if I use a query like 'if the company has this item OR this item, display its name', I get the company appearing twice in the result set, once for each item.
What I need to be able to say is:
"Show me a list of companies that have these items (displaying each company only once!)"
I've had a go at using COUNT, DISTINCT and HAVING but have failed on each as my query knowledge isn't up to it!
First, from your description it sounds like you might have a problem with your E-R (entity-relationship) model. Your description tells me that your E-R model looks something like this:
Associative entities (CompanyGroup, GroupItem) exist to implement many-to-many relationships (since many-to-many isn't supported directly by relational databases).
There's nothing wrong with that if a group can exist within multiple companies or an item across multiple groups. It would seem more likely that, at least, each group is specific to a company (I can see items existing across multiple companies and/or groups: more than one company retails, for instance, Cuisinart food processors). If that is the case, a better E-R model would be to make each group a dependent entity with a CompanyID that is a component of its primary key. It's a dependent entity because the group doesn't have an independent existence: it's created by/on behalf of and exists for its parent company. If the company goes away, the group(s) tied to it go away. Now your E-R model looks like this:
From that, we can write the query you need:
select *
from Company c
where exists ( select *
from GroupItem gi
where gi.ItemID in ( desired-itemid-1 , ... , desired-itemid-n )
and gi.CompanyID = c.CompanyID
)
As you can see, dependent entities are a powerful thing. Because of the key propagation, queries tend to get simpler. With the original data model, the query would be somewhat more complex:
select *
from Company c
where exists ( select *
from CompanyGroup cg
join GroupItem gi on gi.GroupId = cg.GroupID
where gi.ItemID in ( desired-itemid-1 , ... , desired-itemid-n )
and cg.CompanyID = c.CompanyID
)
Cheers!
SELECT *
FROM company c
WHERE (
SELECT COUNT(DISTINCT icl_c_itemcodeid)
FROM GroupCode_Link gl
JOIN ItemCode_Link il
ON il.icl_c_groupcodeid = gcl_c_groupcodeid
WHERE gl.gcl_c_companyid = c.Comp_CompanyId
AND icl_c_companyid = c.Comp_CompanyId
AND icl_c_itemcodeid IN (@Item1, @Item2)
) >= 2
Replace >= 2 with >= 1 if you want "any item" instead of "all items".
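A sketch of the COUNT(DISTINCT ...) idea on made-up data, using SQLite so it runs standalone. For brevity it relies on the icl_c_companyid column of ItemCode_Link and skips the GroupCode_Link join, which is a simplification, not the query above verbatim. The helper companies_with is hypothetical. Acme has item 100 linked through two groups, yet it appears only once, because each company row is evaluated once and only distinct item IDs are counted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Company (Comp_CompanyId INTEGER PRIMARY KEY, Comp_Name TEXT);
CREATE TABLE ItemCode_Link (
    icl_c_itemcodelinkid INTEGER PRIMARY KEY,
    icl_c_companyid      INTEGER,
    icl_c_groupcodeid    INTEGER,
    icl_c_itemcodeid     INTEGER
);
INSERT INTO Company VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO ItemCode_Link VALUES
  (1, 1, 10, 100), (2, 1, 11, 100),  -- Acme has item 100 in two groups
  (3, 1, 10, 200),                   -- and item 200
  (4, 2, 20, 100);                   -- Globex has only item 100
""")

def companies_with(item_ids, require_all=True):
    # "All items" needs every distinct requested ID; "any item" needs at least one.
    placeholders = ", ".join("?" for _ in item_ids)
    needed = len(set(item_ids)) if require_all else 1
    sql = f"""
        SELECT c.Comp_Name
        FROM Company c
        WHERE (SELECT COUNT(DISTINCT il.icl_c_itemcodeid)
               FROM ItemCode_Link il
               WHERE il.icl_c_companyid = c.Comp_CompanyId
                 AND il.icl_c_itemcodeid IN ({placeholders})) >= ?
        ORDER BY c.Comp_Name
    """
    return [r[0] for r in conn.execute(sql, [*item_ids, needed])]

print(companies_with([100, 200]))                     # ['Acme']
print(companies_with([100, 200], require_all=False))  # ['Acme', 'Globex']
```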
If you need to show companies that have item1 AND item2, you can use Quassnoi's answer.
If you need to show companies that have item1 OR item2, then you can use this:
SELECT
*
FROM
company
WHERE EXISTS
(
SELECT
icl_c_itemcodeid
FROM
GroupCode_Link
INNER JOIN
ItemCode_Link
ON icl_c_groupcodeid = gcl_c_groupcodeid
AND icl_c_itemcodeid IN (@item1, @item2)
WHERE
gcl_c_companyid = company.Comp_CompanyId
AND
icl_c_companyid = company.Comp_CompanyId
)
I would write something like the code below:
SELECT
c.Comp_Name
FROM
Company AS c
WHERE
EXISTS (
SELECT
1
FROM
GroupCode_Link AS gcl
JOIN
ItemCode_Link AS icl
ON
gcl.gcl_c_groupcodeid = icl.icl_c_groupcodeid
JOIN
ItemCode AS itm
ON
icl.icl_c_itemcodeid = itm.itm_c_itemcodeid
WHERE
c.Comp_CompanyId = gcl.gcl_c_companyid
AND
itm.itm_c_itemcode IN (...) /* here provide list of one or more Item Codes to look for */
);
but I see there's an icl_c_companyid column in ItemCode_Link, so perhaps joining the GroupCode_Link table is not necessary?