SQL query on one to many relationship with multiple filters - sql

I have some experience with SQL, but I still couldn't figure out how to make the following query perform efficiently.
I have 2 tables - Box and Item. Box has id attribute which is the primary key (and some more), and Item has box_id, type, name. Each table has billions of records, each box has on average 10 items.
I want to query all the boxes that have at least one item with a given type, and at least one item with a given name (could be the same item or different). The query should be paginated with page size of 10.
I created single-column indexes on all Item attributes. The following query (for the first page) takes a very long time (more than a minute):
SELECT Box.id FROM Box WHERE
(EXISTS (SELECT 1 FROM Item WHERE Item.box_id = Box.id AND Item.type = 'my_type')) AND
(EXISTS (SELECT 1 FROM Item WHERE Item.box_id = Box.id AND Item.name = 'my_name'))
LIMIT 10
I think the problem is computing the intersection between the boxes matched by each part of the query (querying with just one of the constraints returns about a million records). I am using Aurora PostgreSQL 9.6.6.

You haven't responded to the clarifications so I will assume a few things:
You want ALL the boxes, not just 10 of them.
There's a typo when comparing by name. Should be: Item.name = 'my_name'
You said "I have indexed all Item attributes." I would assume you have single column indexes for all the columns of the Item table.
The column id of Box is the primary key, and therefore it already has an index on it.
Now, my take is the indexes you are using are not optimal for this query since they only include columns separately. If you don't already have them, please try creating the following indexes:
create index ix1 on Item (box_id, type);
create index ix2 on Item (box_id, name);
Yes, both of them. Try the query again and see how long it takes.
If still slow, please post the explain plan, using:
EXPLAIN ANALYZE
SELECT Box.id
FROM Box
WHERE
(EXISTS (SELECT 1 FROM Item WHERE Item.box_id = Box.id AND Item.type = 'my_type'))
AND
(EXISTS (SELECT 1 FROM Item WHERE Item.box_id = Box.id AND Item.name = 'my_name'))

INTERSECT is another option.
SELECT Box_id FROM Item
WHERE Item.type = 'my_type'
INTERSECT
SELECT Box_id FROM Item
WHERE Item.name = 'my_name'
Note: INTERSECT returns distinct values so no need for an outer query to get the list of distinct Box_id values that meet your criteria. This query does return orphan items (items with a box_id not in the box table) so an outer query might be required if this is the case.
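The difference between the EXISTS form and the INTERSECT form, including the orphan case mentioned in the note, can be checked on toy data. The following is only a minimal sketch: it uses Python's sqlite3 module with an in-memory SQLite database as a stand-in for Aurora PostgreSQL, the sample rows are made up, and it says nothing about performance on billions of rows.

```python
import sqlite3

# In-memory SQLite database as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Box (id INTEGER PRIMARY KEY);
CREATE TABLE Item (box_id INTEGER, type TEXT, name TEXT);
INSERT INTO Box (id) VALUES (1), (2), (3);
INSERT INTO Item (box_id, type, name) VALUES
    (1, 'my_type', 'other_name'),
    (1, 'other_type', 'my_name'),
    (2, 'my_type', 'my_name'),
    (3, 'my_type', 'other_name'),
    (99, 'my_type', 'my_name');
""")
# Box 99 does not exist, so the last Item row is an orphan.


def ids(sql):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(sql)]


# The EXISTS form from the question (ordered so the result is deterministic).
exists_ids = ids("""
SELECT Box.id FROM Box
WHERE EXISTS (SELECT 1 FROM Item WHERE Item.box_id = Box.id AND Item.type = 'my_type')
  AND EXISTS (SELECT 1 FROM Item WHERE Item.box_id = Box.id AND Item.name = 'my_name')
ORDER BY Box.id
""")

# The plain INTERSECT form: it never looks at Box, so the orphan shows up.
intersect_ids = ids("""
SELECT box_id FROM Item WHERE type = 'my_type'
INTERSECT
SELECT box_id FROM Item WHERE name = 'my_name'
ORDER BY 1
""")

# INTERSECT wrapped in an outer query against Box to drop orphans.
filtered_ids = ids("""
SELECT id FROM Box
WHERE id IN (
    SELECT box_id FROM Item WHERE type = 'my_type'
    INTERSECT
    SELECT box_id FROM Item WHERE name = 'my_name'
)
ORDER BY id
""")

print(exists_ids, intersect_ids, filtered_ids)
```

On this data the EXISTS form and the wrapped INTERSECT form agree, while the bare INTERSECT additionally returns the orphan box_id 99.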

Something like this?
SELECT DISTINCT ON (Box.id) Box.*
FROM Box
JOIN Item I1 ON I1.box_id = Box.id AND I1.type = 'my_type'
JOIN Item I2 ON I2.box_id = Box.id AND I2.name = 'my_name'
ORDER BY Box.id;
The JOINs filter the results by the item's type and name.

Related

Check if multiple values ALL EXIST in a table

SQL, SQL Server 2016
I've got a table "Characteristics" (from a catalog) and a product that comes with a list of characteristics. I need to check whether every item of the list is contained in Characteristics.
Only if all items of the list are present in the table, the catalog is considered valid.
The List of characteristics is simply a table with
ID CHARACTERISTIC
1 Blue
1 Yellow
1 Big
2 Pointy
...
For one item I can do a query like:
SELECT CatalogNumber FROM CHARACTERISTICS
WHERE EXISTS (SELECT * FROM CHARACTERISTICS WHERE Item = ID AND CHARACTERISTIC = 'Characteristic1')
AND EXISTS (SELECT * FROM CHARACTERISTICS WHERE Item = ID AND CHARACTERISTIC = 'Characteristic2')
But since the number of characteristics for each item in the list is different for each item, this approach doesn't work.
Is there a way to check, if all characteristics are in the catalog without resorting to a cursor and a loop?
Thank you in advance
Wolfgang
Select id from Characteristics
group by id
having count(*) = (select count(distinct Characteristic) from Characteristics);
DBfiddle demo
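Another way to read the question is as a relational division: an ID is valid only if none of its listed characteristics is missing from the catalog. The sketch below is one possible reading, not the poster's schema. The table names ProductCharacteristics and Catalog are hypothetical, the sample rows are made up, and SQLite (via Python's sqlite3) stands in for SQL Server 2016.

```python
import sqlite3

# In-memory SQLite database as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical table holding the per-product list of characteristics.
CREATE TABLE ProductCharacteristics (ID INTEGER, CHARACTERISTIC TEXT);
-- Hypothetical table holding the characteristics known to the catalog.
CREATE TABLE Catalog (CHARACTERISTIC TEXT);
INSERT INTO ProductCharacteristics (ID, CHARACTERISTIC) VALUES
    (1, 'Blue'), (1, 'Yellow'), (1, 'Big'), (2, 'Pointy');
INSERT INTO Catalog (CHARACTERISTIC) VALUES ('Blue'), ('Yellow'), ('Big');
""")

# COUNT(*) counts every listed characteristic; COUNT(c.CHARACTERISTIC)
# counts only those that found a match in Catalog. The two are equal
# exactly when nothing is missing, so no cursor or loop is needed.
valid_ids = [row[0] for row in conn.execute("""
SELECT p.ID
FROM ProductCharacteristics AS p
LEFT JOIN Catalog AS c ON c.CHARACTERISTIC = p.CHARACTERISTIC
GROUP BY p.ID
HAVING COUNT(*) = COUNT(c.CHARACTERISTIC)
ORDER BY p.ID
""")]

print(valid_ids)
```

Here ID 1 passes because Blue, Yellow and Big are all in the catalog, and ID 2 fails because Pointy is not.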

An Exclude Query with Link tables

I have two tables
**Item**
ID
...
**ShapeItem**
ID
ItemID
ShapeID
RegionID
(There are other columns and tables but they are not relevant to this question)
I can return all items that have a specific shape id in a specific region id successfully using an INNER JOIN
SELECT ID FROM Item INNER JOIN ShapeItem ON Item.ID = ShapeItem.ItemID WHERE ShapeID = 2
However, I want to reverse this logic and return all items that DON'T have a specific shape, so my first thought was to do
SELECT ID FROM Item INNER JOIN ShapeItem ON Item.ID = ShapeItem.ItemID WHERE ShapeID <> 2
However, this did not produce the required result. It returned all items that had a shape other than the specific shape, but it did not account for items that have no shapes at all.
My next thought was to use a LEFT JOIN but this returned every item with null values, (over 400,000)
I am currently stuck on this, can you suggest a way forward for me please?
Summary
I want to return all items that don't have a specific shape, including those items that are not referenced at all in the ShapeItem table.
SQL SERVER COMPACT 4.0
C#
Visual Studio 2012
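One common way to express "items that don't have a specific shape" is an anti-join, written either with NOT EXISTS or with a LEFT JOIN that keeps only the unmatched rows. The sketch below is an illustration only: it runs on SQLite through Python's sqlite3 rather than on SQL Server Compact 4.0 (where it has not been tested), and the sample rows are made up.

```python
import sqlite3

# In-memory SQLite database as a stand-in for SQL Server Compact (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Item (ID INTEGER PRIMARY KEY);
CREATE TABLE ShapeItem (ID INTEGER PRIMARY KEY, ItemID INTEGER,
                        ShapeID INTEGER, RegionID INTEGER);
INSERT INTO Item (ID) VALUES (1), (2), (3), (4);
INSERT INTO ShapeItem (ID, ItemID, ShapeID, RegionID) VALUES
    (1, 1, 2, 1),   -- item 1 has shape 2 ...
    (2, 1, 3, 1),   -- ... and also shape 3
    (3, 2, 3, 1),   -- item 2 has only shape 3
    (4, 3, 2, 2);   -- item 3 has only shape 2
""")
# Item 4 is not referenced in ShapeItem at all.

# Variant 1: NOT EXISTS. Keep an item when no ShapeItem row links it to shape 2.
not_exists_ids = [row[0] for row in conn.execute("""
SELECT Item.ID
FROM Item
WHERE NOT EXISTS (SELECT 1 FROM ShapeItem
                  WHERE ShapeItem.ItemID = Item.ID AND ShapeItem.ShapeID = 2)
ORDER BY Item.ID
""")]

# Variant 2: LEFT JOIN restricted to shape 2 in the ON clause, then keep
# only the items for which the join found nothing.
left_join_ids = [row[0] for row in conn.execute("""
SELECT Item.ID
FROM Item
LEFT JOIN ShapeItem
       ON ShapeItem.ItemID = Item.ID AND ShapeItem.ShapeID = 2
WHERE ShapeItem.ID IS NULL
ORDER BY Item.ID
""")]

print(not_exists_ids, left_join_ids)
```

Both variants return items 2 and 4. Item 1 is excluded even though it also has shape 3, which is what the `ShapeID <> 2` inner join got wrong.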

Multiple Many-to-many bi-directional self-inner-joins without repeating whole query

I have a data model such that items can have many-to-many relationships with other items in the same table using a second table to define relationships. Let's call the primary table items, keyed by item_id and the relationships table item_assoc with columns item_id and other_item_id and assoc_type. Generally, you might use a union to pick up on relationships that may be defined in either direction in the item_assoc table, but you would wind up repeating other parts of the same query just to be sure to pick up associations defined in either direction.
Let's say that you're trying to put together a fairly complex query similar to the following where you want to find a list of items that have related items that COULD have associated cancellation items, but select those that do not have cancellation items:
select
orig.*
from items as orig
join item_assoc as orig2related
on orig.item_id = orig2related.item_id
join items as related
on orig2related.other_item_id = related.item_id
and orig2related.assoc_type = 'Related'
left join item_assoc as related2cancel
on related.item_id = related2cancel.item_id
left join items as cancel
on related2cancel.other_item_id = cancel.item_id
and related2cancel.assoc_type = 'Cancellation'
where cancel.item_id is null
This query obviously only picks up items whose relationships are defined in one direction. For a less complex query, I might solve this by adding a union at the bottom for every permutation of the reverse relationships, but I think that would make the query unnecessarily long and hard to understand.
Is there a way I can define both directions of each relationship without repeating the other parts of the query?
A UNION within item_assoc could help. Assuming you have a DB without a WITH clause you would have to define a view
CREATE VIEW bidirec_item_assoc AS
(
SELECT item_id, other_item_id, assoc_type, 1 as direction FROM item_assoc
UNION
SELECT other_item_id, item_id, assoc_type, 2 as direction FROM item_assoc
)
You can now use bidirec_item_assoc in your queries wherever you used item_assoc before.
Edited Out: You could add columns for direction and relationtype, of course
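The effect of such a view can be seen on a small example. This is a minimal sketch on SQLite through Python's sqlite3, with made-up rows. It only shows that an association stored in either direction becomes visible from both sides.

```python
import sqlite3

# In-memory SQLite database; the question does not name the DB (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item_assoc (item_id INTEGER, other_item_id INTEGER, assoc_type TEXT);
INSERT INTO item_assoc (item_id, other_item_id, assoc_type) VALUES
    (1, 2, 'Related'),       -- stored as 1 -> 2
    (3, 1, 'Related'),       -- stored as 3 -> 1
    (2, 4, 'Cancellation');

-- Each stored row appears twice in the view: once as stored (direction 1)
-- and once with the two item columns swapped (direction 2).
CREATE VIEW bidirec_item_assoc AS
SELECT item_id, other_item_id, assoc_type, 1 AS direction FROM item_assoc
UNION
SELECT other_item_id, item_id, assoc_type, 2 AS direction FROM item_assoc;
""")

query = """
SELECT other_item_id FROM {source}
WHERE item_id = 1 AND assoc_type = 'Related'
ORDER BY other_item_id
"""

# Against the base table only the 1 -> 2 row is found for item 1.
one_way = [r[0] for r in conn.execute(query.format(source="item_assoc"))]
# Against the view the 3 -> 1 row is found as well.
both_ways = [r[0] for r in conn.execute(query.format(source="bidirec_item_assoc"))]

print(one_way, both_ways)
```

Because the direction column differs between the two halves, UNION does not collapse an association that happens to be stored in both directions. A query that needs distinct pairs would have to ignore that column.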
Simplify, simplify, simplify: Don't involve tables in the query that aren't needed.
The following query should be equivalent to your sample query and more expressive of your intent:
select i.*
from items i
where not exists ( select *
from item_assoc r
join item_assoc c on c.item_id = r.other_item_id
and c.assoc_type = 'Cancellation'
where r.item_id = i.item_id
and r.assoc_type = 'Related'
)
It should select the set of items that aren't related to an item that has been cancelled. There's no need to join against the items table 3 times.
Further, your original query will have duplicate rows: every row in the first item table (orig) will be duplicated once for every related item.

SQL Selecting rows with same value of foreign key

Does anyone know how to select all rows from a table with the same value of a foreign key without giving its value? I have a database for a warehouse. It has sectors, and items of certain values in each sector. I want to select the sectors where the overall value of items is bigger than a certain number, with a single query. And I want the query to be universal: it should sum up the overall values of items in every sector of the warehouse (without specifying the number of sectors). Does anyone know how to do it? I don't need a full query, just a way to tell my database to sum up all values in each sector. SectorID is the foreign key and Item is the table (with ItemID as primary key and Value as the value of an item).
I would make use of a combination of queries. Basically, this problem can be solved as below:
Assume both the Item and Sector tables have ID columns, and let T be the threshold (the "certain number" from the question):
Use an inner query to select Sector.SectorID, Item.ItemID and Item.Value by joining the Sector and Item tables on Item.SectorID = Sector.SectorID.
Then compute SUM(Item.Value) over the result of the inner query with GROUP BY SectorID, and keep only the sectors whose total exceeds T (HAVING SUM(Item.Value) > T).
You seem to want a group by query. This is pretty basic, so I assume you are pretty new to SQL:
select SectorId, sum(itemValue) as TotalItemValue
from warehouse w
group by SectorId
having sum(itemValue) > YOURVALUEHERE;
If you want the items in the sectors, then you can get that with a join or in:
select *
from warehouse w
where SectorId in (select SectorId
from warehouse
group by SectorId
having sum(itemValue) > YOURVALUEHERE
)
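Both queries can be tried on a handful of rows. The sketch below is a minimal illustration on SQLite through Python's sqlite3. It uses the table and column names from the question (Item, ItemID, SectorID, Value) rather than warehouse and itemValue, and the rows and the threshold of 25 are made up.

```python
import sqlite3

# In-memory SQLite database; the question does not name the DB (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Item (ItemID INTEGER PRIMARY KEY, SectorID INTEGER, Value INTEGER);
INSERT INTO Item (ItemID, SectorID, Value) VALUES
    (1, 1, 10), (2, 1, 20),    -- sector 1 totals 30
    (3, 2, 5),                 -- sector 2 totals 5
    (4, 3, 50), (5, 3, 60);    -- sector 3 totals 110
""")

threshold = 25  # made-up value standing in for YOURVALUEHERE

# One row per sector, kept only when the sector's total exceeds the threshold.
sector_totals = conn.execute("""
SELECT SectorID, SUM(Value) AS TotalItemValue
FROM Item
GROUP BY SectorID
HAVING SUM(Value) > ?
ORDER BY SectorID
""", (threshold,)).fetchall()

# The items that live in those sectors, using the same subquery with IN.
item_ids = [row[0] for row in conn.execute("""
SELECT ItemID
FROM Item
WHERE SectorID IN (SELECT SectorID
                   FROM Item
                   GROUP BY SectorID
                   HAVING SUM(Value) > ?)
ORDER BY ItemID
""", (threshold,))]

print(sector_totals, item_ids)
```

No sector number appears anywhere in the SQL: GROUP BY forms one group per distinct SectorID, and HAVING filters the groups after the sums are computed.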

how to select owned items or items that have been shared in access 2003

I have two tables, Items and Items_People. Each item has an id and a userid (the person who owns the item). Items_People, the table that shows who the item has been shared with, has an itemid and a userid. I want to get a list of items that the user owns or items that have been shared with that user.
Here is what I have so far:
SELECT * FROM Items
WHERE id IN (SELECT itemid as id FROM Items_People where userid = 1)
OR userid=1
This does work, but I'm not sure if a nested select with WHERE IN is the fastest way of doing it. Should I be using some kind of join?
Do some testing to see which runs faster for you. I believe this query will work from what you stated in the question.
SELECT *
FROM Items
LEFT JOIN Items_People ON Items.id = Items_People.itemid AND Items_People.userid = 1
WHERE Items.userid = 1 OR Items_People.userid = 1
Obviously, run this query first to make sure it gives you the same results. Then test each query to see if you notice any difference in speed.
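A quick way to check that a join-based rewrite returns the same items as the WHERE IN version is to run both on a few rows. The sketch below does that on SQLite through Python's sqlite3, not on Access 2003, with made-up data. The join variant uses LEFT JOIN and an OR in the WHERE clause so that shared items are not filtered out. It says nothing about which form is faster in Access.

```python
import sqlite3

# In-memory SQLite database as a stand-in for Access 2003 (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Items (id INTEGER PRIMARY KEY, userid INTEGER);
CREATE TABLE Items_People (itemid INTEGER, userid INTEGER);
INSERT INTO Items (id, userid) VALUES (1, 1), (2, 2), (3, 2), (4, 3);
INSERT INTO Items_People (itemid, userid) VALUES
    (2, 1),   -- item 2 is shared with user 1
    (3, 3),   -- item 3 is shared with user 3
    (1, 2);   -- item 1 is shared with user 2
""")

# The original formulation: owned by user 1, or shared with user 1.
in_ids = [row[0] for row in conn.execute("""
SELECT id FROM Items
WHERE id IN (SELECT itemid FROM Items_People WHERE userid = 1)
   OR userid = 1
ORDER BY id
""")]

# Join formulation: attach only the share rows for user 1, then keep an
# item if it is owned by user 1 or if such a share row was found.
# DISTINCT guards against duplicate share rows for the same item and user.
join_ids = [row[0] for row in conn.execute("""
SELECT DISTINCT Items.id
FROM Items
LEFT JOIN Items_People
       ON Items.id = Items_People.itemid AND Items_People.userid = 1
WHERE Items.userid = 1 OR Items_People.userid = 1
ORDER BY Items.id
""")]

print(in_ids, join_ids)
```

Both queries return item 1 (owned by user 1) and item 2 (shared with user 1).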