Multiple Many-to-many bi-directional self-inner-joins without repeating whole query - sql

I have a data model such that items can have many-to-many relationships with other items in the same table using a second table to define relationships. Let's call the primary table items, keyed by item_id and the relationships table item_assoc with columns item_id and other_item_id and assoc_type. Generally, you might use a union to pick up on relationships that may be defined in either direction in the item_assoc table, but you would wind up repeating other parts of the same query just to be sure to pick up associations defined in either direction.
Let's say that you're trying to put together a fairly complex query similar to the following where you want to find a list of items that have related items that COULD have associated cancellation items, but select those that do not have cancellation items:
select
orig.*
from items as orig
join item_assoc as orig2related
on orig.item_id = orig2related.item_id
join items as related
on orig2related.other_item_id = related.item_id
and orig2related.assoc_type = 'Related'
left join item_assoc as related2cancel
on related.item_id = related2cancel.item_id
left join items as cancel
on related2cancel.other_item_id = cancel.item_id
and related2cancel.assoc_type = 'Cancellation'
where cancel.item_id is null
This query obviously only picks up items whose relationships are defined in one direction. For a less complex query, I might solve this by adding a union at the bottom for every permutation of the reverse relationships, but I think that would make the query unnecessarily long and hard to understand.
Is there a way I can define both directions of each relationship without repeating the other parts of the query?

A UNION within item_assoc could help. If your database does not support a WITH clause, you would have to define a view:
CREATE VIEW bidirec_item_assoc AS
(
SELECT item_id, other_item_id, assoc_type, 1 as direction FROM item_assoc
UNION
SELECT other_item_id, item_id, assoc_type, 2 as direction FROM item_assoc
)
You can now use bidirec_item_assoc in your queries wherever you used item_assoc before.
Edited Out: You could add columns for direction and relationtype, of course
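To see the view in action, here is a minimal sketch. It assumes SQLite driven from Python's sqlite3 module purely so that it can be run as-is; the sample rows are made up, and your own database may need slightly different syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item_assoc (item_id INTEGER, other_item_id INTEGER, assoc_type TEXT);
-- Hypothetical sample rows: each association is stored in one direction only.
INSERT INTO item_assoc VALUES (1, 2, 'Related'), (3, 1, 'Related');

CREATE VIEW bidirec_item_assoc AS
SELECT item_id, other_item_id, assoc_type, 1 AS direction FROM item_assoc
UNION
SELECT other_item_id, item_id, assoc_type, 2 AS direction FROM item_assoc;
""")

# Item 1 is on the left of one row and on the right of the other;
# the view exposes both partners through a single join column.
partners_of_1 = [row[0] for row in conn.execute(
    "SELECT other_item_id FROM bidirec_item_assoc "
    "WHERE item_id = 1 AND assoc_type = 'Related' "
    "ORDER BY other_item_id"
)]
print(partners_of_1)
```

Queries that join on the view no longer need a second copy with the columns swapped.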

Simplify, simplify, simplify: Don't involve tables in the query that aren't needed.
The following query should be equivalent to your sample query and more expressive of your intent:
select i.*
from items i
where not exists ( select *
from item_assoc r
join item_assoc c on c.item_id = r.other_item_id
and c.assoc_type = 'Cancellation'
where r.item_id = i.item_id
and r.assoc_type = 'Related'
)
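It should select the set of items that aren't related to an item that has been cancelled. There's no need to join against the items table 3 times. Note that, unlike the inner joins in your query, it also returns items that have no 'Related' association at all.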
Further, your original query will have duplicate rows: every row in the first item table (orig) will be duplicated once for every related item.
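The duplicate-row behaviour can be checked on a tiny data set. This is only a sketch: it assumes SQLite via Python's sqlite3 module, the rows are invented, and the NOT EXISTS variant looks up the cancellation on the related item (c.item_id = r.other_item_id), which is how the question's joins are wired.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (item_id INTEGER PRIMARY KEY);
CREATE TABLE item_assoc (item_id INTEGER, other_item_id INTEGER, assoc_type TEXT);
-- Hypothetical data: item 1 has two related items, item 4 has one
-- related item (5) which in turn has a cancellation (6).
INSERT INTO items VALUES (1), (2), (3), (4), (5), (6);
INSERT INTO item_assoc VALUES
    (1, 2, 'Related'), (1, 3, 'Related'),
    (4, 5, 'Related'), (5, 6, 'Cancellation');
""")

# The join-based query from the question.
join_ids = [r[0] for r in conn.execute("""
    SELECT orig.item_id
    FROM items AS orig
    JOIN item_assoc AS orig2related
      ON orig.item_id = orig2related.item_id
    JOIN items AS related
      ON orig2related.other_item_id = related.item_id
     AND orig2related.assoc_type = 'Related'
    LEFT JOIN item_assoc AS related2cancel
      ON related.item_id = related2cancel.item_id
    LEFT JOIN items AS cancel
      ON related2cancel.other_item_id = cancel.item_id
     AND related2cancel.assoc_type = 'Cancellation'
    WHERE cancel.item_id IS NULL
    ORDER BY orig.item_id
""")]

# The NOT EXISTS formulation: at most one row per item.
not_exists_ids = [r[0] for r in conn.execute("""
    SELECT i.item_id
    FROM items i
    WHERE NOT EXISTS (SELECT *
                      FROM item_assoc r
                      JOIN item_assoc c ON c.item_id = r.other_item_id
                                       AND c.assoc_type = 'Cancellation'
                      WHERE r.item_id = i.item_id
                        AND r.assoc_type = 'Related')
    ORDER BY i.item_id
""")]

print(join_ids)        # item 1 appears once per related item
print(not_exists_ids)  # each item at most once, item 4 excluded
```

On this data the join version lists item 1 twice, while the NOT EXISTS version lists every item except 4 exactly once, including items that have no 'Related' rows.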

Related

SQL select with three tables

Hi guys I'm new with databases and I'm trying to make a query where I join 3 tables. I could make it and I want to clean up the result. I want to know how can I delete the column "pin" from users table and maybe some "ids" columns.
Select * from "wish-list"
Join products
On "wish-list".id = products.holiday_id
Join users
On "wish-list".user_id = users.id
Where "wish-list".id = 1
You need to specify which columns you really need in your output. At the moment you are using
SELECT * which outputs all columns of all joined tables.
Here is what it should look like:
SELECT holiday, products.description, users.pin FROM "wish-list"
JOIN products ON "wish-list".id = products.holiday_id
JOIN users ON "wish-list".user_id = users.id
WHERE "wish-list".id = 1
It's important that you qualify all columns that do not belong to your main entity (here wish-list) as tablename.column (products.description and not just description). Unqualified names also work, but only if the column name is unique within the query.
Furthermore, you can rename columns. This is useful, for example, if you want to get the ids of both the products table and the wish-list table:
SELECT products.id AS product_id, "wish-list".id AS wishlist_id FROM "wish-list"
...
Hope that helps!
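Putting the two ideas together (an explicit column list plus aliases), the pin column simply never reaches the output. A small sketch, assuming SQLite through Python's sqlite3 module; the holiday and description columns come from the answer above, while users.name and all sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets us inspect column names by key
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, pin TEXT);
CREATE TABLE "wish-list" (id INTEGER PRIMARY KEY, user_id INTEGER, holiday TEXT);
CREATE TABLE products (id INTEGER PRIMARY KEY, holiday_id INTEGER, description TEXT);
-- Hypothetical sample data.
INSERT INTO users VALUES (5, 'Ann', '1234');
INSERT INTO "wish-list" VALUES (1, 5, 'Christmas');
INSERT INTO products VALUES (20, 1, 'Red bicycle');
""")

row = conn.execute("""
    SELECT "wish-list".id      AS wishlist_id,
           "wish-list".holiday AS holiday,
           products.id         AS product_id,
           products.description AS description,
           users.name          AS user_name
    FROM "wish-list"
    JOIN products ON "wish-list".id = products.holiday_id
    JOIN users ON "wish-list".user_id = users.id
    WHERE "wish-list".id = 1
""").fetchone()

columns = row.keys()
print(columns)
```

Only the listed columns come back, so pin and the unwanted id columns are left out without touching the tables themselves.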

How to do a query with multiple foreign keys pointing to one table?

I have a PostgreSQL database with two tables (person, item). The person table consists of id, name, and let's say 5 item columns with foreign keys referencing the item table. The item table consists of id, name and description.
I want to do a query now that lists the person.id, person.name and the 5 item.name. How can I achieve this? I know it's something with JOIN but I don't get it right now.
You have a problem with your data model. You should not be storing lists of items in separate columns. There are multiple alternatives. The typical solution is a separate table with one row per person and item. You can also store the items as arrays or JSON.
But to answer your question, you need multiple joins:
select p.*, i1.name as item_name_1, i2.name as item_name_2
from person p left join
item i1
on p.item_id_1 = i1.id left join
item i2
on p.item_id_2 = i2.id left join
. . . -- and so on
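For comparison, here is roughly what the "separate table with one row per person and item" alternative could look like. It is a sketch only: the person_item table name and the sample rows are hypothetical, and it runs against SQLite via Python's sqlite3 module rather than PostgreSQL, although the SQL itself is plain enough to carry over.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT, description TEXT);
-- Hypothetical junction table: one row per (person, item) pair.
CREATE TABLE person_item (
    person_id INTEGER REFERENCES person(id),
    item_id   INTEGER REFERENCES item(id),
    PRIMARY KEY (person_id, item_id)
);
INSERT INTO person VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO item VALUES (10, 'sword', ''), (11, 'shield', ''), (12, 'potion', '');
INSERT INTO person_item VALUES (1, 10), (1, 11), (2, 12);
""")

# A single pair of joins covers any number of items per person.
rows = conn.execute("""
    SELECT p.id, p.name, i.name
    FROM person p
    LEFT JOIN person_item pl ON pl.person_id = p.id
    LEFT JOIN item i ON i.id = pl.item_id
    ORDER BY p.id, i.id
""").fetchall()
print(rows)
```

The result has one row per person and item instead of one wide row per person, and adding a sixth item needs no schema change.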

Best way to filter union of data from 2 tables by value in shared 3rd table

For sake of example, let's assume 3 tables:
PHYSICAL_ITEM
ID
SELLER_ID
NAME
COST
DIMENSIONS
WEIGHT
DIGITAL_ITEM
ID
SELLER_ID
NAME
COST
DOWNLOAD_PATH
SELLER
ID
NAME
Item IDs are guaranteed unique across both item tables. I want to select, in order, with a type label, all item IDs for a given seller. I've come up with:
Query A
SELECT PI.ID AS ID, 'PHYSICAL' AS TYPE
FROM PHYSICAL_ITEM PI
JOIN SELLER S ON PI.SELLER_ID = S.ID
WHERE S.NAME = 'name'
UNION
SELECT DI.ID AS ID, 'DIGITAL' AS TYPE
FROM DIGITAL_ITEM DI
JOIN SELLER S ON DI.SELLER_ID = S.ID
WHERE S.NAME = 'name'
ORDER BY ID
Query B
SELECT ITEM.ID, ITEM.TYPE
FROM (SELECT ID, SELLER_ID, 'PHYSICAL' AS TYPE
FROM PHYSICAL_ITEM
UNION
SELECT ID, SELLER_ID, 'DIGITAL' AS TYPE
FROM DIGITAL_ITEM) AS ITEM
JOIN SELLER ON ITEM.SELLER_ID = SELLER.ID
WHERE SELLER.NAME = 'name'
ORDER BY ITEM.ID
Query A seems like it would be the most efficient, but it also looks unnecessarily duplicative (2 table joins to the same table, 2 where clauses on the same table column). Query B looks cleaner in a way to me (no duplication), but it also looks much less efficient, since it has a subquery. Is there a way to get the best of both worlds, so to speak?
In both cases, replace the union with union all. Union unnecessarily removes duplicates.
I would expect Query A to be more efficient, because the optimizer has more information when doing the join (although I think Oracle is pretty good with using indexes even after a union). In addition, the first query reduces the amount of data before the union.
This is, however, only an opinion. The real test is to time the two queries -- multiple times to avoid cache fill delays -- to see which is better.
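Since item IDs are unique across both tables, UNION and UNION ALL return the same rows for Query A; UNION ALL just skips the duplicate-elimination step. A small sketch to confirm that, assuming SQLite through Python's sqlite3 module and made-up sample rows (the real timing test still has to be run on your own database):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE seller (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE physical_item (id INTEGER PRIMARY KEY, seller_id INTEGER, name TEXT);
CREATE TABLE digital_item (id INTEGER PRIMARY KEY, seller_id INTEGER, name TEXT);
-- Hypothetical sample data; IDs are unique across both item tables.
INSERT INTO seller VALUES (1, 'name'), (2, 'other');
INSERT INTO physical_item VALUES (1, 1, 'book'), (4, 2, 'lamp');
INSERT INTO digital_item VALUES (2, 1, 'ebook'), (3, 1, 'song');
""")

# Query A with the set operator left as a placeholder.
QUERY_A = """
SELECT p.id AS id, 'PHYSICAL' AS type
FROM physical_item p
JOIN seller s ON p.seller_id = s.id
WHERE s.name = ?
{op}
SELECT d.id AS id, 'DIGITAL' AS type
FROM digital_item d
JOIN seller s ON d.seller_id = s.id
WHERE s.name = ?
ORDER BY id
"""

with_union = conn.execute(QUERY_A.format(op="UNION"), ("name", "name")).fetchall()
with_union_all = conn.execute(QUERY_A.format(op="UNION ALL"), ("name", "name")).fetchall()
print(with_union_all)
```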

Recursive SQL question

I need to find all the categories on the current level or below that have items, or have subcategories with items.
Categories have CategoryID, ParentCategoryID.
Items have a CategoryID.
I have most of the solution using a stored procedure:
AS
WITH get_cat_hier
AS
(
Select e.CategoryID, e.ParentCategoryID From Categories AS e
where e.ParentCategoryId = @ParentCategoryId
union ALL
Select e.CategoryID, e.ParentCategoryID From Categories e
inner join get_cat_hier AS ecte on ecte.CategoryID = e.ParentCategoryID
)
select DISTINCT e.CategoryID from Categories as e
inner join items as item on (item.CategoryID = e.CategoryID) -- *******Problem*****
where
(e.CategoryID in (select CategoryID FROM get_cat_hier AS CategoryID)
)
Unfortunately, this returns only the categories with items, and not categories with sub-categories with items. I need to replace the "item.CategoryID = e.CategoryID" with a recursive call somehow.
I'm not sure if this is a new solution or one that is in development. Doing this type of reporting is much much easier if you use nested sets to represent hierarchies. Joe Celko has some great articles on this topic.
I did a request tracking system a number of years ago where there was a deep hierarchy for the chain of command. The reporting had to be for an individual and all their subordinates.
You should consider using nested sets and not using a parent pointer system.
http://en.wikipedia.org/wiki/Nested_set_model
Take out your where clause (I don't think it is needed).
Leave your item join and then also join to get_cat_hier between your item table and categories table.
I think it's the CTE's anchor member that causes the wrong result set.
Its WHERE clause should actually be where e.CategoryId = @ParentCategoryId.
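One way to stay with the parent-pointer model is a second recursive CTE that pairs every category in the subtree with each of its descendants, and then keeps the categories for which some descendant (or the category itself) has items. This is a sketch under assumptions, not a drop-in fix: it uses SQLite via Python's sqlite3 module, so the syntax is WITH RECURSIVE and :parent rather than the stored procedure's parameter, and the subtree/closure names and sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Categories (CategoryID INTEGER PRIMARY KEY, ParentCategoryID INTEGER);
CREATE TABLE Items (ItemID INTEGER PRIMARY KEY, CategoryID INTEGER);
-- Hypothetical tree: 1 -> (2 -> 3), (4 -> 5); only category 3 holds an item.
INSERT INTO Categories VALUES (1, NULL), (2, 1), (3, 2), (4, 1), (5, 4);
INSERT INTO Items VALUES (100, 3);
""")

query = """
WITH RECURSIVE
subtree(CategoryID) AS (
    -- Same anchor as in the question: children of the given parent.
    SELECT CategoryID FROM Categories WHERE ParentCategoryID = :parent
    UNION ALL
    SELECT c.CategoryID
    FROM Categories c
    JOIN subtree s ON c.ParentCategoryID = s.CategoryID
),
closure(root, CategoryID) AS (
    -- Every subtree category paired with itself and each descendant.
    SELECT CategoryID, CategoryID FROM subtree
    UNION ALL
    SELECT cl.root, c.CategoryID
    FROM Categories c
    JOIN closure cl ON c.ParentCategoryID = cl.CategoryID
)
SELECT DISTINCT cl.root
FROM closure cl
JOIN Items i ON i.CategoryID = cl.CategoryID
ORDER BY cl.root
"""

categories_with_items = [r[0] for r in conn.execute(query, {"parent": 1})]
print(categories_with_items)
```

Category 2 has no items of its own but is returned because its subcategory 3 does; categories 4 and 5 are dropped.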

Optimizing MySQL Query

We have a query that is currently killing our database and I know there has to be a way to optimize it. We have 3 tables:
items - table of items where each item has an associated object_id, length, difficulty_rating, rating, avg_rating & status
lists - table of lists which are basically lists of items created by our users
list_items - table with 2 columns: list_id, item_id
We've been using the following query to display a simple HTML table that shows each list and a number of attributes related to the list including averages of attributes of the included list items:
select object_id, user_id, slug, title, description, items,
city, state, country, created, updated,
(select AVG(rating) from items
where object_id IN
(select object_id from list_items where list_id=lists.object_id)
AND status="A"
) as 'avg_rating',
(select AVG(avg_rating) from items
where object_id IN
(select object_id from list_items where list_id=lists.object_id)
AND status="A"
) as 'avg_avg_rating',
(select AVG(length) from items
where object_id IN
(select object_id from list_items where list_id=lists.object_id)
AND status="A"
) as 'avg_length',
(select AVG(difficulty_rating) from items
where object_id IN
(select object_id from list_items where list_id=lists.object_id)
AND status="A"
) as 'avg_difficulty'
from lists
where user_id=$user_id AND status="A"
order by $orderby LIMIT $start,$step
The reason why we haven't broken this up in 1 query to get all the lists and subsequent lookups to pull the averages for each list is because we want the user to be able to sort on the averages columns (i.e. 'order by avg_difficulty').
Hopefully my explanation makes sense. There has to be a much more efficient way to do this and I'm hoping that a MySQL guru out there can point me in the right direction. Thanks!
It looks like you can replace all the subqueries with joins:
SELECT l.object_id,
l.user_id,
<other columns from lists>,
AVG(i.rating) as avgrating,
AVG(i.avg_rating) as avgavgrating,
<other averages>
FROM lists l
LEFT JOIN list_items li
ON li.list_id = l.object_id
LEFT JOIN items i
ON i.object_id = li.object_id
AND i.status = 'A'
WHERE l.user_id = $user_id AND l.status = 'A'
GROUP BY l.object_id, l.user_id, <other columns from lists>
That would save a lot of work for the DB engine.
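Here is a runnable sketch of the join-and-group approach, including sorting on one of the averages. Assumptions: SQLite via Python's sqlite3 module instead of MySQL, trimmed-down tables with invented rows, and list_items using the item_id column described in the question (the question's own query selects object_id from it, so adjust to your real schema).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE lists (object_id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT, status TEXT);
CREATE TABLE items (object_id INTEGER PRIMARY KEY, rating REAL, length REAL, status TEXT);
CREATE TABLE list_items (list_id INTEGER, item_id INTEGER);
-- Hypothetical data: list 1 has two active items and one inactive, list 2 is empty.
INSERT INTO lists VALUES (1, 7, 'hikes', 'A'), (2, 7, 'empty', 'A');
INSERT INTO items VALUES (10, 4.0, 2.0, 'A'), (11, 2.0, 6.0, 'A'), (12, 5.0, 9.0, 'X');
INSERT INTO list_items VALUES (1, 10), (1, 11), (1, 12);
""")

rows = conn.execute("""
    SELECT l.object_id,
           l.title,
           AVG(i.rating) AS avg_rating,
           AVG(i.length) AS avg_length
    FROM lists l
    LEFT JOIN list_items li ON li.list_id = l.object_id
    LEFT JOIN items i ON i.object_id = li.item_id
                     AND i.status = 'A'
    WHERE l.user_id = ? AND l.status = 'A'
    GROUP BY l.object_id, l.title
    ORDER BY avg_rating DESC
""", (7,)).fetchall()
print(rows)
```

Each list is read once, the inactive item is ignored by the join condition, and lists without items still appear with NULL averages thanks to the LEFT JOINs.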
Here's how to find the bottleneck:
Add the keyword EXPLAIN before the SELECT. This makes the engine output the execution plan it would use for the SELECT.
To learn more about Query Optimization with this method see: http://dev.mysql.com/doc/refman/5.0/en/using-explain.html
A couple of things to consider:
Make sure that all of your joins are indexed on both sides. For example, you join list_items.list_id=lists.object_id in several places. list_id and object_id should both have indexes on them.
Have you done any research as to what the variation in the averages is? You might benefit from having a worker thread (or cronjob) calculate the averages periodically rather than putting the load on your RDBMS every time you run this query. You'd need to store the averages in a separate table of course...
Also, are you using status as an enum or a varchar? The cardinality of an enum would be much lower; consider switching to this type if you have a limited range of values for status column.
-aj
That's one hell of a query... you should probably edit your question and change the query so it's a bit more readable, although due to the complex nature of it, I'm not sure that's possible.
Anyway, the simple answer here is to denormalize your database a bit and cache all of your averages on the list table itself in indexed decimal columns. All those sub queries are killing you.
The hard part, and what you'll have to figure out is how to keep those averages updated. A generally easy way is to store the count of all items and the sum of all those values in two separate fields. Anytime an action is made, increment the count by 1, and the sum by whatever. Then update table avg_field = sum_field/count_field.
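The sum-and-count bookkeeping could look something like this. It is only a sketch: the rating_sum, rating_count and avg_rating columns and the add_rating helper are hypothetical names, and it runs on SQLite via Python's sqlite3 module rather than MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical cache columns on the lists table.
CREATE TABLE lists (
    object_id    INTEGER PRIMARY KEY,
    rating_sum   REAL    DEFAULT 0,
    rating_count INTEGER DEFAULT 0,
    avg_rating   REAL
);
INSERT INTO lists (object_id) VALUES (1);
""")


def add_rating(conn, list_id, rating):
    """Record one more item rating for a list and refresh the cached average."""
    # Expressions on the right-hand side see the values from before the
    # update, so the average is computed from the old sum and count.
    conn.execute(
        """
        UPDATE lists
        SET rating_sum   = rating_sum + ?,
            rating_count = rating_count + 1,
            avg_rating   = (rating_sum + ?) / (rating_count + 1)
        WHERE object_id = ?
        """,
        (rating, rating, list_id),
    )


add_rating(conn, 1, 4.0)
add_rating(conn, 1, 2.0)

cached = conn.execute(
    "SELECT rating_sum, rating_count, avg_rating FROM lists WHERE object_id = 1"
).fetchone()
print(cached)
```

Removing an item from a list would need the inverse update (subtract the rating, decrement the count), and the listing query can then sort directly on the indexed avg_rating column.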
Besides indexing, even a cursory analysis shows that your query contains a lot of redundancy that your DBMS's optimizer may not be able to spot (SQL is a redundant language: it admits too many equivalent, syntactically different expressions; this is a known and documented problem, see for example SQL redundancy and DBMS performance, by Fabian Pascal).
I will rewrite your query, below, to highlight that:
let LI =
select object_id from list_items where list_id=lists.object_id
in
select object_id, user_id, slug, title, description, items, city, state, country, created, updated,
(select AVG(rating) from items where object_id IN LI AND status="A") as 'avg_rating',
(select AVG(avg_rating) from items where object_id IN LI AND status="A") as 'avg_avg_rating',
(select AVG(length) from items where object_id IN LI AND status="A") as 'avg_length',
(select AVG(difficulty_rating) from items where object_id IN LI AND status="A") as 'avg_difficulty'
from lists
where user_id=$user_id AND status="A"
order by $orderby
LIMIT $start, $step
Note: this is only the first step to refactor that beast.
I wonder why people rarely, if ever, use views, even just to simplify SQL queries. They help in writing more manageable and refactorable queries.