Recursive SQL question - sql

I need to find all the categories on the current level or below that have items, or have subcategories with items.
Categories have CategoryID, ParentCategoryID.
Items have a CategoryID.
I have most of the solution using a stored procedure:
AS
WITH get_cat_hier
AS
(
Select e.CategoryID, e.ParentCategoryID, From Categories AS e
where e.ParentCategoryId = #ParentCategoryId
union ALL
Select e.CategoryID, e.ParentCategoryID, From Categories e
inner join get_cat_hier AS ecte on ecte.CategoryID = e.ParentCategoryID
)
select DISTINCT e.CategoryID from Categories as e
inner join items as item on (item.CategoryID = e.CategoryID) -- *******Problem*****
where
(e.CategoryID in (select CategoryID FROM get_cat_hier AS CategoryID)
)
Unfortunately, this returns only the categories with items, and not categories with sub-categories with items. I need to replace the "item.CategoryID = e.CategoryID" with a recursive call somehow.

I'm not sure if this is a new solution or one that is in development. Doing this type of reporting is much much easier if you use nested sets to represent hierarchies. Joe Celko has some great articles on this topic.
I did request tracking system a number of years ago where there was a deep hierarchy for the chain of command. The reporting had to be for an individual and all their subordinates.
You should consider using nested sets and not using a parent pointer system.
http://en.wikipedia.org/wiki/Nested_set_model

Take out your where clause (I don't think it is needed)
Leave your item join and then also join to the get_cat_heir between your item table and categories table.

I think it's the CTE's anchor member that causes the wrong result set.
Its WHERE clause should actually be where e.CategoryId = #ParentCategoryId.

Related

Multiple Many-to-many bi-directional self-inner-joins without repeating whole query

I have a data model such that items can have many-to-many relationships with other items in the same table using a second table to define relationships. Let's call the primary table items, keyed by item_id and the relationships table item_assoc with columns item_id and other_item_id and assoc_type. Generally, you might use a union to pick up on relationships that may be defined in either direction in the item_assoc table, but you would wind up repeating other parts of the same query just to be sure to pick up associations defined in either direction.
Let's say that you're trying to put together a fairly complex query similar to the following where you want to find a list of items that have related items that COULD have associated cancellation items, but select those that do not have cancellation items:
select
orig.*
from items as orig
join item_assoc as orig2related
on orig.item_id = orig2related.item_id
join items as related
on orig2related.other_item_id = related.item_id
and orig2related.assoc_type = 'Related'
left join item_assoc as related2cancel
on related.item_id = related2cancel.item_id
left join items as cancel
on related2cancel.other_item_id = cancel.item_id
and related2cancel.assoc_type = 'Cancellation'
where cancel.item_id is null
This query obviously only picks up items whose relationships are defined in one direction. For a less complex query, I might solve this by adding a union at the bottom for every permutation of the reverse relationships, but I think that would make the query unnecessarily long and hard to understand.
Is there a way I can define both directions of each relationship without repeating the other parts of the query?
A UNION within item_assoc could help. Assuming you have a DB without a WITH clause you would have to define a view
CREATE VIEW bidirec_item_assoc AS
(
SELECT item_id, other_item_id, assoc_type, 1 as direction FROM item_assoc
UNION
SELECT other_item_id, item_id, assoc_type, 2 as direction FROM item_assoc
)
You can now use bidirec_item_assoc in your queries where you have used items_assoc before.
Edited Out: You could add columns for direction and relationtype, of course
Simplify, simplify, simplify: Don't involve tables in the query that aren't needed.
The following query should be equivalent to your sample query and more expressive of your intent:
select i.*
from items i
where not exists ( select *
from item_assoc r
join item_assoc c on c.item_id = r.item_id
and c.assoc_type = 'Cancellation'
where r.item_id = i.item_id
and r.assoc_type = 'Related'
)
It should select the set of items that aren't related to an item that has been cancelled. There's not need to join against the items table 3 times.
Further, your original query will have duplicate rows: every row in the first item table (orig) will be duplicated once for every related item.

MS Access Distinct Records in Recordset

So, I once again seem to have an issue with MS Access being finicky, although it seems to also be an issue when trying similar queries in SSMS (SQL Server Management Studio).
I have a collection of tables, loosely defined as follows:
table widget_mfg { id (int), name (nvarchar) }
table widget { id (int), name (nvarchar), mfg_id (int) }
table widget_component { id (int), name (nvarchar), widget_id (int), component_id }
table component { id (int), name (nvarchar), ... } -- There are ~25 columns in this table
What I'd like to do is query the database and get a list of all components that a specific manufacturer uses. I've tried some of these queries:
SELECT c.*, wc.widget_id, w.mfg_id
FROM ((widget_component wc INNER JOIN widget w ON wc.widget_id = w.id)
INNER JOIN widget_manufacturer wm on w.mfg_id = wm.id)
INNER JOIN component c on c.id = wc.component_id
WHERE wm.id = 1
The previous example displays duplicates of any part that is contained in multiple widget_component lists for different widgets.
I've also tried doing:
SELECT DISTINCT c.id, c.name, wc.widget_id, w.mfg_id
FROM component c, widget_component wc, widget w, widget_manufacturer wm
WHERE wm.id=w.mfg_id AND wm.id = 1
This doesn't display anything at all. I was reading about sub-queries, but I do not understand how they work or how they would apply to my current application.
Any assistance in this would be beneficial.
As an aside, I am not very good with either MS Access or SQL in general. I know the basics, but not a lot beyond that.
Edit:
I just tried this code, and it works to get all the component.id's while limiting them to a single entry each. How do I go about using the results of this to get a list of all the rest of the component data (component.*) where the id's from the first part are used to select this data?
SELECT DISTINCT c.part_no
FROM component c, widget w, widget_component wc, widget_manufacturer wm
WHERE(((c.id=wc.component_id AND wc.widget_id=w.id AND w.mfg_id=wm.id AND wm.id=1)))
(P.S. this is probably not the best way to do this, but I am still learning SQL.)
What I'd like to do is query the database and get a list of all
components that a specific manufacturer uses
There are several ways to do this. IN is probably the easiest to write
SELECT c.*
FROM component c
WHERE c.id IN (SELECT c.component_id
FROM widget w
INNER JOIN widget_component c
ON w.id = c.widget_id
WHERE w.mfg_id = 123)
The IN sub query finds all the component ids that a specific manufacturer uses. The outer query then selects any component.id that is that result. It doesn't matter if its in there once or 1000 times it will only get the component record once.
The other ways of doing this are using an EXISTS sub query or using a join to the query (but then you do need to de-dup it)
It sounds like your component -to- widget relationship is one-to-many. Hence the duplicates. (i.e., the same component is used by more than one widget).
Your Select is almost OK --
SELECT c.*, wc.widget_id, w.mfg_id
but the wc.widget_id is causing the duplicates (per the assumption above).
So remove wc.widget_id from the SELECT, or else aggregate it (min, max, count, etc.). Removing is easier. If you agregate, remember to add a group by clause.
Try this:
SELECT DISTINCT c.*, w.mfg_id
Also -- FWIW, it's generally a better practice to use field names, instead of the *

Distinct Values in SQL Query - Advanced

I have searched high and low and have tried for hours to manipulate the various other queries that seemed to fit but I've had no joy.
I have several Tables in Microsoft SQL Server 2005 that I'm trying to join, an example of which is:
Company Table (Comp_CompanyId, Comp_Name)
GroupCode_Link Table (gcl_c_groupcodelinkid, gcl_c_groupcodeid, gcl_c_companyid)
GroupCode Table (grp_c_groupcodeid, grp_c_groupcode, grp_c_name)
ItemCode Table (itm_c_itemcodeid, itm_c_name, itm_c_itemcode, itm_c_group)
ItemCode_Link Table (icl_c_itemcodelinkid, icl_c_companyid, icl_c_groupcodeid, icl_c_itemcodeid)
I'm using Link Tables to associate a Group to a Company, and an Item to a Group, so a Company could have multiple groups, with multiple items in each group.
Now, I'm trying to create an Advanced Find Function that will allow a user to enter, for example, an Item Code and the result should display those companies that have that item, sounds nice and simple!
However, I haven't done something right, if I use the following query ' if the company has this item OR this item, display it's name', I get the company appearing twice in the result set, once for each item.
What I need is to be able to say is:
"Show me a list of companies that have these items (displaying each company only once!)"
I've had a go at using COUNT, DISTINCT and HAVING but have failed on each as my query knowledge isn't up to it!
First, from your description it sounds like you might have a problem with your E-R (entity-relationship) model. Your description tells me that your E-R model looks something like this:
Associative entities (CompanyGroup, GroupItem) exist to implement many-to-many relationships (since many-to-many isn't supported directly by relational databases).
Nothing wrong with that if a group can exist within multiple companies or an item across multiple groups. It would seem more likely that, at least, each group is specific to a company (I can see items existing across multiple companies and/or groups: more than one company retails, for instance, Cuisinart food processors). If that is the case, a better E-R model would be to make each group a dependent entity with a CompanyID that is a component of its primary key. It's a dependent entity because the group doesn't have an independent existence: it's created by/on behalf of and exists for its parent company. If the company goes away, the group(s) tied to it go away. No your E-R model looks like this:
From that, we can write the query you need:
select *
from Company c
where exists ( select *
from GroupItem gi
where gi.ItemID in ( desired-itemid-1 , ... , desired-itemid-n )
and gi.CompanyID = c.CompanyID
)
As you can see, dependent entities are a powerful thing. Because of the key propagation, queries tend to get simpler. With the original data model, the query would be somewhat more complex:
select *
from Company c
where exists ( select *
from CompanyGroup cg
join GroupItem gi on gi.GroupId = cg.GroupID
where gi.ItemID in ( desired-itemid-1 , ... , desired-itemid-n )
and cg.CompanyID = c.CompanyID
)
Cheers!
SELECT *
FROM company c
WHERE (
SELECT COUNT(DISTINCT icl_c_itemcodeid)
FROM GroupCode_Link gl
JOIN ItemCode_Link il
ON il.icl_c_groupcodeid = gcl_c_groupcodeid
WHERE gl.gcl_c_companyid = c.Comp_CompanyId
AND icl_c_companyid = c.Comp_CompanyId
AND icl_c_itemcodeid IN (#Item1, #Item2)
) >= 2
Replace >= 2 with >= 1 if you want "any item" instead of "all items".
If you need to show companies that have item1 AND item2, you can use Quassnoi's answer.
If you need to show companies that have item1 OR item2, then you can use this:
SELECT
*
FROM
company
WHERE EXISTS
(
SELECT
icl_c_itemcodeid
FROM
GroupCode_Link
INNER JOIN
ItemCode_Link
ON icl_c_groupcodeid = gcl_c_groupcodeid
AND icl_c_itemcodeid IN (#item1, #item2)
WHERE
gcl_c_companyid = company.Comp_CompanyId
AND
icl_c_companyid = company.Comp_CompanyId
)
I would write something like the code below:
SELECT
c.Comp_Name
FROM
Company AS c
WHERE
EXISTS (
SELECT
1
FROM
GroupCode_Link AS gcl
JOIN
ItemCode_Link AS icl
ON
gcl.gcl_c_groupcodeid = icl.icl_c_groupcodeid
JOIN
ItemCode AS itm
ON
icl.icl_c_itemcodeid = itm.itm_c_itemcodeid
WHERE
c.Comp_CompanyId = gcl.gcl_c_companyid
AND
itm.itm_c_itemcode IN (...) /* here provide list of one or more Item Codes to look for */
);
but I see there's a icl_c_companyid column in the ItemCode_Link so using GroupCode_Link table is not necessary?

SQL query: self-referencing foreign key relationship

I have two tables, tabSparePart and tabSparePartCategory. Every spare part belongs to a spare part category. I need all spare parts that belong to a specific category. But the problem is that a spare part category could be a "subcategory" of another, they reference each other (the "main categories" have 'null' in this FK column).
Let's say I need all spare parts with fiSparePartCategory=1 and all spare parts that belong to a category that is a "subcategory" of category=1.
How to write the SQL query that returns all spare parts regardless of how many levels of subcategories there are. I hope you understand my requirement.
The following is an illustration of what I have. How to make it dynamic so that it works regardless of the number of subcategories?
Thanks, Tim
Link to image: http://www.bilder-hochladen.net/files/4709-lg-jpg.html
EDIT: Following is an other static approach which works when there is only one level of subcategory:
SELECT SparePartName
FROM tabSparePart
WHERE (fiSparePartCategory IN
(SELECT idSparePartCategory
FROM tabSparePartCategory
WHERE (idSparePartCategory = 1) OR
(fiSparePartCategory = 1)))
You can use a recursive Common Table Expression for this.
In your case, you would need to get all sparepart category ids for a specific main category id and join that with the spareparts. Something like this:
WITH SparePartCategories(CategoryId) AS
(
SELECT c.idSparePartCategory
FROM tabSparePartCategory c
WHERE c.idSparePartCategory = 1
UNION ALL
SELECT c.idSparePartCategory
FROM tabSparePartCategory c
JOIN SparePartCategories parent ON c.fiSparePartCategory = parent.CategoryId
)
SELECT sp.SparePartName
FROM tabSparePart sp
JOIN SparePartCategories spc ON sp.fiSparePartCategory = spc.CategoryId

What is the best way to implement this SQL query?

I have a PRODUCTS table, and each product can have multiple attributes so I have an ATTRIBUTES table, and another table called ATTRIBPRODUCTS which sits in the middle. The attributes are grouped into classes (type, brand, material, colour, etc), so people might want a product of a particular type, from a certain brand.
PRODUCTS
product_id
product_name
ATTRIBUTES
attribute_id
attribute_name
attribute_class
ATTRIBPRODUCTS
attribute_id
product_id
When someone is looking for a product they can select one or many of the attributes. The problem I'm having is returning a single product that has multiple attributes. This should be really simple I know but SQL really isn't my thing and past a certain point I get a bit lost in the logic. The problem is I'm trying to check each attribute class separately so I want to end up with something like:
SELECT DISTINCT products.product_id
FROM attribproducts
INNER JOIN products ON attribproducts.product_id = products.product_id
WHERE (attribproducts.attribute_id IN (9,10,11)
AND attribproducts.attribute_id IN (60,61))
I've used IN to separate the blocks of attributes of different classes, so I end up with the products which are of certain types, but also of certain brands. From the results I've had it seems to be that AND between the IN statements that's causing the problem.
Can anyone help a little? I don't have the luxury of completely refactoring the database unfortunately, there is a lot more to it than this bit, so any suggestions how to work with what I have will be gratefully received.
Take a look at the answers to the question SQL: Many-To-Many table AND query. It's the exact same problem. Cletus gave there 2 possible solutions, none of which very trivial (but then again, there simply is no trivial solution).
SELECT DISTINCT products.product_id
FROM products p
INNER JOIN attribproducts ptype on p.product_id = ptype.product_id
INNER JOIN attribproducts pbrand on p.product_id = pbrand.product_id
WHERE ptype.attribute_id IN (9,10,11)
AND pbrand.attribute_id IN (60,61)
Try this:
select * from products p, attribproducts a1, attribproducts a2
where p.product_id = a1.product_id
and p.product_id = a2.product_id
and a1.attribute_id in (9,10,11)
and a2.attribute_id in (60,61);
This will return no rows because you're only counting rows that have a number that's (either 9, 10, 11) AND (either 60, 61).
Because those sets don't intersect, you'll get no rows.
If you use OR instead, it'll give products with attributes that are in the set 9, 10, 11, 60, 61, which isn't what you want either, although you'll then get multiple rows for each product.
You could use that select as an subquery in a GROUP BY statement, grouping by the quantity of products, and order that grouping by the number of shared attributes. That will give you the highest matches first.
Alternatively (as another answer shows), you could join with a new copy of the table for each attribute set, giving you only those products that match all attribute sets.
It sounds like you have a data schema that is GREAT for storage but terrible for selecting/reporting. When you have a data structure of OBJECT, ATTRIBUTE, OBJECT-ATTRIBUTE and OBJECT-ATTRIBUTE-VALUE you can store many objects with many different attributes per object. This is sometime referred to as "Vertical Storage".
However, when you want to retrieve a list of objects with all of their attributes values, it is an variable number of joins you have to make. It is much easier to retrieve data when it is stored horizonatally (Defined columns of data)
I have run into this scenario several times. Since you cannot change the existing data structure. My suggest would be to write a "layer" of tables on top. Dynamically create a table for each object/product you have. Then dynamically create static columns in those new tables for each attribute. Pretty much you need to "flatten" your vertically stored attribute/values into static columns. Convert from a vertical architecture into a horizontal ones.
Use the "flattened" tables for reporting, and use the vertical tables for storage.
If you need sample code or more details, just ask me.
I hope this is clear. I have not had much coffee yet :)
Thanks,
- Mark
You can use multiple inner joins -- I think this would work:
select distinct product_id
from products p
inner join attribproducts a1 on a1.product_id=p.product_id
inner join attribproducts a2 on a1.product_id=p.product_id
where a1.attribute_id in (9,10,11)
and a2.attribute_id in (60,61)