Two way relationships in SQL queries - sql

I have a small database that is used to track parts. For the sake of this example, the table looks like this:
PartID (PK), int
PartNumber, Varchar(50), Unique
Description, Varchar(255)
I have a requirement to define that certain parts are classified as similar to each other.
To do this I have set up a second table that looks like this:
PartID, (PK), int
SecondPartID, (PK), int
ReasonForSimilarity, Varchar(255)
Then a many-to-many relationship has been set up between the two tables.
The problem comes when I need to report on the parts that are considered similar, because the relationship is two-way: if part XYZ123 is similar to ABC678, then ABC678 is considered similar to XYZ123. So if I want to list all parts that are similar to a given part, I either need to ensure the relationship is set up in both directions (which is bad because data is duplicated) or need two queries that look at the table in both directions. Neither of these solutions feels right to me.
So, how should this problem be approached? Can this be solved with SQL alone or does my design need to change to accommodate the business requirement?
Consider the following parts: XYZ123, ABC123, ABC234, ABC345, ABC456 and EFG456, which have been entered into the existing structure described above. You could end up with data that looks like this (omitting the reason field, which is irrelevant at this point):
PartID, SecondPartID
XYZ123, ABC123
XYZ123, ABC234
XYZ123, ABC345
XYZ123, ABC456
EFG456, XYZ123
My user wants to know "Which parts are similar to XYZ123". This could be done using a query like so:
SELECT SecondPartID
FROM tblRelatedParts
WHERE PartID = 'XYZ123'
The problem with this though is it will not pick out part EFG456 which is related to XYZ123 despite the fact that the parts have been entered the other way round. It is feasible that this could happen depending on which part the user is currently working with and the relationship between the parts will always be two-way.
The problem I have with this though is that I now need to check that when a user sets up a relationship between two parts it does not already exist in the other direction.
@Goran
I have done some initial tests using your suggestion, and this is how I plan to approach the problem.
The data listed above is entered into the new table (Note that I have changed the partID to part number to make the example clearer; the semantics of my problem haven't changed though)
The table would look like this:
RelationshipID, PartNumber
1, XYZ123
1, ABC123
2, XYZ123
2, ABC234
3, XYZ123
3, ABC345
4, XYZ123
4, ABC456
5, EFG456
5, XYZ123
I can then retrieve a list of similar parts using a query like this:
SELECT PartNumber
FROM tblPartRelationships
WHERE RelationshipID = ANY (SELECT RelationshipID
FROM tblPartRelationships
WHERE PartNumber = 'XYZ123')
I'll carry out some more tests, and if this works I'll report back and accept the answer.

I've dealt with this issue by setting up a relationship table.
Part table:
PartID (PK), int
PartNumber, Varchar(50), Unique
Description, Varchar(255)
PartRelationship table:
RelationshipId (FK), int
PartID (FK), int
Relationship table:
RelationshipId (PK), int
Now similar parts simply get added to the PartRelationship table:
RelationshipId, PartId
1,1
1,2
Whenever you add another part with relationshipId = 1 it is considered similar to any part with relationshipId = 1.
Possible API solutions for adding relationships:
Create new relationship for each list of similar parts. Let client load, change and update the entire list whenever needed.
Retrieve relationship(s) for a similar object. Filter the list by some criteria so that only one remains or let client choose from existing relationships. Create, remove PartRelationship record as needed.
Retrieve list of relationships from Relationship table. Let client specify parts and relationships. Create, remove PartRelationship records as needed.
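As a rough sketch of the grouping scheme described above, here is how a lookup might go. This assumes SQLite through Python's sqlite3 module purely so that it can be run as-is; the sample rows (including the unrelated part QQQ000) are made up for illustration.

```python
import sqlite3

# In-memory SQLite database, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Part (
        PartID     INTEGER PRIMARY KEY,
        PartNumber VARCHAR(50) UNIQUE
    );
    CREATE TABLE Relationship (
        RelationshipId INTEGER PRIMARY KEY
    );
    CREATE TABLE PartRelationship (
        RelationshipId INTEGER REFERENCES Relationship(RelationshipId),
        PartID         INTEGER REFERENCES Part(PartID),
        PRIMARY KEY (RelationshipId, PartID)
    );
    -- Sample data (hypothetical): QQQ000 is not similar to anything.
    INSERT INTO Part VALUES (1, 'XYZ123'), (2, 'ABC123'), (3, 'EFG456'), (4, 'QQQ000');
    INSERT INTO Relationship VALUES (1), (2);
    -- Group 1: XYZ123 and ABC123; group 2: EFG456 and XYZ123.
    INSERT INTO PartRelationship VALUES (1, 1), (1, 2), (2, 3), (2, 1);
""")

# Every part sharing at least one relationship group with part 1 (XYZ123),
# excluding the part itself. Direction never comes into it.
similar = [row[0] for row in conn.execute("""
    SELECT DISTINCT p.PartNumber
    FROM PartRelationship pr
    JOIN Part p ON p.PartID = pr.PartID
    WHERE pr.RelationshipId IN (SELECT RelationshipId
                                FROM PartRelationship
                                WHERE PartID = :part)
      AND pr.PartID <> :part
    ORDER BY p.PartNumber
""", {"part": 1})]
print(similar)
```

Because membership in a group is symmetric by construction, there is no "other direction" to check when a new part is added to an existing group.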

Add a CHECK constraint e.g.
CHECK (PartID < SecondPartID);
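A minimal sketch of how that constraint could be used, assuming integer part IDs and SQLite through Python's sqlite3 module so it can be run directly; add_similarity is a hypothetical helper, not something from the original schema. The constraint forces one canonical ordering per pair, so the reversed duplicate cannot be stored. Lookups still have to check both columns.

```python
import sqlite3

# In-memory SQLite database, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE tblRelatedParts (
        PartID       INTEGER NOT NULL,
        SecondPartID INTEGER NOT NULL,
        PRIMARY KEY (PartID, SecondPartID),
        CHECK (PartID < SecondPartID)   -- one canonical order per pair
    )
""")

def add_similarity(conn, a, b):
    """Hypothetical helper: always store the pair with the smaller ID first."""
    low, high = min(a, b), max(a, b)
    # OR IGNORE skips the insert if the (normalized) pair already exists.
    conn.execute("INSERT OR IGNORE INTO tblRelatedParts VALUES (?, ?)", (low, high))

add_similarity(conn, 5, 1)
add_similarity(conn, 1, 5)   # same pair entered the other way round: no new row
add_similarity(conn, 1, 3)

row_count = conn.execute("SELECT COUNT(*) FROM tblRelatedParts").fetchone()[0]

# A raw insert in the wrong order is rejected by the CHECK constraint.
try:
    conn.execute("INSERT INTO tblRelatedParts VALUES (9, 2)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

With the ordering enforced by the database, the application only has to normalize the pair before inserting; it never has to search for the relationship in the opposite direction first.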

I know this is old, but why not just run this query against your original schema? Fewer tables and rows.
SELECT SecondPartID
FROM tblRelatedParts
WHERE PartID = 'XYZ123'
UNION
SELECT PartID
FROM tblRelatedParts
WHERE SecondPartID = 'XYZ123'
I am dealing with a similar issue and looking at the two approaches and wondering why you thought the schema with the relationship table was better. It seems like the original issue still exists in the sense that you still need to manage the relationships between them from both directions.

How about having two rows for each similarity? For example, if objects A and B are similar, your relation table will contain
A B
B A
I know you will double your relation data, but they are integers so it won't overload your database. In return you get some gains:
you won't need UNION, which can be costly, especially when combined with ORDER BY or GROUP BY
you can implement a more specific, one-directional relation: a is in relation with b, but b is not in relation with a. For example, John can replace Dave, but Dave cannot replace John.

Related

In SQL Server I need to change data structure of relationships (FK)

Ok I wasn't entirely sure what to title this question, so here's the situation.
I'm big on data integrity... Meaning as many constraints and rules that I can use I want to use in SQL Server and not rely on the application.
So I have a website that has a business directory, and those businesses can create a post.
So I have two tables like this:
tbl_Business ( BusinessID, Title, etc. )
tbl_Business_Post ( PostID, BusinessID, PostTitle, etc. )
There's a FK relationship for the column BusinessID between the two tables. A post cannot exist in the tbl_Business_Post table without the BusinessID existing in the tbl_Business table.
So pretty standard...
I've recently added classifieds to the site. So now I have two more tables:
tbl_Classified ( ClassifiedID, SellerID, ClassifiedTitle, etc. )
tbl_Classified_Seller ( SellerID, SellerName, etc. )
What I'm wanting to do is take advantage of my tbl_Business_Post table to include classifieds in that as well. Think of its usage like a feed... So the site will show recent posts from businesses and classifieds all in one feed.
Here's where I need guidance.
I was tempted to remove the FK relationship on tbl_Business_Post...
I thought about creating another separate Posts table that holds the classifieds posts.
Is there a way to make a conditional FK relationship based on a column? For example, if it's a business posting, the BusinessID must exist in the Business table, or if it's a classifieds post, the SellerID must exist in the Seller table?
Or should I create a separate table to hold the classifieds posts and UNION both the tables on the query?
You might question why I have a "Posts" table and that's hard to explain... but I do need it for the way the site is organized and how the feed works.
It's just that the posts table is perfect, and I wanted to combine all posts and organize them by type (e.g. 'business', 'classified'), as there might be more later.
So it comes down to, what's the best way to organize this to sustain data integrity from SSMS?
Thank you for guidance.
======== EDIT =========
Full explanation of tbl_Business_Post
PostID PK
Post_Type int <-- 1-21 is business types, 22 for classified type
BusinessID INT <-- This is the FK currently for the tbl_Business
SiblingID INT <-- This is the ID of the related item they're posting on. So for example, if they post a story about one of their products, this is the ProductID, if it's a service, this is the ServiceID.
Post_Title <-- Depending on the post, this could be a Product title, a service title, etc.
So if I changed the structure so it's as follows:
PostID PK
Post_Type int
BusinessID INT <-- this is populated on insert if it's a business.
SellerID INT <-- This is populated on insert if it's a classified seller
SiblingID INT <-- This is either the ClassifiedID or ProductID, ServiceID, etc., depending on post type.
So leaning toward Peter's 1st solution/example... interested in the proper way to create check constraints or triggers on this so that if the type is 1-21, it makes sure BusinessID exists in the Business table, or if it's type 22, make sure the SellerID exists in the seller table.
Even going further with this:
If Post_Type = 22, I should make sure that not only is the Seller in the seller table, but the SiblingID is also the ClassifiedID in the Classified table.
1) There's no way to do this kind of conditional FK you're thinking of. What you need here is basically a FK from tbl_Business_Post which points logically to one of two tables, depending on the value in another column of tbl_Business_Post. This situation is what people encounter quite often. But in a relational DB this is not a very native idea.
So OK, this cannot be enforced with a FK. Instead, you can probably enforce this with a trigger or check constraint on tbl_Business_Post.
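One way the check-constraint route could look, as a sketch only: keep two nullable columns, each with its own ordinary FK, and add a CHECK that ties which one must be filled to Post_Type. This uses SQLite through Python's sqlite3 module so it can be run as-is (the question is about SQL Server, where the same idea should carry over but the syntax may differ); the sample rows and the try_insert helper are hypothetical. Validating SiblingID against the Classified table is not covered here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
conn.executescript("""
    CREATE TABLE tbl_Business (BusinessID INTEGER PRIMARY KEY, Title TEXT);
    CREATE TABLE tbl_Classified_Seller (SellerID INTEGER PRIMARY KEY, SellerName TEXT);
    CREATE TABLE tbl_Business_Post (
        PostID     INTEGER PRIMARY KEY,
        Post_Type  INTEGER NOT NULL,
        BusinessID INTEGER REFERENCES tbl_Business(BusinessID),
        SellerID   INTEGER REFERENCES tbl_Classified_Seller(SellerID),
        -- Types 1-21 need a business and no seller; type 22 the reverse.
        CHECK (
            (Post_Type BETWEEN 1 AND 21 AND BusinessID IS NOT NULL AND SellerID IS NULL)
         OR (Post_Type = 22 AND SellerID IS NOT NULL AND BusinessID IS NULL)
        )
    );
    INSERT INTO tbl_Business VALUES (1, 'Acme');          -- sample data
    INSERT INTO tbl_Classified_Seller VALUES (7, 'Sam');  -- sample data
""")

def try_insert(sql):
    """Hypothetical helper: True if the row was accepted, False if a constraint fired."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

ok_business   = try_insert("INSERT INTO tbl_Business_Post VALUES (1, 3, 1, NULL)")
ok_classified = try_insert("INSERT INTO tbl_Business_Post VALUES (2, 22, NULL, 7)")
bad_mix       = try_insert("INSERT INTO tbl_Business_Post VALUES (3, 22, 1, NULL)")   # CHECK fails
bad_fk        = try_insert("INSERT INTO tbl_Business_Post VALUES (4, 22, NULL, 99)")  # no such seller
```

The FKs guarantee that whatever ID is present actually exists, and the CHECK guarantees that the right one is present for the post type.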
2) Alternatively, you can do the below.
Create some table tbl_Basic_Post, put there all columns which pertain to the post itself (e.g. PostTitle) and not to the parent entity which this post record belongs/points to (Business or Classified). Then create two other tables which point via a FK to the tbl_Basic_Post table like e.g.
tbl_Business_Post.Basic_Post_ID (FK)
tbl_Classified_Post.Basic_Post_ID (FK)
Put in these two tables the columns which are Business_Post/Classified_Post-specific
(you see, this is basically inheritance in relational DB terms).
Also, make each of these two tables have FKs to their respective parent tables
tbl_Business and tbl_Classified too. Now these FKs become unconditional (in your sense).
To get business posts you join tbl_Basic_Post and tbl_Business_Post.
To get classified posts you join tbl_Basic_Post and tbl_Classified_Post.
Both approaches have their pros and cons.
Approach 1) is simple, does not lead to the creation of too many tables; but it's not trivial to enforce the data integrity.
Approach 2) does not require anything special to enforce data integrity but leads to the creation of more tables.
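A rough sketch of approach 2), assuming SQLite through Python's sqlite3 module so it can be run directly. Column lists are trimmed to the minimum and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
conn.executescript("""
    CREATE TABLE tbl_Business   (BusinessID   INTEGER PRIMARY KEY, Title TEXT);
    CREATE TABLE tbl_Classified (ClassifiedID INTEGER PRIMARY KEY, ClassifiedTitle TEXT);

    -- Columns that describe the post itself, whatever it belongs to.
    CREATE TABLE tbl_Basic_Post (Basic_Post_ID INTEGER PRIMARY KEY, PostTitle TEXT);

    -- One "subtype" table per parent entity, each with a plain, unconditional FK.
    CREATE TABLE tbl_Business_Post (
        Basic_Post_ID INTEGER PRIMARY KEY REFERENCES tbl_Basic_Post(Basic_Post_ID),
        BusinessID    INTEGER NOT NULL REFERENCES tbl_Business(BusinessID)
    );
    CREATE TABLE tbl_Classified_Post (
        Basic_Post_ID INTEGER PRIMARY KEY REFERENCES tbl_Basic_Post(Basic_Post_ID),
        ClassifiedID  INTEGER NOT NULL REFERENCES tbl_Classified(ClassifiedID)
    );

    -- Sample data (hypothetical).
    INSERT INTO tbl_Business VALUES (1, 'Acme');
    INSERT INTO tbl_Classified VALUES (10, 'Used bike');
    INSERT INTO tbl_Basic_Post VALUES (100, 'New product'), (101, 'Bike for sale');
    INSERT INTO tbl_Business_Post VALUES (100, 1);
    INSERT INTO tbl_Classified_Post VALUES (101, 10);
""")

# Business posts: join the basic post to its business-specific row.
business_posts = conn.execute("""
    SELECT bp.PostTitle, b.Title
    FROM tbl_Basic_Post bp
    JOIN tbl_Business_Post x ON x.Basic_Post_ID = bp.Basic_Post_ID
    JOIN tbl_Business b      ON b.BusinessID = x.BusinessID
""").fetchall()

# The combined feed needs only the basic table.
feed = [r[0] for r in conn.execute(
    "SELECT PostTitle FROM tbl_Basic_Post ORDER BY Basic_Post_ID")]
```

Note that nothing in this sketch stops one basic post from having rows in both subtype tables; that would need an extra rule if it matters.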

Turn two database tables into one?

I am having a bit of trouble modelling a relational database for an inventory management system. For now, it only has 3 simple tables:
Product
ID | Name | Price
Receivings
ID | Date | Quantity | Product_ID (FK)
Sales
ID | Date | Quantity | Product_ID (FK)
As Receivings and Sales are identical, I was considering a different approach:
Product
ID | Name | Price
Receivings_Sales (the name doesn't matter)
ID | Date | Quantity | Type | Product_ID (FK)
The column type would identify if it was receiving or sale.
Can anyone help me choose the best option, pointing out the advantages and disadvantages of either approach?
The first one seems reasonable because I am thinking in an ORM way.
Thanks!
Personally I prefer the first option, that is, separate tables for Sales and Receiving.
The two biggest disadvantages of option number 2, merging the two tables into one, are:
1) Inflexibility
2) Unnecessary filtering when the table is used
First on inflexibility. If your requirements expanded (or you just simply overlooked it) then you will have to break up your schema or you will end up with unnormalized tables. For example let's say your sales would now include the Sales Clerk/Person that did the sales transaction so obviously it has nothing to do with 'Receiving'. And what if you do Retail or Wholesale sales how would you accommodate that in your merged tables? How about discounts or promos? Now, I am identifying the obvious here. Now, let's go to Receiving. What if we want to tie up our receiving to our Purchase Order? Obviously, purchase order details like P.O. Number, P.O. Date, Supplier Name etc would not be under Sales but obviously related more to Receiving.
Secondly, on unnecessary filtering. If you have a merged table and you want to use only the Sales (or Receiving) portion of it, then you have to filter out the other portion, either in your back-end or your front-end program. Whereas if they are separate tables, you only have to deal with one table at a time.
Additionally, you mentioned ORM, the first option would best fit to that endeavour because obviously an object or entity for that matter should be distinct from other entity/object.
If the tables really are and always will be identical (and I have my doubts), then name the unified table something more generic, like "InventoryTransaction", and then use negative numbers for one of the transaction types: probably sales, since that would correctly mark your inventory in terms of keeping track of stock on hand.
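A small sketch of the signed-quantity idea, assuming SQLite through Python's sqlite3 module so it can be run directly. The table name InventoryTransaction comes from the suggestion above; the sample product and dates are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Product (ID INTEGER PRIMARY KEY, Name TEXT, Price REAL);
    CREATE TABLE InventoryTransaction (
        ID         INTEGER PRIMARY KEY,
        Date       TEXT,
        Quantity   INTEGER NOT NULL,   -- positive = receiving, negative = sale
        Product_ID INTEGER NOT NULL REFERENCES Product(ID)
    );
    -- Sample data (hypothetical).
    INSERT INTO Product VALUES (1, 'Widget', 2.50);
    INSERT INTO InventoryTransaction (Date, Quantity, Product_ID) VALUES
        ('2024-01-01', 10, 1),   -- received 10
        ('2024-01-02', -3, 1),   -- sold 3
        ('2024-01-03', -2, 1);   -- sold 2
""")

# Stock on hand is just the sum of the signed quantities.
stock_on_hand = conn.execute(
    "SELECT SUM(Quantity) FROM InventoryTransaction WHERE Product_ID = ?", (1,)
).fetchone()[0]
print(stock_on_hand)
```

No Type column is needed in this variant, since the sign of Quantity already says which kind of transaction the row is.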
The fact that headings are the same is irrelevant. Seeking to use a single table because headings are the same is misconceived.
-- person [source] loves person [target]
LOVES(source,target)
-- person [source] hates person [target]
HATES(source,target)
Every base table has a corresponding predicate aka fill-in-the-[named-]blanks statement describing the application situation. A base table holds the rows that make a true statement.
Every query expression combines base table names via JOIN, UNION, SELECT, EXCEPT, WHERE condition, etc and has a corresponding predicate that combines base table predicates via (respectively) AND, OR, EXISTS, AND NOT, AND condition, etc. A query result holds the rows that make a true statement.
Such a set of predicate-satisfying rows is a relation. There is no other reason to put rows in a table.
(The other answers here address, as they must, proposals for and consequences of the predicate that your one table could have. But if you didn't propose the table because of its predicate, why did you propose it at all? The answer is, since not for the predicate, for no good reason.)

database performance around storing and querying bi-directional relationships

I'm looking to determine whether it is better from a performance and coding perspective to store two associated database records as a single row (and search both columns for a specific record since the value could be in either place) or create a second row for that association and only search one column.
An example will help hopefully:
UserTable
userID INTEGER,
firstName VARCHAR2(20),
lastName VARCHAR2(20)
2 rows:
1, John, Smith
2, Terry, Jenkins
Second table (to track relationship between the two)
RelationshipTable
relationshipID INTEGER,
userID1 INTEGER,
userID2 INTEGER
Now to store a relationship between john and terry I could do:
Option1 (1 row):
relationshipID, userID1, userID2
1, 1, 2
Then to look for any relationship that Terry is a part of, I would have to do something like
SELECT *
FROM RelationshipTable
WHERE userID1 = [terrysID] OR userID2 = [terrysID]
Or I could go with 2 rows and inserting each ID in the association into a specific column.
Option2 (2 rows):
relationshipID, userID1, userID2
1, 1, 2
2, 2, 1
and find any relationships that terry is a part of by:
SELECT *
FROM RelationshipTable
WHERE userID1 = [terrysID]
I'm not sure which is better.
I could set up indexes on both columns, which would help with the first option. However, I would still have to do some post-processing of the results to determine which column in the result set holds the ID that is not Terry's. And I think the coding is a bit messier, since I'd have to repeat that logic in multiple places.
On the other-hand, the second approach effectively doubles the amount of data, and even scarier, duplicates data without adding any real "business value". So if that relationship ever ended I would have to ensure I deleted both records (or soft-deleted or whatever we chose to do).
I never know whether I will be searching for John's relationships or Terry's relationships, so I cannot intelligently insert either ID into a specific column at the time the relationship is created.
Thoughts? Is there a third option that I haven't thought of that is better? Something like creating a view on the table that produces the two rows for querying without actually duplicating the data? Obviously that would create additional overhead on the system.
Edit:
This looks like a similar question, but I am not sure any answer accurately satisfies what I am looking for.
Two way relationships in SQL queries
Thanks!
In terms of clarity and ease of use, I'd go with option 1. This has the drawback of a bug allowing 1 to relate to 2 and also 2 to relate to 1 which would be redundant. However, that would be up to the front end to stop (you can't do everything in the DB).
Your postprocessing can be totally avoided by not using the simple select you gave, but by using this:
SELECT relationshipID, userID1, userID2
FROM RelationshipTable
WHERE userID1 = [terrysID]
UNION ALL
SELECT relationshipID, userID2 AS userID1, userID1 AS userID2
FROM RelationshipTable
WHERE userID2 = [terrysID]
This will mean that [terrysID] will always be the first of the pair. If you have indexes on both columns, then it should be pretty efficient too.
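The view idea raised in the question can be combined with this UNION ALL. Below is a sketch, assuming SQLite through Python's sqlite3 module so it can be run as-is; the view name RelationshipBothWays, its column names, and the extra sample user 3 are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE RelationshipTable (
        relationshipID INTEGER PRIMARY KEY,
        userID1        INTEGER NOT NULL,
        userID2        INTEGER NOT NULL
    );
    CREATE INDEX idx_rel_user1 ON RelationshipTable(userID1);
    CREATE INDEX idx_rel_user2 ON RelationshipTable(userID2);

    -- Hypothetical view: every stored row shows up once per direction,
    -- so callers only ever filter on a single column.
    CREATE VIEW RelationshipBothWays AS
        SELECT relationshipID, userID1 AS userID, userID2 AS otherUserID
        FROM RelationshipTable
        UNION ALL
        SELECT relationshipID, userID2 AS userID, userID1 AS otherUserID
        FROM RelationshipTable;

    -- One stored row per relationship (option 1); user 3 is sample data.
    INSERT INTO RelationshipTable VALUES (1, 1, 2), (2, 3, 2);
""")

terrys_id = 2
others = [row[0] for row in conn.execute(
    "SELECT otherUserID FROM RelationshipBothWays WHERE userID = ? ORDER BY otherUserID",
    (terrys_id,))]
print(others)
```

The data is stored once, and the "which column is the other person" logic lives in one place instead of being repeated in every query. Whether the optimizer pushes the filter into both halves of the view depends on the DBMS, so it is worth checking the query plan.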

Database structure, one big entity for multiple entities

Suppose that I have a store-website where user can leave comments about any product.
Suppose that I have tables(entities) in my website database: let it be 'Shoes', 'Hats' and 'Skates'.
I don't want to create separate "comments" table for every entity (like 'shoes_comments', 'hats_comments', 'skates_comments').
My idea is to somehow store all the comments in one big table.
One way to do this, that I thought of, is to create a table:
table (comments):
ID (int, Primary Key),
comment (text),
Product_id (int),
isSkates (boolean),
isShoes (boolean),
isHats (boolean)
and like flag for every entity that could have comments.
Then when I want to get comments for some product the SELECT query would look like:
SELECT comment
FROM comments, ___SOMETABLE___
WHERE ____SOMEFLAG____ = TRUE
AND ___SOMETABLE___.ID = comments.Product_id
Is this an efficient way to implement database for needed functionality?
What other ways can I do this?
Sorry, this feels odd.
Do you indeed have one separate table for each product type? Don't they have common fields (e.g. name, description, price, product image, etc.)?
My recommendation as for tables: product for common fields, comments with foreign key to product but no hasX columns, hat with only the fields that are specific to the hat product line. The primary key in hat is either the product PK or an individual unique value (then you'd need an extra field for the foreign key to product).
I would recommend you to make one table for the comments and use a foreign key of other tables in the comments table.
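A sketch of what these two answers describe, assuming SQLite through Python's sqlite3 module so it can be run directly. The brim_width_cm column and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
conn.executescript("""
    -- Fields shared by every product type.
    CREATE TABLE product (
        id    INTEGER PRIMARY KEY,
        name  TEXT NOT NULL,
        price REAL
    );
    -- Only hat-specific fields; the PK doubles as the FK to product.
    CREATE TABLE hat (
        product_id    INTEGER PRIMARY KEY REFERENCES product(id),
        brim_width_cm REAL              -- hypothetical hat-only field
    );
    -- One comments table for all products, no isHats/isShoes flags.
    CREATE TABLE comments (
        id         INTEGER PRIMARY KEY,
        product_id INTEGER NOT NULL REFERENCES product(id),
        comment    TEXT NOT NULL
    );
    -- Sample data (hypothetical).
    INSERT INTO product VALUES (1, 'Fedora', 30.0), (2, 'Ice skates', 80.0);
    INSERT INTO hat VALUES (1, 6.5);
    INSERT INTO comments (product_id, comment) VALUES
        (1, 'Fits well'), (2, 'Very sharp'), (1, 'Nice colour');
""")

# Comments for a product need no knowledge of the product's type.
hat_comments = [row[0] for row in conn.execute(
    "SELECT comment FROM comments WHERE product_id = ? ORDER BY id", (1,))]
print(hat_comments)
```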
The "normalized" way to do this is to add one more entity (say, "Product") that groups all characteristics common to shoes, hats and skates (including comments)
+-- 0..1 [Shoe]
|
[Product] 1 --+-- 0..1 [Hat]
1 |
| +-- 0..1 [Skate]
*
[Comment]
Besides performance considerations, the drawback here is that nothing in the data model prevents a row in Product from being referenced both by a row in Shoe and by one in Hat.
There are other alternatives too (each with perks & flaws) - you might want to read something about "jpa inheritance strategies" - you'll find java-specific articles that discuss your same issue (just ignore the java babbling and read the rest)
Personally, I often end up using a single table for all entities in a hierarchy (shoes, hats and skates in our case) and sacrificing constraints on the altar of performance and simplicity (eg: not null in a field that is mandatory for shoes but not for hats and skates).

Basic question: how to properly redesign this schema

I am hopping on a project that sits on top of a SQL Server 2008 DB with what seems to me like an inefficient schema. However, I'm not an expert at anything SQL, so I am seeking guidance.
In general, the schema has tables like this:
ID | A | B
ID is a unique identifier
A contains text, such as animal names. There's very little variety; maybe 3-4 different values in thousands of rows. This could vary with time, but still a small set.
B is one of two options, but stored as text. The set is finite.
My questions are as follows:
Should I create another table for names contained in A, with an ID and a value, and set the ID as the primary key? Or should I just put an index on that column in my table? Right now, to get a list of A's, it does "select distinct(a) from table" which seems inefficient to me.
The table has a multitude of columns for properties of A. It could be like: Color, Age, Weight, etc. I would think that this is better suited in a separate table with: ID, AnimalID, Property, Value. Each property is unique to the animal, so I'm not sure how this schema could enforce this (the current schema implies this as it's a column, so you can only have one value for each property).
Right now the DB is easily readable by a human, but its size is growing fast and I feel like the design is inefficient. There are currently no indexes anywhere. As I said, I'm not a pro, but I will read more on the subject. The goal is to have a fast system. Thanks for your advice!
This sounds like a database that might represent a veterinary clinic.
If the table you describe represents the various patients (animals) that come to the clinic, then the properties specific to them are probably best kept on the primary table. But since, as you say, column "A" contains a species name, it might be worthwhile to link that to a secondary table to avoid the redundancy of storing those names repeatedly:
For example:
Patients
--------
ID  Name  SpeciesID  Color        DOB         Weight
1   Spot  1          Black/White  2008-01-01  20
Species
-------
ID  Species
1   Cocker Spaniel
If your main table should be instead grouped by customer or owner, then you may want to add an Animals table and link it:
Customers
---------
ID  Name
1   John Q. Sample
Animals
-------
ID  CustomerID  SpeciesID  Name  Color        DOB         Weight
1   1           1          Spot  Black/White  2008-01-01  20
...
As for your original column B, consider converting it to a boolean (BIT) if you only need to store two states. Barring that, consider CHAR to store a fixed number of characters.
Like most things, it depends.
By having the animal names directly in the table, it makes your reporting queries more efficient by removing the need for many joins.
Going with something like 3rd normal form (having an ID/Name table for the animals) makes your database smaller, but requires more joins for reporting.
Either way, make sure to add some indexes.
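To make the lookup-table-plus-index suggestion concrete, here is a sketch assuming SQLite through Python's sqlite3 module so it can be run directly (the question is about SQL Server, where the DDL would differ slightly). The index name and the extra sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Species (
        ID      INTEGER PRIMARY KEY,
        Species TEXT NOT NULL UNIQUE
    );
    CREATE TABLE Patients (
        ID        INTEGER PRIMARY KEY,
        Name      TEXT,
        SpeciesID INTEGER NOT NULL REFERENCES Species(ID)
    );
    -- Index the column used for joins and filters.
    CREATE INDEX idx_patients_species ON Patients(SpeciesID);

    -- Sample data (hypothetical beyond the first row of each table).
    INSERT INTO Species VALUES (1, 'Cocker Spaniel'), (2, 'Siamese');
    INSERT INTO Patients VALUES (1, 'Spot', 1), (2, 'Rex', 1), (3, 'Tom', 2);
""")

# The list of distinct values is now a read of the small lookup table,
# instead of SELECT DISTINCT over the whole main table.
species_list = [r[0] for r in conn.execute("SELECT Species FROM Species ORDER BY ID")]

# Reporting by name costs one join.
spaniels = [r[0] for r in conn.execute("""
    SELECT p.Name
    FROM Patients p
    JOIN Species s ON s.ID = p.SpeciesID
    WHERE s.Species = ?
    ORDER BY p.ID
""", ("Cocker Spaniel",))]
```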