SQL Server FK same table - sql

I'm thinking of adding a relationship table to a database and I'd like to include a sort of reverse relation functionality by using a FK pointing to a PK within the same table. For example, Say I have table RELATIONSHIP with the following:
ID (PK) Relation ReverseID (FK)
1 Parent 2
2 Child 1
3 Grandparent 4
4 Grandchild 3
5 Sibling 5
First, is this even possible? Second, is this a good way to go about this? If not, what are your suggestions?

1) It is possible.
2) It may not be as desirable in your case as you might want - you have cycles, as opposed to an acyclic structure - because of this if your FK is in place you cannot insert any of those rows as they are. One possibility is that after allowing NULLs in your ReverseID column in your table DDL, you would have to INSERT all the rows with NULL ReverseID and then doing an UPDATE to set the ReverseID columns which will now have valid rows to reference. Another possibility is to disable the foregin key or don't create it until the data is in a completely valid state and then apply it.
3) You would have to do an operation like this almost every time, and if EVERY relationship has an inverse you either wouldn't be able to enforce NOT NULL in the schema or you would regularly be disabling and re-enabling constraints.
4) The sibling situation is the same.
I would be fine using the design if this is controlled in some way and you understand the implications.

Related

Inserting Data into Tables and Integrity

This is my first post, so please excuse me for any obvious or simple questions as I am very new to programming and all my projects are a first to me.
I am currently working on my first database project. A relational database using Oracle sql. I'm new on my course, so I am not sure on all the concepts yet, but working at it.
I have used some modelling software to help me construct a 13 table database. I have setup all my columns and assigned primary and foreign keys to all 13 tables. What I am looking to do now is insert 10 rows of test data into each table. I have done the parent tables but am confused about the child tables. When I assign ID numbers to all the parent tables primary keys, will the child tables foreign keys be populated at the same time?
I have not used sequences yet as I'm not 100% how to make them work, but instead inputted my own values like 100, 101, 102 etc. I know those values need to be in the foreign key, but wouldn't manually inserting them into many tables get confusing?
Is there an easier approach to this or am I over complicating the process?
I will need to use some queries later but I just want to be happy that the data is sound.
Thanks for your help
Rob
No, the child table data won't be populated automatically-- if there is a child table, that implies that there is a 0 or 1 to m relationship between the two. One row in the parent table may have 0 rows in the child table or it may have dozens so nothing could possibly be populated automatically.
If you are manually assigning primary key values, you'd need to hard code those same values as the foreign key values when you insert data into the child tables. In the real world, you wouldn't manually insert data into many tables at once, you'd have an application that did so and that knew what keys to use based on parameters passed in or by getting the currval of the sequence used to populate the primary key after inserting into the parent table.
Its necessary that data for foreign key should be present in parent table, but not the other way around.
If you want to create test data, i suggest you use something like below query.
insert into child_table(fk_column,column1,column2....)
select pk_column,'#dummy_value1#','#dummy_value2#',..
from parent_table
if you have 10 rows in parent, this will add 10 rows in child.
If you want more rows, e.g. 100 for each parent value you need to duplicate the parent data. for that use below query.
insert into child_table(fk_column,column1,column2....)
select pk_column,'#dummy_value1#','#dummy_value2#',..
from parent_table
join (select level from dual connect by level<10)
this will add 100 child values for 10 parent values..

Is it OK to have 2 FKs on a table, which point to different tables, and one of which will only be used?

Lets assume I have 2 tables:
Order -< OrderItem
I have another table with 2 FKs:
Feature
- Id
- FkOrderId
- FkOrderItemId
- Text
UPDATE
This table is linked to another called FeatureReason which is common to both types of record, be they OrderFeatures or OrderItem features.
Feature -< FeatureReason
If I had 2 feature tables to account for both types of records, would this then require 2 FeatureReason tables. Same issue here with the FeatureReason table needing to have 2 FKs, each pointing to a different master table.
An Order can have a Feature record, as can an OrderItem. Therefore either "FkOrderId" OR FkOrderItemId would be populated. Is this fine to do?
I would also seriously think about using Views to to insert/edit and read either OrderFeatures or OrderItemFeatures.
Thoughts appreciated.
I would recommend using following structure, because if you have 2 foreign keys which either of them can be null, you can have rows with both columns being null or having value.
Added the FeatureReason table too
You can do this, but why? What is your reasoning for collating these two distinct items in a single table?
I would suggest having two separate tables, OrderFeatures and OrderItemFeatures, and on those occasions that you need to query both, collate them with a union query.
It is possible to have 2 foreign keys in one table. As long as the foreign key is mapping with the primary key on another table, it's OK
By not populating FkOrderItemId or FkOrderId, will you not be violating one or other of the FK constraints?
You can populate FkOrderItemId or FkOrderId according to your needs, I'm just not sure about defining an FK where it is not mandatory to supply a FK value.
Just a thought...

SQL Server foreign key to multiple tables

I have the following database schema:
members_company1(id, name, ...);
members_company2(id, name, ...);
profiles(memberid, membertypeid, ...);
membertypes(id, name, ...)
[
{ id : 1, name : 'company1', ... },
{ id : 2, name : 'company2', ... }
];
So each profile belongs to a certain member either from company1 or company2 depending on membertypeid value
members_company1 ————————— members_company2
———————————————— ————————————————
id ——————————> memberid <——————————— id
name membertypeid name
/|\
|
|
profiles |
—————————— |
memberid ————————+
membertypeid
I am wondering if it's possible to create a foreign key in profiles table for referential integrity based on memberid and membertypeid pair to reference either members_company1 or members_company2 table records?
A foreign key can only reference one table, as stated in the documentation (emphasis mine):
A foreign key (FK) is a column or combination of columns that is used
to establish and enforce a link between the data in two tables.
But if you want to start cleaning things up you could create a members table as #KevinCrowell suggested, populate it from the two members_company tables and replace them with views. You can use INSTEAD OF triggers on the views to 'redirect' updates to the new table. This is still some work, but it would be one way to fix your data model without breaking existing applications (if it's feasible in your situation, of course)
Operating under the fact that you can't change the table structure:
Option 1
How important is referential integrity to you? Are you only doing inner joins between these tables? If you don't have to worry too much about it, then don't worry about it.
Option 2
Ok, you probably have to do something about this. Maybe you do have inner joins only, but you have to deal with data in profiles that doesn't relate to anything in the members tables. Could you create a job that runs once per day or week to clean it out?
Option 3
Yeah, that one may not work either. You could create a trigger on the profiles table that checks the reference to the members tables. This is far from ideal, but it does guarantee instantaneous checks.
My Opinion
I would go with option 2. You're obviously dealing with a less-than-ideal schema. Why make this worse than it has to be. Let the bad data sit for a week; clean the table every weekend.
No. A foreign key can reference one and only one primary key and there is no way to spread primary keys across tables. The kind of logic you hope to achieve will require use of a trigger or restructuring your database so that all members are based off a core record in a single table.
Come on you can create a table but you cannot modify members_company1 nor members_company2?
Your idea of a create a members table will require more actions when new records are inserted into members_company tables.
So you can create triggers on members_company1 and members_company2 - that is not modify?
What are the constraints to what you can do?
If you just need compatibility on selects to members_company1 and members_company2 then create a real members table and create views for members_company1 and members_company2.
A basic select does not know it is a view or a table on the other end.
CREATE VIEW dbo.members_company1
AS
SELECT id, name
FROM members
where companyID = 1
You could possible even handle insert, updates, and deletes with instead-of
INSTEAD OF INSERT Triggers
A foreign key cannot reference two tables. Assuming you don't want to correct your design by merging members_company1 and members_company2 tables, the best approach would be to:
Add two columns called member_company1_id and member_company2_id to your profiles table and create two foreign keys to the two tables and allow nulls. Then you could add a constraint to ensure 1 of the columns is null and the other is not, at all times.

Uniqueness in many-to-many

I couldn't figure out what terms to google, so help tagging this question or just pointing me in the way of a related question would be helpful.
I believe that I have a typical many-to-many relationship:
CREATE TABLE groups (
id integer PRIMARY KEY);
CREATE TABLE elements (
id integer PRIMARY KEY);
CREATE TABLE groups_elements (
groups_id integer REFERENCES groups,
elements_id integer REFERENCES elements,
PRIMARY KEY (groups_id, elements_id));
I want to have a constraint that there can only be one groups_id for a given set of elements_ids.
For example, the following is valid:
groups_id | elements_id
1 | 1
1 | 2
2 | 2
2 | 3
The following is not valid, because then groups 1 and 2 would be equivalent.
groups_id | elements_id
1 | 1
1 | 2
2 | 2
2 | 1
Not every subset of elements must have a group (this is not the power set), but new subsets may be formed. I suspect that my design is incorrect since I'm really talking about adding a group as a single entity.
How can I create identifiers for subsets of elements without risk of duplicating subsets?
That is an interesting problem.
One solution, albeit a klunky one, would be to store a concatenation of groups_id and elements_id in the groups table: 1-1-2 and make it a unique index.
Trying to do a search for duplicate groups before inserting a new row, would be an enormous performance hit.
The following query would spit out offending group ids:
with group_elements_arr as (
select groups_id, array_agg(elements_id order by elements_id) elements
from group_elements
group by groups_id )
select elements, count(*), array_agg(groups_id) offending_groups
from group_elements_arr
group by elements
having count(*) > 1;
Depending on the size of group_elements and its change rate you might get away with stuffing something along this lines into a trigger watching group_elements. If that's not fast enough you can materialize group_elements_arr into a real table managed by triggers.
And I think, the trigger should be FOR EACH STATEMENT and INITIALLY DEFERRED for easy building up a new group.
This link from user ypercube was most helpful: unique constraint on a set. In short, a bit of what everyone is saying is correct.
It's a question of tradeoffs, but here are the best options:
a) Add a hash or some other combination of element values to the groups table and make it unique, then populate the groups_elements table off of it using triggers. Pros of this method are that it preserves querying ability and enforces the constraint so long as you deny naked updates to groups_elements. Cons are that it adds complexity and you've now introduced logic like "how do you uniquely represent a set of elements" into your database.
b) Leave the tables as-is and control the access to groups_elements with your access layer, be it a stored procedure or otherwise. This has the advantage of preserving querying ability and keeps the database itself simple. However, it means that you are moving an analytic constraint into your access layer, which necessarily means that your access layer will need to be more complex. Another point is that it separates what the data should be from the data itself, which has both pros and cons. If you need faster access to whether or not a set already exists, you can attack that problem separately.

How to Delete all data from a table which contain self referencing foreign key

I have a table which has employee relationship defined within itself.
i.e.
EmpID Name SeniorId
-----------------------
1 A NULL
2 B 1
3 C 1
4 D 3
and so on...
Where Senior ID is a foreign key whose primary key table is same with refrence column EmpId
I want to clear all rows from this table without removing any constraint. How can i do this?
Deletion need to be performed like this
4, 3 , 2 , 1
How can I do this
EDIT:
Jhonny's Answer is working for me but which of the answers are more efficient.
I don't know if I am missing something, but maybe you can try this.
UPDATE employee SET SeniorID = NULL
DELETE FROM employee
If the table is very large (cardinality of millions), and there is no need to log the DELETE transactions, dropping the constraint and TRUNCATEing and recreating constraints is by far the most efficient way. Also, if there are foreign keys in other tables (and in this particular table design it would seem to be so), those rows will all have to be deleted first in all cases, as well.
Normalization says nothing about recursive/hierarchical/tree relationships, so I believe that is a red herring in your reply to DVK's suggestion to split this into its own table - it certainly is viable to make a vertical partition of this table already and also to consider whether you can take advantage of that to get any of the other benefits I list below. As DVK alludes to, in this particular design, I have often seen a separate link table to record self-relationships and other kinds of relationships. This has numerous benefits:
have many to many up AND down instead of many-to-one (uncommon, but potentially useful)
track different types of direct relationships - manager, mentor, assistant, payroll approver, expense approver, technical report-to - with rows in the relationship and relationship type tables instead of new columns in the employee table
track changing hierarchies in a temporally consistent way (including terminated employee hierarchy history) by including active indicators and effective dates on the relationship rows - this is only fully possible when normalizing the relationship into its own table
no NULLs in the SeniorID (actually on either ID) - this is a distinct advantage in avoiding bad logic, but NULLs will usually appear in views when you have to left join to the relationship table anyway
a better dedicated indexing strategy - as opposed to adding SeniorID to selected indexes you already have on Employee (especially as the number of relationship types grows)
And of course, the more information you relate to this relationship, the more strongly is indicated that the relationship itself merits a table (i.e. it is a "relation" in the true sense of the word as used in relational databases - related data is stored in a relation or table - related to a primary key), and thus a normal form for relationships might strongly indicate that the relationship table be created instead of a simple foreign key relationship in the employee table.
Benefits also include its straightforward delete scenario:
DELETE FROM EmployeeRelationships;
DELETE FROM Employee;
You'll note a striking equivalence to the accepted answer here on SO, since, in your case, employees with no senior relationship have a NULL - so in that answer the poster set all to NULL first to eliminate relationships and then remove the employees.
There is a possibly appropriate usage of TRUNCATE depending upon constraints (EmpployeeRelationships is typically able to be TRUNCATEd since its primary key is usually a composite and not a foreign key in any other table).
Try this
DELETE FROM employee;
Inside a loop, run a command that deletes all rows with an unreferenced EmpID until there are zero rows left. There are a variety of ways to write that inner DELETE command:
DELETE FROM employee WHERE EmpID NOT IN (SELECT SeniorID FROM employee)
DELETE FROM employee e1 WHERE NOT EXISTS
(SELECT * FROM employee e2 WHERE e2.SeniorID = e.EmpID
and probably a third one using a JOIN, but I'm not familiar with the SQL Server syntax for that.
One solution is to normalize this by splitting out "senior" relationship into a separate table. For the sake of generality, make that second table "empID1|empID2|relationship_type".
Barring that, you need to do this in a loop. One way is to do it:
declare #count int
select #count=count(1) from table
while (#count > 0)
BEGIN
delete employee WHERE NOT EXISTS
(select 1 from employee 'e_senior'
where employee.EmpID=e_senior.SeniorID)
select #count=count(1) from table
END