How do I implement this multi-table database design/constraint, normalized? - sql

I have data that kinda looks like this...
Elements
Class | Synthetic ID (pk)
A | 2
A | 3
B | 4
B | 5
C | 6
C | 7
Elements_Xref
ID (pk) | Synthetic ID | Real ID (fk)
. | 2 | 77-8F <--- A class
. | 3 | 30-7D <--- A class
. | 6 | 21-2A <--- C class
. | 7 | 30-7D <--- C class
So I have these elements that are assigned synthetic IDs and are grouped into classes. But these synthetic IDs are then paired with Real IDs that we actually care about. There is also a constraint that a Real ID cannot recur in a single class. How can I capture all of this in one coherent design?
I don't want to jam the Real ID into the upper table because
It is nullable (there are periods where we don't know what the Real ID of something should be).
It's a foreign key to more data.
Obviously this could be done with triggers acting as constraints, but I'm wondering if this could be implemented with regular constraints/unique indexes. Using SQL Server 2005.
I've thought about having two main tables SyntheticByClass and RealByClass and then putting IDs of those tables into another xref/link table, but that still doesn't guarantee that the classes of both elements match. Also solvable via trigger.
Edit: This is keyword stuffing but I think it has to do with normalization.
Edit^2: As indicated in the comments below, I seem to have implied that foreign keys cannot be nullable. Which is false, they can! But what cannot be done is setting a unique index on fields where NULLs repeat. Although unique indexes support NULL values, they cannot constraint more than one NULL in a set. Since the Real ID assignment is initially sparse, multiple NULL Real IDs per class is more than likely.
Edit^3: Dropped the redundant Elements.ID column.
Edit^4: General observations. There seems to be three major approaches at work, one of which I already mentioned.
Triggers. Use a trigger as a constraint to break any data operations that would corrupt the integrity of the data.
Index a view that joins the tables. Fantastic, I had no idea you could do that with views and indexes.
Create a multi-column foreign key. Didn't think of doing this, didn't know it was possible. Add the Class field to the Xref table. Create a UNIQUE constraint on (Class + Real ID) and a foreign key constraint on (Class + Synthetic ID) back to the Elements table.

Comments from before the question was made into a 'bonus' question
What you'd like to be able to do is express that the join of Elements and Elements_Xref has a unique constraint on Class and Real ID. If you had a DBMS that supported SQL-92 ASSERTION constraints, you could do it.
AFAIK, no DBMS supports them, so you are stuck with using triggers.
It seems odd that the design does not constrain Real ID to be unique across classes; from the discussion, it seems that a given Real ID could be part of several different classes. Were the Real ID 'unique unless null', then you would be able to enforce the uniqueness more easily, if the DBMS supported the 'unique unless null' concept (most don't; I believe there is one that does, but I forget which it is).
Comments before edits made 2010-02-08
The question rules out 'jamming' the Real_ID in the upper table (Elements); it doesn't rule out including the Class in the lower table (Elements_Xref), which then allows you to create a unique index on Class and Real_ID in Elements_Xref, achieving (I believe) the required result.
It isn't clear from the sample data whether the synthetic ID in the Elements table is unique or whether it can repeat with different classes (or, indeed whether a synthetic ID can be repeated in a single class). Given that there seems to be an ID column (which presumably is unique) as well as the Synthetic ID column, it seems reasonable to suppose that sometimes the synthetic ID repeats - otherwise there are two unique columns in the table for no very good reason. For the most part, it doesn't matter - but it does affect the uniqueness constraint if the class is copied to the Elements_Xref table. One more possibility; maybe the Class is not needed in the Elements table at all; it should live only in the Elements_Xref table. We don't have enough information to tell whether this is a possibility.
Comments for changes made 2010-02-08
Now that the Elements table has the Synthetic ID as the primary key, things are somewhat easier. There's a comment that the 'Class' information actually is a 'month', but I'll try to ignore that.
In the Elements_Xref table, we have an unique ID column, and then a Synthetic ID (which is not marked as a foreign key to Elements, but presumably must actually be one), and the Real ID. We can see from the sample data that more than one Synthetic ID can map to a given Real ID. It is not clear why the Elements_Xref table has both the ID column and the Synthetic ID column.
We do not know whether a single Synthetic ID can only map to a single Real ID or whether it can map to several Real ID values.
Since the Synthetic ID is the primary key of Elements, we know that a single Synthetic ID corresponds to a single Class.
We don't know whether the mapping of Synthetic ID to Real ID varies over time (it might as Class is date-related), and whether the old state has to be remembered.
We can assume that the tables are reduced to the bare minimum and that there are other columns in each table, the contents of which are not directly material to the question.
The problem states that the Real ID is a foreign key to other data and can be NULL.
I can't see a perfectly non-redundant design that works.
I think that the Elements_Xref table should contain:
Synthetic ID
Class
Real ID
with (Synthetic ID, Class) as a 'foreign key' referencing Elements, and a NOT NULL constraint on Real ID, and a unique constraint on (Class, Real ID).
The Elements_Xref table only contains rows for which the Real ID is known - and correctly enforces the uniqueness constraint that is needed.
The weird bit is that the (Synthetic ID, Class) data in Elements_Xref must match the same columns in Elements, even though the Synthetic ID is the primary key of Elements.
In IBM Informix Dynamic Server, you can achieve this:
CREATE TABLE elements
(
class CHAR(1) NOT NULL,
synthetic_id SERIAL NOT NULL PRIMARY KEY,
UNIQUE(class, synthetic_id)
);
CREATE TABLE elements_xref
(
class CHAR(1) NOT NULL,
synthetic_id INTEGER NOT NULL REFERENCES elements(synthetic_id),
FOREIGN KEY (class, synthetic_id) REFERENCES elements(class, synthetic_id),
real_id CHAR(5) NOT NULL,
PRIMARY KEY (class, real_id)
);

I would:
Create a UNIQUE constraint on Elements(Synthetic ID, Class)
Add Class column to Elements_Xref
Add a FOREIGN KEY constraint on Elements_Xref table, referring to (Synthetic ID, Class)
At this point we know for sure that Elements_Xref.Class always matches Elements.Class.
Now we need to implement "unique when not null" logic. Follow the link and scroll to section "Use Computed Columns to Implement Complex Business Rules":
Indexes on Computed Columns: Speed Up Queries, Add Business Rules
Alternatively, you can create an indexed view on (Class, RealID) with WHERE RealID IS NOT NULL in its WHERE clause - that will also enforce "unique when not null" logic.

Create an indexed view for Elements_Xref with Where Real_Id Is Not Null and then create a unique index on that view
Create View Elements_Xref_View With SchemaBinding As
Select Elements.Class, Elements_Xref.Real_Id
From Elements_Xref
Inner Join Element On Elements.Synthetic_Id = Elements_Xref.Synthetic_Id
Where Real_Id Is Not Null
Go
Create Unique Clustered Index Elements_Xref_Unique_Index
On Elements_Xref_View (Class, Real_Id)
Go
This serves no other purpose other than simulating a unique index that treats nulls properly i.e. null != null

You can
Create a view from the the result set of joining Elements_Xref and Elements together on Synthetic ID
add a unique constraint on class, and [Real ID]. In other news, this is also how you do functional indexes in MSSQL, by indexing views.
Here is some sql:
CREATE VIEW unique_const_view AS
SELECT e.[Synthetic ID], e.Class, x.[Real ID]
FROM Elements AS e
JOIN [Elements_Xref] AS x
ON e.[Synthetic ID] = x.[Synthetic ID]
CREATE UNIQUE INDEX unique_const_view_index ON unique_const_view ( Class, [Real ID] );
Now, apparently, unbeknownst to myself this solution doesn't work in Microsoft-land-place because with MS SQL Server duplicate nulls will violate a UNIQUE constraint: this is against the SQL spec. This is where the problem is discussed about.
This is the Microsoft workaround:
create unique nonclustered index idx on dbo.DimCustomer(emailAddress)
where EmailAddress is not null;
Not sure if that is 2005, or just 2008.

I think a trigger is your best option. Constraints can't cross to other tables to get information. Same thing with a unique index (although I suppose a materialized view with an index might be possible), they are unique within the table. When you put the trigger together, remember to do it in a set-based fashion not row-by-row and test with a multi-row insert and multi-row update where the real key is repeated in the dataset.

I don't think either of your two reasons are an obstacle to putting Real ID in Elements. If a given element has 0 or 1 Real IDs (but never more than 1), it should absolutely be in the Elements table. This would then allow you to constrain uniqueness within Class (I think).
Could you expand on your two reasons not to do this?

Create a new table real_elements with fields Real ID, Class and Synthetic ID with a primary key of Class, RealId and add elements when you actually add a RealID
This constrains Real IDs to be unique for a class and gives you a way to match a class and real ID to the synthetic ID
As for Real ID being a foreign key do you mean that if it is in two classes then the data keyed off it will be the same. If so the add another table with key Real Id. This key is then a foreign key into real_elements and any other table needing real ID as foreign key

Related

Which column for foreign key: id or any other column and why?

TL;DR
Should a foreign key always refer to the id column of another table? Why or why not? Is there a standard rule for this?
Is there a cost associated with using any other unique column other than id column for foreign key? Performance / storage? How significant? Is it frowned in the industry?
Example: this is the schema for my sample problem:
In my schema sometimes I use id column as the foreign key and sometimes use some other data column.
In vehicle_detail table I use a unique size column as a foreign key from vehicle_size table and unique color column as the foreign key from vehicle_color table.
But in vehicle_user I used the user_identifier_id as a foreign key which refers to id primary key column in user_identifier table.
Which is the correct way?
On a side note, I don't have id columns for the garage_level, garage_spaceid, vehicle_garage_status and vehicle_parking_status tables because they only have one column which is the primary key and the data they store is just at most 15 rows in each table and it is probably never going to change. Should I still have an id column in those ?
A foreign key has to target a primary key or unique constraint. It is normal to reference the primary key, because you typically want to reference an individual row in another table, and the primary key is the identifier of a table row.
From a technical point of view, it does not matter whether a foreign key references the primary key or another unique constraint, because in PostgreSQL both are implemented in the same way, using a unique index.
As to your concrete examples, there is nothing wrong with having the unique size column of vehicle_size be the target of a foreign key, although it begs the question why you didn't make size the primary key and omit the id column altogether. There is no need for each table to have an id column that is the automatically generated numeric primary key, except that there may be ORMs and other software that expect that.
A foreign key is basically a column of a different table(it is always of a different table, since that is the role it serves). It is used to join/ get data from a different table. Think of it like say school is a database and there are many different table for different aspects of student.
say by using Admission number 1234, from accounts table you can get the fees and sports table you can get the sports he play.
Now there is no rule that foreign key should be id column, you can keep it whatever you want. But,to use foreign key you should have a matching column in both tables therefore usually id column is only used. As I stated in the above example the only common thing in say sports table and accounts table would be admission number.
admn_no | sports |
+---------+------------+
| 1234 | basketball
+---------+---------+
| admn_no | fees |
+---------+---------+
| 1234 | 1000000 |
+---------+---------+
Now say using the query\
select * from accounts join sports using (admn_no);
you will get:
+---------+---------+------------+
| admn_no | fees | sports |
+---------+---------+------------+
| 1234 | 1000000 | basketball |
+---------+---------+------------+
PS: sorry for bad formatting
A foreign key is a field or a column that is used to establish a link between two tables. A FOREIGN KEY is a column (or collection of columns) in one table, that refers to the PRIMARY KEY in another table.
There is no rule that it should refer to a id column but the column it refers to should be the primary key. In real scenarios, it usually refers to Id column as in most cases it is the primary key in the tables.
OP question is about "correct way".
I will try to provide some kind of summary from existing comments and answers, general DO and general DONT for FKs.
What was already said
A. "A foreign key has to target a primary key or unique constraint"
Literally from Laurenz Albe answer and it was noted in comments
B. "stick with whatever you think will change the least"
It was noted by Adrian Klavier in comments.
Notes
There is no such general rule that PK or unique constraint must be defined on a single column.
So the question title itself must be corrected: "Which column(s) for foreign key: id or any other column(s) and why?"
Let's talk about "why".
Why: General DO, general DONT and an advice
Is there a cost associated with using any other unique column other than id column for foreign key? Performance / storage? How significant? Is it frowned in the industry?
General DO: Analyze requirements, use logic, use math (arithmetics is enough usually). There is no a single database design that's always good for all cases. Always ask yourself: "Can it be improved?". Never be content with design of existing FKs, if requirements changed or DBMS changed or storage options changed - revise design.
General DONT: Don't think that there is a single correct rule for all cases. Don't think: "if that worked in that database/table than it will work for this case too".
Let me illustrate this points with a common example.
Example: PK on id uuid field
We look into database and see a table has a unique constraint on two fields of types integer (4 bytes) + date (4 bytes)
Additionally: this table has a field id of uuid type (16 bytes)
PK is defined on id
All FKs from other tables are targeting id field
It this a correct design or not?
Case A. Common case - not OK
Let's use math:
unique constraint on int+date: it's 4+4=8 bytes
data is never changed
so it's a good candidate for primary key in this table
and nothing prevents to use it for foreign keys in related tables
So it looks like additional 16 bytes per each row + indexes costs is a mistake.
And that's a very common mistake especially in combination of MSSQL + CLUSTERED indexes on random uuids
Is it always a mistake?
No.
Consider latter cases.
Case B. Distributed system - OK
Suppose that you have a distributed system:
ServerA, ServerB, ServerC are sources of data
HeadServer - is data aggregator
data on serverA-ServerC could be duplicated: the same record could exists on several instances
aggregated data must not have duplicates
data for related tables can come from different instances: data for table with PK from serverA and data for tables with FKs from serverB-serverC
you need to log from where each record is originated
In such case existence of PK on id uuid is justified:
unique constraint allows to deduplicate records
surrogate key allows related data come from different sources
Case C. 'id' is used to expose data through API - OK
Suppose that you have an API to access data for external consumers.
There is a good unique constraint on:
client_id: incrementing integer in range 1..100000
invoice_date: dates '20100101'..'20210901'
And a surrogate key on id with random uuids.
You can create external API in forms:
/server/invoice/{client_id}/{invoice_date}
/server/invoice/{id}
From security POV /{id} is superior by reasons:
it's impossible to deduce from one uuid value existence of other
it's easier to implement authorization system for entities of different types. E.g. entityA has natural key on int, entityB on bigint' and entityC on int+ byte+date`
In such case surrogate key not only justified but becames essential.
Afterword
I hope that I was clear in explanation of main correct principle: "There is no such thing as a universal correct principle".
An additional advice: avoid CASCADE UPDATE/DELETEs:
Although it depends on DBMS you use.
But in general :
"explicit is better than implicit"
CASCADEs rarely works as intended
when CASCADES works - usually they have performance problems
Thank you for your attention.
I hope this helps somebody.

Is it fine to have one table with foreign keys to many other tables?

Let's say that I have 30 tables in my database which all have a one-to-many relationship to a Notations table:
What is another way to set up the Notations table (other than containing a foreign key column for each of its "parents") that would be less taxing to the query's performance? The table is expected to grow to be very large.
(As each Notation contains a single parent_table_id with 29 null values for the remaining parent_tables - that is 29 nulls to track for every row, it seems like a lot)
ie.
SELECT text, table1_id, table2_id, ..., table30_id FROM Notations would show:
id text table1_id table2_id // table30_id
0 "buzz" 74 null // null
1 "foo" null 45 // null
2 "bar" 22 null // null
3 "fizz" 22 null // null
4 "hello" 28 null // null
5 "world" null null // 3
...etc
You are correct that storing 30 columns where one is not NULL is inefficient -- those NULL values typically occupy space in each row.
Assuming all the ids are the same type, you can simplify the structure to use:
table_name
table_id
Just two columns. Unfortunately, I am not aware of any database that allows "conditional" foreign key relationships. Although I advocate defining foreign key relationships this might be a case where you choose not to have them.
That is not fully satisfying. Although you can ensure some integrity using triggers, this lacks the "cascading" features of foreign keys. You can implement that with more triggers. Yuck!
One alternative is a separate notations table for each table:
notations_table1
text, id
notations_table2
text, id
. . .
You can then bring these together using union all:
create view notations as (
select text, id, 'table1' as table_name from notaions_table1 union all
select text, id, 'table2' as table_name from notaions_table2 union all
. . .
The underlying tables can then have properly formatted foreign key constraints -- and even cascading ones. Unfortunately, this lacks a single column as an id. The "real" id is a combination of table_name and id.
Some databases have other mechanisms that might be able to help. For instance, Postgres supports a form of inheritance that might be useful in this context.
You shouldn't have a notations table
The point of a primary-foreign key pair is that the value of the foreign key is the same as the value of the primary. A notations table does not make sense. Normalisation is an extremely important aspects of databases. We want to reduce redundant and replicated data. We don't want repetition! A notations table would introduce redundant data. In this case redundant data can refer to data we can calculate with existing data. All tables to be normalised must only contain data that relates to itself and not be calculable.
We can calculate each link due to check if foreign key field are filled and the key exists in the related table. Hence these links do not suit being in a separate table. As you mention this would be an enormous table and that's for the very reason it is only holding redundant data. There is no need for a notations table. Rather if you need to find links then just write a query to check for matches between primary and foreign keys. That is the reason why databases use foreign keys.
A notations table is only appropriate if we did not have any foreign keys. We would then require a table to tell us how each record relates to another record in a different table. But since you do have foreign keys then this table is causing repeated and redundant data.

How to reference a composite primary key into a single field?

I got this composite primary key in Table 1:
Table 1: Applicant
CreationDate PK
FamilyId PK
MemberId PK
I need to create a foreign key in Table 2 to reference this composite key. But i do not want to create three fields in Table 2 but to concatenate them in a single field.
Table 2: Sales
SalesId int,
ApplicantId -- This should be "CreationDate-FamilyId-MemberId"
What are the possible ways to achieve this ?
Note: I know i can create another field in Table 1 with the three columns concatenation but then i will have redundant info
What you're asking for is tantamount to saying "I want to treat three pieces of information as one piece of information without explicitly making it one piece of information". Which is to say that it's not possible.
That said, there are ways to make happen what you want to happen
Create a surrogate key (i.e. identity column) and use that as the FK reference
Create a computed column that is the concatenation of the three columns and use that as the FK reference
All else being equal (ease of implementation, politics, etc), I'd prefer the first. What you have is really a natural key and doesn't make a good PK if it's going to be referenced externally. Which isn't to say that you can't enforce uniqueness with a unique key; you can and should.

SQL sub-types with overlapping child tables

Consider the problem above where the 'CommonChild' entity can be a child of either sub-type A or B, but not C. How would I go about designing the physical model in a relational [SQL] database?
Ideally, the solution would allow...
for an identifying relationship between CommonChild and it's related sub-type.
a 1:N relationship.
Possible Solutions
Add an additional sub-type to the super-type and move sub-type A and B under the new sub-type. The CommonChild can then have a FK constraint on the newly created sub-type. Works for the above, but not if an additional entity is added which can have a relationship with sub-type A and C, but not B.
Add a FK constraint between the CommonChild and SuperType. Use a trigger or check constraint (w/ UDF) against the super-type's discriminator before allowing a new tuple into CommonChild. Seems straight forward, but now CommonChild almost seems like new subtype itself (which it is not).
My model is fundamentally flawed. Remodel and the problem should go away.
I'm looking for other possible solutions or confirmation of one of the above solutions I've already proposed.
Thanks!
EDIT
I'm going to implement the exclusive foreign key solution provided by Branko Dimitrijevic (see accepted answer).
I am going to make a slight modifications in this case as:
the super-type, sub-type, and "CommonChild" all have the same PKs and;
the PKs are 3 column composites.
The modification is to to create an intermediate table whose sole role is to enforce the exclusive FK constraint between the sub-types and the "CommonChild" (exact model provided by Dimitrijevic minus the "CommonChild's" attributes.). The CommonChild's PK will have a normal FK constraint to the intermediate table.
This will prevent the "CommonChild" from having 2 sets of 3 column composite FKs. Plus, since the identifying relationship is maintained from super-type to "CommonChild", [read] queries can effectively ignore the intermediate table altogether.
Looks like you need a variation of exclusive foreign keys:
CREATE TABLE CommonChild (
Id AS COALESCE(SubTypeAId, SubTypeBId) PERSISTED PRIMARY KEY,
SubTypeAId int REFERENCES SubTypeA (SuperId),
SubTypeBId int REFERENCES SubTypeB (SuperId),
Attr6 varchar,
CHECK (
(SubTypeAId IS NOT NULL AND SubTypeBId IS NULL)
OR (SubTypeAId IS NULL AND SubTypeBId IS NOT NULL)
)
);
There are couple of thing to note here:
There are two NULL-able FOREIGN KEYs.
There is a CHECK that allows exactly one of these FKs be non-NULL.
There is a computed column Id which equals one of the FKs (whichever is currently non-NULL) which is also a PRIMARY KEY. This ensures that:
One parent cannot have multiple children.
A "grandchild" table can reference the CommonChild.Id directly from its FK. The SuperType.Id is effectively popagated all the way down.
We don't have to mess with NULL-able UNIQUE constraints, which are problematic in MS SQL Server (see below).
A DBMS-agnostic way of of doing something similar would be...
CREATE TABLE CommonChild (
Id int PRIMARY KEY,
SubTypeAId int UNIQUE REFERENCES SubTypeA (SuperId),
SubTypeBId int UNIQUE REFERENCES SubTypeB (SuperId),
Attr6 varchar,
CHECK (
(SubTypeAId IS NOT NULL AND SubTypeAId = Id AND SubTypeBId IS NULL)
OR (SubTypeAId IS NULL AND SubTypeBId IS NOT NULL AND SubTypeBId = Id)
)
)
Unfortunately a UNIQUE column containing more than one NULL is not allowed by MS SQL Server, which is not the case in most DBMSes. However, you can just omit the UNIQUE constraint if you don't want to reference SubTypeAId or SubTypeBId directly.
Wondering what am I missing here?
Admittedly, it is hard without having the wording of the specific problem, but things do feel a bit upside-down.

ORACLE Table design: M:N table best practice

I'd like to hear your suggestions on this very basic question:
Imagine these three tables:
--DROP TABLE a_to_b;
--DROP TABLE a;
--DROP TABLE b;
CREATE TABLE A
(
ID NUMBER NOT NULL ,
NAME VARCHAR2(20) NOT NULL ,
CONSTRAINT A_PK PRIMARY KEY ( ID ) ENABLE
);
CREATE TABLE B
(
ID NUMBER NOT NULL ,
NAME VARCHAR2(20) NOT NULL ,
CONSTRAINT B_PK PRIMARY KEY ( ID ) ENABLE
);
CREATE TABLE A_TO_B
(
id NUMBER NOT NULL,
a_id NUMBER NOT NULL,
b_id NUMBER NOT NULL,
somevalue1 VARCHAR2(20) NOT NULL,
somevalue2 VARCHAR2(20) NOT NULL,
somevalue3 VARCHAR2(20) NOT NULL
) ;
How would you design table a_to_b?
I'll give some discussion starters:
synthetic id-PK column or combined a_id,b_id-PK (dropping the "id" column)
When synthetic: What other indices/constraints?
When combined: Also index on b_id? Or even b_id,a_id (don't think so)?
Also combined when these entries are referenced themselves?
Also combined when these entries perhaps are referenced themselves in the future?
Heap or Index-organized table
Always or only up to x "somevalue"-columns?
I know that the decision for one of the designs is closely related to the question how the table will be used (read/write ratio, density, etc.), but perhaps we get a 20/80 solution as blueprint for future readers.
I'm looking forward to your ideas!
Blama
I have always made the PK be the combination of the two FKs, a_id and b_id in your example. Adding a synthetic id field to this table does no good, since you never end up looking for a row based on a knowledge of its id.
Using the compound PK gives you a constraint that prevents the same instance of the relationship between a and b from being inserted twice. If duplicate entries need to be permitted, there's something wrong with your data model at the conceptual level.
The index you get behind the scenes (for every DBMS I know of) will be useful to speed up common joins. An extra index on b_id is sometimes useful, depending on the kinds of joins you do frequently.
Just as a side note, I don't use the name "id" for all my synthetic pk columns. I prefer a_id, b_id. It makes it easier to manage the metadata, even though it's a little extra typing.
CREATE TABLE A_TO_B
(
a_id NUMBER NOT NULL REFERENCES A (a_id),
b_id NUMBER NOT NULL REFERENCES B (b_id),
PRIMARY KEY (a_id, b_id),
...
) ;
It's not unusual for ORMs to require (or, in more clueful ORMs, hope for) an integer column named "id" in addition to whatever other keys you have. Apart from that, there's no need for it. An id number like that makes the table wider (which usually degrades I/O performance just slightly), and adds an index that is, strictly speaking, unnecessary. It isn't necessary to identify the entity--the existing key does that--and it leads new developers into bad habits. (Specifically, giving every table an integer column named "id", and believing that that column alone is the only key you need.)
You're likely to need one or more of these indexed.
a_id
b_id
{a_id, b_id}
{b_id, a_id}
I believe Oracle should automatically index {a_id, b_id}, because that's the primary key. Oracle doesn't automatically index foreign keys. Oracle's indexing guidelines are online.
In general, you need to think carefully about whether you need ON UPDATE CASCADE or ON DELETE CASCADE. In Oracle, you only need to think carefully about whether you need ON DELETE CASCADE. (Oracle doesn't support ON UPDATE CASCADE.)
the other comments so far are good.
also consider adding begin_dt and end_dt to the relationship. in this way, you can manage a good number of questions about each relationship through time. (consider baseline issues)