Multiple-reference foreign keys in table definition?

Summary
How do I make it easy for non-programmers to write a query such as the following?
select
table_name.*
, foreign_table_1.name
, foreign_table_2.name
from
table_name
left outer join foreign_table foreign_table_1 on foreign_table_1.id = table_name.foreign_key_1_id
left outer join foreign_table foreign_table_2 on foreign_table_2.id = table_name.foreign_key_2_id
;
Context
I have a situation like this:
create table table_name (
id integer primary key autoincrement not null
, foreign_key_1_id integer not null
, foreign_key_2_id integer not null
, some_other_column varchar(255) null
);
create table foreign_table (
id integer primary key autoincrement not null
, name varchar(255) null
);
...in which both foreign_key_1_id and foreign_key_2_id reference foreign_table. (Obviously, this is simplified and abstracted.) To query and get the respective values, I might do something like this:
select
table_name.*
, foreign_table_1.name
, foreign_table_2.name
from
table_name
left outer join foreign_table foreign_table_1 on foreign_table_1.id = table_name.foreign_key_1_id
left outer join foreign_table foreign_table_2 on foreign_table_2.id = table_name.foreign_key_2_id
;
(That is, alias foreign_table in the join to link things up correctly.) This works fine. However, some of my clients want to use SQL Maestro to query the tables. This program uses the foreign key information to link up tables using a fairly straightforward interface ("Visual Query Builder"). For instance, the user can pick multiple tables and SQL Maestro will fill in the joins, like seen here:
(source: sqlmaestro.com)
(That's a diagram from their website, just for illustration.)
This strategy works great as long as a table has only one foreign key to any given table. The multiple-reference situation seems to confuse it, because it generates SQL like this:
SELECT
table_name.some_other_column,
foreign_table.name
FROM
table_name
INNER JOIN foreign_table ON (table_name.foreign_key_1_id = foreign_table.id)
AND (table_name.foreign_key_2_id = foreign_table.id)
...because the foreign keys are defined as follows:
create table table_name (
id integer primary key autoincrement not null
, foreign_key_1_id integer not null
, foreign_key_2_id integer not null
, some_other_column varchar(255) null
---------------------------
-- The part that changed:
---------------------------
, foreign key (foreign_key_1_id) references foreign_table(id)
, foreign key (foreign_key_2_id) references foreign_table(id)
);
create table foreign_table (
id integer primary key autoincrement not null
, name varchar(255) not null
);
That's a problem because you only get 1 foreign_table.name value back, whereas there are often 2 separate values.
Question
How would I go about defining foreign keys to handle this situation? (Is it possible, and does it make sense to do so? I wouldn't expect it to make a big difference in constraint checking, which may be why I can't find any information on it.) My end goal is to make querying this information easy for my clients to do by themselves, and although this situation doesn't happen every day, it's time-consuming and frustrating to have to help people through it every time it comes up.
If there isn't a way of solving my foreign key problem this way, can you suggest any alternatives? I already have some ways of getting people this information through views, but people often need more flexibility than that.

It seems to me that your definition is just fine, and SQL Maestro is incorrectly interpreting two foreign keys to the same table as the same foreign key, so I would alert them to that fact so that they can fix it.
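To check that the schema itself is sound, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows ('alpha', 'beta') are made up for illustration. It compares the hand-written aliased joins with the shape of the query the tool generated:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
create table foreign_table (
    id integer primary key autoincrement not null
  , name varchar(255) not null
);
create table table_name (
    id integer primary key autoincrement not null
  , foreign_key_1_id integer not null
  , foreign_key_2_id integer not null
  , some_other_column varchar(255) null
  , foreign key (foreign_key_1_id) references foreign_table(id)
  , foreign key (foreign_key_2_id) references foreign_table(id)
);
-- made-up sample data
insert into foreign_table (name) values ('alpha'), ('beta');
insert into table_name (foreign_key_1_id, foreign_key_2_id, some_other_column)
values (1, 2, 'row with two different references');
""")

# One alias per foreign key: each join resolves its own column.
aliased = conn.execute("""
    select t.some_other_column, f1.name, f2.name
    from table_name t
    left outer join foreign_table f1 on f1.id = t.foreign_key_1_id
    left outer join foreign_table f2 on f2.id = t.foreign_key_2_id
""").fetchall()

# Shape of the generated query: a single join with both conditions ANDed.
merged = conn.execute("""
    select t.some_other_column, foreign_table.name
    from table_name t
    inner join foreign_table
        on (t.foreign_key_1_id = foreign_table.id)
       and (t.foreign_key_2_id = foreign_table.id)
""").fetchall()

print(aliased)  # [('row with two different references', 'alpha', 'beta')]
print(merged)   # []
```

With two different key values, the single-join form matches only rows where both keys point at the same foreign_table row, so here it returns nothing at all. The two declared foreign keys are accepted without complaint, which supports the view that the problem lies in the query builder rather than in the table definition.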

Related

Inner join performance

I have a table that has a lot of foreign keys that I will need to inner join so I can search. There might be upwards of 10 of them, meaning I'd have to do 10 inner joins. Each of the tables being joined may only be a few rows, compared to the massive (millions of rows) table that I am joining them with.
I just need to know if the joins are a fast way (using only Postgres) to do this, or if there might be a more clever way I can do it using subqueries or something.
Here is some made up data as an example:
create table a (
a_id serial primary key,
name character varying(32)
);
create table b (
b_id serial primary key,
name character varying(32)
);
--just 2 tables for simplicity, but i really need like 10 or more
create table big_table (
big_id serial primary key,
a_id int references a(a_id),
b_id int references b(b_id)
);
--filter big_table based on the name column of a and b
--big_table only contains fks to a and b, so in this example im using
--left joins so i can compare by the name column
select big_id,a.name,b.name from big_table
left join a using (a_id)
left join b using (b_id)
where (? is null or a.name=?) and (? is null or b.name=?);
Basically, joins are a fast way. Which way might be the fastest depends on the exact requirements. A couple of hints:
The purpose of your WHERE clause is unclear. It seems you intend to join to all look-up tables and include a condition for each, while you only actually need some of them. That's inefficient. Rather use dynamic-sql and only include in the query what you actually need.
With the current query, since all of your fk columns in the main table can be NULL, you must use LEFT JOIN instead of JOIN or you will exclude rows with NULL values in the fk columns.
The name columns in the look-up tables should certainly be defined NOT NULL. And I would not use the non-descriptive column name "name", that's an unhelpful naming convention. I also would use text instead of varchar(32).
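The dynamic SQL hint could be sketched roughly as follows. This is only an illustration: `build_query` and the `LOOKUPS` whitelist are hypothetical names, the sample rows are made up, and it runs against SQLite through Python's sqlite3 module so it is self-contained (with a Postgres driver the placeholder style would likely differ).

```python
import sqlite3

# Hypothetical whitelist: filter key -> (lookup table, key column shared with big_table).
LOOKUPS = {"a": ("a", "a_id"), "b": ("b", "b_id")}

def build_query(filters):
    """Return (sql, params), joining only the lookup tables that are filtered on."""
    joins, conditions, params = [], [], []
    for key, value in filters.items():
        if value is None:
            continue  # no filter requested, so the join is skipped entirely
        table, column = LOOKUPS[key]  # identifiers come from the whitelist, never from user input
        joins.append(f"join {table} using ({column})")
        conditions.append(f"{table}.name = ?")
        params.append(value)
    sql = "select big_id from big_table " + " ".join(joins)
    if conditions:
        sql += " where " + " and ".join(conditions)
    return sql, params

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table a (a_id integer primary key, name text not null);
create table b (b_id integer primary key, name text not null);
create table big_table (
    big_id integer primary key,
    a_id int references a(a_id),
    b_id int references b(b_id)
);
-- made-up sample data
insert into a values (1, 'red'), (2, 'blue');
insert into b values (1, 'small'), (2, 'large');
insert into big_table values (1, 1, 1), (2, 1, 2), (3, 2, NULL);
""")

sql, params = build_query({"a": "red", "b": None})
rows = conn.execute(sql + " order by big_id", params).fetchall()
print(sql)   # select big_id from big_table join a using (a_id) where a.name = ?
print(rows)  # [(1,), (2,)]
```

A plain JOIN is enough here because a table is only joined when it is filtered on, and a row with NULL in that fk column could not satisfy the filter anyway.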

SQL query, search data and return project name

Hey I'm looking for some help in creating a stored procedure.
Here are the details
I have a table called Partners which holds the partner information (columns: PartnerID and partnername). I also have another table called ProjectPartners which holds the link between the project and the partners (columns: PPID, Partner1, partner2, partner3....partner25), and a further table called ProjectDetails which holds the information on the project (columns: ProjectDID, Project). The foreign key for ProjectPartners is within ProjectDetails.
I'm looking to create a stored procedure that allows me to enter a partner name, this then displays the projects they are included within. I already have some mock code but it doesn't seem to work.
@partnername nvarchar(50)
AS
SET NOCOUNT ON;
SELECT ProjectDID, Project
FROM Projectdetails
WHERE Partners.PartnerName = @partnername
Any help will be much appreciated
You are missing the joins through your table schema to get the necessary data.
Take a read of this MSDN article about joins.
select ProjectDetails.ProjectDID, ProjectDetails.Project
from ProjectDetails
join ProjectPartners on ProjectPartners.ProjectDID = ProjectDetails.ProjectDID
join Partners on Partners.PartnerId = ProjectPartners.PPID
where Partners.PartnerName = @partnerName
You haven't described the relationship between ProjectPartners and Partners, so I am assuming that the PPID column on ProjectPartners is the relationship.
You have also mentioned that your ProjectPartners table has the columns PPID, Partner1, partner2, partner3....partner25. Are you only planning on having 25 partners? If you have 26, will you add a new column? You might want to address that.
Also in column naming conventions, some are a bit muddled.
You have PPID on ProjectPartners. I presume this means ProjectPartnersId.
On the table ProjectDetails you have the column ProjectDID.
This is slightly inconsistent. I guess it should either be PDID on ProjectDetails or ProjectPID on ProjectPartners
Personally, I have always had a preference for plain old Id as my identity column.
UPDATE:
Based on your comments below, it sounds like you might have something a little fundamental wrong with your tables:
create table Partners (
Id int not null primary key identity,
PartnerName nvarchar(100) not null)
go
create table ProjectDetails(
Id int not null primary key identity,
Project nvarchar(100) not null)
go
create table ProjectPartners (
PartnersId int not null,
ProjectDetailsId int not null
)
go
alter table ProjectPartners add constraint FK_ProjectPartners_PartnersId_Partners_Id foreign key (PartnersId) references Partners(Id)
alter table ProjectPartners add constraint FK_ProjectPartners_ProjectDetailsId_ProjectDetails_Id foreign key (ProjectDetailsId) references ProjectDetails(Id)
go
I would suggest changing your database schema to one that is a bit more flexible as per the one provided above.
This will prevent the ever growing ProjectPartners table by adding a new column each time you have a new partner.
It will fix all issues with your foreign keys and make your tables a bit more intuitive.
This would now yield the SQL:
select ProjectDetails.Project, ProjectDetails.Id
from ProjectDetails
join ProjectPartners on ProjectPartners.ProjectDetailsId = ProjectDetails.Id
join Partners on Partners.Id = ProjectPartners.PartnersId
where Partners.PartnerName = @partnerName
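As a quick way to try the junction-table design, here is a sketch using Python's sqlite3 module instead of SQL Server. The function `projects_for` is a hypothetical stand-in for the stored procedure, the composite primary key on ProjectPartners is an added assumption (one link per partner/project pair), and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
create table Partners (Id integer primary key, PartnerName text not null);
create table ProjectDetails (Id integer primary key, Project text not null);
create table ProjectPartners (
    PartnersId integer not null references Partners(Id),
    ProjectDetailsId integer not null references ProjectDetails(Id),
    primary key (PartnersId, ProjectDetailsId)  -- assumption: one link per pair
);
-- made-up sample data
insert into Partners values (1, 'Acme'), (2, 'Globex');
insert into ProjectDetails values (10, 'Bridge'), (20, 'Tunnel');
insert into ProjectPartners values (1, 10), (1, 20), (2, 20);
""")

def projects_for(partner_name):
    """Stand-in for the stored procedure: projects a named partner takes part in."""
    return conn.execute("""
        select ProjectDetails.Project, ProjectDetails.Id
        from ProjectDetails
        join ProjectPartners on ProjectPartners.ProjectDetailsId = ProjectDetails.Id
        join Partners on Partners.Id = ProjectPartners.PartnersId
        where Partners.PartnerName = ?
        order by ProjectDetails.Id
    """, (partner_name,)).fetchall()

print(projects_for("Acme"))    # [('Bridge', 10), ('Tunnel', 20)]
print(projects_for("Globex"))  # [('Tunnel', 20)]
```

Adding a 26th partner to a project is then just one more row in ProjectPartners, with no change to the table definition.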

SQL Relation and Query

I am trying to create a database that contains two tables. I have included the create_tables.sql code in case it helps. I am trying to set up the relationship so that STKEY is the defining key, so that a query can search for the key and show what issues this student has been having. At the moment when I search using:
SELECT *
FROM student, student_log
WHERE 'tilbun' like student.stkey
It shows all the issues in the table regardless of the STKEY. I think I may have the foreign key set incorrectly. I have included the create_tables.sql here.
CREATE TABLE `student`
(
`STKEY` VARCHAR(10),
`first_name` VARCHAR(15),
`surname` VARCHAR(15),
`year_group` VARCHAR(4),
PRIMARY KEY (STKEY)
)
;
CREATE TABLE `student_log`
(
`issue_number` int NOT NULL AUTO_INCREMENT,
`STKEY` VARCHAR(10),
`date_field` DATETIME,
`issue` VARCHAR(150),
PRIMARY KEY (issue_number),
INDEX (STKEY),
FOREIGN KEY (STKEY) REFERENCES student (STKEY)
)
;
Cheers for the help.
Though you have correctly defined the foreign key relationship in the tables, you must still specify a join condition when performing the query. Otherwise, you'll get a cartesian product of the two tables (all rows of one times all rows of the other).
SELECT
student.*,
student_log.*
FROM student INNER JOIN student_log ON student.STKEY = student_log.STKEY
WHERE student.STKEY LIKE 'tilbun'
And note that rather than using an implicit join (comma-separated list of tables), I have used an explicit INNER JOIN, which is the preferred modern syntax.
Finally, there's little point in using LIKE instead of = unless you also use wildcard characters:
WHERE student.STKEY LIKE '%tilbun%'
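To make the cartesian product visible, here is a small sketch using Python's sqlite3 module with a trimmed-down version of the two tables. The sample students and issues are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table student (STKEY text primary key, first_name text);
create table student_log (
    issue_number integer primary key autoincrement,
    STKEY text references student (STKEY),
    issue text
);
-- made-up sample data
insert into student values ('tilbun', 'Tilly'), ('smijoh', 'John');
insert into student_log (STKEY, issue) values
    ('tilbun', 'late'), ('smijoh', 'absent'), ('smijoh', 'no homework');
""")

# Comma join without a join condition: the student row pairs with every log row.
cross = conn.execute("""
    select student.STKEY, student_log.issue
    from student, student_log
    where student.STKEY = 'tilbun'
""").fetchall()

# Explicit join condition: only the log rows that belong to the student.
joined = conn.execute("""
    select student.STKEY, student_log.issue
    from student inner join student_log on student.STKEY = student_log.STKEY
    where student.STKEY = 'tilbun'
""").fetchall()

print(len(cross), len(joined))  # 3 1
```

The first query returns all three issues against 'tilbun', including the two that belong to another student, which is the symptom described in the question. Declaring the foreign key does not change this; it constrains what can be stored, not how rows are matched in a query.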

Should you use single table inheritance or multiple tables that are union-ed in a view?

Let's say you have a notes table. The note can be about a particular account, orderline or order.
Notes that are about the account do not apply to any specific orderline or order.
Notes that are about the orderline also apply to the parent order and the account that is attached to the order.
Notes that are on the order also apply to the attached account but not the orderline.
NOTES table
[Id] [int] IDENTITY(1,1) NOT NULL
[NoteTypeId] [smallint] NOT NULL
[AccountId] [int] NULL
[OrderId] [int] NULL
[OrderLineId] [int] NULL,
[Note] [varchar](300) NOT NULL
The idea is that if I view a client I can see all notes that are in some way related. Initially I created a notes table for each of the above and union-ed them in a view.
The problem here comes with editing/deleting a record. Notes can be edited/deleted on the particular item or in a generic notes view of the account or order. This method made that more difficult.
Then I switched to the Single Table Inheritance pattern. My notes table has nullable values for AccountId, OrderId and OrderLineId. I also added the NoteTypeId to identify the record explicitly. Much easier to manage update/delete scenarios.
I have some problems & questions still with this approach.
Integrity - Although complex constraints can be set in SQL and/or in code, most DBAs would not like the STI approach.
The idea of a bunch of nulls is debated (although I believe performance in SQL 2008 has improved based on the storage of null values).
A table in a RDBMS does not have to represent an object in code. Normalization in a table doesn't say that the table has to be a unique object. I believe the two previous sentences to be true, what say you?
Discussed some here.
Is an overuse of nullable columns in a database a "code smell"? I'd have to say I agree with Ian but I'd like some opposite views as well.
Although complex constraints can be set in SQL and/or in code, most DBAs would not like the STI approach.
Because you need additional logic (CHECK constraint or trigger) to implement the business rule that a note refers to only one of the entities - account, order, orderline.
It's more scalable to implement a many-to-many table between each entity and the note table.
There's no need for an ALTER TABLE statement to add yet another nullable foreign key (there is a column limit, not that most are likely to reach it)
A single note record can be associated with multiple entities
No impact to existing records if a new entity & many-to-many table is added
It seems the STI would work OK in your case. If I read your requirements correctly, the entity inheritance would be a chain:
Note <- AccountNote(AccountId) <- AccountAndOrderNote(OrderId) <-AccountAndOrderAndOrderLineNote (OrderLineId)
Integrity:
Surely not an issue? Each of AccountId, OrderId and OrderLineId can be FK'd to their respective tables (or be NULL)
If, on the other hand, you removed AccountId, OrderId and OrderLineId (I'm NOT recommending this, BTW!) and instead just had ObjectId and NoteTypeId, then you couldn't add RI and would have a really messy CASE WHEN type join.
Performance:
Since you say that AccountId must always be present, I guess it could be non-null, and since an OrderLine cannot exist without an Order, an index on (AccountId, OrderId) or (AccountId, OrderId, OrderLineId) seems to make sense (depending on selectivity vs. narrowness tradeoffs given the average number of OrderLines per Order).
But OMG Ponies is right about messy ALTER TABLEs to extend this to new note types, and the indexing will cause headaches if new notes aren't Account-derived.
HTH
Initially I [created a] separate notes table for each of the above and union-ed them in a view.
This makes me wonder if you've considered using multi-table structure without NULLable columns where each note gets a unique ID regardless of type. You could present the data in the 'single table inheritance' (or similar) in a query without using UNION.
Below is a suggested structure. I've changed NoteTypeId to a VARCHAR to make the different types clearer and easier to read (you didn't enumerate the INTEGER values anyhow):
CREATE TABLE Notes
(
Id INTEGER IDENTITY(1,1) NOT NULL UNIQUE,
NoteType VARCHAR(11) NOT NULL
CHECK (NoteType IN ('Account', 'Order', 'Order line')),
Note VARCHAR(300) NOT NULL,
UNIQUE (Id, NoteType)
);
CREATE TABLE AccountNotes
(
Id INTEGER NOT NULL UNIQUE,
NoteType VARCHAR(11)
DEFAULT 'Account'
NOT NULL
CHECK (NoteType = 'Account'),
FOREIGN KEY (Id, NoteType)
REFERENCES Notes (Id, NoteType)
ON DELETE CASCADE,
AccountId INTEGER NOT NULL
REFERENCES Accounts (AccountId)
);
CREATE TABLE OrderNotes
(
Id INTEGER NOT NULL UNIQUE,
NoteType VARCHAR(11)
DEFAULT 'Order'
NOT NULL
CHECK (NoteType = 'Order'),
FOREIGN KEY (Id, NoteType)
REFERENCES Notes (Id, NoteType)
ON DELETE CASCADE,
OrderId INTEGER NOT NULL
REFERENCES Orders (OrderId)
);
CREATE TABLE OrderLineNotes
(
Id INTEGER NOT NULL UNIQUE,
NoteType VARCHAR(11)
DEFAULT 'Order line'
NOT NULL
CHECK (NoteType = 'Order line'),
FOREIGN KEY (Id, NoteType)
REFERENCES Notes (Id, NoteType)
ON DELETE CASCADE,
OrderLineId INTEGER NOT NULL
REFERENCES OrderLines (OrderLineId)
);
To present the data in the 'single table inheritance' structure (i.e. all JOINs and no UNIONs):
SELECT N1.Id, N1.NoteType, N1.Note,
AN1.AccountId,
ON1.OrderId,
OLN1.OrderLineId
FROM Notes AS N1
LEFT OUTER JOIN AccountNotes AS AN1
ON N1.Id = AN1.Id
LEFT OUTER JOIN OrderNotes AS ON1
ON N1.Id = ON1.Id
LEFT OUTER JOIN OrderLineNotes AS OLN1
ON N1.Id = OLN1.Id;
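To see what the composite foreign key on (Id, NoteType) buys, here is a reduced sketch using Python's sqlite3 module rather than SQL Server. Only Notes and AccountNotes are created, the Accounts table is left out, and `try_insert` is a hypothetical helper for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
create table Notes (
    Id integer primary key,
    NoteType text not null check (NoteType in ('Account', 'Order', 'Order line')),
    Note text not null,
    unique (Id, NoteType)
);
create table AccountNotes (
    Id integer not null unique,
    NoteType text not null default 'Account' check (NoteType = 'Account'),
    AccountId integer not null,  -- the Accounts table is omitted in this sketch
    foreign key (Id, NoteType) references Notes (Id, NoteType) on delete cascade
);
-- made-up sample data
insert into Notes values (1, 'Account', 'prefers email'), (2, 'Order', 'gift wrap');
insert into AccountNotes (Id, AccountId) values (1, 500);
""")

def try_insert(sql):
    """Return True if the statement succeeds, False if a constraint rejects it."""
    try:
        with conn:
            conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

# Note 2 is an 'Order' note, so (2, 'Account') has no parent row in Notes.
wrong_type = try_insert("insert into AccountNotes (Id, AccountId) values (2, 500)")
# Overriding NoteType to match the parent trips the CHECK instead.
forced_type = try_insert(
    "insert into AccountNotes (Id, NoteType, AccountId) values (2, 'Order', 500)"
)
print(wrong_type, forced_type)  # False False
```

Between the CHECK on the subtype table and the two-column foreign key, a note of one type cannot be attached through another type's table, and the UNIQUE on Id keeps each note to at most one row per subtype table.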
Consider that the above structure has full data integrity constraints. To do the same using the 'single table inheritance' structure would require many more CHECK constraints with many, many conditions for nullable columns e.g.
CHECK (
(
AccountId IS NOT NULL
AND OrderId IS NULL
AND OrderLineId IS NULL
)
OR
(
AccountId IS NULL
AND OrderId IS NOT NULL
AND OrderLineId IS NULL
)
OR
(
AccountId IS NULL
AND OrderId IS NULL
AND OrderLineId IS NOT NULL
)
);
CHECK (
(
NoteType = 'Account'
AND AccountId IS NOT NULL
)
OR
(
NoteType = 'Order'
AND OrderId IS NOT NULL
)
OR
(
NoteType = 'Order line'
AND OrderLineId IS NOT NULL
)
);
etc etc
I'd wager that most application developers using 'single table inheritance' would not be bothered to create these data integrity constraints if it occurred to them to do so at all (that's not meant to sound rude, just a difference in priorities to us who care more about the 'back end' than the 'front end' :)

Foreign Key Relationships and "belongs to many"

edit - Based on the responses below, I'm going to revisit my design. I think I can avoid this mess by being a little bit more clever with how I set out my business objects and rules. Thanks everyone for your help!
--
I have the following model:
S belongs to T
T has many S
A,B,C,D,E (etc) have 1 T each, so the T should belong to each of A,B,C,D,E (etc)
At first I set up my foreign keys so that in A, fk_a_t would be the foreign key on A.t to T(id), in B it'd be fk_b_t, etc. Everything looks fine in my UML (using MySQLWorkBench), but generating the yii models results in it thinking that T has many A,B,C,D (etc) which to me is the reverse.
It sounds to me like either I need to have A_T, B_T, C_T (etc) tables, but this would be a pain as there are a lot of tables that have this relationship. I've also googled that the better way to do this would be some sort of behavior, such that A,B,C,D (etc) can behave as a T, but I'm not clear on exactly how to do this (I will continue to google more on this)
EDIT - to clarify, a T can only belong to one of A, or B, or C, (etc) and not two A's, nor an A and a B (that is, it is not a many to many). My question is in regards to how to describe this relationship in the Yii Framework models - eg, (A,B,C,D,...) HAS_ONE T , and T belongs to (A,B,C,D,...). From a business use case, this all makes sense, but I'm not sure if I have it correctly set up in the database, or if I do, that I need to use a "behavior" in Yii to make it understand the relationship. @rwmnau I understand what you mean, I hope my clarification helps.
UML:
Here's the DDL (auto generated). Just pretend that there is more than 3 tables referencing T.
-- -----------------------------------------------------
-- Table `mydb`.`T`
-- -----------------------------------------------------
CREATE TABLE IF NOT EXISTS `mydb`.`T` (
`id` INT NOT NULL AUTO_INCREMENT ,
PRIMARY KEY (`id`) )
ENGINE = InnoDB;
-- -----------------------------------------------------
-- Table `mydb`.`S`
-- -----------------------------------------------------
CREATE TABLE IF NOT EXISTS `mydb`.`S` (
`id` INT NOT NULL AUTO_INCREMENT ,
`thing` VARCHAR(45) NULL ,
`t` INT NOT NULL ,
PRIMARY KEY (`id`) ,
INDEX `fk_S_T` (`t` ASC) ,
CONSTRAINT `fk_S_T`
FOREIGN KEY (`t` )
REFERENCES `mydb`.`T` (`id` )
ON DELETE NO ACTION
ON UPDATE NO ACTION)
ENGINE = InnoDB;
-- -----------------------------------------------------
-- Table `mydb`.`A`
-- -----------------------------------------------------
CREATE TABLE IF NOT EXISTS `mydb`.`A` (
`id` INT NOT NULL AUTO_INCREMENT ,
`T` INT NOT NULL ,
`stuff` VARCHAR(45) NULL ,
`bar` VARCHAR(45) NULL ,
`foo` VARCHAR(45) NULL ,
PRIMARY KEY (`id`) ,
INDEX `fk_A_T` (`T` ASC) ,
CONSTRAINT `fk_A_T`
FOREIGN KEY (`T` )
REFERENCES `mydb`.`T` (`id` )
ON DELETE NO ACTION
ON UPDATE NO ACTION)
ENGINE = InnoDB;
-- -----------------------------------------------------
-- Table `mydb`.`B`
-- -----------------------------------------------------
CREATE TABLE IF NOT EXISTS `mydb`.`B` (
`id` INT NOT NULL AUTO_INCREMENT ,
`T` INT NOT NULL ,
`stuff2` VARCHAR(45) NULL ,
`foobar` VARCHAR(45) NULL ,
`other` VARCHAR(45) NULL ,
PRIMARY KEY (`id`) ,
INDEX `fk_B_T` (`T` ASC) ,
CONSTRAINT `fk_B_T`
FOREIGN KEY (`T` )
REFERENCES `mydb`.`T` (`id` )
ON DELETE NO ACTION
ON UPDATE NO ACTION)
ENGINE = InnoDB;
-- -----------------------------------------------------
-- Table `mydb`.`C`
-- -----------------------------------------------------
CREATE TABLE IF NOT EXISTS `mydb`.`C` (
`id` INT NOT NULL AUTO_INCREMENT ,
`T` INT NOT NULL ,
`stuff3` VARCHAR(45) NULL ,
`foobar2` VARCHAR(45) NULL ,
`other4` VARCHAR(45) NULL ,
PRIMARY KEY (`id`) ,
INDEX `fk_C_T` (`T` ASC) ,
CONSTRAINT `fk_C_T`
FOREIGN KEY (`T` )
REFERENCES `mydb`.`T` (`id` )
ON DELETE NO ACTION
ON UPDATE NO ACTION)
ENGINE = InnoDB;
Your problem is in part that you have no way to distinguish which of the tables it is in relation to.
Further, if you can only have one record that matches any of three or four other tables, this is not a normal relationship and cannot be modelled using normal techniques. A trigger can ensure this is true, but with only the id column in it, what prevents it from matching an id of 10 in table A and an id of 10 in table C (violating the rules)?
BTW naming columns ID is usually a poor choice for maintenance. It is much clearer what is going on if you name the column with table name for PKs and use the exact name of the Pk for FKs.
An alternative solution for you is to have in the middle table a column for each type of id and a trigger to ensure that only one of them has values, but this is a pain to query if you need all the ids. A compound PK of id and idtype could work to ensure no repeats within a type, but to have no repeats at all, you will need a trigger.
This is a dilemma that comes up fairly regularly, and there is no perfect solution IMHO.
However I would recommend the following:
Combine the S and T table. I don't see any real need for the T table.
Invert the way the A/B/C tables relate to the S (formerly T) table. By this I mean remove the FK on the A/B/C side and create nullable FK columns on the S side. So now your S table has three additional nullable columns: A_ID, B_ID, C_ID.
Create a check constraint on the S table, ensuring that exactly one of these columns always has a value (or none of them has a value if that is allowed).
If having exactly one value is the rule, you can also create a unique constraint across these three columns to ensure that only one S can be related to an A/B/C.
If no value in any of these columns is allowed, the above rule will have to be enforced with a check constraint as well.
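The "exactly one of the nullable columns" rule from the list above could look roughly like this. It is a sketch in Python's sqlite3 module, with the parent tables A, B and C left out and `try_insert` as a hypothetical helper. One adaptation is worth flagging: SQLite treats NULLs as distinct in a UNIQUE constraint, so this sketch declares each column UNIQUE on its own instead of using a single constraint across all three.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table S (
    id integer primary key,
    thing text,
    A_ID integer unique,  -- parent tables A, B, C are omitted in this sketch
    B_ID integer unique,
    C_ID integer unique,
    -- each comparison yields 0 or 1 in SQLite, so the sum counts the non-null columns
    check ((A_ID is not null) + (B_ID is not null) + (C_ID is not null) = 1)
);
""")

def try_insert(a_id, b_id, c_id):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:
            conn.execute(
                "insert into S (A_ID, B_ID, C_ID) values (?, ?, ?)", (a_id, b_id, c_id)
            )
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    try_insert(10, None, None),    # exactly one parent: accepted
    try_insert(None, 10, None),    # same id under a different parent type: accepted
    try_insert(10, 10, None),      # two parents at once: rejected by the CHECK
    try_insert(None, None, None),  # no parent: rejected by the CHECK
    try_insert(10, None, None),    # A 10 already has an S row: rejected by UNIQUE
]
print(results)  # [True, True, False, False, False]
```

If rows with no parent at all should be allowed, the comparison in the CHECK would become `<= 1` instead of `= 1`.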
Update After Your Comment
Ok, then I would forget about inverting the relationships, and keep the FK on the A/B/C side. I would still enforce the uniqueness of usage using a check constraint, but it would need to cross tables and will likely look different for each flavor of SQL (e.g. SQL Server requires a UDF to go across tables in a check constraint). I still think you can nuke the T table.
Regarding the ORM side of things, I don't know yii at all, so can't speak to that. But if you enforce the relationship at the database level, how you implement it via code shouldn't matter, as the database is responsible for the integrity of the data (they will just look like vanilla relationships to the ORM). However, it may present a problem with trapping the specific error that comes up if at runtime the check constraint's rule is violated.
I should also mention that if there is a large (or even reasonably large) amount of data going into the tables in question, the approach I am recommending might not be the best, as your check constraint will have to check all 20 tables to enforce the rule.
You only require a table in the middle if it's a many-to-many relationship, and it doesn't sound like that's the case, so don't worry about those.
Your question isn't clear - can a T belong to more than 1 A, more than 1 B, and so on? Or does a single T belong to each of A-E, and to no others? It's the difference between a 1-to-1 relationship (each T has exactly one each of A-E) and a 1-to-many relationship (each A-E has exactly 1 T, but a T can belong to many A, many B, and so on). Does this make sense?
Also, I'd second the request for some more info in your question to help solidify what you're asking for.
I had to face a similar situation some weeks ago (not my own db; I prefer to combine all the tables into one, except in very specific situations).
The solution I implemented was: In "T" model file I did something like this at relations() function:
'id_1' => array(self::BELONGS_TO, 'A', 'id'),
'id_2' => array(self::BELONGS_TO, 'B', 'id'),
'id_3' => array(self::BELONGS_TO, 'C', 'id'),
I hope this helps you.
Regards.