I tried to create a foreign key on one of my tables, referencing a column of a table in a different schema.
Something like that:
ALTER TABLE my_schema.my_table ADD (
CONSTRAINT my_fk
FOREIGN KEY (my_id)
REFERENCES other_schema.other_table(other_id)
)
Since I had the necessary grants, this worked fine.
Now I wonder if there are reasons for not referencing tables in a different schema, or anything to be careful about?
No problem doing this. Schemas really have no impact when establishing foreign key relationships between tables. Just make sure the appropriate people have the permissions necessary for the schemas you intend to use.
If you're in an organization where different people have authority over different schemas, I think it's good practice to give the other schema the ability to disable, or even drop and recreate, your constraint.
For example, they could need to drop or truncate their table and then reload it to handle some (very weird) support issue. Unless you want to get called in the middle of the night, I recommend giving them the ability to temporarily remove your constraint. (I also recommend setting your own alerts so that you'll know if any of your external constraints get disabled or dropped). When you're crossing organizational/schema lines, you want to play well with others. The index that Vincent mentioned is another part of that.
This will work exactly as a foreign key that references a table in its own schema.
As with regular foreign keys, don't forget to index my_id if the parent key is ever updated or if you delete entries from the parent table (unindexed foreign keys can be a source of massive contention and the index is usually useful anyway).
The only thing I ran into was making sure the permission existed on the other schema. The usual stuff - if those permission(s) disappear for whatever reason, you'll hear about it.
One reason this can cause problems is you need to be careful to delete things in the right order. This can be good or bad depending on how important it is to never have orphans in your tables.
Related
I have a fairly simple design, as follows:
What I want to achieve in my grouping_individual_history is marked in red:
when a session is deleted, I want to cascade delete the grouping_history....
when a grouping is deleted, I just want the child field to be nullified
It seems that MSSQL will not allow me to have more than one FK that does something else than no action ... It'll complain with:
Introducing FOREIGN KEY constraint 'FK_grouping_individual_history_grouping' on table 'grouping_individual_history' may cause cycles or multiple cascade paths. Specify ON DELETE NO ACTION or ON UPDATE NO ACTION, or modify other FOREIGN KEY constraints.
I've already read this post (https://www.mssqltips.com/sqlservertip/2733/solving-the-sql-server-multiple-cascade-path-issue-with-a-trigger/), although it's not quite the same scenario it seems to me.
I've tried doing a INSTEAD OF DELETE trigger on my grouping table, but it wont accept it, because in turn, my grouping table has another FK (fkSessionID) that does a cascade delete... So, the fix would be to change it all, in all affected tables with FKs. The chain is long though, and we cannot consider it.
For one thing, can someone explain to me why SQL Server is giving me the issue for this very simple scenario in the first place? I just don't understand it.
Is there another workaround I could use (besides just removing the foreign key link from my grouping_individual_history table)?
Is there a best practice in that is closest to one of these examples?
CREATE TABLE TABLE1
(
ID NUMBER(18) CONSTRAINT TABLE1_PK PRIMARY KEY,
NAME VARCHAR2(10) CONSTRAINT NAME_NN NOT NULL
);
or
CREATE TABLE TABLE1
(
ID NUMBER(18),
NAME VARCHAR2(10) CONSTRAINT NAME_NN NOT NULL
);
ALTER TABLE TABLE1 ADD CONSTRAINT TABLE1_PK
PRIMARY KEY (ID)
USING INDEX (CREATE UNIQUE INDEX IDX_TABLE1_PK ON TABLE1 (ID));
Is either scenario going to result in a better outcome in general? The first option is much more readable, but perhaps there are reasons why the latter is preferable.
Definitely personal preference. I prefer to do as much as I can in the single CREATE TABLE statement simply because I find it more concise. Most everything I need is described right there.
Sometimes that's not possible. Say you have two tables with references to each, or you want to load up a table with a bunch of data first, so you add the additional indexes after the table is loaded.
You'll find many tool that create schemas from DBs will separate them (mostly because it's always correct -- define all the tables, then define all of the relationships).
But personally, if practical, I find having it all in one place is best.
When building a deployment script that is eventually going to be run by someone else later on, I prefer splitting the scripts a fair bit. If something goes wrong, it's a bit easier to tell from the logs what exactly failed.
My table creation script will usually only have NOT NULL constraints. The PK, unique and FK constraints will be added afterwards.
This is a minor point though, and I don't have anything in particular against combining it all in one big CREATE TABLE statement.
You may find that your workplace already has a standard in place. e.g. my current client requires separate scripts for the CREATE TABLE, then more separate scripts for constraints, indexes, etc.
The exception, of course, is index-organized tables which must have a PK constraint declared upfront.
It's a personal preference to define any attributes or defaults for a field in the actual create statement. One thing I noticed is your second statement won't work since you haven't specified the id field is NOT NULL.
I guess it's a personal best practice for readability that I specify the table's primary key upfront.
Another thing to consider when creating the table is how you want items identified, uniquely or composite. ALTER TABLE is good for creating composite keys after the fact.
I'm creating a database with several sql files
1 file creates the tables.
1 file adds constraints.
1 file drops constraints.
The primary is a constraint however I've been told by someone to define your primary key in the table definition but not given a reason why.
Is it better to define the primary key as a constraint that can be added and dropped or is it better to do it in the table definition.
My current thinking is to do it in the table definition because doing it as a removable constraint could potentially lead to some horrible issues with duplicate keys.
But dropping constraints could lead to serious issues anyway so it is expected that if someone did drop the primary key, they would have taken appropriate steps to avoid problems as they should have for any other data entry
A primary key is a constraint, but a constraint is not necessarily a primary key. Short of doing some major database surgery, there should never be a need to drop a primary key, ever.
Defining the primary key along with the table is good practice - if you separate the table and the key definition, that opens the window to the key definition getting lost or forgotten. Given that any decent database design utterly depends on consistent keys, you don't ever want to have even the slightest chance that your primary keys aren't functioning properly.
From a maintainability perspective I would say that it is better to have the Primary Key in the table definition as it is a very good indicator of what the table will most likely be used for.
The other constraints are important as well though and your argument holds.
All of this is somewhat platform specific, but a primary key is a logical concept, whereas a constraint (or unique index, or whatever) is a physical thing that implements the logical concept of "primary key".
That's another reason to argue for putting it with the table itself - it's logical home - rather than the constraints file.
For effective source control it usually makes sense to have a separate script for each object (constraints included). That way you can track changes to each object individually.
There's a certain logical sense in keeping everything related to a table in one file--column definitions, keys, indexes, triggers, etc. If you never have to rebuild a very large database from SQL, that will work fine almost all the time. The few times it doesn't work well probably aren't worth changing the process of keeping all the related things together in one file.
But if you have to rebuild a very large database, or if you need to move a database onto a different server for testing, or if you just want to fiddle around with things, it makes sense to split things up. In PostgreSQL, we break things up like this. All these files are under version control.
All CREATE DOMAIN statements in one file.
Each CREATE TABLE statement in a separate file. That file includes all constraints except FOREIGN KEY constraints, expressed as ALTER TABLE statements. (More about this in a bit.)
Each table's FOREIGN KEY constraints in a separate file.
Each table's indexes for non-key columns in a separate file.
Each table's triggers in a separate file. (If a table has three triggers, all three go in one file.)
Each table's data in a separate file. (Only for tables loaded before bringing the database online.)
Each table's rules in a separate file.
Each function in a separate file. (Functions are PostgreSQL's equivalent to stored procedures.)
Without foreign key constraints, we can load tables in any order. After the tables are loaded, we can run a single script to rebuild all the foreign keys. The makefile takes care of bundling the right individual files together. (Since they're separate files, we can run them individually if we want to.)
Tables load faster if they don't have constraints. I said we put each CREATE TABLE statement in a separate file. The file includes all constraints except FOREIGN KEY constraints, expressed as ALTER TABLE statements. You can use the streaming editor sed to split those files into two pieces. One piece has the column definitions; the other piece has all the 'ALTER TABLE ADD CONSTRAINT' statements. The makefile takes care of splitting the source files and bundling them together--all the table definitions in one SQL file, and all the ALTER TABLE statements in another. Then we can run a single script to create all the tables, load the tables, then run a single script to rebuild all the constraints.
make is your friend.
If I don't need a primary key should I not add one to the database?
You do need a primary key. You just don't know that yet.
A primary key uniquely identifies a row in your table.
The fact it's indexed and/or clustered is a physical implementation issue and unrelated to the logical design.
You need one for the table to make sense.
If you don't need a primary key then don't use one. I usually have the need for primary keys, so I usually use them. If you have related tables you probably want primary and foreign keys.
Yes, but only in the same sense that it's okay not to use a seatbelt if you're not planning to be in an accident. That is, it's a small price to pay for a big benefit when you need it, and even if you think you don't need it odds are you will in the future. The difference is you're a lot more likely to need a primary key than to get in a car accident.
You should also know that some database systems create a primary key for you if you don't, so you're not saving that much in terms of what's going on in the engine.
No, unless you can find an example of, "This database would work so much better if table_x didn't have a primary key."
You can make an arguement to never use a primary key, if performance, data integrity, and normalization are not required. Security and backup/restore capabilities may not be needed, but eventually, you put on your big-boy pants and join the real world of database implementation.
Yes, a table should ALWAYS have a primary key... unless you don't need to uniquely identify the records in it. (I like to make absolute statements and immediately contradict them)
When would you not need to uniquely identify the records in a table? Almost never. I have done this before though for things like audit log tables. Data that won't be updated or deleted, and wont be constrained in any way. Essentially structured logging.
A primary key will always help with query performance. So if you ever need to query using the "key" to a "foreign key", or used as lookup then yes, craete a foreign key.
I don't know. I have used a couple tables where there is just a single row and a single column. Will always only be a single row and a single column. There is no foreign key relationships.
Why would I put a primary key on that?
A primary key is mainly formally defined to aid referencial Integrity, however if the table is very small, or is unlikely to contain unique data then it's an un-necessary overhead.
Defining indexes on the table can normally be used to imply a primary key without formally declaring one.
However you should consider that defining the Primary key can be useful for Developers and Schema generation or SQL Dev tools, as having the meta data helps understanding, and some tools rely on this to correctly define the Primary/foreign key relationships in the model.
Well...
Each table in a relational DB needs a primary key. As already noted, a primary key is data that identies a record uniquely...
You might get away with not having an "ID" field, if you have a N-M table that joins 2 different tables, but you can uniquely identifiy the record by the values from both columns you join. (Composite primary key)
Having a table without an primary key is against the first normal form, and has nothing to do in a relational DB
You should always have a primary key, even if it's just on ID. Maybe NoSQL is what you're after instead (just asking)?
That depends very much on how sure you can be that you don't need one. If you have just the slightest bit of doubt, add one - you'll thank yourself later. An indicator being if the data you store could be related to other data in your DB at one point.
One use case I can think of is a logging kind-of table, in which you simply dump one entry after the other (to properly process them later). You probably won't need a primary key there, if you're storing enough data to filter out the relevant messages (like a date). Of course, it's questionable to use a RDBMS for this.
What is the purpose of naming your constraints (unique, primary key, foreign key)?
Say I have a table which is using natural keys as a primary key:
CREATE TABLE Order
(
LoginName VARCHAR(50) NOT NULL,
ProductName VARCHAR(50) NOT NULL,
NumberOrdered INT NOT NULL,
OrderDateTime DATETIME NOT NULL,
PRIMARY KEY(LoginName, OrderDateTime)
);
What benefits (if any) does naming my PK bring?
Eg.
Replace:
PRIMARY KEY(LoginName, OrderDateTime)
With:
CONSTRAINT Order_PK PRIMARY KEY(LoginName, OrderDateTime)
Sorry if my data model is not the best, I'm new to this!
Here's some pretty basic reasons.
(1) If a query (insert, update, delete) violates a constraint, SQL will generate an error message that will contain the constraint name. If the constraint name is clear and descriptive, the error message will be easier to understand; if the constraint name is a random guid-based name, it's a lot less clear. Particulary for end-users, who will (ok, might) phone you up and ask what "FK__B__B_COL1__75435199" means.
(2) If a constraint needs to be modified in the future (yes, it happens), it's very hard to do if you don't know what it's named. (ALTER TABLE MyTable drop CONSTRAINT um...) And if you create more than one instance of the database "from scratch" and use system-generated default names, no two names will ever match.
(3) If the person who gets to support your code (aka a DBA) has to waste a lot of pointless time dealing with case (1) or case (2) at 3am on Sunday, they're quite probably in a position to identify where the code came from and be able to react accordingly.
To identify the constraint in the future (e.g. you want to drop it in the future), it should have a unique name. If you don't specify a name for it, the database engine will probably assign a weird name (e.g. containing random stuff to ensure uniqueness) for you.
It keeps the DBAs happy, so they let your schema definition into the production database.
When your code randomly violates some foreign key constraint, it sure as hell saves time on debugging to figure out which one it was. Naming them greatly simplifies debugging your inserts and your updates.
It helps someone to know quickly what constraints are doing without having to look at the actual constraint, as the name gives you all the info you need.
So, I know if it is a primary key, unique key or default key, as well as the table and possibly columns involved.
By correctly naming all constraints, You can quickly associate a particular constraint with our data model. This gives us two real advantages:
We can quickly identify and fix any errors.
We can reliably modify or drop constraints.
By naming the constraints you can differentiate violations of them. This is not only useful for admins and developers, but your program can also use the constraint names. This is much more robust than trying to parse the error message. By using constraint names your program can react differently depending on which constraint was violated.
Constraint names are also very useful to display appropriate error messages in the user’s language mentioning which field caused a constraint violation instead of just forwarding a cryptic error message from the database server to the user.
See my answer on how to do this with PostgreSQL and Java.
While the OP's example used a permanent table, just remember that named constraints on temp tables behave like named constraints on permanent tables (i.e. you can't have multiple sessions with the exact same code handling the temp table, without it generating an error because the constraints are named the same). Because named constraints must be unique, if you absolutely must name a constraint on a temp table try to do so with some sort of randomized GUID (like SELECT NEWID() ) on the end of it to ensure that it will uniquely-named across sessions.
Another good reason to name constraints is if you are using version control on your database schema. In this case, if you have to drop and re-create a constraint using the default database naming (in my case SQL Server) then you will see differences between your committed version and the working copy because it will have a newly generated name. Giving an explicit name to the constraint will avoid this being flagged as a change.