SQL: Where should the Primary Key be defined?

I'm creating a database with several SQL files:
one file creates the tables,
one file adds constraints,
one file drops constraints.
The primary key is a constraint, but I've been told to define the primary key in the table definition, without being given a reason why.
Is it better to define the primary key as a constraint that can be added and dropped, or is it better to do it in the table definition?
My current thinking is to do it in the table definition, because doing it as a removable constraint could potentially lead to some horrible issues with duplicate keys.
But dropping constraints can lead to serious issues anyway, so it's fair to expect that anyone who dropped a primary key would have taken appropriate steps to avoid problems, as they should for any other data change.

A primary key is a constraint, but a constraint is not necessarily a primary key. Short of doing some major database surgery, there should never be a need to drop a primary key, ever.
Defining the primary key along with the table is good practice - if you separate the table and the key definition, that opens the door to the key definition getting lost or forgotten. Given that any decent database design utterly depends on consistent keys, you don't ever want to have even the slightest chance that your primary keys aren't functioning properly.
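To illustrate the two styles, here's a minimal sketch (table and column names are made up):

-- Key defined with the table:
CREATE TABLE orders (
    order_id  INTEGER PRIMARY KEY,
    placed_on DATE NOT NULL
);

-- Key bolted on afterwards, where it can be forgotten or dropped:
CREATE TABLE order_lines (
    order_line_id INTEGER NOT NULL,
    order_id      INTEGER NOT NULL
);
ALTER TABLE order_lines ADD CONSTRAINT order_lines_pk PRIMARY KEY (order_line_id);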

From a maintainability perspective I would say that it is better to have the Primary Key in the table definition as it is a very good indicator of what the table will most likely be used for.
The other constraints are important as well though and your argument holds.

All of this is somewhat platform specific, but a primary key is a logical concept, whereas a constraint (or unique index, or whatever) is a physical thing that implements the logical concept of "primary key".
That's another reason to argue for putting it with the table itself - its logical home - rather than in the constraints file.

For effective source control it usually makes sense to have a separate script for each object (constraints included). That way you can track changes to each object individually.

There's a certain logical sense in keeping everything related to a table in one file--column definitions, keys, indexes, triggers, etc. If you never have to rebuild a very large database from SQL, that will work fine almost all the time. The few times it doesn't work well probably aren't worth changing the process of keeping all the related things together in one file.
But if you have to rebuild a very large database, or if you need to move a database onto a different server for testing, or if you just want to fiddle around with things, it makes sense to split things up. In PostgreSQL, we break things up like this. All these files are under version control.
All CREATE DOMAIN statements in one file.
Each CREATE TABLE statement in a separate file. That file includes all constraints except FOREIGN KEY constraints, expressed as ALTER TABLE statements. (More about this in a bit.)
Each table's FOREIGN KEY constraints in a separate file.
Each table's indexes for non-key columns in a separate file.
Each table's triggers in a separate file. (If a table has three triggers, all three go in one file.)
Each table's data in a separate file. (Only for tables loaded before bringing the database online.)
Each table's rules in a separate file.
Each function in a separate file. (Functions are PostgreSQL's equivalent to stored procedures.)
Without foreign key constraints, we can load tables in any order. After the tables are loaded, we can run a single script to rebuild all the foreign keys. The makefile takes care of bundling the right individual files together. (Since they're separate files, we can run them individually if we want to.)
Tables load faster if they don't have constraints. As I said, we put each CREATE TABLE statement in a separate file, and that file includes all constraints except FOREIGN KEY constraints, expressed as ALTER TABLE statements. You can use the stream editor sed to split those files into two pieces: one piece has the column definitions; the other has all the ALTER TABLE ADD CONSTRAINT statements. The makefile takes care of splitting the source files and bundling them together--all the table definitions in one SQL file, and all the ALTER TABLE statements in another. Then we can run a single script to create all the tables, load the tables, and then run a single script to rebuild all the constraints.
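As a concrete (hypothetical) example of one of those per-table files, with the constraints written so sed can peel them off:

-- customers.sql: column definitions first ...
CREATE TABLE customers (
    customer_id integer NOT NULL,
    email       text    NOT NULL
);

-- ... then every constraint except FOREIGN KEY, as ALTER TABLE statements.
ALTER TABLE customers ADD CONSTRAINT customers_pk PRIMARY KEY (customer_id);
ALTER TABLE customers ADD CONSTRAINT customers_email_uq UNIQUE (email);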
make is your friend.

Related

SQL Server database design with foreign keys

I have the following partial database design:
All the tables are dependent on each other: the table bvd_docflow_subdocuments depends on the table bvd_docflow_subsets, and the table bvd_docflow_documents depends on bvd_docflow_subdocuments. So I thought I could be smart and use foreign keys on every table (with ON DELETE CASCADE). However, the foreign keys keep cascading down through every level the further I go into the tables.
The problem is that there is no point in the table bvd_docflow_documents having a reference to the docflow_documentset_id PK/FK. Is there a way (and maybe my design is crappy) for each table to have an FK relationship only with the table directly above it, and not with all the tables above it?
Edit:
More explanation:
In the bvd_docflow_subsets table, information is stored about objects used to create documents. There is a relation between that table and the bvd_docflow_subdocuments table, which stores master data about all the documents for a subset. docflow_subset_id is in both tables; this is the link between those two tables.
Going further down, we also have the table bvd_docflow_documents, which contains the actual document data. The link between bvd_docflow_documents and bvd_docflow_subdocuments is bvd_docflow_subdocument_id.
On every table I have a foreign key defined, so that when data is removed from a table, all the data linked to it is also removed.
However, when we look at the bvd_docflow_documents table, it has all the foreign keys from the other tables (docflow_subset_id and docflow_documentset_id), and there is the problem. The only foreign key needed for the bvd_docflow_documents table is docflow_subdocument_id, and no other.
Edit 2
I have changed my design further and removed information that I don't need after initial import of the data.
See the following link for the (total) database design:
https://sqldbm.com/Project/SQLServer/Share/_AUedvNutCEV2DGLJleUWA
The tables subsets, subdocuments and documents have a many-to-many relationship, so I thought a table in between those three, documents_subdocuments, is the way to go, where I define all the different keys for those tables.
I am not used to designing the database first and then building it. But for everything there is a first time, and I am trying to make a database that follows standards and uses the power of SQL Server the correct way.
I'll address the bottom-most table and ignore the rest for the most part.
But first some comments. Your schema is simply a model of a system. To provide feedback, one must understand this "system" and how it actually works in order to evaluate your model. In addition, it is important to understand your entities and your reasons for choosing them and modelling them in the specified manner. Without that understanding, all of this is guessing based on experience.
And another comment. Slapping an identity column into every table is just lazy modelling IMO. Others will disagree, but you need to also enforce all natural keys. Do you have natural keys? It is rare not to have any. Enforce those that do exist.
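For example, you can keep the identity column while still enforcing a natural key with a UNIQUE constraint (a sketch; the table and column names are invented):

CREATE TABLE documents (
    document_id int IDENTITY(1,1) PRIMARY KEY,
    file_name   varchar(260) NOT NULL,
    CONSTRAINT documents_file_name_uq UNIQUE (file_name)  -- the natural key
);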
And one last comment. Stop the ridiculous pattern of prepending the column names with the table names. And you should really think long and hard about using very long table names. Given what you have, I sense you need a schema for your docflow stuff.
For the documents table, your current PK makes no sense. Again, you've slapped an identity column into the table. By itself, this column is a key for the table. The inclusion of any other columns does not make the key any more "unique" - that inclusion is logical nonsense. Following your pattern, you would designate the identity column as the primary key. But ...
According to your image, the documents table is related to one and only one subdocument. You added a foreign key to that table - which matches the image. You also added additional columns and foreign keys to the "higher" tables. So now a document "points" to a specific subdocument. It also points to a specific subset - which may have no relationship to the subdocument. The same thought applies to the other FK. I doubt that this is logically correct. So why do these columns (and related FKs) exist? Perhaps this is the result of premature optimization - which everyone knows is the root of all evil in coding. Again, it is impossible to know if this is "right" or even "useful" for your model.
To answer your question "... is there a way", the answer is obviously yes. You remove the columns of which you complain. You added them - Why? Is this perhaps a problem with the tool you are using?
And some last comments. There is nothing special about "varchar(50)". Perhaps this is a placeholder that will be updated later. It may also be another sign of laziness. And generally speaking, columns with names like "type" and "code" tend to be foreign keys to "lookup" tables - because people like to add, modify, or remove these sorts of categorization values over time. I'm also concerned about the column name overlap among the tables. "Location" exists in multiple tables, as do action_code and action_id. And a column named "id" (action_id) suggests a lookup to another table - is it? Should it be? Is there a relationship between action_id and action_code? From a distance it is impossible to answer any of these questions.
But designing a database is more art than science. Sometimes you just need to create something, populate it with some sample data, and then determine if it works for your needs. Everyone will get something wrong in the first try. That is expected; that is how you learn. The most difficult part is actually completing your first attempt.

What is the best practice DDL for creating Tables? Single statement with all objects or many individual statements creating and altering?

Is there a best practice that is closest to one of these examples?
CREATE TABLE TABLE1
(
    ID   NUMBER(18)   CONSTRAINT TABLE1_PK PRIMARY KEY,
    NAME VARCHAR2(10) CONSTRAINT NAME_NN NOT NULL
);
or
CREATE TABLE TABLE1
(
    ID   NUMBER(18),
    NAME VARCHAR2(10) CONSTRAINT NAME_NN NOT NULL
);

ALTER TABLE TABLE1 ADD CONSTRAINT TABLE1_PK
    PRIMARY KEY (ID)
    USING INDEX (CREATE UNIQUE INDEX IDX_TABLE1_PK ON TABLE1 (ID));
Is either scenario going to result in a better outcome in general? The first option is much more readable, but perhaps there are reasons why the latter is preferable.
Definitely personal preference. I prefer to do as much as I can in the single CREATE TABLE statement simply because I find it more concise. Most everything I need is described right there.
Sometimes that's not possible. Say you have two tables with references to each, or you want to load up a table with a bunch of data first, so you add the additional indexes after the table is loaded.
You'll find that many tools that create schemas from DBs will separate them (mostly because it's always correct -- define all the tables, then define all of the relationships).
But personally, if practical, I find having it all in one place is best.
When building a deployment script that is eventually going to be run by someone else later on, I prefer splitting the scripts a fair bit. If something goes wrong, it's a bit easier to tell from the logs what exactly failed.
My table creation script will usually only have NOT NULL constraints. The PK, unique and FK constraints will be added afterwards.
This is a minor point though, and I don't have anything in particular against combining it all in one big CREATE TABLE statement.
You may find that your workplace already has a standard in place. e.g. my current client requires separate scripts for the CREATE TABLE, then more separate scripts for constraints, indexes, etc.
The exception, of course, is index-organized tables, which must have a PK constraint declared upfront.
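For instance, an Oracle index-organized table requires the inline declaration (LOOKUP_CODES is a made-up name):

CREATE TABLE LOOKUP_CODES (
    CODE  VARCHAR2(10) CONSTRAINT LOOKUP_CODES_PK PRIMARY KEY,
    DESCR VARCHAR2(50)
) ORGANIZATION INDEX;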
It's a personal preference to define any attributes or defaults for a field in the actual CREATE statement. One thing I noticed is that your second statement won't work as written, since you haven't specified that the ID field is NOT NULL.
I guess it's a personal best practice for readability that I specify the table's primary key upfront.
Another thing to consider when creating the table is how you want rows identified: by a single unique column or by a composite key. ALTER TABLE is good for creating composite keys after the fact.
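For example, a composite key added after the fact (hypothetical table and column names):

ALTER TABLE ORDER_ITEMS ADD CONSTRAINT ORDER_ITEMS_PK
    PRIMARY KEY (ORDER_ID, LINE_NUMBER);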

Setting the right foreign key on insert

Morning all,
I'm doing a lot of work to drag a database (SQL Server 2005, in 2000 compatibility mode) kicking and screaming towards having a sane design.
At the moment, all the tables' primary keys are nvarchar(32), set using uniqId() (oddly, this gets run through a special hashing function, no idea why).
So in several phases, I'm making some fundamental changes:
Introducing ID_int columns to each table, auto increment and primary key
Adding some extra indexing, removing unused indexes, dropping unused columns
This phase has worked well so far, test db seems a bit faster, total index sizes for each table are MUCH smaller.
My problem is with the next phase: foreign keys. I need to be able to set these INT foreign keys on insert in the other tables.
There are several applications pointing at this DB, only one of which I have much control over. It also contains many stored procs and triggers.
I can't physically make all the changes needed in one go.
So what I'd like to be able to do is add the integer FKs to each table and have them automatically set to the right thing on insert.
To illustrate this with an example:
Two tables, Call and POD, linked by POD.Call_ID -> Call.Call_ID. This is an nvarchar(32) field.
I've altered Call so that Call_ID_int is an identity, auto-increment primary key. I need to add POD.Call_ID_int such that, on insert, it gets the right value from Call.Call_ID_int.
I'm sure I could do this with a BEFORE trigger, but I'd rather avoid this for maintenance and speed reasons.
I thought I could do this with a constraint, but after much research found I can't. I tried this:
alter table POD
    add constraint pf_callIdInt
    default([dbo].[map_Call_ID_int](Call_ID))
    for Call_ID_int
Where the map_Call_ID_int function takes the Call_ID and returns the right Call_ID_int, but I get this error:
The name "Call_ID" is not permitted in this context. Valid expressions
are constants, constant expressions, and (in some contexts) variables.
Column names are not permitted.
Any ideas how I can achieve this?
Thanks very much in advance!
-Oli
Triggers are the easiest way.
You'll have odd concurrency issues with defaults based on UDFs too (like you would for CHECK constraints).
Another trick is to use views to hide schema changes, but still with triggers to intercept DML. Your "old" table then no longer exists except as a view on the "new" table. A write to the "old" table/view actually happens on the new table.
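As a rough sketch of the trigger route (SQL Server has no BEFORE triggers; AFTER or INSTEAD OF are the options). POD_ID is an assumed name for POD's own key column:

CREATE TRIGGER trg_POD_set_Call_ID_int
ON POD
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;
    -- Back-fill the new int FK by joining to Call on the legacy nvarchar key.
    UPDATE p
    SET    p.Call_ID_int = c.Call_ID_int
    FROM   POD AS p
    JOIN   inserted AS i ON i.POD_ID = p.POD_ID
    JOIN   Call AS c ON c.Call_ID = i.Call_ID;
END;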

Changing a table's primary key column referenced by foreign key in other tables

In our DB (on SQL Server 2005) we have a "Customers" table, whose primary key is Client Code, a surrogate bigint IDENTITY(1,1) key; the table is referenced by a number of other tables in our DB through foreign keys.
A new CR implementation we are estimating would require us to change the ID column type to varchar, with the Client Code generation algorithm shifting from a simple numeric progression to a strict 2-char representation, with codes ranging from 01 to 99, then progressing like this:
1A -> 2A -> ... -> 9A -> 1B -> ... 9Z
I'm fairly new to database design, but I smell some serious problems here. First of all, what about this client code generation algorithm? What if I need a Client Code to go beyond the 9Z limit?
Then I have some questions: would this change be feasible, given that the table is already filled with a fair amount of data and referenced by multiple entities? If so, how would you approach this problem, and how would you implement Client Code generation?
I would leave the primary key as it is and would create another, unique key on the generated client code.
I would do that anyway. It's always better to have a short numeric primary key than long char keys.
In some situations you might prefer a GUID (for replication purposes), but an int/bigint number is always preferable.
My biggest concern with what you are proposing is that you will be limited to 360 primary records. That seems like a small number.
Performing the change is a multi-step operation. You need to create the new field in the core table and all its related tables.
To do an in-place update, you need to generate the code in the core table. Then you need to update all the related tables to have the code based on the old id. Then you need to add the foreign key constraint to all the related tables. Then you need to remove the old key field from all the related tables.
We only did that on our development server. When we upgraded the live databases, we created a new database for each and copied the data over using a Python script that queried the old database and inserted into the new database. I now update that script for every software upgrade, so the core engine stays the same, but I can specify different tables or data modifications. I get the bonus of having a complete backup of the original database if something unexpected happens when upgrading production.
One strong argument in favor of a non-identity/guid code is that you want a human readable/memorable code and you need to be able to move records between two systems.
Performance is not necessarily a concern in SQL Server 2005 and 2008. We recently went through a change where we moved from int ids everywhere to 7 or 8 character "friendly" record codes. We expected to see some kind of performance hit, but we in fact saw a performance improvement.
We also found that we needed a way to quickly generate a code. Our codes have two parts, a 3-character alpha prefix and a 4- or 5-digit suffix. Once we had a large number of codes (15,000-20,000), it became too slow to parse the code into prefix and suffix and find the lowest unused code (it took several seconds). Because of this, we also store the prefix and the suffix separately (in the primary key table) so that we can quickly find the next available lowest code with a particular prefix. The cached prefix and suffix made the search almost free.
We allow changing of the codes, and the changed values propagate via cascade update rules on the foreign key relationship. We keep an identity key on the core code table to simplify updating the code.
We don't use an ORM, so I don't know what specific things to be aware of with that. We also have on the order of 60,000 primary keys in our biggest instance, but have hundreds of tables related and tables with millions of related values to the code table.
One big advantage that we got was, in many cases, we did not need to do a join to perform operations. Everywhere in the software the user references things by friendly code. We don't have to do a lookup of the int ID (or a join) to perform certain operations.
The new code generation algorithm isn't worth thinking about. You can write a program to generate all possible codes in just a few lines of code. Put them in a table, and you're practically done. You just need to write a function to return the smallest one not yet used. Here's a Ruby program that will give you all the possible codes.
# test.rb -- generate a peculiar sequence of two-character codes.
i = 1
('A'..'Z').each do |c|
  (1..9).each do |n|
    printf("'%d%s', %d\n", n, c, i)
    i += 1
  end
end
The program emits CSV-style output that you should be able to import easily into a table. You need two columns to control the sort order; the new values don't naturally sort the way your requirements specify.
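Given such a table - say codes(code, sort_order), with customers.client_code holding the codes in use (both names are assumptions) - finding the smallest unused code is a simple anti-join in SQL Server:

SELECT TOP 1 c.code
FROM   codes AS c
WHERE  NOT EXISTS (SELECT 1 FROM customers AS x WHERE x.client_code = c.code)
ORDER  BY c.sort_order;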
I'd be more concerned about the range than the algorithm. If you're right about the requirement, you're limited to 234 client codes. If you're wrong, and the range extends from "1A" to "ZZ", you're limited to less than a thousand.
To implement this requirement in an existing table, you need to follow a careful procedure. I'd try it several times in a test environment before trying it on a production table. (This is just a sketch. There are a lot of details.)
Create and populate a two-column table to map existing bigints to the new CHAR(2).
Create new CHAR(2) columns in all the tables that need them.
Update all the new CHAR(2) columns.
Create new NOT NULL UNIQUE or PRIMARY KEY constraints and new FOREIGN KEY constraints on the new CHAR(2) columns.
Rewrite user interface code (?) to target the new columns. (Might not be necessary if you rename the new CHAR(2) and old BIGINT columns.)
Set a target date to drop the old BIGINT columns and constraints.
And so on.
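A hypothetical sketch of the first few steps in SQL Server syntax (all table and column names are made up):

-- 1. Mapping table from old surrogate ids to new codes.
CREATE TABLE code_map (
    old_id   bigint  NOT NULL PRIMARY KEY,
    new_code char(2) NOT NULL UNIQUE
);
-- ... populate code_map with one row per existing client ...

-- 2. New column on a referencing table.
ALTER TABLE orders ADD client_code char(2) NULL;

-- 3. Back-fill it from the mapping.
UPDATE o
SET    o.client_code = m.new_code
FROM   orders AS o
JOIN   code_map AS m ON m.old_id = o.client_id;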
Not really addressing whether this is a good idea or not, but you can change your foreign keys to cascade updates. Once you've done that, when you update the primary key in the parent table, the corresponding key in the child table is updated accordingly.
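For example (names assumed; the referenced parent column must be a PK or UNIQUE key):

ALTER TABLE orders
    ADD CONSTRAINT fk_orders_customers
    FOREIGN KEY (client_code)
    REFERENCES customers (client_code)
    ON UPDATE CASCADE;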

Foreign key reference to table in another schema

I tried to create a foreign key on one of my tables, referencing a column of a table in a different schema.
Something like this:
ALTER TABLE my_schema.my_table ADD (
    CONSTRAINT my_fk
    FOREIGN KEY (my_id)
    REFERENCES other_schema.other_table (other_id)
);
Since I had the necessary grants, this worked fine.
Now I wonder if there are reasons for not referencing tables in a different schema, or anything to be careful about?
No problem doing this. Schemas really have no impact when establishing foreign key relationships between tables. Just make sure the appropriate people have the permissions necessary for the schemas you intend to use.
If you're in an organization where different people have authority over different schemas, I think it's good practice to give the other schema the ability to disable, or even drop and recreate, your constraint.
For example, they could need to drop or truncate their table and then reload it to handle some (very weird) support issue. Unless you want to get called in the middle of the night, I recommend giving them the ability to temporarily remove your constraint. (I also recommend setting your own alerts so that you'll know if any of your external constraints get disabled or dropped). When you're crossing organizational/schema lines, you want to play well with others. The index that Vincent mentioned is another part of that.
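In Oracle, for instance, that temporary removal is a two-liner (sketch, reusing the names from the question):

ALTER TABLE my_schema.my_table DISABLE CONSTRAINT my_fk;
-- ... the other team truncates and reloads their table ...
ALTER TABLE my_schema.my_table ENABLE CONSTRAINT my_fk;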
This will work exactly as a foreign key that references a table in its own schema.
As with regular foreign keys, don't forget to index my_id if the parent key is ever updated or if you delete entries from the parent table (unindexed foreign keys can be a source of massive contention and the index is usually useful anyway).
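Creating that index is a one-liner (index name invented):

CREATE INDEX my_table_my_id_idx ON my_schema.my_table (my_id);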
The only thing I ran into was making sure the permission existed on the other schema. The usual stuff - if those permission(s) disappear for whatever reason, you'll hear about it.
One reason this can cause problems is you need to be careful to delete things in the right order. This can be good or bad depending on how important it is to never have orphans in your tables.