SQL Server database design with foreign keys - sql

I have the following partial database design:
All the tables are dependent on each other so the table bvd_docflow_subdocuments is dependent on the table bdd_docflow_subsets
and the table bvd_docflow_subdocuments is dependent on bvd_docflow_subsets. So I thought I could me smart and use foreign keys on every table (and ON DELETE CASCADE). However the FK are being drilldown how further I go in to the tables.
The problem is the table bvd_docflow_documents has no point having a reference to the 1docflow_documentset_id` PK / FK. Is there a way (and maybe my design is crappy) that only the table standing above it has an FK relationship between the tables and not all the tables above it.
Edit:
More explanation:
In the bvd_docflow_subsets table information is stored about objects to create documents. There is an relation between that table and bvd_docflow_subdocuments table (This table stores master data about all the documents for an subset. (docflow_subset_id is in both tables). This is the link between those to tables.
Going further down we also got the table bvd_docflow_documents this table contains the actual document data. The link between bvd_docflow_documents and bvd_docflow_subdocuments is bvd_docflow_subdocument_id.
On every table I got an foreign key defined so when data is removed on a table all the data linked to that data is also removed.
However when we look to the bvd_docflow_documents table it has all the foreign keys from the other tables (docflow_subset_id and docflow_documentset_id) and there is the problem. The only foreign key needed for that bvd_docflow_documents table is docflow_subdocument_id and no other.
Edit 2
I have changed my design further and removed information that I don't need after initial import of the data.
See the following link for the (total) databse design:
https://sqldbm.com/Project/SQLServer/Share/_AUedvNutCEV2DGLJleUWA
The tables subsets, subdocuments and documents have a many to many relationship so I thought a table in between those 3 documents_subdocuments is the way to go were I define all the different keys for those tables.
I am not used to the database design first and then build it. But, for everything there is a first time, and I try to do make a database that is using standards and is using the power of SQL Server the correct way.

I'll address the bottom-most table and ignore the rest for the most part.
But first some comments. Your schema is simply a model of a system. To provide feedback, one must understand this "system" and how it actually works to evaluate your model. In addition, it is important to understand your entities and your reasons for choosing them and modelling them in the specified manner. Without that understanding all of this guessing based on experience.
And another comment. Slapping an identity column into every table is just lazy modelling IMO. Others will disagree, but you need to also enforce all natural keys. Do you have natural keys? It is rare not to have any. Enforce those that do exist.
And one last comment. Stop the ridiculous pattern of prepending the column names with the table names. And you should really think long and hard about using very long table names. Given what you have, I sense you need a schema for your docflow stuff.
For the documents table, your current PK makes no sense. Again, you've slapped an identity column into the table. By itself, this column is a key for the table. The inclusion of any other columns does not make the key any more "unique" - that inclusion is logical nonsense. Following your pattern, you would designate the identity column as the primary key. But ...
According to your image, the documents table is related to one and only one subdocument. You added a foreign key to that table - which matches the image. You also added additional columns and foreign keys to the "higher" tables. So now a document "points" to a specific subdocument. It also points to a specific subset - which may have no relationship to the subdocument. The same thought applies to the other FK. I have a doubt that this is logically correct. So why do these columns (and related FKs) exist? Perhaps this is the result of premature optimization - which everyone knows is the root of all evil coding. Again, it is impossible to know if this is "right" or even "useful" for your model.
To answer your question "... is there a way", the answer is obviously yes. You remove the columns of which you complain. You added them - Why? Is this perhaps a problem with the tool you are using?
And some last comments. There is nothing special about "varchar(50)". Perhaps this is a place holder that will be updated later. It may also be another sign of laziness. And generally speaking, columns with names like "type" and "code" tend to be foreign keys to "lookup" tables - because people like to add, modify, or remove these sorts categorization values over time. I'm also concerned about the column name overlap among the tables. "Location" exists in multiple tables, as do action_code and action_id. And a column named "id" (action_id) suggests a lookup to another table - is it? Should it be? Is there a relationship between action_id and action_code? From a distance it is impossible to answer any of these questions.
But designing a database is more art than science. Sometimes you just need to create something, populate it with some sample data, and then determine if it works for your needs. Everyone will get something wrong in the first try. That is expected; that is how you learn. The most difficult part is actually completing your first attempt.

Related

A different way to Model a portion of this ERD

I have a very simple table diagram from modeling my application. The problem is I am second guessing my relation between Vendor and VendorOrder. The VendorOrders table should store all vendororders in the system. To get all orders for a certain apartment, you would just use the PK and FK relationship to gather that data. Is there anything I should improve with the overall design?
Diagram:
There's three things I see that you could improve this by doing.
Create an intersection table between your Apartment and Resident tables called ApartmentResidents, where each table references the intersection table with a one to many relationship. In this ERD, it only allows for one resident to be registered to an apartment. If a resident lives in more than one apartment for the lifetime of this database, you'll need to register them as an entirely new resident.
Intersection table example
In your Vendor table, instead of using a name as your primary key I would create an id instead. Using things that have a real-world value as your primary key can get messy for a number of reasons:
If two vendors have the same name, like "Johnson's Repair", you'll need to misspell one of them for it to be a valid key.
If you typo a vendor's name, you're also going to contain a reference to that typo in the foreign key tables (Which also might make it not show in results if you do a select query for the correct spelling).
Placing an index on a string is less performant than if you put it on an auto incrementing integer key.
(Optional) I usually name my database tables pluralized, like "Apartments", or "Vendors". It makes the SQL syntax read more like a sentence inside the query. If I remember right that's also one of the things that SQL's creator was going for too with the syntax design.

What could be the purpose of primary keys on all tables being derived from a single table?

I've really been scratching my head over this and don't know how to ask the question well enough to find an answer on Google or StackOverflow etc.
There is a very old system used at work - I don't have access to the server side so can't view its tables, but I do know its an SQL database and have done enough experimenting with the API to see what adding to each table does, and I'm questioning how it allocates primary keys;
It has a lot of tables, each with a primary key as expected, but the primary key on any/all of its tables seems to be allocated so that there is absolutely no duplication of primary keys anywhere in the system.
e.g.
add row to table 1 get pk = 1
add row to table 2 get pk = 2
add row to table 1 again, get pk = 3
add row to table 10 and get pk = 4
Is this method some sort of old database technique?
What could be the purpose of doing this?
There are more funny nuances that I won't get into detail of, e.g. a certain range of pk's being allocated for certain tables but I just wanted to see if anyone recognises the main principle here and if there's a point to it, or if it's just bad / weird design
A primary key only needs to be unique within a single table. There is no such thing as a primary key across multiple tables.
This might be useful under some circumstances. For instance, this would allow all entities to be represented in a single table. This can be handy for "generic" information, such as adding comments to the entities.
More prosaically, I have seen this in older Oracle databases. Oracle did not have any automated mechanism for generating ids, so this required using a sequence. As a matter of convenience, laziness, or design, multiple tables might use the same sequence -- resulting in the behavior that you see.

Table referenced by other tables having different PKs

I would like to create a table called "NOTES". I was thinking this table would contain a "table_name" VARCHAR(100) which indicates what table put in the note, a "key" or multiple "key" columns representing the primary key values of the table that this note applies to and a "note" field VARCHAR(MAX). When other tables use this table they would supply THEIR primary key(s) and their "table_name" and get all the notes associated with the primary key(s) they supplied. The problem is that other tables might have 1, 2 or more PKs so I am looking for ideas on how I can design this...
What you're suggesting sounds a little convoluted to me. I would suggest something like this.
Notes
------
Id - PK
NoteTypeId - FK to NoteTypes.Id
NoteContent
NoteTypes
----------
Id - PK
Description - This could replace the "table_name" column you suggested
SomeOtherTable
--------------
Id - PK
...
Other Columns
...
NoteId - FK to Notes.Id
This would allow you to keep your data better normalized, but still get the relationships between data that you want. Note that this assumes a 1:1 relationship between rows in your other tables and Notes. If that relationship will be many to one, you'll need a cross table.
Have a look at this thread about database normalization
What is Normalisation (or Normalization)?
Additionally, you can check this resource to learn more about foreign keys
http://www.w3schools.com/sql/sql_foreignkey.asp
Instead of putting the other table name's and primary key's in this table, have the primary key of the NOTES table be NoteId. Create an FK in each other table that will make a note, and store the corresponding NoteId's in the other tables. Then you can simply join on NoteId from all of these other tables to the NOTES table.
As I understand your problem, you're attempting to "abstract" the auditing of multiple tables in a way that you might abstract a class in OOP.
While it's a great OOP design principle, it falls flat in databases for multiple reasons. Perhaps the largest single reason is that if you cannot envision it, neither will someone (even you) looking at it later have an easy time reassembling the data. Smaller that that though, is that while you tend to think of a table as a container and thus similar to an object, in reality they are implemented instances of this hypothetical container you are trying to put together and operate better if you treat them as such. By creating an audit table specific to a table or a subset of tables that share structural similarity and data similarity, you increase the performance of your database and you won't run in to strange trigger or select related issues later.
And you can't envision it not because you're not good at what you're doing, but rather, the structure is not conducive to database logging.
Instead, I would recommend that you create separate logging tables that manage the auditing of each table you want to audit or log. In fact, some fast google searches show many scripts already written to do much of this tasking for you: Example of one such search
You should create these individual tables and then if you want to be able to report on multiple table or even all tables at once, you can create a stored procedure (if you want to make queries based on criterion) or a view with an included SELECT statement that JOINs and/or UNIONs the tables you are interested in - in a fashion that makes sense to the report type. You'll still have to write new objects in to the view, but even with your original table design, you'd have to account for that.

Trying to verify understanding of foreign keys SQL Server

So I'm working on just a learning project to expose myself to doing some things I do not get to do at work. I'm just making a simple bug and case tracking app (I know there are a million this is just to work with some tools I don't get to). So I was designing my database and realized I've never actually used Foreign Keys before in any of my projects, I've used them before but never actually setting up a column as a FK. So I've designed my database as follows, which I think is close to correct (at least for the initial layout).
Database Image http://drop.io/download/public/uurp1vxej0abpwu7jsee/895663dedff577359b900cda1726f115b24be90c/Asset/28368622/v3/large_thumbnail
However When I try to add the FK's to the linking Tables I get an error saying, "The tables present in the relationship must have the same number of columns". I'm doing this by in SQLSMS by going to the Keys 'folder' and adding a FK. Is there something that I am doing wrong here, I don't understand why the tables would have to have the same number of columns for me to add a FK relationship between the tables?
When defining a foreign key relationship there are four things you need to specify:
The referring table (foreign key table)
The columns from the referring table that will form the foreign key.
The referred table (primary key table)
The columns from the referred table that form the key.
I'm guessing that you have selected one of the above incorrectly - probably you have forgotten that you need to state the column involved in the relationship twice - once for each table.
Here's a screenshot of how it should look - note that the column is specified twice. In my example the columns have slightly different names to demonstrate that they don't always need to have the same name, but in your case the column names will be the same.

ID fields in SQL tables: rule or law?

Just a quick database design question: Do you ALWAYS use an ID field in EVERY table, or just most of them? Clearly most of your tables will benefit, but are there ever tables that you might not want to use an ID field?
For example, I want to add the ability to add tags to objects in another table (foo). So I've got a table FooTag with a varchar field to hold the tag, and a fooID field to refer to the row in foo. Do I really need to create a clustered index around an essentially arbitrary ID field? Wouldn't it be more efficient to use fooID and my text field as the clustered index, since I will almost always be searching by fooID anyway? Plus using my text in the clustered index would keep the data sorted, making sorting easier when I have to query my data. The downside is that inserts would take longer, but wouldn't that be offset by the gains during selection, which would happen far more often?
What are your thoughts on ID fields? Bendable rule, or unbreakable law?
edit: I am aware that the example provided is not normalized. If tagging is to be a major part of the project, with multiple tables being tagged, and other 'extras', a two-table solution would be a clear answer. However in this simplest case, would normalization be worthwhile? It would save some space, but require an extra join when running queries
As in much of programming: rule, not law.
Proof by exception: Some two-column tables exist only to form relationships between other more meaningful tables.
If you are making tables that bridge between two or more other tables and the only fields you need are the dual PK/FK's, then I don't know why you would need ID column in there as well.
ID columns generally can be very helpful, but that doesn't mean you should go peppering them in at every occasion.
As others have said, it's a general, rather than absolute, rule and there are plenty of exceptions (tables with composite keys for example).
There are some occasional but useful occasions where you might want to create an artificial ID in a table that already has a (usually composite) unique identifier. For example, in one system I've created a table to store part numbers; although the part numbers are unique, they may actually change - we add an arbitrary integer PartID. Not so common, but it's a typical real-world example.
In general what you really want is to be able if at all possible to have some kind of way to uniquely identify a record. It could be an id field or it could be a unique index (which does not have to be on just one field). Anytime I thought I could get away without creating a way to uniquely identify a record, I have been proven wrong. All tables do not have a natural key though and if they do not, you really need to have an id file of some kind. If you have a natural key, you could use that instead, but I find that even then I need an id field in most cases to prevent having to do too much updating when the natural key changes (it always seems to change). Plus having worked with literally hundreds of databases concerning many many differnt topics, I can tell you that a true natural key is rare. As others have nmentioned there is no need for an id field in a table that is simply there to join two tables that havea many to many relationship, but even this should have a unique index.
If you need to retrieve records from that table with unique id then yes. If you will retrieve them by some other composite key made up of foreign keys then no. The last thing you need is fields, data, and indexes that you do not use.
A clustered index does not need to be on primary key or a surrogate (identity column) either.
Your design, however, is not normalized. Typically for tagging, I use two tables, a table of tags (with a surrogate key) and a table of links from the tags to the subject table(s) using the surrogate key in the tag table and theprimary key in the subject table. This allows your tags to apply to different entities (photos, articles, employees, locations, products, whatever). It allows you to enforce foreign key relationships to multiple tables, and also allows you to invent tag hierarchies and other things about the tag table.
As far as the indexes on this design, it will be dictated by the usage patterns.
In general developers love having an ID field on all tables except for 'linking' tables because it makes development much easier, and I am no exception to this. DBA's on the other hand see no problem with making natural primary keys made up of 3 or 4 columns. It can be a butting of heads to try and get a good database design.