Designing a database schema for a Job Listing website? - sql

For a school project I'm making a simple Job Listing website in ASP.NET MVC (we got to choose the framework).
I've thought about it awhile and this is my initial schema:
JobPostings
+---JobPostingID
+---UserID
+---Company
+---JobTitle
+---JobTypeID
+---JobLocationID
+---Description
+---HowToApply
+---CompanyURL
+---LogoURL
JobLocations
+---JobLocationID
+---City
+---State
+---Zip
JobTypes
+---JobTypeID
+---JobTypeName
Note: the UserID will be linked to a Member table generated by a MembershipProvider.
Now, I am extremely new to relational databases and SQL so go lightly on me.
What about naming? Should it be just "Description" under the JobPostings table, or should it be "JobDescription" (same with other columns in that main table). Should it be "JobPostingID" or just "ID"?
General tips are appreciated as well.
Edit: The JobTypes are fixed for our project, there will be 15 job categories. I've made this a community wiki to encourage people to post.

A few thoughts:
Unless you know a priori that there is a limited list of job types, don't split that into a separate table;
Just use "ID" as the primary key on each table (we already know it's a JobLocationID, because it's in the JobLocations table...);
I'd drop the 'Job' prefix from the fields in JobPostings, as it's a bit redundant.
There's a load of domain-specific info that you could include, like salary ranges, and applicant details, but I don't know how far you're supposed to be going with this.

Job Schema http://gfilter.net/junk/JobSchema.png
I split Company out of Job Posting, as this makes maintaining the companies easier.
I also added a XREF table that can store the relationship between companies and locations. You can setup a row for each company office, and have a very simple way to find "Alternative Job Locations for this Company".
This should be a fun project...good luck.
EDIT: I would add Created and LastModifiedBy (Referring to a UserID). These are great columns for general housekeeping.

Looks good to me, I would recommend also adding Created, LastModified and Deleted columns to the user updateable tables as well for future proofing.
Make sure you explicitly define your primary and foreign keys as well in your schema.

What about naming? Should it be just
"Description" under the JobPostings
table, or should it be JobDescription
(same with other columns in that main
table). Should it be "JobPostingID" or
just "ID"?
Personally, I specify generic-sounding fields like "ID" and "Description" with prefixes as you suggest. It avoids confusion about what the id/description applies to when you write queries later on (and saves you the trouble of aliasing them).

I'd recommend folding the data you're going to be storing in JobLocations back into the main table. It's ok to have a table for states and another for countries, but I doubt you want a table that contains every city/state/country pair, you really don't gain anything from it. What happens if someone goes in and edits their location? You'd have to check to make sure no other joblisting points to the location and edit it, else create a new location and point to that instead.
My usual pattern is address and city as text with the record and FK to a state table.

Related

SQL Server database design with foreign keys

I have the following partial database design:
All the tables are dependent on each other so the table bvd_docflow_subdocuments is dependent on the table bdd_docflow_subsets
and the table bvd_docflow_subdocuments is dependent on bvd_docflow_subsets. So I thought I could me smart and use foreign keys on every table (and ON DELETE CASCADE). However the FK are being drilldown how further I go in to the tables.
The problem is the table bvd_docflow_documents has no point having a reference to the 1docflow_documentset_id` PK / FK. Is there a way (and maybe my design is crappy) that only the table standing above it has an FK relationship between the tables and not all the tables above it.
Edit:
More explanation:
In the bvd_docflow_subsets table information is stored about objects to create documents. There is an relation between that table and bvd_docflow_subdocuments table (This table stores master data about all the documents for an subset. (docflow_subset_id is in both tables). This is the link between those to tables.
Going further down we also got the table bvd_docflow_documents this table contains the actual document data. The link between bvd_docflow_documents and bvd_docflow_subdocuments is bvd_docflow_subdocument_id.
On every table I got an foreign key defined so when data is removed on a table all the data linked to that data is also removed.
However when we look to the bvd_docflow_documents table it has all the foreign keys from the other tables (docflow_subset_id and docflow_documentset_id) and there is the problem. The only foreign key needed for that bvd_docflow_documents table is docflow_subdocument_id and no other.
Edit 2
I have changed my design further and removed information that I don't need after initial import of the data.
See the following link for the (total) databse design:
https://sqldbm.com/Project/SQLServer/Share/_AUedvNutCEV2DGLJleUWA
The tables subsets, subdocuments and documents have a many to many relationship so I thought a table in between those 3 documents_subdocuments is the way to go were I define all the different keys for those tables.
I am not used to the database design first and then build it. But, for everything there is a first time, and I try to do make a database that is using standards and is using the power of SQL Server the correct way.
I'll address the bottom-most table and ignore the rest for the most part.
But first some comments. Your schema is simply a model of a system. To provide feedback, one must understand this "system" and how it actually works to evaluate your model. In addition, it is important to understand your entities and your reasons for choosing them and modelling them in the specified manner. Without that understanding all of this guessing based on experience.
And another comment. Slapping an identity column into every table is just lazy modelling IMO. Others will disagree, but you need to also enforce all natural keys. Do you have natural keys? It is rare not to have any. Enforce those that do exist.
And one last comment. Stop the ridiculous pattern of prepending the column names with the table names. And you should really think long and hard about using very long table names. Given what you have, I sense you need a schema for your docflow stuff.
For the documents table, your current PK makes no sense. Again, you've slapped an identity column into the table. By itself, this column is a key for the table. The inclusion of any other columns does not make the key any more "unique" - that inclusion is logical nonsense. Following your pattern, you would designate the identity column as the primary key. But ...
According to your image, the documents table is related to one and only one subdocument. You added a foreign key to that table - which matches the image. You also added additional columns and foreign keys to the "higher" tables. So now a document "points" to a specific subdocument. It also points to a specific subset - which may have no relationship to the subdocument. The same thought applies to the other FK. I have a doubt that this is logically correct. So why do these columns (and related FKs) exist? Perhaps this is the result of premature optimization - which everyone knows is the root of all evil coding. Again, it is impossible to know if this is "right" or even "useful" for your model.
To answer your question "... is there a way", the answer is obviously yes. You remove the columns of which you complain. You added them - Why? Is this perhaps a problem with the tool you are using?
And some last comments. There is nothing special about "varchar(50)". Perhaps this is a place holder that will be updated later. It may also be another sign of laziness. And generally speaking, columns with names like "type" and "code" tend to be foreign keys to "lookup" tables - because people like to add, modify, or remove these sorts categorization values over time. I'm also concerned about the column name overlap among the tables. "Location" exists in multiple tables, as do action_code and action_id. And a column named "id" (action_id) suggests a lookup to another table - is it? Should it be? Is there a relationship between action_id and action_code? From a distance it is impossible to answer any of these questions.
But designing a database is more art than science. Sometimes you just need to create something, populate it with some sample data, and then determine if it works for your needs. Everyone will get something wrong in the first try. That is expected; that is how you learn. The most difficult part is actually completing your first attempt.

One column with multiple foreign keys situation

Here's the deal : this is my first ever database project and I am afraid my solution to this problem isn't quite the best. The database keeps track of different "types" of cooperators. Those types are companies, organisations, employed workers, other "persons" ...
All of those have totally different sets of information, but they all have one in common - contact information. I decided to let user enter what kind of contact information he wants to add to any of the cooperator, whether its e-mail, phone, URL, Fax and so on ...
So I created "Contacts" table in which all the contact data for all the cooperators will be put, regardless of what type the cooperator is it.
The TablesList is the table that contains the list of cooperator types (Companies, Organisations, Workers)
Each row in the Contacts table must contain "TableID" number which identifies what type of the cooperator is it (Company/Worker/Organisation...), and must contain "RowID" which identifies what exact company/worker/organisation the contact is about.
The problem that exists it that Contacts table contains foreign keys from 3 other tables in 1 column, which cannot be good. I could remove the relationships and just fill the column with thos ID's without the DBMS knowing about the constrains, but that just doesnt look like a good solution to me, so now I doubt this idea is any good.
What do you suggest ?
Keep in mind that in future there may be some more types of cooperators added if needed (like temp/contract workers, agencies) and Contacts table should be designed to support them too
Thanks in advance !
Btw im using SQL CE and C#
here is the sketch of whats going on :
EDIT
Although it doesnt feel right, I just removed the relationships and it works just fine with my application so far
I suspect your design is over-normalized. You can simplify it by consolidating data into three tables: Companies, Workers and Organisations. It's not going to be normalized but it is much simpler to work with.
Check out http://www.codinghorror.com/blog/2008/07/maybe-normalizing-isnt-normal.html

db design critic /suggestions

I am redesigning a pharmacy db system and need inputs to see if the new design is optimal or requires tweaking.
Here's a snapshot of the old system..
As can be seen, the pharmacies table stores pharmacy information, along with its address and contact information. Pharmacies are grouped together for invoicing purposes(pharmacygroup) or for sales, advertsing other purposes (banner group). The invoice group may have a different physical address, different contact information.
Here's my new design. I have split the address from both the pharmacy and pharmacygroup table into a table of its own and made a new table for contacts. Their could be technical contacts, account contacts, owner contacts etc, hence the contacttypes table. The pharmacy and the pharmacygroup can have separate contact info, I thought of making a single contact table and have a 'linktype' and 'linkid' column to indicate if its a pharmacy contact or pharmacy group contact, but I am not sure if this is a right approach. Is this a good design or will it be costly in terms of data retrieval because of the number of joins?? Another thing I noted that , is in the old design , they didn't create any foreign key constraints, although the pharmacy table had groupid and bannergroupid references for pharmacygroup and bannergroup, possibly to save time for data retrieval. Is this a good approach?
Your design looks good to me. I always prefer to have a couple of extra joins on the design step over spending time reorganizing data after system went into production. You never know in advance what kind of reports will be requested by management/sales/financial people, and proper relational design will give you more freedom.
Also, you cannot blame only a couple of extra JOINs for your performance issues. You should always look at:
data volumes (and physical data layout),
transaction amount and density,
I/O, CPU, memory usage,
your RDBMS configuration,
SQL queries quality.
In my view, JOINs will be on the bottom of this list.
As to the RI constraints (Referential Integrity), I've seen a couple of projects that had been running without any Primary/Foreign keys for increased performance. The main excuse was: we have all checks embedded into the Application and Application is the only source of any changes in the system. On the other hand, they agreed, that it is not known, whether systems were in a consistent state (in fact, analysis showed they were not).
I always stick to creating all possible keys/constraints on the design state, as there always will be some “cowboys” around, who will dig into your database and “adjust” data they seem fits better. Still, you might want to temporarily disable or even drop some constraints/indexes for the bulk data manipulations, which is also an official recommendation.
If uncertain, create 2 test databases, one with and another without constraints. Load some data and compare query performance. I think it will be similar.
And here my comments on your sketches, decisions are all yours.
You might want to create a common contacts table the same way you did for addresses, i.e. add contact_id, owner_contact_id, etc. columns to the target relations instead of referencing relations from contacts table;
As you have only one column in contacttype table (and in case you'll have a common contacts), it's better to move the only field away and avoid this table;
You seem to have mixture of singular/plural names for your tables, better to stick to a common pattern here. I personally prefer singular;
In pharmacygroup your PK is named id, while all the rest PKs follow tableid pattern, it will be easier to write scripts later if you'll use a common pattern here;
In addresses table you have fields with underscores, like street_name, while elsewhere you avoid _ — consider making it common;
References are named differently. Although it is not so highly important, I do have a couple of systems where I have to rely on the constraints' names, so it's better to use some pattern here. I use the following one:
prefix p_, f_, c_, t_, u_ or i_ for primary, foreign keys, check constraints, triggers, unique and other indexes;
name of the table;
name of the column constraint/index/trigger refers to.
Why I prefer naming tables in singular form? Because I always name PK using table_id pattern, and IMHO pharmacy_id looks better then pharmacies_id. I use this approach as I have a bunch of general-purpose scripts which relies on this pattern when performing data consistency checks prior to loading it into the main tables.
EDIT:
More on contacts.
You can use contact_id in all your tables, making it a primary contact, whatever this might mean in your application. Should you need more contacts to be there for some relations, then you can go with different prefixes, like owner_contact_id, sales_contact_id, etc.
In case you expect a huge number of contacts to be there for some relations, like pharmacygroup, then you will can add an extra table like this:
CREATE TABLE pharmacygroupcontact (
contactid int4,
groupid int4,
contact_desc text
);
It partially copies your initial groupcontacts, but consists of two FKs and a description.
Which approach is better I cannot tell as I'm not aware how Application is designed.
You have 2 contact tables, I would create one, then use linking tables to link groupcontacts and pharmacycontacts. I would definitely want to have the FK and PK relationships setup to.

Are relationship tables really needed?

Relationship tables mostly contain two columns: IDTABLE1, and IDTABLE2.
Only thing that seems to change between relationship tables is the names of those two columns, and table name.
Would it be better if we create one table Relationships and in this table we place 3 columns:
TABLE_NAME, IDTABLE1, IDTABLE2, and then use this table for all relationships?
Is this a good/acceptable solution in web/desktop application development? What would be downside of this?
Note:
Thank you all for feedback. I appreciate it.
But, I think you are taking it a bit too far... Every solution works until one point.
As data storage simple text file is good till certain point, than excel is better, than MS Access, than SQL Server, than...
To be honest, I haven't seen any argument that states why this solution is bad for small projects (with DB size of few GB).
It would be a monster of a table; it would also be cumbersome. Performance-wise, such a table would not be a great idea. Also, foreign keys are impossible to add to such a table. I really can't see a lot of advantages to such a solution.
Bad idea.
How would you enforce the foreign keys if IDTABLE1 could contain ids from any table at all?
To achieve acceptable performance on joins without a load of unnecessary IO to bring in completely unrelated rows you would need a composite index with leading column TABLE_NAME that basically ends up partitioning the table into sections anyway.
Obviously even with this pseudo partitioning going on you would still be wasting a lot of space in the table/indexes just repeating the table name for each row.
Isn't it a big IF that you're only going to store the 2 ID fields? If I have a StudentCourse (or better yet Enrollment) table that has StudentID & CourseID, but wouldn't EnrollmentDate go in this table as well since not all students enroll on the first day of class. Seems like a bad idea to add this column to an already bloated table where most records will be null.
The benefit of a single table could be a requirement that the application has the ability to allow user/admin to create these relationships with data (Similar to have a single lookup or reference list table) and avoid having to create a new table to address these User Created References. Needing dynamic querying may benefit as well. An application that requires such dynamic data structure requirements might be better suited for a schemaless or nosql database.

What are some of your most useful database standards?

Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
I have some ideas, some that I have accumulated over time, but I really want to know what makes things go smoothly for you when modeling database:
Table name matches Primary Key name and description key
Schemas are by functional area
Avoid composite primary keys where possible (use unique constraints)
Camel Case table names and field names
Do not prefix tables with tbl_, or procs with SP_ (no hungarian notation)
OLTP databases should be atleast in BCNF / 4NF
Name similarly targetted stored procs with the same prefix, for instance if you've got 3 stored procedures for Person. That way everything for person is grouped in one place and you can find them easily without having to look through all your procs to find them.
PersonUpdate
PersonDelete
PersonCreate
Do similar things for tables when you have groups of tables with related data. For instance:
InvoiceHeaders
InvoiceLines
InvoiceLineDetails
If you have the option of schemas within your database, use them. It's much nicer to see:
Invoice.Header
Invoice.Line.Items
Invoice.Line.Item.Details
Person.Update
Person.Delete
Person.Create
Don't use triggers unless there's no other reasonable approach to achieve that goal.
Give field names a meaningful prefix so you can tell what table they come from without someone needing to explain. That way when you see a field name referenced, you can easily tell which table it's from.
Use consistent data types for fields containing similar data, i.e. don't store phone number as numeric in one table and varchar in another. In fact, don't store it as numeric, if I come across a negative phone number I'll be mad.
Don't use spaces or other obscure characters in table/field names. They should be entirely alphanumeric - or if I had my druthers, entirely alphabetic with the exception of the underscore. I'm currently working on an inherited system where table and field names contain spaces, question marks and exclamation marks. Makes me want to kill the designer on a daily basis!
Don't use syntax keywords as object names it'll cause headaches trying to retrieve data from them. I hate having to wrap object names as [index] that's two needless chars I didn't need to type damn you!
One thing I haven't seen mentioned yet:
Never use database keywords as object names. You do not want to have to qualify them every time you use them
If you misspell something when you create it, fix it as soon as you notice it. Don't spend years having to remember that in this table UserName is really Usernmae. It's a whole lot easier to fix when there isn't much code written against it.
Never use implied joins (the comma syntax), always specify the joins.
Putting everybody's input together into one list.
Naming Standards
Schemas are named by functional area (Products, Orders, Shipping)
No Hungarian Notation: No type names in object names (no strFirstName)
Do not use registered keywords for object names
No spaces or any special characters in object names (Alphanumber + Underscore are the only things allowed)
Name objects in a natural way (FirstName instead of NameFirst)
Table name should match Primary Key Name and Description field (SalesType – SalesTypeId, SalesTypeDescription)
Do not prefix with tbl_ or sp_
Name code by object name (CustomerSearch, CustomerGetBalance)
CamelCase database object names
Column names should be singular
Table names may be plural
Give business names to all constraints (MustEnterFirstName)
Data Types
Use same variable type across tables (Zip code – numeric in one table and varchar in another is not a good idea)
Use nNVarChar for customer information (name, address(es)) etc. you never know when you may go multinational
In code
Keywords always in UPPERCASE
Never use implied joins (Comma syntax) - always use explicit INNER JOIN / OUTER JOIN
One JOIN per line
One WHERE clause per line
No loops – replace with set based logic
Use short forms of table names for aliases rather than A, B, C
Avoid triggers unless there is no recourse
Avoid cursors like the plague (read http://www.sqlservercentral.com/articles/T-SQL/66097/)
Documentation
Create database diagrams
Create a data dictionary
Normalization and Referential Integrity
Use single column primary keys as much as possible. Use unique constraints where required.
Referential integrity will be always enforced
Avoid ON DELETE CASCADE
OLTP must be at least 4NF
Evaluate every one-to-many relationship as a potential many-to-many relationship
Non user generated Primary Keys
Build Insert based models instead of update based
PK to FK must be same name (Employee.EmployeeId is the same field as EmployeeSalary.EmployeeId)
Except when there is a double join (Person.PersonId joins to PersonRelation.PersonId_Parent and PersonRelation.PersonId_Child)
Maintenance : run periodic scripts to find
Schema without table
Orphaned records
Tables without primary keys
Tables without indexes
Non-deterministic UDF
Backup, Backup, Backup
Be good
Be Consistent
Fix errors now
Read Joe Celko's SQL Programming Style (ISBN 978-0120887972)
My standards for Oracle are:
Keywords are always in UPPERCASE;
Database object names are always in lowercase;
Underscores will replace spaces (ie there won't be any camel case conventions that are common on, say, SQL Server);
Primary keys will pretty much always be named 'id';
Referential integrity will be enforced;
Integer values (including table ids) will generally always be NUMBER(19,0). The reason for this is that this will fit in a 64-bit signed integer thus allowing the Java long type to be used instead of the more awkward BigInteger;
Despite the misnomer of appending "_number" to some column names, the type of such columns will be VARCHAR2 not a number type. Number types are reserved for primary keys and columns you do arithmetic on;
I always use a technical primary keys; and
Each table will have its own sequence for key generation. The name of that sequence will be _seq.
With SQL Server, the only modification is to use camel case for database object names (ie PartyName instead of party_name).
Queries will tend to be written multi-line with one clause or condition per line:
SELECT field1, field2, field2
FROM tablename t1
JOIN tablename2 t2 ON t1.id = t2.tablename_id
WHERE t1.field1 = 'blah'
AND t2.field2 = 'foo'
If the SELECT clause is sufficiently long I'll split it out one field per line.
Name all constraints
don't forget to back up your databases on a regular basis.
Don't use type names in the field names. The older guys will remember the old MS standard of lpszFieldName and the stupidity that ensued.
Use descriptive field names That follow normal language conventions. For example "FirstName" instead of "NameFirst"
Each word in the field name is capitalized
No underscores
Do not use normal keywords such as "Index"
Do not prefix ANYTHING with the object type. For example we do NOT use tblCustomers or spCustomersGet. These don't allow for good sorting and provide zero value.
Use schemas to define separate areas of the database. Such as sales.Customers and hr.Employees. This will get rid of most of the prefixes people use.
Loops of any kind should be viewed with suspicion. There's usually a better set based way.
Use views for complicated joins.
Avoid complicated joins when possible. It may be more astheticaly pleasing to have a CustomerPhoneNumbers table; but honestly, how many phone numbers do we really need to store? Just add the fields to the Customers table. Your DB queries will be faster and it's much easier to understand.
If one table calls a field "EmployeeId" then EVERY SINGLE TABLE that references it should use that name. It doesn't need to be called CustomerServiceRepId just because it's in an extension table.
Almost all tables have the "s" ending. For example: Customers, Orders, etc. After all the table holds many records...
Evaluate your queries, indexes and foreign key relationships with an analysis tool. Even those that may be generated for you. You might be surprised.
Linking tables which support many to many relationships have both linked tables in the name. For example, SchoolsGrades. It's very easy to tell by the table name what it does.
Be CONSISTENT. If you start down one path with your conventions, don't change horses halfway unless you are willing to refactor all of the previous work. This should put the brakes on any "wouldn't it be great if.." ideas that end up causing confusion and vast amounts of rework.
Think before you type. Do you really need that table, field, sproc, or view? Are you sure it isn't covered somewhere else? Get concensus before adding it. And if for some reason you have to take it out, talk to your team first. I've been at places where the DBA's make daily breaking changes without regard for the devs. This isn't fun.
If a database is for a particular application, have a version table so that the database releases can be checked against the code releases (amongst other reasons).
I always try not to use the type in the field name - "sFirstName", "sLastName", or "iEmployeeID". While they match at first, if something changes, they'll be out of sync, and it's a huge headache to change those names later, since you have to change the dependant objects as well.
Intellisense and the GUI tools make it trivial to find out what type a column is, so I don't feel this is necessary.
The WITH clause really helps break queries down into manageable parts.
It also really helps for efficiency on the execution plans of the queries.
Ensure that every varchar/nvarchar choice is appropriate.
Ensure that every NULLable column choice is appropriate - avoid NULLable columns where possible - allowing NULL should be the justifiable position.
Regardless of any other rules you might use from the suggestions here, I would create a stored procedure in the database that can be run on a regular basis to determine system health for any rules or standards you do have (some of this is a little SQL-Server specific):
Look for orphaned records in any cases where the DBMS system's referential integrity cannot be used for some reason (in my system I have a table of processes and a table of tests - so my system_health SP looks for processes without tests, since I only have a one-way FK relationship)
Look for empty schemas
Look for tables without primary keys
Look for tables without any indexes
Look for database objects without documentation (we use SQL Server Extended properties to put the documentation in the database - this documentation can be as granular as the column).
Look for system-specific issues - tables which need to be archived, exceptions which are not part of normal monthly or daily processing, certain common column names with or without defaults (CreateDate, say).
Look for non-deterministic UDFs
Look for TODO comments to ensure that code in the DB does not somehow have untested or pre-release code.
All this can be automated to give you an overall picture of system health.
Everyone writes SQL queries (views, stored procedures, etc) in the same basic format. It really helps development/maintenance efforts down the road.
Consistent naming standards. Having everyone on the same page, using the same format (whether it be Camel Case, specific prefixes, etc..) helps in being able to maintain a system accurately.
A few likes and dislikes.
My opinion is prefixes are horrible in every aspect. I currently work on a system where the tables are prefixed, and the columns within the tables are prefixed with 2 letter table name acronyms, I waste at least 30 mins each day working on this database because the acronym isn't logical. If you want to denote something with a prefix use a schema owner instead.
Using NVarchar from the start of a project if there is even a slight hint that down the line the text data will need to support multi lingual chars. Upgrading large databases because of lack of forward planning and thinking is a pain and wastes time.
Splitting each condition within a where clause onto a new line for readability (in and not in statements wrapped in brackets and tabbed in.) I think this is the important standard for me.
I worked at one company where a standard was that comma's must always be placed at the start of a line when performing parameter or variable declarations. This apparently made it more readable however I found it a complete nightmare.
In addition to normalization to 3NF or BCNF (more about that in this question), I have found the following to be useful:
Name tables as plural nouns
Name columns as sigular
So a "People" table has a "PersonID" column.
There is nothing wrong with composite keys, so long as the rules of 3NF or BCNF still hold. In many cases (such as the "many-to-many" case) this is entirely desirable.
Avoid repeating the table name in the column names. peoplePersonID is better written as table.column anyway, and much more readable and therefore self-documenting. People.PersonID is better, to me at least.
ON DELETE CASCADE should be used very carefully.
Remember that NULL means one of two things: Either it's unknown or it's not applicable.
Remember also that NULLs have interesting affects on joins, so practice your LEFT, RIGHT, and FULL outer joins.
Some others (albeit small) comments to throw against the wall...
SQL Server database schemas can be useful for both organizing tables and stored procedures as well as controlling security.
Every transactional table should always track both who and when created the record as well as updated the record in separate columns. I've seen implementation that simply used "update date" which can lead to auditing challenges in the future.
Use GUID's for row identifiers for all rows for projects with offline/synchronization requirements.
Good database design and normalization.
Tables are named in the singular, lowercase, no underscores, no prefix
Fields also lowercase, no underscores, no prefix
Stored procedures prefixed with "st_" (sorts nicely)
Views that are treated like tables have no prefix
Views created for special reports, etc. have a "v" prefix
Indexed views created for performance have an "ixv" prefix
All indexes have purposeful names (no auto-naming)
Strongly prefer uniqueidentifier (with sequential increment) over int IDENTITY for surrogate keys
Don't artificially limit VARCHAR/NVARCHAR fields to 100 or 255. Give them room to breath. This isn't the 1980s, fields are not stored padded to their max length.
3NF minimum standard
Prefer joining tables to column-level foreign keys: many 1:m assumptions are challenged as a system grows over time.
Always use surrogate keys, not natural keys, as the primary key. All assumptions about "natural" keys (SSNs, usernames, phone numbers, internal codes, etc.) will eventually be challenged.
Tabular formatted SQL.
select a.field1, b.field2
from any_table a
inner join blah b on b.a_id = a.a_id
inner join yet_another y on y.longer_key = b.b_id
where a.field_3 > 7
and b.long_field_name < 2;
Part of this is to use uniformly long alias names (in the example, here, a, b, and y are all length 1).
With this kind of formatting, I can more quickly answer common questions like, "what table is aliased by 'a'?" and "which fields join table T into the query?" The structure doesn't take long to apply or to update, and I find that it saves a lot of time. We spend more time reading code than writing it.
Document everything; wiki type documentation is easy to setup and the software is free.
Make sure you understand the interface first and design the database second. Most of the time its a lot better to know how the data you are going to use needs to work and then engineer the database. Most bad DB design happens as things evolve not upfront.
Then define the database standard and version you are going to work to. Define standards for the code elements (views, functions etc), database naming; naming conventions for columns, tables; type conventions for columns; coding templates.
Spend time considering how you define types having standard database types for fields or bespoke types are a good thing to sort out upfront.
As part of your documentation include a list of don'ts as well as dos for the application which include your prefered hated functionality cursors, triggers.
Review it regularly.
13- Evaluate your queries
Thats true. Sometimes you don't get what you wanted.
For me, it's always useful to name the tables and fields with their exact content and (for us) in clear spanish and using Upper Camel Case, with no whitespaces:
User Name: NombreUsuario
First Last Name: ApellidoPaterno
Second Last Name: ApellidoMaterno
etc etc
Taking "database" to mean "SQL product", my answer is, "Too many to mention. You could write a whole book on the subject." Happily, someone has.
We use Joe Celko's SQL Programming Style (ISBN 978-0120887972): "this book is a collection of heuristics and rules, tips, and tricks that will help you improve SQL programming style and proficiency, and for formatting and writing portable, readable, maintainable SQL code."
Advantages of this approach is include:
the guy knows more about this kind of thing than me (is there another book on SQL heuristics?!);
the work has already been done e.g. I can give the book to someone on the team to read and refer to;
if someone doesn't like my coding style I can blame someone else;
I recently got a load of rep on SO by recommending another Celko book :)
In practice we do deviate from the prescriptions of The Book but surprisingly rarely.
In MS-SQL, I've always had objects owned by dbo., and I prefix calls to those objects with dbo.
Too many times I've seen our devs wonder why they can't call their objects that they inadvertainly owned.
Avoid silly abbreviation conventions, such as comprehensive dictionaries of abbreviations that actively encourage monstrosities like EMP_ID_CONV_FCTR_WTF_LOL_WAK_A_WAK_HU_HU. This rule is inspired a real set of guidelines I've seen before.
MVP Aaron Bertrand's
"My stored procedure "best practices" checklist"
Table name matches Primary Key name and description key
I have just recently, after years of agreeing with this, jumped ship, and now have an "ID" column on every table.
Yes I know, when linking tables it's abiguous! But so is linking ProductID to ProductID, so uhh, why the extra typing?
This:
SELECT p.Name, o.Quantity FROM Products p, Orders o WHERE o.ProductID = p.ID
Is slightly better than this:
SELECT p.Name, o.Quantity FROM Products p, Orders o WHERE o.ProductID = p.ProductID
Note that both will require table or alias prefixes. But not only am I typing slightly less (multiply that across dozens of tables with long descriptive names and it adds up fast in a data intensive application) but it also makes it easier to know which table is the parent table in every join, which, when joining 8-10 tables in a query, can help quite a bit.
I follow a lot of the same conventions as others here, but I wanted to say a few things that haven't been said yet.
Regardless of whether you like plural names or singular names for your tables, be consistent. Choose one or the other, but don't use both.
The primary key in a table has the same name as the table, with the suffix _PK. Foreign keys have their same name as their corresponding primary key, but with a suffix of _FK. For example, the Product table's primary key is called Product_PK; in the Order table the corresponding foreign key is Product_FK. I picked this habit up from another DBA friend of mine and so far I'm liking it.
Whenever I do an INSERT INTO...SELECT, I alias all the columns in the SELECT portion to match the names of the columns from the INSERT INTO portion to make it easier to maintain and see how things match up.
The most important standard is: don't have a database by default. I find too many developers grabbing a database for projects where life would have been much easier without one (at least yet). It is just a tool in the toolbox, and not every problem is a nail.
Inappropriate use of a database leads to anemic domain models, badly testable code and unneeded performance problems.
I agree with just about everything you have put there except for #5. I often use prefixes for tables and stored procedures because the systems that we develop have lots of different functional areas, so I will tend to prefix the tables and sprocs with an identifier that will allow for them to group nicely in Management Studio based on what area they belong to.
Example: cjso_Users, cjso_Roles, and then you have routing_Users, routing_Roles. This may sound like replication of data, but in reality the two different user/roles tables are for completely separate functions of the system (cjso would be for the customer-based ecommerce app while the routing would stand for employees and distributors who use the routing system).
I like our table naming convention:
People Table
PEO_PersonID
PEO_FirstName
...
Which helps make larger querys a bit more readable. and joins make a bit more sense:
Select * -- naughty!
From People
Join Orders on PEO_PersonID = ORD_PersonID
--...
i guess rather than what the naming convention is, is the consistency of the naming.