What are some of your most useful database standards? - sql

Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
I have some ideas, some that I have accumulated over time, but I really want to know what makes things go smoothly for you when modeling database:
Table name matches Primary Key name and description key
Schemas are by functional area
Avoid composite primary keys where possible (use unique constraints)
Camel Case table names and field names
Do not prefix tables with tbl_, or procs with SP_ (no hungarian notation)
OLTP databases should be atleast in BCNF / 4NF

Name similarly targetted stored procs with the same prefix, for instance if you've got 3 stored procedures for Person. That way everything for person is grouped in one place and you can find them easily without having to look through all your procs to find them.
PersonUpdate
PersonDelete
PersonCreate
Do similar things for tables when you have groups of tables with related data. For instance:
InvoiceHeaders
InvoiceLines
InvoiceLineDetails
If you have the option of schemas within your database, use them. It's much nicer to see:
Invoice.Header
Invoice.Line.Items
Invoice.Line.Item.Details
Person.Update
Person.Delete
Person.Create
Don't use triggers unless there's no other reasonable approach to achieve that goal.
Give field names a meaningful prefix so you can tell what table they come from without someone needing to explain. That way when you see a field name referenced, you can easily tell which table it's from.
Use consistent data types for fields containing similar data, i.e. don't store phone number as numeric in one table and varchar in another. In fact, don't store it as numeric, if I come across a negative phone number I'll be mad.
Don't use spaces or other obscure characters in table/field names. They should be entirely alphanumeric - or if I had my druthers, entirely alphabetic with the exception of the underscore. I'm currently working on an inherited system where table and field names contain spaces, question marks and exclamation marks. Makes me want to kill the designer on a daily basis!
Don't use syntax keywords as object names it'll cause headaches trying to retrieve data from them. I hate having to wrap object names as [index] that's two needless chars I didn't need to type damn you!

One thing I haven't seen mentioned yet:
Never use database keywords as object names. You do not want to have to qualify them every time you use them
If you misspell something when you create it, fix it as soon as you notice it. Don't spend years having to remember that in this table UserName is really Usernmae. It's a whole lot easier to fix when there isn't much code written against it.
Never use implied joins (the comma syntax), always specify the joins.

Putting everybody's input together into one list.
Naming Standards
Schemas are named by functional area (Products, Orders, Shipping)
No Hungarian Notation: No type names in object names (no strFirstName)
Do not use registered keywords for object names
No spaces or any special characters in object names (Alphanumber + Underscore are the only things allowed)
Name objects in a natural way (FirstName instead of NameFirst)
Table name should match Primary Key Name and Description field (SalesType – SalesTypeId, SalesTypeDescription)
Do not prefix with tbl_ or sp_
Name code by object name (CustomerSearch, CustomerGetBalance)
CamelCase database object names
Column names should be singular
Table names may be plural
Give business names to all constraints (MustEnterFirstName)
Data Types
Use same variable type across tables (Zip code – numeric in one table and varchar in another is not a good idea)
Use nNVarChar for customer information (name, address(es)) etc. you never know when you may go multinational
In code
Keywords always in UPPERCASE
Never use implied joins (Comma syntax) - always use explicit INNER JOIN / OUTER JOIN
One JOIN per line
One WHERE clause per line
No loops – replace with set based logic
Use short forms of table names for aliases rather than A, B, C
Avoid triggers unless there is no recourse
Avoid cursors like the plague (read http://www.sqlservercentral.com/articles/T-SQL/66097/)
Documentation
Create database diagrams
Create a data dictionary
Normalization and Referential Integrity
Use single column primary keys as much as possible. Use unique constraints where required.
Referential integrity will be always enforced
Avoid ON DELETE CASCADE
OLTP must be at least 4NF
Evaluate every one-to-many relationship as a potential many-to-many relationship
Non user generated Primary Keys
Build Insert based models instead of update based
PK to FK must be same name (Employee.EmployeeId is the same field as EmployeeSalary.EmployeeId)
Except when there is a double join (Person.PersonId joins to PersonRelation.PersonId_Parent and PersonRelation.PersonId_Child)
Maintenance : run periodic scripts to find
Schema without table
Orphaned records
Tables without primary keys
Tables without indexes
Non-deterministic UDF
Backup, Backup, Backup
Be good
Be Consistent
Fix errors now
Read Joe Celko's SQL Programming Style (ISBN 978-0120887972)

My standards for Oracle are:
Keywords are always in UPPERCASE;
Database object names are always in lowercase;
Underscores will replace spaces (ie there won't be any camel case conventions that are common on, say, SQL Server);
Primary keys will pretty much always be named 'id';
Referential integrity will be enforced;
Integer values (including table ids) will generally always be NUMBER(19,0). The reason for this is that this will fit in a 64-bit signed integer thus allowing the Java long type to be used instead of the more awkward BigInteger;
Despite the misnomer of appending "_number" to some column names, the type of such columns will be VARCHAR2 not a number type. Number types are reserved for primary keys and columns you do arithmetic on;
I always use a technical primary keys; and
Each table will have its own sequence for key generation. The name of that sequence will be _seq.
With SQL Server, the only modification is to use camel case for database object names (ie PartyName instead of party_name).
Queries will tend to be written multi-line with one clause or condition per line:
SELECT field1, field2, field2
FROM tablename t1
JOIN tablename2 t2 ON t1.id = t2.tablename_id
WHERE t1.field1 = 'blah'
AND t2.field2 = 'foo'
If the SELECT clause is sufficiently long I'll split it out one field per line.

Name all constraints

don't forget to back up your databases on a regular basis.

Don't use type names in the field names. The older guys will remember the old MS standard of lpszFieldName and the stupidity that ensued.
Use descriptive field names That follow normal language conventions. For example "FirstName" instead of "NameFirst"
Each word in the field name is capitalized
No underscores
Do not use normal keywords such as "Index"
Do not prefix ANYTHING with the object type. For example we do NOT use tblCustomers or spCustomersGet. These don't allow for good sorting and provide zero value.
Use schemas to define separate areas of the database. Such as sales.Customers and hr.Employees. This will get rid of most of the prefixes people use.
Loops of any kind should be viewed with suspicion. There's usually a better set based way.
Use views for complicated joins.
Avoid complicated joins when possible. It may be more astheticaly pleasing to have a CustomerPhoneNumbers table; but honestly, how many phone numbers do we really need to store? Just add the fields to the Customers table. Your DB queries will be faster and it's much easier to understand.
If one table calls a field "EmployeeId" then EVERY SINGLE TABLE that references it should use that name. It doesn't need to be called CustomerServiceRepId just because it's in an extension table.
Almost all tables have the "s" ending. For example: Customers, Orders, etc. After all the table holds many records...
Evaluate your queries, indexes and foreign key relationships with an analysis tool. Even those that may be generated for you. You might be surprised.
Linking tables which support many to many relationships have both linked tables in the name. For example, SchoolsGrades. It's very easy to tell by the table name what it does.
Be CONSISTENT. If you start down one path with your conventions, don't change horses halfway unless you are willing to refactor all of the previous work. This should put the brakes on any "wouldn't it be great if.." ideas that end up causing confusion and vast amounts of rework.
Think before you type. Do you really need that table, field, sproc, or view? Are you sure it isn't covered somewhere else? Get concensus before adding it. And if for some reason you have to take it out, talk to your team first. I've been at places where the DBA's make daily breaking changes without regard for the devs. This isn't fun.

If a database is for a particular application, have a version table so that the database releases can be checked against the code releases (amongst other reasons).

I always try not to use the type in the field name - "sFirstName", "sLastName", or "iEmployeeID". While they match at first, if something changes, they'll be out of sync, and it's a huge headache to change those names later, since you have to change the dependant objects as well.
Intellisense and the GUI tools make it trivial to find out what type a column is, so I don't feel this is necessary.

The WITH clause really helps break queries down into manageable parts.
It also really helps for efficiency on the execution plans of the queries.

Ensure that every varchar/nvarchar choice is appropriate.
Ensure that every NULLable column choice is appropriate - avoid NULLable columns where possible - allowing NULL should be the justifiable position.
Regardless of any other rules you might use from the suggestions here, I would create a stored procedure in the database that can be run on a regular basis to determine system health for any rules or standards you do have (some of this is a little SQL-Server specific):
Look for orphaned records in any cases where the DBMS system's referential integrity cannot be used for some reason (in my system I have a table of processes and a table of tests - so my system_health SP looks for processes without tests, since I only have a one-way FK relationship)
Look for empty schemas
Look for tables without primary keys
Look for tables without any indexes
Look for database objects without documentation (we use SQL Server Extended properties to put the documentation in the database - this documentation can be as granular as the column).
Look for system-specific issues - tables which need to be archived, exceptions which are not part of normal monthly or daily processing, certain common column names with or without defaults (CreateDate, say).
Look for non-deterministic UDFs
Look for TODO comments to ensure that code in the DB does not somehow have untested or pre-release code.
All this can be automated to give you an overall picture of system health.

Everyone writes SQL queries (views, stored procedures, etc) in the same basic format. It really helps development/maintenance efforts down the road.

Consistent naming standards. Having everyone on the same page, using the same format (whether it be Camel Case, specific prefixes, etc..) helps in being able to maintain a system accurately.

A few likes and dislikes.
My opinion is prefixes are horrible in every aspect. I currently work on a system where the tables are prefixed, and the columns within the tables are prefixed with 2 letter table name acronyms, I waste at least 30 mins each day working on this database because the acronym isn't logical. If you want to denote something with a prefix use a schema owner instead.
Using NVarchar from the start of a project if there is even a slight hint that down the line the text data will need to support multi lingual chars. Upgrading large databases because of lack of forward planning and thinking is a pain and wastes time.
Splitting each condition within a where clause onto a new line for readability (in and not in statements wrapped in brackets and tabbed in.) I think this is the important standard for me.
I worked at one company where a standard was that comma's must always be placed at the start of a line when performing parameter or variable declarations. This apparently made it more readable however I found it a complete nightmare.

In addition to normalization to 3NF or BCNF (more about that in this question), I have found the following to be useful:
Name tables as plural nouns
Name columns as sigular
So a "People" table has a "PersonID" column.
There is nothing wrong with composite keys, so long as the rules of 3NF or BCNF still hold. In many cases (such as the "many-to-many" case) this is entirely desirable.
Avoid repeating the table name in the column names. peoplePersonID is better written as table.column anyway, and much more readable and therefore self-documenting. People.PersonID is better, to me at least.
ON DELETE CASCADE should be used very carefully.
Remember that NULL means one of two things: Either it's unknown or it's not applicable.
Remember also that NULLs have interesting affects on joins, so practice your LEFT, RIGHT, and FULL outer joins.

Some others (albeit small) comments to throw against the wall...
SQL Server database schemas can be useful for both organizing tables and stored procedures as well as controlling security.
Every transactional table should always track both who and when created the record as well as updated the record in separate columns. I've seen implementation that simply used "update date" which can lead to auditing challenges in the future.
Use GUID's for row identifiers for all rows for projects with offline/synchronization requirements.

Good database design and normalization.

Tables are named in the singular, lowercase, no underscores, no prefix
Fields also lowercase, no underscores, no prefix
Stored procedures prefixed with "st_" (sorts nicely)
Views that are treated like tables have no prefix
Views created for special reports, etc. have a "v" prefix
Indexed views created for performance have an "ixv" prefix
All indexes have purposeful names (no auto-naming)
Strongly prefer uniqueidentifier (with sequential increment) over int IDENTITY for surrogate keys
Don't artificially limit VARCHAR/NVARCHAR fields to 100 or 255. Give them room to breath. This isn't the 1980s, fields are not stored padded to their max length.
3NF minimum standard
Prefer joining tables to column-level foreign keys: many 1:m assumptions are challenged as a system grows over time.
Always use surrogate keys, not natural keys, as the primary key. All assumptions about "natural" keys (SSNs, usernames, phone numbers, internal codes, etc.) will eventually be challenged.

Tabular formatted SQL.
select a.field1, b.field2
from any_table a
inner join blah b on b.a_id = a.a_id
inner join yet_another y on y.longer_key = b.b_id
where a.field_3 > 7
and b.long_field_name < 2;
Part of this is to use uniformly long alias names (in the example, here, a, b, and y are all length 1).
With this kind of formatting, I can more quickly answer common questions like, "what table is aliased by 'a'?" and "which fields join table T into the query?" The structure doesn't take long to apply or to update, and I find that it saves a lot of time. We spend more time reading code than writing it.

Document everything; wiki type documentation is easy to setup and the software is free.
Make sure you understand the interface first and design the database second. Most of the time its a lot better to know how the data you are going to use needs to work and then engineer the database. Most bad DB design happens as things evolve not upfront.
Then define the database standard and version you are going to work to. Define standards for the code elements (views, functions etc), database naming; naming conventions for columns, tables; type conventions for columns; coding templates.
Spend time considering how you define types having standard database types for fields or bespoke types are a good thing to sort out upfront.
As part of your documentation include a list of don'ts as well as dos for the application which include your prefered hated functionality cursors, triggers.
Review it regularly.

13- Evaluate your queries
Thats true. Sometimes you don't get what you wanted.
For me, it's always useful to name the tables and fields with their exact content and (for us) in clear spanish and using Upper Camel Case, with no whitespaces:
User Name: NombreUsuario
First Last Name: ApellidoPaterno
Second Last Name: ApellidoMaterno
etc etc

Taking "database" to mean "SQL product", my answer is, "Too many to mention. You could write a whole book on the subject." Happily, someone has.
We use Joe Celko's SQL Programming Style (ISBN 978-0120887972): "this book is a collection of heuristics and rules, tips, and tricks that will help you improve SQL programming style and proficiency, and for formatting and writing portable, readable, maintainable SQL code."
Advantages of this approach is include:
the guy knows more about this kind of thing than me (is there another book on SQL heuristics?!);
the work has already been done e.g. I can give the book to someone on the team to read and refer to;
if someone doesn't like my coding style I can blame someone else;
I recently got a load of rep on SO by recommending another Celko book :)
In practice we do deviate from the prescriptions of The Book but surprisingly rarely.

In MS-SQL, I've always had objects owned by dbo., and I prefix calls to those objects with dbo.
Too many times I've seen our devs wonder why they can't call their objects that they inadvertainly owned.

Avoid silly abbreviation conventions, such as comprehensive dictionaries of abbreviations that actively encourage monstrosities like EMP_ID_CONV_FCTR_WTF_LOL_WAK_A_WAK_HU_HU. This rule is inspired a real set of guidelines I've seen before.

MVP Aaron Bertrand's
"My stored procedure "best practices" checklist"

Table name matches Primary Key name and description key
I have just recently, after years of agreeing with this, jumped ship, and now have an "ID" column on every table.
Yes I know, when linking tables it's abiguous! But so is linking ProductID to ProductID, so uhh, why the extra typing?
This:
SELECT p.Name, o.Quantity FROM Products p, Orders o WHERE o.ProductID = p.ID
Is slightly better than this:
SELECT p.Name, o.Quantity FROM Products p, Orders o WHERE o.ProductID = p.ProductID
Note that both will require table or alias prefixes. But not only am I typing slightly less (multiply that across dozens of tables with long descriptive names and it adds up fast in a data intensive application) but it also makes it easier to know which table is the parent table in every join, which, when joining 8-10 tables in a query, can help quite a bit.

I follow a lot of the same conventions as others here, but I wanted to say a few things that haven't been said yet.
Regardless of whether you like plural names or singular names for your tables, be consistent. Choose one or the other, but don't use both.
The primary key in a table has the same name as the table, with the suffix _PK. Foreign keys have their same name as their corresponding primary key, but with a suffix of _FK. For example, the Product table's primary key is called Product_PK; in the Order table the corresponding foreign key is Product_FK. I picked this habit up from another DBA friend of mine and so far I'm liking it.
Whenever I do an INSERT INTO...SELECT, I alias all the columns in the SELECT portion to match the names of the columns from the INSERT INTO portion to make it easier to maintain and see how things match up.

The most important standard is: don't have a database by default. I find too many developers grabbing a database for projects where life would have been much easier without one (at least yet). It is just a tool in the toolbox, and not every problem is a nail.
Inappropriate use of a database leads to anemic domain models, badly testable code and unneeded performance problems.

I agree with just about everything you have put there except for #5. I often use prefixes for tables and stored procedures because the systems that we develop have lots of different functional areas, so I will tend to prefix the tables and sprocs with an identifier that will allow for them to group nicely in Management Studio based on what area they belong to.
Example: cjso_Users, cjso_Roles, and then you have routing_Users, routing_Roles. This may sound like replication of data, but in reality the two different user/roles tables are for completely separate functions of the system (cjso would be for the customer-based ecommerce app while the routing would stand for employees and distributors who use the routing system).

I like our table naming convention:
People Table
PEO_PersonID
PEO_FirstName
...
Which helps make larger querys a bit more readable. and joins make a bit more sense:
Select * -- naughty!
From People
Join Orders on PEO_PersonID = ORD_PersonID
--...
i guess rather than what the naming convention is, is the consistency of the naming.

Related

Is it better to name the primary key column id or *_id?

I've been using Rails for a few years and I've grown used to the convention of naming the primary key column id. But I've run across lots of examples in SQL books that name the primary key column something like employee_id for an employees table or feed_id for a feeds table.
One advantage of the 2nd system seems to be that you can use USING() more to produce more concise SQL queries:
select feeds.title, items.title from items inner join feeds USING(feed_id);
As opposed to
select feeds.title, items.title from items inner join feeds on feeds.id = items.feed_id;
Which naming convention is better? Which is favored by experienced database administrators?
Also, is it better to pluralize the name of the table?
I always use the verbose form (i.e. 'employee_id' rather than 'id') as it is more descriptive. If you are joining more than one table and both have 'id' column you will have to alias 'id' if you need to SELECT both of the ids. Also, as you mentioned, you get the advantage of USING clause. In the grand scheme of things it isn't a huge factor one way or the other but the more verbose form gives you advantages.
Both options are valid but the purists will say use id as its name is specified by the table.
I use table_id because I find it to be more descriptive and makes debugging easier. It's more practical.
Re: Tablenames. Another hotly debated topic among database nerds but I say Singular.
Tablename_Id is my strong preference. When you do joins to Fks you know exactly what to join to what and don't make mistakes where you join to ID in table a when you meant tableb below is an example of how easy this is to do especially if you copy the on clause from somewhere else
FROM tablea a
JOIN tableb b
ON a.ID = b.tableaid
JOIN tablec c
ON a.ID = c.tablebid
In the case above, you really wanted to join to B.Id but forgot to change it from a when you copied. It will work and give you a resultset that isn't correct. If you use table_id instead, the query would fail the syntax check.
Another problem with using Id is when you are doing complex reports. Since the repport queries have to have fields with individual names, you can end up wasting time writing a bunch of aliases you wouldn't need if you had named the id with the tablename.
Now people who use ORMs don't write a lot of SQl but what they do write and what report writers write are generally complex, complicated statements. You need to design you database to make it easier to do those things than simple queries.
The use of ID as the name for the identifying field is considered a SQl antipattern. http://www.amazon.com/SQL-Antipatterns-Programming-Pragmatic-Programmers/dp/1934356557/ref=sr_1_1?s=books&ie=UTF8&qid=1308929815&sr=1-1
This is user preference, but I always name the primary keys of my tables Id. I always name references of that Id in other tables as [SingularEntityName][Id] e.g.
Credentials
Id Password
Users
Id Name CredentialId
Descriptions
Id UserId
Keeps my references clean. However, just be consistant in your naming and it really shouldn't matter how you set up your schemas.
To open the can of worms again,
I'm willing to bet those who select tablename_id are older, more experienced programmers.
Those who use just id are younger.
Why ? because you learn redundancy and constancy is not always a bad thing.
the one thing I would add to the #1 answer, use the "_" helps make it easier to pick out the variable in code, in the table, etc... I do the same for foreign keys. TableName_FK Some will argue over that but it works for me and it's obvious what it is.
I have had to work on other's code many times over the years. Consistency is critical, obfuscation is worthless and meaningful variable names very helpful.
There are those who argue that verbosity makes code harder to read. I don't think that argument flies in today's world of objects.that.derive.from.some.microsoft.class.twenty.layers.deep.that.you.have.to.fully.reference.
BTW - as so many have said, it's your choice. Those folks who spend time arguing over coding syntax don't have enough work to do. Learn to be flexible and to use the standards of the workplace where you are employed. If you are lucky enough to set your own standards, then have at it. The fact your are wondering is great. But choose one and then be consistent (until you change jobs or decide you have a paradigm shift that means you want to change your style.)
You can often pick out what era someone started learning to code by their personal preferences and styles. Guys that write very tight, minmal, hard to read code, started back when memory was very limited (DOS) and probably wrote a lot of assembler, those that use Hungarian started back with the Win SDK, etc...
This discussion has been evolving for decades. The older I get, the more I document my code, the more meaningful I make my variable names, etc... because in a week I will have forgotten what I wrote and I need the road maps to make sense of it. Not so much that I'm forgetful, although that's part of the equation, but more so because I'm writing code on so many different projects.
it's entirely your choice. But personally I prefer the second one as I wouldn't need to look for table names in my code when I come across an id. I think tablename_id is better.
Another advantage to giving your primary keys names that are unique to that table is that it makes it easier to have a naming convention, when referring to those keys in different tables, that indicates the corresponding key.
For example, suppose everything in your alpha table begins alpha_, so that you have alpha_id as your primary key. In your beta table - where everything would begin beta_ - you would use beta_alpha_id to have a reference in that table to the keys in the alpha table.

How important are lookup tables?

A lot of the applications I write make use of lookup tables, since that was just the way I was taught (normalization and such). The problem is that the queries I make are often more complicated because of this. They often look like this
get all posts that are still open
"SELECT * FROM posts WHERE status_id = (SELECT id FROM statuses WHERE name = 'open')"
Often times, the lookup tables themselves are very short. For instance, there may only be 3 or so different statuses. In this case, would it be okay to search for a certain type by using a constant or so in the application? Something like
get all posts that are still open
"SELECT * FROM posts WHERE status_id = ".Status::OPEN
Or, what if instead of using a foreign id, I set it as an enum and queried off of that?
Thanks.
The answer depends a little if you are limited to freeware such as PostGreSQL (not fully SQL compliant), or if you are thinking about SQL (ie. SQL compliant) and large databases.
In SQL compliant, Open Architecture databases, where there are many apps using one database, and many users using different report tools (not just the apps) to access the data, standards, normalisation, and open architecture requirements are important.
Despite the people who attempt to change the definition of "normalisation", etc. to suit their ever-changing purpose, Normalisation (the science) has not changed.
if you have data values such as {Open; Closed; etc} repeated in data tables, that is data duplication, a simple Normalisation error: if you those values change, you may have to update millions of rows, which is very limited design.
Such values should be Normalised into a Reference or Lookup table, with a short CHAR(2) PK:
O Open
C Closed
U [NotKnown]
The data values {Open;Closed;etc} are no longer duplicated in the millions of rows. It also saves space.
the second point is ease of change, if Closed were changed to Expired, again, one row needs to be changed, and that is reflected in the entire database; whereas in the un-normalised files, millions of rows need to be changed.
Adding new data values, eg. (H,HalfOpen) is then simply a matter of inserting one row.
in Open Architecture terms, the Lookup table is an ordinary table. It exists in the [SQL compliant] catalogue; as long as the FOREIGN KEY relation has been defined, the report tool can find that as well.
ENUM is a Non-SQL, do not use it. In SQL the "enum" is a Lookup table.
The next point relates to the meaningfulness of the key.
If the Key is meaningless to the user, fine, use an {INT;BIGINT;GUID;etc} or whatever is suitable; do not number them incrementally; allow "gaps".
But if the Key is meaningful to the user, do not use a meaningless number, use a meaningful Relational Key.
Now some people will get in to tangents regarding the permanence of PKs. That is a separate point. Yes, of course, always use a stable value for a PK (not "immutable", because no such thing exists, and a system-generated key does not provide row uniqueness).
{M,F} are unlikely to change
if you have used {0,1,2,4,6}, well don't change it, why would you want to. Those values were supposed to be meaningless, remember, only a meaningful Key need to be changed.
if you do use meaningful keys, use short alphabetic codes, that developers can readily understand (and infer the long description from). You will appreciate this only when you code SELECT and realise you do not have to JOIN every Lookup table. Power users too, appreciate it.
Since PKs are stable, particularly in Lookup tables, you can safely code:
WHERE status_code = 'O' -- Open
You do not have to JOIN the Lookup table and obtain the data value Open, as a developer, you are supposed to know what the Lookup PKs mean.
Last, if the database were large, and supported BI or DSS or OLAP functions in addition to OLTP (as properly Normalised databases can), then the Lookup table is actually a Dimension or Vector, in Dimension-Fact analyses. If it was not there, then it would have to be added in, to satisfy the requirements of that software, before such analyses can be mounted.
If you do that to your database from the outset, you will not have to upgrade it (and the code) later.
Your Example
SQL is a low-level language, thus it is cumbersome, especially when it comes to JOINs. That is what we have, so we need to just accept the encumbrance and deal with it. Your example code is fine. But simpler forms can do the same thing.
A report tool would generate:
SELECT p.*,
s.name
FROM posts p,
status s
WHERE p.status_id = s.status_id
AND p.status_id = 'O'
Another Exaple
For banking systems, where we use short codes which are meaningful (since they are meaningful, we do not change them with the seasons, we just add to them), given a Lookup table such as (carefully chosen, similar to ISO Country Codes):
Eq Equity
EqCS Equity/Common Share
OTC OverTheCounter
OF OTC/Future
Code such as this is common:
WHERE InstrumentTypeCode LIKE "Eq%"
And the users of the GUI would choose the value from a drop-down that displays
{Equity/Common Share;Over The Counter},
not {Eq;OTC;OF}, not {M;F;U}.
Without a lookup table, you can't do that, either in the apps, or in the report tool.
For look-up tables I use a sensible primary key -- usually just a CHAR(1) that makes sense in the domain with an additional Title (VARCHAR) field. This can maintain relationship enforcement while "keeping the SQL simple". The key to remember here is the look-up table does not "contain data". It contains identities. Some other identities might be time-zone names or assigned IOC country codes.
For instance gender:
ID Label
M Male
F Female
N Neutral
select * from people where gender = 'M'
Alternatively, an ORM could be used and manual SQL generation might never have to be done -- in this case the standard "int" surrogate key approach is fine because something else deals with it :-)
Happy coding.
Create a function for each lookup.
There is no easy way. You want performance and query simplicity. Ensure the following is maintained. You could create a SP_TestAppEnums to compare existing lookup values against the function and look for out of sync/zero returned.
CREATE FUNCTION [Enum_Post](#postname varchar(10))
RETURNS int
AS
BEGIN
DECLARE #postId int
SET #postId =
CASE #postname
WHEN 'Open' THEN 1
WHEN 'Closed' THEN 2
END
RETURN #postId
END
GO
/* Calling the function */
SELECT dbo.Enum_Post('Open')
SELECT dbo.Enum_Post('Closed')
Question is: do you need to include the lookup tables (domain tables 'round my neck of the woods) in your queries? Presumably, these sorts of tables are usually
pretty static in nature — the domain might get extended, but it probably won't get shortened.
their primary key values are pretty unlikely to change as well (e.g., the status_id for a status of 'open' is unlikely to suddenly get changed to something other than what it was created as).
If the above assumptions are correct, there's no real need to add all those extra tables to your joins just so your where clause can use a friend name instead of an id value. Just filter on status_id directly where you need to. I'd suspect the non-key attribute in the where clause ('name' in your example above) is more likely to get changes than the key attribute ('name' in your example above): you're more protected by referencing the desire key value(s) of the domain table in your join.
Domain tables serve
to limit the domain of the variable via a foreign key relationship,
to allow the domain to be expanded by adding data to the domain table,
to populate UI controls and the like with user-friendly information,
Naturally, you'd need to suck domain tables into your queries where you you actually required the non-key attributes from the domain table (e.g., descriptive name of the value).
YMMV: a lot depends on context and the nature of the problem space.
The answer is "whatever makes sense".
lookup tables involve joins or subqueries which are not always efficient. I make use of enums a lot to do this job. its efficient and fast
Where possible (and It is not always . . .), I use this rule of thumb: If I need to hard-code a value into my application (vs. let it remain a record in the database), and also store that vlue in my database, then something is amiss with my design. It's not ALWAYS true, but basically, whatever the value in question is, it either represents a piece of DATA, or a peice of PROGRAM LOGIC. It is a rare case that it is both.
NOT that you won't find yourself discovering which one it is halfway into the project. But as the others said above, there can be trade-offs either way. Just as we don't always acheive "perfect" normalization in a database design (for reason of performance, or simply because you CAN take thngs too far in pursuit of acedemic perfection . . .), we may make some concious choices about where we locate our "look-up" values.
Personally, though, I try to stand on my rule above. It is either DATA, or PROGRAM LOGIC, and rarely both. If it ends up as (or IN) a record in the databse, I try to keep it out of the Application code (except, of course, to retrieve it from the database . . .). If it is hardcoded in my application, I try to keep it out of my database.
In cases where I can't observe this rule, I DOCUMENT THE CODE with my reasoning, so three years later, some poor soul will be able to ficure out how it broke, if that happens.
The commenters have convinced me of the error of my ways. This answer and the discussion that went along with it, however, remain here for reference.
I think a constant is appropriate here, and a database table is not. As you design your application, you expect that table of statuses to never, ever change, since your application has hard-coded into it what those statuses mean, anyway. The point of a database is that the data within it will change. There are cases where the lines are fuzzy (e.g. "this data might change every few months or so…"), but this is not one of the fuzzy cases.
Statuses are a part of your application's logic; use constants to define them within the application. It's not only more strictly organized that way, but it will also allow your database interactions to be significantly speedier.

SQL Server - Schema/Code Analysis Rules - What would your rules include?

We're using Visual Studio Database Edition (DBPro) to manage our schema. This is a great tool that, among the many things it can do, can analyse our schema and T-SQL code based on rules (much like what FxCop does with C# code), and flag certain things as warnings and errors.
Some example rules might be that every table must have a primary key, no underscore's in column names, every stored procedure must have comments etc.
The number of rules built into DBPro is fairly small, and a bit odd. Fortunately DBPro has an API that allows the developer to create their own. I'm curious as to the types of rules you and your DB team would create (both schema rules and T-SQL rules). Looking at some of your rules might help us decide what we should consider.
Thanks - Randy
Some of mine. Not all could be tested programmatically:
No hungarian-style prefixes (like "tbl" for table, "vw" for view)
If there is any chance this would ever be ported to Oracle, no identifiers longer than 30 characters.
All table and column names expressed in lower-case letters only
Underscores between words in column and table names--we differ on this one obviously
Table names are singular ("customer" not "customers")
Words that make up table, column, and view names are not abbreviated, concatenated, or acronym-based unless necessary.
Indexes will be prefixed with “IX_”.
Primary Keys are prefixed with “PK_”.
Foreign Keys are prefixed with “FK_”.
Unique Constraints are prefixed with “UC_”.
I suspect most of my list would be hard to put in a rules engine, but here goes:
If possible I'd have it report any tables that are defined as wider than the bytes that can be stored in a record (excluding varchar(max) and text type fields) and/or a datapage.
I want all related PK and FK columns to have the same name if at all possible. The only time it isn't possible is when you need to have two FKs in the same table relating to one PK and even then, I would name it the name of the PK and a prefix or suffix describing the difference. For instance if I had a PersonID PK and a table needed to have both the sales rep id and the customer id, they would be CustomerPersonID, and RepPersonID.
I would check to make sure all FKs have an index.
I would want to know about all fields that are required but have no default value. Depending on what it is, you may not want to define a default, But I would want to be able to easily see which ones don't to hopefully find the ones that should have a default.
I would want all triggers checked to see that they are set-based and not designed to run for one row at time.
No table without a defined Unique index or PK. No table where the PK is more than one field. No table where the PK is not an int.
No object names that use reserved words for the database I'm using.
No fields with the word Date as part of the name that are not defined as date or datetime.
No table without an associated audit table.
No field called SSN, SocialSecurityNumber, etc. that is not encrypted. Same for any field named CreditCardNumber.
No user defined datatypes (In SQL Server at least, these are far more trouble than they are worth.)
No views that call other views. Experience has shown me these are often a performance disaster waiting to happen. Especially if they layer more than one layer deep.
If using replication, no table without a GUID field.
All tables should have a DateInserted field and InsertedBy field (even with auditing, it is often easier to research data problems if this info is easily available.)
Consistent use of the same case in naming. It doesn't matter which as long as all use the same one.
No tables with a field called ID. Hate these with a passion. They are so useless. ID fields should be named tablenameID if a PK and with the PK name if an FK.
No spaces or special characters in object names. In other words if you need special handling for the database to recognize it in the correct context in query, don't use it.
If it is going to analyze code as well, I'd want to see any code that uses a cursor or a correlated subquery. Why create performance problems from the start?
I would want to see if a proc uses dynamic SQl and if so if it has an input bit variable called Debug (and code to only print the dynamic SQl statment and not execute it, if the Debug variable is set to 1).
I'd want to be able to check that if there is more than one statement causing action in the database (insert/update/delete) that there is also an explicit transaction in the proc and error trapping to roll the whole thing back if any part of it fails.
I'm sure I could think of more.

What is the preferred way to store custom fields in a SQL database?

My friend is building a product to be used by different independent medical units.
The database stores a vast collection of measurements taken at different times, like the temperature, blood pressure, etc...
Let us assume these are held in a table called exams with columns temperature, pressure, etc... (as well as id, patient_id and timestamp). Most of the measurements are stored as floats, but some are of other types (strings, integers...)
While many of these measurements are handled by their product, it needs to allow the different medical units to record and process other custom measurements. A very nifty UI allows the administrator to edit these customs fields, specify their name, type, possible range of values, etc...
He is unsure as to how to store these custom fields.
He is leaning towards a separate table (say a table custom_exam_data with fields like exam_id, custom_field_id, float_value, string_value, ...)
I worry that this will make searching both more difficult to achieve and less efficient.
I am leaning towards modifying the exam table directly (while avoiding conflicts on column names with some scheme like prefixing all custom fields with an underscore or naming them custom_1, ...)
He worries about modifying the database dynamically and having different schemas for each medical unit.
Hopefully some people which more experience can weigh in on this issue.
Notes:
he is using Ruby on Rails but I think this question is pretty much framework agnostic, except from the fact that he is only looking for solutions in SQL databases only.
I simplified the problem a bit since the custom fields need to be available for more than one table, but I believe this doesn`t really impact the direction to take.
(added) A very generic reporting module will need to search, sort, generate stats, etc.. of this data, so it is required that this data be stored in the columns of the appropriate type
(added) User inputs will be filtered, for the standard fields as well as for the custom fields. For example, numbers will be checked within a given range (can't have a temperature of -12 or +444), etc... Thus, conversion to the appropriate SQL type is not a problem.
I've had to deal with this situation many times over the years, and I agree with your initial idea of modifying the DB tables directly, and using dynamic SQL to generate statements.
Creating string UserAttribute or Key/Value columns sounds appealing at first, but it leads to the inner-platform effect where you end up having to re-implement foreign keys, data types, constraints, transactions, validation, sorting, grouping, calculations, et al. inside your RDBMS. You may as well just use flat files and not SQL at all.
SQL Server provides INFORMATION_SCHEMA tables that let you create, query, and modify table schemas at runtime. This has full type checking, constraints, transactions, calculations, and everything you need already built-in, don't reinvent it.
It's strange that so many people come up with ad-hoc solutions for this when there's a well-documented pattern for it:
Entity-Attribute-Value (EAV) Model
Two alternatives are XML and Nested Sets. XML is easier to manage but generally slow. Nested Sets usually require some type of proprietary database extension to do without making a mess, like CLR types in SQL Server 2005+. They violate first-normal form, but are nevertheless the fastest-performing solution.
Microsoft Dynamics CRM achieves this by altering the database design each time a change is made. Nasty, I think.
I would say a better option would be to consider an attribute table. Even though these are often frowned upon, it gives you the flexibility you need, and you can always create views using dynamic SQL to pivot the data out again. Just make sure you always use LEFT JOINs and FKs when creating these views, so that the Query Optimizer can do its job better.
I have seen a use of your friend's idea in a commercial accounting package. The table was split into two, first contained fields solely defined by the system, second contained fields like USER_STRING1, USER_STRING2, USER_FLOAT1 etc. The tables were linked by identity value (when a record is inserted into the main table, a record with same identity is inserted into the second one). Each table that needed user fields was split like that.
Well, whenever I need to store some unknown type in a database field, I usually store it as String, serializing it as needed, and also store the type of the data.
This way, you can have any kind of data, working with any type of database.
I would be inclined to store the measurement in the database as a string (varchar) with another column identifying the measurement type. My reasoning is that it will presumably, come from the UI as a string and casting to any other datatype may introduce a corruption before the user input get's stored.
The downside is that when you go to filter result-sets by some measurement metric you will still have to perform a casting but at least the storage and persistence mechanism is not introducing corruption.
I can't tell you the best way but I can tell you how Drupal achieves a sort of schemaless structure while still using the standard RDBMSs available today.
The general idea is that there's a schema table with a list of fields. Each row really only has two columns, the 'table':String column and the 'column':String column. For each of these columns it actually defines a whole table with just an id and the actual data for that column.
The trick really is that when you are working with the data it's never more than one join away from the bundle table that lists all the possible columns so you end up not losing as much speed as you might otherwise think. This will also allow you to expand much farther than just a few medical companies unlike the custom_ prefix you were proposing.
MySQL is very fast at returning row data for short rows with few columns. In this way this scheme ends up fairly quick while allowing you lots of flexibility.
As to search, my suggestion would be to index the page content instead of the database content. Use Solr to parse through rendered pages and hold links to the actual page instead of trying to search through the database using clever SQL.
Define two new tables: custom_exam_schema and custom_exam_data.
custom_exam_data has an exam_id column, plus an additional column for every custom attribute.
custom_exam_schema would have a row to describe how to interpret each of the columns of the custom_exam_data table. It would have columns like name, type, minValue, maxValue, etc.
So, for example, to create a custom field to track the number of fingers a person has, you would add ('fingerCount', 'number', 0, 10) to custom_exam_schema and then add a column named fingerCount to the exam table.
Someone might say it's bad to change the database schema at run time, but I'd argue that configuring these custom fields is part of set up and won't happen too often. Still, this method lets you handle changes at any time and doesn't risk messing around with your core table schemas.
lets say that your friend's database has to store data values from multiple sources such as demogrphic values, diagnosis, interventions, physionomic values, physiologic exam values, hospitalisation values etc.
He might have as well to define choices, lets say his database is missing the race and the unit staff need the race of the patient (different races are more unlikely to get some diseases), they might want to use a drop down with several choices.
I would propose to use an other table that would have these choices or would you just use a "Custom_field_choices" table, which at some point is exactly the same but with a different name.
Considering that the database :
- needs to be flexible
- that data from multiple tables can be added and be customized
- that you might want to keep the integrity of the main structure of your database for distribution and uniformity purpose
- that data MUST have a limit and alarms and warnings
- that data must have units ( 10 kg or 10 pounds) ?
- that data can have a selection of choices
- that data can be with different rights (from simple user to admin)
- that these data might be needed to generate reports without modifying the code (automation)
- that these data might be needed to make cross reference analysis within the system without modifying the code
the custom table would be my solution, modifying each table would end up being too risky.
I would store those custom fields in a table where each record ( dataType, dataValue, dataUnit ) would use in one row. So there would be a relation oneToMany from one sample to the data. You can also create a table to record all the kind of cutsom types you would use. For example:
create table DataType
(
id int primary key,
name varchar(100) not null unique
description text,
uri varchar(255) //<-- can be used for an ONTOLOGY
)
create table DataRecord
(
id int primary key,
sample_id int not null,//<-- reference to the sample
dataType_id int not null, //<-- references DataType
value varchar(100),//<-- the value as string
unit varchar(50)//<-- g, mg/ml, etc... but it could also be a link to a table describing the units just like DataType
)

Table Naming Dilemma: Singular vs. Plural Names [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 9 years ago.
Improve this question
Academia has it that table names should be the singular of the entity that they store attributes of.
I dislike any T-SQL that requires square brackets around names, but I have renamed a Users table to the singular, forever sentencing those using the table to sometimes have to use brackets.
My gut feel is that it is more correct to stay with the singular, but my gut feel is also that brackets indicate undesirables like column names with spaces in them etc.
Should I stay, or should I go?
I had same question, and after reading all answers here I definitely stay with SINGULAR, reasons:
Reason 1 (Concept). You can think of bag containing apples like "AppleBag", it doesn't matter if contains 0, 1 or a million apples, it is always the same bag. Tables are just that, containers, the table name must describe what it contains, not how much data it contains. Additionally, the plural concept is more about a spoken language one (actually to determine whether there is one or more).
Reason 2. (Convenience). it is easier come out with singular names, than with plural ones. Objects can have irregular plurals or not plural at all, but will always have a singular one (with few exceptions like News).
Customer
Order
User
Status
News
Reason 3. (Aesthetic and Order). Specially in master-detail scenarios, this reads better, aligns better by name, and have more logical order (Master first, Detail second):
1.Order
2.OrderDetail
Compared to:
1.OrderDetails
2.Orders
Reason 4 (Simplicity). Put all together, Table Names, Primary Keys, Relationships, Entity Classes... is better to be aware of only one name (singular) instead of two (singular class, plural table, singular field, singular-plural master-detail...)
Customer
Customer.CustomerID
CustomerAddress
public Class Customer {...}
SELECT FROM Customer WHERE CustomerID = 100
Once you know you are dealing with "Customer", you can be sure you will use the same word for all of your database interaction needs.
Reason 5. (Globalization). The world is getting smaller, you may have a team of different nationalities, not everybody has English as a native language. It would be easier for a non-native English language programmer to think of "Repository" than of "Repositories", or "Status" instead of "Statuses". Having singular names can lead to fewer errors caused by typos, save time by not having to think "is it Child or Children?", hence improving productivity.
Reason 6. (Why not?). It can even save you writing time, save you disk space, and even make your computer keyboard last longer!
SELECT Customer.CustomerName FROM Customer WHERE Customer.CustomerID = 100
SELECT Customers.CustomerName FROM Customers WHERE Customers.CustomerID = 103
You have saved 3 letters, 3 bytes, 3 extra keyboard hits :)
And finally, you can name those ones messing up with reserved names like:
User > LoginUser, AppUser, SystemUser, CMSUser,...
Or use the infamous square brackets [User]
I prefer to use the uninflected noun, which in English happens to be singular.
Inflecting the number of the table name causes orthographic problems (as many of the other answers show), but choosing to do so because tables usually contain multiple rows is also semantically full of holes. This is more obvious if we consider a language that inflects nouns based on case (as most do):
Since we're usually doing something with the rows, why not put the name in the accusative case? If we have a table that we write to more than we read, why not put the name in dative? It's a table of something, why not use the genitive? We wouldn't do this, because the table is defined as an abstract container that exists regardless of its state or usage. Inflecting the noun without a precise and absolute semantic reason is babbling.
Using the uninflected noun is simple, logical, regular and language-independent.
If you use Object Relational Mapping tools or will in the future I suggest Singular.
Some tools like LLBLGen can automatically correct plural names like Users to User without changing the table name itself. Why does this matter? Because when it's mapped you want it to look like User.Name instead of Users.Name or worse from some of my old databases tables naming tblUsers.strName which is just confusing in code.
My new rule of thumb is to judge how it will look once it's been converted into an object.
one table I've found that does not fit the new naming I use is UsersInRoles. But there will always be those few exceptions and even in this case it looks fine as UsersInRoles.Username.
Others have given pretty good answers as far as "standards" go, but I just wanted to add this... Is it possible that "User" (or "Users") is not actually a full description of the data held in the table? Not that you should get too crazy with table names and specificity, but perhaps something like "Widget_Users" (where "Widget" is the name of your application or website) would be more appropriate.
What convention requires that tables have singular names? I always thought it was plural names.
A user is added to the Users table.
This site agrees:
http://vyaskn.tripod.com/object_naming.htm#Tables
This site disagrees (but I disagree with it):
http://justinsomnia.org/writings/naming_conventions.html
As others have mentioned: these are just guidelines. Pick a convention that works for you and your company/project and stick with it. Switching between singular and plural or sometimes abbreviating words and sometimes not is much more aggravating.
How about this as a simple example:
SELECT Customer.Name, Customer.Address FROM Customer WHERE Customer.Name > "def"
vs.
SELECT Customers.Name, Customers.Address FROM Customers WHERE Customers.Name > "def"
The SQL in the latter is stranger sounding than the former.
I vote for singular.
I am of the firm belief that in an Entity Relation Diagram, the entity should be reflected with a singular name, similar to a class name being singular. Once instantiated, the name reflects its instance. So with databases, the entity when made into a table (a collection of entities or records) is plural. Entity, User is made into table Users. I would agree with others who suggested maybe the name User could be improved to Employee or something more applicable to your scenario.
This then makes more sense in a SQL statement because you are selecting from a group of records and if the table name is singular, it doesn't read well.
I stick with singular for table names and any programming entity.
The reason? The fact that there are irregular plurals in English like mouse ⇒ mice and sheep ⇒ sheep. Then, if I need a collection, i just use mouses or sheeps, and move on.
It really helps the plurality stand out, and I can easily and programatically determine what the collection of things would look like.
So, my rule is: everything is singular, every collection of things is singular with an s appended. Helps with ORMs too.
IMHO, table names should be plural like Customers.
Class names should be singular like Customer if it maps to a row in the Customers table.
Singular. I don't buy any argument involving which is most logical - every person thinks his own preference is most logical. No matter what you do it is a mess, just pick a convention and stick to it. We are trying to map a language with highly irregular grammar and semantics (normal spoken and written language) to a highly regular (SQL) grammar with very specific semantics.
My main argument is that I don't think of the tables as a set but as relations.
So, the AppUser relation tells which entities are AppUsers.
The AppUserGroup relation tells me which entities are AppUserGroups
The AppUser_AppUserGroup relation tells me how the AppUsers and AppUserGroups are related.
The AppUserGroup_AppUserGroup relation tells me how AppUserGroups and AppUserGroups are related (i.e. groups member of groups).
In other words, when I think about entities and how they are related I think of relations in singular, but of course, when I think of the entities in collections or sets, the collections or sets are plural.
In my code, then, and in the database schema, I use singular. In textual descriptions, I end up using plural for increased readability - then use fonts etc. to distinguish the table/relation name from the plural s.
I like to think of it as messy, but systematic - and this way there is always a systematically generated name for the relation I wish to express, which to me is very important.
I also would go with plurals, and with the aforementioned Users dilemma, we do take the square bracketing approach.
We do this to provide uniformity between both database architecture and application architecture, with the underlying understanding that the Users table is a collection of User values as much as a Users collection in a code artifact is a collection of User objects.
Having our data team and our developers speaking the same conceptual language (although not always the same object names) makes it easier to convey ideas between them.
I personaly prefer to use plural names to represent a set, it just "sounds" better to my relational mind.
At this exact moment i am using singular names to define a data model for my company, because most of the people at work feel more confortable with it.
Sometimes you just have to make life easier to everyone instead of imposing your personal preferences.
(that's how i ended up in this thread, to get a confirmation on what should be the "best practice" for naming tables)
After reading all the arguing in this thread, i reached one conclusion:
I like my pancakes with honey, no matter what everybody's favorite flavour is. But if i am cooking for other people, i will try to serve them something they like.
Singular. I'd call an array containing a bunch of user row representation objects 'users', but the table is 'the user table'. Thinking of the table as being nothing but the set of the rows it contains is wrong, IMO; the table is the metadata, and the set of rows is hierarchically attached to the table, it is not the table itself.
I use ORMs all the time, of course, and it helps that ORM code written with plural table names looks stupid.
I've actually always thought it was popular convention to use plural table names. Up until this point I've always used plural.
I can understand the argument for singular table names, but to me plural makes more sense. A table name usually describes what the table contains. In a normalized database, each table contains specific sets of data. Each row is an entity and the table contains many entities. Thus the plural form for the table name.
A table of cars would have the name cars and each row is a car. I'll admit that specifying the table along with the field in a table.field manner is the best practice and that having singular table names is more readable. However in the following two examples, the former makes more sense:
SELECT * FROM cars WHERE color='blue'
SELECT * FROM car WHERE color='blue'
Honestly, I will be rethinking my position on the matter, and I would rely on the actual conventions used by the organization I'm developing for. However, I think for my personal conventions, I'll stick with plural table names. To me it makes more sense.
I don't like plural table names because some nouns in English are not countable (water, soup, cash) or the meaning changes when you make it countable (chicken vs a chicken; meat vs bird).
I also dislike using abbreviations for table name or column name because doing so adds extra slope to the already steep learning curve.
Ironically, I might make User an exception and call it Users because of USER (Transac-SQL), because I too don't like using brackets around tables if I don't have to.
I also like to name all the ID columns as Id, not ChickenId or ChickensId (what do plural guys do about this?).
All this is because I don't have proper respect for the database systems, I am just reapplying one-trick-pony knowledge from OO naming conventions like Java's out of habit and laziness. I wish there were better IDE support for complicated SQL.
We run similar standards, when scripting we demand [ ] around names, and where appropriate schema qualifiers - primarily it hedges your bets against future name grabs by the SQL syntax.
SELECT [Name] FROM [dbo].[Customer] WHERE [Location] = 'WA'
This has saved our souls in the past - some of our database systems have run 10+ years from SQL 6.0 through SQL 2005 - way past their intended lifespans.
If we look at MS SQL Server's system tables, their names as assigned by Microsoft are in plural.
The Oracle's system tables are named in singular. Although a few of them are plural.
Oracle recommends plural for user-defined table names.
That doesn't make much sense that they recommend one thing and follow another.
That the architects at these two software giants have named their tables using different conventions, doesn't make much sense either... After all, what are these guys ... PhD's?
I do remember in academia, the recommendation was singular.
For example, when we say:
select OrderHeader.ID FROM OrderHeader WHERE OrderHeader.Reference = 'ABC123'
maybe b/c each ID is selected from a particular single row ...?
The system tables/views of the server itself (SYSCAT.TABLES, dbo.sysindexes, ALL_TABLES, information_schema.columns, etc.) are almost always plural. I guess for the sake of consistency I'd follow their lead.
I am a fan of singular table names as they make my ER diagrams using CASE syntax easier to read, but by reading these responses I'm getting the feeling it never caught on very well? I personally love it. There is a good overview with examples of how readable your models can be when you use singular table names, add action verbs to your relationships and form good sentences for every relationships. It's all a bit of overkill for a 20 table database but if you have a DB with hundreds of tables and a complex design how will your developers ever understand it without a good readable diagram?
http://www.aisintl.com/case/method.html
As for prefixing tables and views I absolutely hate that practice. Give a person no information at all before giving them possibly bad information. Anyone browsing a db for objects can quite easily tell a table from a view, but if I have a table named tblUsers that for some reason I decide to restructure in the future into two tables, with a view unifying them to keep from breaking old code I now have a view named tblUsers. At this point I am left with two unappealing options, leave a view named with a tbl prefix which may confuse some developers, or force another layer, either middle tier or application to be rewritten to reference my new structure or name viewUsers. That negates a large part of the value of views IMHO.
Tables: plural
Multiple users are listed in the users table.
Models: singular
A singular user can be selected from the users table.
Controllers: plural
http://myapp.com/users would list multiple users.
That's my take on it anyway.
I once used "Dude" for the User table - same short number of characters, no conflict with keywords, still a reference to a generic human. If I weren't concerned about the stuffy heads that might see the code, I would have kept it that way.
I've always used singular simply because that's what I was taught. However, while creating a new schema recently, for the first time in a long time, I actively decided to maintain this convention simply because... it's shorter. Adding an 's' to the end of every table name seems as useless to me as adding 'tbl_' in front of every one.
This may be a bit redundant, but I would suggest being cautious. Not necessarily that it's a bad thing to rename tables, but standardization is just that; a standard -- this database may already be "standardized", however badly :) -- I would suggest consistency to be a better goal given that this database already exists and presumably it consists of more than just 2 tables.
Unless you can standardize the entire database, or at least are planning to work towards that end, I suspect that table names are just the tip of the iceberg and concentrating on the task at hand, enduring the pain of poorly named objects, may be in your best interest --
Practical consistency sometimes is the best standard... :)
my2cents ---
As others have mentioned here, conventions should be a tool for adding to the ease of use and readability. Not as a shackle or a club to torture developers.
That said, my personal preference is to use singular names for both tables and columns. This probably comes from my programming background. Class names are generally singular unless they are some sort of collection. In my mind I am storing or reading individual records in the table in question, so singular makes sense to me.
This practice also allows me to reserve plural table names for those that store many-to-many relationships between my objects.
I try to avoid reserved words in my table and column names, as well. In the case in question here it makes more sense to go counter to the singular convention for Users to avoid the need to encapsulate a table that uses the reserved word of User.
I like using prefixes in a limited manner (tbl for table names, sp_ for proc names, etc), though many believe this adds clutter. I also prefer CamelBack names to underscores because I always end up hitting the + instead of _ when typing the name. Many others disagree.
Here is another good link for naming convention guidelines: http://www.xaprb.com/blog/2008/10/26/the-power-of-a-good-sql-naming-convention/
Remember that the most important factor in your convention is that it make sense to the people interacting with the database in question. There is no "One Ring to Rule Them All" when it comes to naming conventions.
Possible alternatives:
Rename the table SystemUser
Use brackets
Keep the plural table names.
IMO using brackets is technically the safest approach, though it is a bit cumbersome. IMO it's 6 of one, half-a-dozen of the other, and your solution really just boils down to personal/team preference.
My take is in semantics depending on how you define your container. For example, A "bag of apples" or simply "apples" or an "apple bag" or "apple".
Example:
a "college" table can contain 0 or more colleges
a table of "colleges" can contain 0 or more collegues
a "student" table can contain 0 or more students
a table of "students" can contain 0 or more students.
My conclusion is that either is fine but you have to define how you (or people interacting with it) are going to approach when referring to the tables; "a x table" or a "table of xs"
I think using the singular is what we were taught in university. But at the same time you could argue that unlike in object oriented programming, a table is not an instance of its records.
I think I'm tipping in favour of the singular at the moment because of plural irregularities in English. In German it's even worse due to no consistent plural forms - sometimes you cannot tell if a word is plural or not without the specifying article in front of it (der/die/das). And in Chinese languages there are no plural forms anyway.
I only use nouns for my table names that are spelled the same, whether singular or plural:
moose
fish
deer
aircraft
you
pants
shorts
eyeglasses
scissors
species
offspring
I did not see this clearly articulated in any of the previous answers. Many programmers have no formal definition in mind when working with tables. We often communicate intuitively in terms of of "records" or "rows". However, with some exceptions for denormalized relations, tables are usually designed so that the relation between the non-key attributes and the key constitutes a set theoretic function.
A function can be defined as a subset of a cross-product between two sets, in which each element of the set of keys occurs at most once in the mapping. Hence the terminology arising from from that perspective tends to be singular. One sees the same singular (or at least, non-plural) convention across other mathematical and computational theories involving functions (algebra and lambda calculus for instance).
I always thought that was a dumb convention. I use plural table names.
(I believe the rational behind that policy is that it make it easier for ORM code generators to produce object & collection classes, since it is easier to produce a plural name from a singular name than vice-versa)