Data "derivation" in Sql - sql

It seems like a really basic concept, but I'm not even sure what to search for here. The few tries I've done all failed spectacularly.
Basically I have the following:
parameter(id PK, type_id, data...)
parameter_string(id PK, string data...)
parameter_float(id PK, float data...)
// etc, lots of them
parameter.id uniquely identifies the parameter itself, and parameter.type_id uniquely identifies one of the various tables containing data specific to that type. All the "derived" tables' id fields are unique across all of them, so simply selecting an id uniquely identifies the type right away.
Another requirement is that the same parameter subclass can be "part of" different parameters, and this is achieved because the parameter.type_id field can point to the same parameter "subclass" id.
So the way I'd like to represent that is to have each parameter's id field be a foreign key linked to parameter.type_id, but I get an error complaining that "The columns in the table 'parameter' do not match an existing primary key or UNIQUE contraint". And yes, that's obviously true. However I don't understand why that's a requirement, and how I can work around it and still keep something similar to my structure.

I need to guess a bit about what your question is, because you are not very specific about what exactly it is you want to ask ;-)
For me it seems that you can build your data model like this. To query all of your data at once you'd have to join all subclasses such as
select parameter.id, parameter.data, parameter_string.data, parameter_float.data, ...
from parameter left outer join parameter_string on parameter.type_id = parameter_string.id
left outer join parameter_float on parameter.type_id = parameter_float.id
-- ...
Or if you only need the string parameter values, just do an inner join instead.
Regarding your second issue/question: it is the other way around: parameter_string.id is the primary key of the parameter_string table.
parameter.type_id is the foreign key, but because it references several tables at once, you cannot have this as a foreign key constraint in that sense...
To learn more, you could do a search for something like relational database model generalization.

Related

is it necessary to have foreign key for simple tables

have a table called RoundTable
It has the following columns
RoundName
RoundDescription
RoundType
RoundLogo
Now the RoundType will be having values like "Team", "Individual", "Quiz"
is it necessary to have a one more table called "RoundTypes" with columns
TypeID
RoundType
and remove the RoundType from the rounds table and have a column "TypeID" which has a foreign key to this RoundType table?
Some say that if you have the RoundType in same table it is like hard-coding as there will be lot of round types in future.
is it like if there are going to be only 2-3 round types, i need not have foreign key??
Is it necessary? Obviously not. SQL works fine either way. In a properly defined database, you would do one of two things for RoundType:
Have a lookup table
Have a constraint that checks that values are within an agreed upon set (and I would put enums into this category)
If you have a lookup table, I would advocate having an auto-incremented id (called RoundTypeId) for it. Remember, that in a larger database, such a table would often have more than two columns:
CreatedAt -- when it was created
CreatedBy -- who created it
CreatedOn -- where it was created (important for distributed systems)
Long name
In a more advanced system, you might also need to internationalize the system -- that is, make it work for multiple languages. Then you would be looking up the actual string value in other tables.
is it like if there are going to be only 2-3 round types, i need not
have foreign key??
Usually it's just the opposite: If you have a different value for most of the records (like in a "lastName" column) you won't use a lookup table.
If, however, you know that you will have a limited set of allowed/possible values, a lookup table referenced via a foreign key is probably the better solution.
Maybe read up on "database normalization", starting perhaps # Wikipedia.
Actually you need to have separate table if you have following association between entities,
One to many
Many to many
because of virtue of these association simple DBMS becomes **R**DBMS ( Relation .)
Now ask simple question,
Whether my single record in round table have multiple roundTypes?
If so.. Make a new table and have foreign key in ROUNDTable.
Otherwise no.
yeah I think you should normalize it. Because if you will not do so then definitely you have to enter the round types (value) again and again for each record which is not good practice at all in case if you have large data. so i will suggest you to make another table
however later on you can make a view to get the desired result as fallow
create view vw_anyname
as
select RoundName, RoundDescription , RoundLogo, RoundType from roundtable join tblroundtype
on roundtable.TypeID = tblroundtype .typeid
select * from vw_anyname

Why is prefixing column names with the table name a convention?

If this is a duplicate I am sorry, I tried looking but this is an odd question to word.
I have seen this convention in many databases, but is seems redundant to me. I have found a few answers that say it is to reduce confusion during complex joins, but this doesn't seem like a sufficient reason. If you are making complex joins, make aliases. Do joins really represent such a common task that we should make standard tasks like selects, inserts, and updates redundant?
I don't think there is actually a convention of prefixing column names with the table name.
As Philippe Grondier details, the 'proper' approach to data modelling is to first create a dictionary of data element names. Following the international standard ISO 11179 guidelines:
[Object] [Qualifier] Property RepresentationTerm
you end up with data elements that are fully qualified. Here the qualifier elements Object, Qualifier and sometimes Property are in combination what you consider to be the 'prefix'.
On implementation of the data model in SQL, the table name can provide the context and leads the designer to drop the qualifying terms from the column name. I think this is convention you prefer.**
In other words, in the convention you are questioning it is not that the table name has been prefixed to the column name, rather it is that the qualifying terms have been retained.
** whether or not yours or any other is a good convention is subjective and Stackoverflow is not the place for such discussion. However, I will mention in passing that retaining qualification terms does have a practical benefits (as well as being theoretically sound) e.g. consider that SQL's NATURAL JOIN lends itself to columns that are named consistently throughout the schema.
It is true that such "developped column names" methods are widely used for column naming where, for example, Tbl_Person will have an id_Person primary key column, and a personName text column.
Though it might seem at first quite painfull to write 'developped' column names like "id_Person", "personName", "personAdress", etc, everything gets clearer when you have to write SELECT's on multiple tables, which is something that happens each time you open a form or a report.
There is also a theoretical/historical dimension to this "developped column names" method. First relational databases theories and methods (like MERISE) were proposing, as a first step, to build the so-called "data dictionary", ie the list of all data to be manipulated by the app\database.
This dictionary has to be established even before any "Entity-Relation" model is proposed. data names/descriptions have then to be fully developped, this to avoid confusion between 'similar' data entries, like, for example "companyName" and "personName".
Thus, the "developped column names" convention reflects the fact that, at the data level, similar columns (such as a Company.name and a Person.name columns) are not as equivalent as they seem to be. Though they both look like being here to hold a name, one of them is made to hold a company name, while the other is made to hold a person's name!
This convention can then be considered as a way to reflect the exact meaning of each of the database's column, or to reflect the exact meaning of each entry in the data dictionary.
I've never seen the full table name prefixed, but usually at least an abbreviation. And you're exactly right, it's for simplicity in joins and the like. It's easier to write ur_id all the time than it is to write id sometimes and userrights.id other times, for example. It's not that uncommon to need to access more than one table at a time.
Join is part of a select, so that comparison doesn't hold.
That aside, I don't think you should prefix the field with the table name, except for primary keys. I like to give every table a surrogate key, which I rather name after the table. So the table 'Orders' will get an 'OrderId' PK. An order line will have a foreign key OrderId to point to the order. That way, the field names are the same across tables, and you can tell by the name, which data it presents. You could name the field just 'Id' in all tables, but you do have to read the alias to see which ID you mean. Some queries I wrote are over 400 lines. You don't want to rely on table aliases alone. A little context in the fieldname itself does help.
It's not a convention; some people do it, some people don't. More often I see an ID column prefixed with the table name, but no other columns. Some (all?) DBs also allow prefixing with the table name in queries, but it's neither required, nor part of the actual column name.
In addition to what others said, it is also makes things simpler in the presence of identifying relationships (a.k.a. identifying FOREIGN KEYs).
An identifying relationship "migrates" the parent's primary key into a part of child's primary key. Prefix ensures there will be no collision and you won't need to rename the migrated fields, even when there are multiple levels of identifying relationships. For example:
PARENT:
PARENT_NAME PK
CHILD:
PARENT_NAME PK, FK referencing PARENT
CHILD_NAME PK
GRANDCHILD:
PARENT_NAME PK, FK referencing CHILD
CHILD_NAME PK, FK referencing CHILD
GRANDCHILD_NAME PK
Keeping the same name throughout the whole data model avoids any confusion as to what the field means and where it came from.
On the other hand, prefixing can take a toll on readability, so I usually take a compromise: prefix primary key fields but leave other fields unprefixed.
I dislike such naming conventions. It encourages sloth, specifically the use of unqualified references in queries. Use an alias for each table in your query and qualify each column reference with the appropriate alias.
The only such naming convention I like has to do with primary/foreign keys:
I like to name primary keys something clever, like id.
I like to name prefix the names of foreign key columns with the name of the table containing the primary key.
It makes for much more legible SQL, IMHO. An example:
create table foo
(
id int not null primary key ,
...
)
create table bar
(
id int not null primary key ,
foo_id int not null foreign key references foo (id) ,
...
)
select *
from foo foo
join bar bar on bar.foo_id = foo.id
This scheme falls down, of course, when you get to compound keys. But I like it. YMMV.

How to best explain on what fields should a user join on?

I need to explain to somebody how they can determine what fields from multiple tables/views they should join on. Any suggestions? I know how to do it but am having difficulty trying to explain it.
One of the issues they have is they will take two fields from two tables that are the same (zip code) and join on those, when in reality they should be joining on ID columns. When they choose the wrong column to join on it increases records they receive in return.
Should I work in PK and FK somewhere?
While it is indeed typical to join a PK to an FK any conversation about JOIN clauses that only revolve around PK's and FK's is fairly limited
For example I had this FROM clause in a recent SQL answer I gave
FROM
YourTable firstNames
LEFT JOIN YourTable lastNames
ON firstnames.Name = lastNames.Name
AND lastNames.NameType =2
and firstnames.FrequencyPercent < lastNames.FrequencyPercent
The table referenced on each side of the table is the same table (a self join) and it includes three condidtions one of which is an inequality. Furthermore there would never be an FK here because its looking to join on a field, that is by design, not a Candidate Key.
Also you don't have even have to join one table to another. You can join inline queries to each other which of course can't possibly have a Key.
So in order to properly understand JOIN you just need to understand that it combines the records from two relations (tables, views, inline queries) where some conditions evaluate to true. This means you need to understand boolean logic and the database and the data in the database.
If your user is having a problem with a specific JOIN ask them to SELECT some rows from one table and also the other and then ask them under what conditions would you want to combine the rows.
You don't need to talk in terms of a primary key of a table but you should point to it and explain that it uniquely identifies a given row and that you must join to related tables using it or you could get duplicated results.
Give them examples of joining with it and joining without it.
An ER diagram showing all of the tables they use and their key relationships would help ensure that they always use the correct keys.
It sounds to me like neither you, nor the person you are trying to help understands how this particular database is constructed and perhaps don't really even understand basic database fundamentals, like PK's and FK's. Most often a PK from one table is joined to a FK to another table.
Assuming the database has the proper PK's and FK's in place, it would probably help a great deal to generate an ER diagram. That would make the joining concept much easier to grasp.
Another approach you could take is to find someone who does understand these things and create some views for this person to use. This way he doesn't need to understand how to join the tables together.
A user shouldn't typically be doing joins. A user should have an interface that lets them get the data that they need in the way that they need it. If you don't have the developer resources to do that then you're going to be stuck with this problem of having to teach a user technical details. You also need to be very careful about what kind of damage the user can do. Do they have update rights on the data? I hope they don't accidentally do a DELETE FROM Table with no WHERE clause. Even if you restrict their permissions, a poorly written query can crush the database server or block resources causing problems for other users (and more work for you).
If you have no choice, then I think that you need to certainly teach them about primary and foreign keys, even if you don't call them that. Point out that the id on your table (or whatever your PK is) identifies a row. Then explain how the id appears in other tables to show the relationship. For example, "See, in the address table we have a person_id which tells us who that address belongs to."
After that, expect to spend a large portion of your time with that user as they make mistakes or come up with other things that they want to get from the database, but which they can't figure out how to get.
From theory, and ideally, you should define primary keys on all tables, and join tables using a primary key to the matching field or fields (foreign key) in the other table.
Even if you don't define or if they're not defined as primary keys, you need to make sure the fields uniquely identify the records in the table, and that they should be properly indexed.
For example, let's say the 'person' table has a SSN and a driver's license field. The SSN could be considered and flagged as the 'primary key', but if you join that table to a 'drivers' table which might not have the SSN, but does have the driver's license #, you could join them by the driver's license field (even if it's not flagged as primary key), but you need to make sure that the field is properly indexed in both tables.
...explain to somebody how they can determine what fields from multiple tables/views they should join on.
Simply put, look for the columns with values that match between the tables/views. Preferably, match exactly but some massaging might be necessary.
The existence of foreign key constraints would help to know what matches to what, but the constraint might not be directly to the table/view that is to be joined.
The existence of a primary key doesn't mean it is the criteria that is necessary for the query, so I would overlook this detail (depending on the audience).
I would recommend attacking the desired result set by starting with the columns desired, and working back from there. If there's more than one table's columns in the result set, focus on the table whose columns should be returning distinct results first and then gradually add joins, checking the result set between each JOIN addition to confirm the results are still the same. Otherwise, need to review the JOIN or if a JOIN is actually necessary vs IN or EXISTS.
I did this when I first started out, it comes from thinking of joins as just linking tables together, so I linked at all possible points.
Once you think of joins as a way to combine AND filter the data it becomes easier to understand them.
Writing out your request as a sentence is helpful too, "I want to see all the times Table A interacted with Table B". Then build a query from that using only the ID, noting that if you wanted to know "All the times Table A was in the same zip code as Table B" then you would join by zip code.

What is the best way to tie 1 table to many tables

I have a table. Call it TableA
this table will link to many tables and ideally be enforced by database relationships in (many-1)(TableA-TableB)
(many-1)(TableA-TableC) ... etc
The solution i have is to put all the foreign keys of TableB, TableC, etc in TableA along with a "Type" field (which contains a word version of which relationship is to be enforced). however i think there must be a better way.
What would you do?
I'd appreciate any advice in this and thanks.
This is a perfectly acceptable approach - foreign keys are indeed the correct way of modeling a many-to-one relationship.
Generally, you can't just say you want to make a solution "better"; rather, you should have a specific goal in mind. Faster, shorter implementation, less memory, whatever. Even better is if you have a specific use case you would like to optimize for.
Edit: your question is more clear now that you've edited it. If I understand correctly, you feel your current implementation is inefficient because one of your TableA items can be attached to at most one other item, be it from TableC, TableC, etc.
If that is correct, what I might do is implement the foreign key in Table A as both an ID and a table name, rather than having a new column for each new type of object you want to add to your system. Of course, this would prevent you from changing table names, so a more robust solution would be to have another table mapping unique ids to object types (stored as table names). Then the foreign key in Table A would be item_id and object_type_id, and you could retrieve the object by looking up object_type_id in the object_types table to get the table name.
If you want to add referential integrity enforced by the database server, each key must be represented by a unique column in TableA.
It's hard to give more advice than that without knowing more about what your design is attempting to do.

Naming of ID columns in database tables

I was wondering peoples opinions on the naming of ID columns in database tables.
If I have a table called Invoices with a primary key of an identity column I would call that column InvoiceID so that I would not conflict with other tables and it's obvious what it is.
Where I am workind current they have called all ID columns ID.
So they would do the following:
Select
i.ID
, il.ID
From
Invoices i
Left Join InvoiceLines il
on i.ID = il.InvoiceID
Now, I see a few problems here:
1. You would need to alias the columns on the select
2. ID = InvoiceID does not fit in my brain
3. If you did not alias the tables and referred to InvoiceID is it obvious what table it is on?
What are other peoples thoughts on the topic?
I always prefered ID to TableName + ID for the id column and then TableName + ID for a foreign key. That way all tables have a the same name for the id field and there isn't a redundant description. This seems simpler to me because all the tables have the same primary key field name.
As far as joining tables and not knowing which Id field belongs to which table, in my opinion the query should be written to handle this situation. Where I work, we always prefece the fields we use in a statement with the table/table alias.
Theres been a nerd fight about this very thing in my company of late. The advent of LINQ has made the redundant tablename+ID pattern even more obviously silly in my eyes. I think most reasonable people will say that if you're hand writing your SQL in such a manner as that you have to specify table names to differentiate FKs then it's not only a savings on typing, but it adds clarity to your SQL to use just the ID in that you can clearly see which is the PK and which is the FK.
E.g.
FROM Employees e
LEFT JOIN Customers c ON e.ID = c.EmployeeID
tells me not only that the two are linked, but which is the PK and which is the FK. Whereas in the old style you're forced to either look or hope that they were named well.
ID is a SQL Antipattern.
See http://www.amazon.com/s/ref=nb_sb_ss_i_1_5?url=search-alias%3Dstripbooks&field-keywords=sql+antipatterns&sprefix=sql+a
If you have many tables with ID as the id you are making reporting that much more difficult. It obscures meaning and makes complex queries harder to read as well as requiring you to use aliases to differentiate on the report itself.
Further if someone is foolish enough to use a natural join in a database where they are available, you will join to the wrong records.
If you would like to use the USING syntax that some dbs allow, you cannot if you use ID.
If you use ID you can easily end up with a mistaken join if you happen to be copying the join syntax (don't tell me that no one ever does this!)and forget to change the alias in the join condition.
So you now have
select t1.field1, t2.field2, t3.field3
from table1 t1
join table2 t2 on t1.id = t2.table1id
join table3 t3 on t1.id = t3.table2id
when you meant
select t1.field1, t2.field2, t3.field3
from table1 t1
join table2 t2 on t1.id = t2.table1id
join table3 t3 on t2.id = t3.table2id
If you use tablenameID as the id field, this kind of accidental mistake is far less likely to happen and much easier to find.
We use InvoiceID, not ID. It makes queries more readable -- when you see ID alone it could mean anything, especially when you alias the table to i.
I agree with Keven and a few other people here that the PK for a table should simply be Id and foreign keys list the OtherTable + Id.
However I wish to add one reason which recently gave more weight to this arguement.
In my current position we are employing the entity framework using POCO generation. Using the standard naming convention of Id the the PK allows for inheritance of a base poco class with validation and such for tables which share a set of common column names. Using the Tablename + Id as the PK for each of these tables destroys the ability to use a base class for these.
Just some food for thought.
It's not really important, you are likely to run into simalar problems in all naming conventions.
But it is important to be consistent so you don't have to look at the table definitions every time you write a query.
My preference is also ID for primary key and TableNameID for foreign key. I also like to have a column "name" in most tables where I hold the user readable identifier (i.e. name :-)) of the entry. This structure offers great flexibility in the application itself, I can handle tables in mass, in the same way. This is a very powerful thing. Usually an OO software is built on top of the database, but the OO toolset cannot be applied because the db itself does not allow it. Having the columns id and name is still not very good, but it is a step.
Select
i.ID , il.ID From
Invoices i
Left Join InvoiceLines il
on i.ID = il.InvoiceID
Why cant I do this?
Select
Invoices.ID
, InvoiceLines.ID
From
Invoices
Left Join InvoiceLines
on Invoices.ID = InvoiceLines.InvoiceID
In my opinion this is very much readable and simple. Naming variables as i and il is a poor choice in general.
I just started working in a place that uses only "ID" (in the core tables, referenced by TableNameID in foreign keys), and have already found TWO production problems directly caused by it.
In one case the query used "... where ID in (SELECT ID FROM OtherTable ..." instead of "... where ID in (SELECT TransID FROM OtherTable ...".
Can anyone honestly say that wouldn't have been much easier to spot if full, consistent names were used where the wrong statement would have read "... where TransID in (SELECT OtherTableID from OtherTable ..."? I don't think so.
The other issue occurs when refactoring code. If you use a temp table whereas previously the query went off a core table then the old code reads "... dbo.MyFunction(t.ID) ..." and if that is not changed but "t" now refers to a temp table instead of the core table, you don't even get an error - just erroneous results.
If generating unnecessary errors is a goal (maybe some people don't have enough work?), then this kind of naming convention is great. Otherwise consistent naming is the way to go.
I personally prefer (as it has been stated above) the Table.ID for the PK and TableID for the FK. Even (please don't shoot me) Microsoft Access recommends this.
HOWEVER, I ALSO know for a fact that some generating tools favor the TableID for PK because they tend to link all column name that contain 'ID' in the word, INCLUDING ID!!!
Even the query designer does this on Microsoft SQL Server (and for each query you create, you end up ripping off all the unnecessary newly created relationships on all tables on column ID)
THUS as Much as my internal OCD hates it, I roll with the TableID convention. Let's remember that it's called a Data BASE, as it will be the base for hopefully many many many applications to come. And all technologies Should benefit of a well normalized with clear description Schema.
It goes without saying that I DO draw my line when people start using TableName, TableDescription and such. In My opinion, conventions should do the following:
Table name: Pluralized. Ex. Employees
Table alias: Full table Name, singularized. Ex.
SELECT Employee.*, eMail.Address
FROM Employees AS Employee LEFT JOIN eMails as eMail on Employee.eMailID = eMail.eMailID -- I would sure like it to just have the eMail.ID here.... but oh well
[Update]
Also, there are some valid posts in this thread about duplicated columns due of the "kind of relationship" or role. Example, if a Store has an EmployeeID, that tells me squat. So I sometimes do something like Store.EmployeeID_Manager. Sure it's a bit larger but at leas people won't go crazy trying to find table ManagerID, or what EmployeeID is doing there. When querying is WHERE I would simplify it as:
SELECT EmployeeID_Manager as ManagerID FROM Store
For the sake of simplicity most people name the column on the table ID. If it has a foreign key reference on another table, then they explicity call it InvoiceID (to use your example) in the case of joins, you are aliasing the table anyway so the explicit inv.ID is still simpler than inv.InvoiceID
Coming at this from the perspective of a formal data dictionary, I would name the data element invoice_ID. Generally, a data element name will be unique in the data dictionary and ideally will have the same name throughout, though sometimes additional qualifying terms may be required based on context e.g. the data element named employee_ID could be used twice in the org chart and therefore qualified as supervisor_employee_ID and subordinate_employee_ID respectively.
Obviously, naming conventions are subjective and a matter of style. I've find ISO/IEC 11179 guidelines to be a useful starting point.
For the DBMS, I see tables as collections of entites (except those that only ever contain one row e.g. cofig table, table of constants, etc) e.g. the table where my employee_ID is the key would be named Personnel. So straight away the TableNameID convention doesn't work for me.
I've seen the TableName.ID=PK TableNameID=FK style used on large data models and have to say I find it slightly confusing: I much prefer an identifier's name be the same throughout i.e. does not change name based on which table it happens to appear in. Something to note is the aforementioned style seems to be used in the shops which add an IDENTITY (auto-increment) column to every table while shunning natural and compound keys in foreign keys. Those shops tend not to have formal data dictionaries nor build from data models. Again, this is merely a question of style and one to which I don't personally subscribe. So ultimately, it's not for me.
All that said, I can see a case for sometimes dropping the qualifier from the column name when the table's name provides a context for doing so e.g. the element named employee_last_name may become simply last_name in the Personnel table. The rationale here is that the domain is 'people's last names' and is more likely to be UNIONed with last_name columns from other tables rather than be used as a foreign key in another table, but then again... I might just change my mind, sometimes you can never tell. That's the thing: data modelling is part art, part science.
My vote is for InvoiceID for the table ID. I also use the same naming convention when it's used as a foreign key and use intelligent alias names in the queries.
Select Invoice.InvoiceID, Lines.InvoiceLine, Customer.OrgName
From Invoices Invoice
Join InvoiceLines Lines on Lines.InvoiceID = Invoice.InvoiceID
Join Customers Customer on Customer.CustomerID = Invoice.CustomerID
Sure, it's longer than some other examples. But smile. This is for posterity and someday, some poor junior coder is going to have to alter your masterpiece. In this example there is no ambiguity and as additional tables get added to the query, you'll be grateful for the verbosity.
FWIW, our new standard (which changes, uh, I mean "evolves", with every new project) is:
Lower case database field names
Uppercase table names
Use underscores to separate words in the field name - convert these to Pascal case in code.
pk_ prefix means primary key
_id suffix means an integer, auto-increment ID
fk_ prefix means foreign key (no suffix necessary)
_VW suffix for views
is_ prefix for booleans
So, a table named NAMES might have the fields pk_name_id, first_name, last_name, is_alive, and fk_company and a view called LIVING_CUSTOMERS_VW, defined like:
SELECT first_name, last_name
FROM CONTACT.NAMES
WHERE (is_alive = 'True')
As others have said, though, just about any scheme will work as long as it is consistent and doesn't unnecessarily obfuscate your meanings.
There are lots of answers on this already, but I wanted to add two major things that I haven't seen above:
Customers coming to you for support.
Many times a customer or user or even dev of another department have hit a snag and have contacted us saying they're having a problem doing an operation. We ask them what record they're having a problem with. Now, the data they see on the screen, e.g. a grid with customer name, number of orders, destination etc is an aggregate of many tables. They say they've having trouble with id 83. There's no way to know what id that is, which table it is, if it's just called 'id'.
Namely, a row of data does not give any indication which table it is from. Unless you happen to know the schema of your database well, which is rarely the case on complex systems or non-greenfield systems you've been told to take over, you don't know what id=83 means even if you have more data like name, address, etc (which might not even be in the same table!).
This id could be coming from a grid, or it could be coming from an error in your API, or a faulty query dumping the error message to the screen, or to a log file.
Often a developer just dumps 'ID' into a column and forgets about it, and often DBs have many similar tables like Invoice, InvoiceGrouping, InvoicePlan and the ID could be for any of them. In frustration you look in the code to see which one it is, and see that they've called it Id on the model as well, so you then have to dig into how the model for the page was constructed. I cannot count how many times I've had to do this to figure out what an Id is. It's a lot. Sometimes you have to dig out a SPROC as well that just returns 'Id' as a header. Nightmare.
Log files are easier when it's clear what went wrong
Often SQL can give pretty crappy error messages. "Could not insert item with ID 83, column would be truncated" or something like that is very hard to debug. Often error messages are not very helpful, but usually the thing that broke will make a vague attempt to tell you what record was broken by just dumping out the primary key name and the value. If it's "ID" then it doesn't really help at all.
This is just two things that I didn't feel were mentioned in the other answers.
I also think that a lot of comments are 'if you program in X way then this isn't an issue', and I think the points above (and other points on this question) are valid specifically because of the way people program and because they don't have the time, energy, budget and foresight to program in perfect logging and error handling or change engrained habits of quick SQL and code writing.
I definitely agree with including the table name in the ID field name, for exactly the reasons you give. Generally, this is the only field where I would include the table name.
I do hate the plain id name. I strongly prefer to always use the invoice_id or a variant thereof. I always know which table is the authoritative table for the id when I need to, but this confuses me
SELECT * from Invoice inv, InvoiceLine inv_l where
inv_l.InvoiceID = inv.ID
SELECT * from Invoice inv, InvoiceLine inv_l where
inv_l.ID = inv.InvoiceLineID
SELECT * from Invoice inv, InvoiceLine inv_l where
inv_l.ID = inv.InvoiceID
SELECT * from Invoice inv, InvoiceLine inv_l where
inv_l.InvoiceLineID = inv.ID
What's worst of all is the mix you mention, totally confusing. I've had to work with a database where almost always it was foo_id except in one of the most used ids. That was total hell.
I think you can use anything for the "ID" as long as you're consistent. Including the table name is important to. I would suggest using a modeling tool like Erwin to enforce the naming conventions and standards so when writing queries it's easy to understand the relationships that may exist between tables.
What I mean by the first statement is, instead of ID you can use something else like 'recno'. So then this table would have a PK of invoice_recno and so on.
Cheers,
Ben
For the column name in the database, I'd use "InvoiceID".
If I copy the fields into a unnamed struct via LINQ, I may name it "ID" there, if it's the only ID in the structure.
If the column is NOT going to be used in a foreign key, so that it's only used to uniquely identify a row for edit editing or deletion, I'll name it "PK".
If you give each key a unique name, e.g. "invoices.invoice_id" instead of "invoices.id", then you can use the "natural join" and "using" operators with no worries. E.g.
SELECT * FROM invoices NATURAL JOIN invoice_lines
SELECT * FROM invoices JOIN invoice_lines USING (invoice_id)
instead of
SELECT * from invoices JOIN invoice_lines
ON invoices.id = invoice_lines.invoice_id
SQL is verbose enough without making it more verbose.
What I do to keep things consistent for myself (where a table has a single column primary key used as the ID) is to name the primary key of the table Table_pk. Anywhere I have a foreign key pointing to that tables primary key, I call the column PrimaryKeyTable_fk. That way I know that if I have a Customer_pk in my Customer table and a Customer_fk in my Order table, I know that the Order table is referring to an entry in the Customer table.
To me, this makes sense especially for joins where I think it reads easier.
SELECT *
FROM Customer AS c
INNER JOIN Order AS c ON c.Customer_pk = o.Customer_fk
I prefer DomainName || 'ID'. (i.e. DomainName + ID)
DomainName is often, but not always, the same as TableName.
The problem with ID all by itself is that it doesn't scale upwards. Once you have about 200 tables, each with a first column named ID, the data begins to look all alike. If you always qualify ID with the table name, that helps a little, but not that much.
DomainName & ID can be used to name foreign keys as well as primary keys. When foriegn keys are named after the column that they reference, that can be of mnemonic assistance. Formally, tying the name of a foreign key to the key it references is not necessary, since the referential integrity constrain will establish the reference. But it's awfully handy when it comes to reading queries and updates.
Occasionally, DomainName || 'ID' can't be used, because there would be two columns in the same table with the same name. Example: Employees.EmployeeID and Employees.SupervisorID. In those cases, I use RoleName || 'ID', as in the example.
Last but not least, I use natural keys rather than synthetic keys when possible. There are situations where natural keys are unavailable or untrustworthy, but there are plenty of situations where the natural key is the right choice. In those cases, I let the natural key take on the name it would naturally have. This name often doesn't even have the letters, 'ID' in it. Example: OrderNo where No is an abbreviation for "Number".
For each table I choose a tree letter shorthand(e.g. Employees => Emp)
That way a numeric autonumber primary key becomes nkEmp.
It is short, unique in the entire database and I know exactly its properties at a glance.
I keep the same names in SQL and all languages I use (mostly C#, Javascript, VB6).
See the Interakt site's naming conventions for a well thought out system of naming tables and columns. The method makes use of a suffix for each table (_prd for a product table, or _ctg for a category table) and appends that to each column in a given table. So the identity column for the products table would be id_prd and is therefore unique in the database.
They go one step further to help with understanding the foreign keys: The foreign key in the product table that refers to the category table would be idctg_prd so that it is obvious to which table it belong (_prd suffix) and to which table it refers (category).
Advantages are that there is no ambiguity with the identity columns in different tables, and that you can tell at a glance which columns a query is referring to by the column names.
You could use the following naming convention. It has its flaws but it solves your particular problems.
Use short (3-4 characters) nicknames for the table names, i.e. Invoice - inv, InvoiceLines - invl
Name the columns in the table using those nicknames, i.e. inv_id, invl_id
For the reference columns use invl_inv_id for the names.
this way you could say
SELECT * FROM Invoice LEFT JOIN InvoiceLines ON inv_id = invl_inv_id