Create database model using columns names (no FK, no relation) - sql

I have a database with no FKs, PKs and not any documentation to show what tables relates with each other. So i'm in big trouble to do a reverse engineering on the physical data to mind how does one table relates to another one. Does anyone knows a tool to create a model based only by the names of columns?
Example:
Table A have a column with name ID_NTP_BAZINGA that relates to table B with the column name ID_NTP_BAZINGA.

Hm, it is hard task to do (and I kind of have done it myself), but I can think of some hints to automate your work, for example commands like sp_msforeachdb or sp_msforeachtable might be handy.
To determine FK relations in a way you mentioned, you could use such query:
select * from (
select object_name(object_id) [table],
name,
count(*) over (partition by name) [cnt]
from [DB_name].sys.columns
) a where cnt > 1
which, in your case, would return (among others)
Table A | ID_NTP_BAZINGA
Table B | ID_NTP_BAZINGA
and give you already some insight.
For candidates for PK, one could use sp_msforeachtable with dynamic SQL checking if count(distinct *) is equal to count(*) - this would tell you whether you have unique values or not, and in the end you would count(*) with is not null filter in where clause, so you'll know if you have nulls in particular column.
These are some general hints and exact queries you have to write yourself. This answer will get you started.

This is data archaeology. It's ironic because one of the reasons for developing databases in the first place was to guarantee that data would be better documented than it had been in files and records. One of the best handles on the data model in this case is the application code.
Look at the SQL used by the application, especially at the queries. Retrieval is when data gets turned into information, and that's where you'll get your clues. Pay special attention to the ON and WHERE clauses. You may be able to figure out while columns were functioning as PKs and FKs, even if they weren't declared as such.

Related

What is the most correct way to store a "list" in a SQL Database?

So, I've read a lot about how stashing multiple values into one column is a bad idea and violates the first rule of data normalisation (which, surprisingly, is not "Do Not Talk About Data Normalisation") so I need some help.
At the moment I'm designing an ASP .NET webpage for the place I work for. I want to display data on a web page depending on what Active Directory groups the person belongs to. The first way of doing this that comes to mind is to have a table with, essentially, a column containing the AD group and the second column containing what list of computers belong to that list.
I've learnt that this is showing great disregard for relational databases, so what is a better way to do it? I want to control this access by SQL tables, so I can add/remove from these tables and change end users access accordingly.
Thanks for the help! :)
EDIT: To describe exactly what I want to do is this:
We have a certain group of computers that need to be checked up on, however these computers are in physically difficult to reach locations. The organisation I belong to has remote control enabled for these computers, however they're not in the business of giving out the remote control password (understandable).
The added layer of complexity is that, depending on who you are, our clients should only be able to see a certain group of computers (that is, the group of computers that their area owns). So, if Group A has Thomas in it, and Group B has Jones in it, if you belong to either group then you would just see one entry. However, if you belong to both groups you should see both Thomas and Jones computers in it.
The reason why I think that storing this data in a SQL cell is the way to go is because, to store them in tables would require (in my mind) a new table for each new "group" of computers. I don't want to crank out SQL tables for every new group, I'd much rather just have an added row in a SQL table somewhere.
Does this make any sense?
You basically have three options in SQL Server:
Storing the values in a single column.
Storing the values in a junction table.
Storing the values as XML (or as some other structured data format).
(Other databases have other options, such as arrays, nested tables, and JSON.)
In almost all cases, using a junction table is the correct approach. Why? Here are some reasons:
SQL Server has (relatively) lousy string manipulation, so doing something as simple as ensuring a unique list is really, really hard.
A junction table allows you to store lots of other information (When was a machine added? What is the full description of the machine? etc. etc.).
Most queries that you want are pretty easy with a junction table (with the one exception of getting a comma-delimited list, alas -- which is just counterintuitive rather than "hard").
All the types are stored natively.
A junction table allows you to enforce constraints (both check and foreign key) on the elements of the list.
Although a delimited list is almost never the right solution, it is possible to think of cases where it might be useful:
The list doesn't change and presentation of the list is very important.
Space usage is an issue (alas, denormalization often results in fewer pages).
Queries do not really access elements of the list, just the entire thing.
XML is also a reasonable choice under some circumstances. In the most recent versions of SQL Server, this can be made pretty efficient. However, it incurs the overhead of reading and parsing XML -- and things like duplicate elimination are still not obvious.
So, you do have options. In almost all cases, the junction table is the right approach.
There is an "it depends" that you should consider. If the data is never going to be queried (or queried very rarely) storing it as XML or JSON would be perfectly acceptable. Many DBAs would freak out but it is much faster to get the blob of data that you are going to send to the client than to recompose and decompose a set of columns from a secondary table. (There is a reason document and object databases are becoming so popular.)
... though I would ask why are you replicating active directory to your database and how are you planning on keeping these in sync.
I not really a bad idea to store multiple values in one column, but will depend the search you want.
If you just only want to know the persons that is part of a group then you can store persons in one column with a group id as key. For update you just update the entire list in a group.
But if you want to search a specified person that belongs to group, then its not recommended that you store this multiple persons in one column. In this case its better to store a itermedium table that store person id, and group id.
Sounds like you want a table that maps users to group IDs and a second table that maps group IDs to which computers are in that group. I'm not sure, your language describing the problem was a bit confusing to me.
a list has some columns like: name, family name, phone number etc.
and rows like name=john familyName= lee number=12321321
name=... familyname=... number=...
an sql database works same way. every row in a sql database is a record. so you jusr add records of your list into your database using insert query.
complete explanation in here:
http://www.w3schools.com/sql/sql_insert.asp
This sounds like a typical many-to-many problem. You have many groups and many computers and they are related to eachother. In this situation, it is often recommended to use a mapping table, a.k.a. "junction table" or "cross-reference" table. This table consist solely of the two foreign keys in your other tables.
If your tables look like this:
Computer
- computerId
- otherComputerColumns
Group
- groupId
- othergroupColumns
Then your mapping table would look like this:
GroupComputer
- groupId
- computerId
And you would insert a single record for every relationship between a group and computer. This is in compliance with the rules for third normal form in regards to database normalization.
You can have a table with the group and group id, another table with the computer and computer id and a third table with the relation of group id and computer id.

Multiple record types and how to split them amongst tables

I'm working on a database structure and trying to imagine the best way to split up a host of related records into tables. Records all have the same base type they inherit from, but each then expands on it for their particular use.
These 4 properties are present for every type.
id, name, groupid, userid
Here are the types that expand off those 4 properties.
"Static": value
"Increment": currentValue, maxValue, overMaxAllowed, underNegativeAllowed
"Target": targetValue, result, lastResult
What I tried initially was to create a "records" table with the 4 base properties in it. I then created 3 other tables named "records_static/increment/target", each with their specific properties as columns. I then forged relationships between a "rowID" column in each of these secondary tables with the main table's "id".
Populating the tables with dummy data, I am now having some major problems attempting to extract the data with a query. The only parameter is the userid, beyond that what I need is a table with all of the columns and data associated with the userid.
I am unsure if I should abandon that table design, or if I just am going about the query incorrectly.
I hope I explained that well enough, please let me know if you need additional detail.
Make the design as simple as possible.
First I'd try a single table that contains all attributes that might apply to a record. Irrelevant attributes can be null. You can enforce null values for a specific type with a check constraint.
If that doesn't work out, you can create three tables for each record type, without a common table.
If that doesn't work out, you can create a base table with 1:1 extension tables. Be aware that querying that is much harder, requiring join for every operation:
select *
from fruit f
left join
apple a
on a.fruit_id = f.id
left join
pear p
on p.fruit_id = f.id
left join
...
The more complex the design, the more room for an inconsistent database state. The second option you could have a pear and an apple with the same id. In the third option you can have missing rows in either the base or the extension table. Or the tables can contradict each other, for example a base row saying "pear" with an extension row in the Apple table. I fully trust end users to find a way to get that into your database :)
Throw out the complex design and start with the simplest one. Your first attempt was not a failure: you now know the cost of adding relations between tables. Which can look deceptively trivial (or even "right") at design time.
This is a typical "object-oriented to relational" mapping problem. You can find books about this. Also a lot of google hits like
http://www.ibm.com/developerworks/library/ws-mapping-to-rdb/
The easiest for you to implement is to have one table containing all columns necessary to store all your types. Make sure you define them as nullable. Only the common columns can be not null if necessary.
Just because object share some of the same properties does not mean you need to have one table for both objects. That leads to unnecessary right outer joins that have a 1 to 1 relationship which is not what I think of as good database design.
but...
If you want to continue in your fashion I think all you need is a primary key in the table with common columns "id, name, groupid, userid" (I assume ID) then that would be the foreign key to your table with currentValue, maxValue, overMaxAllowed, underNegativeAllowed

Is there any use to duplicate column names in a table?

In sqlite3, I can force two columns to alias to the same name, as in the following query:
SELECT field_one AS overloaded_name,
field_two AS overloaded_name
FROM my_table;
It returns the following:
overloaded_name overloaded_name
--------------- ---------------
1 2
3 4
... ...
... and so on.
However, if I create a named table using the same syntax, it appends one of the aliases with a :1:
sqlite> CREATE TABLE temp AS
SELECT field_one AS overloaded_name,
field_two AS overloaded_name
FROM my_table;
sqlite> .schema temp
CREATE TABLE temp(
overloaded_name TEXT,
"overloaded_name:1" TEXT
);
I ran the original query just to see if this was possible, and I was surprised that it was allowed. Is there any good reason to do this? Assuming there isn't, why is this allowed at all?
EDIT:
I should clarify: the question is twofold: why is the table creation allowed to succeed, and (more importantly) why is the original select allowed in the first place?
Also, see my clarification above with respect to table creation.
I can force two columns to alias to the same name...
why is [this] allowed in the first place?
This can be attributed to the shackles of compatibility. In the SQL Standards, nothing is ever deprecated. An early version of the Standard allowed the result of a table expression to include columns with duplicate names, probably because an influential vendor had allowed it, possibly due to the inclusion of a bug or the omission of a design feature, and weren't prepared to take the risk of breaking their customers' code (the shackles of compatibility again).
Is there any use to duplicate column names in a table?
In the relational model, every attribute of every relation has a name that is unique within the relevant relation. Just because SQL allows duplicate column names that doesn't mean that as a SQL coder you should utilise such as feature; in fact I'd say you have to vigilant not to invoke this feature in error. I can't think of any good reason to have duplicate column names in a table but I can think of many obvious bad ones. Such a table would not be a relation and that can't be a good thing!
why is the [base] table creation allowed to succeed
Undoubtedly an 'extension' to (a.k.a purposeful violation of) the SQL Standards, I suppose it could be perceived as a reasonable feature: if I attempt to create columns with duplicate names the system automatically disambigutes them by suffixing an ordinal number. In fact, the SQL Standard specifies that there be an implementation dependent way to ensure the result of a table expression does not implicitly have duplicate column names (but as you point out in the question this does not perclude the user from explicitly using duplicate AS clauses). However, I personally think the Standard behaviour of disallowing the duplicate name and raising an error is the correct one. Aside from the above reasons (i.e. that duplicate columns in the same table are of no good use), a SQL script that creates an object without knowing if the system has honoured that name will be error prone.
The table itself can't have duplicate column names because inserting and updating would be messed up. Which column gets the data?
During selects the "duplicates" are just column labels so do not hurt anything.
I assume you're talking about the CREATE TABLE ... AS SELECT command. This looks like an SQL extension to me.
Standard SQL does not allow you to use the same column name for different columns, and SQLite appears to be allowing that in its extension, but working around it. While a simple, naked select statement simply uses as to set the column name, create table ... as select uses it to create a brand new table with those column names.
As an aside, it would be interesting to see what the naked select does when you try to use the duplicated column, such as in an order by clause.
If you were allowed to have multiple columns with the same name, it would be a little difficult for the execution engine to figure out what you meant with:
select overloaded_name from table;
The reason why you can do it in the select is to allow things like:
select id, surname as name from users where surname is not null
union all
select id, firstname as name from users where surname is null
so that you end up with a single name column.
As to whether there's a good reason, SQLite is probably assuming you know what you're doing when you specify the same column name for two different columns. Its essence seems to be to allow a great deal of latitude to the user (as evidenced by the fact that the columns are dynamically typed, for example).
The alternative would be to simply refuse your request, which is what I'd prefer, but the developers of SQLite are probably more liberal (or less anal-retentive) than I :-)

Are relationship tables really needed?

Relationship tables mostly contain two columns: IDTABLE1, and IDTABLE2.
Only thing that seems to change between relationship tables is the names of those two columns, and table name.
Would it be better if we create one table Relationships and in this table we place 3 columns:
TABLE_NAME, IDTABLE1, IDTABLE2, and then use this table for all relationships?
Is this a good/acceptable solution in web/desktop application development? What would be downside of this?
Note:
Thank you all for feedback. I appreciate it.
But, I think you are taking it a bit too far... Every solution works until one point.
As data storage simple text file is good till certain point, than excel is better, than MS Access, than SQL Server, than...
To be honest, I haven't seen any argument that states why this solution is bad for small projects (with DB size of few GB).
It would be a monster of a table; it would also be cumbersome. Performance-wise, such a table would not be a great idea. Also, foreign keys are impossible to add to such a table. I really can't see a lot of advantages to such a solution.
Bad idea.
How would you enforce the foreign keys if IDTABLE1 could contain ids from any table at all?
To achieve acceptable performance on joins without a load of unnecessary IO to bring in completely unrelated rows you would need a composite index with leading column TABLE_NAME that basically ends up partitioning the table into sections anyway.
Obviously even with this pseudo partitioning going on you would still be wasting a lot of space in the table/indexes just repeating the table name for each row.
Isn't it a big IF that you're only going to store the 2 ID fields? If I have a StudentCourse (or better yet Enrollment) table that has StudentID & CourseID, but wouldn't EnrollmentDate go in this table as well since not all students enroll on the first day of class. Seems like a bad idea to add this column to an already bloated table where most records will be null.
The benefit of a single table could be a requirement that the application has the ability to allow user/admin to create these relationships with data (Similar to have a single lookup or reference list table) and avoid having to create a new table to address these User Created References. Needing dynamic querying may benefit as well. An application that requires such dynamic data structure requirements might be better suited for a schemaless or nosql database.

Naming of ID columns in database tables

I was wondering peoples opinions on the naming of ID columns in database tables.
If I have a table called Invoices with a primary key of an identity column I would call that column InvoiceID so that I would not conflict with other tables and it's obvious what it is.
Where I am workind current they have called all ID columns ID.
So they would do the following:
Select
i.ID
, il.ID
From
Invoices i
Left Join InvoiceLines il
on i.ID = il.InvoiceID
Now, I see a few problems here:
1. You would need to alias the columns on the select
2. ID = InvoiceID does not fit in my brain
3. If you did not alias the tables and referred to InvoiceID is it obvious what table it is on?
What are other peoples thoughts on the topic?
I always prefered ID to TableName + ID for the id column and then TableName + ID for a foreign key. That way all tables have a the same name for the id field and there isn't a redundant description. This seems simpler to me because all the tables have the same primary key field name.
As far as joining tables and not knowing which Id field belongs to which table, in my opinion the query should be written to handle this situation. Where I work, we always prefece the fields we use in a statement with the table/table alias.
Theres been a nerd fight about this very thing in my company of late. The advent of LINQ has made the redundant tablename+ID pattern even more obviously silly in my eyes. I think most reasonable people will say that if you're hand writing your SQL in such a manner as that you have to specify table names to differentiate FKs then it's not only a savings on typing, but it adds clarity to your SQL to use just the ID in that you can clearly see which is the PK and which is the FK.
E.g.
FROM Employees e
LEFT JOIN Customers c ON e.ID = c.EmployeeID
tells me not only that the two are linked, but which is the PK and which is the FK. Whereas in the old style you're forced to either look or hope that they were named well.
ID is a SQL Antipattern.
See http://www.amazon.com/s/ref=nb_sb_ss_i_1_5?url=search-alias%3Dstripbooks&field-keywords=sql+antipatterns&sprefix=sql+a
If you have many tables with ID as the id you are making reporting that much more difficult. It obscures meaning and makes complex queries harder to read as well as requiring you to use aliases to differentiate on the report itself.
Further if someone is foolish enough to use a natural join in a database where they are available, you will join to the wrong records.
If you would like to use the USING syntax that some dbs allow, you cannot if you use ID.
If you use ID you can easily end up with a mistaken join if you happen to be copying the join syntax (don't tell me that no one ever does this!)and forget to change the alias in the join condition.
So you now have
select t1.field1, t2.field2, t3.field3
from table1 t1
join table2 t2 on t1.id = t2.table1id
join table3 t3 on t1.id = t3.table2id
when you meant
select t1.field1, t2.field2, t3.field3
from table1 t1
join table2 t2 on t1.id = t2.table1id
join table3 t3 on t2.id = t3.table2id
If you use tablenameID as the id field, this kind of accidental mistake is far less likely to happen and much easier to find.
We use InvoiceID, not ID. It makes queries more readable -- when you see ID alone it could mean anything, especially when you alias the table to i.
I agree with Keven and a few other people here that the PK for a table should simply be Id and foreign keys list the OtherTable + Id.
However I wish to add one reason which recently gave more weight to this arguement.
In my current position we are employing the entity framework using POCO generation. Using the standard naming convention of Id the the PK allows for inheritance of a base poco class with validation and such for tables which share a set of common column names. Using the Tablename + Id as the PK for each of these tables destroys the ability to use a base class for these.
Just some food for thought.
It's not really important, you are likely to run into simalar problems in all naming conventions.
But it is important to be consistent so you don't have to look at the table definitions every time you write a query.
My preference is also ID for primary key and TableNameID for foreign key. I also like to have a column "name" in most tables where I hold the user readable identifier (i.e. name :-)) of the entry. This structure offers great flexibility in the application itself, I can handle tables in mass, in the same way. This is a very powerful thing. Usually an OO software is built on top of the database, but the OO toolset cannot be applied because the db itself does not allow it. Having the columns id and name is still not very good, but it is a step.
Select
i.ID , il.ID From
Invoices i
Left Join InvoiceLines il
on i.ID = il.InvoiceID
Why cant I do this?
Select
Invoices.ID
, InvoiceLines.ID
From
Invoices
Left Join InvoiceLines
on Invoices.ID = InvoiceLines.InvoiceID
In my opinion this is very much readable and simple. Naming variables as i and il is a poor choice in general.
I just started working in a place that uses only "ID" (in the core tables, referenced by TableNameID in foreign keys), and have already found TWO production problems directly caused by it.
In one case the query used "... where ID in (SELECT ID FROM OtherTable ..." instead of "... where ID in (SELECT TransID FROM OtherTable ...".
Can anyone honestly say that wouldn't have been much easier to spot if full, consistent names were used where the wrong statement would have read "... where TransID in (SELECT OtherTableID from OtherTable ..."? I don't think so.
The other issue occurs when refactoring code. If you use a temp table whereas previously the query went off a core table then the old code reads "... dbo.MyFunction(t.ID) ..." and if that is not changed but "t" now refers to a temp table instead of the core table, you don't even get an error - just erroneous results.
If generating unnecessary errors is a goal (maybe some people don't have enough work?), then this kind of naming convention is great. Otherwise consistent naming is the way to go.
I personally prefer (as it has been stated above) the Table.ID for the PK and TableID for the FK. Even (please don't shoot me) Microsoft Access recommends this.
HOWEVER, I ALSO know for a fact that some generating tools favor the TableID for PK because they tend to link all column name that contain 'ID' in the word, INCLUDING ID!!!
Even the query designer does this on Microsoft SQL Server (and for each query you create, you end up ripping off all the unnecessary newly created relationships on all tables on column ID)
THUS as Much as my internal OCD hates it, I roll with the TableID convention. Let's remember that it's called a Data BASE, as it will be the base for hopefully many many many applications to come. And all technologies Should benefit of a well normalized with clear description Schema.
It goes without saying that I DO draw my line when people start using TableName, TableDescription and such. In My opinion, conventions should do the following:
Table name: Pluralized. Ex. Employees
Table alias: Full table Name, singularized. Ex.
SELECT Employee.*, eMail.Address
FROM Employees AS Employee LEFT JOIN eMails as eMail on Employee.eMailID = eMail.eMailID -- I would sure like it to just have the eMail.ID here.... but oh well
[Update]
Also, there are some valid posts in this thread about duplicated columns due of the "kind of relationship" or role. Example, if a Store has an EmployeeID, that tells me squat. So I sometimes do something like Store.EmployeeID_Manager. Sure it's a bit larger but at leas people won't go crazy trying to find table ManagerID, or what EmployeeID is doing there. When querying is WHERE I would simplify it as:
SELECT EmployeeID_Manager as ManagerID FROM Store
For the sake of simplicity most people name the column on the table ID. If it has a foreign key reference on another table, then they explicity call it InvoiceID (to use your example) in the case of joins, you are aliasing the table anyway so the explicit inv.ID is still simpler than inv.InvoiceID
Coming at this from the perspective of a formal data dictionary, I would name the data element invoice_ID. Generally, a data element name will be unique in the data dictionary and ideally will have the same name throughout, though sometimes additional qualifying terms may be required based on context e.g. the data element named employee_ID could be used twice in the org chart and therefore qualified as supervisor_employee_ID and subordinate_employee_ID respectively.
Obviously, naming conventions are subjective and a matter of style. I've find ISO/IEC 11179 guidelines to be a useful starting point.
For the DBMS, I see tables as collections of entites (except those that only ever contain one row e.g. cofig table, table of constants, etc) e.g. the table where my employee_ID is the key would be named Personnel. So straight away the TableNameID convention doesn't work for me.
I've seen the TableName.ID=PK TableNameID=FK style used on large data models and have to say I find it slightly confusing: I much prefer an identifier's name be the same throughout i.e. does not change name based on which table it happens to appear in. Something to note is the aforementioned style seems to be used in the shops which add an IDENTITY (auto-increment) column to every table while shunning natural and compound keys in foreign keys. Those shops tend not to have formal data dictionaries nor build from data models. Again, this is merely a question of style and one to which I don't personally subscribe. So ultimately, it's not for me.
All that said, I can see a case for sometimes dropping the qualifier from the column name when the table's name provides a context for doing so e.g. the element named employee_last_name may become simply last_name in the Personnel table. The rationale here is that the domain is 'people's last names' and is more likely to be UNIONed with last_name columns from other tables rather than be used as a foreign key in another table, but then again... I might just change my mind, sometimes you can never tell. That's the thing: data modelling is part art, part science.
My vote is for InvoiceID for the table ID. I also use the same naming convention when it's used as a foreign key and use intelligent alias names in the queries.
Select Invoice.InvoiceID, Lines.InvoiceLine, Customer.OrgName
From Invoices Invoice
Join InvoiceLines Lines on Lines.InvoiceID = Invoice.InvoiceID
Join Customers Customer on Customer.CustomerID = Invoice.CustomerID
Sure, it's longer than some other examples. But smile. This is for posterity and someday, some poor junior coder is going to have to alter your masterpiece. In this example there is no ambiguity and as additional tables get added to the query, you'll be grateful for the verbosity.
FWIW, our new standard (which changes, uh, I mean "evolves", with every new project) is:
Lower case database field names
Uppercase table names
Use underscores to separate words in the field name - convert these to Pascal case in code.
pk_ prefix means primary key
_id suffix means an integer, auto-increment ID
fk_ prefix means foreign key (no suffix necessary)
_VW suffix for views
is_ prefix for booleans
So, a table named NAMES might have the fields pk_name_id, first_name, last_name, is_alive, and fk_company and a view called LIVING_CUSTOMERS_VW, defined like:
SELECT first_name, last_name
FROM CONTACT.NAMES
WHERE (is_alive = 'True')
As others have said, though, just about any scheme will work as long as it is consistent and doesn't unnecessarily obfuscate your meanings.
There are lots of answers on this already, but I wanted to add two major things that I haven't seen above:
Customers coming to you for support.
Many times a customer or user or even dev of another department have hit a snag and have contacted us saying they're having a problem doing an operation. We ask them what record they're having a problem with. Now, the data they see on the screen, e.g. a grid with customer name, number of orders, destination etc is an aggregate of many tables. They say they've having trouble with id 83. There's no way to know what id that is, which table it is, if it's just called 'id'.
Namely, a row of data does not give any indication which table it is from. Unless you happen to know the schema of your database well, which is rarely the case on complex systems or non-greenfield systems you've been told to take over, you don't know what id=83 means even if you have more data like name, address, etc (which might not even be in the same table!).
This id could be coming from a grid, or it could be coming from an error in your API, or a faulty query dumping the error message to the screen, or to a log file.
Often a developer just dumps 'ID' into a column and forgets about it, and often DBs have many similar tables like Invoice, InvoiceGrouping, InvoicePlan and the ID could be for any of them. In frustration you look in the code to see which one it is, and see that they've called it Id on the model as well, so you then have to dig into how the model for the page was constructed. I cannot count how many times I've had to do this to figure out what an Id is. It's a lot. Sometimes you have to dig out a SPROC as well that just returns 'Id' as a header. Nightmare.
Log files are easier when it's clear what went wrong
Often SQL can give pretty crappy error messages. "Could not insert item with ID 83, column would be truncated" or something like that is very hard to debug. Often error messages are not very helpful, but usually the thing that broke will make a vague attempt to tell you what record was broken by just dumping out the primary key name and the value. If it's "ID" then it doesn't really help at all.
This is just two things that I didn't feel were mentioned in the other answers.
I also think that a lot of comments are 'if you program in X way then this isn't an issue', and I think the points above (and other points on this question) are valid specifically because of the way people program and because they don't have the time, energy, budget and foresight to program in perfect logging and error handling or change engrained habits of quick SQL and code writing.
I definitely agree with including the table name in the ID field name, for exactly the reasons you give. Generally, this is the only field where I would include the table name.
I do hate the plain id name. I strongly prefer to always use the invoice_id or a variant thereof. I always know which table is the authoritative table for the id when I need to, but this confuses me
SELECT * from Invoice inv, InvoiceLine inv_l where
inv_l.InvoiceID = inv.ID
SELECT * from Invoice inv, InvoiceLine inv_l where
inv_l.ID = inv.InvoiceLineID
SELECT * from Invoice inv, InvoiceLine inv_l where
inv_l.ID = inv.InvoiceID
SELECT * from Invoice inv, InvoiceLine inv_l where
inv_l.InvoiceLineID = inv.ID
What's worst of all is the mix you mention, totally confusing. I've had to work with a database where almost always it was foo_id except in one of the most used ids. That was total hell.
I think you can use anything for the "ID" as long as you're consistent. Including the table name is important to. I would suggest using a modeling tool like Erwin to enforce the naming conventions and standards so when writing queries it's easy to understand the relationships that may exist between tables.
What I mean by the first statement is, instead of ID you can use something else like 'recno'. So then this table would have a PK of invoice_recno and so on.
Cheers,
Ben
For the column name in the database, I'd use "InvoiceID".
If I copy the fields into a unnamed struct via LINQ, I may name it "ID" there, if it's the only ID in the structure.
If the column is NOT going to be used in a foreign key, so that it's only used to uniquely identify a row for edit editing or deletion, I'll name it "PK".
If you give each key a unique name, e.g. "invoices.invoice_id" instead of "invoices.id", then you can use the "natural join" and "using" operators with no worries. E.g.
SELECT * FROM invoices NATURAL JOIN invoice_lines
SELECT * FROM invoices JOIN invoice_lines USING (invoice_id)
instead of
SELECT * from invoices JOIN invoice_lines
ON invoices.id = invoice_lines.invoice_id
SQL is verbose enough without making it more verbose.
What I do to keep things consistent for myself (where a table has a single column primary key used as the ID) is to name the primary key of the table Table_pk. Anywhere I have a foreign key pointing to that tables primary key, I call the column PrimaryKeyTable_fk. That way I know that if I have a Customer_pk in my Customer table and a Customer_fk in my Order table, I know that the Order table is referring to an entry in the Customer table.
To me, this makes sense especially for joins where I think it reads easier.
SELECT *
FROM Customer AS c
INNER JOIN Order AS c ON c.Customer_pk = o.Customer_fk
I prefer DomainName || 'ID'. (i.e. DomainName + ID)
DomainName is often, but not always, the same as TableName.
The problem with ID all by itself is that it doesn't scale upwards. Once you have about 200 tables, each with a first column named ID, the data begins to look all alike. If you always qualify ID with the table name, that helps a little, but not that much.
DomainName & ID can be used to name foreign keys as well as primary keys. When foriegn keys are named after the column that they reference, that can be of mnemonic assistance. Formally, tying the name of a foreign key to the key it references is not necessary, since the referential integrity constrain will establish the reference. But it's awfully handy when it comes to reading queries and updates.
Occasionally, DomainName || 'ID' can't be used, because there would be two columns in the same table with the same name. Example: Employees.EmployeeID and Employees.SupervisorID. In those cases, I use RoleName || 'ID', as in the example.
Last but not least, I use natural keys rather than synthetic keys when possible. There are situations where natural keys are unavailable or untrustworthy, but there are plenty of situations where the natural key is the right choice. In those cases, I let the natural key take on the name it would naturally have. This name often doesn't even have the letters, 'ID' in it. Example: OrderNo where No is an abbreviation for "Number".
For each table I choose a tree letter shorthand(e.g. Employees => Emp)
That way a numeric autonumber primary key becomes nkEmp.
It is short, unique in the entire database and I know exactly its properties at a glance.
I keep the same names in SQL and all languages I use (mostly C#, Javascript, VB6).
See the Interakt site's naming conventions for a well thought out system of naming tables and columns. The method makes use of a suffix for each table (_prd for a product table, or _ctg for a category table) and appends that to each column in a given table. So the identity column for the products table would be id_prd and is therefore unique in the database.
They go one step further to help with understanding the foreign keys: The foreign key in the product table that refers to the category table would be idctg_prd so that it is obvious to which table it belong (_prd suffix) and to which table it refers (category).
Advantages are that there is no ambiguity with the identity columns in different tables, and that you can tell at a glance which columns a query is referring to by the column names.
You could use the following naming convention. It has its flaws but it solves your particular problems.
Use short (3-4 characters) nicknames for the table names, i.e. Invoice - inv, InvoiceLines - invl
Name the columns in the table using those nicknames, i.e. inv_id, invl_id
For the reference columns use invl_inv_id for the names.
this way you could say
SELECT * FROM Invoice LEFT JOIN InvoiceLines ON inv_id = invl_inv_id