Using auto assigned primary key or setting it on INSERT? - sql

I just answered this question: Can I get the ID of the data that I just inserted?. I explained that in order to know the ID of the last record inserted in a table, what I would do is inserting it manually, instead of using some sequence or serial field.
What I like to do is to run a Max(id) query before INSERT, add 1 to that result, and use that number as ID for the record I'm about to insert.
Now, what I would like to ask: is this a good idea? Can it give some trouble? What are the reasons to use automatically set field on IDs fields?
Note: this is not exactly a question, but looking help center it seems like a good question to ask. If you find it to be off-topic, please tell me and I'll remove it.

This is a bad idea and it will fail in a multi threaded (or multi users) environment.
Please note that the surrogate-key vs natural-key debate is still far from having a concrete definitive solution - but putting that aside for a minute - even if you do go with a surrogate key - you should never try to manually auto-increment columns. Let the database do that for you and avoid all kinds of problems that can occur if you try to do that manually - such as primary key constraint violations in the best case, or duplicate values in the worst case.

If an Entity uses an ID as the Primary-Key, it is in general a good idea to let the DB autocreate it, so you don't need to determine an unused one while creating this Entity in your code. Furthermore a DateAccessObject(DAO) does not need to operate on the ID.
Dependant on what DB u might use, you might even not be allowed to retrieve all IDs of that Table..
I guess there might be other good reasons to let the DB manage this part.

Related

SQL - Must there always be a primary key?

There are a couple of similar questions already out there and the consensus seemed to be that a primary key should always be created.
But what if you have a single row table for storing settings (and let's not turn this into a discussion about why it might be good/bad to create a single row table please)?
Surely having a primary key on a single row table becomes completely useless?
It may seem completely useless, but it's also completely harmless, and I'd vote for harmless with good design principles vs. useless with no design principles every time.
Other people have commented, rightly, that you don't know how you're going to use the table in a year or five years... what if someone comes along and decides they want to duplicate the configuration -- move it to a distributed environment or add a test environment by using a duplicate configuration string or whatever. Having a field that acts like a primary key means that whenever you query the table, if you use the key, you'll be certain no matter what anyone else may do to your table, that you're getting the correct record.
You're right there are a million other aspects -- surrogate keys vs. intelligent keys, indexing, partitioning (silly on a single row table, I know), whatever... but without getting into that I'd vote add the key rather than not add it. You could have done it by the time you read this thread.
Short answer, no key, duplicate records possible. Your planning a single row now, but what about six months in the future when you single row multiplies. Put a primary key on the table, even for single row.
You could always base your primary key on the name of the setting. Then your table would become a key-value store.
But no, in many RDBMS you are not REQUIRED to have a primary key per table.
Having declared a primary key on a single row table in SQL will ensure that there will be no duplicates. Whether it is useless depends on your requirements. Usually it is a good idea to avoid duplicates.

General SQL question about Primary Keys

I know this is pretty elementary but here it goes.
I would like to know how you know what columns are a primary key in a table that does not have a primary key? Is there a technique or something that I should read?
Thank you in advance
You need to take a look at your data structures.
A primary key must:
never be NULL (no exceptions)
reliably and uniquely identify each single row
and it helps if it's
small and easy to use
stable (doesn't change at all, or at least not often)
a single column (or at most two)
Check your data - which columns or set of columns can fulfill these requirements??
Once you have those potential primary keys (the "candidate keys") - think about how you will access the data, and what other data might need to be associated with this one entity in question - what would make sense as a foreign key? Do you want to reference your department by its name? Probably not a good idea, since the name could be misspelled, it might change over time etc. By the department's office location? Bad choice, too. But something like a unique "department ID" might be a good idea.
If you don't find any appropriate column(s) in your actual data that could serve as primary key and would make sense, it's a common practice to introduce a "surrogate key" - an extra column, often an INT (and often something like an "auto-increment" INT) that will serve as an artificial identifier for each row. If you do this, one common best practice is to never show that artificial key on any data screen - it has no meaning whatsoever to the users of your system - so don't even show it to them.
Checking these requirements, and a lot of experience, will help you find the right primary key.
It really depends on the data itself. You need to determine what fields can be used to identify the record uniquely.
In SQL server it'll have a key next to it. It's typically ID or something with ID in it. It's also unique and typically increments. When you look at it in SQL. Server management studio under table design you'll see it towards the top of the list of columns with the Lil key icon.
It's a unique identifier that deciphers each record from one another. Kind of like how each person has a ssn.

MySQL Columns default value is the default value from primary key of another table

(Table)File has many (Table)Words
FK Words.file_id related to a single File.id
Default value of Words.frame is equal to File.frame for that PK/FK
Does this type of default relationship have a name? Examples on getting this setup? (MySQL)
Edit
The reason for this is that words may have the same frame as the file and if they do, we want to use that default, however some may not and need to be set manually. Is this really bad practice to handle it this way as described in one of the answers? Any improvement suggestions?
You may want to use a Trigger. You should be able to mimick the "default value" of Words.frame to be based on the value of another field from the File table.
It doesn't have a name, but feels like denormalization / data duplication to me.
#Daniel Vassallo suggests an insert trigger for this, and I think that would be the best approach as well if this is really what you need.

Database Design: Storing the primary key as a separate field in the same table

I have a table that must reference another record, but of the same table. Here's an example:
Customer
********
ID
ManagerID (the ID of another customer)
...
I have a bad feeling about doing this. My other idea was to just have a separate table that just stored the relationship.
CustomerRelationship
***************
ID
CustomerID
ManagerID
I feel I may be over complicating such a trivial thing however, I would like to get some idea's on the best approach for this particular scenario?
Thanks.
There's nothing wrong about the first design. The second one, where you have an 'intermediate' table, is used for many-to-many relationships, which i don't think is yours.
BTW, that intermediate table wouldn't have and ID of its own.
Why do you have a "bad feeling" about this? It's perfectly acceptable for a table to reference its own primary key. Introducing a secondary table only increases the complexity of your queries and negatively impacts performance.
Can a Customer have multiple managers? If so, then you need a separate table.
Otherwise, a single table is fine.
You can use the first approach. See also Using Self-Joins
There's absolutely nothing wrong with the first approach, in fact Oracle has included the 'CONNECT BY' extension to SQL since at least version 6 which is intended to directly support this type of hierarchical structure (and possibly makes Oracle worth considering as your database if you are going to be doing a lot of this).
You'll need self-joins in databases which don't have something analogous, but that's also a perfectly fine and standard solution.
As a programmer I like the first approach. I like to have less number of tables. Here we are not even talking of normalization and why do we need more tables? That is just me.
Follow the KISS principle here: Keep it simple, (silly | stupid | stud | [whatever epithet starting with S you prefer]). Go with one table, unless you have a reason to need more.
Note that if the one-to-many/many-to-many relationship ends up being the case, you can extract the existing column into a table of its own, and fill in the new entries at that time.
The only reason I would ever recommend avoiding such self-referecing tables is that SQL Server does have a few spots where there are limitations with self-referencing tables.
For one, if you ever happen to come across the need for an indexed view, then you'd find out that if one of the tables used in a view definition is indeed self-referencing, you won't be able to create a clustered index on your view :-(
But apart from that - the design per se is sound and absolutely valid - go for it! I always like to keep things as simple as possible (but no simpler than that).
Marc

Database-wide unique-yet-simple identifiers in SQL Server

First, I'm aware of this question, and the suggestion (using GUID) doesn't apply in my situation.
I want simple UIDs so that my users can easily communicate this information over the phone :
Hello, I've got a problem with order
1584
as opposed to
hello, I've got a problem with order
4daz33-d4gerz384867-8234878-14
I want those to be unique (database wide) because I have a few different kind of 'objects' ... there are order IDs, and delivery IDs, and billing-IDs and since there's no one-to-one relationship between those, I have no way to guess what kind of object an ID is referring to.
With database-wide unique IDs, I can immediately tell what object my customer is referring to. My user can just input an ID in a search tool, and I save him the extra-click to further refine what is looking for.
My current idea is to use identity columns with different seeds 1, 2, 3, etc, and an increment value of 100.
This raises a few question though :
What if I eventually get more than 100 object types? granted I could use 1000 or 10000, but something that doesn't scale well "smells"
Is there a possibility the seed is "lost" (during a replication, a database problem, etc?)
more generally, are there other issues I should be aware of?
is it possible to use an non integer (I currently use bigints) as an identity columns, so that I can prefix the ID with something representing the object type? (for example a varchar column)
would it be a good idea to user a "master table" containing only an identity column, and maybe the object type, so that I can just insert a row in it whenever a need a new idea. I feel like it might be a bit overkill, and I'm afraid it would complexify all my insertion requests. Plus the fact that I won't be able to determine an object type without looking at the database
are there other clever ways to address my problem?
Why not use identities on all the tables, but any time you present it to the user, simply tack on a single char for the type? e.g. O1234 is an order, D123213 is a delivery, etc.? That way you don't have to engineer some crazy scheme...
Handle it at the user interface--add a prefix letter (or letters) onto the ID number when reporting it to the users. So o472 would be an order, b531 would be a bill, and so on. People are quite comfortable mixing letters and digits when giving "numbers" over the phone, and are more accurate than with straight digits.
You could use an autoincrement column to generate the unique id. Then have a computed column which takes the value of this column and prepends it with a fixed identifier that reflects the entity type, for example OR1542 and DL1542, would represent order #1542 and delivery #1542, respectively. Your prefix could be extended as much as you want and the format could be arranged to help distiguish between items with the same autoincrement value, say OR011542 and DL021542, with the prefixes being OR01 and DL02.
I would implement by defining a generic root table. For lack of a better name call it Entity. The Entity table should have at a minimum a single Identity column on it. You could also include other fields that are common accross all your objects or even meta data that tells you this row is an order for example.
Each of your actual Order, Delivery...tables will have a FK reference back to the Entity table. This will give you a single unique ID column
Using the seeds in my opinion is a bad idea, and one that could lead to problems.
Edit
Some of the problems you mentioned already. I also see this being a pain to track and ensure you setup all new entities correctly. Imagine a developer updating the system two years from now.
After I wrote this answer I had thought a but more about why your doing this, and I came to the same conclusion that Matt did.
MS's intentional programing project had a GUID-to-word system that gave pronounceable names from random ID's
Why not a simple Base36 representation of a bigint? http://en.wikipedia.org/wiki/Base_36
We faced a similar problem on a project. We solved it by first creating a simple table that only has one row: a BIGINT set as auto-increment identity.
And we created an sproc that inserts a new row in that table, using default values and inside a transaction. It then stores the SCOPE_IDENTITY in a variable, rolls back the transaction and then returns the stored SCOPE_IDENTITY.
This gives us a unique ID inside the database without filling up a table.
If you want to know what kind of object the ID is referring to, I'd lose the transaction rollback and also store the type of object along side the ID. That way findout out what kind of object the Id is referring to is only one select (or inner join) away.
I use a high/low algorithm for this. I can't find a description for this online though. Must blog about it.
In my database, I have an ID table with an counter field. This is the high part. In my application, I have a counter that goes from 0 to 99. This is the low part. The generated key is 100 * high + low.
To get a key, I do the following
initially high = -1
initially low = 0
method GetNewKey()
begin
if high = -1 then
high = GetNewHighFromDatabase
newkey = 100 * high + low.
Inc low
If low = 100 then
low = 0
high = -1
return newKey
end
The real code is more complicated with locks etc but that is the general gist.
There are a number of ways of getting the high value from the database including auto inc keys, generators etc. The best way depends on the db you are using.
This algorithm gives simple keys while avoiding most the db hit of looking up a new key every time. In testing, I found it had similar performance to guids and vastly better performance than retrieving an auto inc key every time.
You could create a master UniqueObject table with your identity and a subtype field. Subtables (Orders, Users, etc.) would have a FK to UniqueObject. INSTEAD OF INSERT triggers should keep the pain to a minimum.
Maybe an itemType-year-week-orderNumberThisWeek variant?
o2009-22-93402
Such identifier can consist of several database column values and simply formatted into a form of an identifier by the software.
I had a similar situation with a project.
My solution: By default, users only see the first 7 characters of the GUID.
It's sufficiently random that collisions are extremely unlikely (1 in 268 million), and it's efficient for speaking and typing.
Internally, of course, I'm using the entire GUID.