SQL Server, does the id change if an element gets deleted? - sql

I wondered if I insert, let's say, 10 entries into a SQL Server table.
If i then delete one of them, will the id/index change correspondingly?
Example:
1 | Simon Cowell | 56 years
2 | Frank Lampard| 24 years
3 | Harry Bennet | 12 years
If I delete #2, will Harry Bennet's index change to 2?
Thanks :)
EDIT:
Sorry for my outrage, had a bad day. And yes, I should have researched it myself, I deserve to be downvoted.
I don't ask for anything, I just want to say that I'm sorry :|

Since you seem to be conflating "id/index" let's talk a little but about the primary key and indexes in the context of a relational database.
The "id" or primary key assigned to a row in a SQL database is the unique identifier for that row. It can consist of one or more columns. (When more than one column is involved it is known as a "composite" or "multi-part" key.) The primary key should really do nothing more than be a unique handle for addressing a row: the primary key should not contain any information about the entity represented by the row, especially if that info has the potential to be mutable; an example would be a part number that has a suffix that stands for the type of metal the part is made from; if that metal can possibly change from titanium to unobtainium, say, that part number would make a bad choice as a primary key; it would be better to have another column to store the type of metal than to make the metal-type suffix part of the primary key. "Meaningful" primary keys might have made some sense in legacy non-relational databases but in a relational database they are to be avoided.
When seeking to enforce the uniqueness of a primary key, a database engine can make use of an index so it can rapidly test whether the key value exists. It could use a binary algorithm to find the value, avoiding the need to scan the actual data "brute force", row by row, looking for the value. But the index that is used behind the scenes by the engine to assist it with the primary key housekeeping is not the same as the primary key itself.
If you have a simple sequential integer as your primary key, there's an infinite number of them, so there is no need to reuse an integer when it becomes available when the row to which it was assigned has been deleted. So the relational database engine won't automatically attempt to reuse it, and it won't by any means change the primary key values that have been assigned to all other rows in the table when "gaps" in the number sequence are created by a deletion. Many other rows in other tables could be referencing those values and having them be mutable would create either chaos or a huge inefficiency.
Hashing algorithms are another very efficient way a database engine can quickly test for the existence of a key value. It computes the location in the hashed-file where the key would be if it did exist, and then looks there for it. The rows are stored in no particular order, so such schemes are optimized for instant finding of records in a large table, not for culling records that have a common characteristic, such as all customers in zipcode 10023.

No. You can set up triggers or logic to do it if you want; however, it will not automatically do this.

No it will not change automatically

No, it wont. And hopefully, that's the answer you're hoping for. For any auto-generated identifiers (such as IDENTITY columns), you should, so far as possible, ignore the data type and treat it as an opaque "blob" of identity information.
It gets assigned during insert, and you can use it for cross-referencing purposes, but the fact that it's numeric is not something you should use or rely upon. It's just a stable identifier for the row.

Related

Adding an artificial primary key versus using a unique field [duplicate]

This question already has answers here:
Surrogate vs. natural/business keys [closed]
(19 answers)
Why would one consider using Surrogate keys vs Natural with ON UPDATE CASCADE?
(1 answer)
Closed 7 months ago.
Recently I Inherited a huge app from somebody who left the company.
This app used a SQL server DB .
Now the developer always defines an int base primary key on tables. for example even if Users table has a unique UserName field , he always added an integer identity primary key.
This is done for every table no matter if other fields could be unique and define primary key.
Do you see any benefits whatsoever on this? using UserName as primary key vs adding UserID(identify column) and set that as primary key?
I feel like I have to add add another element to my comments, which started to produce an essay of comments, so I think it is better that I post it all as an answer instead.
Sometimes there are domain specific reasons why a candidate key is not a good candidate for joins (maybe people change user names so often that the required cascades start causing performance problems). But another reason to add an ever-increasing surrogate is to make it the clustered index. A static and ever-increasing clustered index alleviates a high-cost IO operation known as a page split. So even with a good natural candidate key, it can be useful to add a surrogate and cluster on that. Read this for further details.
But if you add such a surrogate, recognise that the surrogate is purely internal, it is there for performance reasons only. It does not guarantee the integrity of your data. It has no meaning in the model, unless it becomes part of the model. For example, if you are generating invoice numbers as an identity column, and sending those values out into the real world (on invoice documents/emails/etc), then it's not a surrogate, it's part of the model. It can be meaningfully referenced by the customer who received the invoice, for example.
One final thing that is typically left out of this discussion is one particular aspect of join performance. It is often said that the primary key should also be narrow, because it can make joins more performant, as well as reducing the size of non-clustered indexes. And that's true.
But a natural primary key can eliminate the need for a join in the first place.
Let's put all this together with an example:
create table Countries
(
countryCode char(2) not null primary key clustered,
countryName varchar(64) not null
);
insert Countries values
('AU', 'Australia'),
('FR', 'France');
create table TourLocations
(
tourLocationName varchar(64) not null,
tourLocationId int identity(1,1) unique clustered,
countryCode char(2) not null foreign key references Countries(countryCode),
primary key (countryCode, tourLocationName)
);
insert TourLocations (TourLocationName, countryCode) values
('Bondi Beach', 'AU'),
('Eiffel Tower', 'FR')
I did not add a surrogate key to Countries, because there aren't many rows and we're not going to be constantly inserting new rows. I already know what all the countries are, and they don't change very often.
On the TourLocations table I have added an identity and clustered on it. There could be very many tour locations, changing all the time.
But I still must have a natural key on TourLocations. Otherwise I could insert the same tour location name with the same country twice. Sure, the Id's will be different. But the Id's don't mean anything. As far as any real human is concerned, two tour locations with the same name and country code are completely indistinguishable. Do you intend to have actual users using the system? Then you've got a problem.
By putting the same country and location name in twice I haven't created two facts in my database. I have created the same fact twice! No good. The natural key is necessary. In this sense The Impaler's answer is strictly, necessarily, wrong. You cannot not have a natural key. If the natural key can't be defined as anything other than "every meaningful column in the table" (that is to say, excluding the surrogate), so be it.
OK, now let's investigate the claim that an int identity key is advantageous because it helps with joins. Well, in this case my char(2) country code is narrower than an int would have been.
But even if it wasn't (maybe we think we can get away with a tinyint), those country codes are meaningful to real people, which means a lot of the time I don't have to do the join at all.
Suppose I gave the results of this query to my users:
select countryCode, tourLocationName
from TourLocations
order by 1, 2;
Very many people will not need me to provide the countries.countryName column for them to know which country is represented by the code in each of those rows. I don't have to do the join.
When you're dealing with a specific business domain that becomes even more likely. Meaningful codes are understood by the domain users. They often don't need to see the long description columns from the key table. So in many cases no join is required to give the users all of the information they need.
If I had foreign keyed to an identity surrogate I would have to do the join, because the identity surrogate doesn't mean anything to anyone.
You are talking about the difference between synthetic and natural keys.
In my [very] personal opinion, I would recommend to always use synthetic keys (and always call it id). The main problem is that natural keys are never unique; they are unique in theory, yes, but in the real world there are a myriad of unexpected and inexorable events that will make this false.
In database design:
Natural keys correspond to values present in the domain model. For example, UserName, SSN, VIN can be considered natural keys.
Synthetic keys are values not present in the domain model. They are just numeric/string/UUID values that have no relationship with the actual data. They only serve as a unique identifiers for the rows.
I would say, stick to synthetic keys and sleep well at night. You never know what the Marketing Department will come up with on Monday, and suddenly "the username is not unique anymore".
Yes having a dedicated int is a good thing for PK use.
you may have multiple alternate keys, that's ok too.
two great reasons for it:
it is performant
it protects against key mutation ( editing a name etc. )
A username or any such unique field that holds meaningful data is subject to changes. A name may have been misspelled or you might want to edit a name to choose a better one, etc. etc.
Primary keys are used to identify records and, in conjunction with foreign keys, to connect records in different tables. They should never change. Therefore, it is better to use a meaningless int field as primary key.
By meaningless I mean that apart from being the primary key it has no meaning to the users.
An int identity column has other advantages over a text field as primary key.
It is generated by the database engine and is guaranteed to be unique in multi-user scenarios.
it is faster than a text column.
Text can have leading spaces, hidden characters and other oddities.
There are multiple kinds of text data types, multiple character sets and culture dependent behaviors resulting in text comparisons not always working as expected.
int primary keys generated in ascending order have a superior performance in conjunction with clustered primary keys (which is a SQL-Server specialty).
Note that I am talking from a database point of view. In the user interface, users will prefer identifying entries by name or e-mail address, etc.
But commands like SELECT, INSERT, UPDATE or DELETE will always identify records by the primary key.
This subject - quite much like gulivar travels and wars being fought over which end of the egg you supposed to crack open to eat.
However, using the SAME "id" name for all tables, and autonumber? Yes, it is LONG establihsed choice.
There are of course MANY different views on this subject, and many advantages and disavantages.
Regardless of which choice one perfers (or even needs), this is a long established concept in our industry. In fact SharePoint tables use "ID" and autonumber by defualt. So does ms-access, and there probably more that do this.
The simple concpet?
You can build your tables with the PK and child tables with forighen keys.
At that point you setup your relationships between the tables.
Now, you might decide to add say some invoice number or whatever. Rules might mean that such invoice number is not duplicated.
But, WHY do we care of you have some "user" name, or some "invoice" number or whatever. Why should that fact effect your relational database model?
You mean I don't have a user name, or don't have a invoice number, and the whole database and relatonships don't work anymore? We don't care!!!!
The concept of data, even required fields, or even a column having to be unique ?
That has ZERO to do with a working relational data model.
And maybe you decide that invoice number is not generated until say sent to the customer. So, the fact of some user name, invoice number or whatever? Don't care - you can have all kinds of business rules for those numbers, but they have ZERO do to do with the fact that you designed a working relational data model based on so called "surrogate" or sometime called synthetic keys.
So, once you build that data model - even with JUST the PK "id" and FK (forighen keys), you are NOW free to start adding columns and define what type of data you going to put in each table. but, what you shove into each table has ZERO to do with that working related data model. They are to be thought as seperate concpets.
So, if you have a user name - add that column to the table. If you don't want users name, remove the column. As such data you store in the table has ZERO to do with the automatic PK ID you using - it not really any different then say what area of memory the computer going to allocate to load that data. Basic data operations of the system is has nothing to do with having build database with relationships that simple exist. And the data columns you add after having built those relationships is up to you - but will not, and should not effect the operation of the database and relationships you built and setup. Not only are these two concepts separate, but they free the developer from having to worry about the part that maintains the relationships as opposed to data column you add to such tables to store user data.
I mean, in json data, xml? We often have a master + child table relationship. We don't care how that relationship is maintained - but only that it exists.
Thus yes, all tables have that pk "ID". Even better? in code, you NEVER have to guess what the PK id is - it always the same!!!
So, data and columns you put and toss into a table? Those columns and data have zero to do with the PK id, and while it is the database generating that PK? It could be a web service call to some monkeys living in a far away jungle eating banana's and they give you a PK value based on how many bananas they eaten. We just really don't' care about that number - it is just internal house keeping numbers - one that we don't see or even care about in most code. And thus the number one rule to such auto matic PK values?
You NEVER give that auto PK number any meaning from a user and applcation point of view.
In summary:
Yes, using a PK called "id" for all tables? Common, and in fact in SharePoint and many systems, it not only the default, but is in fact required for such systems to operate.
Its better to use userid. User table is referenced by many other tables.
The referenced table would contain the primary key of the user table as foreign key.
Its better to use userid since its integer value,
it takes less space than string values of username and
the searches by the database engine would be faster
user(userid, username, name)
comments(commentid, comment, userid) would be better than
comments(commentid, comment, username)

What could be the purpose of primary keys on all tables being derived from a single table?

I've really been scratching my head over this and don't know how to ask the question well enough to find an answer on Google or StackOverflow etc.
There is a very old system used at work - I don't have access to the server side so can't view its tables, but I do know its an SQL database and have done enough experimenting with the API to see what adding to each table does, and I'm questioning how it allocates primary keys;
It has a lot of tables, each with a primary key as expected, but the primary key on any/all of its tables seems to be allocated so that there is absolutely no duplication of primary keys anywhere in the system.
e.g.
add row to table 1 get pk = 1
add row to table 2 get pk = 2
add row to table 1 again, get pk = 3
add row to table 10 and get pk = 4
Is this method some sort of old database technique?
What could be the purpose of doing this?
There are more funny nuances that I won't get into detail of, e.g. a certain range of pk's being allocated for certain tables but I just wanted to see if anyone recognises the main principle here and if there's a point to it, or if it's just bad / weird design
A primary key only needs to be unique within a single table. There is no such thing as a primary key across multiple tables.
This might be useful under some circumstances. For instance, this would allow all entities to be represented in a single table. This can be handy for "generic" information, such as adding comments to the entities.
More prosaically, I have seen this in older Oracle databases. Oracle did not have any automated mechanism for generating ids, so this required using a sequence. As a matter of convenience, laziness, or design, multiple tables might use the same sequence -- resulting in the behavior that you see.

SQL - Must there always be a primary key?

There are a couple of similar questions already out there and the consensus seemed to be that a primary key should always be created.
But what if you have a single row table for storing settings (and let's not turn this into a discussion about why it might be good/bad to create a single row table please)?
Surely having a primary key on a single row table becomes completely useless?
It may seem completely useless, but it's also completely harmless, and I'd vote for harmless with good design principles vs. useless with no design principles every time.
Other people have commented, rightly, that you don't know how you're going to use the table in a year or five years... what if someone comes along and decides they want to duplicate the configuration -- move it to a distributed environment or add a test environment by using a duplicate configuration string or whatever. Having a field that acts like a primary key means that whenever you query the table, if you use the key, you'll be certain no matter what anyone else may do to your table, that you're getting the correct record.
You're right there are a million other aspects -- surrogate keys vs. intelligent keys, indexing, partitioning (silly on a single row table, I know), whatever... but without getting into that I'd vote add the key rather than not add it. You could have done it by the time you read this thread.
Short answer, no key, duplicate records possible. Your planning a single row now, but what about six months in the future when you single row multiplies. Put a primary key on the table, even for single row.
You could always base your primary key on the name of the setting. Then your table would become a key-value store.
But no, in many RDBMS you are not REQUIRED to have a primary key per table.
Having declared a primary key on a single row table in SQL will ensure that there will be no duplicates. Whether it is useless depends on your requirements. Usually it is a good idea to avoid duplicates.

Using string as PK vs using GUID or int Id with Unique Constraint for Names

Hi I was wondering what is the best practice for tables in which you have a record that must be unique. I've seen the two ways of doing that: use a Primary Key or add a Unique constraint to the column.
If you use a primary key, is it bad practice to have a primary key such as "UserName" that is varchar(*)? Does that impact performance enough that it is problematic? Is it best to use an integer id with a unique constraint on the username?
I see some other factors that may impact choosing a column as PK vs Unique. Am I right about these?
PK
- Column should be one that doesn't ever need to be changed
Unique
- Column could be changed later on
Having a primary key on the UserName is not the best idea, but it isn't so bad in performance as you maybe think.
The best idea would be using a ID (INT) as PRIMARY KEY and the UserName as UNIQUE.
Usernames change over time, that is why they are a bad candidate for a PK especally since it is extremely likely you have child records associated with the username. For instance suppose my username included some variation of my real name. If I then got divorced and returned to my maiden name, the last thing I want to do is be reminded of that SOB I was married to and so I change my username. Do you really want to change the 2 million posts I've made in the last ten years as well? I didn't think so.
Yes string comparisons are slower but this may or may not be an issue depending on the overall amount of action the database will get. Small copmany database with less than 200 users, probaly not a problem, Internet site with millions of users, much more likely to be a problem.
It may or may not be a good idea as others have already discussed. Let me just add one more detail...
I see some other factors that may impact choosing a column as PK vs Unique.
The main difference is usually related to clustering. Most DBMSes (that support clustering) automatically use PK as a clustering index. For example MySQL/InnoDB always clusters data and you can't event turn it off, while MS SQL Server clusters by default (you have to use special syntax to turn it off).
Should you choose to use clustering (or are forced by your DBMS), having fewer indexes is usually better (e.g. see "Disadvantages of clustering" in this article), even when leading to "fatter" foreign keys.

Is ID column required in SQL?

Traditionally I have always used an ID column in SQL (mostly mysql and postgresql).
However I am wondering if it is really necessary if the rest of the columns in each row make in unique. In my latest project I have the "ID" column set as my primary key, however I never call it or use it in any way, as the data in the row makes it unique and is much more useful for me.
So, if every row in a SQL table is unique, does it need a primary key ID table, and are there ant performance changes with or without one?
Thanks!
EDIT/Additional info:
The specific example that made me ask this question is a table I am using for a many-to-many-to-many-to-many table (if we still call it that at that point) it has 4 columns (plus ID) each of which represents an ID of an external table, and each row will always be numeric and unique. only one of the columns is allowed to be null.
I understand that for normal tables an ID primary key column is a VERY good thing to have. But I get the feeling on this particular table it just wastes space and slows down adding new rows.
If you really do have some pre-existing column in your data set that already does uniquely identify your row - then no, there's no need for an extra ID column. The primary key however must be unique (in ALL circumstances) and cannot be empty (must be NOT NULL).
In my 20+ years of experience in database design, however, this is almost never truly the case. Most "natural" ID's that appear to be unique aren't - ultimately. US Social Security Numbers aren't guaranteed to be unique, and most other "natural" keys end up being almost unique - and that's just not good enough for a database system.
So if you really do have a proper, unique key in your data already - use it! But most of the time, it's easier and more convenient to have just a single surrogate ID that you can guarantee will be unique over all rows.
Don't confuse the logical model with the implementation.
The logical model shows a candidate key (all columns) which could makes your primary key.
Great. However...
In practice, having a multi column primary key has downsides: it's wide, not good when clustered etc. There is plenty of information out there and in the "related" questions list on the right
So, you'd typically
add a surrogate key (ID column)
add a unique constraint to keep the other columns unique
the ID column will be the clustered key (can be only one per table)
You can make either key the primary key now
The main exception is link or many-to-many tables that link 2 ID columns: a surrogate isn't needed (unless you have a braindead ORM)
Edit, a link: "What should I choose for my primary key?"
Edit2
For many-many tables: SQL: Do you need an auto-incremental primary key for Many-Many tables?
Yes, you could have many attributes (values) in a record (row) that you could use to make a record unique. This would be called a composite primary key.
However it will be much slower in general because the construction of the primary index will be much more expensive. The primary index is used by relational database management systems (RDBMS) not only to determine uniqueness, but also in how they order and structure records on disk.
A simple primary key of one incrementing value is generally the most performant and the easiest solution for the RDBMS to manage.
You should have one column in every table that is unique.
EDITED...
This is one of the fundamentals of database table design. It's the row identifier - the identifier identifies which row(s) are being acted upon (updated/deleted etc). Relying on column combinations that are "unique", eg (first_name, last_name, city), as your key can quickly lead to problems when two John Smiths exist, or worse when John Smith moves city and you get a collision.
In most cases, it's best to use a an artificial key that's guaranteed to be unique - like an auto increment integer. That's why they are so popular - they're needed. Commonly, the key column is simply called id, or sometimes <tablename>_id. (I prefer id)
If natural data is available that is unique and present for every row (perhaps retinal scan data for people), you can use that, but all-to-often, such data isn't available for every row.
Ideally, you should have only one unique column. That is, there should only be one key.
Using IDs to key tables means you can change the content as needed without having to repoint things
Ex. if every row points to a unique user, what would happen if he/she changed his name to let say John Blblblbe which had already been in db? And then again, what would happen if you software wants to pick up John Blblblbe's details, whose details would be picked up? the old John's or the one ho has changed his name? Well if answer for bot questions is 'nothing special gonna happen' then, yep, you don't really need "ID" column :]
Important:
Also, having a numeric ID column with numbers is much more faster when you're looking for an exact row even when the table hasn't got any indexing keys or have more than one unique
If you are sure that any other column is going to have unique data for every row and isn't going to have NULL at any time then there is no need of separate ID column to distinguish each row from others, you can make that existing column primary key for your table.
No, single-attribute keys are not essential and nor are surrogate keys. Keys should have as many attributes as are necessary for data integrity: to ensure that uniqueness is maintained, to represent accurately the universe of discourse and to allow users to identify the data of interest to them. If you have already identified a suitable key and if you don't find any real need to create another one then it would make no sense to add redundant attributes and indexes to your table.
An ID can be more meaningful, for an example an employee id can represent from which department he is, year of he join and so on. Apart from that RDBMS supports lots operations with ID's.