What does the "minimality" mean in database candidate keys? - sql

I was very confused by the idea given by my professor when studying about candidate key (but i feel sham to ask him :p )
"no component of K can be eliminated without destroying
the uniqueness property --- minimality"
And i searched on wiki it says
"there is no proper subset of these attributes for which (1) holds (which means that the set is minimal)." It also give an example but i don't understand.
So my question is what does the "eliminated" here means? if it means remove the whole rows of data, then it should always keep the uniqueness of the data(because you can't lose uniqueness by removing a row of data). If it means only remove the single attribute of K and left a row with a empty "block", it looks silly and will destroy the data. So can someone give me some simple example for what does this property mean?(maybe one for the good one and one for the bad one) Thank you~

"Elimination" here does not touch the data at all. It just means that you remove one attribute/column from your key. If that reduced set of columns is still a key (i.e. uniquely identifies any data row), then the previous key was not minimal.
Example:
name id amount
A 1 1000
B 2 2000
C 3 1000
You could use name or id as a minimal key.
You could also use [name, id] as a (compound, multi-column) key. But that key is not minimal (because you can remove one column from it and still have a key).
The column amount in itself does not make a key at all.
[amount, id] would be a key, but again, it is not minimal.

Related

SQL - Must there always be a primary key?

There are a couple of similar questions already out there and the consensus seemed to be that a primary key should always be created.
But what if you have a single row table for storing settings (and let's not turn this into a discussion about why it might be good/bad to create a single row table please)?
Surely having a primary key on a single row table becomes completely useless?
It may seem completely useless, but it's also completely harmless, and I'd vote for harmless with good design principles vs. useless with no design principles every time.
Other people have commented, rightly, that you don't know how you're going to use the table in a year or five years... what if someone comes along and decides they want to duplicate the configuration -- move it to a distributed environment or add a test environment by using a duplicate configuration string or whatever. Having a field that acts like a primary key means that whenever you query the table, if you use the key, you'll be certain no matter what anyone else may do to your table, that you're getting the correct record.
You're right there are a million other aspects -- surrogate keys vs. intelligent keys, indexing, partitioning (silly on a single row table, I know), whatever... but without getting into that I'd vote add the key rather than not add it. You could have done it by the time you read this thread.
Short answer, no key, duplicate records possible. Your planning a single row now, but what about six months in the future when you single row multiplies. Put a primary key on the table, even for single row.
You could always base your primary key on the name of the setting. Then your table would become a key-value store.
But no, in many RDBMS you are not REQUIRED to have a primary key per table.
Having declared a primary key on a single row table in SQL will ensure that there will be no duplicates. Whether it is useless depends on your requirements. Usually it is a good idea to avoid duplicates.

SQL Server, does the id change if an element gets deleted?

I wondered if I insert, let's say, 10 entries into a SQL Server table.
If i then delete one of them, will the id/index change correspondingly?
Example:
1 | Simon Cowell | 56 years
2 | Frank Lampard| 24 years
3 | Harry Bennet | 12 years
If I delete #2, will Harry Bennet's index change to 2?
Thanks :)
EDIT:
Sorry for my outrage, had a bad day. And yes, I should have researched it myself, I deserve to be downvoted.
I don't ask for anything, I just want to say that I'm sorry :|
Since you seem to be conflating "id/index" let's talk a little but about the primary key and indexes in the context of a relational database.
The "id" or primary key assigned to a row in a SQL database is the unique identifier for that row. It can consist of one or more columns. (When more than one column is involved it is known as a "composite" or "multi-part" key.) The primary key should really do nothing more than be a unique handle for addressing a row: the primary key should not contain any information about the entity represented by the row, especially if that info has the potential to be mutable; an example would be a part number that has a suffix that stands for the type of metal the part is made from; if that metal can possibly change from titanium to unobtainium, say, that part number would make a bad choice as a primary key; it would be better to have another column to store the type of metal than to make the metal-type suffix part of the primary key. "Meaningful" primary keys might have made some sense in legacy non-relational databases but in a relational database they are to be avoided.
When seeking to enforce the uniqueness of a primary key, a database engine can make use of an index so it can rapidly test whether the key value exists. It could use a binary algorithm to find the value, avoiding the need to scan the actual data "brute force", row by row, looking for the value. But the index that is used behind the scenes by the engine to assist it with the primary key housekeeping is not the same as the primary key itself.
If you have a simple sequential integer as your primary key, there's an infinite number of them, so there is no need to reuse an integer when it becomes available when the row to which it was assigned has been deleted. So the relational database engine won't automatically attempt to reuse it, and it won't by any means change the primary key values that have been assigned to all other rows in the table when "gaps" in the number sequence are created by a deletion. Many other rows in other tables could be referencing those values and having them be mutable would create either chaos or a huge inefficiency.
Hashing algorithms are another very efficient way a database engine can quickly test for the existence of a key value. It computes the location in the hashed-file where the key would be if it did exist, and then looks there for it. The rows are stored in no particular order, so such schemes are optimized for instant finding of records in a large table, not for culling records that have a common characteristic, such as all customers in zipcode 10023.
No. You can set up triggers or logic to do it if you want; however, it will not automatically do this.
No it will not change automatically
No, it wont. And hopefully, that's the answer you're hoping for. For any auto-generated identifiers (such as IDENTITY columns), you should, so far as possible, ignore the data type and treat it as an opaque "blob" of identity information.
It gets assigned during insert, and you can use it for cross-referencing purposes, but the fact that it's numeric is not something you should use or rely upon. It's just a stable identifier for the row.

sql primary key auto increment

Is having a primary key that auto increments on each new row necessary? for me this number is getting quite long and I'm not even using it for anything.
I can imagine that with gradual user activity on my site new rows will be added (I am only testing atm with just 2 alfa test users and already the number has auto incremented to over 100), eventually this number could reach silly proportions (example: 10029379000577352881086) and not only slow the site down (effecting user experience) but also could inevitably push my site over its quota (exceeding its allowed size (laymen's))
really is this needed?
If you have some field/column (or combination of columns) which can be a primary key, use that, why use Auto increment. There are school of thoughts which believe using a mix of both. You could search for surrogate keys and you may find this answer interesting Surrogate vs. natural/business keys
For size quota problem, practically I don't think the maximum auto increment value would cause your site to go over data limit. If it is of int type it will take 4 bytes, regardless of the value inside. For SQL server int type could contain values ranging from -2^31 (-2,147,483,648) to 2^31-1 (2,147,483,647).
Here is the link for that
You need a way to uniquely identify each record in your table.
If you have that already -- say a user-ID or email-address -- then you don't necessarily need that auto-incrementing field.
Note: If you don't already have a unique constraint on that field, you should add one so that duplicate data cannot be entered into the table.
Warning: If you decide to get rid of it, be sure that no other tables are using it.
can't you user multiple columns to get a composite key instead of that?
just a hint.
You do need a key that identifies every row. But a key doesn't have to be a number that "auto-increments" for every row. The fact that a few people seem to think incrementing numbers are always a good idea for keys is probably a consequence either of carelessness or a lack of appreciation of database fundamentals, sound design and data integrity.
primary key is not always necessary to have for a table . for your question check my answer:
when and when not primary key should use

Is ID column required in SQL?

Traditionally I have always used an ID column in SQL (mostly mysql and postgresql).
However I am wondering if it is really necessary if the rest of the columns in each row make in unique. In my latest project I have the "ID" column set as my primary key, however I never call it or use it in any way, as the data in the row makes it unique and is much more useful for me.
So, if every row in a SQL table is unique, does it need a primary key ID table, and are there ant performance changes with or without one?
Thanks!
EDIT/Additional info:
The specific example that made me ask this question is a table I am using for a many-to-many-to-many-to-many table (if we still call it that at that point) it has 4 columns (plus ID) each of which represents an ID of an external table, and each row will always be numeric and unique. only one of the columns is allowed to be null.
I understand that for normal tables an ID primary key column is a VERY good thing to have. But I get the feeling on this particular table it just wastes space and slows down adding new rows.
If you really do have some pre-existing column in your data set that already does uniquely identify your row - then no, there's no need for an extra ID column. The primary key however must be unique (in ALL circumstances) and cannot be empty (must be NOT NULL).
In my 20+ years of experience in database design, however, this is almost never truly the case. Most "natural" ID's that appear to be unique aren't - ultimately. US Social Security Numbers aren't guaranteed to be unique, and most other "natural" keys end up being almost unique - and that's just not good enough for a database system.
So if you really do have a proper, unique key in your data already - use it! But most of the time, it's easier and more convenient to have just a single surrogate ID that you can guarantee will be unique over all rows.
Don't confuse the logical model with the implementation.
The logical model shows a candidate key (all columns) which could makes your primary key.
Great. However...
In practice, having a multi column primary key has downsides: it's wide, not good when clustered etc. There is plenty of information out there and in the "related" questions list on the right
So, you'd typically
add a surrogate key (ID column)
add a unique constraint to keep the other columns unique
the ID column will be the clustered key (can be only one per table)
You can make either key the primary key now
The main exception is link or many-to-many tables that link 2 ID columns: a surrogate isn't needed (unless you have a braindead ORM)
Edit, a link: "What should I choose for my primary key?"
Edit2
For many-many tables: SQL: Do you need an auto-incremental primary key for Many-Many tables?
Yes, you could have many attributes (values) in a record (row) that you could use to make a record unique. This would be called a composite primary key.
However it will be much slower in general because the construction of the primary index will be much more expensive. The primary index is used by relational database management systems (RDBMS) not only to determine uniqueness, but also in how they order and structure records on disk.
A simple primary key of one incrementing value is generally the most performant and the easiest solution for the RDBMS to manage.
You should have one column in every table that is unique.
EDITED...
This is one of the fundamentals of database table design. It's the row identifier - the identifier identifies which row(s) are being acted upon (updated/deleted etc). Relying on column combinations that are "unique", eg (first_name, last_name, city), as your key can quickly lead to problems when two John Smiths exist, or worse when John Smith moves city and you get a collision.
In most cases, it's best to use a an artificial key that's guaranteed to be unique - like an auto increment integer. That's why they are so popular - they're needed. Commonly, the key column is simply called id, or sometimes <tablename>_id. (I prefer id)
If natural data is available that is unique and present for every row (perhaps retinal scan data for people), you can use that, but all-to-often, such data isn't available for every row.
Ideally, you should have only one unique column. That is, there should only be one key.
Using IDs to key tables means you can change the content as needed without having to repoint things
Ex. if every row points to a unique user, what would happen if he/she changed his name to let say John Blblblbe which had already been in db? And then again, what would happen if you software wants to pick up John Blblblbe's details, whose details would be picked up? the old John's or the one ho has changed his name? Well if answer for bot questions is 'nothing special gonna happen' then, yep, you don't really need "ID" column :]
Important:
Also, having a numeric ID column with numbers is much more faster when you're looking for an exact row even when the table hasn't got any indexing keys or have more than one unique
If you are sure that any other column is going to have unique data for every row and isn't going to have NULL at any time then there is no need of separate ID column to distinguish each row from others, you can make that existing column primary key for your table.
No, single-attribute keys are not essential and nor are surrogate keys. Keys should have as many attributes as are necessary for data integrity: to ensure that uniqueness is maintained, to represent accurately the universe of discourse and to allow users to identify the data of interest to them. If you have already identified a suitable key and if you don't find any real need to create another one then it would make no sense to add redundant attributes and indexes to your table.
An ID can be more meaningful, for an example an employee id can represent from which department he is, year of he join and so on. Apart from that RDBMS supports lots operations with ID's.

Updating multiple rows which are in conflict with unique index

I am using Microsoft SQL Server and I have a master-detail scenario where I need to store the order of details. So in the Detail table I have ID, MasterID, Position and some other columns. There is also a unique index on MasterID and Position. It works OK except one case: when I have some existing details and I change their order. For example when I change a detail on position 3 with a detail on position 2. When I save the detail on position 2 (which in the database has Position equal to 3) SQL Server protests, because the index uniqueness constraint.
How to solve this problem in a reasonable way?
Thank you in advance
Lukasz Glaz
This is a classic problem and the answer is simple: if you want to move item 3 to position 2, you must first change the sort column of 2 to a temporary number (e.g. 99). So it goes like this:
Move 2 to 99
Move 3 to 2
Move 99 to 3
You must be careful, though, that your temporary value is never used in normal processing and that you respect multiple threads if applicable.
Update: BTW - one way to deal with the "multiple users may be changing the order" issue is to do what I do: give each user a numberical ID and then add this to the temporary number (my staff ID is actually the Unique Identity field ID from the staff table used to gate logins). So, for example, if your positions will never be negative, you might use -1000 - UserID as your temporary value. Trust me on one thing though: you do not want to just assume that you'll never have a collision. If you think that and one does occur, it'll be extremely hard to debug!
Update: GUZ points out that his users may have reordered an entire set of line items and submitted them as a batch - it isn't just a switch of two records. You can approach this in one of two ways, then.
First, you could change the existing sort fields of the entire set to a new set of non-colliding values (e.g. -100 - (staffID * maxSetSize) + existingOrderVal) and then go record-by-record and change each record to the new order value.
Or you could essentially treat it like a bubble sort on an array where the orderVal value is the equivalent of your array index. Either this makes perfect sense to you (and is obvious) or you should stick with solution 1 (which is easier in any event).
you could just remove the unique constraint (but leave an index key) on the order column, and ensure uniqueness in your code if necessary.