SQL: string PK ( maybe composite ) or auto_number artificial PK with unique constraint?

SQL: string PK ( maybe composite ) or auto_number artificial PK with unique constraint? - sql

I am creating a new database and have a dilemma: I have an authors table. My requirements are that it should contain the firstName and lastName columns and I shouldn't allow two authors that have the same firstName and lastName.
My first ideea was to make firstName and lastName composite pk, and that's it!
But, I will tie this table to other tables so, to make my life easier I though of using an int auto_increment PK and make a composite UC of firstName and lasteName.
My question is generally: let's say I have a persons table and I can use the SSN as PK. If i tie this table to n tables duplicating the SSN value in the child tables may consume more memory than rather using an int auto_increment PK and making UC of SSN column?
What approach is better, and when to use what?
Kind regards,

The best approach is to keep your PRIMARY KEY separated from the rest of the table, the business logic. Plus it is way faster to have a separate PK because of indexing, it is so much better for performance, easier to maintain and to link tables to each other, it also ensures faster execution of statements on the table.
Your best bet would be to go for a PK column like id then have a separate UNIQUE constraint for the firstName-lastName.

Related

MS SQL creating many-to-many relation with a junction table

I'm using Microsoft SQL Server Management Studio and while creating a junction table should I create an ID column for the junction table, if so should I also make it the primary key and identity column? Or just keep 2 columns for the tables I'm joining in the many-to-many relation?
For example if this would be the many-to many tables:
MOVIE
Movie_ID
Name
etc...
CATEGORY
Category_ID
Name
etc...
Should I make the junction table:
MOVIE_CATEGORY_JUNCTION
Movie_ID
Category_ID
Movie_Category_Junction_ID
[and make the Movie_Category_Junction_ID my Primary Key and use it as the Identity Column] ?
Or:
MOVIE_CATEGORY_JUNCTION
Movie_ID
Category_ID
[and just leave it at that with no primary key or identity table] ?

I would use the second junction table:
MOVIE_CATEGORY_JUNCTION
Movie_ID
Category_ID
The primary key would be the combination of both columns. You would also have a foreign key from each column to the Movie and Category table.
The junction table would look similar to this:
create table movie_category_junction
(
movie_id int,
category_id int,
CONSTRAINT movie_cat_pk PRIMARY KEY (movie_id, category_id),
CONSTRAINT FK_movie
FOREIGN KEY (movie_id) REFERENCES movie (movie_id),
CONSTRAINT FK_category
FOREIGN KEY (category_id) REFERENCES category (category_id)
);
See SQL Fiddle with Demo.
Using these two fields as the PRIMARY KEY will prevent duplicate movie/category combinations from being added to the table.

There are different schools of thought on this. One school prefers including a primary key and naming the linking table something more significant than just the two tables it is linking. The reasoning is that although the table may start out seeming like just a linking table, it may become its own table with significant data.
An example is a many-to-many between magazines and subscribers. Really that link is a subscription with its own attributes, like expiration date, payment status, etc.
However, I think sometimes a linking table is just a linking table. The many to many relationship with categories is a good example of this.
So in this case, a separate one field primary key is not necessary. You could have a auto-assign key, which wouldn't hurt anything, and would make deleting specific records easier. It might be good as a general practice, so if the table later develops into a significant table with its own significant data (as subscriptions) it will already have an auto-assign primary key.
You can put a unique index on the two fields to avoid duplicates. This will even prevent duplicates if you have a separate auto-assign key. You could use both fields as your primary key (which is also a unique index).
So, the one school of thought can stick with integer auto-assign primary keys, and avoids compound primary keys. This is not the only way to do it, and maybe not the best, but it won't lead you wrong, into a problem where you really regret it.
But, for something like what you are doing, you will probably be fine with just the two fields. I'd still recommend either making the two fields a compound primary key, or at least putting a unique index on the two fields.

I would go with the 2nd junction table. But make those two fields as Primary key. That will restrict duplicate entries.

Is the following acceptable foreign key usage

I have the following database, the first table users is a table containing my users, userid is a primary key.
The next is my results table, now for each user, there can be a result with an id and it can be against an exam. Is it ok in this scenario to use "id" as a primary key and "userid" as a foreign key? Is there a better way I could model this scenario?
These then link to the corresponding exams...

I would probably not have userid as a varchar. I would have that as an int as well.
So the user table is like this:
userId int
userName varchar
firstName varchar
lastName varchar
And then the forenkey in the results table table would be an int. Like this:
userId int
result varchar
id int
examid INT
Becuase if you are plaing on JOIN ing the tables together then JOIN ing on a varchar is not as fast as JOIN ing on a INT
EDIT
That depend on how much data you are planing to store. Beause you know that there is a minimum chans that GUIDs are not unique. Simple proof that GUID is not unique. I think if I would design this database I would go with an int. Becuase it feels a little bit overkill to use a GUID as a userid

Provided that each user/exam will only ever produce one result, then you could create a composite key using the userid and exam columns in the results table.
Personally though, i'd go with the arbitrary id field approach as I don't like having to pass in several values to reference records. But that's just me :).
Also, the exam field in the results table should also be a foreign key.

Another way of doing this could be to abstract the Grade Levels from the Exam, and make the Exam a unique entity (and primary key) on its own table. So this would make a Grade Levels table (pkey1 = A, pkey2 = B, etc) where the grade acts as the foreign key in your second table, thus removing an entire field.
You could also normal out another level and make a table for Subjects, which would be the foreign key for a dedicated Exam Code table. You can have ENG101, ENG102, etc for exams, and the same for the other exam codes for the subject. The benefit of this is to maintain your exams, subjects, students and grade levels as unique entities. The primary and foreign keys of each are evident, and you keep a simple maintenance future with room to scale up.
You could consider using composite keys, but this is a nice and simple way to start, and you can merge tables for indexing and compacting as required.

Please make sure you first understand Normal Forms before actually normalizing your schema.

Performance - Int vs Char(3)

I have a table and am debating between 2 different ways to store information. It has a structure like so
int id
int FK_id
varchar(50) info1
varchar(50) info2
varchar(50) info3
int forTable or char(3) forTable
The FK_id can be a foreign key to one of 6 tables so I need another field to determine which table it's for.
I see two solutions:
An integer that is a FK to a settings table which has its actual value.
A char(3) field with the a abbreviated version of the table.
I am wondering if anyone knows if one will be more beneficial speed wise over the other or if there will be any major problems using the char(3)
Note: I will be creating an indexed view on each of the 6 different values for this field. This table will contain ~30k rows and will need to be joined with much larger tables

In this case, it probably doesn't matter except for the collation overhead (A vs a vs ä va à)
I'd use char(3), say for currency code like CHF, GBP etc But if my natural key was "Swiss Franc", "British Pound" etc, I'd take the numeric.
3 bytes + collation vs 4 bytes numeric? You'd need a zillion rows or be running a medium sized country before it mattered...

Have you considered using a TinyInt. Only takes one byte to store it's value. TinyInt has a range of values between 0 and 255.

Is the reason you need a single table that you want to ensure that when the six parent tables reference a given instance of a child row that is guaranteed to be the same instance? This is the classic "multi-parent" problem. An example of where you might run into this is with addresses or phone numbers with multiple person/contact tables.
I can think of a couple of options:
Choice 1: A link table for each parent table. This would be the Hoyle architecture. So, something like:
Create Table MyTable(
id int not null Primary Key Clustered
, info1 varchar(50) null
, info2 varchar(50) null
, info3 varchar(50) null
)
Create Table LinkTable1(
MyTableId int not null
, ParentTable1Id int not null
, Constraint PK_LinkTable1 Primary Key Clustered( MyTableId, ParentTable1Id )
, Constraint FK_LinkTable1_ParentTable1
Foreign Key ( MyTableId )
References MyTable ( Id )
, Constraint FK_LinkTable1_ParentTable1
Foreign Key ( ParentTable1Id )
References ParentTable1 ( Id )
)
...
Create Table LinkTable2...LinkTable3
Choice 2. If you knew that you would never have more than say six tables and were willing to accept some denormalization and a fugly design, you could add six foreign keys to your main table. That avoids the problem of populating a bunch of link tables and ensures proper referential integrity. However, that design can quickly get out of hand if the number of parents grows.
If you are content with your existing design, then with respect to the field size, I would use the full table name. Frankly, the difference in performance between a char(3) and a varchar(50) or even varchar(128) will be negligible for the amount of data you are likely to put in the table. If you really thought you were going to have millions of rows, then I would strongly consider the option of linking tables.
If you wanted to stay with your design and wanted the maximum performance, then I would use a tinyint with a foreign key to a table that contained the list of the six tables with a tinyint primary key. That prevents the number from being "magic" and ensures that you narrow down the list of parent tables. Of course, it still does not prevent orphaned records. In this design, you have to use triggers to do that.

Because your FK cannot be enforced (since it is a variant depending upon type) by database constraint, I would strongly consider re-evaluating your design to use link tables, where each link table includes two FK columns, one to the PK of the entity and one to the PK of one of the 6 tables.
While this might seem to be overkill, it makes a lot of things simpler and adding new link tables is no more complex than accommodating new FK-types. In addition, it is more easily expandable to the case where an entity needs more than a 1-1 relationship to a single table, or needs multiple 1-1 relationships to the 6 other entities.
In a varying-FK scenario, you can lose database consistency, you can join to the wrong entity by neglecting to filter on type code, etc.
I should add that another huge benefit of link tables is that you can link to tables which have keys of varying data types (ints, natural keys, etc) without having to add surrograte keys or stored the key in a varchar or similar workarounds which are prone to problems.

I think a small integer (tinyint) is called for here. An "abbreviated version" looks too much like a magic number.
I also think performance wise the integer should beat the char(3).

First off, a 50 character Id that is not globally unique sounds a little scary. Do the IDs have some meaning? If not, you can easily get a GUID in less space. Personally, I am a big fan of making things human readable whenever possible. I would, and have, put the full name in graphs until I needed to do otherwise. My preference would be to have linking tables for each possible related table though.
Unless you are talking about really large scale, you are much better off decreasing the size of the IDs and taking a few more characters for the name of the table. For really large scale, I would decrease the size of the IDs and use an integer.
Jacob

Can I have 2 unique columns in the same table?

I have 2 tables:
roomtypes[id(PK),name,maxAdults...]
features(example: Internet in room, satelite tv)
Can both id and name field be unique in the same table in mysql MYISAM?
If the above is posible, I am thinking of changing the table to:
features[id(PK),name,roomtypeID] ==> features[id(PK),name,roomtypeNAME]
...because it is helping me not to do extra querying in presentation for features because the front end users can't handle with IDs.

Of course, you can make one of them PRIMARY and one UNIQUE. Or both UNIQUE. Or one PRIMARY and four UNIQUEs, if you like

Yes, you can define UNIQUE constraints to columns other than the primary key in order to ensure the data is unique between rows. This means that the value can only exist in that column once - any attempts to add duplicates will result in a unique constraint violation error.
I am thinking of changing the FEATURES table to features[id(PK), name, roomtypeNAME] because it is helping me not to do extra querying in presentation for features because the front end users can't handle with IDs.
There's two problems:
A unique constraint on the ROOM_TYPE_NAME wouldn't work - you'll have multiple instances of a given room type, and a unique constraint is designed to stop that.
Because of not using a foreign key to the ROOM_TYPES table, you risk getting values like "Double", "double", "dOUBle"
I recommend sticking with your original design for sake of your data; your application is what translates a room type into its respective ROOM_TYPE record while the UI makes it presentable.

I would hope so otherwise MySQL is not compliant with the SQL standard. You can only have one primary key but you can mark other columns as unique.
In SQL, this is achieved with:
create table tbl (
colpk char(10) primary key,
coluniq char(10) unique,
colother char(10)
);
There are other ways to do it (particularly with multi-part keys) but this is a simple solution.

Yes you can.
Also keep in mind that MySQL allow NULL values in unique columns, whereas a column that is a primary key cannot have a NULL value.

1 RoomType may have many Features
1 Feature may be assigned to many RoomTypes
So what type of relationship do i have? M:N ?
You have there a many-to-many relationship, which has to be represented by an extra table.
That relationship table will have 2 fields: the PK of RoomTypes and the PK of Features.
The PK of the relationship table will be made of those 2 fields.
If that's usefull, you can add extra fields like the Quantity.
I would like to encourage you to read about database Normalization, which is he process of creating a correct design for a relational database. You can Google for that, or look eventually here (there are plenty of books/web pages on this)

Thanks again for very helpful answers.
1 roomType may have many features
1 feature may be assigned to many roomTypes
So what type of relationship do i have? M:N ?
If yes the solution I see is changing table structure to
roomTypes[id,...,featuresIDs]
features[id(PK),name,roomtypeIDs] multiple roomTypesIDs separated with comma?

SQL Server 2008 - Table - Clarifications

I am new to SQL Server 2008 database development.
Here I have a master table named ‘Student’ and a child table named ‘Address’. The common column between these tables is ‘Student ID’.
My doubts are:
Do we need to put ‘Address Id’ in the ‘Address’ table and make it primary key? Is it mandatory? ( I won’t be using this ‘Address Id’ in any of my reports )
Is Primary key column a must in any table?
Would you please help me on these.
Would you please also refer best links/tutorials for SQL Server 2008 database design practices (If you are aware of) which includes naming conventions, best practices, SQL optimizations etc. etc.

1) Yes, having an ADDRESS_ID column as the primary key of the ADDRESS table is a good idea.
But having the STUDENT_ID as a foreign key in the ADDRESS table is not a good idea. This means that an address record can only be associated to one student. Students can have roommates, so they'd have identical addresses. Which comes back to why it's a good idea to have the ADDRESS_ID column as a primary key, as it will indicate a unique address record.
Rather than have the STUDENT_ID column in the ADDRESS table, I'd have a corrollary/xref/lookup table between the STUDENT and ADDRESS tables:
STUDENT_ADDRESSES_XREF
STUDENT_ID, pk, fk to STUDENTS table
ADDRESS_ID, pk, fk to ADDRESS table
EFFECTIVE_DATE, date, not null
EXPIRY_DATE, date, not null
This uses a composite primary key, so that only one combination of the student & address exist. I added the dates in case there was a need to know when exactly, because someone could move back home/etc after all.
Most importantly, this works off the ADDRESS_ID column to allow for a single address to be associated to multiple people.
2) Yes, defining a primary key is frankly a must for any table.
In most databases, the act also creates an index - making searching more efficient. That's on top of the usual things like making sure a record is a unique entry...

Every table should have a way to uniquely and unambiguously identify a record. Make AddressID the primary key for the address table.
Without a primary key, the database will allow duplicate records; possibly creating join problems or trigger problems (if you implement them) down the road.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas