Table Primary Key Length


I am wondering whether the length of a primary key has a non-trivial effect on performance. For example, consider the following table definitions:
CREATE TABLE table1 (
id VARCHAR(50) PRIMARY KEY,
first_column VARCHAR(50) NULL,
second_column VARCHAR(75) NOT NULL
);
CREATE TABLE table2(
id VARCHAR(250) PRIMARY KEY,
first_column VARCHAR(50) NULL,
second_column VARCHAR(75) NOT NULL
);
Does table1 perform better than table2? Why?

In general, performance will depend more on what is stored than on the length of a varchar column. If both the varchar(50) and varchar(250) columns have a median length of 40 characters, they'll probably have similar performance.
In some DBMSs (SQL Server, for example), the primary key is also the clustered key by default. But if your primary key is unsuitable as a clustered key, you can usually tell the DBMS not to cluster on it.
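Here is a back-of-the-envelope Python sketch of this point. It rests on four assumptions:

- SQL Server's documented storage rule for varchar(n) applies: actual data length plus 2 bytes.
- The collation is single-byte.
- An 8 KB page has roughly 8,096 usable bytes.
- Per-row and slot overhead can be ignored.

The helper names are hypothetical.

```python
# Assumption: roughly 8,096 bytes of an 8 KB SQL Server page are usable for rows.
PAGE_DATA_BYTES = 8096


def varchar_storage_bytes(value: str, declared_length: int) -> int:
    """Approximate size of one varchar(n) value: actual data length + 2 bytes.

    The declared length only caps what may be stored; it does not reserve space.
    Assumes a single-byte (ASCII-compatible) collation.
    """
    data = value.encode("ascii")
    if len(data) > declared_length:
        raise ValueError(f"value is longer than varchar({declared_length})")
    return len(data) + 2


def keys_per_page(key_bytes: int) -> int:
    """Rough number of key entries per index page, ignoring row/slot overhead."""
    return PAGE_DATA_BYTES // key_bytes


same_value = "x" * 40  # a 40-character key value, like the median example above
in_varchar_50 = varchar_storage_bytes(same_value, 50)
in_varchar_250 = varchar_storage_bytes(same_value, 250)
print(in_varchar_50, in_varchar_250)                           # 42 42
print(keys_per_page(in_varchar_50))                            # 192
print(keys_per_page(varchar_storage_bytes("x" * 240, 250)))    # 33
```

Two 40-character keys cost the same whether the column is declared varchar(50) or varchar(250). Only keys that are actually long reduce how many entries fit on a page.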

Yes, the primary key with varchar(50) will be more efficient. As you know, in SQL Server the primary key is backed by a clustered index by default. As soon as a new record is entered in the table, its key value is placed in order within that clustered index. You will see this difference with billions of records.
So it is generally advised to keep the primary key short, for example an integer ID.

Related

What is the difference between a constraint primary key and normal primary key?

I know it's possible to use a constraint to include multiple fields, like this:
CREATE TABLE Persons (
ID int NOT NULL,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Age int,
CONSTRAINT PK_Person PRIMARY KEY (ID,LastName)
);
But if we compare the two definitions below, is there any difference?
Create table client
(cod_clt int identity not null,
Nom t,
Dn datetime not null,
Credit numeric(6,2) not null,
Constraint x1 check (credit between 100 and 1456.25),
Constraint x2 primary key (cod_clt)
)
and this:
Create table client
(cod_clt int primary key,
Nom t,
Dn datetime not null,
Credit numeric(6,2) not null,
Constraint x1 check (credit between 100 and 1456.25)
)

There are six classes of constraints in SQL Server:
NOT NULL
Unique Key
Primary Key
Foreign Key
Check
Default
(Some people, myself included, will argue that a column's data type and unique indexes are also types of constraints, but I digress.)
In SQL Server there are two types of constraints: column-level and table-level.
NOT NULL is a column-level constraint; all the others can be table- or column-level. Primary and foreign keys can consist of one or more columns. When they consist of more than one column, they are known as a "composite key". Composite keys must be table-level.
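The sketch below illustrates why a composite key has to be table-level. It uses SQLite through Python's sqlite3 module as a lightweight stand-in for SQL Server. That substitution is an assumption made only so the example runs anywhere; T-SQL syntax details and error messages differ.

Putting PRIMARY KEY on two columns at column level declares two primary keys, which is rejected. The table-level form from the Persons example works:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Table-level composite key: one PRIMARY KEY constraint spanning two columns.
conn.execute("""
    CREATE TABLE Persons (
        ID int NOT NULL,
        LastName varchar(255) NOT NULL,
        FirstName varchar(255),
        Age int,
        CONSTRAINT PK_Person PRIMARY KEY (ID, LastName)
    )
""")

# Column-level PRIMARY KEY on two columns means *two* primary keys: rejected.
try:
    conn.execute("""
        CREATE TABLE Persons_bad (
            ID int NOT NULL PRIMARY KEY,
            LastName varchar(255) NOT NULL PRIMARY KEY
        )
    """)
    composite_at_column_level_ok = True
except sqlite3.Error as exc:
    composite_at_column_level_ok = False
    print("rejected:", exc)

# The composite key allows the same ID with a different LastName...
conn.execute("INSERT INTO Persons (ID, LastName) VALUES (1, 'Smith')")
conn.execute("INSERT INTO Persons (ID, LastName) VALUES (1, 'Jones')")

# ...but not the same (ID, LastName) pair twice.
try:
    conn.execute("INSERT INTO Persons (ID, LastName) VALUES (1, 'Smith')")
    duplicate_pair_ok = True
except sqlite3.IntegrityError:
    duplicate_pair_ok = False
```

SQL Server also refuses a second PRIMARY KEY on the same table, with its own error text.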
The most notable practical difference between column- and table-level constraints is naming. Table-level constraints are written with an explicit CONSTRAINT name, which gives them a meaningful name, and that is why I personally prefer them.
In your first example, you have a table-level primary key constraint, and it is a composite key. Your last two examples don't have a composite key, which is why the key can be declared at either table or column level. The key difference between your last two examples is that the table-level primary key is named (x2) while the column-level one is not, so SQL Server generates a name for it. (A column-level constraint can also be named, e.g. cod_clt int CONSTRAINT x2 PRIMARY KEY, but your second example doesn't do that.) This is a big deal for people who properly manage their metadata.
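Here is a minimal sketch of the two client definitions, again using SQLite via Python's sqlite3 module as a stand-in for SQL Server. It makes these assumptions:

- The original type `t` of Nom is not shown, so varchar(50) is used.
- SQLite has no identity, so cod_clt is a plain int.
- The helper rejected_rows and the sample rows are hypothetical.

Both forms reject the same bad rows. This supports the point that the difference is naming, not behavior:

```python
import sqlite3

TABLE_LEVEL = """
CREATE TABLE client_a (
    cod_clt int NOT NULL,
    Nom varchar(50),                         -- assumption: original type 't' not shown
    Dn datetime NOT NULL,
    Credit numeric(6,2) NOT NULL,
    CONSTRAINT x1 CHECK (Credit BETWEEN 100 AND 1456.25),
    CONSTRAINT x2 PRIMARY KEY (cod_clt)      -- named, table level
)"""

COLUMN_LEVEL = """
CREATE TABLE client_b (
    cod_clt int PRIMARY KEY,                 -- unnamed, column level
    Nom varchar(50),
    Dn datetime NOT NULL,
    Credit numeric(6,2) NOT NULL,
    CONSTRAINT x1 CHECK (Credit BETWEEN 100 AND 1456.25)
)"""


def rejected_rows(ddl: str, table: str) -> list:
    """Create the table in a fresh in-memory DB and report which sample rows fail."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    rows = [
        (1, "Ann", "2020-01-01", 500.00),   # valid
        (1, "Bob", "2020-01-02", 600.00),   # duplicate cod_clt
        (2, "Cid", "2020-01-03", 50.00),    # violates CHECK x1
    ]
    rejected = []
    for row in rows:
        try:
            conn.execute(f"INSERT INTO {table} VALUES (?, ?, ?, ?)", row)
        except sqlite3.IntegrityError:
            rejected.append(row[1])
    conn.close()
    return rejected


print(rejected_rows(TABLE_LEVEL, "client_a"))   # ['Bob', 'Cid']
print(rejected_rows(COLUMN_LEVEL, "client_b"))  # ['Bob', 'Cid']
```

In SQL Server, the unnamed column-level key would get a system-generated name (of the form PK__client__ followed by a hex suffix). That is what makes scripting and comparing schemas awkward.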
Lastly, Primary Key and Unique constraints are special in that creating them also creates an index. By default, a primary key creates a clustered index (unless the table already has one). The decision to create a clustered and/or unique index is a big one. So I include the keyword CLUSTERED or NONCLUSTERED when I define my primary (and unique) keys, so as not to depend on default system behavior.
Here are a couple of good links about constraints:
https://technet.microsoft.com/en-us/library/ms189862(v=sql.105).aspx - (Microsoft)
https://www.w3schools.com/sql/sql_constraints.asp (W3 Schools)

Is my query correct when I set primary key for 3 columns in a table?

In my case, one candidate can be linked to one job at a time, so I think both JobId and CandidateId must be part of the primary key.
In addition, the JobApplicationId column is used as a foreign key by the CandidateDetail table.
Is it correct to set these three columns as the primary key, or are there other ways to address my problem?
CREATE TABLE Candidate(
CandidateId int identity primary key,
FullName nvarchar(50)
)
CREATE TABLE Job(
JobId int identity primary key,
JobTitle nvarchar(50)
)
CREATE TABLE JobApplication(
JobApplicationId int identity,
JobId int,
CandidateId int,
CreatedDate datetime,
primary key(JobApplicationId, JobId, CandidateId)
)
CREATE TABLE CandidateDetail(
CandidateDetailId int identity primary key,
JobApplicationId int,
[Description] nvarchar(300)
)
ALTER TABLE JobApplication ADD CONSTRAINT fk_JobApplication_Job FOREIGN KEY (JobId) REFERENCES Job(JobId)
ALTER TABLE JobApplication ADD CONSTRAINT fk_JobApplication_Candidate FOREIGN KEY (CandidateId) REFERENCES Candidate(CandidateId)
ALTER TABLE CandidateDetail ADD CONSTRAINT fk_CandidateDetail_JobApplication FOREIGN KEY (JobApplicationId) REFERENCES JobApplication(JobApplicationId)

Instead of a primary key with three columns you could just have JobApplicationId as the primary key and a unique constraint on JobId, CandidateId.
Otherwise, two rows with JobApplicationId=1, JobId=1, CandidateId=1 and JobApplicationId=2, JobId=1, CandidateId=1 would still be valid in terms of your current primary key approach, but would be invalid in terms of the business case.
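This point can be checked with a small sketch. It uses SQLite via Python's sqlite3 as a stand-in for SQL Server, with these assumptions:

- SQLite's INTEGER PRIMARY KEY stands in for int identity.
- The constraint name UQ_JobApplication_Job_Candidate and the helper function are hypothetical.

The three-column key accepts the same candidate applying to the same job twice. A single-column key plus a UNIQUE constraint rejects it:

```python
import sqlite3

# The question's design: all three columns form the primary key.
THREE_COLUMN_PK = """
CREATE TABLE JobApplication (
    JobApplicationId integer NOT NULL,
    JobId int NOT NULL,
    CandidateId int NOT NULL,
    CreatedDate datetime,
    PRIMARY KEY (JobApplicationId, JobId, CandidateId)
)"""

# The suggested design: surrogate key plus a business-rule UNIQUE constraint.
SURROGATE_PK_PLUS_UNIQUE = """
CREATE TABLE JobApplication (
    JobApplicationId integer PRIMARY KEY,    -- stands in for int identity
    JobId int NOT NULL,
    CandidateId int NOT NULL,
    CreatedDate datetime,
    CONSTRAINT UQ_JobApplication_Job_Candidate UNIQUE (JobId, CandidateId)
)"""


def duplicate_application_allowed(ddl: str) -> bool:
    """Return True if candidate 1 can apply to job 1 twice under this schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.execute("INSERT INTO JobApplication (JobApplicationId, JobId, CandidateId) VALUES (1, 1, 1)")
    try:
        conn.execute("INSERT INTO JobApplication (JobApplicationId, JobId, CandidateId) VALUES (2, 1, 1)")
        return True
    except sqlite3.IntegrityError:
        return False
    finally:
        conn.close()


print(duplicate_application_allowed(THREE_COLUMN_PK))           # True
print(duplicate_application_allowed(SURROGATE_PK_PLUS_UNIQUE))  # False
```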

From both a performance and usability perspective, a compound primary key can be a hassle and can create performance issues. Personally, I would choose JobApplicationId as the primary key (because this is an identity column and will be unique for each record). Then, if you need to constrain the table so that JobId and CandidateId are always unique (not allowing more than 1 record for any given candidate and the job they've applied for) then I would use a compound Unique Constraint.
However, I would suggest that you evaluate those requirements more closely: what if a candidate applies for the same position in a different time frame? It might stand to reason that the same candidate applying to the same job more than once is valid data for that table.

In SQL, does a PRIMARY KEY in CREATE TABLE enforce uniqueness?

I'm wondering: if I set the two columns below as the PRIMARY KEY of a relationship table, does that automatically make the table enforce that all entries are unique?
CREATE TABLE UserHasSecurity
(
userID int REFERENCES Users(userID) NOT NULL,
securityID int REFERENCES Security(securityID) NOT NULL,
PRIMARY KEY(userID,securityID)
)
or do I need to be more explicit like this...
CREATE TABLE UserHasSecurity
(
userID int REFERENCES Users(userID) NOT NULL,
securityID int REFERENCES Security(securityID) NOT NULL,
PRIMARY KEY(userID,securityID),
UNIQUE(userID,securityID)
)

You don't need UNIQUE here. The PRIMARY KEY makes sure there are no duplicate (userID, securityID) pairs.

No, you don't need to specify UNIQUE in addition to PRIMARY KEY. A primary key by definition must be unique.

A PRIMARY KEY has to be unique, so you only need to declare it as a primary key. The underlying index is unique by definition.
See also: Creating Unique Indexes
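Here is a minimal check of this, using SQLite via Python's sqlite3 module as a stand-in for SQL Server. The Users and Security tables are hypothetical one-column parents, added so the REFERENCES clauses resolve. The composite PRIMARY KEY alone rejects a duplicate (userID, securityID) pair:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when enabled
conn.executescript("""
    CREATE TABLE Users (userID int PRIMARY KEY);
    CREATE TABLE Security (securityID int PRIMARY KEY);
    CREATE TABLE UserHasSecurity (
        userID int REFERENCES Users(userID) NOT NULL,
        securityID int REFERENCES Security(securityID) NOT NULL,
        PRIMARY KEY (userID, securityID)   -- no separate UNIQUE needed
    );
    INSERT INTO Users VALUES (1), (2);
    INSERT INTO Security VALUES (10);
    INSERT INTO UserHasSecurity VALUES (1, 10), (2, 10);
""")

try:
    conn.execute("INSERT INTO UserHasSecurity VALUES (1, 10)")  # duplicate pair
    duplicate_accepted = True
except sqlite3.IntegrityError:
    duplicate_accepted = False

print(duplicate_accepted)  # False
```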

What data type is optimal for clustered index of a table published by using transactional replication?

We have an application which stores data in a SQL Server database. (Currently we support SQL Server 2005 and higher.) Our DB has more than 400 tables. The structure of the database is not ideal. The biggest problem is that we have a lot of tables with GUIDs (NEWID()) as primary CLUSTERED keys. When I asked our main database architect “why?”, he said: “it is because of the replication”. Our DB should support transactional replication. Initially, all primary keys were INT IDENTITY(1,1) CLUSTERED. But later, when it came to replication support, these fields were replaced by UNIQUEIDENTIFIER DEFAULT NEWID(). He said “otherwise it was a nightmare to deal with replication”. NEWSEQUENTIALID() was not supported by SQL 7/2000 at that time. So now we have tables with the following structure:
CREATE TABLE Table1(
Table1_PID uniqueidentifier DEFAULT NEWID() NOT NULL,
Field1 varchar(50) NULL,
FieldN varchar(50) NULL,
CONSTRAINT PK_Table1 PRIMARY KEY CLUSTERED (Table1_PID)
)
GO
CREATE TABLE Table2(
Table2_PID uniqueidentifier DEFAULT NEWID() NOT NULL,
Table1_PID uniqueidentifier NULL,
Field1 varchar(50) NULL,
FieldN varchar(50) NULL,
CONSTRAINT PK_Table2 PRIMARY KEY CLUSTERED (Table2_PID),
CONSTRAINT FK_Table2_Table1 FOREIGN KEY (Table1_PID) REFERENCES Table1 (Table1_PID)
)
GO
All the tables actually have a lot of fields (up to 35) and up to 15 non-clustered indexes.
I know that a GUID that is not sequential is a horribly bad choice for a clustered index, for two reasons. That covers one whose values are generated in the client (using .NET) or by the NEWID() SQL function, as in our case. The two reasons are:
fragmentation
size
I also know that a good clustering key is:
unique,
narrow,
static,
ever-increasing,
non-nullable,
and fixed-width
For more details on the reasons behind this, check out the following great video: http://technet.microsoft.com/en-us/sqlserver/gg508879.aspx.
So, INT IDENTITY really is the best choice. BIGINT IDENTITY is also good, but typically an INT, which can hold over 2 billion positive values, should be sufficient for the vast majority of tables.
When our customers began suffering from fragmentation, it was decided to make primary keys NON-clustered. As a result, those tables remained without a clustered index. In other words, those tables were turned into HEAPS. I personally don’t like this solution because I am sure that heap tables are not part of a good database design. Please, check this SQL Server Best Practices Article: http://technet.microsoft.com/en-us/library/cc917672.aspx.
Currently we are considering two options to improve the database structure:
The first option is to replace DEFAULT NEWID() by DEFAULT NEWSEQUENTIALID() for the Primary clustered key:
CREATE TABLE Table1_GUID (
Table1_PID uniqueidentifier DEFAULT NEWSEQUENTIALID() NOT NULL,
Field1 varchar(50) NULL,
FieldN varchar(50) NULL,
CONSTRAINT PK_Table1 PRIMARY KEY CLUSTERED (Table1_PID)
)
GO
The second option is to add INT IDENTITY column to each table and make it the CLUSTERED UNIQUE index, leaving primary key NOT clustered. So the Table1 will look like:
CREATE TABLE Table1_INT (
Table1_ID int IDENTITY(1,1) NOT NULL,
Table1_PID uniqueidentifier DEFAULT NEWSEQUENTIALID() NOT NULL,
Field1 varchar(50) NULL,
FieldN varchar(50) NULL,
CONSTRAINT PK_Table1 PRIMARY KEY NONCLUSTERED (Table1_PID),
CONSTRAINT UK_Table1 UNIQUE CLUSTERED (Table1_ID)
)
GO
Table1_PID will be used for replication (that’s why we left it as the PK), while Table1_ID will not be replicated at all.
Long story short: after running benchmarks to see which approach is better, we found that neither solution is good:
The first approach (Table1_GUID) revealed the following shortcomings. Although sequential GUIDs are definitely a lot better than regular random GUIDs, they are still four times larger than an INT (16 vs. 4 bytes). This is a factor in our case because we have lots of rows in our tables (up to 60 million) and lots of non-clustered indexes on those tables (up to 15). The clustering key is added to each and every non-clustered index, which significantly increases the negative effect of having 16 vs. 4 bytes. More bytes means more pages on disk and in SQL Server RAM, and thus more disk I/O and more work for SQL Server.
To be more precise, after I inserted 25 million rows of real data into each table and then created 15 non-clustered indexes on each table, I saw a big difference in the space used by the tables:
EXEC sp_spaceused 'Table1_GUID' -- 14.85 GB
EXEC sp_spaceused 'Table1_INT' -- 11.68 GB
Furthermore, the test showed that INSERTs into Table1_GUID were a bit slower than into Table1_INT.
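Below is a rough back-of-the-envelope estimate of the size effect described above, as a Python sketch. It is deliberately simple:

- It counts only one copy of the clustering key per row in the leaf level of each nonclustered index.
- It ignores row overhead, page fill, and upper index levels.
- It ignores that Table1_INT carries an extra int column and a nonclustered primary key on the GUID.

For these reasons it should not be expected to match the measured 14.85 GB vs. 11.68 GB exactly. The function name is hypothetical.

```python
def clustering_key_copy_bytes(rows: int, nonclustered_indexes: int, key_bytes: int) -> int:
    """Bytes used to repeat the clustering key once per row in the leaf level of
    every nonclustered index. Ignores row headers, page fill and upper index levels."""
    return rows * nonclustered_indexes * key_bytes


ROWS = 25_000_000   # rows inserted in the benchmark above
NC_INDEXES = 15     # nonclustered indexes per table
GUID_BYTES = 16     # uniqueidentifier
INT_BYTES = 4       # int

extra_bytes = (clustering_key_copy_bytes(ROWS, NC_INDEXES, GUID_BYTES)
               - clustering_key_copy_bytes(ROWS, NC_INDEXES, INT_BYTES))
print(extra_bytes)                          # 4500000000
print(f"{extra_bytes / 1024**3:.2f} GiB")   # 4.19 GiB
```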
The second approach (Table1_INT) revealed that in most queries (SELECTs) joining two tables on Table1_INT.Table1_PID = Table2_INT.Table1_PID, the execution plan became worse because an additional Key Lookup operator appeared.
Now the question: I believe there should be a better solution for our problem. If you could recommend something or point me to a good resource, I would greatly appreciate it. Thank you in advance.
Updated:
Let me give you an example of a SELECT statement where additional Key Lookup operator appears:
--Create 2 tables with int IDENTITY(1,1) as CLUSTERED KEY.
--These tables have one-to-many relationship.
CREATE TABLE Table1_INT (
Table1_ID int IDENTITY(1,1) NOT NULL,
Table1_PID uniqueidentifier DEFAULT NEWSEQUENTIALID() NOT NULL,
Field1 varchar(50) NULL,
FieldN varchar(50) NULL,
CONSTRAINT PK_Table1_INT PRIMARY KEY NONCLUSTERED (Table1_PID),
CONSTRAINT UK_Table1_INT UNIQUE CLUSTERED (Table1_ID)
)
GO
CREATE TABLE Table2_INT(
Table2_ID int IDENTITY(1,1) NOT NULL,
Table2_PID uniqueidentifier DEFAULT NEWSEQUENTIALID() NOT NULL,
Table1_PID uniqueidentifier NULL,
Field1 varchar(50) NULL,
FieldN varchar(50) NULL,
CONSTRAINT PK_Table2_INT PRIMARY KEY NONCLUSTERED (Table2_PID),
CONSTRAINT UK_Table2_INT UNIQUE CLUSTERED (Table2_ID),
CONSTRAINT FK_Table2_Table1_INT FOREIGN KEY (Table1_PID) REFERENCES Table1_INT (Table1_PID)
)
GO
And create two other tables for comparison:
--Create the same 2 tables, BUT with uniqueidentifier NEWSEQUENTIALID() as CLUSTERED KEY.
CREATE TABLE Table1_GUID (
Table1_PID uniqueidentifier DEFAULT NEWSEQUENTIALID() NOT NULL,
Field1 varchar(50) NULL,
FieldN varchar(50) NULL,
CONSTRAINT PK_Table1_GUID PRIMARY KEY CLUSTERED (Table1_PID)
)
GO
CREATE TABLE Table2_GUID(
Table2_PID uniqueidentifier DEFAULT NEWSEQUENTIALID() NOT NULL,
Table1_PID uniqueidentifier NULL,
Field1 varchar(50) NULL,
FieldN varchar(50) NULL,
CONSTRAINT PK_Table2_GUID PRIMARY KEY CLUSTERED (Table2_PID),
CONSTRAINT FK_Table2_Table1_GUID FOREIGN KEY (Table1_PID) REFERENCES Table1_GUID (Table1_PID)
)
GO
Now run the following select statements and look at the execution plan to compare:
SELECT T1.Field1, T2.FieldN
FROM Table1_INT T1
INNER JOIN Table2_INT T2
ON T1.Table1_PID = T2.Table1_PID;
SELECT T1.Field1, T2.FieldN
FROM Table1_GUID T1
INNER JOIN Table2_GUID T2
ON T1.Table1_PID = T2.Table1_PID;

I personally use INT IDENTITY for most of my primary and clustering keys.
You need to keep the primary key and the clustering key apart. The primary key is a logical construct: it uniquely identifies your rows, and it has to be unique, stable, and NOT NULL. A GUID works well for a primary key, too, since it's guaranteed to be unique. A GUID as your primary key is a good choice if you use SQL Server replication, since in that case you need a uniquely identifying GUID column anyway.
The clustering key in SQL Server is a physical construct used for the physical ordering of the data, and it is a lot more difficult to get right. Typically, the Queen of Indexing on SQL Server, Kimberly Tripp, also requires a good clustering key to be unique, stable, as narrow as possible, and ideally ever-increasing (which an INT IDENTITY is).
See her articles on indexing here:
GUIDs as PRIMARY KEYs and/or the clustering key
The Clustered Index Debate Continues...
Ever-increasing clustering key - the Clustered Index Debate..........again!
Disk space is cheap - that's not the point!
and also see Jimmy Nilsson's The Cost of GUIDs as Primary Key
A GUID is a really bad choice for a clustering key, since it's wide and totally random, and thus leads to bad index fragmentation and poor performance. Also, the clustering key column(s) are stored in each and every entry of each and every non-clustered (additional) index, so you really want to keep the key small. A GUID is 16 bytes vs. 4 bytes for an INT, and with several non-clustered indexes and several million rows, this makes a HUGE difference.
In SQL Server, your primary key is by default your clustering key - but it doesn't have to be. You can easily use a GUID as your NON-Clustered primary key, and an INT IDENTITY as your clustering key - it just takes a bit of being aware of it.
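The fragmentation argument can be illustrated with a toy model in Python. This is a simplification, not SQL Server's actual page-split algorithm:

- Leaf pages hold a fixed number of keys.
- Inserting a key beyond the current highest key starts a new page.
- Inserting into a full page anywhere else splits it 50/50.

Ever-increasing keys (like INT IDENTITY, and roughly like NEWSEQUENTIALID()) leave pages full. Random keys (like NEWID()) leave them partly empty after splits. The capacity of 100 keys per page, the key count, and the seed are arbitrary assumptions.

```python
import bisect
import random
import uuid


def average_page_fill(keys, capacity=100):
    """Insert keys into a toy index leaf level; return average page fullness (0..1)."""
    pages = []  # each page is a sorted list of keys; pages are kept in key order
    for key in keys:
        if not pages:
            pages.append([key])
            continue
        # Find the page whose key range should hold this key.
        firsts = [page[0] for page in pages]
        i = max(bisect.bisect_right(firsts, key) - 1, 0)
        page = pages[i]
        if len(page) < capacity:
            bisect.insort(page, key)
        elif i == len(pages) - 1 and key > page[-1]:
            # Appending past the end of the index: start a fresh page, no split.
            pages.append([key])
        else:
            # Inserting into a full page mid-index: 50/50 page split.
            bisect.insort(page, key)
            half = len(page) // 2
            pages[i:i + 1] = [page[:half], page[half:]]
    return sum(len(page) for page in pages) / (len(pages) * capacity)


rng = random.Random(42)
N = 10_000
seq_fill = average_page_fill(range(N))  # ever-increasing keys
rnd_fill = average_page_fill(uuid.UUID(int=rng.getrandbits(128)) for _ in range(N))  # random GUIDs

print(f"sequential keys: {seq_fill:.0%} full")
print(f"random GUIDs:    {rnd_fill:.0%} full")
```

In this model, random keys typically end up with pages only about 70% full. That means more pages to read and cache for the same data. The exact figure depends on the model's parameters.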

Foreign keys issue

I have a table created using the query
CREATE TABLE branch_dim (
branch_id numeric(18,0) NOT NULL,
country_name varchar(30),
island_name char(30),
region_name varchar(30),
branch_name varchar(30),
region_manager varchar(30),
marketing_manager varchar(30),
branch_manager varchar(30),
promoter_main varchar(30),
promoter_other varchar(30),
PRIMARY KEY (branch_id,island_name)
) ON branch_dim_scheme(island_name)
Now I have another table
CREATE TABLE order_fact (
branch_id numeric(18,0) NOT NULL,
product_id numeric(18,0) NOT NULL,
order_id numeric(18,0) NOT NULL,
day_id numeric(18,0) NOT NULL,
FOREIGN KEY (branch_id) REFERENCES branch_dim (branch_id),
)
The first table is partitioned on island_name, which is why its primary key has two columns. Now, when I run the second query, I get the error:
"There are no primary keys or candidate keys in the referenced table 'branch_dim' that matches the reference column list in the foreign key 'FK_order_fac_branc_10234AD'"
What might be the problem ?

You've defined the primary key on branch_dim as a composite primary key made up of branch_id and island_name. When you create order_fact, you're trying to reference only branch_id as your foreign key.
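Here is a minimal sketch of the problem and the fix. It uses SQLite via Python's sqlite3 module as a stand-in for SQL Server, with these assumptions:

- Columns are trimmed to the keys.
- int replaces numeric(18,0).
- island_name is varchar(30).

One behavioral difference to note: SQL Server rejects the partial foreign key when the table is created, as in the question. SQLite only reports a "foreign key mismatch" when a row is written, so the sketch triggers it with an INSERT:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when this is on
conn.execute("""
    CREATE TABLE branch_dim (
        branch_id int NOT NULL,
        island_name varchar(30) NOT NULL,
        PRIMARY KEY (branch_id, island_name)
    )""")
conn.execute("INSERT INTO branch_dim VALUES (1, 'North')")

# Referencing only part of the composite key: no matching key in the parent.
conn.execute("""
    CREATE TABLE order_fact_partial (
        branch_id int NOT NULL,
        FOREIGN KEY (branch_id) REFERENCES branch_dim (branch_id)
    )""")
try:
    conn.execute("INSERT INTO order_fact_partial VALUES (1)")
    partial_fk_usable = True
except sqlite3.Error:  # SQLite reports "foreign key mismatch" here
    partial_fk_usable = False

# Referencing the whole key works.
conn.execute("""
    CREATE TABLE order_fact (
        branch_id int NOT NULL,
        island_name varchar(30),
        order_id int NOT NULL,
        FOREIGN KEY (branch_id, island_name)
            REFERENCES branch_dim (branch_id, island_name)
    )""")
conn.execute("INSERT INTO order_fact VALUES (1, 'North', 100)")      # parent exists
try:
    conn.execute("INSERT INTO order_fact VALUES (1, 'South', 101)")  # no such parent
    orphan_accepted = True
except sqlite3.IntegrityError:
    orphan_accepted = False
```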

Your table has a composite primary key:
CREATE TABLE branch_dim (
PRIMARY KEY (branch_id,island_name)
Hence, any foreign key reference to that table also must use both elements for its foreign key (you need to reference the key, the whole key, and nothing but the key - so help you Codd :-):
CREATE TABLE order_fact (
branch_id numeric(18,0) NOT NULL,
island_name char(30),
product_id numeric(18,0) NOT NULL,
order_id numeric(18,0) NOT NULL,
day_id numeric(18,0) NOT NULL,
FOREIGN KEY (branch_id, island_name)
REFERENCES branch_dim (branch_id, island_name)
)
Word of advice: for anything longer than 5 characters or so, I would never use CHAR(x) as the data type. CHAR(30) creates a field that is always 30 characters long, whether you store that many characters in it or not. If you store fewer, the value is padded with spaces to the defined length (30 chars).
For anything larger than 5 or so characters, I would recommend always using VARCHAR instead!
The same goes for numeric(18,0): for an ID field, I would always use INT. Much nicer, cleaner, smaller, just plain better!

You need to make the primary key of branch_dim just branch_id and add an index on island_name. Also, are your branch_ids really numeric(18, 0)? If so, I would make a surrogate primary key (something that can be auto-incremented: int or bigint, identity).
As it is, your primary key (and thus clustered index) is very wide. This will degrade performance and I'm guessing, in your scenario, fragment your clustered index (bad).

I solved the problem by making the primary key field (branch_id) NONCLUSTERED and UNIQUE, and made the island_name field as [...]. That way I had only a single-column primary key, and my partition key is island_name. This solved my problem. Thanks, all, for the help.
