Why do we require secondary indices in DBMS?

I understand that a primary index is unique to each record, so retrieving a record through it is fast. What happens when we use secondary indexing?
From what I can think of, given this table:
ID Name School
1 John XYZ
2 Roger XYZ
3 Ray ABC
4 Matt KJL
5 Roger ABC
If we have a secondary index on Name, it will help me retrieve records by name rather than by ID, so a query for Roger is not restricted to one record and returns both Rogers. Hence, if the table is queried extensively on Name, a secondary index on it should be used.
Am I right?
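To make the scenario concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name Student and the index name idx_student_name are made up for illustration, and other engines may report their query plans differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table holding the sample rows from the question.
conn.execute("CREATE TABLE Student (ID INTEGER PRIMARY KEY, Name TEXT, School TEXT)")
conn.executemany(
    "INSERT INTO Student (ID, Name, School) VALUES (?, ?, ?)",
    [(1, "John", "XYZ"), (2, "Roger", "XYZ"), (3, "Ray", "ABC"),
     (4, "Matt", "KJL"), (5, "Roger", "ABC")],
)

# Secondary (non-unique) index on Name.
conn.execute("CREATE INDEX idx_student_name ON Student (Name)")

# A lookup by Name may return several rows, unlike a lookup by primary key.
rogers = conn.execute(
    "SELECT ID, Name, School FROM Student WHERE Name = ? ORDER BY ID", ("Roger",)
).fetchall()
print(rogers)  # [(2, 'Roger', 'XYZ'), (5, 'Roger', 'ABC')]

# Ask SQLite how it would run the query; the last column describes the plan.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM Student WHERE Name = 'Roger'"
).fetchall()
plan_text = " ".join(row[3] for row in plan)
print(plan_text)
```

The plan text should mention idx_student_name, which shows the engine searching the index instead of scanning the whole table.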

Apart from speeding up specific queries, perhaps the most common case for secondary indexes is to speed up checking of UNIQUE constraints. Consider e.g. a table
CREATE TABLE Person (
id int primary key,
fname text not null,
lname text not null,
date_of_birth date not null,
...
UNIQUE (fname, lname, date_of_birth)
)
Here we want to enforce the UNIQUE constraint to ensure the same person doesn't appear in the table multiple times under different ids. But at the same time we wouldn't want to make (fname, lname, date_of_birth) the primary key, because a person's name could potentially change, and because using 3 attributes as reference can be cumbersome.
Now, when inserting a new record into the table, the DBMS needs to check whether it already contains another tuple with the same (fname, lname, date_of_birth), and a secondary index on these attributes can help speed this check up.
Note that UNIQUE constraints automatically generate their indexes, so there is no need to create them explicitly.
Another common case where secondary indexes are required (and must be created explicitly) is a foreign key constraint that targets attributes that do not make up the primary key of the target table.
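The UNIQUE check described above can be tried out with a small sketch in Python's sqlite3 module. The sample person is invented, the elided columns from the original table are left out, and dates are stored as text because that is how SQLite usually handles them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Person (
        id INTEGER PRIMARY KEY,
        fname TEXT NOT NULL,
        lname TEXT NOT NULL,
        date_of_birth TEXT NOT NULL,
        UNIQUE (fname, lname, date_of_birth)
    )
""")

def try_insert(person_id, fname, lname, dob):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        conn.execute(
            "INSERT INTO Person (id, fname, lname, date_of_birth) VALUES (?, ?, ?, ?)",
            (person_id, fname, lname, dob),
        )
        return True
    except sqlite3.IntegrityError:
        return False

first = try_insert(1, "Ada", "Lovelace", "1815-12-10")
# Same person under a different id: rejected by the UNIQUE constraint.
duplicate = try_insert(2, "Ada", "Lovelace", "1815-12-10")
# Same name but a different date of birth: accepted.
namesake = try_insert(3, "Ada", "Lovelace", "1990-01-01")
print(first, duplicate, namesake)  # True False True
```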

Related

Can I use identity for primary key in more than one table in the same ER model

As the title says, my question is: can I use int identity(1,1) for the primary key in more than one table in the same ER model? I found on the Internet that a primary key needs to have a unique value in every row. For example, if I set int identity(1,1) for this table:
CREATE TABLE dbo.Persons
(
Personid int IDENTITY(1,1) PRIMARY KEY,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Age int
);
GO
and the other table
CREATE TABLE dbo.Job
(
jobID int IDENTITY(1,1) NOT NULL PRIMARY KEY,
nameJob NVARCHAR(25) NOT NULL,
Personid int FOREIGN KEY REFERENCES dbo.Persons(Personid)
);
Wouldn't Personid and jobID have the same value and because of that cause an error?

Constraints in general are defined on, and scoped to, one table (object) in the database. The only exception is the FOREIGN KEY, which usually has a REFERENCE to another table.
The PRIMARY KEY (or any UNIQUE key) sets a constraint only on the table it is defined on; it neither affects nor is affected by constraints on other tables.
The PRIMARY KEY defines a column or a set of columns which can be used to uniquely identify one record in one table (and none of the columns can hold NULL; UNIQUE, on the other hand, allows NULLs, and how they are treated may differ between database engines).
So yes, you might have the same value for PersonID and JobID, but their meaning is different. (And to select the one unique record, you will need to tell SQL Server in which table and in which column of that table you are looking for it, this is the table list and the WHERE or JOIN conditions in the query).
The queries SELECT * FROM dbo.Job WHERE JobID = 1; and SELECT * FROM dbo.Persons WHERE PersonID = 1; have different meanings even though the value you are searching for is the same.
You will define the IDENTITY on the table (the table can have only one IDENTITY column). You don't need to have an IDENTITY definition on a column to have the value 1 in it, the IDENTITY just gives you an easy way to generate unique values per table.
You can share generated values across tables by using a SEQUENCE, but that will not prevent you from manually inserting the same values into multiple tables.
In short, the value stored in the column is just a value, the table name, the column name and the business rules and roles will give it a meaning.
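A quick way to convince yourself is the following sketch. It uses Python's sqlite3 module, with AUTOINCREMENT standing in for SQL Server's IDENTITY(1,1), so it illustrates the idea and is not a T-SQL script; the sample names are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# AUTOINCREMENT plays the role of IDENTITY(1,1) in this SQLite sketch.
conn.execute("""
    CREATE TABLE Persons (
        Personid INTEGER PRIMARY KEY AUTOINCREMENT,
        LastName TEXT NOT NULL
    )
""")
conn.execute("""
    CREATE TABLE Job (
        jobID INTEGER PRIMARY KEY AUTOINCREMENT,
        nameJob TEXT NOT NULL,
        Personid INTEGER REFERENCES Persons (Personid)
    )
""")

# Each table generates its own values, so both start at 1 without any conflict.
person_id = conn.execute(
    "INSERT INTO Persons (LastName) VALUES ('Smith')"
).lastrowid
job_id = conn.execute(
    "INSERT INTO Job (nameJob, Personid) VALUES ('Clerk', ?)", (person_id,)
).lastrowid
print(person_id, job_id)  # 1 1
```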
To the notion "every table needs to have a PRIMARY KEY and IDENTITY", I would like to add that in most cases there are multiple (independent) keys in a table. Usually every entity has something you can call a business key, which is, loosely, the key that the business (humans) uses to identify something. This key has very similar, and often the same, characteristics as a PRIMARY KEY with IDENTITY.
This can be a product's barcode, an employee's ID card number, something that is generated in another system (say HR), or a code that is assigned to a customer or partner.
These business keys are useful for humans but not always useful for computers, although they could serve as the PRIMARY KEY.
In databases we (the developers, architects) like simplicity, and a business key can be very complex (in computer terms): it can consist of multiple columns and can cause performance issues (comparing strings is not the same as comparing numbers, and comparing multiple columns is less efficient than comparing one column). Worst of all, it might change over time. To resolve this, we tend to create our own technical key, which computers can use more easily and over which we have more control, so we use things like IDENTITYs and GUIDs and whatnot.

Can a primary key be equal to a different column?

I know that a primary key must be unique, but is it okay for a primary key to be equal to a different column in the same table by coincidence?
For instance, I have 2 tables. One table is called person that holds information about a person (ID, email, telephone, address, name). The other table is staff (ID, pID(person ID), salary, position).
In staff the ID column is the primary key and is used to uniquely identify a staff member. The number is from 1 - 100. However, the pID (person ID) may be equal to the ID. For instance the staff ID may be 1 and the pID that it references to may be equal to 1.
Is that okay?

The job of the primary key is to uniquely and reliably identify each row - therefore, it must be unique and NOT NULL - anything else is irrelevant.
If you just happen to have a second column with the exact same values - I'd be wondering why that is the case - but that doesn't in any way affect the primary key negatively.

The primary key of a table must be unique and not null. There are no restrictions on uniqueness between columns or between tables. It's 100% up to you.

Yes. There's no checking of relationships between different columns in a table.
The restriction you're worried about doesn't even make sense. Suppose you had a table for persons with columns ID, name, and year_of_birth. Such a restriction would not allow someone who was born in 1975 to have ID = 1975.

Should I use a unique constraint in a table even though it isn't necessarily required?

In Microsoft SQL Server, when creating tables, are there any downsides to using a unique constraint on a column even though you don't really need it to be unique?
An example would be descriptions for say a role in a user management system:
CREATE TABLE Role
(
ID TINYINT PRIMARY KEY NOT NULL IDENTITY(0, 1),
Title CHARACTER VARYING(32) NOT NULL UNIQUE,
Description CHARACTER VARYING(MAX) NOT NULL UNIQUE
)
My fear is that validating this constraint when doing frequent insertions in other tables will be a very time consuming process. I am unsure as to how this constraint is validated, but I feel like it could be done in a very efficient way or as a linear comparison.

Your fear is justified: UNIQUE constraints are implemented as indices, and these consume time and space.
So, whenever you insert a new row, the database has to update the table and also one index for each unique constraint.
So, according to you:
using a unique constraint on a column even though you don't really need it to be unique
the answer is no, don't use it: there are time and space downsides.
Your sample table would need a clustered index for the Id, and 2 extra indices, one for each unique constraint. This takes up space, and time to update the 3 indices on the inserts.
This would only be justified if you made queries filtering by those fields.
BY THE WAY:
The original post's sample table has several flaws:
that syntax is not SQL Server syntax (and you tagged this as SQL Server)
you cannot create an index in a varchar(max) column
If you correct the syntax and create this table:
CREATE TABLE Role
(
ID tinyint PRIMARY KEY NOT NULL IDENTITY(0, 1),
Title varchar(32) NOT NULL UNIQUE,
Description varchar(32) NOT NULL UNIQUE
)
You can then execute sp_help Role and you'll find the 3 indices.
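If you do not have a SQL Server instance at hand, a rough analogue can be sketched with Python's sqlite3 module, where PRAGMA index_list plays the role of sp_help. This is SQLite, not SQL Server: an INTEGER PRIMARY KEY there is the table's own row id and gets no separate index, so only the two constraint-backed indices are listed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Role (
        ID INTEGER PRIMARY KEY,
        Title VARCHAR(32) NOT NULL UNIQUE,
        Description VARCHAR(32) NOT NULL UNIQUE
    )
""")

# Each row is (seq, name, unique, origin, partial);
# origin 'u' marks an index created for a UNIQUE constraint.
indexes = conn.execute("PRAGMA index_list('Role')").fetchall()
for row in indexes:
    print(row[1], "unique" if row[2] else "non-unique", "origin:", row[3])

unique_constraint_indexes = [row[1] for row in indexes if row[3] == "u"]
```

Every one of these indices has to be maintained on each insert, which is the cost discussed above.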

The database creates an index which backs up the UNIQUE constraint, so it should be very low-cost to do the uniqueness check.
http://msdn.microsoft.com/en-us/library/ms177420.aspx
The Database Engine automatically creates a UNIQUE index to enforce the uniqueness requirement of the UNIQUE constraint. Therefore, if an attempt to insert a duplicate row is made, the Database Engine returns an error message that states the UNIQUE constraint has been violated and does not add the row to the table. Unless a clustered index is explicitly specified, a unique, nonclustered index is created by default to enforce the UNIQUE constraint.

Is it typically a good practice to constrain it if you know the data
will always be unique but it doesn't necessarily need to be unique for
the application to function correctly?
My question to you: would it make sense for two roles to have different titles but the same description? e.g.
INSERT INTO Role ( Title , Description )
VALUES ( 'CEO' , 'Senior manager' ),
( 'CTO' , 'Senior manager' );
To me it would seem to devalue the use of the description; if there were many duplications then it might make more sense to do something more like this:
INSERT INTO Role ( Title )
VALUES ( 'CEO' ),
( 'CTO' );
INSERT INTO SeniorManagers ( Title )
VALUES ( 'CEO' ),
( 'CTO' );
But then again you are not expecting duplicates.
I assume this is a low activity table. You say you fear validating this constraint when doing frequent insertions in other tables. Well, that will not happen (unless there is a trigger we cannot see that might update this table when another table is updated).
Personally, I would ask the designer (business analyst, whatever) to justify not applying a unique constraint. If they cannot, then I would impose the unique constraint based on common sense. As is usual for such a text column, I would also apply CHECK constraints, e.g. to disallow leading/trailing/double spaces, zero-length strings, etc.
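As a rough illustration of such CHECK constraints, here is a sketch using Python's sqlite3 module. The exact rules (trim, a double-space pattern, non-zero length) are one possible reading of the suggestion above and are written in SQLite syntax, so a SQL Server version would use different functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Role (
        ID INTEGER PRIMARY KEY,
        Title TEXT NOT NULL UNIQUE,
        CHECK (length(Title) > 0),       -- no zero-length string
        CHECK (Title = trim(Title)),     -- no leading/trailing spaces
        CHECK (Title NOT LIKE '%  %')    -- no double spaces
    )
""")

def try_insert(title):
    """Return True if the title was accepted, False if a constraint rejected it."""
    try:
        conn.execute("INSERT INTO Role (Title) VALUES (?)", (title,))
        return True
    except sqlite3.IntegrityError:
        return False

# The last 'CEO' is a duplicate and is rejected by the UNIQUE constraint.
candidates = ["CEO", "", " CTO", "Senior  manager", "CEO"]
results = [(title, try_insert(title)) for title in candidates]
for title, accepted in results:
    print(repr(title), "accepted" if accepted else "rejected")
```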

On SQL Server, the data type tinyint only gives you 256 distinct values. No matter what you do outside of the id column, you're not going to end up with a very big table. It will surely perform quickly even with a dozen indexed columns.
You usually need at least one unique constraint besides the surrogate key, though. If you don't have one, you're liable to end up with data like this.
1 First title First description
2 First title First description
3 First title First description
...
17 Third title Third description
18 First title First description
Tables that permit data like that are usually wrong. Any table that uses foreign key references to this table won't be able to report correctly, say, the number of "First title" used.
I'd argue that allowing multiple, identical titles for roles in a user management system is a design error. I'd probably argue that "title" is a really bad name for that column, too.

How to enforce uniques across multiple tables

I have the following tables in MySQL server:
Companies:
- UID (unique)
- NAME
- other relevant data
Offices:
- UID (unique)
- CompanyID
- ExternalID
- other data
Employees:
- UID (unique)
- OfficeID
- ExternalID
- other data
In each one of them the UID is unique identifier, created by the database.
There are foreign keys to ensure the links between Employee -> Office -> Company on the UID.
The ExternalID fields in Offices and Employees are the IDs provided to my application by the Company (my client(s), actually). The clients do not know (and do not care) about my own IDs, and all the data my application receives from them is identified solely by their IDs (i.e. ExternalID in my tables).
I.e. a request from the client in pseudo-language is like "I'm Company X, update the data for my employee Y".
I need to enforce uniqueness on the combination of CompanyID and Employees.ExternalID, so in my database there will be no duplicate ExternalID for the employees of the same company.
I was thinking about 3 possible solutions:
1. Change the schema for Employees to include CompanyID, and create a unique constraint on the two fields.
2. Create a trigger which validates the uniqueness upon update/insert in Employees.
3. Enforce the check at the application level (i.e. in my receiving service).
The alternative dbadmin in me says that (3) is the worst solution, as it does not protect the database from inconsistency in case of an application bug or something else, and it will most probably be the slowest one.
The trigger solution may be what I want, but it may become complicated, especially if multiple inserts/updates need to be performed in a single statement, and I'm not sure about its performance vs. (1).
And (1) looks like the fastest and easiest approach, but it kind of goes against my understanding of the relational model.
What is the opinion of the SO DB experts on the pros and cons of each approach, especially if there is a possibility of adding an additional level of indirection, i.e. Company -> Office -> Department -> Employee, where the same uniqueness (Company/Employee) needs to be preserved?

You're right - #1 is the best option.
Granted, I would question it at first glance (because of shortcutting) but knowing the business rule to ensure an employee is only related to one company - it makes sense.
Additionally, I'd have a foreign key relating the companyid in the employee table to the companyid in the office table. Otherwise, you allow an employee to be related to a company without an office. Unless that is acceptable...
Triggers are a last resort if the relationship can not be demonstrated in the data model, and servicing the logic from the application means the logic is centralized - there's no opportunity for bad data to occur, unless someone drops constraints (which means you have bigger problems).

Each of your company-provided tables should include CompanyID in the UNIQUE KEY over the company-provided ids.
Company-provided referential integrity should use company-provided ids:
CREATE TABLE company (
uid INT NOT NULL PRIMARY KEY,
name TEXT
);
CREATE TABLE office (
uid INT NOT NULL PRIMARY KEY,
companyID INT NOT NULL,
externalID INT NOT NULL,
UNIQUE KEY (companyID, externalID),
FOREIGN KEY (companyID) REFERENCES company (uid)
);
CREATE TABLE employee (
uid INT NOT NULL PRIMARY KEY,
companyID INT NOT NULL,
officeID INT NOT NULL,
externalID INT NOT NULL,
UNIQUE KEY (companyID, externalID),
FOREIGN KEY (companyID) REFERENCES company (uid),
FOREIGN KEY (companyID, officeID) REFERENCES office (companyID, externalID)
);
etc.
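To see how this schema behaves, here is a sketch using Python's sqlite3 module. It is translated to SQLite syntax (UNIQUE instead of MySQL's UNIQUE KEY, and foreign keys switched on explicitly), and the sample companies, offices and ids are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE company (uid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE office (
        uid INTEGER PRIMARY KEY,
        companyID INTEGER NOT NULL REFERENCES company (uid),
        externalID INTEGER NOT NULL,
        UNIQUE (companyID, externalID)
    );
    CREATE TABLE employee (
        uid INTEGER PRIMARY KEY,
        companyID INTEGER NOT NULL REFERENCES company (uid),
        officeID INTEGER NOT NULL,   -- the office's company-provided id
        externalID INTEGER NOT NULL,
        UNIQUE (companyID, externalID),
        FOREIGN KEY (companyID, officeID) REFERENCES office (companyID, externalID)
    );
    INSERT INTO company VALUES (1, 'X'), (2, 'Y');
    INSERT INTO office VALUES (1, 1, 10), (2, 2, 10);
""")

def insert_employee(company_id, office_ext_id, employee_ext_id):
    """Return True if the employee row was accepted, False if a constraint fired."""
    try:
        conn.execute(
            "INSERT INTO employee (companyID, officeID, externalID) VALUES (?, ?, ?)",
            (company_id, office_ext_id, employee_ext_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False

first = insert_employee(1, 10, 500)          # accepted
other_company = insert_employee(2, 10, 500)  # same external id, other company: accepted
duplicate = insert_employee(1, 10, 500)      # same company and external id: rejected
wrong_office = insert_employee(1, 99, 501)   # company 1 has no office 99: rejected
print(first, other_company, duplicate, wrong_office)  # True True False False
```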

Set auto_increment_increment to the number of tables you have.
SET auto_increment_increment = 3; (you might want to set this in your my.cnf)
Then manually set the starting auto_increment value of each table to different values
first table to 1, second table to 2, third table to 3
Table 1 will have values like 1,4,7,10,13,etc
Table 2 will have values like 2,5,8,11,14,etc
Table 3 will have values like 3,6,9,12,15,etc
Of course this is just ONE option, personally I'd just make it a combo value. Could be as simple as TableID, AutoincrementID, Where the TableID is constant in all rows.
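The interleaving can be checked with a few lines of plain Python. This only simulates the arithmetic (start value plus a fixed step) and does not talk to MySQL; the helper name auto_increment_values is made up.

```python
from itertools import count, islice

def auto_increment_values(start, increment, n):
    """First n ids a table would get with the given start value and step."""
    return list(islice(count(start, increment), n))

# Three tables, step 3, start values 1, 2 and 3.
tables = {f"table_{i}": auto_increment_values(i, 3, 5) for i in (1, 2, 3)}
for name, ids in tables.items():
    print(name, ids)

# No id appears in more than one table.
all_ids = [i for ids in tables.values() for i in ids]
ids_are_disjoint = len(all_ids) == len(set(all_ids))
```

Note that adding a fourth table later would mean changing the step, which is one reason the combined (TableID, AutoincrementID) value can be simpler.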

Database Design

I am making a webapp right now and I am trying to get my head around the database design.
I have a user model(username (which is primary key), password, email, website)
I have an entry model(id, title, content, comments, commentCount)
A user can only comment on an entry once. What is the best and most efficient way to go about doing this?
At the moment, I am thinking of another table that has username (from user model) and entry id (from entry model)
**username id**
Sonic 4
Sonic 5
Knuckles 2
Sonic 6
Amy 15
Sonic 20
Knuckles 5
Amy 4
So then to list comments for entry 4 it searches for id=4.
On a side note:
Instead of storing a commentCount, would it be better to calculate the comment count from the database each time when needed?

Your design is basically sound. Your third table should be named something like UsersEntriesComments, with fields UserName, EntryID and Comment. In this table, you would have a compound primary key consisting of the UserName and EntryID fields; this would enforce the rule that each user can comment on each entry only once. The table would also have foreign key constraints such that UserName must be in the Users table, and EntryID must be in the Entries table (the ID field, specifically).
You could add an ID field to the Users table, but many programmers (myself included) advocate the use of "natural" keys where possible. Since UserNames must be unique in your system, this is a perfectly valid (and easily readable) primary key.
Update: just read your question again. You don't need the Comments or the CommentsCount fields in your Entries table. Comments would properly be stored in the UsersEntriesComments table, and the counts would be calculated dynamically in your queries (saving you the trouble of updating this value yourself).
Update 2: James Black makes a good point in favor of not using UserName as the primary key, and instead adding an artificial primary key to the table (UserID or some such). If you use UserName as the primary key, allowing a user to change their user name is more difficult, as you have to change the username in all the related tables as well.

What exactly do you mean by
entry model(id, title, content, **comments**, commentCount)
(emphasis mine)? Since it looks like you have multiple comments per entity, they should be stored in a separate table:
comments(id, entry_id, content, user_id)
entry_id and user_id are foreign keys to respective tables. Now you just need to create a unique index on (entry_id, user_id) to ensure user can only add one comment per entity.
Also, you may want to create a surrogate (numeric, generated via sequence / identity) primary key for your users table instead of making user name your PK.

Here's my recommendation for your data model:
USERS table
USER_ID (pk, int)
USER_NAME
PASSWORD
EMAIL
WEBSITE
ENTRY table
ENTRY_ID (pk, int)
ENTRY_TITLE
CONTENT
ENTRY_COMMENTS table
ENTRY_ID (pk, fk)
USER_ID (pk, fk)
COMMENT
This setup allows an ENTRY to have 0+ comments. When a comment is added, the primary key being a composite key of ENTRY_ID and USER_ID means that the pair can only exist once in the table (IE: 1, 1 won't allow 1, 1 to be added again).
Do not store counts in a table - use a VIEW for that so the number can be generated based on existing data at the time of execution.
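Here is a sketch of that model using Python's sqlite3 module, with lowercase names and a view called entry_comment_counts that I made up for the example. Columns such as password and email are left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, user_name TEXT NOT NULL UNIQUE);
    CREATE TABLE entry (entry_id INTEGER PRIMARY KEY, entry_title TEXT NOT NULL);
    CREATE TABLE entry_comments (
        entry_id INTEGER NOT NULL REFERENCES entry (entry_id),
        user_id INTEGER NOT NULL REFERENCES users (user_id),
        comment TEXT NOT NULL,
        PRIMARY KEY (entry_id, user_id)  -- one comment per user per entry
    );
    -- The count is derived on demand instead of being stored.
    CREATE VIEW entry_comment_counts AS
        SELECT e.entry_id, COUNT(c.user_id) AS comment_count
        FROM entry AS e
        LEFT JOIN entry_comments AS c ON c.entry_id = e.entry_id
        GROUP BY e.entry_id;
    INSERT INTO users VALUES (1, 'Sonic'), (2, 'Amy');
    INSERT INTO entry VALUES (4, 'First entry'), (5, 'Second entry');
    INSERT INTO entry_comments VALUES (4, 1, 'Nice'), (4, 2, 'Agreed');
""")

# Sonic (user 1) tries to comment on entry 4 a second time.
try:
    conn.execute("INSERT INTO entry_comments VALUES (4, 1, 'Again')")
    second_comment_allowed = True
except sqlite3.IntegrityError:
    second_comment_allowed = False

counts = conn.execute(
    "SELECT entry_id, comment_count FROM entry_comment_counts ORDER BY entry_id"
).fetchall()
print(second_comment_allowed, counts)  # False [(4, 2), (5, 0)]
```

The LEFT JOIN keeps entries without comments in the view, so they show up with a count of 0.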

I wouldn't use the username as a primary ID. I would make a numeric id with autoincrement.
I would use that new id in the relations table, with a unique key on the 2 fields.

Even though it isn't in the question, you may want to have a userid that is the primary key; otherwise it will be difficult to let users change their username, or you will have to make sure people know that usernames cannot be changed.
Make the joined table have a unique constraint on the userid and entryid. That way the database forces that there is only one comment/entry/user.
It would help if you specified a database, btw.

It sounds like you want to guarantee that the set of comments is unique with respect to username X post_id. You can do this by using a unique constraint, or if your database system doesn't support that explicitly, with an index that does the same. Here's some SQL expressing that:
CREATE TABLE users (
username VARCHAR(10) PRIMARY KEY,
-- any other data ...
);
CREATE TABLE posts (
post_id INTEGER PRIMARY KEY,
-- any other data ...
);
CREATE TABLE comments (
username VARCHAR(10) REFERENCES users(username),
post_id INTEGER REFERENCES posts(post_id),
-- any other data ...
UNIQUE (username, post_id) -- Here's the important bit!
);
