UNIQUE argument for INDEX creation - what is it for? - sql

Why does the CREATE INDEX statement have a UNIQUE argument?
As I understand it, a non-clustered index contains a bookmark, a pointer to a row, which must be unique in order to distinguish even non-unique rows,
so doesn't that already ensure that a non-clustered index is unique?
Correct?
So, do I understand correctly that a non-unique index can exist only on a clustered table? I ask since
"A clustered index on a view must be unique" [1]
Since "The bottom, or leaf, level of the clustered index contains the actual data rows of the table" [1], do I understand correctly that the same effect as UNIQUE on a clustered index can be achieved by a unique constraint on (possibly all or part of) the columns of a table [2]?
Then, what does the UNIQUE argument bring to an index,
other than confusion over the definitions of basic concepts [3]?
Update:
This is again the same pitfall - explaining something already explained many times based on undefined terms converting all explanation to never-ending guessing game.
Please see my subquestion [4] which is really re-wording of this same question here.
Update2:
The problem lies in ambiguous or missing definitions, or in terms used improperly in improper contexts. If an index is defined as a structure that serves to (find and) identify/point to real data, then non-unique or NULL indexes do not make any sense. Bye
Cited:
[1]
CREATE INDEX (Transact-SQL)
http://msdn.microsoft.com/en-us/library/ms188783.aspx
[2]
CREATE TABLE (Transact-SQL)
http://msdn.microsoft.com/en-us/library/ms174979.aspx
[3]
Unique index or unique key?
[4]
what is index and can non-clustered index be non-unique?

While a non-unique index is sufficient to distinguish between rows (as you said), the UNIQUE index serves as a constraint: it will prevent duplicates from being entered into the database - where "duplicates" are rows containing the same data in the indexed columns.
Example:
Firstname | Lastname | Login
================================
Joe       | Smith    | joes
Joe       | Taylor   | joet
Susan     | Smith    | susans
Let's assume that login names are by default generated from first name + first letter of last name.
What happens when we try to add Joe Sciavillo to the database? Normally, the system would happily generate the login name joes and insert (Joe,Sciavillo,joes). Now we'd have two users with the same username - probably a Bad Thing.
Now let's say we have a UNIQUE index on the Login column - the database will check that no other row with the same data already exists before it allows inserting the new row. In other words, the attempt to insert another joes will be rejected, because that value wouldn't be unique in that column any more.
Of course, you could have unique indexes on multiple columns, in which case the combination of data would have to be unique (e.g. a unique index on Firstname,Lastname will happily accept a row with (Joe,Badzhanov), as the combination is not in the table yet, but would reject a second row with (Joe,Smith))
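The login example above can be tried out with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption made only so that the example is self-contained); the table name users and the index name ux_users_login are hypothetical.

```python
import sqlite3


def try_insert_duplicate_login():
    """Return (rejected, row_count) after trying to insert a second 'joes'."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (firstname TEXT, lastname TEXT, login TEXT)")
    # The UNIQUE index doubles as a constraint on the login column.
    conn.execute("CREATE UNIQUE INDEX ux_users_login ON users (login)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?, ?)",
        [
            ("Joe", "Smith", "joes"),
            ("Joe", "Taylor", "joet"),
            ("Susan", "Smith", "susans"),
        ],
    )
    try:
        # Joe Sciavillo would also get the generated login 'joes'.
        conn.execute(
            "INSERT INTO users VALUES (?, ?, ?)", ("Joe", "Sciavillo", "joes")
        )
        rejected = False
    except sqlite3.IntegrityError:
        # The database refuses the row because 'joes' is already indexed.
        rejected = True
    row_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
    conn.close()
    return rejected, row_count
```

Calling try_insert_duplicate_login() should report that the fourth insert was rejected and that the table still holds the three original rows.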

The UNIQUE index clause is really just a quirk of syntax in SQL Server and some other DBMSs. In Standard SQL, uniqueness constraints are implemented through the use of the PRIMARY KEY and UNIQUE CONSTRAINT syntax, not through indexes (there are no indexes in standard SQL).
The mechanism SQL Server uses internally to implement uniqueness constraints is called a unique index. A unique index gets created automatically for you whenever you create a PRIMARY KEY or UNIQUE constraint. For reasons best known to the SQL Server development team they decided to expose the UNIQUE keyword as part of the CREATE INDEX syntax, even though the constraint syntax does the same job.
In the interests of clarity and standards support I would recommend you avoid creating UNIQUE indexes explicitly wherever possible. Use the PRIMARY KEY or UNIQUE constraint syntax instead.

The UNIQUE clause specifies that the values in the column(s) must be unique across the table, essentially adding a unique constraint. A clustered index on a table specifies that the ordering of the rows in the table will be the same as the index. A non-clustered index does not change the physical ordering, which is why it is OK to have multiple non-clustered but only one clustered index. You can have unique or non-unique clustered and non-clustered indexes on a table.

I think the underlying question is: what is the difference between unique and non-unique indexes?
The answer is that entries in unique indexes can each only point to a single row, while entries in non-unique indexes can point to many rows.
For example, consider an order item table:
ORDER_NO INTEGER
LINE_NO INTEGER
PRODUCT_NO INTEGER
QUANTITY DECIMAL
- with a unique index on ORDER_NO and LINE_NO, and a non-unique index on PRODUCT_NO.
For a single combination of ORDER_NO and LINE_NO there can only be one entry in the table, while for a single value of PRODUCT_NO there can be many entries in the table (because there will be many entries for that value in the index).
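A rough sketch of the order item table, again using Python's sqlite3 module as a stand-in engine (an assumption, as are the table name order_item and the index names), shows both kinds of index side by side:

```python
import sqlite3


def order_item_demo():
    """Return (rows_for_order_line, rows_for_product, duplicate_rejected)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE order_item (
            order_no   INTEGER,
            line_no    INTEGER,
            product_no INTEGER,
            quantity   DECIMAL
        );
        -- One index entry can match at most one row.
        CREATE UNIQUE INDEX ux_order_line ON order_item (order_no, line_no);
        -- One index value can match many rows.
        CREATE INDEX ix_product ON order_item (product_no);
        """
    )
    conn.executemany(
        "INSERT INTO order_item VALUES (?, ?, ?, ?)",
        [(1, 1, 100, 2), (1, 2, 100, 1), (2, 1, 100, 5)],
    )
    by_order_line = conn.execute(
        "SELECT COUNT(*) FROM order_item WHERE order_no = 1 AND line_no = 1"
    ).fetchone()[0]
    by_product = conn.execute(
        "SELECT COUNT(*) FROM order_item WHERE product_no = 100"
    ).fetchone()[0]
    try:
        # Same (order_no, line_no) pair as an existing row.
        conn.execute("INSERT INTO order_item VALUES (1, 1, 200, 1)")
        duplicate_rejected = False
    except sqlite3.IntegrityError:
        duplicate_rejected = True
    conn.close()
    return by_order_line, by_product, duplicate_rejected
```

The lookup by (ORDER_NO, LINE_NO) can never return more than one row, while the lookup by PRODUCT_NO returns every line that sold that product.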

Related

Are there performance differences in queries with UNIQUE NON NULL indexes and Primary keys?

I want to search a DB with either the PK or a unique non null field that is indexed. Are there any performance differences between those? I am using Postgres as my DB. But a general DB-independent answer would be good too.
In PostgreSQL, all indexes are secondary (unclustered) indexes. That means the index points to the heap, the data structure holding the actual column data. So, a primary key's index doesn't have any structural advantage over a UNIQUE index: SELECTs using the index for filtering must then bounce over to the heap for the data.
In fact, it might be the other way around, because PostgreSQL indexes (version 11 and later) can have INCLUDE clauses.
For example consider a table with uniqueid, a, b, and c columns. If your workload is heavy with SELECT b FROM tbl WHERE uniqueid = something queries, you can declare this covering index.
CREATE UNIQUE INDEX uniq ON tbl(uniqueid) INCLUDE (b);
Your whole query can then be satisfied from the index. That saves the extra trip to the heap, and so saves IO and CPU time.
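The covering effect can be observed with a small sketch. It does not use PostgreSQL: it uses Python's built-in sqlite3 module, and because SQLite has no INCLUDE clause, the sketch swaps in a plain two-column index on (uniqueid, b), which covers the same query but does not enforce uniqueness on uniqueid alone. The index name is hypothetical.

```python
import sqlite3


def plan_for_lookup():
    """Return SQLite's query plan text for SELECT b ... WHERE uniqueid = ?."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (uniqueid INTEGER, a TEXT, b TEXT, c TEXT)")
    # Stand-in for INCLUDE (b): b is stored in the index as a trailing key column.
    conn.execute("CREATE INDEX ix_tbl_uniqueid_b ON tbl (uniqueid, b)")
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT b FROM tbl WHERE uniqueid = ?", (42,)
    ).fetchall()
    conn.close()
    # The human-readable plan step is the last column of each plan row.
    return " ".join(str(row[-1]) for row in rows)
```

If the plan text mentions a covering index, the query was answered from the index alone, without a second trip to the table.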
MySQL (with InnoDB) and SQL Server (by default), on the other hand, use clustered indexes for their primary keys. That is, the table's data is stored in the primary key's index. So, the PK is, automatically, basically an index created like this.
CREATE UNIQUE INDEX pk ON tbl(uniqueid) INCLUDE (a, b, c);
In those databases the PK's index does have an advantage over a separate UNIQUE index, which necessarily is a secondary or unclustered index. (Note: MySQL's indexes don't have INCLUDE() clauses.)

SQL primary key on lookup table or unique constraint?

I want to create a lookup table 'orderstatus' (see below). Just to clarify, this is to be used in a data warehouse. I will need to join through OrderStatus to retrieve the INT (if I create one) to be used elsewhere if need be. In a fact table, for example, I would store the int to link to the lookup table.
+------------------------+------------------+
| OrderStatus | ConnectionStatus |
+------------------------+------------------+
| CLOSED | APPROVE |
+------------------------+------------------+
| COMPLETED | APPROVE |
+------------------------+------------------+
| FULFILLED | APPROVE |
+------------------------+------------------+
| CANCELLED | CLOSED |
+------------------------+------------------+
| DECLINED | CLOSED |
+------------------------+------------------+
| AVS_CHECK_SYSTEM_ERROR | CLOSED |
+------------------------+------------------+
What is best practice in terms of primary key/unique key? Should I just create an OrderStatusKey INT as the primary key with identity? Or create a unique constraint on OrderStatus? Thanks.
For this, I would suggest you create an Identity column, and make that the clustered primary key.
It is considered best practice for tables to have a primary key of some kind, and having a clustered index on a table like this is the fastest way to allow for the use of this table in multi-table queries (with joins).
Here is a sample as to how to add it:
ALTER TABLE dbo.orderstatus
ADD CONSTRAINT PK_orderstatus_OrderStatusID PRIMARY KEY CLUSTERED (OrderStatusID);
GO
Article with more details MSDN
And here is another resource for explaining a primary key Primary Key Primer
If OrderStatus is unique and the primary identifier AND you will be reusing this status code directly in related tables (and not a numeric pointer to this status code) then keep the columns as is and make OrderStatus the primary clustered index.
A little explanation:
A primary key is unique across the table; a clustered index ties all record data back to that index. It is not always necessary to have the primary key also be the clustered index on the table but usually this is the case.
If you are going to be linking to the order status using something other than the status code then create another column of type int as an IDENTITY and make that the primary clustered key. Also add a unique non-clustered index to OrderStatus to ensure that no duplicates could ever be added.
Either way you go every table should have a primary key as well as a clustered index (again, usually they are the same index).
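The surrogate-key layout described above (an integer identity primary key plus a unique index on OrderStatus) can be sketched as follows. This is only an illustration that assumes Python's built-in sqlite3 module in place of SQL Server, so AUTOINCREMENT stands in for IDENTITY and there is no CLUSTERED keyword; the index name UX_orderstatus_OrderStatus is hypothetical.

```python
import sqlite3


def build_orderstatus():
    """Return (id_of_closed, duplicate_rejected) for the lookup table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE orderstatus (
            OrderStatusID    INTEGER PRIMARY KEY AUTOINCREMENT,
            OrderStatus      TEXT NOT NULL,
            ConnectionStatus TEXT NOT NULL
        )
        """
    )
    # Guarantees that no status text can be entered twice.
    conn.execute(
        "CREATE UNIQUE INDEX UX_orderstatus_OrderStatus ON orderstatus (OrderStatus)"
    )
    conn.executemany(
        "INSERT INTO orderstatus (OrderStatus, ConnectionStatus) VALUES (?, ?)",
        [
            ("CLOSED", "APPROVE"),
            ("COMPLETED", "APPROVE"),
            ("FULFILLED", "APPROVE"),
            ("CANCELLED", "CLOSED"),
            ("DECLINED", "CLOSED"),
            ("AVS_CHECK_SYSTEM_ERROR", "CLOSED"),
        ],
    )
    # A fact table would store this integer rather than the status text.
    id_of_closed = conn.execute(
        "SELECT OrderStatusID FROM orderstatus WHERE OrderStatus = ?", ("CLOSED",)
    ).fetchone()[0]
    try:
        conn.execute(
            "INSERT INTO orderstatus (OrderStatus, ConnectionStatus) VALUES (?, ?)",
            ("CLOSED", "CLOSED"),
        )
        duplicate_rejected = False
    except sqlite3.IntegrityError:
        duplicate_rejected = True
    conn.close()
    return id_of_closed, duplicate_rejected
```

The integer key is what referencing tables would join on, while the unique index keeps the human-readable status codes free of duplicates.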
Here are some things to consider:
PRIMARY KEY ensures that there are no NULL values or duplicates in the key columns.
UNIQUE KEY can contain NULL and (by the ANSI standard) any number of NULLs. (SQL Server itself allows only one NULL per unique key by default; allowing several requires a filtered unique index, and a NOT NULL constraint rules NULLs out entirely.)
The CLUSTERED INDEX contains all the data related to a row on the leaves.
When the CLUSTERED INDEX is not unique, SQL Server adds a hidden 4-byte uniquifier (not a GUID) to rows with duplicate key values in order to distinguish the individual records.
All non-clustered indexes locate rows using either the key column values of the clustered index or the row ID of a heap table.
The query optimizer uses the index statistics to find the best way to execute a query.
For small tables, the indexes are usually ignored, since doing an index scan and then a lookup for each value is more expensive than doing a full table scan (which will read one or two pages when you have really small tables).
Status lookup tables are usually very small and can be stored on one page.
The referencing tables will store the PK value (or unique) in their structure (this is what you'll use to do a join too). You can have a slight performance benefit if you have an integer key to use as reference (aka IDENTITY in SQL Server).
If you usually don't want to list the ConnectionStatus, then using the actual display value (OrderStatus) can be beneficial, since you don't have to join the lookup table.
You can store both values in the referencing tables, but maintaining both columns has some overhead and leaves more room for errors.
The clustered/non-clustered question depends on the use cases of this table. If you usually use the OrderStatus for filtering (using the textual form), a NON CLUSTERED IDENTITY PK and a CLUSTERED UNIQUE index on the OrderStatus can be beneficial. However (as you can read above), in small tables the effect/performance gain is usually negligible.
If you are not familiar with the above things and you feel it is safer, then create an identity clustered PK (OrderKey or OrderID) and a unique non-clustered key on the OrderStatus.
Use the PK as the referencing/referenced column in foreign keys.
One more thing: if this column will be referenced by only one table, you may want to consider creating an indexed view which contains both tables' data.
Also, I would suggest adding a dummy value that you can use if there is no status set (and use it as the default for all referencing columns). Because not set is still a status, isn't it?

What is the most efficient strategy for lookups on a large, static table which is already in sorted order (sqlite)?

I have a basic reverse lookup table in which the ids are already sorted in ascending numerical order:
id INT NOT NULL,
value INT NOT NULL
The ids are not unique; each id has from 5 to 25,000 associated values. Each id is independent, i.e., no relationships between the ids.
The table is static. Read only, no inserts or updates ever. The table has 100-200 million records. The database itself will be around 7-12gb. Sqlite.
I will do frequent lookups in this table and want the fastest response time for each query. Lookups are one-direction only, unordered, and always of the form:
SELECT value WHERE id IN (x,y,z)
What advantages does the pre-sorted order give me in terms of database efficiency? What should I do differently than I would with typical unordered tables? How do I tell sql that it's an ordered list?
What about indices: is it necessary or even helpful to create an index on id?
[Updated for clustered comment thanks to Gordon Linoff]. As far as I can tell, sqlite doesn't support clustered indices directly. The wiki says: "Are [clustered indices] supported? No, but if you use INTEGER PRIMARY KEY it acts as a clustered index." In my situation, the column id is not unique...
Assuming that space is not an issue, you should create an index on (id, value). This should be sufficient for your purposes.
However, if the table is static, then I would recommend that you create a clustered index when you create the table. The index would have the same keys, (id, value).
If the table happens to be sorted, the database does not know about this, so you'd still need an index.
It is a better idea to use a WITHOUT ROWID table (what other DBs call a clustered index):
CREATE TABLE MyLittleLookupTable (
    id INTEGER,
    value INTEGER,
    PRIMARY KEY (id, value)
) WITHOUT ROWID;
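A minimal sketch of how such a table might be queried, using Python's built-in sqlite3 module and a handful of made-up rows (the sample data and the helper name lookup_values are assumptions for illustration only):

```python
import sqlite3


def lookup_values(ids):
    """Return the sorted values stored for the given ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE MyLittleLookupTable (
            id INTEGER,
            value INTEGER,
            PRIMARY KEY (id, value)
        ) WITHOUT ROWID
        """
    )
    # Made-up sample rows; ids repeat, (id, value) pairs do not.
    conn.executemany(
        "INSERT INTO MyLittleLookupTable VALUES (?, ?)",
        [(1, 10), (1, 11), (2, 20), (3, 30)],
    )
    # One placeholder per requested id keeps the query parameterized.
    placeholders = ", ".join("?" for _ in ids)
    rows = conn.execute(
        f"SELECT value FROM MyLittleLookupTable WHERE id IN ({placeholders})",
        list(ids),
    ).fetchall()
    conn.close()
    return sorted(value for (value,) in rows)
```

Because the table itself is stored in (id, value) order, the rows for each requested id sit next to each other and no separate index on id is needed.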

Unique Constraint in SQL with Multiple NULL Values

I recently read about a way to ensure unique values in a column in SQL while allowing multiple NULLS.
This was done using filtered indexes:
CREATE UNIQUE INDEX indexName ON tableName(columns) INCLUDE includeColumns
WHERE columnName IS NOT NULL
Could someone explain how this actually works? Is the UNIQUE constraint created on the column or not?
To answer your first question: When the index is filtered, anything that doesn't fit the criteria in the WHERE clause is simply left out of the index.
If the index is unique, the uniqueness is enforced only on the data that fits the criteria in the WHERE clause.
To answer your second question: In SQL Server, unique constraints are implemented by creating unique indexes under the hood, so there really is not much difference between them. In any case the uniqueness is enforced on an index and not directly on the table column.
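The behaviour of a filtered unique index can be sketched with Python's built-in sqlite3 module, where the equivalent feature is called a partial index. Two assumptions are worth flagging: SQLite is only a stand-in for SQL Server, and because SQLite's unique indexes already treat NULLs as distinct, the sketch filters on a hypothetical active flag instead of IS NOT NULL. The table and index names are made up.

```python
import sqlite3


def filtered_unique_demo():
    """Return (row_count, duplicate_rejected) for a partial unique index."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (login TEXT NOT NULL, active INTEGER NOT NULL)")
    # Only rows with active = 1 are stored in the index and checked for uniqueness.
    conn.execute(
        "CREATE UNIQUE INDEX ux_accounts_active_login "
        "ON accounts (login) WHERE active = 1"
    )
    # Rows outside the filter may repeat freely.
    conn.execute("INSERT INTO accounts VALUES ('joes', 0)")
    conn.execute("INSERT INTO accounts VALUES ('joes', 0)")
    # The first row inside the filter is accepted.
    conn.execute("INSERT INTO accounts VALUES ('joes', 1)")
    try:
        # A second row inside the filter with the same login is a duplicate.
        conn.execute("INSERT INTO accounts VALUES ('joes', 1)")
        duplicate_rejected = False
    except sqlite3.IntegrityError:
        duplicate_rejected = True
    row_count = conn.execute("SELECT COUNT(*) FROM accounts").fetchone()[0]
    conn.close()
    return row_count, duplicate_rejected
```

The same reasoning applies to WHERE columnName IS NOT NULL in SQL Server: rows with NULL fall outside the filter, so any number of them can coexist.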

Unique clustered index on two columns for an indexed view

I'm trying to set up an indexed view on a table that doesn't have a unique id. It has two identifiers that, if combined, would be unique for its row. I was having trouble creating the unique clustered index that the indexed view requires when I found a thread on MSDN where folks all agree it is possible to create a unique clustered index out of 2 columns for an indexed view: http://social.msdn.microsoft.com/Forums/en/transactsql/thread/f2c99845-3af1-46e8-9b52-363c24988744
But for the life of me, can't figure out how to create it. I'm rolling with this query, but it doesn't seem to cut it.
CREATE UNIQUE CLUSTERED INDEX [PK] ON MyView
(
MyId1, MyId2
)
Error:
The CREATE UNIQUE INDEX statement terminated because a duplicate key
was found for the object name 'dbo.MyView' and the index name 'PK'.
The duplicate key value is (71cd9b68-1a9e-47bc-bc6b-0008b230a6d8,
0e64aa3a-0631-4caf-82d9-73609ee79b19).
The two IDs listed as duplicates are IDs from MyId2.
So, how could I create a unique clustered index here?
Well, the error message seems to suggest that there is more than one record where MyId1 = 71cd9b68-1a9e-47bc-bc6b-0008b230a6d8 and MyId2 = 0e64aa3a-0631-4caf-82d9-73609ee79b19.
I would recommend running a query that selects based only on those criteria and confirming that it returns only one record. If it returns more, then you cannot create a UNIQUE index on these two columns unless you eliminate the duplicates.
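A broader version of that check lists every duplicated pair at once with GROUP BY ... HAVING. The sketch below runs it through Python's built-in sqlite3 module against a small made-up table named MyViewRows (a hypothetical stand-in for the rows of dbo.MyView); on SQL Server the same SELECT would be run against the view itself.

```python
import sqlite3

DUPLICATE_QUERY = """
    SELECT MyId1, MyId2, COUNT(*) AS occurrences
    FROM MyViewRows
    GROUP BY MyId1, MyId2
    HAVING COUNT(*) > 1
"""


def find_duplicate_keys(conn):
    """Return (MyId1, MyId2, occurrences) for every pair that appears more than once."""
    return conn.execute(DUPLICATE_QUERY).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyViewRows (MyId1 TEXT, MyId2 TEXT)")
conn.executemany(
    "INSERT INTO MyViewRows VALUES (?, ?)",
    [
        # The pair reported in the error message, present twice.
        ("71cd9b68-1a9e-47bc-bc6b-0008b230a6d8", "0e64aa3a-0631-4caf-82d9-73609ee79b19"),
        ("71cd9b68-1a9e-47bc-bc6b-0008b230a6d8", "0e64aa3a-0631-4caf-82d9-73609ee79b19"),
        # A made-up pair that occurs only once.
        ("aaaaaaaa-0000-0000-0000-000000000001", "bbbbbbbb-0000-0000-0000-000000000002"),
    ],
)
duplicates = find_duplicate_keys(conn)
conn.close()
```

Any pair returned here has to be deduplicated (or the view definition changed) before the unique clustered index can be created.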