If I add the same row to database 999999 times, does the sql engine recognize it? - sql-server-express

I use SQL Server Express 2012. If I add the same row to the database 999999 times, does the SQL engine recognize it and avoid adding the data 999999 times? Since it is the same data, does it, under the hood, just save the id and point back to the first record to save space?

It depends on how you have defined your table structure. If there is a primary key or unique key (which does not appear to be present, as per your statement), then it will not allow the duplicates. Otherwise it will store all the data in the table. Also, it does not make sense to store that huge amount of duplicates in your table. :)
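A minimal sketch of that point (the table and column names below are made up): without a key the engine happily stores every duplicate, while a UNIQUE constraint rejects the second insert rather than deduplicating it.
CREATE TABLE dbo.NoKey (Val INT NOT NULL);
INSERT INTO dbo.NoKey (Val) VALUES (1);
INSERT INTO dbo.NoKey (Val) VALUES (1);   -- accepted; two identical rows are now stored
CREATE TABLE dbo.WithKey (Val INT NOT NULL CONSTRAINT UQ_WithKey_Val UNIQUE);
INSERT INTO dbo.WithKey (Val) VALUES (1);
INSERT INTO dbo.WithKey (Val) VALUES (1); -- fails with a unique constraint violation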
Also check this:
SQL Server will allow you to create a clustered index on a non-unique column, but uniqueness is one of the most desirable attributes for any index, especially for a clustered index. Even though SQL Server allows a clustered index on a non-unique column, internally it adds a 4-byte value to every duplicate instance of the clustered key; this 4-byte variable-length column is known as the uniquifier. In that case SQL Server considers the clustered key to be the combination of the non-unique column on which the clustered index is defined and the internally generated uniquifier column. This value is stored wherever the clustered key is stored.
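To illustrate the quoted behaviour (table and column names are invented for this sketch), a clustered index on a non-unique column is perfectly legal; the uniquifier is added internally and never appears in the table definition:
CREATE TABLE dbo.Readings (SensorId INT NOT NULL, Reading DECIMAL(9,2) NOT NULL);
CREATE CLUSTERED INDEX CIX_Readings_SensorId ON dbo.Readings (SensorId); -- non-unique clustered index; duplicate keys get a hidden 4-byte uniquifier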

Related

New IDENTITY column on existing table - what will be the order?

I added an IDENTITY column to an existing table in SQL Server 2012. I expected the numbers to be generated in the order of the clustered index (a non-sequential GUID), but to my surprise they were generated in the order of one of the other indexes, which is not even unique, and also, coincidentally, exactly the order I wanted!
Can anyone explain this? Here are the details of the table:
id - guid (non-sequential), clustered index, primary key
eventdate - bigint (unix date), not null, non-unique index
more columns, some indexed, all indexes non-unique
Identity values got assigned in the order of eventdate. I even found a few examples where several rows had the same eventdate, and they always had sequential identity numbers.
MSDN says that the order in which identity values are generated for the new column of an existing table is not defined.
IDENTITY
Specifies that the new column is an identity column. The SQL Server Database Engine provides a unique, incremental value for the column. When you add identifier columns to existing tables, the identity numbers are added to the existing rows of the table with the seed and increment values. The order in which the rows are updated is not guaranteed. Identity numbers are also generated for any new rows that are added.
So, you'd better check thoroughly that you got new IDENTITY values in the order that you need. Check all rows of the table.
Edit
"The order is not guaranteed" doesn't mean that it is random, it just means that optimizer is free to pick any method to scan the table. In your system it apparently picked that index on eventdate (maybe it has the least amount of pages), but on another hardware or another version of the server the choice may change and you should not rely on it. Any change to the table structure or indexes may change the choice as well. Most likely the optimizer's decision is deterministic (i.e. not random), but it is not disclosed in the docs and may depend on many internal things and may change at any time.
Your result is not unexpected. Identity values were assigned in some unspecified order, which coincided with the order of the index.
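If you want to verify the result rather than rely on it, a window-function check (available in SQL Server 2012) is one quick option; dbo.Events and new_id below are placeholders for the actual table and the newly added IDENTITY column:
-- Returns the rows where the identity order disagrees with the eventdate order;
-- an empty result means the identity values follow eventdate.
SELECT eventdate, prev_eventdate
FROM (
    SELECT eventdate,
           LAG(eventdate) OVER (ORDER BY new_id) AS prev_eventdate
    FROM dbo.Events
) AS ordered
WHERE prev_eventdate > eventdate;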

Deciding on a primary key according to value size in SQL Server

I want to ask a question about optimizing SQL Server performance. Assume I have an entity - say, Item - and I must assign a primary key for it. It has several columns, two of which are expected to be unique; one of them is expected to be much larger than the other, by tens of characters.
How should I decide on the primary key?
Should one of them be the PK (and if so, which one), or both, or should I create an IDENTITY number as the PK? This is important for me because the entity Item will have relations with some other entities, and I think the complexity of the PK would affect the performance of SQL Server queries.
Personally, I would go with an IDENTITY primary key, with unique constraints on both of the mentioned unique columns and indexes for additional lookups.
You have to remember that by default SQL Server creates the primary key as the clustered index, which impacts how the table is stored on disk. If the new ITEMs come in with random key values, there could be a lot of fragmentation on the primary key.
Also, unless cascades and foreign keys are switched on, you would have to maintain the relational integrity of the data manually (unless you use IDENTITY).
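A rough sketch of that suggestion (the Item table and its column names are placeholders, not the asker's real schema):
-- Surrogate IDENTITY primary key, plus unique constraints on the two natural keys.
CREATE TABLE dbo.Item
(
    ItemId   INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_Item PRIMARY KEY,
    Code     NVARCHAR(20)  NOT NULL CONSTRAINT UQ_Item_Code UNIQUE,
    LongName NVARCHAR(100) NOT NULL CONSTRAINT UQ_Item_LongName UNIQUE
);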
Well, the primary key is really only used to uniquely identify each row - so the only requirements for it are: it has to be unique and typically also should not contain NULL.
Anything else is most likely more relevant for the clustering key in SQL Server - the column (or set of columns) by which the data is physically ordered on disk. By default, the primary key is also the clustering key in SQL Server.
The clustering key is the most important choice in SQL Server because it has far reaching performance implications. A good clustering key is
narrow
unique
stable
if possible ever-increasing
It has to be unique so that it can be added to each and every nonclustered index as the lookup back into the actual data rows - if you pick a non-unique column (or set of columns), SQL Server will add a 4-byte "uniquifier" for you.
It should be as narrow as possible, since it's stored in a lot of places. Try to stick to 4 bytes for an INT or 8 bytes for a BIGINT - avoid long and variable length VARCHAR columns since those are both too wide, and the variable length also carries additional overhead. Because of this, sets of columns are also rather rarely a good choice.
The clustering key should be stable - value shouldn't change over time - since every time a value changes, potentially a lot of index entries (in the clustered index itself, and every single nonclustered index, too) need to be updated which causes a lot of unnecessary overhead.
And if it's ever-increasing (like an INT IDENTITY), you also can avoid most page splits - an extremely expensive and involved procedure that happens if you use random values (like GUID's) as your clustering key.
So in brief: an INT IDENTITY is ideal - GUIDs, variable length strings, or combinations of columns are typically less of a good choice.
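If a GUID key is still required for external reasons, one possible compromise (shown only as a sketch with invented names) is to keep the GUID as a nonclustered primary key and cluster on a narrow, ever-increasing surrogate instead:
CREATE TABLE dbo.Orders
(
    OrderGuid UNIQUEIDENTIFIER NOT NULL CONSTRAINT PK_Orders PRIMARY KEY NONCLUSTERED,
    OrderSeq  INT IDENTITY(1,1) NOT NULL
);
-- Narrow, unique, stable and ever-increasing clustering key:
CREATE UNIQUE CLUSTERED INDEX CIX_Orders_OrderSeq ON dbo.Orders (OrderSeq);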
Choose the one you will use to identify the records in queries and joins to other tables. Size is relative and, whilst a consideration, is usually not an issue, since the PK will be indexed and the other unique column can also make use of a unique index.
The uniqueidentifier data type, for example, has a 36-character string representation and performs fine as a primary key under the majority of circumstances.

What number of "Unique Keys" can be created in a Table in SQL Server?

As we know, we can have only one primary key in a table, so is there any way to know the maximum number of unique keys in a table? I have read a post according to which the number of unique keys depends on the number of clustered indexes.
Check this link
http://www.orafaq.com/forum/t/152739/
If this is SQL Server, then check Maximum Capacity Specifications for SQL Server.
I think you are confusing yourself with unique constraints and clustered indexes.
A unique key is a key by which you can tell exactly which record you are dealing with.
A clustered key is the key that determines the physical position of the record on the hard disk.
So you can have many unique constraints for your records, but only one that orders the records on disk. Please refer to the link mentioned by #user3414693 if you are using MS SQL Server:
Nonclustered indexes per table: 999 (Maximum sizes/numbers, SQL Server 32-bit and 64-bit)
You should note that this limit counts all the indexes for the table, not only the unique ones.
Can't find specific information for Oracle right now.
PS: You should also note that having the primary key as the clustered key can be a serious performance problem for huge tables if your primary key is a uniqueidentifier-type column. A uniqueidentifier is really a very big number, and it is not generated in sorted order like IDENTITY values, so inserting a new record can force page splits that physically move existing data around.
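To make the distinction concrete, here is a small sketch (all names invented): a table can carry several unique constraints, each backed by its own nonclustered unique index, but only one of the indexes can be clustered:
CREATE TABLE dbo.Demo
(
    Id   INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_Demo PRIMARY KEY CLUSTERED, -- the single clustered index
    ColA INT NOT NULL CONSTRAINT UQ_Demo_ColA UNIQUE,  -- nonclustered unique index
    ColB INT NOT NULL CONSTRAINT UQ_Demo_ColB UNIQUE   -- another nonclustered unique index
);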

Unique Key or Index with 'Is Unique'

I'm having a rather silly problem. I'll simplify the situation: I have a table in SQL Server 2008 R2 with a field 'ID' (int, PK), a Name field (nvarchar(50)) and a Description field (text). The values in the Name field should be unique. When searching the table, the Name field will be used, so performance is key here.
I have been looking for 2 hours on the internet to fully understand the differences between Unique Key, Primary Key, Unique Index and so on, but it hasn't helped me decide what key/constraint/index I should use.
I'm altering the tables in SQL Server Management Studio. My question for altering that Name field is: should I use "Type = Index" with "Is Unique = Yes", or "Type = Unique Key"?
Thanks in advance!
Kind regards,
Abbas
A unique key and a primary key are both logical constraints. They are both backed up by a unique index. Columns that participate in a primary key are not allowed to be NULL-able.
From the point of view of creating a Foreign Key the unique index is what is important so all three options will work.
Constraint-based indexes have additional metadata stored that regular indexes don't (e.g. create_date in sys.objects). Creating a non-constraint-based unique index can allow greater flexibility, in that it lets you define included columns in the index definition, for example (I think there might be a few other differences as well).
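For the Name column from the question, either of the following enforces uniqueness; pick one. The table name dbo.MyTable is an assumption, since the question doesn't give one:
-- Option 1: a UNIQUE constraint (backed by a unique index behind the scenes)
ALTER TABLE dbo.MyTable ADD CONSTRAINT UQ_MyTable_Name UNIQUE (Name);
-- Option 2: a plain unique index; this form also accepts INCLUDE columns, filters, etc.
CREATE UNIQUE NONCLUSTERED INDEX UX_MyTable_Name ON dbo.MyTable (Name);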
A unique key column cannot contain the same value in more than one row of a table. A primary key is the column (or set of columns) that is a unique key and not null, and it is used as the main lookup mechanism (meaning every table should have a primary key, as either a column or a combination of columns that represents a unique entry).
I haven't really used indexes much, but I believe it follows the same logic.
See http://en.wikipedia.org/wiki/Unique_key for more information.
An index is a structure the DBMS uses to organize your table data efficiently. Usually you want to create an index on columns and groups of columns that you frequently search on. For example, if you have a column 'name' and you search your table where name = '?', an index on that column will create separate storage that orders the table so that finding a record by name is fast. Typically, primary keys are automatically indexed.
Of course, the above is a bit too general, and you should consider profiling queries before and after adding an index to ensure it is being used and is speeding things up. There are quite a few subtleties to indexes that make them application-specific. They take extra storage and time to build and maintain, so you always want to be judicious about adding them.
Hope this helps.

Slow progress when adding sequential identity column

We have an 8 million row table and we need to add a sequential id column to it. It is used for data warehousing.
From testing, we know that if we remove all the indexes, including the primary key index, adding a new sequential id column was about 10x faster. I still haven't figured out why dropping the indexes would help when adding an identity column.
Here is the SQL that adds the identity column:
ALTER TABLE MyTable ADD MyTableSeqId BIGINT IDENTITY(1,1)
However, the table in question has dependencies, so I cannot drop the primary key index unless I remove all the FK constraints. As a result, adding the identity column remains slow.
Are there other ways to improve the speed when adding an identity column, so that client downtime is minimal?
or
Is there a way to add an identity column without locking the table, so that the table can be accessed, or at least queried?
The database is SQL Server 2005 Standard Edition.
Adding a new column to a table will acquire a Sch-M (schema modification) lock, which prevents all access to the table for the duration of the operation.
You may get some benefit from switching the database into bulk-logged or simple mode for the duration of the operation, but of course, do so only if you're aware of the effects this will have on your backup / restore strategy.
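A sketch of that idea (the database and table names are placeholders, and whether it actually reduces logging for this specific operation is worth testing first, along with the backup/restore implications mentioned above):
ALTER DATABASE MyDatabase SET RECOVERY BULK_LOGGED;
-- the ALTER TABLE ... ADD ... IDENTITY from the question runs here
ALTER TABLE MyTable ADD MyTableSeqId BIGINT IDENTITY(1,1);
ALTER DATABASE MyDatabase SET RECOVERY FULL;   -- switch back and take a log backup afterwards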