Simplest example: I've got the following table:
create table test
( ID int identity (1,1) not null primary key,
text char(20) not null
)
I have already inserted 3 rows:
(1,a)
(2,b)
(3,c)
Now I delete (2,b), and my rows are (1,a) and (3,c).
Is it possible to make the rows automatically become (1,a) and (2,c)?
Or do I have to create a procedure?
No, and you shouldn't. There are too many ways that this can go bad - suppose you had related records in another table with 2 as a foreign key - instead of being orphaned (and easily identified as such) they would instead be related to a different (incorrect) record.
In addition, IDENTITY values are usually used as the clustered index key (meaning they determine the physical storage location of the record). If you change the value, the data will have to be physically relocated - which won't hurt anything but will cause unnecessary I/O.
IDENTITY values are not guaranteed to be consecutive; they're guaranteed to be unique. If you need consecutive numbers, the best way is to derive them in your output by adding a ROW_NUMBER column:
SELECT
ROW_NUMBER() OVER (ORDER BY ID) RowNum,
ID,
text
FROM test
However, note that RowNum will be DIFFERENT if records in the middle are deleted (as you might guess). So you can't build any relationships off of that value - you can only use it to show consecutive ordering.
Why does CockroachDB add a rowid column to my tables? They are INT values and do not look ordered, does this column give up the sort order and/or impact how range scans work?
CockroachDB automatically adds a rowid column that serves as the primary key if no primary key is specified for the table. rowid values are generated as a combination of the insert timestamp and the ID of the node executing the statement; as such, ordering is maintained.
To create your own rowid, two functions are commonly used:
unique_rowid(): ensures a unique integer for a primary key; the value always increases
unordered_unique_rowid(): ensures a unique integer for a primary key, but the rowid value does not always increase. Having rowid values that do not always increase helps divide the key-space more evenly, preventing range hotspots.
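For illustration, a minimal sketch of using both functions as column defaults (table and column names are hypothetical, and unordered_unique_rowid() may only be available in newer CockroachDB versions):
-- Sketch: letting CockroachDB generate an ordered key (names are illustrative).
CREATE TABLE users (
    id INT DEFAULT unique_rowid() PRIMARY KEY,
    name STRING
);
-- Or, to spread writes more evenly across ranges at the cost of ordering:
CREATE TABLE events (
    id INT DEFAULT unordered_unique_rowid() PRIMARY KEY,
    payload STRING
);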
Helpful docs from CockroachDB:
Create a table
Auto-generate unique row ids
ID generation functions
Helpful Blog Post:
CockroachDB Key Generation Part 3 - Unordered RowID
I have an Excel sheet with several values which I imported into SQL Server (Book1$), and I want to transfer the values into ProcessList. Several rows have the same primary key, ProcessID, because the rows contain original and modified values, both of which I want to keep. How do I make SQL ignore the duplicate primary keys?
I tried IGNORE_DUP_KEY = ON, but for rows with a duplicated primary key, only one row (the latest) shows up.
CREATE TABLE dbo.ProcessList
(
Edited varchar(1),
ProcessId int NOT NULL PRIMARY KEY WITH (IGNORE_DUP_KEY = ON),
Name varchar(30) NOT NULL,
Amount smallmoney NOT NULL,
CreationDate datetime NOT NULL,
ModificationDate datetime
)
INSERT INTO ProcessList SELECT Edited, ProcessId, Name, Amount, CreationDate, ModificationDate FROM Book1$
SELECT * FROM ProcessList
Also, if I have a row and I update the values of that row, is there any way to keep the original values of the row and insert a clone of that row below, with the updated values and creation/modification date updated automatically?
How do I make SQL ignore the duplicate primary keys?
Under no circumstances can a transaction be committed that results in a table containing two distinct rows with the same primary key. That is fundamental to the nature of a primary key. SQL Server's IGNORE_DUP_KEY option does not change that -- it merely affects how SQL Server handles the problem. (With the option turned on it silently refuses to insert rows having the same primary key as any existing row; otherwise, such an insertion attempt causes an error.)
You can address the situation either by dropping the primary key constraint or by adding one or more columns to the primary key to yield a composite key whose collective value is not duplicated. I don't see any good candidate columns for an expanded PK among those you described, though. If you drop the PK then it might make sense to add a synthetic, autogenerated PK column.
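For example, a sketch of the second option (the RowId surrogate column name is hypothetical):
CREATE TABLE dbo.ProcessList
(
    RowId int NOT NULL IDENTITY(1,1) PRIMARY KEY,  -- synthetic key (illustrative name)
    Edited varchar(1),
    ProcessId int NOT NULL,  -- duplicates now allowed; index this if you filter on it
    Name varchar(30) NOT NULL,
    Amount smallmoney NOT NULL,
    CreationDate datetime NOT NULL,
    ModificationDate datetime
)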
Also, if I have a row and I update the values of that row, is there any way to keep the original values of the row and insert a clone of that row below, with the updated values and creation/modification date updated automatically?
If you want to ensure that this happens automatically, however a row happens to be updated, then look into triggers. If you want a way to automate it, but you're willing to make the user ask for the behavior, then consider a stored procedure.
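As a rough sketch of the trigger route - this assumes the table has been given a synthetic RowId key as suggested above, so that the original and updated versions of a row can coexist:
CREATE TRIGGER trg_ProcessList_KeepOriginal
ON dbo.ProcessList
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    -- Re-insert the pre-update values as new rows; the updated rows stay in place.
    -- Stamping ModificationDate = GETDATE() on the updated rows could be added here too.
    INSERT INTO dbo.ProcessList (Edited, ProcessId, Name, Amount, CreationDate, ModificationDate)
    SELECT d.Edited, d.ProcessId, d.Name, d.Amount, d.CreationDate, d.ModificationDate
    FROM deleted AS d;
END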
Try this (note: INSERT IGNORE is MySQL syntax and will not work on SQL Server):
INSERT IGNORE INTO ProcessList SELECT Edited, ProcessId, Name, Amount, CreationDate, ModificationDate FROM Book1$
SELECT * FROM ProcessList
You drop the constraint. Something like this:
alter table dbo.ProcessList drop constraint PK_ProcessId;
You need to know the constraint name.
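If you don't know it, something like this should find it (a sketch using the catalog views):
SELECT name
FROM sys.key_constraints
WHERE parent_object_id = OBJECT_ID('dbo.ProcessList')
  AND type = 'PK';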
In other words, you can't ignore a primary key. It is defined as unique and not-null. If you want the table to have duplicates, then that is not the primary key.
I want to design the primary key for my table with row versioning. My table contains 2 main fields, ID and Timestamp, and a bunch of other fields. For a unique ID, I want to store previous versions of a record. Hence I am creating the primary key for the table as the combination of the ID and Timestamp fields.
Hence, to see all the versions of a particular ID, I can issue:
Select * from table_name where ID=<ID_value>
To return the most recent version of an ID, I can use
Select * from table_name where ID=<ID_value> ORDER BY timestamp desc
and get the first element.
My question here is: will this query be efficient and run in O(1), instead of scanning the entire table for all entries matching the same ID, given that the ID field is part of the primary key? Ideally, to get a result in O(1), I should have to provide the entire primary key. If it does need to do an entire table scan, then how else can I design my primary key so that this request is done in O(1)?
The canonical reference on this subject is Effective Timestamping in Databases:
https://www.cs.arizona.edu/~rts/pubs/VLDBJ99.pdf
I usually design with a subset of this paper's recommendations, using a table containing a primary key only, with another referencing table that has that key as well as change_user, valid_from and valid_until columns with appropriate defaults. This makes referential integrity easy, as well as future value insertion and history retention. Index as appropriate, and consider check constraints or triggers to prevent overlaps and gaps if you expose these fields to the application for direct modification. These have an obvious performance overhead.
We then make a "current values view" which is exposed to developers, and is also insertable via an "instead of" trigger.
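A minimal sketch of that layout in PostgreSQL syntax (all names, and the price column, are illustrative):
create table item (
    item_id int primary key
);
create table item_version (
    item_id     int not null references item (item_id),
    change_user text not null default current_user,
    valid_from  timestamp not null default now(),
    valid_until timestamp not null default 'infinity',
    price       numeric,  -- the versioned attributes live here
    primary key (item_id, valid_from)
);
-- The "current values view" exposed to developers.
create view item_current as
select item_id, price
from item_version
where now() >= valid_from and now() < valid_until;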
It's far easier and better to use the History Table pattern for this.
create table foo (
foo_id int primary key,
name text
);
create table foo_history (
foo_id int,
version int,
name text,
operation char(1) check ( operation in ('u','d') ),
modified_at timestamp,
modified_by text,
primary key (foo_id, version)
);
Create a trigger to copy a foo row to foo_history on update or delete.
https://wiki.postgresql.org/wiki/Audit_trigger_91plus for a full example with postgres
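A minimal version of such a trigger might look like this in PostgreSQL (the function name is illustrative; see the wiki link above for a production-grade version):
create or replace function foo_history_trg() returns trigger as $$
begin
    insert into foo_history (foo_id, version, name, operation, modified_at, modified_by)
    values (
        old.foo_id,
        (select coalesce(max(version), 0) + 1
           from foo_history
          where foo_id = old.foo_id),
        old.name,
        case tg_op when 'UPDATE' then 'u' else 'd' end,
        now(),
        current_user
    );
    return null;  -- return value is ignored for AFTER row triggers
end;
$$ language plpgsql;
create trigger foo_history
after update or delete on foo
for each row execute procedure foo_history_trg();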
In Microsoft SQL Server, when creating tables, are there any downsides to using a unique constraint on a column even though you don't really need it to be unique?
An example would be descriptions for say a role in a user management system:
CREATE TABLE Role
(
ID TINYINT PRIMARY KEY NOT NULL IDENTITY(0, 1),
Title CHARACTER VARYING(32) NOT NULL UNIQUE,
Description CHARACTER VARYING(MAX) NOT NULL UNIQUE
)
My fear is that validating this constraint during frequent insertions in other tables will be very time consuming. I am unsure how this constraint is validated, but I suspect it could either be done very efficiently or as a linear comparison.
Your fear is well founded: UNIQUE constraints are implemented as indexes, and this is time and space consuming.
So, whenever you insert a new row, the database has to update the table, and also one index for each unique constraint.
So, according to you:
using a unique constraint on a column even though you don't really need it to be unique
the answer is no, don't use it; there are time and space downsides.
Your sample table would need a clustered index for the ID, plus 2 extra indexes, one for each unique constraint. This takes up space, and time to update the 3 indexes on inserts.
This would only be justified if you made queries filtering by those fields.
BY THE WAY:
The sample table in the original post has several flaws:
that syntax is not SQL Server syntax (and you tagged this as SQL Server)
you cannot create an index on a varchar(max) column
If you correct the syntax and create this table:
CREATE TABLE Role
(
ID tinyint PRIMARY KEY NOT NULL IDENTITY(0, 1),
Title varchar(32) NOT NULL UNIQUE,
Description varchar(32) NOT NULL UNIQUE
)
You can then execute sp_help Role and you'll find the 3 indices.
The database creates an index which backs the UNIQUE constraint, so the uniqueness check should be very low-cost.
http://msdn.microsoft.com/en-us/library/ms177420.aspx
The Database Engine automatically creates a UNIQUE index to enforce the uniqueness requirement of the UNIQUE constraint. Therefore, if an attempt to insert a duplicate row is made, the Database Engine returns an error message that states the UNIQUE constraint has been violated and does not add the row to the table. Unless a clustered index is explicitly specified, a unique, nonclustered index is created by default to enforce the UNIQUE constraint.
Is it typically a good practice to constrain it if you know the data will always be unique but it doesn't necessarily need to be unique for the application to function correctly?
My question to you: would it make sense for two roles to have different titles but the same description? e.g.
INSERT INTO Role ( Title , Description )
VALUES ( 'CEO' , 'Senior manager' ),
( 'CTO' , 'Senior manager' );
To me it would seem to devalue the use of the description; if there were many duplications then it might make more sense to do something more like this:
INSERT INTO Role ( Title )
VALUES ( 'CEO' ),
( 'CTO' );
INSERT INTO SeniorManagers ( Title )
VALUES ( 'CEO' ),
( 'CTO' );
But then again you are not expecting duplicates.
I assume this is a low activity table. You say you fear validating this constraint when doing frequent insertions in other tables. Well, that will not happen (unless there is a trigger we cannot see that might update this table when another table is updated).
Personally, I would ask the designer (business analyst, whatever) to justify not applying a unique constraint. If they cannot, then I would impose the unique constraint based on common sense. As is usual for such a text column, I would also apply CHECK constraints, e.g. to disallow leading/trailing/double spaces, zero-length strings, etc.
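For example, a sketch of such a CHECK constraint (the constraint name is illustrative):
ALTER TABLE Role ADD CONSTRAINT ck_role_title_format
    CHECK (Title <> ''                       -- no zero-length string
           AND Title = LTRIM(RTRIM(Title))   -- no leading/trailing spaces
           AND Title NOT LIKE '%  %');       -- no double spaces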
On SQL Server, the data type tinyint only gives you 256 distinct values. No matter what you do outside of the id column, you're not going to end up with a very big table. It will surely perform quickly even with a dozen indexed columns.
You usually need at least one unique constraint besides the surrogate key, though. If you don't have one, you're liable to end up with data like this.
1 First title First description
2 First title First description
3 First title First description
...
17 Third title Third description
18 First title First description
Tables that permit data like that are usually wrong. Any table with foreign key references to this table won't be able to correctly report, say, how many times "First title" is used.
I'd argue that allowing multiple, identical titles for roles in a user management system is a design error. I'd probably argue that "title" is a really bad name for that column, too.
CREATE TABLE SupplierQuote
(
supplierQuoteID int identity (3504,2) CONSTRAINT supquoteid_pk PRIMARY KEY,
PONumber int identity (9553,20) NOT NULL
.
.
.
CONSTRAINT ponumber_uq UNIQUE(PONumber)
);
The above DDL produces an error:
Msg 2744, Level 16, State 2, Line 1
Multiple identity columns specified for table 'SupplierQuote'. Only one identity column per table is allowed.
How can I solve this? I want PONumber to be auto-incremented.
If SupplierQuoteId and PONumber are generated when a row is inserted, then the two "identity" columns would be assigned in lockstep (3504 goes with 9553, 3506 goes with 9573, 3508 goes with 9593, etc.). If this assumption is true, then you presumably could make PONumber a calculated column, like so:
CREATE TABLE SupplierQuote
(
supplierQuoteID int NOT NULL identity (3504,2) CONSTRAINT supquoteid_pk PRIMARY KEY,
PONumber AS (10 * supplierQuoteID - 25487)
.
.
.
);
I made supplierQuoteId NOT NULL, which ensures that PONumber will also be NOT NULL. Similarly, you no longer need the unique constraint on PONumber, as it will always be unique. (It is possible to build indexes on calculated columns, if you need one for performance.)
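For example (a sketch; the index name is illustrative - this works because the expression is deterministic):
CREATE UNIQUE INDEX ponumber_ix ON SupplierQuote (PONumber);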
You can't have more than one identity column per table. I think your best bet would be to pull the PO data into a separate table, then relate the two with a FK column.
SupplierQuote
-------------
supplierQuoteID (PK/identity)
purchaseOrderID (FK to PurchaseOrder.purchaseOrderID)
otherColumn1
PurchaseOrder
-------------
purchaseOrderID (PK/identity)
otherColumn1
You can't solve this - you can only have a single IDENTITY column per table. No way around that, sorry.
The only "hackish" solution would be to have a separate table for nothing more than having an INT IDENTITY field, and grabbing the newest value from that helper table into your entity upon insertion (e.g. with a trigger). Not very pretty, but it might work for you.
If there is only one PO id per supplier quote, then why not simply use the supplier quote id as the PO id?
If there can be more than one, you must have a separate table with a foreign key constraint. You can of course use cascading deletes to delete from this table, but this can be dangerous if you delete too many records (causing lockups). Personally, I wouldn't want to delete a supplier quote once a PO number has been created, as that means the item quoted was actually bought; you do not want to destroy records of things that were actually purchased. Since you will likely have multiple POs per quote (I got a quote on six things, first bought three of them, then bought two others the next week), and since you will probably want to store specific information about the purchase order, I recommend a separate table. Anything else is going to cause you problems in the long run.
I think I'd use a trigger to fill the "second identity".
This circumvents auto-increment with a non-identity column (MS SQL). I don't think this is best practice, though! Just a quick-fix solution:
INSERT INTO [dbo].[Employee]
([EmpID]
,[Name]
,[Salary]
,[Address]
,[datecoded])
VALUES
( (select top 1 EmpID from dbo.Employee order by EmpID desc) + 1  -- not concurrency-safe: two sessions can read the same max EmpID
, 'name_value'
, 123456
,'address_value'
, GETDATE())