Block parallel inserts in a TABLE - sql

Is there a way to block parallel inserts in a table, and not just at the row-lock level?
The insert is very fast (millisecond level), but I want some sort of guarantee that only one row can be inserted for a particular millisecond.
By design it already makes sure the data will never be inconsistent (see load_id_by_date):
CREATE TABLE my_table
(
load_id uniqueidentifier NOT NULL,
load_date datetime NOT NULL DEFAULT (GETDATE()),
load_id_by_date bigint NOT NULL DEFAULT (CAST(GETDATE() as decimal(19,9)) * 1000000000) UNIQUE,
is_processed bit DEFAULT(0),
PRIMARY KEY (load_id_by_date)
)
But I was just wondering if there is a way to stop parallel inserts from happening from multi-threaded calls. A simple (single-threaded) simulation below highlights the issue.
-- TO TEST:
WHILE (1=1)
BEGIN
INSERT INTO my_table (load_id)
SELECT NEWID()
END
will eventually fail with:
Msg 2627, Level 14, State 1, Line 6
Violation of UNIQUE KEY constraint 'UQ__config_l__A307163DB6D0D819'. Cannot insert duplicate key in object 'my_table.config_load_id_toprocess'. The duplicate key value is (43507564143441).
But now I am thinking that relying on timestamp uniqueness might be the wrong way to go. The actual calls will not be that fast, though: 2 seconds apart at the fastest, but multi-threaded.
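One way to serialize the inserts themselves, rather than relying on the unique key, would be an application lock. A minimal sketch (the resource name is arbitrary; it just has to be the same string in every writer):
BEGIN TRAN;
-- Only one session at a time can hold this lock; others wait up to 5 seconds.
-- A return value >= 0 means the lock was granted.
EXEC sp_getapplock @Resource = 'my_table_insert',
                   @LockMode = 'Exclusive',
                   @LockOwner = 'Transaction',
                   @LockTimeout = 5000;
INSERT INTO my_table (load_id)
SELECT NEWID();
COMMIT; -- the application lock is released when the transaction ends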

Thanks @Mitch Wheat for pointing out the XY problem. I have narrowed down what I needed to do.
The load_id_by_int (formerly load_id_by_date) is now generated from a bigint representation of NEWID(). The chance of a conflict is now acceptable (at least in my opinion). Thanks for the assistance, everyone who commented.
CREATE TABLE my_table
(
load_id uniqueidentifier NOT NULL,
load_date datetime NOT NULL DEFAULT (GETDATE()),
load_id_by_int bigint NOT NULL DEFAULT (ABS(convert(bigint, convert (varbinary(8), NEWID(), 1)))),
is_processed bit DEFAULT(0),
PRIMARY KEY (load_id_by_int)
)
The concept was derived from Convert from UniqueIdentifier to BigInt and Back?

Related

What happens when Identity seed reaches an existent value in a primary key?

I have an identity column that is also the primary key, of INT datatype. Due to the issue discussed here (cache loss), the identity has gaps and I chose to reseed to the previous value. In concrete terms, I have a situation that looks like this:
Table1
ID_PK   Field1
---------------
   28   'd'
   29   'e'
   30   'h'
 1029   'f'
 1030   'g'
I looked around and couldn't find a clear answer to what happens when I make an insertion and the seed reaches an existing value that would break the constraint. Suppose I were to insert values 'x' and 'y' in two separate queries to the table; I can think of the following possibilities:
1. The identity will be reseeded before the first insertion, and both values will be inserted correctly.
2. The first insertion will fail, then the column will be reseeded, and only then will the second insertion succeed.
3. Neither will work, and I will have to explicitly call DBCC CHECKIDENT to reseed before inserting values into the table.
So, which is it? Or none of the above? Would the behavior be different if I inserted a multi-row query result into Table1? Thanks in advance.
For completeness anyway, here's a script you can use to test:
USE Sandbox;
GO
CREATE TABLE test(ID int IDENTITY(1,1) PRIMARY KEY CLUSTERED, string char(1));
GO
INSERT INTO test (string)
VALUES ('a'),('b'),('c'),('d');
GO
SELECT *
FROM test;
GO
DELETE FROM test
WHERE string IN ('b','c');
GO
SELECT *
FROM test;
GO
DBCC CHECKIDENT ('dbo.test', RESEED, 1);
GO
INSERT INTO test (string)
VALUES ('e'),('f');
GO
SELECT *
FROM test;
GO
INSERT INTO test (string)
VALUES ('g');
GO
SELECT *
FROM test;
GO
DROP TABLE test;
Running this script will give you the answer you need. If you wonder why I have used 1 as the RESEED value, this is explained in the documentation:
The following example forces the current identity value in the
AddressTypeID column in the AddressType table to a value of 10.
Because the table has existing rows, the next row inserted will use 11
as the value, that is, the new current increment value defined for the
column value plus 1.
In my script, this means that the next row inserted after the RESEED will have a value of 2 for its IDENTITY, not 1 (since rows already exist in the table, with IDs 1 and 4).
As several have said in the comments though, there's really no need to use RESEED on an IDENTITY column. If you need to maintain a sequence, you should (unsurprisingly) be using a SEQUENCE: CREATE SEQUENCE (Transact-SQL)
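For reference, a minimal sketch of what the SEQUENCE route could look like (SQL Server 2012 or later; the object names are illustrative, not from the question):
CREATE SEQUENCE dbo.test_seq AS int START WITH 1 INCREMENT BY 1;
GO
CREATE TABLE dbo.test_with_seq
(
ID int NOT NULL DEFAULT (NEXT VALUE FOR dbo.test_seq) PRIMARY KEY CLUSTERED,
string char(1)
);
GO
-- Each insert draws the next value from the sequence; restarts are explicit
-- (ALTER SEQUENCE ... RESTART WITH) rather than done via DBCC CHECKIDENT.
INSERT INTO dbo.test_with_seq (string) VALUES ('a'), ('b');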
It depends:
Scenario 1
You get duplicates in the IDENTITY column, since there is no unique index or PK constraint:
create table I (
id int identity(1,1) not null,
i int null
)
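A short demonstration of that (not part of the original answer; the values are arbitrary):
insert into I (i) values (10), (20); -- ids 1 and 2
dbcc checkident ('I', reseed, 0);    -- push the seed back down
insert into I (i) values (30);       -- gets id 1 again: a duplicate, and no error
select id, i from I;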
Scenario 2
You get the following error as the inserted value conflicts with the Primary Key constraint:
Msg 2627, Level 14, State 1, Line 1 Violation of PRIMARY KEY
constraint 'PK__I__3213E83FE0B0E009'. Cannot insert duplicate key in
object 'dbo.I'. The duplicate key value is (11). The statement has
been terminated.
create table I (
id int identity(1,1) not null primary key,
i int null
)
This proves that IDENTITY on its own does not guarantee uniqueness; only a PRIMARY KEY or UNIQUE constraint does that.
To close: it turns out it's (2).
The first insertion fails, the reseed happens automatically to the highest existing value, and only the next insertion succeeds. Multi-row insertions behave the same way if any of the values would break the primary key constraint.

Ensuring uniqueness of multiple large URL fields in MS SQL

I have a table with the following definition:
CREATE TABLE url_tracker (
id int not null identity(1, 1),
active bit not null,
install_date int not null,
partner_url nvarchar(512) not null,
local_url nvarchar(512) not null,
public_url nvarchar(512) not null,
primary key(id)
);
And I have a requirement that these three URLs always be unique - any individual URL can appear many times, but the combination of the three must be unique (for a given day).
Initially I thought I could do this:
CREATE UNIQUE INDEX uniques ON url_tracker
(install_date, partner_url, local_url, public_url);
However this gives me back the warning:
Warning! The maximum key length is 900 bytes. The index 'uniques' has maximum
length of 3076 bytes. For some combination of large values, the insert/update
operation will fail.
Digging around I learned about the INCLUDE argument to CREATE INDEX, but according to this question converting the command to use INCLUDE will not enforce uniqueness on the URLs.
CREATE UNIQUE INDEX uniques ON url_tracker (install_date)
INCLUDE (partner_url, local_url, public_url);
How can I enforce uniqueness on several relatively large nvarchar fields?
Resolution
So from the comments and answers and more research I'm concluding I can do this:
CREATE TABLE url_tracker (
id int not null identity(1, 1),
active bit not null,
install_date int not null,
partner_url nvarchar(512) not null,
local_url nvarchar(512) not null,
public_url nvarchar(512) not null,
uniquehash AS HashBytes('SHA1',partner_url+local_url+public_url) PERSISTED,
primary key(id)
);
CREATE UNIQUE INDEX uniques ON url_tracker (install_date,uniquehash);
Thoughts?
I would make a computed column with the hash of the URLs, then make a unique index/constraint on that. Consider making the hash a persisted computed column. It shouldn't have to be recalculated after insertion.
Following the ideas from the conversation in the comments. Assuming that you can change the datatype of the URL to be VARCHAR(900) (or NVARCHAR(450) if you really think you need Unicode URLs) and be happy with the limitation on the length of the URL, this solution could work. This also assumes SQL Server 2008 or better. Please, always specify what version you're working with; sql-server is not specific enough, since solutions can vary greatly depending on the version.
Setup:
USE tempdb;
GO
CREATE TABLE dbo.urls
(
id INT IDENTITY(1,1) PRIMARY KEY,
url VARCHAR(900) NOT NULL UNIQUE
);
CREATE TABLE dbo.url_tracker
(
id INT IDENTITY(1,1) PRIMARY KEY,
active BIT NOT NULL DEFAULT 1,
install_date DATE NOT NULL DEFAULT CURRENT_TIMESTAMP,
partner_url_id INT NOT NULL REFERENCES dbo.urls(id),
local_url_id INT NOT NULL REFERENCES dbo.urls(id),
public_url_id INT NOT NULL REFERENCES dbo.urls(id),
CONSTRAINT unique_urls UNIQUE
(
install_date,partner_url_id, local_url_id, public_url_id
)
);
Insert some URLs:
INSERT dbo.urls(url) VALUES
('http://msn.com/'),
('http://aol.com/'),
('http://yahoo.com/'),
('http://google.com/'),
('http://gmail.com/'),
('http://stackoverflow.com/');
Now let's insert some data:
-- succeeds:
INSERT dbo.url_tracker(partner_url_id, local_url_id, public_url_id)
VALUES (1,2,3), (2,3,4), (3,4,5), (4,5,6);
-- fails:
INSERT dbo.url_tracker(partner_url_id, local_url_id, public_url_id)
VALUES(1,2,3);
GO
/*
Msg 2627, Level 14, State 1, Line 3
Violation of UNIQUE KEY constraint 'unique_urls'. Cannot insert duplicate key
in object 'dbo.url_tracker'. The duplicate key value is (2011-09-15, 1, 2, 3).
The statement has been terminated.
*/
-- succeeds, since it's for a different day:
INSERT dbo.url_tracker(install_date, partner_url_id, local_url_id, public_url_id)
VALUES('2011-09-01',1,2,3);
Cleanup:
DROP TABLE dbo.url_tracker, dbo.urls;
Now, if 900 bytes is not enough, you could change the URL table slightly:
CREATE TABLE dbo.urls
(
id INT IDENTITY(1,1) PRIMARY KEY,
url VARCHAR(2048) NOT NULL,
url_hash AS CONVERT(VARBINARY(32), HASHBYTES('SHA1', url)) PERSISTED,
CONSTRAINT unique_url UNIQUE(url_hash)
);
The rest doesn't have to change. And if you try to insert the same URL twice, you get a similar violation, e.g.
INSERT dbo.urls(url) SELECT 'http://www.google.com/';
GO
INSERT dbo.urls(url) SELECT 'http://www.google.com/';
GO
/*
Msg 2627, Level 14, State 1, Line 1
Violation of UNIQUE KEY constraint 'unique_url'. Cannot insert duplicate key
in object 'dbo.urls'. The duplicate key value is
(0xd111175e022c19f447895ad6b72ff259552d1b38).
The statement has been terminated.
*/

Zero As Primary Key In SQL Server 2008

Is it possible to insert 0 in the primary key field of a table in SQL server 2008?
As long as it's a numeric field, yes... follow along at home!
create table TestTable
(
TestColumn int not null primary key
)
insert TestTable values(0)
The primary key restriction only requires that the value be unique and the column not be nullable.
For an identity field:
create table TestTable
(
TestColumn int identity(1, 1) not null primary key --start at 1
)
set identity_insert TestTable on
insert TestTable (TestColumn) values (0) --explicitly insert 0
set identity_insert TestTable off
The identity(1, 1) means "start at one and increment by one each time something is inserted". You could have identity(-100, 10) to start at -100 and increment by 10 each time. Or you could start at 0. There's no restriction.
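If the goal is simply for the first row to get 0 without fiddling with identity_insert, you can seed at zero instead; a minimal sketch (the table name is illustrative):
create table TestTableZero
(
TestColumn int identity(0, 1) not null primary key -- start at 0
)
insert TestTableZero default values -- TestColumn = 0
insert TestTableZero default values -- TestColumn = 1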
You can generally answer questions like these for yourself by just trying them and seeing if they work. This is faster and usually more beneficial than asking on StackOverflow.
Yes, it can be zero. The value can be anywhere from −2,147,483,648 to 2,147,483,647, i.e. from −(2^31) to 2^31 − 1, the full range of a signed 32-bit integer.
If you expect a lot of records, like up to 4.3 billion, it makes sense to start from the smallest value and work your way up.
CREATE TABLE TestTable
(
TestColumn INT IDENTITY(-2147483648, 1) NOT NULL PRIMARY KEY -- start at the smallest int value
)

MySQL query slow when selecting VARCHAR

I have this table:
CREATE TABLE `search_engine_rankings` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`keyword_id` int(11) DEFAULT NULL,
`search_engine_id` int(11) DEFAULT NULL,
`total_results` int(11) DEFAULT NULL,
`rank` int(11) DEFAULT NULL,
`url` varchar(255) DEFAULT NULL,
`created_at` datetime DEFAULT NULL,
`updated_at` datetime DEFAULT NULL,
`indexed_at` date DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `unique_ranking` (`keyword_id`,`search_engine_id`,`rank`,`indexed_at`),
KEY `search_engine_rankings_search_engine_id_fk` (`search_engine_id`),
CONSTRAINT `search_engine_rankings_keyword_id_fk` FOREIGN KEY (`keyword_id`) REFERENCES `keywords` (`id`) ON DELETE CASCADE,
CONSTRAINT `search_engine_rankings_search_engine_id_fk` FOREIGN KEY (`search_engine_id`) REFERENCES `search_engines` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=244454637 DEFAULT CHARSET=utf8
It has about 250M rows in production.
When I do:
select id,
rank
from search_engine_rankings
where keyword_id = 19
and search_engine_id = 11
and indexed_at = "2010-12-03";
...it runs very quickly.
When I add the url column (VARCHAR):
select id,
rank,
url
from search_engine_rankings
where keyword_id = 19
and search_engine_id = 11
and indexed_at = "2010-12-03";
...it runs very slowly.
Any ideas?
The first query can be satisfied by the index alone -- no need to read the base table to obtain the values in the Select clause. The second statement requires reads of the base table because the URL column is not part of the index.
UNIQUE KEY `unique_ranking` (`keyword_id`,`search_engine_id`,`rank`,`indexed_at`),
The rows in the base table are not in the same physical order as the rows in the index, and so the read of the base table can involve considerable disk-thrashing.
You can think of it as a kind of proof of optimization -- on the first query the disk-thrashing is avoided because the engine is smart enough to consult the index for the values requested in the select clause; it will already have read that index into RAM for the where clause, so it takes advantage of that fact.
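One way to confirm this against your own data (a hypothetical session, not from the original answer) is to compare the execution plans:
EXPLAIN SELECT id, rank
FROM search_engine_rankings
WHERE keyword_id = 19
  AND search_engine_id = 11
  AND indexed_at = '2010-12-03';
-- "Using index" in the Extra column means the query was answered from the index alone;
-- its absence in the plan for the url variant means base-table lookups are happening.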
In addition to Tim's answer: an index in MySQL can only be used left-to-right, which means the columns of your index can be used for your WHERE clause only up to the first column you skip.
Currently, your UNIQUE index is keyword_id, search_engine_id, rank, indexed_at. This can filter on keyword_id and search_engine_id, but it still has to scan the remaining rows to filter on indexed_at, because rank sits between them in the index.
But if you change it to keyword_id, search_engine_id, indexed_at, rank (only the order changes), it can filter on keyword_id, search_engine_id and indexed_at.
I believe it will then be able to fully use that index to read the appropriate part of your table.
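A sketch of what that reordering could look like (one statement; rebuilding a unique index over ~250M rows is not cheap, so plan for a maintenance window):
ALTER TABLE search_engine_rankings
  DROP KEY unique_ranking,
  ADD UNIQUE KEY unique_ranking (keyword_id, search_engine_id, indexed_at, rank);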
I know it's an old post, but I was experiencing the same situation and couldn't find an answer.
This really does happen in MySQL: when you select varchar columns it takes a lot of time processing. My query took about 20 seconds to process 1.7M rows and now takes about 1.9 seconds.
OK, first of all, create a view from this query:
CREATE VIEW view_one AS
select id,rank
from search_engine_rankings
where keyword_id = 19000
and search_engine_id = 11
and indexed_at = "2010-12-03";
Second, same query but with an inner join:
select v.*, s.url
from view_one AS v
inner join search_engine_rankings s ON s.id=v.id;
TLDR: I solved this by running optimize on the table.
I experienced the same just now. Even lookups on the primary key selecting just a few rows were slow. Testing a bit, I found it was not limited to the varchar column; selecting an int also took a considerable amount of time.
A query roughly looking like this took around 3s:
select someint from mytable where id in (1234, 12345, 123456).
While a query roughly looking like this took <10ms:
select count(*) from mytable where id in (1234, 12345, 123456).
The accepted answer here is to just make an index spanning someint as well, and it will be fast, as MySQL can fetch all the information it needs from the index and won't have to touch the table. That probably works in some settings, but I think it's a silly workaround - something is clearly wrong; it should not take three seconds to fetch three rows from a table! Besides, most applications just do a "select * from mytable", and making changes on the application side is not always trivial.
After OPTIMIZE TABLE, both queries take <10ms.
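For reference, the command itself (using the hypothetical mytable from above; on InnoDB, OPTIMIZE TABLE is effectively a table rebuild followed by an index-statistics refresh):
OPTIMIZE TABLE mytable;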

Can "auto_increment" on "sub_groups" be enforced at a database level?

In Rails, I have the following
class Token < ActiveRecord::Base
belongs_to :grid
attr_accessible :turn_order
end
When you insert a new token, turn_order should auto-increment. HOWEVER, it should only auto-increment for tokens belonging to the same grid.
So, take 4 tokens for example:
Token_1 belongs to Grid_1, turn_order should be 1 upon insert.
Token_2 belongs to Grid_2, turn_Order should be 1 upon insert.
If I insert Token_3 to Grid_1, turn_order should be 2 upon insert.
If I insert Token_4 to Grid_2, turn_order should be 2 upon insert.
There is an additional constraint: imagine I execute Token_3.turn_order = 1; now Token_1 must automatically set its turn_order to 2, because within these "sub-groups" there can be no turn_order collision.
I know MySQL has auto_increment, I was wondering if there is any logic that can be applied at the DB level to enforce a constraint such as this. Basically auto_incrementing within sub-groups of a query, those sub-groups being based on a foreign key.
Is this something that can be handled at a DB level, or should I just strive for implementing rock-solid constraints at the application layer?
If I understood your question properly, you could use one of the following two methods (InnoDB vs MyISAM). Personally, I'd take the InnoDB road, as I'm a fan of clustered indexes (which MyISAM doesn't support) and I prefer performance over how many lines of code I need to type, but the decision is yours...
http://dev.mysql.com/doc/refman/5.0/en/innodb-table-and-index.html
Rewriting mysql select to reduce time and writing tmp to disk
Full SQL script here: http://pastie.org/1259734
innodb implementation (recommended)
-- TABLES
drop table if exists grid;
create table grid
(
grid_id int unsigned not null auto_increment primary key,
name varchar(255) not null,
next_token_id int unsigned not null default 0
)
engine = innodb;
drop table if exists grid_token;
create table grid_token
(
grid_id int unsigned not null,
token_id int unsigned not null,
name varchar(255) not null,
primary key (grid_id, token_id) -- note clustered PK order (innodb only)
)
engine = innodb;
-- TRIGGERS
delimiter #
create trigger grid_token_before_ins_trig before insert on grid_token
for each row
begin
declare tid int unsigned default 0;
select next_token_id + 1 into tid from grid where grid_id = new.grid_id;
set new.token_id = tid;
update grid set next_token_id = tid where grid_id = new.grid_id;
end#
delimiter ;
-- TEST DATA
insert into grid (name) values ('g1'),('g2'),('g3');
insert into grid_token (grid_id, name) values
(1,'g1 t1'),(1,'g1 t2'),(1,'g1 t3'),
(2,'g2 t1'),
(3,'g3 t1'),(3,'g3 t2');
select * from grid;
select * from grid_token;
myisam implementation (not recommended)
-- TABLES
drop table if exists grid;
create table grid
(
grid_id int unsigned not null auto_increment primary key,
name varchar(255) not null
)
engine = myisam;
drop table if exists grid_token;
create table grid_token
(
grid_id int unsigned not null,
token_id int unsigned not null auto_increment,
name varchar(255) not null,
primary key (grid_id, token_id) -- non clustered PK
)
engine = myisam;
-- TEST DATA
insert into grid (name) values ('g1'),('g2'),('g3');
insert into grid_token (grid_id, name) values
(1,'g1 t1'),(1,'g1 t2'),(1,'g1 t3'),
(2,'g2 t1'),
(3,'g3 t1'),(3,'g3 t2');
select * from grid;
select * from grid_token;
My opinion: Rock-solid constraints at the app level. You may get it to work in SQL -- I've seen some people do some pretty amazing stuff. A lot of SQL logic used to be squirreled away in triggers, but I don't see much of that lately.
This smells more like business logic and you absolutely can get it done in Ruby without wrapping yourself around a tree. And... people will be able to see the tests and read the code.
This to me sounds like something you'd want to handle in an after_save method or in an observer. If the model itself doesn't need to be aware of when or how something increments, then I'd put the business logic in the observer. This approach makes the incrementing logic more expressive to other developers and keeps it database-agnostic.