How can I insert random values into a SQL Server table? - sql

I'm trying to randomly insert values from a list of pre-defined values into a table for testing. I tried using the solution found on this StackOverflow question:
stackoverflow.com/.../update-sql-table-with-random-value-from-other-table
When I I tried this, all of my "random" values that are inserted are exactly the same for all 3000 records.
When I run the part of the query that actually selects the random row, it does select a random record every time I run it by hand, so I know the query works. My best guesses as to what is happening are:
SQL Server is optimizing the SELECT somehow, not allowing the subquery to be evaluated more than once
The random value's seed is the same on every record the query updates
I'm stuck on what my options are. Am I doing something wrong, or is there another way I should be doing this?
This is the code I'm using:
DECLARE #randomStuff TABLE ([id] INT, [val] VARCHAR(100))
INSERT INTO #randomStuff ([id], [val])
VALUES ( 1, 'Test Value 1' )
INSERT INTO #randomStuff ([id], [val])
VALUES ( 2, 'Test Value 2' )
INSERT INTO #randomStuff ([id], [val])
VALUES ( 3, 'Test Value 3' )
INSERT INTO #randomStuff ([id], [val])
VALUES ( 4, 'Test Value 4' )
INSERT INTO #randomStuff ([id], [val])
VALUES ( 5, 'Test Value 5' )
INSERT INTO #randomStuff ([id], [val])
VALUES ( 6, null )
INSERT INTO #randomStuff ([id], [val])
VALUES ( 7, null )
INSERT INTO #randomStuff ([id], [val])
VALUES ( 8, null )
INSERT INTO #randomStuff ([id], [val])
VALUES ( 9, null )
INSERT INTO #randomStuff ([id], [val])
VALUES ( 10, null )
UPDATE MyTable
SET MyColumn = (SELECT TOP 1 [val] FROM #randomStuff ORDER BY NEWID())

When the query engine sees this...
(SELECT TOP 1 [val] FROM #randomStuff ORDER BY NEWID())
... it's all like, "ooooh, a cachable scalar subquery, I'm gonna cache that!"
You need to trick the query engine into thinking it's non-cachable. jfar's answer was close, but the query engine was smart enough to see the tautalogy of MyTable.MyColumn = MyTable.MyColumn, but it ain't smart enough to see through this.
UPDATE MyTable
SET MyColumn = (SELECT TOP 1 val
FROM #randomStuff r
INNER JOIN MyTable _MT
ON M.Id = _MT.Id
ORDER BY NEWID())
FROM MyTable M
By bringing in the outer table (MT) into the subquery, the query engine assumes subquery will need to be re-evaluated. Anything will work really, but I went with the (assumed) primary key of MyTable.Id since it'd be indexed and would add very little overhead.
A cursor would probably be just as fast, but is most certainly not as fun.

use a cross join to generate random data

I've had a play with this, and found a rather hacky way to do it with the use of an intermediate table variable.
Once #randomStuff is set up, we do this (note in my case, #MyTable is a table variable, adjust accordingly for your normal table):
DECLARE #randomMappings TABLE (id INT, val VARCHAR(100), sorter UNIQUEIDENTIFIER)
INSERT INTO #randomMappings
SELECT M.id, val, NEWID() AS sort
FROM #MyTable AS M
CROSS JOIN #randomstuff
so at this point, we have an intermediate table with every combination of (mytable id, random value), and a random sort value for each row specific to that combination. Then
DELETE others FROM #randomMappings AS others
INNER JOIN #randomMappings AS lower
ON (lower.id = others.id) AND (lower.sorter < others.sorter)
This is an old trick which deletes all rows for a given MyTable.id except for the one with the lower sort value -- join the table to itself where the value is smaller, and delete any where such a join succeeded. This just leaves behind the lowest value. So for each MyTable.id, we just have one (random) value left.. Then we just plug it back into the table:
UPDATE #MyTable
SET MyColumn = random.val
FROM #MyTable m, #randomMappings AS random
WHERE (random.id = m.id)
And you're done!
I said it was hacky...

I don't have time to check this right now, but my gut tells me that if you were to create a function on the server to get the random value that it would not optimize it out.
then you would have
UPDATE MyTable
Set MyColumn = dbo.RANDOM_VALUE()

There is no optimization going on here.
Your using a subquery that selects a single value, there is nothing to optimize.
You can also try putting a column from the table your updating in the select and see if that changes anything. That may trigger an evaluation for every row in MyTable
UPDATE MyTable
SET MyColumn = (SELECT TOP 1 [val] FROM #randomStuff ORDER BY NEWID()
WHERE MyTable.MyColumn = MyTable.MyColumn )

I came up with a solution which is a bit of a hack and very inefficient (10~ seconds to update 3000 records). Because this is being used to generate test data, I don't have to be concerned about speed however.
In this solution, I iterate over every row in the table and update the values one row at a time. It seems to work:
DECLARE #rows INT
DECLARE #currentRow INT
SELECT #rows = COUNT(*) FROM dbo.MyTable
SET #currentRow = 1
WHILE #currentRow < #rows
BEGIN
UPDATE MyTable
SET MyColumn = (SELECT TOP 1 [val] FROM #randomStuff ORDER BY NEWID())
WHERE MyPrimaryKey = (SELECT b.MyPrimaryKey
FROM(SELECT a.MyPrimaryKey, ROW_NUMBER() OVER (ORDER BY MyPrimaryKey) AS rownumber
FROM MyTable a) AS b
WHERE #currentRow = b.rownumber
)
SET #currentRow = #currentRow + 1
END

Related

Update Id on INSERT INTO SELECT FROM statement

i want to copy specific columns from Table [From] to Table [To] but also want to insert the foreign key from [To] in [From]
Table definitions:
[From]
Id (int)
pic (varbinary(MAX))
picId (int)
[To]
Id (int)
pic (varbinary(MAX))
My copy query works perfectly but i dont know how to update the "picId" column inside of the Table [From]
INSERT INTO dbo.[To] (Id,pic)
SELECT
isnull(T.m, 0) + row_number() over (order by F.pic),
F.pic
FROM dbo.[From] AS F
outer apply (select max(pic) as m from dbo.[To]) as T;
Now i want to copy the inserted [To].Id to [From].picId.
Can anyone help me please?
2 changes would make your solution much easier, but assuming you can't control anything about the dbo.[TO] table, here is a solution that will work for you.
The first improvement would be making dbo.[to].ID an identity column. Then you could drop your whole "row_number() over" line and let SQL manage the ID. What you're doing works, but it's like cutting wood with a chain saw by dragging the (not running) saw back and forth over the wood. You can do it, but it's a lot easier if you start the engine and let it do the work.
The second change is adding a column dbo.[to].FromID and populating it when you insert the row. The OUTPUT statement can only reference fields in the row being inserted (or deleted, but that's not relevant here) so you can't get the ID of the row in dbo.[from] that you want to update unless you have it in the row in dbo.[to] you just inserted. If you do this, you can use a plain old INSERT with an OUTPUT clause. The trick here is using a MERGE statement, and you can see a full explanation here: Is it possible to for SQL Output clause to return a column not being inserted? I strongly urge you to upvote that answer if you find this useful. I could not have provided you with this without that answer!
Anyway, here is the solution:
--Create some fake data, you'll already have dbo.[From]
CREATE TABLE #TestFrom (FromID INT, Pic nvarchar(1000), ToID INT NULL)
CREATE TABLE #TestTo (ToID INT, Pic nvarchar(1000))
--It would be much easier if your TO used an IDENTITY column instead of managing the ID manually
CREATE TABLE #TestToIdent (ToID INT IDENTITY(1,1), Pic nvarchar(1000))
INSERT INTO #TestFrom
VALUES (1, 'Test 1', NULL)
, (3, 'Test 3', NULL)
, (7, 'Test 7', NULL)
, (13, 'Test 13', NULL)
--Define a table variable to hold your OUTPUT, you'll need this
DECLARE #Mapping as table (FromID INT, ToID INT);
with cteIns as (
SELECT
isnull(T.m, 0) + row_number() over (order by F.pic) as ToID
, F.pic, F.FromID
FROM #TestFrom AS F
outer apply (select max(ToID) as m from #TestTo ) as T
) --From https://stackoverflow.com/questions/10949730/is-it-possible-to-for-sql-output-clause-to-return-a-column-not-being-inserted
MERGE INTO #TestTo as T --Put results here
USING cteIns as F --Source is here
on 0=1 --But this is never true so never merge, just INSERT
WHEN NOT MATCHED THEN --ALWAYS because 0 never = 1
INSERT (ToID, pic)
VALUES (F.ToID, F.pic)
OUTPUT F.FromID, inserted.ToID
INTO #Mapping (FromID, ToID );
--SELECT * FROM #Mapping --Test the mapping if you're curious
--SELECT * FROM #TestTo --Test the insedert if you're curious
--Update the dbo.[FROM] with the ID of the [TO] that got inserted
UPDATE #TestFrom SET ToID = M.ToID
FROM #TestFrom as F
INNER JOIN #Mapping as M
ON M.FromID = F.FromID
--View the results
SELECT * FROM #TestFrom
DROP TABLE #TestFrom
DROP TABLE #TestTo
DROP TABLE #TestToIdent

Left join with nearest value without duplicates

I want to achieve in MS SQL something like below, using 2 tables and through join instead of iteration.
From table A, I want each row to identify from table B which in the list is their nearest value, and when value has been selected, that value cannot re-used. Please help if you've done something like this before. Thank you in advance! #SOreadyToAsk
Below is a set-based solution using CTEs and windowing functions.
The ranked_matches CTE assigns a closest match rank for each row in TableA along with a closest match rank for each row in TableB, using the index value as a tie breaker.
The best_matches CTE returns rows from ranked_matches that have the best rank (rank value 1) for both rankings.
Finally, the outer query uses a LEFT JOIN from TableA to the to the best_matches CTE to include the TableA rows that were not assigned a best match due to the closes match being already assigned.
Note that this does not return a match for the index 3 TableA row indicated in your sample results. The closes match for this row is TableB index 3, a difference of 83. However, that TableB row is a closer match to the TableA index 2 row, a difference of 14 so it was already assigned. Please clarify you question if this isn't what you want. I think this technique can be tweaked accordingly.
CREATE TABLE dbo.TableA(
[index] int NOT NULL
CONSTRAINT PK_TableA PRIMARY KEY
, value int
);
CREATE TABLE dbo.TableB(
[index] int NOT NULL
CONSTRAINT PK_TableB PRIMARY KEY
, value int
);
INSERT INTO dbo.TableA
( [index], value )
VALUES ( 1, 123 ),
( 2, 245 ),
( 3, 342 ),
( 4, 456 ),
( 5, 608 );
INSERT INTO dbo.TableB
( [index], value )
VALUES ( 1, 152 ),
( 2, 159 ),
( 3, 259 );
WITH
ranked_matches AS (
SELECT
a.[index] AS a_index
, a.value AS a_value
, b.[index] b_index
, b.value AS b_value
, RANK() OVER(PARTITION BY a.[index] ORDER BY ABS(a.Value - b.value), b.[index]) AS a_match_rank
, RANK() OVER(PARTITION BY b.[index] ORDER BY ABS(a.Value - b.value), a.[index]) AS b_match_rank
FROM dbo.TableA AS a
CROSS JOIN dbo.TableB AS b
)
, best_matches AS (
SELECT
a_index
, a_value
, b_index
, b_value
FROM ranked_matches
WHERE
a_match_rank = 1
AND b_match_rank= 1
)
SELECT
TableA.[index] AS a_index
, TableA.value AS a_value
, best_matches.b_index
, best_matches.b_value
FROM dbo.TableA
LEFT JOIN best_matches ON
best_matches.a_index = TableA.[index]
ORDER BY
TableA.[index];
EDIT:
Although this method uses CTEs, recursion is not used and is therefore not limited to 32K recursions. There may be room for improvement here from a performance perspective, though.
I don't think it is possible without a cursor.
Even if it is possible to do it without a cursor, it would definitely require self-joins, maybe more than once. As a result performance is likely to be poor, likely worse than straight-forward cursor. And it is likely that it would be hard to understand the logic and later maintain this code. Sometimes cursors are useful.
The main difficulty is this part of the question:
when value has been selected, that value cannot re-used.
There was a similar question just few days ago.
The logic is straight-forward. Cursor loops through all rows of table A and with each iteration adds one row to the temporary destination table. To determine the value to add I use EXCEPT operator that takes all values from the table B and removes from them all values that have been used before. My solution assumes that there are no duplicates in value in table B. EXCEPT operator removes duplicates. If values in table B are not unique, then temporary table would hold unique indexB instead of valueB, but main logic remains the same.
Here is SQL Fiddle.
Sample data
DECLARE #TA TABLE (idx int, value int);
INSERT INTO #TA (idx, value) VALUES
(1, 123),
(2, 245),
(3, 342),
(4, 456),
(5, 608);
DECLARE #TB TABLE (idx int, value int);
INSERT INTO #TB (idx, value) VALUES
(1, 152),
(2, 159),
(3, 259);
Main query inserts result into temporary table #TDst. It is possible to write that INSERT without using explicit variable #CurrValueB, but it looks a bit cleaner with variable.
DECLARE #TDst TABLE (idx int, valueA int, valueB int);
DECLARE #CurrIdx int;
DECLARE #CurrValueA int;
DECLARE #CurrValueB int;
DECLARE #iFS int;
DECLARE #VarCursor CURSOR;
SET #VarCursor = CURSOR FAST_FORWARD
FOR
SELECT idx, value
FROM #TA
ORDER BY idx;
OPEN #VarCursor;
FETCH NEXT FROM #VarCursor INTO #CurrIdx, #CurrValueA;
SET #iFS = ##FETCH_STATUS;
WHILE #iFS = 0
BEGIN
SET #CurrValueB =
(
SELECT TOP(1) Diff.valueB
FROM
(
SELECT B.value AS valueB
FROM #TB AS B
EXCEPT -- remove values that have been selected before
SELECT Dst.valueB
FROM #TDst AS Dst
) AS Diff
ORDER BY ABS(Diff.valueB - #CurrValueA)
);
INSERT INTO #TDst (idx, valueA, valueB)
VALUES (#CurrIdx, #CurrValueA, #CurrValueB);
FETCH NEXT FROM #VarCursor INTO #CurrIdx, #CurrValueA;
SET #iFS = ##FETCH_STATUS;
END;
CLOSE #VarCursor;
DEALLOCATE #VarCursor;
SELECT * FROM #TDst ORDER BY idx;
Result
idx valueA valueB
1 123 152
2 245 259
3 342 159
4 456 NULL
5 608 NULL
It would help to have the following indexes:
TableA - (idx) include (value), because we SELECT idx, value ORDER BY idx;
TableB - (value) unique, Temp destination table - (valueB) unique filtered NOT NULL, to help EXCEPT. So, it may be better to have a temporary #table for result (or permanent table) instead of table variable, because table variables can't have indexes.
Another possible method would be to delete a row from table B (from original or from a copy) as its value is inserted into result. In this method we can avoid performing EXCEPT again and again and it could be faster overall, especially if it is OK to leave table B empty in the end. Still, I don't see how to avoid cursor and processing individual rows in sequence.
SQL Fiddle
DECLARE #TDst TABLE (idx int, valueA int, valueB int);
DECLARE #CurrIdx int;
DECLARE #CurrValueA int;
DECLARE #iFS int;
DECLARE #VarCursor CURSOR;
SET #VarCursor = CURSOR FAST_FORWARD
FOR
SELECT idx, value
FROM #TA
ORDER BY idx;
OPEN #VarCursor;
FETCH NEXT FROM #VarCursor INTO #CurrIdx, #CurrValueA;
SET #iFS = ##FETCH_STATUS;
WHILE #iFS = 0
BEGIN
WITH
CTE
AS
(
SELECT TOP(1) B.idx, B.value
FROM #TB AS B
ORDER BY ABS(B.value - #CurrValueA)
)
DELETE FROM CTE
OUTPUT #CurrIdx, #CurrValueA, deleted.value INTO #TDst;
FETCH NEXT FROM #VarCursor INTO #CurrIdx, #CurrValueA;
SET #iFS = ##FETCH_STATUS;
END;
CLOSE #VarCursor;
DEALLOCATE #VarCursor;
SELECT
A.idx
,A.value AS valueA
,Dst.valueB
FROM
#TA AS A
LEFT JOIN #TDst AS Dst ON Dst.idx = A.idx
ORDER BY idx;
I highly believe THIS IS NOT A GOOD PRACTICE because I am bypassing the policy SQL made for itself that functions with side-effects (INSERT,UPDATE,DELETE) is a NO, but due to the fact that I want solve this without resulting to iteration options, I came up with this and gave me better view of things now.
create table tablea
(
num INT,
val MONEY
)
create table tableb
(
num INT,
val MONEY
)
I created a hard-table temp which I shall drop from time-to-time.
if((select 1 from sys.tables where name = 'temp_tableb') is not null) begin drop table temp_tableb end
select * into temp_tableb from tableb
I created a function that executes xp_cmdshell (this is where the side-effect bypassing happens)
CREATE FUNCTION [dbo].[GetNearestMatch]
(
#ParamValue MONEY
)
RETURNS MONEY
AS
BEGIN
DECLARE #ReturnNum MONEY
, #ID INT
SELECT TOP 1
#ID = num
, #ReturnNum = val
FROM temp_tableb ORDER BY ABS(val - #ParamValue)
DECLARE #SQL varchar(500)
SELECT #SQL = 'osql -S' + ##servername + ' -E -q "delete from test..temp_tableb where num = ' + CONVERT(NVARCHAR(150),#ID) + ' "'
EXEC master..xp_cmdshell #SQL
RETURN #ReturnNum
END
and my usage in my query simply looks like this.
-- initialize temp
if((select 1 from sys.tables where name = 'temp_tableb') is not null) begin drop table temp_tableb end
select * into temp_tableb from tableb
-- query nearest match
select
*
, dbo.GetNearestMatch(a.val) AS [NearestValue]
from tablea a
and gave me this..

INSERT INTO With a SubQuery and some operations

I'm trying to insert some data to a table contains two things : "a string" and "maximum number in Order column + 1".
This is my query:
INSERT INTO MyTable ([Text],[Order])
SELECT 'MyText' , (Max([Order]) + 1)
FROM MyTable
What is going wrong with my query?
I'm using Microsoft SQL Server 2005 SP3.
You can test this query like this:
I don't receive error:
create table #MyTable
(
[Text] varchar(40),
[Order] int NOT NULL
)
INSERT INTO #MyTable([Text],[Order])
SELECT 'MyText' [Text], isnull(max([order]) + 1, 0) [Order]
FROM #MyTable
drop table #MyTable
Original:
INSERT INTO MyTable ([Text],[Order])
SELECT 'MyText' [Text], max([Order]) + 1 [Order]
FROM MyTable
or
INSERT INTO MyTable ([Text],[Order])
SELECT top 1 'MyText' [Text], max([Order]) + 1 [Order]
FROM MyTable
limit is not valid in SQL Server as far as I know.
Cannot insert the value NULL into column 'Order', table 'master.dbo.MyTable'; column does not allow nulls. INSERT fails. The statement has been terminated.
This means that the Order column isn't allowed to be null, and that the Max([Order]) + 1 part of your column returns NULL.
This is because your table is empty, as you already noticed by yourself.
You can work around this by replacing NULL by a real number in the query, using ISNULL():
INSERT INTO MyTable ([Text],[Order])
SELECT 'MyText' , (isnull(Max([Order]),0) + 1)
FROM MyTable
Unless he has a column named OrderBy
then he would have to add / assign all values within that Insert especially if the column does not allow for nulls
sounds like fully qualifying the Insert with the dbo.MyTable.Field may make more sense.
also why are you naming fields with SQL Key words...???
INSERT INTO MyTable ([Text],[Order] Values('MyTextTest',1)
try a test insert first..

Select records with order of IN clause

I have
SELECT * FROM Table1 WHERE Col1 IN(4,2,6)
I want to select and return the records with the specified order which i indicate in the IN clause
(first display record with Col1=4, Col1=2, ...)
I can use
SELECT * FROM Table1 WHERE Col1 = 4
UNION ALL
SELECT * FROM Table1 WHERE Col1 = 6 , .....
but I don't want to use that, cause I want to use it as a stored procedure and not auto generated.
I know it's a bit late but the best way would be
SELECT *
FROM Table1
WHERE Col1 IN( 4, 2, 6 )
ORDER BY CHARINDEX(CAST(Col1 AS VARCHAR), '4,2,67')
Or
SELECT CHARINDEX(CAST(Col1 AS VARCHAR), '4,2,67')s_order,
*
FROM Table1
WHERE Col1 IN( 4, 2, 6 )
ORDER BY s_order
You have a couple of options. Simplest may be to put the IN parameters (they are parameters, right) in a separate table in the order you receive them, and ORDER BY that table.
The solution is along this line:
SELECT * FROM Table1
WHERE Col1 IN(4,2,6)
ORDER BY
CASE Col1
WHEN 4 THEN 1
WHEN 2 THEN 2
WHEN 6 THEN 3
END
select top 0 0 'in', 0 'order' into #i
insert into #i values(4,1)
insert into #i values(2,2)
insert into #i values(6,3)
select t.* from Table1 t inner join #i i on t.[in]=t.[col1] order by i.[order]
Replace the IN values with a table, including a column for sort order to used in the query (and be sure to expose the sort order to the calling application):
WITH OtherTable (Col1, sort_seq)
AS
(
SELECT Col1, sort_seq
FROM (
VALUES (4, 1),
(2, 2),
(6, 3)
) AS OtherTable (Col1, sort_seq)
)
SELECT T1.Col1, O1.sort_seq
FROM Table1 AS T1
INNER JOIN OtherTable AS O1
ON T1.Col1 = O1.Col1
ORDER
BY sort_seq;
In your stored proc, rather than a CTE, split the values into table (a scratch base table, temp table, function that returns a table, etc) with the sort column populated as appropriate.
I have found another solution. It's similar to the answer from onedaywhen, but it's a little shorter.
SELECT sort.n, Table1.Col1
FROM (VALUES (4), (2), (6)) AS sort(n)
JOIN Table1
ON Table1.Col1 = sort.n
I am thinking about this problem two different ways because I can't decide if this is a programming problem or a data architecture problem. Check out the code below incorporating "famous" TV animals. Let's say that we are tracking dolphins, horses, bears, dogs and orangutans. We want to return only the horses, bears, and dogs in our query and we want bears to sort ahead of horses to sort ahead of dogs. I have a personal preference to look at this as an architecture problem, but can wrap my head around looking at it as a programming problem. Let me know if you have questions.
CREATE TABLE #AnimalType (
AnimalTypeId INT NOT NULL PRIMARY KEY
, AnimalType VARCHAR(50) NOT NULL
, SortOrder INT NOT NULL)
INSERT INTO #AnimalType VALUES (1,'Dolphin',5)
INSERT INTO #AnimalType VALUES (2,'Horse',2)
INSERT INTO #AnimalType VALUES (3,'Bear',1)
INSERT INTO #AnimalType VALUES (4,'Dog',4)
INSERT INTO #AnimalType VALUES (5,'Orangutan',3)
CREATE TABLE #Actor (
ActorId INT NOT NULL PRIMARY KEY
, ActorName VARCHAR(50) NOT NULL
, AnimalTypeId INT NOT NULL)
INSERT INTO #Actor VALUES (1,'Benji',4)
INSERT INTO #Actor VALUES (2,'Lassie',4)
INSERT INTO #Actor VALUES (3,'Rin Tin Tin',4)
INSERT INTO #Actor VALUES (4,'Gentle Ben',3)
INSERT INTO #Actor VALUES (5,'Trigger',2)
INSERT INTO #Actor VALUES (6,'Flipper',1)
INSERT INTO #Actor VALUES (7,'CJ',5)
INSERT INTO #Actor VALUES (8,'Mr. Ed',2)
INSERT INTO #Actor VALUES (9,'Tiger',4)
/* If you believe this is a programming problem then this code works */
SELECT *
FROM #Actor a
WHERE a.AnimalTypeId IN (2,3,4)
ORDER BY case when a.AnimalTypeId = 3 then 1
when a.AnimalTypeId = 2 then 2
when a.AnimalTypeId = 4 then 3 end
/* If you believe that this is a data architecture problem then this code works */
SELECT *
FROM #Actor a
JOIN #AnimalType at ON a.AnimalTypeId = at.AnimalTypeId
WHERE a.AnimalTypeId IN (2,3,4)
ORDER BY at.SortOrder
DROP TABLE #Actor
DROP TABLE #AnimalType
ORDER BY CHARINDEX(','+convert(varchar,status)+',' ,
',rejected,active,submitted,approved,')
Just put a comma before and after a string in which you are finding the substring index or you can say that second parameter.
And first parameter of CHARINDEX is also surrounded by , (comma).

Insert record only if record does not already exist in table

I'm wondering if there is a way to insert a record into a table only if the table does not already contain that record?
Is there a query that will do this, or will I need a stored procedure?
You don't say what version of SQL Server. If SQL Server 2008 you can use MERGE
NB: It is usual to use Merge for an Upsert which is what I originally thought the question was asking but it is valid without the WHEN MATCHED clause and just with a WHEN NOT MATCHED clause so does work for this case also. Example Usage.
CREATE TABLE #A(
[id] [int] NOT NULL PRIMARY KEY CLUSTERED,
[C] [varchar](200) NOT NULL)
MERGE #A AS target
USING (SELECT 3, 'C') AS source (id, C)
ON (target.id = source.id)
/*Uncomment for Upsert Semantics
WHEN MATCHED THEN
UPDATE SET C = source.C */
WHEN NOT MATCHED THEN
INSERT (id, C)
VALUES (source.id, source.C);
In terms of execution costs the two look roughly equal when an Insert is to be done...
Link to plan images for first run
but on the second run when there is no insert to be done Matthew's answer looks lower cost. I'm not sure if there is a way of improving this.
Link to plan images for second run
Test Script
select *
into #testtable
from master.dbo.spt_values
CREATE UNIQUE CLUSTERED INDEX [ix] ON #testtable([type] ASC,[number] ASC,[name] ASC)
declare #name nvarchar(35)= 'zzz'
declare #number int = 50
declare #type nchar(3) = 'A'
declare #low int
declare #high int
declare #status int = 0;
MERGE #testtable AS target
USING (SELECT #name, #number, #type, #low, #high, #status) AS source (name, number, [type], low, high, [status])
ON (target.[type] = source.[type] AND target.[number] = source.[number] and target.[name] = source.[name] )
WHEN NOT MATCHED THEN
INSERT (name, number, [type], low, high, [status])
VALUES (source.name, source.number, source.[type], source.low, source.high, source.[status]);
set #name = 'yyy'
IF NOT EXISTS
(SELECT *
FROM #testtable
WHERE [type] = #type AND [number] = #number and name = #name)
BEGIN
INSERT INTO #testtable
(name, number, [type], low, high, [status])
VALUES (#name, #number, #type, #low, #high, #status);
END
IF NOT EXISTS
(SELECT {Columns}
FROM {Table}
WHERE {Column1 = SomeValue AND Column2 = SomeOtherVale AND ...})
INSERT INTO {Table} {Values}
In short, you need a table guaranteed to provide you the ability to return one row:
Insert dbo.Table (Col1, Col2, Col3....
Select 'Value1', 'Value2', 'Value3',....
From Information_Schema.Tables
Where Table_Schema = 'dbo'
And Table_Name = 'Table'
And Not Exists (
Select 1
From dbo.Table
Where Col1 = 'Foo'
And Col2 = 'Bar'
And ....
)
I've seen this variation in the wild as well:
Insert Table (Col1, Col2, Col3....
Select 'Value1', 'Value2', 'Value3'....
From (
Select 1 As Num
) As Z
Where Not Exists (
Select 1
From Table
Where Col1 = Foo
And Col2 = Bar
And ....
)
I have to vote for adding a CONSTRAINT. It's the simplest and the most robust answer. I mean, looking at how complicated the other answers are I'd say they're much harder to get right (and keep right).
The downsides: [1] it's not obvious from reading the code that uniqueness is enforced in the DB [2] the client code has to know to catch an exception. In other words, the guy coming after you might wonder "how did this ever work?"
That aside: I used to worry that throwing/catching the exception was a performance hit but I did some testing (on SQL Server 2005) and it wasn't significant.