Update Id on INSERT INTO SELECT FROM statement

Update Id on INSERT INTO SELECT FROM statement - sql

i want to copy specific columns from Table [From] to Table [To] but also want to insert the foreign key from [To] in [From]
Table definitions:
[From]
Id (int)
pic (varbinary(MAX))
picId (int)
[To]
Id (int)
pic (varbinary(MAX))
My copy query works perfectly but i dont know how to update the "picId" column inside of the Table [From]
INSERT INTO dbo.[To] (Id,pic)
SELECT
isnull(T.m, 0) + row_number() over (order by F.pic),
F.pic
FROM dbo.[From] AS F
outer apply (select max(pic) as m from dbo.[To]) as T;
Now i want to copy the inserted [To].Id to [From].picId.
Can anyone help me please?

2 changes would make your solution much easier, but assuming you can't control anything about the dbo.[TO] table, here is a solution that will work for you.
The first improvement would be making dbo.[to].ID an identity column. Then you could drop your whole "row_number() over" line and let SQL manage the ID. What you're doing works, but it's like cutting wood with a chain saw by dragging the (not running) saw back and forth over the wood. You can do it, but it's a lot easier if you start the engine and let it do the work.
The second change is adding a column dbo.[to].FromID and populating it when you insert the row. The OUTPUT statement can only reference fields in the row being inserted (or deleted, but that's not relevant here) so you can't get the ID of the row in dbo.[from] that you want to update unless you have it in the row in dbo.[to] you just inserted. If you do this, you can use a plain old INSERT with an OUTPUT clause. The trick here is using a MERGE statement, and you can see a full explanation here: Is it possible to for SQL Output clause to return a column not being inserted? I strongly urge you to upvote that answer if you find this useful. I could not have provided you with this without that answer!
Anyway, here is the solution:
--Create some fake data, you'll already have dbo.[From]
CREATE TABLE #TestFrom (FromID INT, Pic nvarchar(1000), ToID INT NULL)
CREATE TABLE #TestTo (ToID INT, Pic nvarchar(1000))
--It would be much easier if your TO used an IDENTITY column instead of managing the ID manually
CREATE TABLE #TestToIdent (ToID INT IDENTITY(1,1), Pic nvarchar(1000))
INSERT INTO #TestFrom
VALUES (1, 'Test 1', NULL)
, (3, 'Test 3', NULL)
, (7, 'Test 7', NULL)
, (13, 'Test 13', NULL)
--Define a table variable to hold your OUTPUT, you'll need this
DECLARE #Mapping as table (FromID INT, ToID INT);
with cteIns as (
SELECT
isnull(T.m, 0) + row_number() over (order by F.pic) as ToID
, F.pic, F.FromID
FROM #TestFrom AS F
outer apply (select max(ToID) as m from #TestTo ) as T
) --From https://stackoverflow.com/questions/10949730/is-it-possible-to-for-sql-output-clause-to-return-a-column-not-being-inserted
MERGE INTO #TestTo as T --Put results here
USING cteIns as F --Source is here
on 0=1 --But this is never true so never merge, just INSERT
WHEN NOT MATCHED THEN --ALWAYS because 0 never = 1
INSERT (ToID, pic)
VALUES (F.ToID, F.pic)
OUTPUT F.FromID, inserted.ToID
INTO #Mapping (FromID, ToID );
--SELECT * FROM #Mapping --Test the mapping if you're curious
--SELECT * FROM #TestTo --Test the insedert if you're curious
--Update the dbo.[FROM] with the ID of the [TO] that got inserted
UPDATE #TestFrom SET ToID = M.ToID
FROM #TestFrom as F
INNER JOIN #Mapping as M
ON M.FromID = F.FromID
--View the results
SELECT * FROM #TestFrom
DROP TABLE #TestFrom
DROP TABLE #TestTo
DROP TABLE #TestToIdent

Related

SQL Server: return joined data from insert select

I perform steps:
Create temporal table and fill it with data and unique order column [_oid]
Insert everything from temporal table into real table except fictional [_oid], outputting generated [id]'s
Return those generated [id]'s along with corresponding [_oid]
SQL:
CREATE TABLE #temp
(
[Hash] INT NOT NULL,
[Size] INT NOT NULL,
[Data] NVARCHAR(MAX),
[_oid] INT NOT NULL
)
--here insert data into #temp--
INSERT [dbo].[TestObjects]
OUTPUT INSERTED.[Id]
SELECT [Hash], [Size], [Data]
FROM #temp
DROP TABLE #temp
How I can return ([Id], [_oid]) rows ? ....Or at least return [Id] ordered by [_oid] ?
I know insert does not preserve order of inserted items in it's output, but still...

I think you what you are asking for is INSERT INTO, as so:
INSERT INTO [dbo].[TestObjects]
SELECT Hash, Size, Data FROM #temp
ORDER BY _oid
But as you say, there's no guarantee about order when you select from TestObjects, so if it's important can you not have a field in TestObjects you can ORDER BY when you SELECT from it?

IF your insert into #temp is such that both o_id and (hash,size,data) are unique for each row (ie keys), then you could retrieve the inserted o_id from #temp:
select t.[_oid],to.[Id]
from #temp t
inner join [dbo].[TestObjects] to
on t.Hash=to.Hash and t.Size=to.Size and t.data=to.data

As noted by George Menoutis, I did merge:
MERGE [dbo].[TestObjects] AS T_Base
USING #temp AS T_Source
ON (0<>0)
WHEN NOT MATCHED THEN INSERT ([Hash],[Size],[Data]) VALUES (T_Source.[Hash],T_Source.[Size],T_Source.[Data])
OUTPUT INSERTED.[Id], T_Source.[_oid];
If anyone have better approach - feel free to contribute to this answer.

INSERT inside an INSERT statement and use its ID in the outer INSERT [duplicate]

Very simplified, I have two tables Source and Target.
declare #Source table (SourceID int identity(1,2), SourceName varchar(50))
declare #Target table (TargetID int identity(2,2), TargetName varchar(50))
insert into #Source values ('Row 1'), ('Row 2')
I would like to move all rows from #Source to #Target and know the TargetID for each SourceID because there are also the tables SourceChild and TargetChild that needs to be copied as well and I need to add the new TargetID into TargetChild.TargetID FK column.
There are a couple of solutions to this.
Use a while loop or cursors to insert one row (RBAR) to Target at a time and use scope_identity() to fill the FK of TargetChild.
Add a temp column to #Target and insert SourceID. You can then join that column to fetch the TargetID for the FK in TargetChild.
SET IDENTITY_INSERT OFF for #Target and handle assigning new values yourself. You get a range that you then use in TargetChild.TargetID.
I'm not all that fond of any of them. The one I used so far is cursors.
What I would really like to do is to use the output clause of the insert statement.
insert into #Target(TargetName)
output inserted.TargetID, S.SourceID
select SourceName
from #Source as S
But it is not possible
The multi-part identifier "S.SourceID" could not be bound.
But it is possible with a merge.
merge #Target as T
using #Source as S
on 0=1
when not matched then
insert (TargetName) values (SourceName)
output inserted.TargetID, S.SourceID;
Result
TargetID SourceID
----------- -----------
2 1
4 3
I want to know if you have used this? If you have any thoughts about the solution or see any problems with it? It works fine in simple scenarios but perhaps something ugly could happen when the query plan get really complicated due to a complicated source query. Worst scenario would be that the TargetID/SourceID pairs actually isn't a match.
MSDN has this to say about the from_table_name of the output clause.
Is a column prefix that specifies a table included in the FROM clause of a DELETE, UPDATE, or MERGE statement that is used to specify the rows to update or delete.
For some reason they don't say "rows to insert, update or delete" only "rows to update or delete".
Any thoughts are welcome and totally different solutions to the original problem is much appreciated.

In my opinion this is a great use of MERGE and output. I've used in several scenarios and haven't experienced any oddities to date.
For example, here is test setup that clones a Folder and all Files (identity) within it into a newly created Folder (guid).
DECLARE #FolderIndex TABLE (FolderId UNIQUEIDENTIFIER PRIMARY KEY, FolderName varchar(25));
INSERT INTO #FolderIndex
(FolderId, FolderName)
VALUES(newid(), 'OriginalFolder');
DECLARE #FileIndex TABLE (FileId int identity(1,1) PRIMARY KEY, FileName varchar(10));
INSERT INTO #FileIndex
(FileName)
VALUES('test.txt');
DECLARE #FileFolder TABLE (FolderId UNIQUEIDENTIFIER, FileId int, PRIMARY KEY(FolderId, FileId));
INSERT INTO #FileFolder
(FolderId, FileId)
SELECT FolderId,
FileId
FROM #FolderIndex
CROSS JOIN #FileIndex; -- just to illustrate
DECLARE #sFolder TABLE (FromFolderId UNIQUEIDENTIFIER, ToFolderId UNIQUEIDENTIFIER);
DECLARE #sFile TABLE (FromFileId int, ToFileId int);
-- copy Folder Structure
MERGE #FolderIndex fi
USING ( SELECT 1 [Dummy],
FolderId,
FolderName
FROM #FolderIndex [fi]
WHERE FolderName = 'OriginalFolder'
) d ON d.Dummy = 0
WHEN NOT MATCHED
THEN INSERT
(FolderId, FolderName)
VALUES (newid(), 'copy_'+FolderName)
OUTPUT d.FolderId,
INSERTED.FolderId
INTO #sFolder (FromFolderId, toFolderId);
-- copy File structure
MERGE #FileIndex fi
USING ( SELECT 1 [Dummy],
fi.FileId,
fi.[FileName]
FROM #FileIndex fi
INNER
JOIN #FileFolder fm ON
fi.FileId = fm.FileId
INNER
JOIN #FolderIndex fo ON
fm.FolderId = fo.FolderId
WHERE fo.FolderName = 'OriginalFolder'
) d ON d.Dummy = 0
WHEN NOT MATCHED
THEN INSERT ([FileName])
VALUES ([FileName])
OUTPUT d.FileId,
INSERTED.FileId
INTO #sFile (FromFileId, toFileId);
-- link new files to Folders
INSERT INTO #FileFolder (FileId, FolderId)
SELECT sfi.toFileId, sfo.toFolderId
FROM #FileFolder fm
INNER
JOIN #sFile sfi ON
fm.FileId = sfi.FromFileId
INNER
JOIN #sFolder sfo ON
fm.FolderId = sfo.FromFolderId
-- return
SELECT *
FROM #FileIndex fi
JOIN #FileFolder ff ON
fi.FileId = ff.FileId
JOIN #FolderIndex fo ON
ff.FolderId = fo.FolderId

I would like to add another example to add to #Nathan's example, as I found it somewhat confusing.
Mine uses real tables for the most part, and not temp tables.
I also got my inspiration from here: another example
-- Copy the FormSectionInstance
DECLARE #FormSectionInstanceTable TABLE(OldFormSectionInstanceId INT, NewFormSectionInstanceId INT)
;MERGE INTO [dbo].[FormSectionInstance]
USING
(
SELECT
fsi.FormSectionInstanceId [OldFormSectionInstanceId]
, #NewFormHeaderId [NewFormHeaderId]
, fsi.FormSectionId
, fsi.IsClone
, #UserId [NewCreatedByUserId]
, GETDATE() NewCreatedDate
, #UserId [NewUpdatedByUserId]
, GETDATE() NewUpdatedDate
FROM [dbo].[FormSectionInstance] fsi
WHERE fsi.[FormHeaderId] = #FormHeaderId
) tblSource ON 1=0 -- use always false condition
WHEN NOT MATCHED
THEN INSERT
( [FormHeaderId], FormSectionId, IsClone, CreatedByUserId, CreatedDate, UpdatedByUserId, UpdatedDate)
VALUES( [NewFormHeaderId], FormSectionId, IsClone, NewCreatedByUserId, NewCreatedDate, NewUpdatedByUserId, NewUpdatedDate)
OUTPUT tblSource.[OldFormSectionInstanceId], INSERTED.FormSectionInstanceId
INTO #FormSectionInstanceTable(OldFormSectionInstanceId, NewFormSectionInstanceId);
-- Copy the FormDetail
INSERT INTO [dbo].[FormDetail]
(FormHeaderId, FormFieldId, FormSectionInstanceId, IsOther, Value, CreatedByUserId, CreatedDate, UpdatedByUserId, UpdatedDate)
SELECT
#NewFormHeaderId, FormFieldId, fsit.NewFormSectionInstanceId, IsOther, Value, #UserId, CreatedDate, #UserId, UpdatedDate
FROM [dbo].[FormDetail] fd
INNER JOIN #FormSectionInstanceTable fsit ON fsit.OldFormSectionInstanceId = fd.FormSectionInstanceId
WHERE [FormHeaderId] = #FormHeaderId

Here's a solution that doesn't use MERGE (which I've had problems with many times I try to avoid if possible). It relies on two memory tables (you could use temp tables if you want) with IDENTITY columns that get matched, and importantly, using ORDER BY when doing the INSERT, and WHERE conditions that match between the two INSERTs... the first one holds the source IDs and the second one holds the target IDs.
-- Setup... We have a table that we need to know the old IDs and new IDs after copying.
-- We want to copy all of DocID=1
DECLARE #newDocID int = 99;
DECLARE #tbl table (RuleID int PRIMARY KEY NOT NULL IDENTITY(1, 1), DocID int, Val varchar(100));
INSERT INTO #tbl (DocID, Val) VALUES (1, 'RuleA-2'), (1, 'RuleA-1'), (2, 'RuleB-1'), (2, 'RuleB-2'), (3, 'RuleC-1'), (1, 'RuleA-3')
-- Create a break in IDENTITY values.. just to simulate more realistic data
INSERT INTO #tbl (Val) VALUES ('DeleteMe'), ('DeleteMe');
DELETE FROM #tbl WHERE Val = 'DeleteMe';
INSERT INTO #tbl (DocID, Val) VALUES (6, 'RuleE'), (7, 'RuleF');
SELECT * FROM #tbl t;
-- Declare TWO temp tables each with an IDENTITY - one will hold the RuleID of the items we are copying, other will hold the RuleID that we create
DECLARE #input table (RID int IDENTITY(1, 1), SourceRuleID int NOT NULL, Val varchar(100));
DECLARE #output table (RID int IDENTITY(1,1), TargetRuleID int NOT NULL, Val varchar(100));
-- Capture the IDs of the rows we will be copying by inserting them into the #input table
-- Important - we must specify the sort order - best thing is to use the IDENTITY of the source table (t.RuleID) that we are copying
INSERT INTO #input (SourceRuleID, Val) SELECT t.RuleID, t.Val FROM #tbl t WHERE t.DocID = 1 ORDER BY t.RuleID;
-- Copy the rows, and use the OUTPUT clause to capture the IDs of the inserted rows.
-- Important - we must use the same WHERE and ORDER BY clauses as above
INSERT INTO #tbl (DocID, Val)
OUTPUT Inserted.RuleID, Inserted.Val INTO #output(TargetRuleID, Val)
SELECT #newDocID, t.Val FROM #tbl t
WHERE t.DocID = 1
ORDER BY t.RuleID;
-- Now #input and #output should have the same # of rows, and the order of both inserts was the same, so the IDENTITY columns (RID) can be matched
-- Use this as the map from old-to-new when you are copying sub-table rows
-- Technically, #input and #output don't even need the 'Val' columns, just RID and RuleID - they were included here to prove that the rules matched
SELECT i.*, o.* FROM #output o
INNER JOIN #input i ON i.RID = o.RID
-- Confirm the matching worked
SELECT * FROM #tbl t

SQL Server, Select/Output/Insert - need to select value for output but not insert [duplicate]

In my opinion this is a great use of MERGE and output. I've used in several scenarios and haven't experienced any oddities to date.
For example, here is test setup that clones a Folder and all Files (identity) within it into a newly created Folder (guid).
DECLARE #FolderIndex TABLE (FolderId UNIQUEIDENTIFIER PRIMARY KEY, FolderName varchar(25));
INSERT INTO #FolderIndex
(FolderId, FolderName)
VALUES(newid(), 'OriginalFolder');
DECLARE #FileIndex TABLE (FileId int identity(1,1) PRIMARY KEY, FileName varchar(10));
INSERT INTO #FileIndex
(FileName)
VALUES('test.txt');
DECLARE #FileFolder TABLE (FolderId UNIQUEIDENTIFIER, FileId int, PRIMARY KEY(FolderId, FileId));
INSERT INTO #FileFolder
(FolderId, FileId)
SELECT FolderId,
FileId
FROM #FolderIndex
CROSS JOIN #FileIndex; -- just to illustrate
DECLARE #sFolder TABLE (FromFolderId UNIQUEIDENTIFIER, ToFolderId UNIQUEIDENTIFIER);
DECLARE #sFile TABLE (FromFileId int, ToFileId int);
-- copy Folder Structure
MERGE #FolderIndex fi
USING ( SELECT 1 [Dummy],
FolderId,
FolderName
FROM #FolderIndex [fi]
WHERE FolderName = 'OriginalFolder'
) d ON d.Dummy = 0
WHEN NOT MATCHED
THEN INSERT
(FolderId, FolderName)
VALUES (newid(), 'copy_'+FolderName)
OUTPUT d.FolderId,
INSERTED.FolderId
INTO #sFolder (FromFolderId, toFolderId);
-- copy File structure
MERGE #FileIndex fi
USING ( SELECT 1 [Dummy],
fi.FileId,
fi.[FileName]
FROM #FileIndex fi
INNER
JOIN #FileFolder fm ON
fi.FileId = fm.FileId
INNER
JOIN #FolderIndex fo ON
fm.FolderId = fo.FolderId
WHERE fo.FolderName = 'OriginalFolder'
) d ON d.Dummy = 0
WHEN NOT MATCHED
THEN INSERT ([FileName])
VALUES ([FileName])
OUTPUT d.FileId,
INSERTED.FileId
INTO #sFile (FromFileId, toFileId);
-- link new files to Folders
INSERT INTO #FileFolder (FileId, FolderId)
SELECT sfi.toFileId, sfo.toFolderId
FROM #FileFolder fm
INNER
JOIN #sFile sfi ON
fm.FileId = sfi.FromFileId
INNER
JOIN #sFolder sfo ON
fm.FolderId = sfo.FromFolderId
-- return
SELECT *
FROM #FileIndex fi
JOIN #FileFolder ff ON
fi.FileId = ff.FileId
JOIN #FolderIndex fo ON
ff.FolderId = fo.FolderId

I would like to add another example to add to #Nathan's example, as I found it somewhat confusing.
Mine uses real tables for the most part, and not temp tables.
I also got my inspiration from here: another example
-- Copy the FormSectionInstance
DECLARE #FormSectionInstanceTable TABLE(OldFormSectionInstanceId INT, NewFormSectionInstanceId INT)
;MERGE INTO [dbo].[FormSectionInstance]
USING
(
SELECT
fsi.FormSectionInstanceId [OldFormSectionInstanceId]
, #NewFormHeaderId [NewFormHeaderId]
, fsi.FormSectionId
, fsi.IsClone
, #UserId [NewCreatedByUserId]
, GETDATE() NewCreatedDate
, #UserId [NewUpdatedByUserId]
, GETDATE() NewUpdatedDate
FROM [dbo].[FormSectionInstance] fsi
WHERE fsi.[FormHeaderId] = #FormHeaderId
) tblSource ON 1=0 -- use always false condition
WHEN NOT MATCHED
THEN INSERT
( [FormHeaderId], FormSectionId, IsClone, CreatedByUserId, CreatedDate, UpdatedByUserId, UpdatedDate)
VALUES( [NewFormHeaderId], FormSectionId, IsClone, NewCreatedByUserId, NewCreatedDate, NewUpdatedByUserId, NewUpdatedDate)
OUTPUT tblSource.[OldFormSectionInstanceId], INSERTED.FormSectionInstanceId
INTO #FormSectionInstanceTable(OldFormSectionInstanceId, NewFormSectionInstanceId);
-- Copy the FormDetail
INSERT INTO [dbo].[FormDetail]
(FormHeaderId, FormFieldId, FormSectionInstanceId, IsOther, Value, CreatedByUserId, CreatedDate, UpdatedByUserId, UpdatedDate)
SELECT
#NewFormHeaderId, FormFieldId, fsit.NewFormSectionInstanceId, IsOther, Value, #UserId, CreatedDate, #UserId, UpdatedDate
FROM [dbo].[FormDetail] fd
INNER JOIN #FormSectionInstanceTable fsit ON fsit.OldFormSectionInstanceId = fd.FormSectionInstanceId
WHERE [FormHeaderId] = #FormHeaderId

Here's a solution that doesn't use MERGE (which I've had problems with many times I try to avoid if possible). It relies on two memory tables (you could use temp tables if you want) with IDENTITY columns that get matched, and importantly, using ORDER BY when doing the INSERT, and WHERE conditions that match between the two INSERTs... the first one holds the source IDs and the second one holds the target IDs.
-- Setup... We have a table that we need to know the old IDs and new IDs after copying.
-- We want to copy all of DocID=1
DECLARE #newDocID int = 99;
DECLARE #tbl table (RuleID int PRIMARY KEY NOT NULL IDENTITY(1, 1), DocID int, Val varchar(100));
INSERT INTO #tbl (DocID, Val) VALUES (1, 'RuleA-2'), (1, 'RuleA-1'), (2, 'RuleB-1'), (2, 'RuleB-2'), (3, 'RuleC-1'), (1, 'RuleA-3')
-- Create a break in IDENTITY values.. just to simulate more realistic data
INSERT INTO #tbl (Val) VALUES ('DeleteMe'), ('DeleteMe');
DELETE FROM #tbl WHERE Val = 'DeleteMe';
INSERT INTO #tbl (DocID, Val) VALUES (6, 'RuleE'), (7, 'RuleF');
SELECT * FROM #tbl t;
-- Declare TWO temp tables each with an IDENTITY - one will hold the RuleID of the items we are copying, other will hold the RuleID that we create
DECLARE #input table (RID int IDENTITY(1, 1), SourceRuleID int NOT NULL, Val varchar(100));
DECLARE #output table (RID int IDENTITY(1,1), TargetRuleID int NOT NULL, Val varchar(100));
-- Capture the IDs of the rows we will be copying by inserting them into the #input table
-- Important - we must specify the sort order - best thing is to use the IDENTITY of the source table (t.RuleID) that we are copying
INSERT INTO #input (SourceRuleID, Val) SELECT t.RuleID, t.Val FROM #tbl t WHERE t.DocID = 1 ORDER BY t.RuleID;
-- Copy the rows, and use the OUTPUT clause to capture the IDs of the inserted rows.
-- Important - we must use the same WHERE and ORDER BY clauses as above
INSERT INTO #tbl (DocID, Val)
OUTPUT Inserted.RuleID, Inserted.Val INTO #output(TargetRuleID, Val)
SELECT #newDocID, t.Val FROM #tbl t
WHERE t.DocID = 1
ORDER BY t.RuleID;
-- Now #input and #output should have the same # of rows, and the order of both inserts was the same, so the IDENTITY columns (RID) can be matched
-- Use this as the map from old-to-new when you are copying sub-table rows
-- Technically, #input and #output don't even need the 'Val' columns, just RID and RuleID - they were included here to prove that the rules matched
SELECT i.*, o.* FROM #output o
INNER JOIN #input i ON i.RID = o.RID
-- Confirm the matching worked
SELECT * FROM #tbl t

Filtering a group of records

Please see the SQL structure below:
CREATE table TestTable (id int not null identity, [type] char(1), groupid int)
INSERT INTO TestTable ([type]) values ('a',1)
INSERT INTO TestTable ([type]) values ('a',1)
INSERT INTO TestTable ([type]) values ('b',1)
INSERT INTO TestTable ([type]) values ('b',1)
INSERT INTO TestTable ([type]) values ('a',2)
INSERT INTO TestTable ([type]) values ('a',2)
The first four records are part of group 1 and the fifth and sixth records are part of group 2.
If there is at least one b in the group then I want the query to only return b's for that group. If there are no b's then the query should return all records for that group.

Here you go
SELECT *
FROM testtable
LEFT JOIN (SELECT distinct groupid FROM TestTable WHERE type = 'b'
) blist ON blist.groupid = testtable.groupid
WHERE (blist.groupid = testtable.groupid and type = 'b') OR
(blist.groupid is null)
How it works
join to a list of items that contain b.
Then in where statement... if we exist in that list just take type b. Otherwise take everything.
As an after-note you could be cute with the where clause like this
WHERE ISNULL(blist.groupid,testtable.groupid) = testtable.groupid
I think this is less clear -- but is often how advanced users will do it.

How can I insert random values into a SQL Server table?

I'm trying to randomly insert values from a list of pre-defined values into a table for testing. I tried using the solution found on this StackOverflow question:
stackoverflow.com/.../update-sql-table-with-random-value-from-other-table
When I I tried this, all of my "random" values that are inserted are exactly the same for all 3000 records.
When I run the part of the query that actually selects the random row, it does select a random record every time I run it by hand, so I know the query works. My best guesses as to what is happening are:
SQL Server is optimizing the SELECT somehow, not allowing the subquery to be evaluated more than once
The random value's seed is the same on every record the query updates
I'm stuck on what my options are. Am I doing something wrong, or is there another way I should be doing this?
This is the code I'm using:
DECLARE #randomStuff TABLE ([id] INT, [val] VARCHAR(100))
INSERT INTO #randomStuff ([id], [val])
VALUES ( 1, 'Test Value 1' )
INSERT INTO #randomStuff ([id], [val])
VALUES ( 2, 'Test Value 2' )
INSERT INTO #randomStuff ([id], [val])
VALUES ( 3, 'Test Value 3' )
INSERT INTO #randomStuff ([id], [val])
VALUES ( 4, 'Test Value 4' )
INSERT INTO #randomStuff ([id], [val])
VALUES ( 5, 'Test Value 5' )
INSERT INTO #randomStuff ([id], [val])
VALUES ( 6, null )
INSERT INTO #randomStuff ([id], [val])
VALUES ( 7, null )
INSERT INTO #randomStuff ([id], [val])
VALUES ( 8, null )
INSERT INTO #randomStuff ([id], [val])
VALUES ( 9, null )
INSERT INTO #randomStuff ([id], [val])
VALUES ( 10, null )
UPDATE MyTable
SET MyColumn = (SELECT TOP 1 [val] FROM #randomStuff ORDER BY NEWID())

When the query engine sees this...
(SELECT TOP 1 [val] FROM #randomStuff ORDER BY NEWID())
... it's all like, "ooooh, a cachable scalar subquery, I'm gonna cache that!"
You need to trick the query engine into thinking it's non-cachable. jfar's answer was close, but the query engine was smart enough to see the tautalogy of MyTable.MyColumn = MyTable.MyColumn, but it ain't smart enough to see through this.
UPDATE MyTable
SET MyColumn = (SELECT TOP 1 val
FROM #randomStuff r
INNER JOIN MyTable _MT
ON M.Id = _MT.Id
ORDER BY NEWID())
FROM MyTable M
By bringing in the outer table (MT) into the subquery, the query engine assumes subquery will need to be re-evaluated. Anything will work really, but I went with the (assumed) primary key of MyTable.Id since it'd be indexed and would add very little overhead.
A cursor would probably be just as fast, but is most certainly not as fun.

use a cross join to generate random data

I've had a play with this, and found a rather hacky way to do it with the use of an intermediate table variable.
Once #randomStuff is set up, we do this (note in my case, #MyTable is a table variable, adjust accordingly for your normal table):
DECLARE #randomMappings TABLE (id INT, val VARCHAR(100), sorter UNIQUEIDENTIFIER)
INSERT INTO #randomMappings
SELECT M.id, val, NEWID() AS sort
FROM #MyTable AS M
CROSS JOIN #randomstuff
so at this point, we have an intermediate table with every combination of (mytable id, random value), and a random sort value for each row specific to that combination. Then
DELETE others FROM #randomMappings AS others
INNER JOIN #randomMappings AS lower
ON (lower.id = others.id) AND (lower.sorter < others.sorter)
This is an old trick which deletes all rows for a given MyTable.id except for the one with the lower sort value -- join the table to itself where the value is smaller, and delete any where such a join succeeded. This just leaves behind the lowest value. So for each MyTable.id, we just have one (random) value left.. Then we just plug it back into the table:
UPDATE #MyTable
SET MyColumn = random.val
FROM #MyTable m, #randomMappings AS random
WHERE (random.id = m.id)
And you're done!
I said it was hacky...

I don't have time to check this right now, but my gut tells me that if you were to create a function on the server to get the random value that it would not optimize it out.
then you would have
UPDATE MyTable
Set MyColumn = dbo.RANDOM_VALUE()

There is no optimization going on here.
Your using a subquery that selects a single value, there is nothing to optimize.
You can also try putting a column from the table your updating in the select and see if that changes anything. That may trigger an evaluation for every row in MyTable
UPDATE MyTable
SET MyColumn = (SELECT TOP 1 [val] FROM #randomStuff ORDER BY NEWID()
WHERE MyTable.MyColumn = MyTable.MyColumn )

I came up with a solution which is a bit of a hack and very inefficient (10~ seconds to update 3000 records). Because this is being used to generate test data, I don't have to be concerned about speed however.
In this solution, I iterate over every row in the table and update the values one row at a time. It seems to work:
DECLARE #rows INT
DECLARE #currentRow INT
SELECT #rows = COUNT(*) FROM dbo.MyTable
SET #currentRow = 1
WHILE #currentRow < #rows
BEGIN
UPDATE MyTable
SET MyColumn = (SELECT TOP 1 [val] FROM #randomStuff ORDER BY NEWID())
WHERE MyPrimaryKey = (SELECT b.MyPrimaryKey
FROM(SELECT a.MyPrimaryKey, ROW_NUMBER() OVER (ORDER BY MyPrimaryKey) AS rownumber
FROM MyTable a) AS b
WHERE #currentRow = b.rownumber
)
SET #currentRow = #currentRow + 1
END

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas