Simple update statement so that all rows are assigned a different value - sql

I'm trying to set a column in one table to a random foreign key for testing purposes.
I attempted using the below query
update table1 set table2Id = (select top 1 table2Id from table2 order by NEWID())
This will get one table2Id at random and assign it as the foreign key in table1 for each row.
It's almost what I want, but I want each row to get a different table2Id value.
I could do this by looping through the rows in table1, but I know there's a more concise way of doing it.

On some test table my end your original plan looks as follows.
It just calculates the result once and caches it in a sppol then replays that result. You could try the following so that SQL Server sees the subquery as correlated and in need of re-evaluating for each outer row.
UPDATE table1
SET table2Id = (SELECT TOP 1 table2Id
FROM table2
ORDER BY Newid(),
table1.table1Id)
For me that gives this plan without the spool.
It is important to correlate on a unique field from table1 however so that even if a spool is added it must always be rebound rather than rewound (replaying the last result) as the correlation value will be different for each row.
If the tables are large this will be slow as work required is a product of the two table's rows (for each row in table1 it needs to do a full scan of table2)

I'm having another go at answering this, since my first answer was incomplete.
As there is no other way to join the two tables until you assign the table2_id you can use row_number to give a temporary key to both table1 and table2.
with
t1 as (
select row_number() over (order by table1_id) as row, table1_id
from table1 )
,
t2 as (
select row_number() over (order by NEWID()) as row, table2_id
from table2 )
update table1
set table2_id = t2.table2_id
from t1 inner join t2
on t1.row = t2.row
select * from table1
SQL Fiddle to test it out: http://sqlfiddle.com/#!6/bf414/12

Broke down and used a loop for it. This worked, although it was very slow.
Select *
Into #Temp
From table1
Declare #Id int
While (Select Count(*) From #Temp) > 0
Begin
Select Top 1 #Id = table1Id From #Temp
update table1 set table2Id = (select top 1 table2Id from table2 order by NEWID()) where table1Id = #Id
Delete #Temp Where table1Id = #Id
End
drop table #Temp

I'm going to assume MS SQL based on top 1:
update table1
set table2Id =
(select top 1 table2Id from table2 tablesample(1 percent))
(sorry, not tested)

Related

Sql query with join on table with ID not match

I have two tables.
Table 1
Id
UpdateId
Name
Table 2
Table1ID
UpdateID
Address
Each time user update, system will insert record to table1. But for table2, system only insert record when there is update in address.
Sample data
Table 1
1,1,name1
1,2,name1
1,3,name1update
1,4,name1update
1,5,name1
1,6,name2
Table 2
1,1,address
1,4,addressupdate
I want to get the result as following
1,1,name1,address
1,2,name1,address
1,3,name1update,address
1,4,name1update,addressupdate
1,5,name1,addressupdate
1,6,name2,addressupdate
How to make use of join condition to achieve as above?
You can use a correlated subquery. Here is standard syntax, but it can be easily adapted to any database:
select t1.*,
(select t2.addressid
from table2 t2
where t2.table1id = t1.id and
t2.updateid <= t1.updateid
order by t2.updateid desc
fetch first 1 row only
) as addressid
from table1 t1;
you can use left join when you want to take all columns from left table t1 even though it doesn't match with the other table with column updateid on t2 table.
select t1.id,t1.updateid,t1.name,t2.address from table1 t1
left join table2 t2
on t2.updateid= t1.updateid
you can read more about joins here

SQL - Update last occurrence with select in clause

In Oracle, I would like do an update on a table for the last occurrence based on select in list, something like :
UPDATE table t1
set t1.fieldA = 0
where t1.id in (
select t2.id, max(t2.TIMESTAMP)
from table t2
where t2.id in (1111,2222,33333)
group by t2.id
);
This query does not works, I received an error "too many values".
Any ideas?
Thanks
Your subselect has 2(too many) fields so it cant't be compared with the id.
compare the timestamp with the max timestamp.
UPDATE table t1 set t1.fieldA = 0
where t1.TIMESTAMP = (select max(t2.TIMESTAMP) from table t2 where t2.id = t1.id )
and id in (1111,2222,33333)
I believe oracle supports multiple columns IN, so you could try:
UPDATE table t1
set t1.fieldA = 0
where
(t1.id,t1.timestamp) in ( select t2.id, max(t2.TIMESTAMP) from table t2 where t2.id in (1111,2222,33333) group by t2.id );
Essentially if you're providing a subquery to the right side of the IN that has a result set with two columns, then you have to provide the names of both the columns in brackets on the left side of the IN
Incidentally, I've never liked updating based on a max value if the requirement is that only one row be updated, as two rows with the same max value will both be updated. This kind of query is better:
Update table
Set fieldA=0
Where rowid in(
select rowid from (
select rowid, row_number() over(partition by id order by timestamp desc
) as rown from table
) where rown=1)
If, however, your primary key on the table includes the time stamp as part of the compound, then there is no concern of here being two rows with the same ID/timestamp.. it's just rarely the case!

How to remove duplicate records in a table?

I've got a table in a testing DB that someone apparently got a little too trigger-happy on when running INSERT scripts to set it up. The schema looks like this:
ID UNIQUEIDENTIFIER
TYPE_INT SMALLINT
SYSTEM_VALUE SMALLINT
NAME VARCHAR
MAPPED_VALUE VARCHAR
It's supposed to have a few dozen rows. It has about 200,000, most of which are duplicates in which TYPE_INT, SYSTEM_VALUE, NAME and MAPPED_VALUE are all identical and ID is not.
Now, I could probably make a script to clean this up that creates a temporary table in memory, uses INSERT .. SELECT DISTINCT to grab all the unique values, TRUNCATE the original table and then copy everything back. But is there a simpler way to do it, like a DELETE query with something special in the WHERE clause?
You don't give your table name but I think something like this should work. Just leaving the record which happens to have the lowest ID. You might want to test with the ROLLBACK in first!
BEGIN TRAN
DELETE <table_name>
FROM <table_name> T1
WHERE EXISTS(
SELECT * FROM <table_name> T2
WHERE
T1.TYPE_INT = T2.TYPE_INT AND
T1.SYSTEM_VALUE = T2.SYSTEM_VALUE AND
T1.NAME = T2.NAME AND
T1.MAPPED_VALUE = T2.MAPPED_VALUE AND
T2.ID > T1.ID
)
SELECT * FROM <table_name>
ROLLBACK
here is a great article on that: Deleting duplicates, which basically uses this pattern:
WITH q AS
(
SELECT d.*,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY value) AS rn
FROM t_duplicate d
)
DELETE
FROM q
WHERE rn > 1
SELECT *
FROM t_duplicate
WITH Duplicates(ID , TYPE_INT, SYSTEM_VALUE, NAME, MAPPED_VALUE )
AS
(
SELECT Min(Id) ID TYPE_INT, SYSTEM_VALUE, NAME, MAPPED_VALUE
FROM T1
GROUP BY TYPE_INT, SYSTEM_VALUE, NAME, MAPPED_VALUE
HAVING Count(Id) > 1
)
DELETE FROM T1
WHERE ID IN (
SELECT T1.Id
FROM T1
INNER JOIN Duplicates
ON T1.TYPE_INT = Duplicates.TYPE_INT
AND T1.SYSTEM_VALUE = Duplicates.SYSTEM_VALUE
AND T1.NAME = Duplicates.NAME
AND T1.MAPPED_VALUE = Duplicates.MAPPED_VALUE
AND T1.Id <> Duplicates.ID
)

Avoid duplicates in INSERT INTO SELECT query in SQL Server

I have the following two tables:
Table1
----------
ID Name
1 A
2 B
3 C
Table2
----------
ID Name
1 Z
I need to insert data from Table1 to Table2. I can use the following syntax:
INSERT INTO Table2(Id, Name) SELECT Id, Name FROM Table1
However, in my case, duplicate IDs might exist in Table2 (in my case, it's just "1") and I don't want to copy that again as that would throw an error.
I can write something like this:
IF NOT EXISTS(SELECT 1 FROM Table2 WHERE Id=1)
INSERT INTO Table2 (Id, name) SELECT Id, name FROM Table1
ELSE
INSERT INTO Table2 (Id, name) SELECT Id, name FROM Table1 WHERE Table1.Id<>1
Is there a better way to do this without using IF - ELSE? I want to avoid two INSERT INTO-SELECT statements based on some condition.
Using NOT EXISTS:
INSERT INTO TABLE_2
(id, name)
SELECT t1.id,
t1.name
FROM TABLE_1 t1
WHERE NOT EXISTS(SELECT id
FROM TABLE_2 t2
WHERE t2.id = t1.id)
Using NOT IN:
INSERT INTO TABLE_2
(id, name)
SELECT t1.id,
t1.name
FROM TABLE_1 t1
WHERE t1.id NOT IN (SELECT id
FROM TABLE_2)
Using LEFT JOIN/IS NULL:
INSERT INTO TABLE_2
(id, name)
SELECT t1.id,
t1.name
FROM TABLE_1 t1
LEFT JOIN TABLE_2 t2 ON t2.id = t1.id
WHERE t2.id IS NULL
Of the three options, the LEFT JOIN/IS NULL is less efficient. See this link for more details.
In MySQL you can do this:
INSERT IGNORE INTO Table2(Id, Name) SELECT Id, Name FROM Table1
Does SQL Server have anything similar?
I just had a similar problem, the DISTINCT keyword works magic:
INSERT INTO Table2(Id, Name) SELECT DISTINCT Id, Name FROM Table1
I was facing the same problem recently...
Heres what worked for me in MS SQL server 2017...
The primary key should be set on ID in table 2...
The columns and column properties should be the same of course between both tables. This will work the first time you run the below script. The duplicate ID in table 1, will not insert...
If you run it the second time, you will get a
Violation of PRIMARY KEY constraint error
This is the code:
Insert into Table_2
Select distinct *
from Table_1
where table_1.ID >1
Using ignore Duplicates on the unique index as suggested by IanC here was my solution for a similar issue, creating the index with the Option WITH IGNORE_DUP_KEY
In backward compatible syntax
, WITH IGNORE_DUP_KEY is equivalent to WITH IGNORE_DUP_KEY = ON.
Ref.: index_option
From SQL Server you can set a Unique key index on the table for (Columns that needs to be unique)
A little off topic, but if you want to migrate the data to a new table, and the possible duplicates are in the original table, and the column possibly duplicated is not an id, a GROUP BY will do:
INSERT INTO TABLE_2
(name)
SELECT t1.name
FROM TABLE_1 t1
GROUP BY t1.name
In my case, I had duplicate IDs in the source table, so none of the proposals worked. I don't care about performance, it's just done once.
To solve this I took the records one by one with a cursor to ignore the duplicates.
So here's the code example:
DECLARE #c1 AS VARCHAR(12);
DECLARE #c2 AS VARCHAR(250);
DECLARE #c3 AS VARCHAR(250);
DECLARE MY_cursor CURSOR STATIC FOR
Select
c1,
c2,
c3
from T2
where ....;
OPEN MY_cursor
FETCH NEXT FROM MY_cursor INTO #c1, #c2, #c3
WHILE ##FETCH_STATUS = 0
BEGIN
if (select count(1)
from T1
where a1 = #c1
and a2 = #c2
) = 0
INSERT INTO T1
values (#c1, #c2, #c3)
FETCH NEXT FROM MY_cursor INTO #c1, #c2, #c3
END
CLOSE MY_cursor
DEALLOCATE MY_cursor
I used a MERGE query to fill a table without duplications.
The problem I had was a double key in the tables ( Code , Value ) ,
and the exists query was very slow
The MERGE executed very fast ( more then X100 )
examples for MERGE query
For one table it works perfectly when creating one unique index from multiple field. Then simple "INSERT IGNORE" will ignore duplicates if ALL of 7 fields (in this case) will have SAME values.
Select fields in PMA Structure View and click Unique, new combined index will be created.
A simple DELETE before the INSERT would suffice:
DELETE FROM Table2 WHERE Id = (SELECT Id FROM Table1)
INSERT INTO Table2 (Id, name) SELECT Id, name FROM Table1
Switching Table1 for Table2 depending on which table's Id and name pairing you want to preserve.

SQL Query Help

Duplicate:
How to do a Select in a Select
I have 2 tables:
TABLE1
Table1Id
TABLE2
Table2Id
Table1Id
UserId
TABLE2 has thousands of entries in it. I want to return a list of TABLE1 entries where there isn't an entry in TABLE2 for it for a particular user. So, where there isn't a foreign key entry in TABLE2. A query like:
select count(*) from TABLE1 where Table1Id not in (
select Table1Id from TABLE2 where id_user = 1)
However, that query runs very slowly. What would be the most efficient way of getting the results I require?
There is a similar question
I think it would be better
SELECT COUNT(*)
FROM TABLE1
WHERE NOT EXISTS (SELECT Table1Id FROM TABLE2 WHERE TABLE2.Table1Id = TABLE1.Table1Id AND UserID = 1)
I would check the indexes also, as ck suggested
What about
select Table1Id from TABLE1
minus
select Table1Id from TABLE2 where id_user = 1
I am not sure, it MsSql support minus. If not, you should try a correlated subquery.
You can also use the 'EXCEPT/MINUS' intersect to get only differences between the two tables as long as the selection returns the same field types/order.
SELECT TABLE1ID
FROM TABLE1
EXCEPT -- or MINUS in Oracle
SELECT TABLE1ID
FROM TABLE2
WHERE USER_ID = 1
See How to do a Select in a Select
Also, make sure that any fields you are querying have a suitable index.