SQL Update with need to order in subquery - sql

Yes, SQL UPDATE can be used with ORDER BY in a SELECT subquery, as long as the subquery also uses TOP.
Does anybody have a workaround for the following issue?
From time to time a program generates data errors in TABLE1 (we do not own that code, but we need to use the program).
We use a trigger that logs all changes to an AUDIT table.
We can find the error situation (and the correct old value) with the following select:
select top 1 audit.OldValue
from TABLE1
left join AUDIT on AUDIT.Table1_ID = TABLE1.ID
where <...some conditions...>
order by AUDIT.UpdateDate desc
As several changes are logged, we only need the LAST change (order by UpdateDate, then take TOP 1).
We could correct the data error if we could use an UPDATE command like:
Update TABLE1
set VALUE =
( select top 1 audit.OldValue
from TABLE1
left join AUDIT on AUDIT.Table1_ID = TABLE1.ID
where <...some conditions...>
order by AUDIT.UpdateDate desc )
where TABLE1.ID = AUDIT.Table1_ID
BUT: you cannot use the order by in a subquery...

Instead of TOP 1, use ROW_NUMBER() and keep only the rows where it equals 1.
Something like:
Update t set Value = a.OldValue
FROM Table1 t
inner join (
select ROW_NUMBER() OVER (PARTITION BY Table1_ID ORDER BY UpdateDate DESC) N, Table1_ID, OldValue
from AUDIT
) a on a.Table1_ID = t.ID
where <...some conditions...>
AND (a.N=1)
This applies the correction to every affected record in a single UPDATE (pay attention to <...some conditions...>).
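To try the ROW_NUMBER() idea without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module. It is not the T-SQL above: SQLite stands in for SQL Server, the sample rows are invented, and the join is written as a correlated subquery because that form runs on older SQLite versions too. Window functions need SQLite 3.25 or later.

```python
import sqlite3

# Hypothetical miniature schema; column names follow the question,
# the data is made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TABLE1 (ID INTEGER PRIMARY KEY, VALUE TEXT);
CREATE TABLE AUDIT (Table1_ID INTEGER, OldValue TEXT, UpdateDate TEXT);
INSERT INTO TABLE1 VALUES (1, 'bad'), (2, 'bad'), (3, 'fine');
INSERT INTO AUDIT VALUES
    (1, 'a-old',  '2020-01-01'),
    (1, 'a-good', '2020-02-01'),
    (2, 'b-good', '2020-01-15');
""")

# Rank the audit rows per ID, newest first, and copy back only rank 1.
# The outer WHERE keeps rows without any audit entry untouched.
conn.execute("""
UPDATE TABLE1
SET VALUE = (
    SELECT a.OldValue
    FROM (
        SELECT Table1_ID, OldValue,
               ROW_NUMBER() OVER (PARTITION BY Table1_ID
                                  ORDER BY UpdateDate DESC) AS N
        FROM AUDIT
    ) AS a
    WHERE a.Table1_ID = TABLE1.ID AND a.N = 1
)
WHERE ID IN (SELECT Table1_ID FROM AUDIT)
""")

restored = conn.execute("SELECT ID, VALUE FROM TABLE1 ORDER BY ID").fetchall()
print(restored)
```

Rows 1 and 2 each receive their most recent OldValue in one statement, and row 3 is left alone because it has no audit entries.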

I changed the UPDATE-command to:
Update TABLE1
set VALUE = CORRECT_VALUES.OldValue
from
( select top 1 audit.OldValue OldValue, AUDIT.Table1_ID Table1_ID
from TABLE1
left join AUDIT on AUDIT.Table1_ID = TABLE1.ID
where <...some conditions...>
order by AUDIT.UpdateDate desc ) as CORRECT_VALUES
where TABLE1.ID = CORRECT_VALUES.Table1_ID
That works fine.
And as this only changes 1 entry (due to TOP 1) we have to use a loop:
Declare @i int = 0
Declare @NUMBER int = (select count(*) from
(select distinct AUDIT.Table1_ID
from TABLE1
left join AUDIT on AUDIT.Table1_ID = TABLE1.ID
where <...some conditions...>
) as IDS )
while @i < @NUMBER
BEGIN
SET @i = @i + 1
<... paste the whole UPDATE command here ...>
END
Works fine for all records that have to be updated.
Be aware of the distinct, as there may be lots of changes for the same ID and you only need to change each record once.

Related

SQL Case Condition On Inner Join

I am currently trying to join a table to itself to check if for one email there exist two or more Ids.
I am trying to join my table with itself on its email. I then wanted to query my table with a case condition saying if the count of the email in the nested query > 1 then select the latest modified record in the outer table.
SELECT *
FROM table1 -- outer table
WHERE email IN
(SELECT email, COUNT(*)
FROM table1 as src
INNER JOIN table1 ON src.Email = table1.Email AND src.Id = table1.id
GROUP BY src.Email)
How can I write a query to say if the count for the given email is greater than 1 then select the latest record from the outer table?
Why would you go through all that trouble? How about just selecting the last modified record:
select t1.*
from table1 t1
where t1.modified_dt = (select max(tt1.modified_dt)
from table1 tt1
where tt1.email = t1.email
);
Another way to do it using window functions:
DECLARE @Tab TABLE (ID INT, Email VARCHAR(100), LastModified DATE)
INSERT @Tab
VALUES (1,'testemail@none.com','2019-12-01'),
(2,'testemail@none.com','2019-11-19'),
(3,'otheremail@none.com','2019-12-15')
SELECT *
FROM(
SELECT ROW_NUMBER() OVER(PARTITION BY t.Email ORDER BY t.LastModified DESC) rn, t.*
FROM @Tab t
) t2
WHERE t2.rn = 1
If by latest you mean the latest id number (the maximum number) then this should help you
With cte AS
(
SELECT email,
COUNT(id) OVER (PARTITION BY email) AS CountOfIDs,
ROW_NUMBER() OVER (PARTITION BY email ORDER BY ID DESC) AS IdIndex
FROM table1
)
SELECT *
FROM cte
WHERE CountOfIDs > 1 AND IdIndex = 1
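As a quick cross-check of the first two answers, the sketch below runs the correlated MAX() query and the ROW_NUMBER() query against the same three sample rows. It uses Python's sqlite3 module rather than SQL Server, and assumes the column is called modified_dt as in the first answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INTEGER, email TEXT, modified_dt TEXT);
INSERT INTO table1 VALUES
    (1, 'testemail@none.com',  '2019-12-01'),
    (2, 'testemail@none.com',  '2019-11-19'),
    (3, 'otheremail@none.com', '2019-12-15');
""")

# Approach 1: keep the row whose date equals the maximum date for its email.
by_max = conn.execute("""
SELECT t1.id FROM table1 t1
WHERE t1.modified_dt = (SELECT MAX(tt1.modified_dt)
                        FROM table1 tt1
                        WHERE tt1.email = t1.email)
ORDER BY t1.id
""").fetchall()

# Approach 2: number the rows per email, newest first, and keep number 1.
by_rank = conn.execute("""
SELECT id FROM (
    SELECT id, ROW_NUMBER() OVER (PARTITION BY email
                                  ORDER BY modified_dt DESC) AS rn
    FROM table1
)
WHERE rn = 1
ORDER BY id
""").fetchall()

print(by_max, by_rank)
```

Both queries pick ids 1 and 3 here. They can differ when two rows for the same email share the latest date: the MAX() form returns both rows, while ROW_NUMBER() returns exactly one.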

SQL carry forward non-null value within groups

The following thread (SQL QUERY replace NULL value in a row with a value from the previous known value) was very helpful for carrying forward non-null values, but I can't figure out how to add a grouping column.
For example, I would like to do the following:
Example Data
Here is how I would have liked to code it:
UPDATE t1
SET
@n = COALESCE(number, @n),
number = COALESCE(number, @n)
GROUP BY grp
I know this isn't possible, and have seen several solutions that rely on inner joins, but those examples focus on aggregation rather than carrying forward values. For example: SQL Server Update Group by.
My attempt to make this work is to do something like the following:
UPDATE t1
SET
@n = COALESCE(number, @n),
number = COALESCE(number, @n)
FROM t1
INNER JOIN (
-- Lost on what to put in the inner join...
SELECT grp, COUNT(*) FROM t1 GROUP BY grp
) t2
on t1.grp = t2.grp
I think you can do what you want with a correlated subquery:
UPDATE t1
SET number = (SELECT TOP (1) tt1.number
FROM t1 tt1
WHERE tt1.grp = t1.grp AND tt1.? <= t1.? AND tt1.number IS NOT NULL
ORDER BY tt1.? DESC
)
FROM t1
WHERE t1.number IS NULL;
The ? is for the column that specifies "forward" in your expression "carry forward".
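Here is a small runnable sketch of that correlated-subquery idea, using Python's sqlite3 module in place of SQL Server (so LIMIT 1 replaces TOP (1)). The ordering column seq is an assumption standing in for the ? above, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (grp TEXT, seq INTEGER, number INTEGER);
INSERT INTO t1 VALUES
    ('a', 1, 10), ('a', 2, NULL), ('a', 3, NULL), ('a', 4, 40),
    ('b', 1, NULL), ('b', 2, 7), ('b', 3, NULL);
""")

# For every NULL, look back within the same group for the closest
# earlier row (by the hypothetical seq column) that has a value.
conn.execute("""
UPDATE t1
SET number = (
    SELECT prev.number
    FROM t1 AS prev
    WHERE prev.grp = t1.grp
      AND prev.seq <= t1.seq
      AND prev.number IS NOT NULL
    ORDER BY prev.seq DESC
    LIMIT 1
)
WHERE number IS NULL
""")

rows = conn.execute("SELECT grp, seq, number FROM t1 ORDER BY grp, seq").fetchall()
for row in rows:
    print(row)
```

A leading NULL in a group (row b/1 in this data) stays NULL because there is nothing earlier in that group to carry forward, and values never leak from group a into group b.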

Adding daily changes to database table

I am trying to build a simple database that keeps track of any changes to a user's location attribute. Each day I generate the current information of User, Date, Location and upload it to a temporary table in SQL Server. I am trying to figure out the correct SQL to query for new users, modified users and deleted users.
Finding new users is easy with:
SELECT table1.UserGuid,table1.Location
FROM table1
WHERE table1.UserGuid NOT IN
(
SELECT DISTINCT table2.UserGuid
FROM table2
)
The problem I am having is finding modified locations and deleted users.
For modified users, I am trying to return users whose last location in the database doesn't match the current location in the daily temp table. This is what I have, but I don't think it is correct:
SELECT table1.UserGuid,table1.Location
FROM table1
WHERE EXISTS
(
SELECT TOP 1 table2.UserGuid,table2.Location
FROM table2
WHERE (table2.UserGuid = table1.UserGuid) AND (table2.Location != table1.Location)
ORDER BY table2.Date DESC
)
For deleted users, I am trying the following SQL to identify any users in the main table that don't exist in the daily temp table and don't have a location of 'deleted'. (If this returns any deleted users, I add them to the main table with a location of 'deleted' so they are not returned the next time.)
SELECT table2.UserGuid,table2.Location
FROM table2
WHERE table2.UserGuid NOT IN
(
SELECT UserGuid
FROM table1
)
AND table2.Location != 'deleted'
After I run all 3 queries to find the adds, modifications and deletes, I add them to the main table along with the current date and repeat the next day. So the main table has 3 columns (UserGuid, Date, Location), and new rows get added each day with changed information. So far my new-user SQL is the only one working reliably. Is there an easier way to do this?
So I think this captures all your requirements.
Select
table1.*,
case when table2.userguid is null then 'INSERT'
when table1.userguid is null and table2.location != 'deleted' then 'DELETE'
when table1.location != table2.location then 'UPDATE'
end as Action
from table1
full join (select max(date) as lastEntry, userGuid from Table2 group by userGuid) lastRecords
inner join table2 on table2.date = lastRecords.lastEntry and table2.userGuid = lastRecords.userGuid
on lastRecords.userguid = table1.userguid
For your second query try:
SELECT table1.UserGuid,table1.Location
FROM table1
WHERE table1.UserGuid IN
(
SELECT table2.UserGuid
FROM table2
WHERE table2.UserGuid = table1.UserGuid AND table2.Location <> table1.Location
)
I tend to use EXISTS for these sorts of checks
--INSERTS
SELECT table1.UserGuid,table1.Location
FROM table1
WHERE NOT EXISTS (SELECT 1 FROM table2 WHERE table2.UserGuid = table1.UserGuid)
UNION ALL
--UPDATES
SELECT table1.UserGuid,table1.Location
FROM table1
WHERE EXISTS
(
SELECT 1 FROM table2
WHERE table2.UserGuid = table1.UserGuid
AND ISNULL(table2.Location,'') <> ISNULL(table1.Location,'')
)
UNION ALL
--DELETES
SELECT table2.UserGuid,table2.Location
FROM table2
WHERE NOT EXISTS (SELECT 1 FROM table1 WHERE table2.UserGuid = table1.UserGuid)
I included ISNULL checks in the event your location could be null; they're not needed if that's not possible.
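One thing worth checking in any of these variants is whether the comparison is made against the latest row per user or against any historical row. The sketch below is one way to test that. It runs in Python's sqlite3 module rather than SQL Server, assumes table1 is the daily snapshot and table2 the history table as described in the question, and assumes dates are unique per user and locations are never NULL. The sample data is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (UserGuid TEXT, Location TEXT);             -- daily snapshot
CREATE TABLE table2 (UserGuid TEXT, Date TEXT, Location TEXT);  -- history
INSERT INTO table1 VALUES ('u1', 'Paris'), ('u2', 'Rome'), ('u4', 'Oslo');
INSERT INTO table2 VALUES
    ('u1', '2020-01-01', 'Berlin'),
    ('u1', '2020-01-02', 'Paris'),
    ('u2', '2020-01-01', 'Rome'),
    ('u2', '2020-01-02', 'Milan'),
    ('u3', '2020-01-01', 'Lima');
""")

# "latest" holds only the most recent history row of each user, so the
# snapshot is compared with the last known location, not with every old one.
changes = conn.execute("""
WITH latest AS (
    SELECT t2.UserGuid, t2.Location
    FROM table2 t2
    WHERE t2.Date = (SELECT MAX(x.Date) FROM table2 x
                     WHERE x.UserGuid = t2.UserGuid)
)
SELECT t1.UserGuid, 'INSERT' AS Action
FROM table1 t1
WHERE NOT EXISTS (SELECT 1 FROM latest l WHERE l.UserGuid = t1.UserGuid)
UNION ALL
SELECT t1.UserGuid, 'UPDATE'
FROM table1 t1
JOIN latest l ON l.UserGuid = t1.UserGuid
WHERE l.Location <> t1.Location
UNION ALL
SELECT l.UserGuid, 'DELETE'
FROM latest l
WHERE l.Location <> 'deleted'
  AND NOT EXISTS (SELECT 1 FROM table1 t1 WHERE t1.UserGuid = l.UserGuid)
ORDER BY 1
""").fetchall()

print(changes)
```

User u1 moved from Berlin to Paris in the past and is still in Paris, so it is not reported. A check against any history row would flag u1 as modified every day.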

Simple update statement so that all rows are assigned a different value

I'm trying to set a column in one table to a random foreign key for testing purposes.
I attempted using the below query
update table1 set table2Id = (select top 1 table2Id from table2 order by NEWID())
This will get one table2Id at random and assign it as the foreign key in table1 for each row.
It's almost what I want, but I want each row to get a different table2Id value.
I could do this by looping through the rows in table1, but I know there's a more concise way of doing it.
On some test tables at my end, your original plan looks as follows.
It just calculates the result once and caches it in a spool, then replays that result. You could try the following so that SQL Server sees the subquery as correlated and in need of re-evaluation for each outer row.
UPDATE table1
SET table2Id = (SELECT TOP 1 table2Id
FROM table2
ORDER BY Newid(),
table1.table1Id)
For me that gives this plan without the spool.
It is important to correlate on a unique field from table1, however, so that even if a spool is added it must always be rebound rather than rewound (replaying the last result), as the correlation value will be different for each row.
If the tables are large this will be slow, as the work required is a product of the two tables' row counts (for each row in table1 it needs to do a full scan of table2).
I'm having another go at answering this, since my first answer was incomplete.
As there is no other way to join the two tables until you assign the table2_id you can use row_number to give a temporary key to both table1 and table2.
with
t1 as (
select row_number() over (order by table1_id) as row, table1_id
from table1 )
,
t2 as (
select row_number() over (order by NEWID()) as row, table2_id
from table2 )
update table1
set table2_id = t2.table2_id
from table1
inner join t1 on t1.table1_id = table1.table1_id
inner join t2 on t1.row = t2.row
select * from table1
SQL Fiddle to test it out: http://sqlfiddle.com/#!6/bf414/12
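For anyone who wants to experiment with the row-number pairing locally, here is a rough sketch in Python's sqlite3 module. It is not the T-SQL above: random() replaces NEWID(), and the two numberings are stored in temp tables (r1 and r2, names made up here) instead of CTEs so that the random order is computed exactly once. It assumes table2 has at least as many rows as table1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table2 (table2Id INTEGER PRIMARY KEY);
CREATE TABLE table1 (table1Id INTEGER PRIMARY KEY, table2Id INTEGER);
INSERT INTO table2 VALUES (10), (20), (30), (40), (50);
INSERT INTO table1 (table1Id) VALUES (1), (2), (3), (4);

-- Number table1 in key order and table2 in random order.
CREATE TEMP TABLE r1 AS
    SELECT ROW_NUMBER() OVER (ORDER BY table1Id) AS rn, table1Id FROM table1;
CREATE TEMP TABLE r2 AS
    SELECT ROW_NUMBER() OVER (ORDER BY random()) AS rn, table2Id FROM table2;

-- Pair row n of table1 with row n of the shuffled table2.
UPDATE table1
SET table2Id = (
    SELECT r2.table2Id
    FROM r1 JOIN r2 ON r2.rn = r1.rn
    WHERE r1.table1Id = table1.table1Id
);
""")

assigned = [row[0] for row in
            conn.execute("SELECT table2Id FROM table1 ORDER BY table1Id")]
print(assigned)
```

Because each row number occurs once on both sides, every table1 row gets a different table2Id. If table1 had more rows than table2, the surplus rows would be set to NULL in this sketch.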
Broke down and used a loop for it. This worked, although it was very slow.
Select *
Into #Temp
From table1
Declare @Id int
While (Select Count(*) From #Temp) > 0
Begin
Select Top 1 @Id = table1Id From #Temp
update table1 set table2Id = (select top 1 table2Id from table2 order by NEWID()) where table1Id = @Id
Delete #Temp Where table1Id = @Id
End
drop table #Temp
I'm going to assume MS SQL based on top 1:
update table1
set table2Id =
(select top 1 table2Id from table2 tablesample(1 percent))
(sorry, not tested)

How to remove duplicate records in a table?

I've got a table in a testing DB that someone apparently got a little too trigger-happy on when running INSERT scripts to set it up. The schema looks like this:
ID UNIQUEIDENTIFIER
TYPE_INT SMALLINT
SYSTEM_VALUE SMALLINT
NAME VARCHAR
MAPPED_VALUE VARCHAR
It's supposed to have a few dozen rows. It has about 200,000, most of which are duplicates in which TYPE_INT, SYSTEM_VALUE, NAME and MAPPED_VALUE are all identical and ID is not.
Now, I could probably make a script to clean this up that creates a temporary table in memory, uses INSERT .. SELECT DISTINCT to grab all the unique values, TRUNCATE the original table and then copy everything back. But is there a simpler way to do it, like a DELETE query with something special in the WHERE clause?
You don't give your table name, but I think something like this should work. It leaves only the record that happens to have the highest ID in each set of duplicates. You might want to test with the ROLLBACK in first!
BEGIN TRAN
DELETE <table_name>
FROM <table_name> T1
WHERE EXISTS(
SELECT * FROM <table_name> T2
WHERE
T1.TYPE_INT = T2.TYPE_INT AND
T1.SYSTEM_VALUE = T2.SYSTEM_VALUE AND
T1.NAME = T2.NAME AND
T1.MAPPED_VALUE = T2.MAPPED_VALUE AND
T2.ID > T1.ID
)
SELECT * FROM <table_name>
ROLLBACK
Here is a great article on that: Deleting duplicates, which basically uses this pattern:
WITH q AS
(
SELECT d.*,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY value) AS rn
FROM t_duplicate d
)
DELETE
FROM q
WHERE rn > 1
SELECT *
FROM t_duplicate
WITH Duplicates(ID , TYPE_INT, SYSTEM_VALUE, NAME, MAPPED_VALUE )
AS
(
SELECT Min(Id) ID, TYPE_INT, SYSTEM_VALUE, NAME, MAPPED_VALUE
FROM T1
GROUP BY TYPE_INT, SYSTEM_VALUE, NAME, MAPPED_VALUE
HAVING Count(Id) > 1
)
DELETE FROM T1
WHERE ID IN (
SELECT T1.Id
FROM T1
INNER JOIN Duplicates
ON T1.TYPE_INT = Duplicates.TYPE_INT
AND T1.SYSTEM_VALUE = Duplicates.SYSTEM_VALUE
AND T1.NAME = Duplicates.NAME
AND T1.MAPPED_VALUE = Duplicates.MAPPED_VALUE
AND T1.Id <> Duplicates.ID
)
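To rehearse a clean-up like this on throwaway data first, the sketch below uses Python's sqlite3 module instead of SQL Server. It is a simpler variant of the answers above, not a copy of any of them: it keeps the smallest ID per group and deletes everything else. The IDs are plain strings here because SQLite has no UNIQUEIDENTIFIER type, and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE T1 (ID TEXT PRIMARY KEY, TYPE_INT INTEGER, SYSTEM_VALUE INTEGER,
                 NAME TEXT, MAPPED_VALUE TEXT);
INSERT INTO T1 VALUES
    ('id-1', 1, 1, 'alpha', 'A'),
    ('id-2', 1, 1, 'alpha', 'A'),
    ('id-3', 1, 1, 'alpha', 'A'),
    ('id-4', 2, 1, 'beta',  'B'),
    ('id-5', 2, 1, 'beta',  'B'),
    ('id-6', 3, 2, 'gamma', 'C');
""")

# Keep one survivor (the smallest ID) per combination of the four
# value columns and delete every other row.
conn.execute("""
DELETE FROM T1
WHERE ID NOT IN (
    SELECT MIN(ID)
    FROM T1
    GROUP BY TYPE_INT, SYSTEM_VALUE, NAME, MAPPED_VALUE
)
""")

remaining = [row[0] for row in conn.execute("SELECT ID FROM T1 ORDER BY ID")]
print(remaining)
```

Rows that were never duplicated (id-6 here) survive as well, since each is the minimum of its own group.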