SQL Update with join - sql

Is there any other faster way to do an update besides with a join? Here's my query but it's a bit slow:
UPDATE #user_dupes
SET ExternalEmail = ud2.Email
FROM #user_dupes ud1
INNER JOIN(
SELECT Email, UserName
FROM #user_flat_emailtable_dupes
WHERE EmailType = 2
AND Email IS NOT NULL AND LEN(Email) > 0
) ud2
ON ud1.UserName = ud2.UserName
Thanks for any ideas

If you are using SQL Server, you were almost there. It's just a little fix:
UPDATE ud1 --little fix here!
SET ExternalEmail = ud2.Email
FROM #user_dupes ud1
INNER JOIN
(
SELECT Email, UserName
FROM #user_flat_emailtable_dupes
WHERE EmailType = 2
AND Email IS NOT NULL AND LEN(Email) > 0
) ud2
ON ud1.UserName = ud2.UserName

A couple of changes, on top of what @Adrian said...
UPDATE
ud1 -- @Adrian's change. Update the instance that you have already aliased.
SET
externalEmail = ud2.Email
FROM
#user_dupes AS ud1
INNER JOIN
#user_flat_emailtable_dupes AS ud2
ON ud1.UserName = ud2.UserName
WHERE
ud2.EmailType = 2 -- Removed sub-query, for layout, doubt it will help performance
AND ud2.Email IS NOT NULL
AND ud2.Email <> '' -- Removed the `LEN()` function
But possibly the most important part is to ensure you have indexes. The JOIN is necessary for this logic (or correlated sub-queries, etc.), so you want the join to be performant.
An index on (UserName) on #user_dupes, and an index on (EmailType, Email, UserName) on #user_flat_emailtable_dupes. (This assumes ud2 is the smaller table after the filtering.)
With the indexes as specified, the change from LEN(Email) > 0 to Email <> '' may allow an index seek rather than a scan. The larger your tables, the more apparent this will be.

I believe this query will do the same thing, although you would have to make sure #user_flat_emailtable_dupes contains no duplicate usernames. I haven't checked whether the two have different execution plans. It looks like you're refining junky input. I mention MERGE partly because I do a lot of that and find it useful (all the more so for me, since I don't know how UPDATE FROM works), and partly because I hadn't ever used MERGE with variables. It appears that at least the target table must be aliased, or the parser decides #ud1 is a scalar variable and it breaks.
MERGE #user_dupes AS ud1
USING #user_flat_emailtable_dupes AS ud2
ON emailType = 2
AND COALESCE(ud2.email, '') <> ''
AND ud2.username = ud1.username
WHEN MATCHED THEN UPDATE
SET externalEmail = ud2.email
;

Related

How to update a single row from multiple rows with UPDATE JOIN

I am trying to use an UPDATE JOIN in SQL Server 2016 to update a single row from multiple rows.
I have a table with a bunch of audit events that looks like:
Event_Date,Event_Name,Entity_ID
2020-01-01,'USER_ROLE_CHANGED',12345
2020-01-02,'USER_ACCOUNT_ACTIVATED',12345
2020-01-03,'USER_ACCOUNT_DEACTIVATED',12345
The User table looks like this:
Employee_ID,Employee_Name,Date_Activated,Date_Deactivated,Date_Role_Assigned
12345,'Joe Cool',null,null,null
....
I want to update this record to look like this:
Employee_ID,Employee_Name,Date_Activated,Date_Deactivated,Date_Role_Assigned
12345,'Joe Cool',2020-01-02,2020-01-03,2020-01-01
.....
Here is what I tried:
update u
set u.date_role_assigned = iif(a.event_name = 'USER_ROLE_CHANGED', a.event_date, u.date_role_assigned),
u.date_activated = iif(a.event_name = 'USER_ACCOUNT_ACTIVATED', a.event_date, u.date_activated),
u.date_deactivated = iif(a.event_name = 'USER_ACCOUNT_DEACTIVATED', a.event_date, u.date_deactivated)
from dbo.[User] u
inner join dbo.AuditEventsPivot a on a.entid = u.empid
However, I actually got the following:
Employee_ID,Employee_Name,Date_Activated,Date_Deactivated,Date_Role_Assigned
12345,'Joe Cool',2020-01-02,null,null
.....
In other words, it only updated one of the columns and didn't update the other two. (Incidentally, I did verify that all three events exist in the table for the ID in question, so that's not the problem - the problem is with the way I've written the join itself).
I could solve this by splitting this into multiple queries, but I would prefer to use a single query if possible to improve performance and make the code more concise. (Please feel free to correct me if this would not, in fact, lead to a performance gain).
Is this possible to do, or am I going to have to split this into multiple queries after all?
I have seen several related questions, but none that directly address my problem; for example, this one is asking what will happen if the JOIN clause matches multiple rows. My understanding based on this (and similar Q&As) is that you can't "count" on which one it'll pick. However, I'm not asking which one will be updated, I'd like all of them to be updated.
I found a Q&A trying to do this with a subquery, but that's not what I want to do because I want to use an UPDATE JOIN rather than a subquery.
One method is pre-aggregation:
update u
set date_role_assigned = coalesce(a.user_role, u.date_role_assigned),
date_activated = coalesce(a.user_account_activated, u.date_activated),
date_deactivated = coalesce(a.user_account_deactivated, u.date_deactivated)
from dbo.[User] u join
(select a.entid,
max(case when a.event_name = 'USER_ROLE_CHANGED' then a.event_date end) as user_role,
max(case when a.event_name = 'USER_ACCOUNT_ACTIVATED' then a.event_date end) as user_account_activated,
max(case when a.event_name = 'USER_ACCOUNT_DEACTIVATED' then a.event_date end) as user_account_deactivated
from dbo.AuditEventsPivot a
group by a.entid
) a
on a.entid = u.empid;
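As a rough, runnable illustration of the pre-aggregation idea, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for SQL Server. The table names `users` and `audit` are hypothetical simplifications of the tables in the question, and the update is applied from Python with `executemany` rather than with an UPDATE JOIN, so treat it as a demonstration of the pivot step only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (empid INTEGER PRIMARY KEY, date_role_assigned TEXT,
                    date_activated TEXT, date_deactivated TEXT);
CREATE TABLE audit (event_date TEXT, event_name TEXT, entid INTEGER);
INSERT INTO users (empid) VALUES (12345);
INSERT INTO audit VALUES
  ('2020-01-01', 'USER_ROLE_CHANGED', 12345),
  ('2020-01-02', 'USER_ACCOUNT_ACTIVATED', 12345),
  ('2020-01-03', 'USER_ACCOUNT_DEACTIVATED', 12345);
""")

# Pre-aggregate: collapse the event rows into exactly one row per entity,
# with one column per event type (NULL when that event never occurred).
pivot = conn.execute("""
SELECT MAX(CASE WHEN event_name = 'USER_ROLE_CHANGED' THEN event_date END),
       MAX(CASE WHEN event_name = 'USER_ACCOUNT_ACTIVATED' THEN event_date END),
       MAX(CASE WHEN event_name = 'USER_ACCOUNT_DEACTIVATED' THEN event_date END),
       entid
FROM audit
GROUP BY entid
""").fetchall()

# Each user is now touched once, so all three columns are set together.
conn.executemany("""
UPDATE users
SET date_role_assigned = COALESCE(?, date_role_assigned),
    date_activated     = COALESCE(?, date_activated),
    date_deactivated   = COALESCE(?, date_deactivated)
WHERE empid = ?
""", pivot)

row = conn.execute(
    "SELECT date_activated, date_deactivated, date_role_assigned "
    "FROM users WHERE empid = 12345"
).fetchone()
print(row)
```

The point of the sketch is that once the source has one row per key, it no longer matters which matching row the update picks.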

How to delete only rows that comply with condition

I have the query below, and the select returns the rows that I want to delete, but when I ran the entire query, oh boy, it deleted everything from the table.
How do I change the delete to only delete the rows returned by the select after the where:
delete from StaffSecurityItems
WHERE exists (
SELECT ssi.SysSaffID
FROM StaffDeparment sd, CustomFieldValues cfv, StaffSecurityItems ssi
where cfv.SysCustomFieldID = '-9223372036854749962'
and cfv.FieldValue = 'Yes'
and sd.SysStaffRoleID =11
and cfv.SysDeparmentID = sd.SysDeparmentID
and ssi.SysSaffID = SysSaffID
and ssi.ItemName in ('EnrollNewMember','EditEnrollmentInfo')
and ssi.SysStudyID ='-9223372036854759558');
My main issue is with the delete: I need to make sure that it only deletes what is being returned by the select.
If the SELECT works, why not simply replace the SELECT? What you have there is a query that will delete every row in the table StaffSecurityItems if the query within the EXISTS returns even a single row (what rows it finds is meaningless, due to a lack of a correlated query).
DELETE ssi --delete from StaffSecurityItems, the table the original DELETE targets
FROM StaffDeparment sd
JOIN CustomFieldValues cfv ON cfv.SysDeparmentID = sd.SysDeparmentID
JOIN StaffSecurityItems ssi ON ssi.SysSaffID = sd.SysSaffID --guessed alias
WHERE cfv.SysCustomFieldID = '-9223372036854749962'
AND cfv.FieldValue = 'Yes'
AND sd.SysStaffRoleID =11
AND ssi.ItemName in ('EnrollNewMember','EditEnrollmentInfo')
AND ssi.SysStudyID ='-9223372036854759558'
And, as mentioned in the comments, the ANSI-92 syntax has been around for 27 years! There's no reason you shouldn't be using it: Bad habits to kick : using old-style JOINs
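To see why the uncorrelated EXISTS wipes the table, here is a small sketch using Python's built-in sqlite3 module instead of SQL Server. The tables `items` and `flagged` are hypothetical, cut-down stand-ins for the ones in the question. The first DELETE repeats the mistake (the subquery never refers to the row being deleted), and the second correlates the subquery with the outer row.

```python
import sqlite3

def remaining_after(delete_sql):
    """Build a fresh toy database, run one DELETE, return the surviving staff ids."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE items (staff_id INTEGER, item_name TEXT);
    CREATE TABLE flagged (staff_id INTEGER);
    INSERT INTO items VALUES (1, 'EnrollNewMember'), (2, 'EnrollNewMember'), (3, 'Other');
    INSERT INTO flagged VALUES (1);
    """)
    conn.execute(delete_sql)
    rows = conn.execute("SELECT staff_id FROM items ORDER BY staff_id").fetchall()
    conn.close()
    return [r[0] for r in rows]

# Uncorrelated: the subquery uses its own copy of items (alias i), so it is
# either true for every outer row or false for every outer row.
uncorrelated = remaining_after("""
DELETE FROM items
WHERE EXISTS (SELECT 1 FROM flagged f, items i WHERE i.staff_id = f.staff_id)
""")

# Correlated: the subquery refers to the row currently being considered.
correlated = remaining_after("""
DELETE FROM items
WHERE EXISTS (SELECT 1 FROM flagged f WHERE f.staff_id = items.staff_id)
""")

print(uncorrelated)  # every row was deleted
print(correlated)    # only staff_id 1 was deleted
```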

Difference between SQL statements

I have come across two versions of an SQLRPGLE program and saw a change in the code as below:
Before:
Exec Sql SELECT 'N'
INTO :APRFLG
FROM LG751F T1
INNER JOIN LG752F T2
ON T1.ISBOLN = T2.IDBOLN AND
T1.ISITNO = T2.IDMDNO
WHERE T2.IDVIN = :M_VIN AND
T1.ISAPRV <> 'Y';
After:
Exec Sql SELECT case
when T1.ISAPRV <> 'Y' then 'N'
else T1.ISAPRV
end as APRFLG
INTO :APRFLG
FROM LG751F T1
join LG752F T2
ON T1.ISBOLN = T2.IDBOLN AND
T1.ISITNO = T2.IDMDNO
WHERE T2.IDVIN = :M_VIN AND
T1.ISAPRV <> 'Y'
group by T1.ISAPRV;
Could you please tell me whether you see any difference in how the two statements would behave? The second SQL has a GROUP BY, which is supposed to be a fix to avoid the -811 SQLCODE error. Apart from this, do you spot any other difference?
They are both examples of poor coding IMO.
The requirement to "remove duplicates" is often an indication of a bad statement design and/or a bad DB design.
You appear to be doing an existence check, in which case you should be making use of the EXISTS predicate.
select 'N' into :APRFLG
from sysibm.sysdummy1
where exists (select 1
FROM LG751F T1
INNER JOIN LG752F T2
ON T1.ISBOLN = T2.IDBOLN
AND T1.ISITNO = T2.IDMDNO
WHERE
T2.IDVIN = :M_VIN
AND T1.ISAPRV <> 'Y');
As far as the original two statements go, besides the GROUP BY, the only real difference is moving columns from the JOIN clause to the WHERE clause. However, since an inner join is used, the query engine in Db2 for i will rewrite both statements equivalently and come up with the same plan.
EDIT: as Mark points out, the JOIN and WHERE are the same in both of the OP's statements. But I'll leave the statement above in as an FYI.
I don't find a compelling difference, other than the addition of the GROUP BY, which will have the effect of suppressing any duplicate rows that might have been output.
It looks like the developer intended for the query to be able to vary its output to be sometimes Y and sometimes N, but forgot to remove the WHERE clause that necessarily forces the CASE condition to always be true, and hence the query to always output N. This kind of pattern is usually seen when the original report includes some spec like "don't include managers in the employee salary report" and that then changes to "actually we want to know if the employee is a manager or not". What was a "where employee not equal manager" becomes a "case when employee is manager then... else..", but the WHERE clause needs removing for it to have any effect.
The INNER keyword has disappeared from the join, but overall this should also be a no-op.
Another option is just to use fetch first row only like this:
Exec Sql
SELECT 'N'
INTO :APRFLG
FROM LG751F T1
JOIN LG752F T2
ON T1.ISBOLN = T2.IDBOLN AND
T1.ISITNO = T2.IDMDNO
WHERE T2.IDVIN = :M_VIN AND
T1.ISAPRV <> 'Y'
fetch first row only;
That makes it more obvious that you only want a single row rather than trying to use grouping which necessitates the funky do nothing CASE statement. But I do like the EXISTS method Charles provided since that is the real goal, and having exists in there makes it crystal clear.
If your lead insists on GROUP BY, you can also GROUP BY 'N' and still leave out the CASE statement.
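For a quick feel of why the EXISTS form sidesteps the "more than one row" problem, here is a minimal sketch using Python's built-in sqlite3 module rather than Db2 for i. The tables `header` and `detail` and their columns are hypothetical stand-ins for LG751F and LG752F, and SQLite's FROM-less SELECT plays the role of sysibm.sysdummy1. The plain join returns one row per match, while the EXISTS form returns at most one row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE header (boln INTEGER, itno INTEGER, aprv TEXT);
CREATE TABLE detail (boln INTEGER, mdno INTEGER, vin TEXT);
-- Two unapproved header rows match the same VIN.
INSERT INTO header VALUES (1, 10, 'N'), (2, 20, 'N');
INSERT INTO detail VALUES (1, 10, 'VIN1'), (2, 20, 'VIN1');
""")

join_sql = """
SELECT 'N'
FROM header t1
JOIN detail t2 ON t1.boln = t2.boln AND t1.itno = t2.mdno
WHERE t2.vin = ? AND t1.aprv <> 'Y'
"""
exists_sql = "SELECT 'N' WHERE EXISTS (" + join_sql + ")"

join_rows = conn.execute(join_sql, ("VIN1",)).fetchall()      # one row per match
exists_rows = conn.execute(exists_sql, ("VIN1",)).fetchall()  # at most one row
no_match = conn.execute(exists_sql, ("VIN9",)).fetchall()     # empty when nothing matches

print(len(join_rows), exists_rows, no_match)
```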

If I substitute hardcoded 1=0 in a SQL Server query with a variable it slows way down

Testing a query using hard-coded 1=0 and 1=1 values. When I substitute a variable for them the query slows way down. Any suggestions?
DECLARE @BoxType int
SET @BoxType = 2
Select blah from table t
INNER JOIN table2 t2
ON (t2.blah = t.blah AND 1=1 OR t2.blah = t.blah AND 1=0)
-- very fast
rewrite using:
...
INNER JOIN table2 t2
ON (t2.blah = t.blah AND @BoxType = 2 OR t2.blah = t.blah AND @BoxType = 1)
-- very slow
t2.blah = t.blah AND 1=0 will always be false so can be optimised out at compile time.
If you are saying that the second query is slower when @BoxType <> 1 and you are on SQL Server 2008+, you can try adding OPTION (RECOMPILE) to the query to get the same compile-time simplification, dependent on the actual value of the variable.
The comments have kind of touched on this. When you say “Where MyField = 1” the DB does not know what rows it will find, so it has to actually search for them. If there is an index on the field it may be reasonably fast. If there is no index and the table needs to be scanned, it could take very long.
But when you say “Where 1=0” the database knows just from the statement that the condition will always be false and no record will be found, so it will be blindingly fast because it doesn’t even need to read the table to return an empty result set.

Optimize query in TSQL 2005

I have to optimize this query. Can someone help me fine-tune it so it will return data faster?
Currently the output takes somewhere around 26 to 35 seconds. I also created an index on the attachment table. Following are my query and index:
SELECT DISTINCT o.organizationlevel, o.organizationid, o.organizationname, o.organizationcode,
o.organizationcode + ' - ' + o.organizationname AS 'codeplusname'
FROM Organization o
JOIN Correspondence c ON c.organizationid = o.organizationid
JOIN UserProfile up ON up.userprofileid = c.operatorid
WHERE c.status = '4'
--AND c.correspondence > 0
AND o.organizationlevel = 1
AND (up.site = 'ALL' OR
up.site = up.site)
--AND (@Dept = 'ALL' OR @Dept = up.department)
AND EXISTS (SELECT 1 FROM Attachment a
WHERE a.contextid = c.correspondenceid
AND a.context = 'correspondence'
AND ( a.attachmentname like '%.rtf' or a.attachmentname like '%.doc'))
ORDER BY o.organizationcode
I can't just change anything in db due to permission issues, any help would be much appreciated.
I believe your headache is coming from this part specifically. A LIKE inside a WHERE EXISTS can be your performance bottleneck.
AND EXISTS (SELECT 1 FROM Attachment a
WHERE a.contextid = c.correspondenceid
AND a.context = 'correspondence'
AND ( a.attachmentname like '%.rtf' or a.attachmentname like '%.doc'))
This can be written as a join instead.
SELECT DISTINCT o.organizationlevel, o.organizationid, o.organizationname, o.organizationcode,
o.organizationcode + ' - ' + o.organizationname AS 'codeplusname'
FROM Organization o
JOIN Correspondence c ON c.organizationid = o.organizationid
JOIN UserProfile up ON up.userprofileid = c.operatorid
left join Attachment a on a.contextid = c.correspondenceid
AND a.context = 'correspondence'
and right(attachmentname,4) in ('.doc','.rtf')
....
This eliminates both the LIKE and the WHERE EXISTS. Put your WHERE clause at the bottom. It's a left join, so a.anycolumn IS NULL means the record does not exist and a.anycolumn IS NOT NULL means a record was found. WHERE a.anycolumn IS NOT NULL will be the equivalent of a true in the WHERE EXISTS logic.
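As a sanity check on that equivalence, here is a minimal sketch using Python's built-in sqlite3 module instead of SQL Server 2005. The two tables are hypothetical, trimmed-down versions of Correspondence and Attachment, and SQLite's `substr(x, -4)` stands in for T-SQL's `RIGHT(x, 4)`. It only compares result sets, not performance, and it relies on DISTINCT because a correspondence with several matching attachments appears once per attachment after the join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE correspondence (correspondenceid INTEGER PRIMARY KEY);
CREATE TABLE attachment (contextid INTEGER, context TEXT, attachmentname TEXT);
INSERT INTO correspondence VALUES (1), (2), (3);
INSERT INTO attachment VALUES
  (1, 'correspondence', 'a.doc'),
  (1, 'correspondence', 'b.rtf'),
  (2, 'correspondence', 'c.pdf'),
  (3, 'other', 'd.doc');
""")

# Original shape: EXISTS with LIKE.
exists_ids = [r[0] for r in conn.execute("""
SELECT c.correspondenceid
FROM correspondence c
WHERE EXISTS (SELECT 1 FROM attachment a
              WHERE a.contextid = c.correspondenceid
                AND a.context = 'correspondence'
                AND (a.attachmentname LIKE '%.rtf' OR a.attachmentname LIKE '%.doc'))
ORDER BY c.correspondenceid
""")]

# Rewritten shape: LEFT JOIN, suffix test, and an IS NOT NULL filter.
join_ids = [r[0] for r in conn.execute("""
SELECT DISTINCT c.correspondenceid
FROM correspondence c
LEFT JOIN attachment a
  ON a.contextid = c.correspondenceid
 AND a.context = 'correspondence'
 AND substr(a.attachmentname, -4) IN ('.doc', '.rtf')
WHERE a.contextid IS NOT NULL
ORDER BY c.correspondenceid
""")]

print(exists_ids, join_ids)
```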
Edit to add:
Another thought for you...I'm unsure what you are trying to do here...
AND (up.site = 'ALL' OR
up.site = up.site)
So it's where up.site = 'ALL' or 1=1? Is the OR really needed?
And quickly on RIGHT: Right(column, integer) gives you the characters from the right of the string (I used a 4, so it'll take the 4 rightmost chars of the column specified). I've found it runs far faster than a LIKE.
This is always going to return true (for any row where up.site is not NULL), so you can eliminate it (and maybe the join to up)
AND (up.site = 'ALL' OR up.site = up.site)
If you can live with dirty reads, then use WITH (NOLOCK).
And I would try Attachment as a join. It might not help, but it's worth a try. LIKE is relatively expensive, and if it is being evaluated in a loop where it could be done once, that would really help.
Join Attachment a
on a.contextid = c.correspondenceid
AND a.context = 'correspondence'
AND ( a.attachmentname like '%.rtf' or a.attachmentname like '%.doc')
I know there are some people on SO that insist that exists is always faster than a join. And yes it is often faster than a join but not always.
Another approach is to create a #temp table using
CREATE TABLE #Temp (contextid INT PRIMARY KEY CLUSTERED);
insert into #temp
Select distinct contextid
from Attachment
where context = 'correspondence'
AND ( attachmentname like '%.rtf' or attachmentname like '%.doc')
order by contextid;
go
select ...
from correspondence c
join #Temp
on #Temp.contextid = c.correspondenceid
go
drop table #temp
Especially if correspondenceid is the primary key or part of the primary key on correspondence, creating the PK on #temp will help.
That way you can be sure that the LIKE expression is only evaluated once. If the LIKE is the expensive part and it runs in a loop, then it could be tanking the query. I use this a lot where I have a fairly expensive core query and I need those results to pick up reference data from multiple tables. If you do a lot of joins, sometimes the query optimizer goes stupid. But if you give the query optimizer PK to PK, then it does not get stupid and is fast. The downside is that it takes about 0.5 seconds to create and populate the #temp.