I'm updating some columns and incrementing a counter whenever I change a row.
The update statement is the result of a join (simplified code below):
update #to
set
t.num += 1
from #to t
join #source s
on t.id = s.id
When I update one row more than once, the columns hold the last value (as they should), but the counter is only incremented once. So if the join returns (id = 1, id = 1), my table holds (id = 1, num = 1) rather than (id = 1, num = 2).
There are ways to get around this (another join on a select count, for example), but I wonder if there's a way to keep it simple.
There's not really a way to get the count without getting the count. Here is one way to do that (and still only referencing #source once):
;WITH s AS
(
SELECT id, c = COUNT(*)
FROM #source
GROUP BY id
)
UPDATE t SET t.num += s.c
FROM #to AS t
INNER JOIN s
ON t.id = s.id;
Hopefully the rows that end up in #source are already filtered down to only those that will also be found in #to. If not, you can add more conditions to the initial CTE.
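The aggregate-then-update idea can be tried out in a small, self-contained sketch. This one uses Python's built-in sqlite3 module rather than SQL Server, so the syntax differs from the T-SQL above (a correlated subquery replaces the CTE join), and the table names `t` and `source` plus the helper `apply_counts` are made up for illustration:

```python
import sqlite3


def apply_counts(to_rows, source_ids):
    """Add to each target row's num the number of matching rows in source."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, num INTEGER NOT NULL)")
    con.execute("CREATE TABLE source (id INTEGER NOT NULL)")
    con.executemany("INSERT INTO t VALUES (?, ?)", to_rows)
    con.executemany("INSERT INTO source VALUES (?)", [(i,) for i in source_ids])
    # Count the source rows per id, then add that count once per target row.
    con.execute(
        """
        UPDATE t
        SET num = num + (SELECT COUNT(*) FROM source WHERE source.id = t.id)
        WHERE id IN (SELECT id FROM source)
        """
    )
    result = con.execute("SELECT id, num FROM t ORDER BY id").fetchall()
    con.close()
    return result


# id 1 appears twice in source, id 2 once, id 3 not at all.
print(apply_counts([(1, 0), (2, 0), (3, 7)], [1, 1, 2]))
```

With id 1 present twice in the source, its counter goes up by two, which is the behaviour the question asked for.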
Hi all … Wonder if anyone out there can help me with this one please.
I am running a query to update product categories against sales lines and need to back file a few million records so I wrote the query below to run for a specific order ID
DECLARE @ID INT
SET @ID = 659483
UPDATE [TradeSpace].[TradeSpace].[dbo].[SalesLine]
SET [ProductCategory] = [curSync].[pc_Cat]
FROM (SELECT [SC_ID],
[pc_cat]
FROM [MW_MereSys].[dbo].[MWSLines]
INNER
JOIN [MW_MereSys].[dbo].[MWProductCats]
ON [MWSLines].[pc_catref] = [MWProductCats].[pc_catref]
WHERE [sh_id] = @ID
) AS [curSync]
WHERE [SalesLine].[slID] = [curSync].[sc_id]
AND [salesline].[soid] = @ID
The sub SELECT runs in less than one second, but the update has yet to finish (I have left it for an hour at most). Indexes exist for [slID] and [soid]. A manual update for one line takes less than one second, but run like this (10 lines) it is desperately slow.
Does anybody have any clues please. I've written plenty of queries like this and never had a problem … stumped :(
Your query rewritten with no changes:
UPDATE s SET
ProductCategory = curSync.pc_Cat
FROM TradeSpace.TradeSpace.dbo.SalesLine s
INNER JOIN
(
SELECT [SC_ID], [pc_cat]
FROM [MW_MereSys].[dbo].[MWSLines] l
INNER JOIN [MW_MereSys].[dbo].[MWProductCats] c ON l.[pc_catref] = c.[pc_catref]
WHERE [sh_id] = @ID
) AS [curSync]
on s.[slID] = [curSync].[sc_id]
WHERE s.[soid] = @ID
Are you sure everything is correct here? Does a single row from SalesLine always match only one row from the subquery?
If so, try this. It will fail if that is not true, whereas the original query would silently update the same row with different values in that situation.
UPDATE s SET
ProductCategory = (
SELECT [pc_cat]
FROM [MW_MereSys].[dbo].[MWSLines] l
INNER JOIN [MW_MereSys].[dbo].[MWProductCats] c ON l.[pc_catref] = c.[pc_catref]
WHERE [sh_id] = @ID
AND [sc_id] = s.[slID]
)
FROM TradeSpace.TradeSpace.dbo.SalesLine s
WHERE s.[soid] = @ID
And please check estimated execution plan. Does it hit indexes?
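Whether one SalesLine row can match more than one subquery row can also be checked directly before running any update. The following is only a sketch of that check, using Python's sqlite3 module instead of SQL Server. The table `cur_sync` stands in for the subquery result, and it and the helper `ambiguous_keys` are hypothetical names:

```python
import sqlite3


def ambiguous_keys(pairs):
    """Return the sc_id values that occur on more than one row."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE cur_sync (sc_id INTEGER, pc_cat TEXT)")
    con.executemany("INSERT INTO cur_sync VALUES (?, ?)", pairs)
    # Any key listed here would make the scalar-subquery form of the
    # update fail on SQL Server, and the join form update a row twice.
    rows = con.execute(
        """
        SELECT sc_id
        FROM cur_sync
        GROUP BY sc_id
        HAVING COUNT(*) > 1
        ORDER BY sc_id
        """
    ).fetchall()
    con.close()
    return [r[0] for r in rows]


print(ambiguous_keys([(1, "A"), (1, "B"), (2, "C"), (3, "D"), (3, "D")]))
```

An empty result means every key maps to exactly one row, so both forms of the update would write the same values.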
We need other details, as I mentioned in the comments.
Your update is likely slow because of a very high cardinality estimate when the updated table is joined with the subquery result.
That may be caused by a wrong join or WHERE predicate.
You can put the subquery result in a #temp table and try again. You can also create the same index on the #temp table.
DECLARE @ID INT
SET @ID = 659483
-- materialise the subquery result first
create table #temp ([SC_ID] int, [pc_cat] int)
insert into #temp
SELECT [SC_ID],
[pc_cat]
FROM [MW_MereSys].[dbo].[MWSLines]
INNER JOIN [MW_MereSys].[dbo].[MWProductCats]
ON [MWSLines].[pc_catref] = [MWProductCats].[pc_catref]
WHERE [sh_id] = @ID
-- then update from the temp table
UPDATE SalesLine
SET [ProductCategory] = [curSync].[pc_Cat]
FROM [TradeSpace].[TradeSpace].[dbo].[SalesLine] as SalesLine
inner join #temp AS [curSync]
ON [SalesLine].[slID] = [curSync].[sc_id]
WHERE [SalesLine].[soid] = @ID
drop table #temp
I am using the query below to update one column based on the specified conditions. I am using an inner join, but the query takes more than 15 seconds to run even when it has no records to update (0 records).
UPDATE CONFIGURATION_LIST
SET DUPLICATE_SERIAL_NUM = 0
FROM CONFIGURATION_LIST
INNER JOIN (SELECT DISTINCT APPLIED_MAT_CODE, APPLIED_SERIAL_NUMBER, COUNT(*) AS NB
FROM CONFIGURATION_LIST
WHERE
PLANT = '0067'
AND APPLIED_SERIAL_NUMBER IS NOT NULL
AND APPLIED_SERIAL_NUMBER !=''
AND DUPLICATE_SERIAL_NUM = 1
GROUP BY
APPLIED_MAT_CODE, APPLIED_SERIAL_NUMBER
HAVING
COUNT(*) = 1) T2 ON T2.APPLIED_SERIAL_NUMBER = CONFIGURATION_LIST.APPLIED_SERIAL_NUMBER
AND T2.APPLIED_MAT_CODE = CONFIGURATION_LIST.APPLIED_MAT_CODE
WHERE
CONFIGURATION_LIST.PLANT = '0067'
AND DUPLICATE_SERIAL_NUM = 1
The index is there with APPLIED_SERIAL_NUMBER and APPLIED_MAT_CODE and fragmentation is also fine.
Could you please help me on the above query performance.
First, you don't need the DISTINCT when using GROUP BY. SQL Server probably ignores it, but it is a bad idea anyway:
UPDATE CONFIGURATION_LIST
SET DUPLICATE_SERIAL_NUM = 0
FROM CONFIGURATION_LIST INNER JOIN
(SELECT APPLIED_MAT_CODE, APPLIED_SERIAL_NUMBER, COUNT(*) AS NB
FROM CONFIGURATION_LIST cl
WHERE cl.PLANT = '0067' AND
cl.APPLIED_SERIAL_NUMBER IS NOT NULL AND
cl.APPLIED_SERIAL_NUMBER <> '' AND
cl.DUPLICATE_SERIAL_NUM = 1
GROUP BY cl.APPLIED_MAT_CODE, cl.APPLIED_SERIAL_NUMBER
HAVING COUNT(*) = 1
) T2
ON T2.APPLIED_SERIAL_NUMBER = CONFIGURATION_LIST.APPLIED_SERIAL_NUMBER AND
T2.APPLIED_MAT_CODE = CONFIGURATION_LIST.APPLIED_MAT_CODE
WHERE CONFIGURATION_LIST.PLANT = '0067' AND
DUPLICATE_SERIAL_NUM = 1;
For this query, you want the following index: CONFIGURATION_LIST(PLANT, DUPLICATE_SERIAL_NUM, APPLIED_SERIAL_NUMBER, APPLIED_MAT_CODE).
The HAVING COUNT(*) = 1 suggests that you might really want NOT EXISTS (which would normally be faster). But you don't really explain what the query is supposed to be doing, you only say that this code is slow.
Looks like you're checking the table for rows that exist in the same table with the same values, and if not, update the duplicate column to zero. If your table has a unique key (identity field or composite key), you could do something like this:
UPDATE C
SET C.DUPLICATE_SERIAL_NUM = 0
FROM
CONFIGURATION_LIST C
where
not exists (
select
1
FROM
CONFIGURATION_LIST C2
where
C2.APPLIED_SERIAL_NUMBER = C.APPLIED_SERIAL_NUMBER and
C2.APPLIED_MAT_CODE = C.APPLIED_MAT_CODE and
C2.UNIQUE_KEY_HERE != C.UNIQUE_KEY_HERE
) and
C.PLANT = '0067' and
C.DUPLICATE_SERIAL_NUM = 1
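To see the NOT EXISTS form in action, here is a rough sketch using Python's sqlite3 module, not SQL Server. The table `cfg`, its shortened column names, and the helper `clear_non_duplicates` are assumptions made for the example, and the PLANT filter is left out to keep it short:

```python
import sqlite3


def clear_non_duplicates(rows):
    """Set dup = 0 on rows whose (mat, serial) pair occurs only once."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE cfg (id INTEGER PRIMARY KEY, mat TEXT, serial TEXT, dup INTEGER)"
    )
    con.executemany("INSERT INTO cfg VALUES (?, ?, ?, ?)", rows)
    # A row keeps dup = 1 only if another row (different id) shares its pair.
    con.execute(
        """
        UPDATE cfg
        SET dup = 0
        WHERE dup = 1
          AND NOT EXISTS (
              SELECT 1
              FROM cfg AS c2
              WHERE c2.serial = cfg.serial
                AND c2.mat = cfg.mat
                AND c2.id <> cfg.id
          )
        """
    )
    result = con.execute("SELECT id, dup FROM cfg ORDER BY id").fetchall()
    con.close()
    return result


print(clear_non_duplicates([(1, "M1", "S1", 1), (2, "M1", "S1", 1), (3, "M2", "S2", 1)]))
```

Rows 1 and 2 share a pair and stay flagged, while row 3 is unique and gets reset. Note that, unlike the HAVING COUNT(*) = 1 version, this compares against all rows of the table, not only those that pass the filters.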
I will try with a select first:
select APPLIED_MAT_CODE, APPLIED_SERIAL_NUMBER, count(*) as n
from CONFIGURATION_LIST cl
where
cl.PLANT='0067' and
cl.APPLIED_SERIAL_NUMBER IS NOT NULL and
cl.APPLIED_SERIAL_NUMBER <> ''
group by APPLIED_MAT_CODE, APPLIED_SERIAL_NUMBER;
How many rows do you get with this and how long does it take?
If you remove your DUPLICATE_SERIAL_NUM column from your table it might be very simple. The DUPLICATE_SERIAL_NUM suggests that you are searching for duplicates. As you count your rows you could introduce a simple table that contains the counts:
create table CLCOUNT ( N int unsigned, C int /* or what APPLIED_MAT_CODE is */, S int /* or what APPLIED_SERIAL_NUMBER is */, PLANT char(20) /* or what PLANT is */, unique index (C,S,PLANT), index(PLANT,N));
insert into CLCOUNT select count(*), cl.APPLIED_MAT_CODE, cl.APPLIED_SERIAL_NUMBER, cl.PLANT
from CONFIGURATION_LIST cl
where
cl.PLANT='0067' and
cl.APPLIED_SERIAL_NUMBER IS NOT NULL and
cl.APPLIED_SERIAL_NUMBER <> ''
group by APPLIED_MAT_CODE, APPLIED_SERIAL_NUMBER;
How long does this take?
Now you can simply select * from CLCOUNT where PLANT='0067' and N=1;
This is all far from being perfect. But you should be able to analyze (EXPLAIN SELECT ...) your queries and find why it takes so long.
I am going to explain again what I am trying to do in hopes that you can help.
Table 1 has 4061 rows with columns that include
[Name],[Address1],[Address2],[Address3],[City],[State],[Zip],[Country],[Phone]
and 20 other columns. Table 1 is data that needs to be deidentified. Table 1 has 1534 distinct [Name] rows out of 4061 rows total.
Table 2 has auto generated data which includes the same columns. I would like to replace the above mentioned columns in table 1 with data from table 2. I want to select distinct based on [Name] from table one and then [Name],[Address1],[Address2],[Address3],[City],[State],[Zip],[Country],[Phone] with a new set of distinct data from table 2.
I do not want to just update each row with a new address as that will screw up the data consistency. By replacing only distinct this will allow me to preserve the data consistency while changing the row data in table 1. When I am done I would like to have 1534 distinct new de-identified [Name] [Address1],[Address2],[Address3],[City],[State],[Zip],[Country],[Phone] in table 1 from table 2.
You would use join in the update. You can generate a join key for 1500 rows using row_number():
update toupdate
set toupdate.address = f.address
from (select t.*, row_number() over (order by newid()) as seqnum
from table t
) toupdate join
(select f.*, row_number() over (order by newid()) as seqnum
from fake f
) f
on toupdate.seqnum = f.seqnum and toupdate.seqnum <= 1500;
Here is how I ended up doing it.
First I ran a statement to select distinct and inserted it into a table.
Select Distinct [Name],[Address1],[City],[State],[Zip],[Country],[Phone]
INTO APMAST2
FROM APMAST
I then added name2 column in APMAST2 and used a statement to create a sequential id field into APMAST2.
DECLARE @id INT
SET @id = 0
UPDATE APMAST2
SET @id = id = @id + 1
GO
Now I have my distinct info plus a blank name field and a sequential ID field in APMAST2. Now I can join this data with my fakenames table, which I generated from here using their bulk tool.
Using a Join Statement I joined my fake data with APMAST2
Update dbo.APMAST2
SET dbo.APMAST2.Name = dbo.fakenames.company,
dbo.APMAST2.Address1 = dbo.fakenames.streetaddress,
dbo.APMAST2.City = dbo.fakenames.City,
dbo.APMAST2.State = dbo.fakenames.State,
dbo.APMAST2.Zip = dbo.fakenames.zipcode,
dbo.APMAST2.Country = dbo.fakenames.countryfull,
dbo.APMAST2.Phone = dbo.fakenames.telephonenumber
FROM
dbo.APMAST2
INNER JOIN
dbo.fakenames
ON dbo.fakenames.number = dbo.APMAST2.id
Now I have my fake data loaded, but I kept my original Name field so I could reload this data into my full table APMAST. Now I can do a join between APMAST2 and APMAST.
Update dbo.APMAST
SET dbo.APMAST.Name = dbo.APMAST2.Name,
dbo.APMAST.Address1 = dbo.APMAST2.Address1,
dbo.APMAST.City = dbo.APMAST2.City,
dbo.APMAST.State = dbo.APMAST2.State,
dbo.APMAST.Zip = dbo.APMAST2.Zip,
dbo.APMAST.Country = dbo.APMAST2.Country,
dbo.APMAST.Phone = dbo.APMAST2.Phone
FROM
dbo.APMAST
INNER JOIN
dbo.apmast2
ON dbo.apmast.name = dbo.APMAST2.name2
Now my original table has all fake data in it but keeps the integrity it had (well, most of it), so the data looks good when reported on but is de-identified. You can now remove APMAST2, or keep it if you need to match this with other data later on. I know this is long and I am sure there is a better way to do it, but this is how I did it. Suggestions welcome.
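The same idea (one sequential id per distinct name, then a join to the fake data on that id) can be condensed into a short sketch. It uses Python's sqlite3 module, not SQL Server, handles only the name column, and the `name_map` table and `deidentify` helper are hypothetical names introduced here:

```python
import sqlite3


def deidentify(rows, fake_names):
    """Replace each distinct name with one fake name, consistently."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE apmast (name TEXT, amount INTEGER)")
    con.execute(
        "CREATE TABLE name_map (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT UNIQUE)"
    )
    con.execute("CREATE TABLE fakenames (number INTEGER PRIMARY KEY, company TEXT)")
    con.executemany("INSERT INTO apmast VALUES (?, ?)", rows)
    con.executemany(
        "INSERT INTO fakenames VALUES (?, ?)", list(enumerate(fake_names, start=1))
    )
    # One map row per distinct original name; ids are assigned 1, 2, 3, ...
    con.execute(
        "INSERT INTO name_map (name) SELECT DISTINCT name FROM apmast ORDER BY name"
    )
    # Every row with the same original name gets the same fake name.
    con.execute(
        """
        UPDATE apmast
        SET name = (
            SELECT f.company
            FROM name_map AS m
            JOIN fakenames AS f ON f.number = m.id
            WHERE m.name = apmast.name
        )
        WHERE name IN (
            SELECT m.name
            FROM name_map AS m
            JOIN fakenames AS f ON f.number = m.id
        )
        """
    )
    result = con.execute("SELECT name, amount FROM apmast ORDER BY amount").fetchall()
    con.close()
    return result


print(deidentify([("Acme", 10), ("Beta", 20), ("Acme", 30)], ["Fake One", "Fake Two"]))
```

Both "Acme" rows end up with the same replacement, so grouping by name still produces the same groups after de-identification.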
I'm trying to migrate some tables into an existing table, I need to perform the updates only where DET_ATTACHMENT_ID equals DET_ATTACHMENT.ID, here's the query I have so far.
UPDATE DET_ATTACHMENT
SET attachment_type = 'LAB', -- being added by the query, to replace the table difference
payer_criteria_id = (
SELECT PAYER_CRITERIA_ID
FROM DET_LAB_ATTACHMENT
WHERE DET_LAB_ATTACHMENT.DET_ATTACHMENT_ID = DET_ATTACHMENT.ID)
WHERE exists(
SELECT DET_ATTACHMENT_ID
FROM DET_ATTACHMENT
JOIN DET_LAB_ATTACHMENT ON (ID = DET_ATTACHMENT_ID)
WHERE DET_ATTACHMENT_ID = DET_ATTACHMENT.ID)
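The problem with the existing query is that it's setting every row to have an attachment_type of "LAB", and nulling out the payer_criteria_id where it didn't match. What am I doing wrong?
The problem is likely that your exists(...) predicate always evaluates to true, so the update runs for all rows of det_attachment. Inside the subquery, DET_ATTACHMENT.ID resolves to the DET_ATTACHMENT in the subquery's own FROM clause, not to the row being updated, so the predicate is true whenever any matching pair exists. Try it this way: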
UPDATE DET_ATTACHMENT X
SET X.attachment_type = 'LAB',
X.payer_criteria_id = (
SELECT C.PAYER_CRITERIA_ID
FROM DET_LAB_ATTACHMENT C
WHERE C.DET_ATTACHMENT_ID = X.ID
)
WHERE
exists(
SELECT 1
FROM DET_ATTACHMENT A
JOIN DET_LAB_ATTACHMENT B
ON B.DET_ATTACHMENT_ID = A.ID
where B.det_attachment_id = X.id
)
;
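The difference between the uncorrelated and the correlated EXISTS can be reproduced on a tiny data set. This is a sketch using Python's sqlite3 module rather than the asker's database. The column set is cut down, the fixed statement is a simplified variant of the one above, and the names `BUGGY`, `FIXED` and `run` are invented for the example:

```python
import sqlite3

# The inner det_attachment hides the outer one, so the subquery is not
# tied to the row being updated.
BUGGY = """
UPDATE det_attachment
SET attachment_type = 'LAB'
WHERE EXISTS (
    SELECT 1
    FROM det_attachment
    JOIN det_lab_attachment ON id = det_attachment_id
    WHERE det_attachment_id = det_attachment.id
)
"""

# Only det_lab_attachment is in the inner FROM, so det_attachment.id
# refers to the row being updated.
FIXED = """
UPDATE det_attachment
SET attachment_type = 'LAB'
WHERE EXISTS (
    SELECT 1
    FROM det_lab_attachment AS b
    WHERE b.det_attachment_id = det_attachment.id
)
"""


def run(sql):
    """Apply one UPDATE to three attachments, of which only id 2 has a lab row."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE det_attachment (id INTEGER PRIMARY KEY, attachment_type TEXT)"
    )
    con.execute(
        "CREATE TABLE det_lab_attachment (det_attachment_id INTEGER, payer_criteria_id INTEGER)"
    )
    con.executemany("INSERT INTO det_attachment VALUES (?, NULL)", [(1,), (2,), (3,)])
    con.execute("INSERT INTO det_lab_attachment VALUES (2, 99)")
    con.execute(sql)
    rows = con.execute(
        "SELECT id, attachment_type FROM det_attachment ORDER BY id"
    ).fetchall()
    con.close()
    return rows


print(run(BUGGY))
print(run(FIXED))
```

The first statement marks all three rows as LAB, while the second marks only the row that actually has a lab attachment.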
I've created a junction table like this one:
http://imageshack.us/scaled/landing/822/kantotype.png
I was trying to figure out a query that can select some rows based on the PokémonID and then update only the first or second row after that main filtering.
For example:
Let's suppose that I would like to change the value of the TypeID from the second row containing PokémonID = 2. I cannot simply use UPDATE KantoType SET TypeID = x WHERE PokémonID = 2, because it will change both rows!
I've already tried subqueries with IN, EXISTS and LIMIT, but with no success.
It's unclear what you are trying to do. However, you can UPDATE with JOIN like so:
UPDATE k1
SET k1.TypeID = 'something' -- or some value from k2
FROM KantoType k1
INNER JOIN
(
Some filtering and selecting
) k2 ON k1.PokémonID = k2.PokémonID
WHERE k1.PokémonID = 2;
Or, if you want to UPDATE only one of the two rows that have PokémonID = 2 (here, the one with the lowest TypeID), you can do this:
WITH CTE
AS
(
SELECT *,
ROW_NUMBER() OVER(ORDER BY TypeID) rownum
FROM KantoType
WHERE PokemonID = 2
)
UPDATE c
SET c.TypeID = 5
FROM CTE c
WHERE c.rownum = 1;
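For comparison, the "update only the first matching row" idea can be sketched without a CTE in an engine that exposes a row identifier. The example below uses Python's sqlite3 module and SQLite's implicit rowid, which SQL Server does not have. The helper `update_first_type` and the sample values are made up:

```python
import sqlite3


def update_first_type(rows, pokemon_id, new_type):
    """Change TypeID on the one row of pokemon_id that has the lowest TypeID."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE KantoType (PokemonID INTEGER, TypeID INTEGER)")
    con.executemany("INSERT INTO KantoType VALUES (?, ?)", rows)
    # The subquery picks exactly one row; rowid then targets only that row.
    con.execute(
        """
        UPDATE KantoType
        SET TypeID = ?
        WHERE rowid = (
            SELECT rowid
            FROM KantoType
            WHERE PokemonID = ?
            ORDER BY TypeID
            LIMIT 1
        )
        """,
        (new_type, pokemon_id),
    )
    result = con.execute(
        "SELECT PokemonID, TypeID FROM KantoType ORDER BY rowid"
    ).fetchall()
    con.close()
    return result


print(update_first_type([(1, 12), (2, 4), (2, 12), (3, 4)], 2, 5))
```

Only the (2, 4) row changes; the second row for PokemonID 2 is left alone.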
I can suggest something like this if you just need to update a single line in your table:
UPDATE kantotype
SET
type = 2
WHERE pokemon = 2
AND NOT EXISTS (SELECT * FROM kantotype k2
WHERE kantotype.type > k2.type
AND kantotype.pokemon = k2.pokemon)
It would be easier to get the first or last item of the table if you had a unique identifier field in your table.
I'm not even sure whether you are trying to update the row with PokemonID = 2 by filtering on TypeID. So, purely as an assumption (a big one), you could try a CASE expression:
UPDATE yourtable a
LEFT JOIN yourtable b on a.pokeid = b.pokeid
SET a.typeid = (CASE
WHEN a.typeid < b.typeid THEN yourupdatevalue
WHEN a.typeid > b.typeid THEN someothervalue
ELSE a.typeid END);
If you know the pokemon ID and the type id then just add both to the where clause of your query.
UPDATE KantoType
SET TypeID = x
WHERE PokémonID = 2
AND TypeID=1
If you don't know the type ID, then you need to provide more information about what you're trying to accomplish. It's not clear why you don't have this information.
Perhaps think about what is the unique identifier in your data set.