SQL Update Skipping duplicates - sql

Table 1 looks like the following.
ID SIZE TYPE SERIAL
1 4 W-meter1 123456
2 5 W-meter2 123456
3 4 W-meter 585858
4 4 W-Meter 398574
As you can see, items 1 and 2 have the same serial number. I have an inner-join update statement that updates the UniqueID on these devices by linking their serial number to the list.
What I would like to do is modify the items with duplicate serial numbers by hand and run a scripted update for the ones that are unique. I'm presuming I have to use DISTINCT here somewhere, but I'm not sure.
This is my update statement as is. Pretty simple and straightforward.
update UM00400
Set um00400.umEquipmentID = tb2.MIUNo
from UM00400 tb1
inner join AA_Meters tb2 on
tb1.umSerialNumber = tb2.Old_Serial_Num
where tb1.umSerialNumber <> tb2.New_Serial_Num

;WITH CTE
AS
(
SELECT * , rn = ROW_NUMBER() OVER (PARTITION BY umSerialNumber ORDER BY umSerialNumber)
FROM UM00400
)
UPDATE CTE
SET CTE.umEquipmentID = tb2.MIUNo
FROM CTE
inner join AA_Meters tb2
on CTE.umSerialNumber = tb2.Old_Serial_Num
where CTE.umSerialNumber <> tb2.New_Serial_Num
AND CTE.rn = 1
This will update only the first record in each group of records that share the same serial number.

If I understand your question correctly, the query below will help you out:
;WITH CTE AS
(
-- getting those serial numbers which are not duplicated
SELECT umSerialNumber,COUNT(umSerialNumber) as CountOfSerialNumber
FROM UM00400
GROUP BY umSerialNumber
HAVING COUNT(umSerialNumber) = 1
)
UPDATE A SET A.umEquipmentID = C.MIUNo
FROM UM00400 A
INNER JOIN CTE B ON A.umSerialNumber = B.umSerialNumber
INNER JOIN AA_Meters C ON A.umSerialNumber = C.Old_Serial_Num
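
To see the "skip the duplicates" idea run end to end, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for SQL Server. The sample rows, the id column, and the MIU values are made up for illustration, and the update is written with correlated subqueries because older SQLite versions lack UPDATE ... FROM. Treat it as an illustration of the GROUP BY ... HAVING COUNT(*) = 1 filter, not as the exact statement to run against UM00400.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE UM00400 (id INTEGER PRIMARY KEY, umSerialNumber TEXT, umEquipmentID TEXT);
CREATE TABLE AA_Meters (Old_Serial_Num TEXT, MIUNo TEXT);
-- hypothetical sample data: serial 123456 appears twice
INSERT INTO UM00400 VALUES (1,'123456',NULL),(2,'123456',NULL),(3,'585858',NULL),(4,'398574',NULL);
INSERT INTO AA_Meters VALUES ('123456','MIU-A'),('585858','MIU-B'),('398574','MIU-C');
""")

# Update only rows whose serial number occurs exactly once in UM00400.
conn.execute("""
UPDATE UM00400
SET umEquipmentID = (SELECT m.MIUNo FROM AA_Meters m
                     WHERE m.Old_Serial_Num = UM00400.umSerialNumber)
WHERE umSerialNumber IN (SELECT umSerialNumber FROM UM00400
                         GROUP BY umSerialNumber
                         HAVING COUNT(*) = 1)
  AND EXISTS (SELECT 1 FROM AA_Meters m
              WHERE m.Old_Serial_Num = UM00400.umSerialNumber)
""")

rows = conn.execute("SELECT id, umEquipmentID FROM UM00400 ORDER BY id").fetchall()
print(rows)
```

Rows 1 and 2 share a serial number, so they are left untouched for manual handling, while rows 3 and 4 pick up their MIU numbers.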

Related

SQL Server 2014 Replace Distinct? DeIdentify Data

I am going to explain again what I am trying to do in hopes that you can help.
Table 1 has 4061 rows with columns that include
[Name],[Address1],[Address2],[Address3],[City],[State],[Zip],[Country],[Phone]
and 20 other columns. Table 1 is data that needs to be deidentified. Table 1 has 1534 distinct [Name] rows out of 4061 rows total.
Table 2 has auto-generated data which includes the same columns. I would like to replace the above-mentioned columns in table 1 with data from table 2. I want to select distinct based on [Name] from table 1 and then replace [Name],[Address1],[Address2],[Address3],[City],[State],[Zip],[Country],[Phone] with a new set of distinct data from table 2.
I do not want to just update each row with a new address as that will screw up the data consistency. By replacing only distinct this will allow me to preserve the data consistency while changing the row data in table 1. When I am done I would like to have 1534 distinct new de-identified [Name] [Address1],[Address2],[Address3],[City],[State],[Zip],[Country],[Phone] in table 1 from table 2.
You would use a join in the update. You can generate a join key for 1500 rows using row_number():
update toupdate
set toupdate.address = f.address
from (select t.*, row_number() over (order by newid()) as seqnum
from table t
) toupdate join
(select f.*, row_number() over (order by newid()) as seqnum
from fake f
) f
on toupdate.seqnum = f.seqnum and toupdate.seqnum <= 1500;
Here is how I ended up doing it.
First I ran a statement to select distinct and inserted it into a table.
Select Distinct [Name],[Address1],[City],[State],[Zip],[Country],[Phone]
INTO APMAST2
FROM APMAST
I then added name2 column in APMAST2 and used a statement to create a sequential id field into APMAST2.
DECLARE @id INT
SET @id = 0
UPDATE APMAST2
SET @id = id = @id + 1
GO
Now I have my distinct info plus a blank name field and a sequential ID field in APMAST2. Now I can join this data with my fakenames table, which I generated from an external site using their bulk tool.
Using a Join Statement I joined my fake data with APMAST2
Update dbo.APMAST2
SET dbo.APMAST2.Name = dbo.fakenames.company,
dbo.APMAST2.Address1 = dbo.fakenames.streetaddress,
dbo.APMAST2.City = dbo.fakenames.City,
dbo.APMAST2.State = dbo.fakenames.State,
dbo.APMAST2.Zip = dbo.fakenames.zipcode,
dbo.APMAST2.Country = dbo.fakenames.countryfull,
dbo.APMAST2.Phone = dbo.fakenames.telephonenumber
FROM
dbo.APMAST2
INNER JOIN
dbo.fakenames
ON dbo.fakenames.number = dbo.APMAST2.id
Now I have my fake data loaded, but I kept my original Name field so I could reload this data into my full table APMAST. So now I can do a join between APMAST2 and APMAST.
Update dbo.APMAST
SET dbo.APMAST.Name = dbo.APMAST2.Name,
dbo.APMAST.Address1 = dbo.APMAST2.Address1,
dbo.APMAST.City = dbo.APMAST2.City,
dbo.APMAST.State = dbo.APMAST2.State,
dbo.APMAST.Zip = dbo.APMAST2.Zip,
dbo.APMAST.Country = dbo.APMAST2.Country,
dbo.APMAST.Phone = dbo.APMAST2.Phone
FROM
dbo.APMAST
INNER JOIN
dbo.apmast2
ON dbo.apmast.name = dbo.APMAST2.name2
Now my original table has all fake data in it, but it keeps the integrity it had (well, most of it), so the data looks good when reported on but is de-identified. You can now remove APMAST2 or keep it if you need to match this with other data later on. I know this is long and I am sure there is a better way to do it, but this is how I did it. Suggestions welcome.
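
The same idea (one fake identity per distinct real name, applied consistently to every row) can be sketched compactly. The example below uses Python's sqlite3 module rather than SQL Server, needs SQLite 3.25 or later for ROW_NUMBER(), and uses a cut-down APMAST with invented rows plus a hypothetical name_map table in place of APMAST2. It is a sketch of the mapping technique under those assumptions, not a drop-in replacement for the steps above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE APMAST (Name TEXT, City TEXT, Amount INTEGER);
CREATE TABLE fakenames (number INTEGER PRIMARY KEY, company TEXT, city TEXT);
-- invented sample rows: 'Acme' appears twice and must stay consistent
INSERT INTO APMAST VALUES ('Acme','Austin',10),('Acme','Austin',20),('Bolt','Boston',30);
INSERT INTO fakenames VALUES (1,'Fake One','Nowhere'),(2,'Fake Two','Elsewhere');

-- hypothetical mapping table: one sequential id per distinct real name
CREATE TABLE name_map AS
SELECT Name AS real_name, ROW_NUMBER() OVER (ORDER BY Name) AS id
FROM (SELECT DISTINCT Name FROM APMAST);
""")

# Every row with the same real name receives the same fake identity.
# Both subqueries see the row's original Name, since the right-hand
# sides are evaluated before any column is assigned.
conn.execute("""
UPDATE APMAST SET
  City = (SELECT f.city FROM name_map m JOIN fakenames f ON f.number = m.id
          WHERE m.real_name = APMAST.Name),
  Name = (SELECT f.company FROM name_map m JOIN fakenames f ON f.number = m.id
          WHERE m.real_name = APMAST.Name)
""")

rows = conn.execute("SELECT Name, City, Amount FROM APMAST ORDER BY Amount").fetchall()
print(rows)
```

Keeping the mapping in its own table means the original name is never overwritten before it has been used as the join key.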

How to increment a column based on two tables that are joined

I am trying to increment a column on a sql server table based on the join between the initial table and the joined table. The idea is to update tblForm10Objectives, set the ObjectiveNumber column to an increment number starting with 1 based on the number of rows returned from the join of tblForm10GoalsObjectives and tblForm10Objectives where ID_Form10Goal equals a number. Example query so far:
Update tblForm10Objectives
Set ObjectiveNumber = rn
From (
Select ROW_NUMBER() over (PARTITION by OG.ID_Form10Goal) as rn
, *
From (
Select *
From tblForm10GoalsObjectives OG
Join tblForm10Objectives O On OG.ID_Form10Objective = O.ID_Form10Objective
Where OG.ID_Form10Goal = 4
Order by O.ID_Form10Objective
) as tblForm10Objectives;
If the select portion of the query is performed the columns are displayed so you can see the ObjectiveNumber is currently 0 where ID_Form10Goal = 4
Once the update runs I need for the ObjectiveNumber to show 1 , 2; since there are two rows for ID_Form10Goal = 4.
I had to introduce a new table to the logic of this update statement; the table name is tblForm10Goals. The objectives need to be pulled by ID_Agency instead of ID_Form10Goal. I am getting an error message stating that the multi-part identifier "dbo.tblForm10Objectives.ID_Form10Objective = rns.ID_Form10Objective" could not be bound. I am using the following SQL UPDATE statement:
UPDATE dbo.tblForm10Objectives
SET ObjectiveNumber = rn
FROM tblForm10Goals As g
Left Join tblForm10GoalsObjectives gobs ON g.ID_Form10Goal = gobs.ID_Form10Goal
Right Join
(
SELECT
ROW_NUMBER() OVER (PARTITION BY g.ID_Agency
ORDER BY OB.ID_Form10Objective) AS rn,
OB.ID_Form10Objective
FROM tblForm10Goals g
LEFT JOIN dbo.tblForm10GoalsObjectives gobs ON g.ID_Form10Goal = gobs.ID_Form10Goal
RIGHT JOIN dbo.tblForm10Objectives OB ON gobs.ID_Form10Objective = OB.ID_Form10Objective
Where g.ID_Agency = 2
) rns ON dbo.tblForm10Objectives.ID_Form10Object = rns.ID_Form10Objective
Your example seems to be missing a closing parenthesis somewhere, and without the table structures to look at, I can't be certain of my answer. It seems you have two tables:
tblForm10Objectives
-------------------
ID_Form10Objective
ObjectiveNumber
...
and
tblForm10GoalsObjectives
------------------------
ID_Form10Goal
ID_Form10Objective
...
If this is the case, the following query should give you the results you desire:
UPDATE dbo.tblForm10Objectives
SET ObjectiveNumber = rn
FROM dbo.tblForm10Objectives INNER JOIN
(
SELECT
ROW_NUMBER() OVER (PARTITION BY OG.ID_Form10Goal
ORDER BY O.ID_Form10Objective) AS rn,
O.ID_Form10Objective
FROM dbo.tblForm10Objectives O INNER JOIN
dbo.tblForm10GoalsObjectives OG ON OG.ID_Form10Objective = O.ID_Form10Objective
Where OG.ID_Form10Goal = 4
) rns ON dbo.tblForm10Objectives.ID_Form10Objective = rns.ID_Form10Objective
If you run the inner SELECT statement, you will see the desired ObjectiveNumber values and the corresponding ID_Form10Objective that will get updated with those values.
If you post your table structures, I or someone else may be able to be of more help.
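
For anyone who wants to try the numbering logic without the real tables, here is a small sketch in Python's sqlite3 module (SQLite 3.25 or later for ROW_NUMBER()). The two tables are reduced to the columns guessed at above, and the sample rows are invented. It writes the update with a CTE and a correlated subquery instead of T-SQL's UPDATE ... FROM, so take it as an illustration of the technique only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblForm10Objectives (ID_Form10Objective INTEGER PRIMARY KEY,
                                  ObjectiveNumber INTEGER DEFAULT 0);
CREATE TABLE tblForm10GoalsObjectives (ID_Form10Goal INTEGER, ID_Form10Objective INTEGER);
-- invented sample data: goal 4 owns objectives 11 and 12
INSERT INTO tblForm10Objectives (ID_Form10Objective) VALUES (10),(11),(12);
INSERT INTO tblForm10GoalsObjectives VALUES (4,11),(4,12),(7,10);
""")

# Number the objectives of goal 4 as 1, 2, ... ordered by objective id.
conn.execute("""
WITH rns AS (
  SELECT O.ID_Form10Objective AS obj_id,
         ROW_NUMBER() OVER (PARTITION BY OG.ID_Form10Goal
                            ORDER BY O.ID_Form10Objective) AS rn
  FROM tblForm10Objectives O
  JOIN tblForm10GoalsObjectives OG ON OG.ID_Form10Objective = O.ID_Form10Objective
  WHERE OG.ID_Form10Goal = 4
)
UPDATE tblForm10Objectives
SET ObjectiveNumber = (SELECT rn FROM rns
                       WHERE rns.obj_id = tblForm10Objectives.ID_Form10Objective)
WHERE ID_Form10Objective IN (SELECT obj_id FROM rns)
""")

rows = conn.execute(
    "SELECT ID_Form10Objective, ObjectiveNumber FROM tblForm10Objectives "
    "ORDER BY ID_Form10Objective").fetchall()
print(rows)
```

Objective 10 belongs to a different goal, so it keeps its ObjectiveNumber of 0.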

Top Row within the Second Table

I'm having an issue with getting TOP to work within my SQL query. I only want to see the first row within the PART_AML table. I'm not having any luck trying to only query that table without querying the PART table.
How can I go about only showing the top row within the PART_AML table? I'm using Microsoft SQL.
Thank you for your help; it's greatly appreciated.
SELECT innovator.PART.STATE,
innovator.PART.NAME,
innovator.PART.ITEM_NUMBER,
innovator.PART.ID,
innovator.PART.TYPE,
innovator.MANUFACTURER_PART.SPEC_URL
FROM innovator.PART
join innovator.PART_AML
on innovator.PART_AML.SOURCE_ID = innovator.PART.ID
join innovator.MANUFACTURER_PART
on innovator.MANUFACTURER_PART.ID = innovator.PART_AML.RELATED_ID
WHERE
(innovator.PART.IS_CURRENT = 1) AND (innovator.PART_AML.IS_CURRENT = 1) AND (innovator.MANUFACTURER_PART.IS_CURRENT = 1)
Current Output
Number Name ID Type Spec
E000836 1k ID1 Resistor SPEC 1
E000836 1k ID1 Resistor SPEC 2
E000836 1k ID1 Resistor SPEC 3
E003455 14.400MHz ID2 Crystal SPEC 1
E003455 14.400MHz ID2 Crystal SPEC 2
E003455 14.400MHz ID2 Crystal SPEC 3
Preferred Output
Number Name ID Type Spec
E000836 1k ID1 Resistor SPEC 1
E003455 14.400MHz ID2 Crystal SPEC 1
You can make use of the ranking function ROW_NUMBER() OVER(PARTITION BY ... ORDER BY ...) to do this:
WITH CTE
AS
(
SELECT
i.STATE,
i.NAME,
i.ITEM_NUMBER,
i.ID,
i.TYPE,
p.SPEC_URL,
ROW_NUMBER() OVER(PARTITION BY i.ID
ORDER BY p.SPEC_URL) AS Rownum
FROM innovator.PART AS i
INNER JOIN innovator.PART_AML AS a on a.SOURCE_ID = i.ID
INNER JOIN innovator.MANUFACTURER_PART AS p on p.ID = a.RELATED_ID
WHERE i.IS_CURRENT = 1
AND a.IS_CURRENT = 1
AND p.IS_CURRENT = 1
)
SELECT *
FROM CTE
WHERE rownum = 1;
You can also use a simple GROUP BY clause with the MIN() function:
SELECT innovator.PART.STATE,
innovator.PART.NAME,
innovator.PART.ITEM_NUMBER,
innovator.PART.ID,
innovator.PART.TYPE,
MIN(innovator.MANUFACTURER_PART.SPEC_URL) AS SPEC_URL
FROM innovator.PART join innovator.PART_AML on innovator.PART_AML.SOURCE_ID = innovator.PART.ID
join innovator.MANUFACTURER_PART on innovator.MANUFACTURER_PART.ID = innovator.PART_AML.RELATED_ID
WHERE (innovator.PART.IS_CURRENT = 1) AND (innovator.PART_AML.IS_CURRENT = 1) AND (innovator.MANUFACTURER_PART.IS_CURRENT = 1)
GROUP BY innovator.PART.STATE, innovator.PART.NAME, innovator.PART.ITEM_NUMBER, innovator.PART.ID, innovator.PART.TYPE
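
Here is a runnable sketch of the ROW_NUMBER() approach, using Python's sqlite3 module (SQLite 3.25 or later) instead of SQL Server. The innovator schema prefix is dropped, the tables are cut down to a few columns, and the rows (including the M1..M3 manufacturer part ids) are invented to mirror the sample output in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PART (ID TEXT PRIMARY KEY, ITEM_NUMBER TEXT, NAME TEXT);
CREATE TABLE MANUFACTURER_PART (ID TEXT PRIMARY KEY, SPEC_URL TEXT);
CREATE TABLE PART_AML (SOURCE_ID TEXT, RELATED_ID TEXT);
-- invented sample data: each part links to three manufacturer parts
INSERT INTO PART VALUES ('ID1','E000836','1k'),('ID2','E003455','14.400MHz');
INSERT INTO MANUFACTURER_PART VALUES ('M1','SPEC 1'),('M2','SPEC 2'),('M3','SPEC 3');
INSERT INTO PART_AML VALUES ('ID1','M1'),('ID1','M2'),('ID1','M3'),
                            ('ID2','M1'),('ID2','M2'),('ID2','M3');
""")

# Rank the joined rows within each part, then keep only rank 1.
rows = conn.execute("""
WITH ranked AS (
  SELECT p.ITEM_NUMBER, p.NAME, p.ID, m.SPEC_URL,
         ROW_NUMBER() OVER (PARTITION BY p.ID ORDER BY m.SPEC_URL) AS rownum
  FROM PART p
  JOIN PART_AML a ON a.SOURCE_ID = p.ID
  JOIN MANUFACTURER_PART m ON m.ID = a.RELATED_ID
)
SELECT ITEM_NUMBER, NAME, ID, SPEC_URL
FROM ranked
WHERE rownum = 1
ORDER BY ITEM_NUMBER
""").fetchall()
print(rows)
```

The ORDER BY inside the window decides which row counts as "first", so pick the column that matches what "top row" means for your data.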

Selective update in SQL Server

I've created a junction table like this one:
http://imageshack.us/scaled/landing/822/kantotype.png
I was trying to figure out a query that could select some rows based on the PokémonID and then update only the first or second row after that initial filtering.
For example:
Let's suppose that I would like to change the value of the TypeID from the second row containing PokémonID = 2. I cannot simply use UPDATE KantoType SET TypeID = x WHERE PokémonID = 2, because it will change both rows!
I've already tried to use subqueries containing IN,EXISTS and LIMIT, but with no success.
It's unclear what you are trying to do. However, you can UPDATE with a JOIN like so:
UPDATE k1
SET k1.TypeID = 'something' -- or some value from k2
FROM KantoType k1
INNER JOIN
(
-- some filtering and selecting
) k2 ON k1.PokémonID = k2.PokémonID
WHERE k1.PokémonID = 2;
Or, if you want to UPDATE only one of the two rows that have PokémonID = 2, you can do this:
WITH CTE
AS
(
SELECT *,
ROW_NUMBER() OVER(ORDER BY TypeID) rownum
FROM KantoType
WHERE PokemonID = 2
)
UPDATE c
SET c.TypeID = 5
FROM CTE c
WHERE c.rownum = 1;
I can suggest something like this if you just need to update a single line in your table:
UPDATE kantotype
SET
type = 2
WHERE pokemon = 2
AND NOT EXISTS (SELECT * FROM kantotype k2
WHERE kantotype.type > k2.type
AND kantotype.pokemon = k2.pokemon)
It would be easier to get the first or last item of the table if you had unique identifier field in your table.
I'm not even sure whether you are trying to update the row with PokémonID = 2 by filtering on TypeID. So, purely as an assumption (a big one), you can give CASE a try:
UPDATE yourtable a
LEFT JOIN yourtable b on a.pokeid = b.pokeid
SET a.typeid = (CASE
WHEN a.typeid < b.typeid THEN yourupdatevalue
WHEN a.typeid > b.typeid THEN someothervalue
ELSE a.typeid END);
If you know the pokemon ID and the type id then just add both to the where clause of your query.
UPDATE KantoType
SET TypeID = x
WHERE PokémonID = 2
AND TypeID=1
If you don't know the type ID, then you need to provide more information about what you're trying to accomplish. It's not clear why you don't have this information.
Perhaps think about what is the unique identifier in your data set.
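
To make the "update just one of several matching rows" idea concrete, here is a sketch in Python's sqlite3 module (SQLite 3.25 or later). It assumes a two-column KantoType table with an ASCII PokemonID column and invented rows, and it leans on SQLite's implicit rowid as the unique identifier, which SQL Server does not have. There you would update through the CTE as shown above, or add a real key column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE KantoType (PokemonID INTEGER, TypeID INTEGER);
-- invented sample data: Pokemon 2 has two types, 12 and 4
INSERT INTO KantoType VALUES (1,12),(2,12),(2,4),(3,12);
""")

# Rank the rows of Pokemon 2 by TypeID and update only the second one.
conn.execute("""
UPDATE KantoType
SET TypeID = 5
WHERE rowid = (
  SELECT rid FROM (
    SELECT rowid AS rid,
           ROW_NUMBER() OVER (ORDER BY TypeID) AS rownum
    FROM KantoType
    WHERE PokemonID = 2
  )
  WHERE rownum = 2
)
""")

rows = conn.execute(
    "SELECT PokemonID, TypeID FROM KantoType ORDER BY PokemonID, TypeID").fetchall()
print(rows)
```

Only the row that had TypeID 12 for Pokemon 2 changes; the row with TypeID 4 and the rows for other Pokemon are untouched.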

What is the correct pattern for working without deferred constraints?

I am using databases that aren't Oracle or Postgresql, which means I don't have access to deferred constraints, which means that constraints must be valid at all times (instead of just on commit).
Let's say I'm storing a linked list type structure in a database like so:
id parentId
---------------
1 null
2 1
3 2
4 3
5 4
6 5
parentId is a foreign key reference to id, and is required to be unique via a constraint.
Let's say I wanted to move item 5 to sit just before item 1, so our DB would look like this:
id parentId
---------------
1 null
2 5 <-- different
3 2
4 3
5 1 <-- different
6 4 <-- different
Three rows need to be altered, which is three update statements. Any one of these update statements will cause a constraint violation: all three statements must be complete before the constraint would be valid again.
My question is: what is the best way of not violating the uniqueness constraint?
I can currently conceive of two different solutions, neither of which I like:
Set each affected parentId to null and then perform the three updates
Completely change my data model so it's more of a 'copy on write' style versioned database, where these sorts of issues are not a problem.
You can do this in a single query. I'm sure there are many variations of this, but here is what I would use...
DECLARE
@node_id INT,
@new_parent_id INT
SELECT
@node_id = 5,
@new_parent_id = 1
UPDATE
yourTable
SET
parent_id = CASE WHEN yourTable.id = target_node.id THEN new_antiscendant.id
WHEN yourTable.id = descendant.id THEN target_node.parent_id
WHEN yourTable.id = new_descendant.id THEN target_node.id
END
FROM
yourTable AS target_node
LEFT JOIN
yourTable AS descendant
ON descendant.parent_id = target_node.id
LEFT JOIN
yourTable AS new_antiscendant
ON new_antiscendant.id = @new_parent_id
LEFT JOIN
yourTable AS new_descendant
ON COALESCE(new_descendant.parent_id, -1) = COALESCE(new_antiscendant.id, -1)
INNER JOIN
yourTable
ON yourTable.id IN (target_node.id, descendant.id, new_descendant.id)
WHERE
target_node.id = @node_id
This will work even if @new_parent_id is NULL or is the last record in the list.
MySQL doesn't like self joins in updates, so the approach would probably be to do the LEFT JOINs into a temporary table to get the new mapping, then join on that table to update all three records in a single query.
INSERT INTO
yourTempTable
SELECT
yourTable.id AS node_id,
CASE WHEN yourTable.id = target_node.id THEN new_antiscendant.id
WHEN yourTable.id = descendant.id THEN target_node.parent_id
WHEN yourTable.id = new_descendant.id THEN target_node.id
END AS new_parent_id
FROM
yourTable AS target_node
LEFT JOIN
yourTable AS descendant
ON descendant.parent_id = target_node.id
LEFT JOIN
yourTable AS new_antiscendant
ON new_antiscendant.id = @new_parent_id
LEFT JOIN
yourTable AS new_descendant
ON COALESCE(new_descendant.parent_id, -1) = COALESCE(new_antiscendant.id, -1)
INNER JOIN
yourTable
ON yourTable.id IN (target_node.id, descendant.id, new_descendant.id)
WHERE
target_node.id = @node_id
UPDATE
yourTable
SET
parent_id = yourTempTable.new_parent_id
FROM
yourTable
INNER JOIN
yourTempTable
ON yourTempTable.node_id = yourTable.id
(The exact syntax depends on your RDBMS.)
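
For comparison, the first option from the question (null out the affected parentId values, then write the new ones) can be sketched as below with Python's sqlite3 module. The table name nodes and the new_parents dictionary are made up for the example. Note the assumption it depends on: the UNIQUE constraint must allow several NULLs at once. SQLite does, but SQL Server's unique constraints allow only one NULL, so check your engine before relying on this.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nodes (id INTEGER PRIMARY KEY,
                    parentId INTEGER UNIQUE REFERENCES nodes(id));
INSERT INTO nodes VALUES (1,NULL),(2,1),(3,2),(4,3),(5,4),(6,5);
""")

# Hypothetical target state for moving node 5 directly under node 1:
# node id -> new parentId
new_parents = {5: 1, 2: 5, 6: 4}

with conn:  # one transaction: commit on success, roll back on any error
    # Step 1: clear the affected links so no two rows can collide.
    conn.executemany("UPDATE nodes SET parentId = NULL WHERE id = ?",
                     [(node_id,) for node_id in new_parents])
    # Step 2: write the new links; each value is unique at every step.
    conn.executemany("UPDATE nodes SET parentId = ? WHERE id = ?",
                     [(parent, node_id) for node_id, parent in new_parents.items()])

rows = conn.execute("SELECT id, parentId FROM nodes ORDER BY id").fetchall()
print(rows)
```

Wrapping both steps in one transaction means other sessions never observe the intermediate state with the links cleared.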