Need to duplicate a row and its related data in other tables. Revising a row - sql

My company has a database with Project related data. At times, they would like to Revise a project, keeping the old version and copying it so they can work on a copied version. The project table has a revision field that defaults to 0 and should increment by one when they click a revise button on the front-end website. The hierarchy would look like:
Project(ProjectID)
Project_Details: (ID) | (ProjectID)
Activities: (ID) | (ProjectID)
Activity_Details: (ID) | (ActivitiesID)
ProjectID links all my tables together. I have an Activities table that will contain the activities for a project, so one-to-many. The Activities table links to its detail table (Activity_Details) by ActivityID.
What I have so far, just to test things out:
INSERT INTO Project SELECT projectnumber, MAX(Revision)+1 FROM Project WHERE projectnumber = '23.444.555'
SELECT @@IDENTITY
INSERT INTO Project_Details SELECT @@IDENTITY, Rate, Department FROM Project_Details WHERE projectid = @projectid
INSERT INTO Activities SELECT @@IDENTITY, Area_No, Completed_Date FROM Activities WHERE projectid = @projectid
This is where I am not sure what to do next. I need to copy all the rows from the Activity_Details table that relate to my Activities table by ActivityID. However, there are multiple rows in my Activities table with the same ProjectID.
So it looks something like: for each row in Activities with ProjectID = @projectid, get the ActivityID of that row and copy all rows in Activity_Details with that ActivityID.
How do I accomplish that?

No need for a loop. What you need is a mapping between the 'old' and 'new' Activities records, which you then use to create the Activity_Details rows with the correct ActivityID.
If you can add another field to Activities that stores the ActivityID the record was copied from, you can use it in the join when inserting into Activity_Details:
INSERT INTO Activities (ProjectID, Area_No, Completed_Date, Last_ActivityID)
SELECT @newprojectid, Area_No, Completed_Date, ActivityID
FROM Activities
WHERE projectid = @projectid

INSERT INTO Activity_Details (ActivityID, Details)
SELECT Activities.ActivityID, Details
FROM Activity_Details
INNER JOIN Activities ON Activity_Details.ActivityID = Activities.Last_ActivityID
WHERE Activities.projectid = @newprojectid
If you cannot (or don't want to) add that field, you will have to rely on a MERGE statement to get the mapping. Quite a bit trickier, but still doable. Probably best left to a different answer, if desired.
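For reference, a minimal sketch of that MERGE approach, assuming the @projectid/@newprojectid variables and the column names used above (the ON 1 = 0 condition pushes every source row into the NOT MATCHED branch, and unlike a plain INSERT, the OUTPUT clause of a MERGE may reference source columns, which is what yields the old-to-new mapping):
-- Capture each old ActivityID alongside the newly generated one
DECLARE @ActivityMap TABLE (OldActivityID int, NewActivityID int);

MERGE Activities AS tgt
USING (SELECT ActivityID, Area_No, Completed_Date
       FROM Activities
       WHERE projectid = @projectid) AS src
ON 1 = 0
WHEN NOT MATCHED THEN
    INSERT (ProjectID, Area_No, Completed_Date)
    VALUES (@newprojectid, src.Area_No, src.Completed_Date)
OUTPUT src.ActivityID, INSERTED.ActivityID INTO @ActivityMap (OldActivityID, NewActivityID);

-- Copy the detail rows, re-pointed at the new activities
INSERT INTO Activity_Details (ActivityID, Details)
SELECT m.NewActivityID, d.Details
FROM Activity_Details AS d
INNER JOIN @ActivityMap AS m ON d.ActivityID = m.OldActivityID;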

Best approach to populate new tables in a database

I have a problem I have been working on for the past several hours. It is complex (for me) and I don't expect someone to do it for me. I just need the right direction.
Problem: We had the tables (below) added to our database, and I need to update them based on data already in our DailyCosts table. The tricky part is that I need to take DailyCosts.Notes and move it to PurchaseOrder.PoNumber. Notes is where we currently have the PO numbers.
I started with the Insert below, testing it out on one WellID. This is Inserting records from our DailyCosts table to the new PurchaseOrder table:
Insert Into PurchaseOrder (PoNumber, WellID, JobID, ID)
Select Distinct Cast(Notes As nvarchar(20)), WellID, JobID, DailyCosts.DailyCostID
From DailyCosts
Where WellID = '24A-23'
It affected 1973 rows. (The Notes column is ntext.)
However, I need to update the other new tables because we need to see the actual PONumbers in the application.
This next insert takes records from our DailyCosts table and the new PurchaseOrder table (from above) and inserts them into a new table called PurchaseOrderDailyCost:
Insert Into PurchaseOrderDailyCost (WellID, JobID, ReportNo, AccountCode, PurchaseOrderID, ID, DailyCostSeqNo, DailyCostID)
Select Distinct DailyCosts.WellID, DailyCosts.JobID, DailyCosts.ReportNo, DailyCosts.AccountCode,
       PurchaseOrder.ID, NEWID(), 0, DailyCosts.DailyCostID
From DailyCosts
Join PurchaseOrder ON DailyCosts.WellID = PurchaseOrder.WellID
Where DailyCosts.WellID = '24A-23'
Unfortunately, this produces 3,892,729 records. The Notes field contains the same list of PO numbers each day. This is by design, so that the people inputting the data out in the field can easily track their PO numbers. The new PONumber column that we are moving the Notes to would store just the unique PO numbers. I modified the query by replacing NEWID() with DailyCostID and changing the join to ON DailyCosts.DailyCostID = PurchaseOrder.ID.
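In other words, the modified second insert ended up roughly like this (a sketch reconstructed from that description, reusing the column names from the original query):
Insert Into PurchaseOrderDailyCost (WellID, JobID, ReportNo, AccountCode, PurchaseOrderID, ID, DailyCostSeqNo, DailyCostID)
Select Distinct DailyCosts.WellID, DailyCosts.JobID, DailyCosts.ReportNo, DailyCosts.AccountCode,
       PurchaseOrder.ID, DailyCosts.DailyCostID, 0, DailyCosts.DailyCostID
From DailyCosts
Join PurchaseOrder ON DailyCosts.DailyCostID = PurchaseOrder.ID
Where DailyCosts.WellID = '24A-23'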
This affected 1973 rows the same as the first Insert.
The next Insert looks like this:
Insert Into PurchaseOrderAccount (WellID, JobID, PurchaseOrderID, ID, AccountCode)
Select PurchaseOrder.WellID, PurchaseOrder.JobID, PurchaseOrder.ID, PurchaseOrderDailyCost.DailyCostID,PurchaseOrderDailyCost.AccountCode
From PurchaseOrder Inner Join
PurchaseOrderDailyCost ON PurchaseOrder.ID = PurchaseOrderDailyCost.DailyCostID
Where PurchaseOrder.WellID = '24A-23'
The page in the application now shows the PONumbers in the correct column. Everything looks like I want it to.
Unfortunately, it slows down the application to an unacceptable level. I need to figure out how to either modify my Insert or delete duplicate records. The problem is that there are multiple foreign key constraints. I have some more information below for reference.
This shows the application after the inserts. These are all duplicate records that I am hoping to eliminate.
Here is some additional information I received from the vendor about the tables:
-- add a new purchase order
INSERT INTO PurchaseOrder
(WellID, JobID, ID, PONumber, Amount, Description)
VALUES ('MyWell', 'MyJob', NEWID(), 'PO444444', 500.0, 'A new Purchase Order')
-- link a purchase order with id 'A356FBF4-A19B-4466-9E5C-20C5FD0E95C3' to a DailyCost record with SeqNo 0 and AccountCode 'MyAccount'
INSERT INTO PurchaseOrderDailyCost
(WellID, JobID, ReportNo, AccountCode, DailyCostSeqNo, PurchaseOrderID, ID)
VALUES ('MyWell', 'MyJob', 4, 'MyAccount', 0, 'A356FBF4-A19B-4466-9E5C-20C5FD0E95C3', NEWID())
-- link a purchase order with id 'A356FBF4-A19B-4466-9E5C-20C5FD0E95C3' to an account code 'MyAccount'
-- (i.e. make it choosable from the DailyCost PO-column dropdown for any DailyCost record whose account code is 'MyAccount')
INSERT INTO PurchaseOrderAccount
(WellID, JobID, PurchaseOrderID, ID, AccountCode)
VALUES ('MyWell', 'MyJob', 'A356FBF4-A19B-4466-9E5C-20C5FD0E95C3', NEWID(), 'MyAccount')
-- link a purchase order with id 'A356FBF4-A19B-4466-9E5C-20C5FD0E95C3' to an AFE No. 'MyAFENo'
-- (same behavior as with the account codes above)
INSERT INTO PurchaseOrderAFE
(WellID, JobID, PurchaseOrderID, ID, AFENo)
VALUES ('MyWell', 'MyJob', 'A356FBF4-A19B-4466-9E5C-20C5FD0E95C3', NEWID(), 'MyAFENo')
So it turns out I missed some simple joining principles. The better I get, the more silly mistakes I seem to make. Basically, on my very first insert, I did not include a GROUP BY. Adding this took my insert from 1973 rows down to 93. Then on my next insert, I joined DailyCosts.Notes to PurchaseOrder.PoNumber, since those are the only records from DailyCosts I needed. This was previously INSERT 2 in my question. From there, basically everything came together. Two steps forward and one step back. Thanks to everyone that responded to this.
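For reference, the corrected statements would look roughly like this (a sketch based on the description above rather than the exact SQL; choosing MIN(DailyCostID) as the representative ID and the CASTs around the ntext Notes column are my assumptions):
-- Insert 1: one PurchaseOrder row per distinct PO number (GROUP BY instead of DISTINCT)
Insert Into PurchaseOrder (PoNumber, WellID, JobID, ID)
Select Cast(Notes As nvarchar(20)), WellID, JobID, MIN(DailyCostID)
From DailyCosts
Where WellID = '24A-23'
Group By Cast(Notes As nvarchar(20)), WellID, JobID

-- Insert 2: join DailyCosts back to PurchaseOrder on the PO number instead of the WellID
Insert Into PurchaseOrderDailyCost (WellID, JobID, ReportNo, AccountCode, PurchaseOrderID, ID, DailyCostSeqNo, DailyCostID)
Select DailyCosts.WellID, DailyCosts.JobID, DailyCosts.ReportNo, DailyCosts.AccountCode,
       PurchaseOrder.ID, NEWID(), 0, DailyCosts.DailyCostID
From DailyCosts
Join PurchaseOrder ON Cast(DailyCosts.Notes As nvarchar(20)) = PurchaseOrder.PoNumber
Where DailyCosts.WellID = '24A-23'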

Recommended way to deal with updating m2m table postgres

I have the below tables
A project table
project_id,project_name
A skill table
skill_id,skill_name
A project_skill table (many to many relationship)
project_skill_id,project_id,skill_id
The browser will have a form which asks the user to enter a project name, with an SO-style autocomplete for tags. I'm sending the JSON below back to SQL for insertion:
{"project_name":"foo","skills":["bar","baz"]}
My question relates to a situation where the user gets to edit an existing project. Assume the user removes "baz" from the skills and adds "zed" and "biz". How do I properly deal with updating the many-to-many table?
{"project_name":"foo","skills":["bar","zed","biz"]}
Do I remove all records from the m2m table and do a fresh insert with the new skills?
remove all records based on project_id
insert new records for bar, zed, biz
Or do I check on the server what was removed/added and change only what actually changed?
remove baz from the table
add zed and biz
This also pertains to modifying project_name, etc. Do I check what was modified and update only what's necessary, or perform a complete delete and insert?
I'd use a CTE with a MERGE (note this is SQL Server but Postgres should be similar):
;WITH src AS
(
SELECT p.project_id, s.skill_id
FROM
dbo.project AS p
INNER JOIN @input AS i ON p.project_name = i.project_name
INNER JOIN dbo.skill AS s ON i.skill_name = s.skill_name
)
MERGE INTO dbo.project_skill AS tgt
USING src
ON tgt.project_id = src.project_id AND tgt.skill_id = src.skill_id
WHEN NOT MATCHED BY TARGET THEN
INSERT (project_id, skill_id) VALUES (src.project_id, src.skill_id)
WHEN NOT MATCHED BY SOURCE THEN
DELETE;
where @input contains the new values:
DECLARE @input TABLE
(
project_name VARCHAR(100),
skill_name VARCHAR(100)
);
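For an actual Postgres version of the same idea: before Postgres 15 there is no MERGE, but the delete-what's-gone / insert-what's-new steps can be combined into one statement with data-modifying CTEs. This is a sketch that assumes a unique constraint on project_skill (project_id, skill_id); :project_name and :skill_names (a text array) are placeholder parameters:
WITH src AS (
    SELECT p.project_id, s.skill_id
    FROM project AS p
    JOIN skill AS s ON s.skill_name = ANY(:skill_names)
    WHERE p.project_name = :project_name
),
removed AS (
    -- drop links that are not in the incoming skill list
    DELETE FROM project_skill ps
    USING project p
    WHERE ps.project_id = p.project_id
      AND p.project_name = :project_name
      AND ps.skill_id NOT IN (SELECT skill_id FROM src)
)
-- add the new links, skipping those already present
INSERT INTO project_skill (project_id, skill_id)
SELECT project_id, skill_id FROM src
ON CONFLICT (project_id, skill_id) DO NOTHING;
Because it is a single statement, the delete and the insert are applied atomically without an explicit transaction.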

Performing multiple inserts for a single row in a query

I have a table containing data that I need to migrate into another table, with a linking table. This is a one-time migration as part of an upgrade.
I have a company table that contains records relating to a company and a contact person.
I want to migrate the contact details into another table and link the new person with a linking table
Consider I have this table which is already populated
tblCompany
CompanyId
CompanyName
RegNo
ContactForename
ContactSurname
And I want to migrate the contact person data to
tblPerson
PersonID (identitycolumn)
Forename
Surname
and use the resulting identity value to insert into the linking table
tblCompanyPerson
CompanyId
PersonId
I've tried a few different ways to approach this using cursors and output variables into a temp table but none seem right to me (or give me the solution...)
The closest I have got is to add a CompanyID column to tblPerson, insert the CompanyID into it, and output the new PersonID and the CompanyID into a temp table, then loop through the temp table to create the tblCompanyPerson rows.
example
declare @companycontact TABLE (companyId int, PersonId int)

insert into tblPerson
    (Forename,
    Surname,
    CompanyID)
output inserted.CompanyID, INSERTED.PersonID into @companycontact
select
    ContactForename,
    ContactSurname,
    CompanyID
from tblCompany c

insert into tblCompanyPerson
    (CompanyID,
    PersonID)
select c.companyId, c.PersonId from @companycontact c
Background
I'm using MS SQL Server 2008 R2.
The tblPerson table is already populated with hundreds of thousands of records.
There is a 'trick' using the MERGE statement to get a mapping between newly inserted and source values:
MERGE tblPerson trgt
USING tblCompany src ON 1=0
WHEN NOT MATCHED
THEN INSERT
(Forename, Surname)
VALUES (src.ContactForename, src.ContactSurname)
OUTPUT src.CompanyID, INSERTED.PersonID
INTO tblCompanyPerson (CompanyId, PersonID);
The 1=0 condition is there so that every row from the source falls into the NOT MATCHED branch. You might want to replace it, or even the whole source, with a sub-query that checks whether you already have the same person mapped.
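For example, a hedged variation of the same MERGE that only creates persons for companies not yet present in tblCompanyPerson (the NOT EXISTS check is my assumption about what "already mapped" means here):
MERGE tblPerson trgt
USING (SELECT c.CompanyID, c.ContactForename, c.ContactSurname
       FROM tblCompany c
       WHERE NOT EXISTS (SELECT 1 FROM tblCompanyPerson cp
                         WHERE cp.CompanyId = c.CompanyID)) AS src
ON 1=0
WHEN NOT MATCHED
THEN INSERT (Forename, Surname)
VALUES (src.ContactForename, src.ContactSurname)
OUTPUT src.CompanyID, INSERTED.PersonID
INTO tblCompanyPerson (CompanyId, PersonID);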
EDIT: Here is some reading about using MERGE and OUTPUT
Because I don't know what SQL you are using, it's difficult to decide if this is correct. I also don't know if you have already tried this, but it's the best idea I have:
insert into tblPerson
(Forename, Surname)
select ContactForename, ContactSurname
from tblCompany

insert into tblCompanyPerson
(CompanyID, PersonID)
select CompanyId, PersonID
from tblPerson, tblCompany
where ContactForename = Forename and ContactSurname = Surname
Sarajog

Delete duplicates with no primary key

Here we want to delete rows that have a duplicated value in one column (Product), which will then be used as the primary key.
The column is of type nvarchar and we don't want to have 2 rows for one product.
The database is a large one, with thousands of rows we need to remove.
During the query for all the duplicates, we want to keep the first item and remove the second one as the duplicate.
There is no primary key yet, and we want to create it after this activity of removing the duplicates.
Then the Product column could be our primary key.
The database is SQL Server CE.
I tried several methods, mostly getting errors similar to:
There was an error parsing the query. [ Token line number = 2,Token line offset = 1,Token in error = FROM ]
A method which I tried:
DELETE FROM TblProducts
FROM TblProducts w
INNER JOIN (
SELECT Product
FROM TblProducts
GROUP BY Product
HAVING COUNT(*) > 1
)Dup ON w.Product = Dup.Product
The preferred way, since I am trying to learn, would be to adjust my code to something similar to this
(it's not correct yet):
SELECT Product, COUNT(*) TotalCount
FROM TblProducts
GROUP BY Product
HAVING COUNT(*) > 1
ORDER BY COUNT(*) DESC
--
;WITH cte -- These 3 lines are the ones I have the most doubt about
AS (SELECT ROW_NUMBER() OVER (PARTITION BY Product
ORDER BY ( SELECT 0)) RN
FROM TblProducts)
DELETE FROM cte
WHERE RN > 1
If you have two DIFFERENT records with the same Product column, then you can SELECT the unwanted records with some criterion, e.g.
CREATE TABLE victims AS
SELECT MAX(entryDate) AS date, Product, COUNT(*) AS dups FROM ProductsTable WHERE ...
GROUP BY Product HAVING dups > 1;
Then you can do a DELETE JOIN between ProductTable and Victims.
Or also you can select Product only, and then do a DELETE for some other JOIN condition, for example having an invalid CustomerId, or EntryDate NULL, or anything else. This works if you know that there is one and only one valid copy of Product, and all the others are recognizable by the invalid data.
Suppose you instead have IDENTICAL records (or you have both identical and non-identical, or you may have several dupes for some product and you don't know which). You run exactly the same query. Then, you run a SELECT query on ProductsTable and SELECT DISTINCT all products matching the product codes to be deduped, grouping by Product, and choosing a suitable aggregate function for all fields (if identical, any aggregate should do. Otherwise I usually try for MAX or MIN). This will "save" exactly one row for each product.
At that point you run the DELETE JOIN and kill all the duplicated products. Then, simply reimport the saved and deduped subset into the main table.
Of course, between the DELETE JOIN and the INSERT SELECT you will have the DB in an unstable state, with every product that had at least one duplicate simply missing.
Another way which should work in MySQL:
-- Create an empty table
CREATE TABLE deduped AS SELECT * FROM ProductsTable WHERE false;
CREATE UNIQUE INDEX deduped_ndx ON deduped(Product);
-- DROP duplicate rows, Joe the Butcher's way
INSERT IGNORE INTO deduped SELECT * FROM ProductsTable;
ALTER TABLE ProductsTable RENAME TO ProductsBackup;
ALTER TABLE deduped RENAME TO ProductsTable;
-- TODO: Copy all indexes from ProductsTable on deduped.
NOTE: the way above DOES NOT WORK if you want to distinguish "good records" and "invalid duplicates". It only works if you have redundant DUPLICATE records, or if you do not care which row you keep and which you throw away!
EDIT:
You say that "duplicates" have invalid fields. In that case you can modify the above with a sorting trick:
SELECT * FROM ProductsTable ORDER BY Product, FieldWhichShouldNotBeNULL IS NULL;
Then if you have only one row for a product, all well and good: it will get selected. If you have more, the one for which (FieldWhichShouldNotBeNULL IS NULL) is FALSE (i.e. the one where FieldWhichShouldNotBeNULL is actually not null, as it should be) will be sorted first and inserted. All the others will bounce silently, thanks to the IGNORE clause, against the uniqueness of Product. Not a really pretty way to do it (and check that I didn't mix true with false in my clause!), but it ought to work.
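Putting the sorting trick together with the INSERT IGNORE from above gives roughly this (a sketch; FieldWhichShouldNotBeNULL stands for whatever column tells a good row from a bad one):
-- Good rows (field not NULL) sort first and win the unique index on Product;
-- the NULL duplicates that follow bounce off silently thanks to IGNORE.
INSERT IGNORE INTO deduped
SELECT * FROM ProductsTable
ORDER BY Product, FieldWhichShouldNotBeNULL IS NULL;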
EDIT
actually more of a new answer
This is a simple table to illustrate the problem
CREATE TABLE ProductTable ( Product varchar(10), Description varchar(10) );
INSERT INTO ProductTable VALUES ( 'CBPD10', 'C-Beam Prj' );
INSERT INTO ProductTable VALUES ( 'CBPD11', 'C Proj Mk2' );
INSERT INTO ProductTable VALUES ( 'CBPD12', 'C Proj Mk3' );
There is no index yet, and no primary key. We could still declare Product to be primary key.
But something bad happens. Two new records get in, and both have NULL description.
Yet, the second one is a valid product since we knew nothing of CBPD14 before now, and therefore we do NOT want to lose this record completely. We do want to get rid of the spurious CBPD10 though.
INSERT INTO ProductTable VALUES ( 'CBPD10', NULL );
INSERT INTO ProductTable VALUES ( 'CBPD14', NULL );
A rude DELETE FROM ProductTable WHERE Description IS NULL is out of the question, it would kill CBPD14 which isn't a duplicate.
So we do it like this. First get the list of duplicates:
SELECT Product, COUNT(*) AS Dups FROM ProductTable GROUP BY Product HAVING Dups > 1;
We assume that: "There is at least one good record for every set of bad records".
We check this assumption by positing the opposite and querying for it. If all is copacetic we expect this query to return nothing.
SELECT Dups.Product FROM ProductTable
RIGHT JOIN ( SELECT Product, COUNT(*) AS Dups FROM ProductTable GROUP BY Product HAVING Dups > 1 ) AS Dups
ON (ProductTable.Product = Dups.Product
AND ProductTable.Description IS NOT NULL)
WHERE ProductTable.Description IS NULL;
To further verify, I insert two records that represent this mode of failure; now I do expect the query above to return the new code.
INSERT INTO ProductTable VALUES ( "AC5", NULL ), ( "AC5", NULL );
Now the "check" query indeed returns,
AC5
So, the generation of Dups looks good.
I proceed now to delete all duplicate records that are not valid. If there are duplicate, valid records, they will stay duplicate unless some condition may be found, distinguishing among them one "good" record and declaring all others "invalid" (maybe repeating the procedure with a different field than Description).
But ay, there's a rub. Currently, you cannot delete from a table and select from the same table in a subquery ( http://dev.mysql.com/doc/refman/5.0/en/delete.html ). So a little workaround is needed:
CREATE TEMPORARY TABLE Dups AS
SELECT Product, COUNT(*) AS Duplicates
FROM ProductTable GROUP BY Product HAVING Duplicates > 1;
DELETE ProductTable FROM ProductTable JOIN Dups USING (Product)
WHERE Description IS NULL;
Now this will delete all invalid records, provided that they appear in the Dups table.
Therefore our CBPD14 record will be left untouched, because it does not appear there. The "good" record for CBPD10 will be left untouched because it's not true that its Description is NULL. All the others - poof.
Let me state again that if a product has no valid records and yet it is duplicated, then all copies of that record will be killed - there will be no survivors.
To avoid this you may first SELECT (using the check query above, the one "which should return nothing") the rows representing this mode of failure into another TEMPORARY TABLE, then INSERT them back into the main table after the deletion (using transactions might be in order).
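A sketch of that safety net (the Survivors table name and the MAX() aggregate are mine; the HAVING clause mirrors the check query above, i.e. it finds duplicated products that have no valid row at all):
-- Park one representative row per product that has ONLY invalid (NULL-description) rows
CREATE TEMPORARY TABLE Survivors AS
SELECT Product, MAX(Description) AS Description
FROM ProductTable
WHERE Product IN (SELECT Product
                  FROM ProductTable
                  GROUP BY Product
                  HAVING COUNT(*) > 1
                     AND SUM(Description IS NOT NULL) = 0)
GROUP BY Product;

-- ... run the DELETE ... JOIN Dups ... from above here ...

-- Restore the parked rows
INSERT INTO ProductTable (Product, Description)
SELECT Product, Description FROM Survivors;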
Create a new table by scripting out the old one and renaming it. Also script all objects (indexes etc.) from the old table onto the new one. Insert the keepers into the new table. If your database is in the bulk-logged or simple recovery model, this operation will be minimally logged. Drop the old table and then rename the new one to the old name.
The advantage of this over a delete will be that the insert can be minimally logged. Deletes do double work because not only does the data get deleted, but the delete has to be written to the transaction log. For big tables, minimally logged inserts will be much faster than deletes.
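A rough outline of that approach on full SQL Server (a sketch; the TblProducts_New name, the Description stand-in column, and MAX() as the keep-one rule are placeholders for whatever the real schema needs):
-- 1. Script out TblProducts and create an identical, empty copy
--    (including indexes and constraints), e.g. TblProducts_New.

-- 2. Copy the keepers; with TABLOCK and the simple/bulk-logged recovery model
--    this insert can be minimally logged.
INSERT INTO TblProducts_New WITH (TABLOCK) (Product, Description)
SELECT Product, MAX(Description)
FROM TblProducts
GROUP BY Product;

-- 3. Swap the tables.
DROP TABLE TblProducts;
EXEC sp_rename 'TblProducts_New', 'TblProducts';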
If it's not that big, you have some downtime, and you have SQL Server Management Studio, you can put an identity field on the table using the GUI. Now you have a situation like your CTE, except the rows themselves are truly distinct. So now you can do the following:
SELECT MIN(lhs.MyTempIDField)
FROM
table_a lhs
join table_a rhs
on lhs.field1 = rhs.field1
and lhs.field2 = rhs.field2 [etc]
WHERE
lhs.MyTempIDField <> rhs.MyTempIDField
GROUP BY
lhs.field1, lhs.field2 [etc]
This gives you, for each set of duplicates, the ID of the row to keep. Now you can wrap this query inside a DELETE FROM query.
DELETE FROM lhs
FROM table_a lhs
join table_a rhs
on lhs.field1 = rhs.field1
and lhs.field2 = rhs.field2 [etc]
WHERE
lhs.MyTempIDField <> rhs.MyTempIDField
and lhs.MyTempIDField not in (
SELECT MIN(lhs.MyTempIDField)
FROM
table_a lhs
join table_a rhs
on lhs.field1 = rhs.field1
and lhs.field2 = rhs.field2 [etc]
WHERE
lhs.MyTempIDField <> rhs.MyTempIDField
GROUP BY
lhs.field1, lhs.field2 [etc]
)
Try this:
DELETE FROM TblProducts
WHERE Product IN
(
SELECT Product
FROM TblProducts
GROUP BY Product
HAVING COUNT(*) > 1)
This suffers from the defect that it deletes ALL the records with a duplicated Product. What you probably want to do is delete all but one of each group of records with a given Product. It might be worthwhile to copy all the duplicates to a separate table first, and then somehow remove duplicates from that table, then apply the above, and then copy remaining products back to the original table.
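A sketch of that staged approach, restricted to syntax that should survive SQL Server CE's limited dialect (no CTEs or joined DELETEs; the TblProducts_Keep name, the column types, and the Description stand-in for the remaining columns are placeholders):
-- 1. Park one representative row per duplicated Product
CREATE TABLE TblProducts_Keep (Product nvarchar(100), Description nvarchar(200));

INSERT INTO TblProducts_Keep (Product, Description)
SELECT Product, MAX(Description)
FROM TblProducts
GROUP BY Product
HAVING COUNT(*) > 1;

-- 2. Remove every row for those duplicated products
DELETE FROM TblProducts
WHERE Product IN (SELECT Product FROM TblProducts_Keep);

-- 3. Put the single kept row per product back, then drop the helper table
INSERT INTO TblProducts (Product, Description)
SELECT Product, Description FROM TblProducts_Keep;

DROP TABLE TblProducts_Keep;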

How to use SQL Merge populating Master Detail related tables

I've been searching and I've yet to find an example using merge for populating related tables.
The Northwind DB Order & OrderDetail tables could be used. (In our scenario, our tables are 3 levels deep.)
For simplicity let's say we have the following tables.
Orders
OrderID PK
OrderNumber
OrderDetails
OrderID - PK - FK to Orders.OrderID
OrderLineItemNumber - PK
OrderDetailDetails
OrderID - PK - FK to OrderDetails.OrderID
OrderLineItemNumber - PK - FK to OrderDetails.OrderLineItemNumber
OrderSequenceNumber - PK
Also, in this scenario, records get written to staging tables that are identical to the tables above. The merge would need to merge records from the 3 staging tables into the 3 matching production tables.
The production Order table's OrderId will not share the staging Order table's OrderId value.
So if the merge conditions are met, then there must be an insert into the Order table to generate OrderId (set to identity) because OrderId is needed for the OrderDetail & OrderDetailDetails rows to be created.
Right now I've written a service in C# that does all this but it's not that performant.
MERGE was discovered so we're looking into it to see if it can be used in a situation such as this. Any tips or pointers would be greatly appreciated.
Thanks.
Edit: I am now using OUTPUT to store values in a table variable called @MergeOutput.
Declare @MergeOutput Table
(
ActionType varchar(10),
InsertedOrderId int,
StagingOrderID int,
DeletedOrderId int
);
However, I need to do a Merge on all 3 tables. (Order, OrderDetail & OrderDetailDetails)
Also, these tables have more fields than just the Id's.
So I've started creating the 2nd Merge for the OrderDetail table.
MERGE OrderDetail AS OD
USING (
SELECT OrderID,
OrderLineItemNumber,
ProductId
FROM OrderDetail
WHERE OrderId IN (SELECT StagingOrderID FROM @MergeOutput WHERE ActionType = 'INSERT')
) AS src (OrderID,
OrderLineItemNumber,
ProductId)
ON (OD.OrderId = src.OrderID AND OD.OrderLineItemNumber = src.OrderLineItemNumber)
WHEN NOT MATCHED By Target THEN
INSERT INTO  <-- (This doesn't work no matter what I've tried so far.)
Select (Select Distinct InsertedOrderID from @MergeOutput where StagingOrderId = OrderID), src.OrderLineItemNumber, src.ProductId
;
I see the following errors with the code above.
"Incorrect syntax near the keyword 'into'
I need the functionality of the MERGE to move records across all 3 tables.
Looks like I've finally got this working. I had to change the Insert statement to the following.
Insert (OrderId, OrderLineItemNumber, ProductID)
Values ((Select Distinct InsertedOrderID from @MergeOutput where StagingOrderId = OrderID), src.OrderLineItemNumber, src.ProductId)
I had tried this Insert statement earlier on; I just figured out I had to wrap the sub-select in its own parentheses.
Thanks for everyone's help. I'm hoping I can carry this over to the merge for the 3rd table.
This is a nasty problem that keeps coming up. You need to extract the inserted identity values. In SQL Server you can do this using the OUTPUT clause (http://msdn.microsoft.com/en-us/library/ms177564.aspx) with the INSERTED "virtual table". This allows you to get all the inserted IDs out in one statement.
You can then push the IDs into a temp table and use them to insert the detail records with the appropriate master IDs like this:
INSERT INTO Detail
SELECT * from Staging_Detail
JOIN #MasterIDs on Staging_Detail.MasterID = #MasterIDs.MasterID
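To fill #MasterIDs in the first place for the question's scenario, the master-level statement could be a MERGE rather than a plain INSERT, because only a MERGE's OUTPUT clause can emit a source column next to the newly generated identity. This is a sketch adapted to the question's Orders naming; Staging_Orders is a placeholder name for the staging copy of Orders:
CREATE TABLE #MasterIDs (StagingOrderID int, NewOrderID int);

MERGE Orders AS tgt
USING Staging_Orders AS src
ON 1 = 0                               -- never matches, so every staging row is inserted
WHEN NOT MATCHED THEN
    INSERT (OrderNumber)               -- plus whatever other Orders columns exist
    VALUES (src.OrderNumber)
OUTPUT src.OrderID, INSERTED.OrderID   -- staging ID alongside the new identity value
INTO #MasterIDs (StagingOrderID, NewOrderID);
The detail-level inserts can then join their staging rows to #MasterIDs and substitute NewOrderID for the staging OrderID, as in the Detail example above.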