SQL Server - Update multiple rows based on the primary key (large amount of data)

Struggling with this one; it's quite a lengthy description, so I'll explain as best I can:
I have a table with 12 columns: one is an identity primary key, one is a foreign key, and the other 10 are cost values. I've created a statement to group them into 5 categories, shown below:
select
ProductID,
sum(Cost1) Catagory1,
sum(Cost2) Catagory2,
sum(Cost3 + Cost4 + Cost5 + Cost6 + Cost7) Catagory3,
sum(Cost8 + Cost9) Catagory4,
sum(Cost10) Catagory5
from ProductTable group by ProductID
I've changed the names of the data to make this more generic; they aren't actually called Cost1 etc., by the way ;)
The foreign key (ProductID) can appear multiple times, so in the above query the related fields are summed together per ProductID... What I've been trying to do is put this query's result into a table, which I have done successfully, and then update the data via a procedure. The problem I'm having is that all the data in the table gets overwritten with row 1's values, and when there are thousands of rows this is a problem.
I have also tried putting the above query into a view, with the same result... any suggestions would be great :)
update NewTable set
ProductID = (ProductView.ProductID ),
Catagory1 = (ProductView.Catagory1 ),
Catagory2 = (ProductView.Catagory2 ),
Catagory3 = (ProductView.Catagory3 ),
Catagory4 = (ProductView.Catagory4 ),
Catagory5 = (ProductView.Catagory5 )
from ProductView
I need something along the lines of the above... but one that doesn't overwrite everything with row 1 haha ;)
ANSWERED BY: Noman_1
create procedure NewProducts
as
begin
insert into NewTable
select ProductTable.ProductID,
ProductView.Catagory1,
ProductView.Catagory2,
ProductView.Catagory3,
ProductView.Catagory4,
ProductView.Catagory5
from ProductView
inner join ProductTable on ProductView.ProductID = ProductTable.ProductID
where not exists (select 1 from NewTable where ProductView.ProductID = NewTable.ProductID)
end
The above procedure locates any new Product that appears in the view: it detects that there is a Product not yet present in NewTable and inserts it.

As far as I know, since you want to update all the products in the table and each product's values are sums taken from the origin table, you actually need to update each row one by one. Consequently, an update like the following is your main option:
update newtable
set category1 = (select sum(cost1) from productTable where productTable.productId = newtable.ProductID),
category2 = (select sum(cost2) from productTable where productTable.productId = newtable.ProductID),
etc..
Keep in mind that if you have new products, they won't be inserted by the update; you would need something like this to add them:
Insert into newtable
Select VALUES from productTable a where not exists (select 1 from newTable b where a.ProductId = b.ProductId);
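For reference, here is the same idea written out in full with the generic table and column names from the question (a sketch only; the Catagory definitions are copied from the grouping query at the top):
update NewTable set
Catagory1 = (select sum(Cost1) from ProductTable p where p.ProductID = NewTable.ProductID),
Catagory2 = (select sum(Cost2) from ProductTable p where p.ProductID = NewTable.ProductID),
Catagory3 = (select sum(Cost3 + Cost4 + Cost5 + Cost6 + Cost7) from ProductTable p where p.ProductID = NewTable.ProductID),
Catagory4 = (select sum(Cost8 + Cost9) from ProductTable p where p.ProductID = NewTable.ProductID),
Catagory5 = (select sum(Cost10) from ProductTable p where p.ProductID = NewTable.ProductID);

insert into NewTable (ProductID, Catagory1, Catagory2, Catagory3, Catagory4, Catagory5)
select ProductID,
sum(Cost1), sum(Cost2),
sum(Cost3 + Cost4 + Cost5 + Cost6 + Cost7),
sum(Cost8 + Cost9), sum(Cost10)
from ProductTable p
where not exists (select 1 from NewTable n where n.ProductID = p.ProductID)
group by ProductID;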
A second way, since you always want to update all the data, is to simply truncate the table and do an INSERT ... SELECT right after.
On Oracle you would be able to use a MERGE, but I'm unaware whether it would really improve anything.
I assume that simply keeping a view would not work, due to the amount of data you state you have.
EDIT: I never knew that the MERGE statement is actually available on SQL Server 2008 and above. With this single statement you can do the UPDATE/INSERT in one go, but its efficiency is unknown to me; you may want to test it with your large amount of data:
MERGE newtable AS TARGET
USING (SELECT ProductId, sum(cost1) cat1, sum(cost2) cat2 ...
FROM productTable GROUP BY ProductId) AS SOURCE
ON TARGET.ProductId = SOURCE.ProductId
WHEN MATCHED
THEN UPDATE SET TARGET.category1 = SOURCE.cat1, TARGET.category2 = SOURCE.cat2...
WHEN NOT MATCHED
THEN INSERT (ProductId, category1, category2,...)
VALUES (SOURCE.ProductId, SOURCE.cat1, SOURCE.cat2...);
More info about merge here:
http://msdn.microsoft.com/library/bb510625.aspx
The example at the end may give you a good overview of the syntax.
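For reference, a complete version of that MERGE using the generic column names from the question might look like this (a sketch only; valid on SQL Server 2008 and later):
MERGE NewTable AS TARGET
USING (SELECT ProductID,
SUM(Cost1) AS Cat1,
SUM(Cost2) AS Cat2,
SUM(Cost3 + Cost4 + Cost5 + Cost6 + Cost7) AS Cat3,
SUM(Cost8 + Cost9) AS Cat4,
SUM(Cost10) AS Cat5
FROM ProductTable
GROUP BY ProductID) AS SOURCE
ON TARGET.ProductID = SOURCE.ProductID
WHEN MATCHED THEN
UPDATE SET Catagory1 = SOURCE.Cat1, Catagory2 = SOURCE.Cat2, Catagory3 = SOURCE.Cat3, Catagory4 = SOURCE.Cat4, Catagory5 = SOURCE.Cat5
WHEN NOT MATCHED THEN
INSERT (ProductID, Catagory1, Catagory2, Catagory3, Catagory4, Catagory5)
VALUES (SOURCE.ProductID, SOURCE.Cat1, SOURCE.Cat2, SOURCE.Cat3, SOURCE.Cat4, SOURCE.Cat5);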

You haven't given any join condition. SQL Server cannot know that you meant to update rows matched by productid.
update NewTable set
ProductID = pv.ProductID,
Catagory1 = pv.Catagory1,
Catagory2 = pv.Catagory2,
Catagory3 = pv.Catagory3,
Catagory4 = pv.Catagory4,
Catagory5 = pv.Catagory5
from NewTable
join ProductView pv on NewTable.productid = pv.productid
You don't need a view. Just paste the view query into the place where I wrote ProductView. Of course, you can use a view if you prefer.
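For example, with the view query inlined as a derived table (a sketch using the generic names from the question; the ProductID assignment is dropped here because it is the join key):
update NewTable set
Catagory1 = pv.Catagory1,
Catagory2 = pv.Catagory2,
Catagory3 = pv.Catagory3,
Catagory4 = pv.Catagory4,
Catagory5 = pv.Catagory5
from NewTable
join (select ProductID,
sum(Cost1) Catagory1,
sum(Cost2) Catagory2,
sum(Cost3 + Cost4 + Cost5 + Cost6 + Cost7) Catagory3,
sum(Cost8 + Cost9) Catagory4,
sum(Cost10) Catagory5
from ProductTable
group by ProductID) pv on NewTable.ProductID = pv.ProductID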

Related

SQL How to update every nth row which meets requirement

I have a table in which I would like to update one column's data on every nth row that meets the row requirement.
My table has many columns, but the key one is Object_Id (in case this could be useful for creating a temp table).
The one I'm trying to update is online_status. It looks like the sample below, but at a bigger scale: I usually have about 10 rows with the same time that all contain %Online%, and in total around 2000 rows with Online (and about another 2000 with Offline). I just need to update every 2-4 rows out of each group of 10 that repeat.
Table picture here (the table formatting doesn't come out well, for some reason): Table
So what I tried is below. It pulls a list of every 3rd record that matches the Online criterion; I just need a way to update it, but can't get past this.
SELECT * FROM (SELECT *, row_number() over() rn FROM people
WHERE online_status LIKE '%Online%') foo WHERE online_status LIKE '%Online%' AND foo.rn % 3 =0
What I also tried is below;
however, it updated every single row, not just the ones I needed.
UPDATE people
SET online_status = 'Offline 00:00-24:00'
WHERE people.Object_id IN
(SELECT *
FROM
(SELECT people.Object_id, row_number() over() rn FROM people
WHERE online_status LIKE '%Online%') foo WHERE people LIKE '%Online%' AND foo.rn % 3 =0);
Is there a way to take the list from the SELECT above and simply update it, or to run a few scripts: one that adds the Object_ids to something like a temp table, and a second that updates the main table where the Object_id matches the temp table?
Thank you for any help :)
Don't select any columns other than Object_id in the subquery at WHERE people.Object_id IN (..):
UPDATE people
SET online_status = 'Offline 00:00-24:00'
WHERE Object_id IN
( SELECT Object_id
FROM
( SELECT p.Object_id, row_number() over() rn
FROM people p
WHERE p.online_status LIKE '%Online%') foo
WHERE foo.rn % 3 = 0
);
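Note that row_number() over() with an empty OVER clause is PostgreSQL syntax; if this is SQL Server, ROW_NUMBER() requires an ORDER BY inside OVER(). A sketch of the same statement with an assumed ordering on Object_id:
UPDATE people
SET online_status = 'Offline 00:00-24:00'
WHERE Object_id IN
( SELECT Object_id
FROM
( SELECT p.Object_id, ROW_NUMBER() OVER (ORDER BY p.Object_id) rn
FROM people p
WHERE p.online_status LIKE '%Online%') foo
WHERE foo.rn % 3 = 0
);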

Add data to trigger from the last insert

I am faced with the following situation:
I created a trigger which reacts to inserts into the third table. When I insert any data (for example 1 1 2), the last number should be subtracted from the Amount_On_Stock column of the row with the matching ID_Product (as shown in the picture). But how can I tell which row was the last one added? I first thought of doing it with a SELECT, but that seems impossible. Now I think it could be done with a cursor, but that doesn't seem like the best option. Is there a better way to do this?
Here's my trigger code, but unfortunately it only subtracts 1 from the 1st product each time:
CREATE TRIGGER AmountInsert ON Amount
AFTER INSERT
AS
BEGIN
UPDATE Product
SET Amount_On_Stock =
(SELECT Amount_On_Stock FROM Product
WHERE ID_Product = 1) - 1
WHERE ID_Product = 1
END
The first thing you need to understand is that in a trigger in SQL Server you are provided with an inserted pseudo-table and a deleted pseudo-table. You use these tables to determine what changes have occurred.
I think the following trigger accomplishes what you are looking for - the comments explain the logic.
CREATE TRIGGER dbo.AmountInsert ON dbo.Amount
AFTER INSERT
AS
BEGIN
set nocount on;
update P set
-- Adjust the stock level by the amount of the latest insert
Amount_On_Stock = coalesce(Amount_On_Stock, 0) - I.Amount
from dbo.Product P
inner join (
-- We need to group by ID_Product in case the same product appears in the insert multiple times
select ID_Product, sum(Amount) Amount
from Inserted
group by ID_Product
-- No need to update if the net change is zero
having sum(Amount) <> 0
) I
on I.ID_Product = P.ID_Product;
END
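A quick way to check the behaviour (a sketch, assuming Amount has ID_Product and Amount columns and Product has ID_Product and Amount_On_Stock, as implied by the question):
-- fires the trigger once for all three rows; product 1 should drop by 3, product 2 by 5
INSERT INTO dbo.Amount (ID_Product, Amount)
VALUES (1, 1), (1, 2), (2, 5);

SELECT ID_Product, Amount_On_Stock
FROM dbo.Product
WHERE ID_Product IN (1, 2);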

updating values of a table from another table through a join changes all values to 1 single value

I have 3 tables: raw_sales, sales and details. raw_sales is populated using COPY from a txt file. All the fields in raw_sales are either string or text. After importing, we run SQL to populate sales and details. There is a foreign key (sale_id) in details. Here's a sample INSERT command that we use to populate sales and details.
INSERT INTO sales (source, source_identifier)
(SELECT DISTINCT
'FOO' AS source,
"identifier" AS source_identifier
FROM raw_sales
LEFT JOIN sales
ON sales.source_identifier = raw_sales.identifier
AND sales.source = 'FOO'
WHERE sales.id IS NULL
AND identifier IS NOT NULL);
INSERT INTO details (sale_id, description)
(SELECT DISTINCT
sales.id AS sale_id,
"improvements" as description
FROM raw_sales
JOIN sales
ON sales.source_identifier = raw_sales.identifier
AND sales.source = 'FOO'
LEFT JOIN details AS existing
ON existing.sale_id = sales.id
WHERE existing.id IS NULL
AND "improvements" != '');
This seems to work fine. After this, another SQL statement is run to update existing rows. The query is as follows:
UPDATE details SET
description = "improvements"
FROM raw_sales
JOIN sales
ON sales.source_identifier = raw_sales.identifier
AND sales.source = 'FOO'
JOIN details AS existing
ON existing.sale_id = sales.id
WHERE existing.id IS NOT NULL;
This query updates all rows in the details table to a single value, the first non-empty value from raw_sales table. How can I change the above sql so that it updates the existing records in the details table?
There are several problems with your query:
If details.id is a primary key (a field named id typically is), then what is the point of comparing it to NOT NULL? You're not using any left joins, so there is no way it could possibly be NULL if it truly is an identifier.
UPDATE table t SET ... FROM ... requires linking the updated table t to something in the FROM section, but you're not doing that, so every row of the table gets updated from whatever single row the FROM result happens to supply.
Perhaps you want to do this:
UPDATE details SET
description = "improvements"
FROM raw_sales
JOIN sales ON (sales.source_identifier = raw_sales.identifier AND sales.source = 'FOO')
JOIN details AS existing ON (existing.sale_id = sales.id)
WHERE existing.id = details.id;
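An equivalent form that skips the self-join of details may read more naturally; this is a sketch, assuming the same column names as in the INSERT statements above:
UPDATE details SET
description = raw_sales."improvements"
FROM raw_sales
JOIN sales
ON sales.source_identifier = raw_sales.identifier
AND sales.source = 'FOO'
WHERE details.sale_id = sales.id
AND raw_sales."improvements" != '';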

SQL-Oracle: Updating multiple table rows based on values contained in the same table

I have one table named ORDERS.
This table contains OrderNumbers which belong to the same person, together with that person's address lines.
However, sometimes the data is inconsistent;
for example, looking at the table screenshot (Orders table with bad data to fix):
you can notice that OrderNumber 1 has a name associated with it and address lines 1-2-3-4. Sometimes those rows all differ by some character, or are even null.
My goal is to update all 3 of those rows with one set of data that is already there, and set all 3 rows equally.
To make it clearer, the expected result should look like this:
(expected result screenshot)
I am currently using a MERGE statement to avoid a CURSOR (for loop),
but I am having problems making it work.
Here is the SQL:
MERGE INTO ORDERS O USING
(SELECT
INNER.ORDERNUMBER,
INNER.NAME,
INNER.LINE1,
INNER.LINE2,
INNER.LINE3,
INNER.LINE4
FROM ORDERS INNER
) TEMP
ON( O.ORDERNUMBER = TEMP.ORDERNUMBER )
WHEN MATCHED THEN
UPDATE
SET
O.NAME = TEMP.NAME,
O.LINE1 = TEMP.LINE1,
O.LINE2 = TEMP.LINE2,
O.LINE3 = TEMP.LINE3,
O.LINE4 = TEMP.LINE4;
The biggest issue I am facing is picking a single row out of the 3 at random - it does not matter which row I pick to update the lines,
as long as I make the records exactly the same for an order number.
I also tried ROWNUM = 1, but in a multi-row update it only picks one row and then updates maybe thousands of lines with that same address and name, whichever order number they belong to.
OrderNumber is the join column to use...
Kind regards
A simple correlated subquery in an update statement should work:
update orders t1
set (t1.name, t1.line1, t1.line2, t1.line3, t1.line4) =
(select t2.name, t2.line1, t2.line2, t2.line3, t2.line4
from orders t2
where t2.OrderNumber = t1.OrderNumber
and rownum < 2)
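If the rows for an order differ (some columns null on some rows), a variation that prefers non-null values is to aggregate instead of taking the first row; this is only a sketch, and note that the chosen values may come from different rows of the same order:
update orders t1
set (t1.name, t1.line1, t1.line2, t1.line3, t1.line4) =
(select max(t2.name), max(t2.line1), max(t2.line2), max(t2.line3), max(t2.line4)
from orders t2
where t2.OrderNumber = t1.OrderNumber)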

Delete duplicates with no primary key

I want to delete rows that have a duplicated value in one column (Product), which will then be used as a primary key.
The column is of type nvarchar and we don't want to have 2 rows for one product.
The database is a large one, with thousands of rows that we need to remove.
When querying for the duplicates, we want to keep the first item and remove the second one as the duplicate.
There is no primary key yet; we want to create it after this activity of removing duplicates.
Then the Product column could be our primary key.
The database is SQL Server CE.
I tried several methods, and mostly got errors similar to:
There was an error parsing the query. [ Token line number = 2,Token line offset = 1,Token in error = FROM ]
One method I tried:
DELETE FROM TblProducts
FROM TblProducts w
INNER JOIN (
SELECT Product
FROM TblProducts
GROUP BY Product
HAVING COUNT(*) > 1
)Dup ON w.Product = Dup.Product
The way I would prefer, and what I am trying to learn and adjust my code towards, is something similar to this
(it's not correct yet):
SELECT Product, COUNT(*) TotalCount
FROM TblProducts
GROUP BY Product
HAVING COUNT(*) > 1
ORDER BY COUNT(*) DESC
--
;WITH cte -- these 3 lines are the ones I have the most doubt about
AS (SELECT ROW_NUMBER() OVER (PARTITION BY Product
ORDER BY ( SELECT 0)) RN
FROM TblProducts)
DELETE FROM cte
WHERE RN > 1
If you have two DIFFERENT records with the same Product column, then you can SELECT the unwanted records with some criterion, e.g.
CREATE TABLE victims AS
SELECT MAX(entryDate) AS date, Product, COUNT(*) AS dups FROM ProductsTable WHERE ...
GROUP BY Product HAVING dups > 1;
Then you can do a DELETE JOIN between ProductTable and Victims.
Or also you can select Product only, and then do a DELETE for some other JOIN condition, for example having an invalid CustomerId, or EntryDate NULL, or anything else. This works if you know that there is one and only one valid copy of Product, and all the others are recognizable by the invalid data.
Suppose you instead have IDENTICAL records (or you have both identical and non-identical, or you may have several dupes for some product and you don't know which). You run exactly the same query. Then, you run a SELECT query on ProductsTable and SELECT DISTINCT all products matching the product codes to be deduped, grouping by Product, and choosing a suitable aggregate function for all fields (if identical, any aggregate should do. Otherwise I usually try for MAX or MIN). This will "save" exactly one row for each product.
At that point you run the DELETE JOIN and kill all the duplicated products. Then, simply reimport the saved and deduped subset into the main table.
Of course, between the DELETE JOIN and the INSERT SELECT, you will have the DB in an unstable state, with every product that had at least one duplicate simply missing.
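A sketch of that save / delete / re-insert sequence, reusing the victims table created above (the entryDate and CustomerId columns are only illustrative):
-- keep exactly one aggregated row per duplicated product
CREATE TEMPORARY TABLE saved AS
SELECT Product, MAX(entryDate) AS entryDate, MAX(CustomerId) AS CustomerId
FROM ProductsTable
WHERE Product IN (SELECT Product FROM victims)
GROUP BY Product;

-- kill every copy of the duplicated products
DELETE ProductsTable FROM ProductsTable JOIN victims USING (Product);

-- re-import the deduplicated rows
INSERT INTO ProductsTable (Product, entryDate, CustomerId)
SELECT Product, entryDate, CustomerId FROM saved;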
Another way which should work in MySQL:
-- Create an empty table
CREATE TABLE deduped AS SELECT * FROM ProductsTable WHERE false;
CREATE UNIQUE INDEX deduped_ndx ON deduped(Product);
-- DROP duplicate rows, Joe the Butcher's way
INSERT IGNORE INTO deduped SELECT * FROM ProductsTable;
ALTER TABLE ProductsTable RENAME TO ProductsBackup;
ALTER TABLE deduped RENAME TO ProductsTable;
-- TODO: Copy all indexes from ProductsTable on deduped.
NOTE: the way above DOES NOT WORK if you want to distinguish "good records" and "invalid duplicates". It only works if you have redundant DUPLICATE records, or if you do not care which row you keep and which you throw away!
EDIT:
You say that "duplicates" have invalid fields. In that case you can modify the above with a sorting trick:
SELECT * FROM ProductsTable ORDER BY Product, FieldWhichShouldNotBeNULL IS NULL;
Then if you have only one row for a product, all well and good: it will get selected. If you have more, the one for which (FieldWhichShouldNotBeNULL IS NULL) is FALSE (i.e. the one where FieldWhichShouldNotBeNULL is actually not null, as it should be) will be sorted first and inserted. All the others will bounce silently, thanks to the IGNORE clause, against the uniqueness of Product. Not a really pretty way to do it (and check that I didn't mix up true with false in my clause!), but it ought to work.
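Put together with the deduped table from the earlier block, the sorting trick might look like this (a sketch; FieldWhichShouldNotBeNULL is the placeholder name used above):
-- rows with a non-NULL field sort first, so they reach the unique index before their duplicates;
-- the later duplicates bounce off silently thanks to INSERT IGNORE
INSERT IGNORE INTO deduped
SELECT * FROM ProductsTable
ORDER BY Product, FieldWhichShouldNotBeNULL IS NULL;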
EDIT
actually more of a new answer
This is a simple table to illustrate the problem
CREATE TABLE ProductTable ( Product varchar(10), Description varchar(10) );
INSERT INTO ProductTable VALUES ( 'CBPD10', 'C-Beam Prj' );
INSERT INTO ProductTable VALUES ( 'CBPD11', 'C Proj Mk2' );
INSERT INTO ProductTable VALUES ( 'CBPD12', 'C Proj Mk3' );
There is no index yet, and no primary key. We could still declare Product to be primary key.
But something bad happens. Two new records get in, and both have NULL description.
Yet, the second one is a valid product since we knew nothing of CBPD14 before now, and therefore we do NOT want to lose this record completely. We do want to get rid of the spurious CBPD10 though.
INSERT INTO ProductTable VALUES ( 'CBPD10', NULL );
INSERT INTO ProductTable VALUES ( 'CBPD14', NULL );
A rude DELETE FROM ProductTable WHERE Description IS NULL is out of the question, it would kill CBPD14 which isn't a duplicate.
So we do it like this. First get the list of duplicates:
SELECT Product, COUNT(*) AS Dups FROM ProductTable GROUP BY Product HAVING Dups > 1;
We assume that: "There is at least one good record for every set of bad records".
We check this assumption by positing the opposite and querying for it. If all is copacetic we expect this query to return nothing.
SELECT Dups.Product FROM ProductTable
RIGHT JOIN ( SELECT Product, COUNT(*) AS Dups FROM ProductTable GROUP BY Product HAVING Dups > 1 ) AS Dups
ON (ProductTable.Product = Dups.Product
AND ProductTable.Description IS NOT NULL)
WHERE ProductTable.Description IS NULL;
To further verify, I insert two records that represent this mode of failure; now I do expect the query above to return the new code.
INSERT INTO ProductTable VALUES ( "AC5", NULL ), ( "AC5", NULL );
Now the "check" query indeed returns,
AC5
So, the generation of Dups looks good.
I now proceed to delete all duplicate records that are not valid. If there are duplicate, valid records, they will stay duplicated unless some condition can be found that distinguishes one "good" record among them and declares all the others "invalid" (maybe by repeating the procedure with a different field than Description).
But ay, there's a rub. Currently, you cannot delete from a table and select from the same table in a subquery ( http://dev.mysql.com/doc/refman/5.0/en/delete.html ). So a little workaround is needed:
CREATE TEMPORARY TABLE Dups AS
SELECT Product, COUNT(*) AS Duplicates
FROM ProductTable GROUP BY Product HAVING Duplicates > 1;
DELETE ProductTable FROM ProductTable JOIN Dups USING (Product)
WHERE Description IS NULL;
Now this will delete all invalid records, provided that they appear in the Dups table.
Therefore our CBPD14 record will be left untouched, because it does not appear there. The "good" record for CBPD10 will be left untouched because it's not true that its Description is NULL. All the others - poof.
Let me state again: if a product has no valid record at all and yet it is duplicated, then all copies of that record will be killed - there will be no survivors.
To avoid this, you may first SELECT the rows representing this mode of failure (using the check query above, the one "which should return nothing") into another TEMPORARY TABLE, then INSERT them back into the main table after the deletion (using transactions might be in order).
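A sketch of that safeguard, reusing the Dups temporary table and the two-column ProductTable from the example above:
-- save one row for each duplicated product that has no good record at all
CREATE TEMPORARY TABLE survivors AS
SELECT Product, MAX(Description) AS Description
FROM ProductTable
WHERE Product IN (SELECT Product FROM Dups)
AND Product NOT IN (SELECT Product FROM ProductTable WHERE Description IS NOT NULL)
GROUP BY Product;

-- ... run the DELETE from the previous block here ...

-- then put the saved rows back
INSERT INTO ProductTable (Product, Description)
SELECT Product, Description FROM survivors;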
Create a new table by scripting the old one out and renaming it. Also script all objects (indexes etc.) from the old table onto the new one. Insert the keepers into the new table. If your database is in the bulk-logged or simple recovery model, this operation will be minimally logged. Drop the old table and then rename the new one to the old name.
The advantage of this over a delete will be that the insert can be minimally logged. Deletes do double work because not only does the data get deleted, but the delete has to be written to the transaction log. For big tables, minimally logged inserts will be much faster than deletes.
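A condensed sketch of that approach on a full SQL Server instance (table and column names are illustrative; SELECT ... INTO is the minimally logged step):
-- build the new table holding one keeper row per Product
SELECT Product, MAX(Description) AS Description
INTO dbo.TblProducts_New
FROM dbo.TblProducts
GROUP BY Product;

-- recreate indexes/constraints on TblProducts_New here, then swap the names
EXEC sp_rename 'dbo.TblProducts', 'TblProducts_Old';
EXEC sp_rename 'dbo.TblProducts_New', 'TblProducts';

-- once Product is declared NOT NULL, it can become the primary key:
-- ALTER TABLE dbo.TblProducts ADD CONSTRAINT PK_TblProducts PRIMARY KEY (Product);
DROP TABLE dbo.TblProducts_Old;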
If it's not that big and you have some downtime, and you have SQL Server Management Studio, you can put an identity field on the table using the GUI. Now you have a situation like your CTE, except that the rows themselves are truly distinct. So now you can do the following:
SELECT MIN(lhs.MyTempIDField)
FROM
table_a lhs
join table_a rhs
on lhs.field1 = rhs.field1
and lhs.field2 = rhs.field2 [etc]
WHERE
lhs.MyTempIDField <> rhs.MyTempIDField
GROUP BY
lhs.field1, lhs.field2 etc
This gives you all the 'good' duplicates. Now you can wrap this query with a DELETE FROM query.
DELETE FROM lhs
FROM table_a lhs
join table_a rhs
on lhs.field1 = rhs.field1
and lhs.field2 = rhs.field2 [etc]
WHERE
lhs.MyTempIDField <> rhs.MyTempIDField
and lhs.MyTempIDField not in (
SELECT MIN(lhs.MyTempIDField)
FROM
table_a lhs
join table_a rhs
on lhs.field1 = rhs.field1
and lhs.field2 = rhs.field2 [etc]
WHERE
lhs.MyTempIDField <> rhs.MyTempIDField
GROUP BY
lhs.field1, lhs.field2 etc
)
Try this:
DELETE FROM TblProducts
WHERE Product IN
(
SELECT Product
FROM TblProducts
GROUP BY Product
HAVING COUNT(*) > 1)
This suffers from the defect that it deletes ALL the records with a duplicated Product. What you probably want to do is delete all but one of each group of records with a given Product. It might be worthwhile to copy all the duplicates to a separate table first, and then somehow remove duplicates from that table, then apply the above, and then copy remaining products back to the original table.
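If it has to stay inside SQL Server CE (which, as far as I know, has no temporary tables or CTEs), that copy-out / delete / copy-back idea could look roughly like this; it's only a sketch, and the Price column just stands in for whatever other columns the table really has:
-- 1. keep one representative row per duplicated Product in a helper table
-- (the nvarchar(100)/money types are assumptions)
CREATE TABLE TblProducts_Dups (Product nvarchar(100), Price money);

INSERT INTO TblProducts_Dups (Product, Price)
SELECT Product, MAX(Price)
FROM TblProducts
GROUP BY Product
HAVING COUNT(*) > 1;

-- 2. remove every copy of the duplicated products from the main table
DELETE FROM TblProducts
WHERE Product IN (SELECT Product FROM TblProducts_Dups);

-- 3. put the single kept row back and drop the helper table
INSERT INTO TblProducts (Product, Price)
SELECT Product, Price FROM TblProducts_Dups;

DROP TABLE TblProducts_Dups;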