How to Bulk Update with SQL Server?

I have a table with 10 million rows that I need to join with another table and update all the data. This is taking more than one hour and it is growing my transaction log by 10+ GB. Is there another way to improve this performance?
I believe that after each update, the indexes and constraints are checked and everything is logged. Is there a way to tell SQL Server to check constraints only after the update is finished, and to minimally log the update?
My query follows below. I've modified some names so it becomes easier to read.
UPDATE o
SET o.Info1 = u.Info1, o.Info2 = u.Info2, o.Info3 = u.Info3
FROM Orders o
INNER JOIN Users u
ON u.ID = o.User_ID
EDIT: as asked in comments, the table definition would be something like the following (simplifying again to create a generic question).
Table Orders
ID int PK
OrderNumber nvarchar(20)
User_ID int FK to table Users
Info1 int FK to table T1
Info2 int FK to table T2
Info3 int FK to table T3
Table Users
ID int PK
UserName nvarchar(20)
Info1 int FK to table T1
Info2 int FK to table T2
Info3 int FK to table T3

First of all, there is no such thing as BULK UPDATE in SQL Server. A few things you can do are as follows:
If possible, put your database in SIMPLE recovery mode before doing this operation.
Drop indexes before doing the update and create them again once the update is completed.
Do the updates in smaller batches, something like:
WHILE (1 = 1)
BEGIN
    -- update 10,000 rows at a time
    UPDATE TOP (10000) o
    SET o.Info1 = u.Info1, o.Info2 = u.Info2, o.Info3 = u.Info3
    FROM Orders o
    INNER JOIN Users u ON u.ID = o.User_ID
    -- the WHERE keeps already-updated rows out of the next batch,
    -- otherwise the loop never terminates; adjust if these columns are nullable
    WHERE o.Info1 <> u.Info1 OR o.Info2 <> u.Info2 OR o.Info3 <> u.Info3;

    IF (@@ROWCOUNT = 0)
        BREAK;
END
Note
If you go with the SIMPLE recovery option, don't forget to take a full backup after you switch the recovery model back to FULL, since simply switching it back to FULL will not restart the log backup chain until a full backup is taken.
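For illustration, the recovery-model switch around the operation could look like the following sketch (MyDb and the backup path are placeholders, not names from the question):
ALTER DATABASE MyDb SET RECOVERY SIMPLE;
-- ... run the batched update here ...
ALTER DATABASE MyDb SET RECOVERY FULL;
-- a full backup restarts the log backup chain
BACKUP DATABASE MyDb TO DISK = N'D:\Backups\MyDb.bak';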

In my case (loading data as needed from a .NET WinForms app), I created a new staging table, bulk-inserted the data into it, and then updated the base table with a join against the staging table. For 1M rows it took about 5 seconds.
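A minimal sketch of that staging-table approach, reusing the Orders/Users columns from the question (#Staging is a name made up here; from .NET the staging table would typically be filled with SqlBulkCopy):
CREATE TABLE #Staging (User_ID int PRIMARY KEY, Info1 int, Info2 int, Info3 int);

-- ... bulk insert the new values into #Staging here ...

UPDATE o
SET o.Info1 = s.Info1, o.Info2 = s.Info2, o.Info3 = s.Info3
FROM Orders o
INNER JOIN #Staging s ON s.User_ID = o.User_ID;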


Automatically remove a row without foreign references

I am using sqlite3.
I have one "currencies" table, and two tables that reference the currencies table using a foreign key, as follows:
CREATE TABLE currencies (
currency TEXT NOT NULL PRIMARY KEY
);
CREATE TABLE table1 (
currency TEXT NOT NULL PRIMARY KEY,
FOREIGN KEY(currency)
REFERENCES currencies(currency)
);
CREATE TABLE table2 (
currency TEXT NOT NULL PRIMARY KEY,
FOREIGN KEY(currency)
REFERENCES currencies(currency)
);
I would like to make sure that rows in the "currencies" table that are not referenced by any row from "table1" and "table2" will be removed automatically. This should behave like some kind of ref-counted object. When the reference count reaches zero, the relevant row from the "currencies" table should be erased.
What is the "SQL way" to solve this problem?
I am willing to redesign my tables if it could lead to an elegant solution.
I prefer to avoid solutions that require extra work from the application side, or solutions that require periodic cleanup.
Create an AFTER DELETE TRIGGER in each of table1 and table2:
CREATE TRIGGER remove_currencies_1 AFTER DELETE ON table1
BEGIN
DELETE FROM currencies
WHERE currency = OLD.currency
AND NOT EXISTS (SELECT 1 FROM table2 WHERE currency = OLD.currency);
END;
CREATE TRIGGER remove_currencies_2 AFTER DELETE ON table2
BEGIN
DELETE FROM currencies
WHERE currency = OLD.currency
AND NOT EXISTS (SELECT 1 FROM table1 WHERE currency = OLD.currency);
END;
Every time you delete a row in either table1 or table2, the corresponding trigger checks whether the other table still contains the deleted currency; if it does not, the currency is deleted from currencies.
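A quick way to see the triggers in action (hypothetical sample data):
PRAGMA foreign_keys = ON; -- FK enforcement is off by default in SQLite
INSERT INTO currencies VALUES ('USD');
INSERT INTO table1 VALUES ('USD');
-- deleting the last row that references 'USD' fires the trigger...
DELETE FROM table1 WHERE currency = 'USD';
-- ...and the currency is gone:
SELECT * FROM currencies;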
There is no fully automatic way of doing this. The reverse can be handled using cascading delete foreign key references: when a currency is deleted, all related rows are deleted as well.
You could schedule a job daily running something like this (note that SQLite does not accept an alias on the DELETE target, so the table name is used to correlate):
delete from currencies
where not exists (select 1 from table1 t1 where t1.currency = currencies.currency) and
      not exists (select 1 from table2 t2 where t2.currency = currencies.currency);
If you need an automatic way of doing this, most DBMSs provide a trigger mechanism: you can create triggers on update and delete operations that run the query above.
You can use a left join for that:
https://www.w3schools.com/sql/sql_join_left.asp
It returns a row for every row of the left table, even if there is no corresponding row in the right table, filling the right table's columns with NULL. You can then test a NOT NULL field of the right table with IS NULL; this filters for the rows that have no counterpart in the right table.
For example:
SELECT currencies.currency FROM currencies LEFT JOIN table1 ON table1.currency = currencies.currency WHERE table1.currency IS NULL
will show the relevant rows for table1.
You can do the same with table2.
This gives you two queries that show which rows have no counterpart.
You can then use INTERSECT on the results, so that you get the rows that have no counterpart in either table:
SELECT * FROM query1 INTERSECT SELECT * FROM query2
Now you have the list of currencies to be deleted.
You can finish this by using a subqueried delete:
DELETE FROM currencies WHERE currency IN (SELECT ...)
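Assembled from the steps above, the whole statement would look something like this sketch:
DELETE FROM currencies
WHERE currency IN (
    SELECT currencies.currency FROM currencies
    LEFT JOIN table1 ON table1.currency = currencies.currency
    WHERE table1.currency IS NULL
    INTERSECT
    SELECT currencies.currency FROM currencies
    LEFT JOIN table2 ON table2.currency = currencies.currency
    WHERE table2.currency IS NULL
);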

How to access a specific column in the result set of a cursor

I'm creating a cursor to collect a set of rows that I want to delete in a table, but I also want to delete records in related tables that key off that table. Is it possible to get the value in a single column of a cursor so that I can use that value to delete the rows in the other tables as well? (The field isn't explicitly defined as a foreign key in the other tables.)
Agree with Sami's comment; people tend to reach for cursors because they think in a row-by-row processing mode, but SQL Server is built to operate on sets. Consider building a temp table of all the rows you want to delete, then run delete operations that use that temp table as a driver for which rows in which other tables need deleting. If you can't use cascading deletes from FKs, you need to join and delete in order from child up to parent.
Example:
CREATE TABLE #delusers (userid INT);
INSERT INTO #delusers VALUES(1);
INSERT INTO #delusers VALUES(2);
INSERT INTO #delusers VALUES(3);
--For relationships:
-- User.ID--hasmany-->Order.UserID and
-- Order.ID--hasmany-->OrderProduct.OrderID
BEGIN TRANSACTION;
DELETE FROM OrderProducts WHERE OrderID IN (SELECT o.id FROM orders o INNER JOIN #delusers u on o.userid = u.userid);
DELETE FROM Orders WHERE UserID IN (SELECT userid from #delusers);
DELETE FROM Users WHERE ID IN (SELECT userid from #delusers);
COMMIT TRANSACTION;
Here I've used a temp table as an example, but it's just as easy to use a table variable, or even the original table with a WHERE clause, if you prefer. I've also used IN because it's the easiest way to give a readable demo of the overall point; you may choose a different method of coordinating the deletes.
Example 2:
--delete all orders and products on those orders from people with last name smith
--For relationships:
-- User.ID--hasmany-->Order.UserID and
-- Order.ID--hasmany-->OrderProduct.OrderID
BEGIN TRANSACTION;
DELETE FROM OrderProducts WHERE OrderID IN (SELECT o.id FROM orders o INNER JOIN users u on o.userid = u.userid WHERE u.lastname = 'smith');
DELETE FROM Orders WHERE UserID IN (SELECT id FROM users WHERE lastname = 'smith');
DELETE FROM Users WHERE lastname = 'smith';
COMMIT TRANSACTION;
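For completeness, since the question literally asks how to read a single column out of a cursor, here is a minimal sketch using the same hypothetical Users/Orders/OrderProducts schema as above; the set-based approach is still preferable:
DECLARE @userid INT;
DECLARE del_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT ID FROM Users WHERE lastname = 'smith';
OPEN del_cursor;
FETCH NEXT FROM del_cursor INTO @userid; -- reads the single column into a variable
WHILE @@FETCH_STATUS = 0
BEGIN
    DELETE FROM OrderProducts WHERE OrderID IN (SELECT ID FROM Orders WHERE UserID = @userid);
    DELETE FROM Orders WHERE UserID = @userid;
    DELETE FROM Users WHERE ID = @userid;
    FETCH NEXT FROM del_cursor INTO @userid;
END
CLOSE del_cursor;
DEALLOCATE del_cursor;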

Update tables in one database from multiple tables in another database regularly

I have 2 databases in SQL Server, DB1 has multiple tables and some of the tables are updated with new records continuously. DB2 has only 1 table which should contain all the combined info from the multiple tables in DB1, and needs to be updated every 2 hours.
For example, DB1 has 3 tables: "ProductInfo", "StationRecord", "StationInfo". The first 2 tables both have a timestamp column that indicates when a record is created (i.e. the two tables are updated asynchronously, ONLY when a product passes all stations in "StationRecord" will "ProductInfo" be updated with a new product), and the last table is fixed.
The tables are as follows:
USE DB1
GO
CREATE TABLE ProductInfo (
    ProductID bigint PRIMARY KEY NOT NULL,
    TimeCreated datetime,
    ProductName nvarchar(255)
)
CREATE TABLE StationRecord (
    RecordID bigint PRIMARY KEY NOT NULL,
    TimeCreated datetime,
    ProductID bigint NOT NULL,
    StationID bigint
)
CREATE TABLE StationInfo (
    StationID bigint PRIMARY KEY NOT NULL,
    BOM_used nvarchar(255)
)
DB2 has only 1 table which contains a composite PK of "ProductID" & "StationID", as follows:
CREATE TABLE DB2.BOMHistory AS
SELECT
    DB1.ProductInfo.ProductID,
    DB1.ProductInfo.TimeCreated AS ProductCreated,
    DB1.StationInfo.StationID,
    DB1.StationInfo.BOM_used
FROM DB1.ProductInfo
JOIN DB1.StationRecord
    ON DB1.ProductInfo.ProductID = DB1.StationRecord.ProductID
JOIN DB1.StationInfo
    ON DB1.StationRecord.StationID = DB1.StationInfo.StationID
constraint PK_BOMHistory Primary Key (ProductID, StationID)
I figured out the timing portion, which is to create a job with some pre-set schedule that executes a stored procedure. The problem is how to write the stored procedure properly, which has to do the following things:
wait for the last product to pass all stations (and the "stationInfo" table is updated fully)
find all NEW records generated in this cycle in the tables in DB1
combine the information of the 3 tables in DB1
insert the combined info into DB2.BOMHistory
Here's my code:
ALTER Procedure BOMHistory_Proc
BEGIN
SELECT
DB1.ProductInfo.ProductID,
DB1.ProductInfo.TimeCreated AS ProductCreated,
DB1.StationInfo.StationID,
DB1.StationInfo.BOM_used
into #temp_BOMList
FROM DB1.ProductInfo
JOIN DB1.StationRecord
ON DB1.ProductInfo.ProductID = DB1.StationRecord.ProductID
JOIN DB1.StationInfo
ON DB1.StationRecord.StationID = DB1.StationInfo.StationID
ORDER BY ProductInfo.ProductID
END
SELECT * from #temp_BOMList
INSERT INTO DB2.BOMHistory(ProductID, ProductCreated, StationID, BOM_used)
SELECT DISTINCT (ProductID, stationID)
FROM #temp_BOMList
WHERE (ProductID, stationID) NOT IN (SELECT ProductID, stationID FROM DB2.BOMHistory)
The condition in the INSERT statement is not working; please provide some advice.
Also, should I use a table variable or a temp table for this application?
Try:
INSERT INTO DB2.BOMHistory(ProductID, ProductCreated, StationID, BOM_used)
SELECT DISTINCT tb.ProductID, tb.ProductCreated, tb.StationID, tb.BOM_used
FROM #temp_BOMList tb
WHERE NOT EXISTS
(SELECT * FROM DB2.BOMHistory WHERE ProductID = tb.ProductID AND StationID = tb.StationID)
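Beyond that condition, the procedure as posted also places END before the INSERT and is missing the AS keyword. A reassembled sketch of the whole procedure (the dbo schema in the three-part names is an assumption; adjust to your setup):
ALTER PROCEDURE BOMHistory_Proc
AS
BEGIN
    SELECT
        p.ProductID,
        p.TimeCreated AS ProductCreated,
        si.StationID,
        si.BOM_used
    INTO #temp_BOMList
    FROM DB1.dbo.ProductInfo p
    JOIN DB1.dbo.StationRecord sr ON p.ProductID = sr.ProductID
    JOIN DB1.dbo.StationInfo si ON sr.StationID = si.StationID;

    INSERT INTO DB2.dbo.BOMHistory (ProductID, ProductCreated, StationID, BOM_used)
    SELECT DISTINCT tb.ProductID, tb.ProductCreated, tb.StationID, tb.BOM_used
    FROM #temp_BOMList tb
    WHERE NOT EXISTS
        (SELECT 1 FROM DB2.dbo.BOMHistory bh
         WHERE bh.ProductID = tb.ProductID AND bh.StationID = tb.StationID);
END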

Create a field in Firebird which displays data from another table

I didn't find a working solution for creating a "lookup column" in a Firebird database.
Here is an example:
Table1: Orders
[OrderID] [CustomerID] [CustomerName]
Table2: Customers
[ID] [Name]
When I run SELECT * FROM ORDERS I want to get OrderID, CustomerID and CustomerName... but CustomerName should automatically be computed by looking up the "CustomerID" in the "ID" column of the "Customers" table and returning the content of the "Name" column.
Firebird has calculated fields (generated always as/computed by), and these allow selecting from other tables (contrary to an earlier version of this answer, which stated that Firebird doesn't support this).
However, I suggest you use a view instead, as I think it performs better (haven't verified this, so I suggest you test this if performance is important).
Use a view
The common way would be to define a base table and an accompanying view that gathers the necessary data at query time. Instead of using the base table, people would query from the view.
create view order_with_customer
as
select orders.id, orders.customer_id, customer.name
from orders
inner join customer on customer.id = orders.customer_id;
Or you could just skip the view and use above join in your own queries.
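Querying the view then works like querying any table, for example:
select id, customer_id, name
from order_with_customer
where customer_id = 1;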
Alternative: calculated fields
I label this as an alternative and not the main solution, as I think using a view would be the preferable solution.
To use calculated fields, you can use the following syntax (note the double parentheses around the query):
create table orders (
id integer generated by default as identity primary key,
customer_id integer not null references customer(id),
customer_name generated always as ((select name from customer where id = customer_id))
)
Updates to the customer table will be automatically reflected in the orders table.
As far as I'm aware, the performance of this option is less than when using a join (as used in the view example), but you might want to test that for yourself.
FB3+ with function
With Firebird 3, you can also create calculated fields using a function; this makes the expression itself shorter.
To do this, create a function that selects from the customer table:
create function lookup_customer_name(customer_id integer)
returns varchar(50)
as
begin
return (select name from customer where id = :customer_id);
end
And then create the table as:
create table orders (
id integer generated by default as identity primary key,
customer_id integer not null references customer(id),
customer_name generated always as (lookup_customer_name(customer_id))
);
Updates to the customer table will be automatically reflected in the orders table. This solution can be relatively slow when selecting a lot of records, as the function will be executed for each row individually, which is a lot less efficient than performing a join.
Alternative: use a trigger
However if you want to update the table at insert (or update) time with information from another table, you could use a trigger.
I'll be using Firebird 3 for my answer, but it should translate - with some minor differences - to earlier versions as well.
So assuming a table customer:
create table customer (
id integer generated by default as identity primary key,
name varchar(50) not null
);
with sample data:
insert into customer(name) values ('name1');
insert into customer(name) values ('name2');
And a table orders:
create table orders (
id integer generated by default as identity primary key,
customer_id integer not null references customer(id),
customer_name varchar(50) not null
)
You then define a trigger:
create trigger orders_bi_bu
active before insert or update
on orders
as
begin
new.customer_name = (select name from customer where id = new.customer_id);
end
Now when we use:
insert into orders(customer_id) values (1);
the result is:
id customer_id customer_name
1 1 name1
Update:
update orders set customer_id = 2 where id = 1;
Result:
id customer_id customer_name
1 2 name2
The downside of a trigger is that updating the name in the customer table will not automatically be reflected in the orders table. You would need to keep track of these dependencies yourself, and create an after update trigger on customer that updates the dependent records, which can lead to update/lock conflicts.
No need for a complex lookup field here.
No need to add a persistent [CustomerName] field to Table1.
As Gordon said, a simple join is enough:
Select T1.OrderID, T2.ID, T2.Name
From Customers T2
Join Orders T1 On T1.CustomerID = T2.ID
That said, if you want to use lookup fields (as we do in a dataset) with SQL, you can use something like:
Select T1.OrderID, T2.ID,
( Select T3.YourLookupField From T3 Where T3.ID = T2.ID )
From Customers T2 Join Orders T1 On T1.CustomerID = T2.ID
Regards.

SQL chunk update with JOIN

I have 2 tables in my DB: one contains data about clients (called Clients), the other contains ClientID, Guid, AddedTime and IsValid (called ClientsToUpdate).
ClientID relates to the Clients table, Guid is a unique identifier, AddedTime is the time when the record was added to the table, and IsValid is a bit indicating whether this ClientID was updated or not.
What I want to do is update all the Clients whose ID is in ClientsToUpdate. The problem is, the ClientsToUpdate table contains more than 80,000 records and I am getting deadlocks.
What I thought I could do is update 2,000 clients at a time, using a while loop or something similar.
My stored procedure looks like:
UPDATE client SET LastLogin=GETDATE()
FROM Clients client
JOIN ClientsToUpdate ctu ON client.ID = ctu.ClientID;
Any idea how can I do it?
declare @done table (ClientID int primary key)

while 1 = 1
begin
    update top (2000) c
    set lastlogin = getdate()
    output deleted.id into @done
    from Clients c
    join ClientsToUpdate ctu
        on c.id = ctu.ClientID
    where not exists
    (
        select *
        from @done d
        where d.ClientID = ctu.ClientID
    )

    if @@rowcount = 0
        break
end
If you experience deadlocks, updating in chunks might reduce errors (assuming you carefully manage your transactions and commit each chunk), but it does not resolve the origin of the deadlocks. IMHO you should investigate the locking and find out why you get deadlocks in the first place.
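As a starting point for that investigation, SQL Server's built-in locking view can show which sessions are waiting while the update runs; a minimal sketch:
-- sessions currently waiting on locks
SELECT request_session_id, resource_type, request_mode, request_status
FROM sys.dm_tran_locks
WHERE request_status = 'WAIT';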