Simple UPDATE query on large table has bad performance

I need to do the following update query through a stored procedure:
UPDATE table1
SET name = @name  -- @name is the stored procedure input parameter
WHERE name IS NULL
Table1 has no indexes or keys. It has 5 columns: 4 integers and 1 varchar (the updatable column 'name' is the varchar).
About 15,000,000 rows have a NULL name and need updating. This takes about 50 minutes, which I think is too long.
I'm running an Azure SQL DB Standard S6 (400DTU's).
Can anyone give me advice on improving performance?

As you don't have any keys or indexes, I can suggest the following approach.
1. Create a new table using SELECT ... INTO (which copies the data), like the following query:
SELECT
    CASE WHEN name IS NULL THEN @name ELSE name END AS name,
    <other columns>
INTO dbo.newtable
FROM table1
2. Drop the old table:
DROP TABLE table1
3. Rename the new table to table1:
EXEC sp_rename 'dbo.newtable', 'table1'
Another approach is a batched update; sometimes it performs better than one bulk update (you need to test and adjust the batch size).
WHILE EXISTS (SELECT 1 FROM table1 WHERE name IS NULL)
BEGIN
    UPDATE TOP (10000) table1
    SET name = @name
    WHERE name IS NULL
END

Can you try the following method?
UPDATE table1
SET name = ISNULL(name, @name)
For NULL values it will update with @name; the rest will be updated with their existing value.

No. You are updating 15,000,000 rows which is going to take a long time. Each update has overhead for finding the row and logging the value.
With so many rows to update, it is unlikely that the overhead is finding the rows. If you add an index on name, the update is going to actually have to update the index as well as updating the original values.
If your concern is locking the database, you can set up a loop where you do something like this over and over:
UPDATE TOP (100000) table1
SET name = @name  -- @name is the stored procedure input parameter
WHERE name IS NULL;
100,000 rows should be about 30 seconds or so.
In this case, an index on name does help. Otherwise, each iteration of the loop would in essence be reading the entire table.
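For completeness, a minimal sketch of the full loop that answer describes, assuming the same 100,000-row batch and the question's @name parameter; @@ROWCOUNT stops the loop once nothing is left to update:
-- Loop until an iteration updates zero rows
DECLARE @rows INT = 1;
WHILE @rows > 0
BEGIN
    UPDATE TOP (100000) table1
    SET name = @name
    WHERE name IS NULL;

    SET @rows = @@ROWCOUNT;
END;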


Copy table data from one database to another database (SQL)

I have had a look at similar problems; however, none of the answers helped in my case.
Just a little background: I have two databases, both with the same table, fields, and structure, and data already exists in both tables. I want to overwrite and add to the data in db1.table from db2.table, but the primary ID is causing a problem with the update.
When I use the query:
USE db1;
INSERT INTO db2.table(field_id,field1,field2)
SELECT table.field_id,table.field1,table.field2
FROM table;
It works on a blank table, because none of the primary keys exist yet. As soon as a primary key already exists, it fails.
Would it be easier to overwrite the primary keys, or to find the primary key and update the fields related to that field_id? I'm really not sure how to go on from here. The data needs to be migrated every 5 minutes, so possibly a stored procedure is required?
First you should add the new records, then update all existing records. You can create a procedure like the code below:
PROCEDURE sync_Data(a IN NUMBER) IS
BEGIN
    -- first insert the rows that do not yet exist in db2
    insert into db2.table
    select *
    from db1.table t
    where t.field_id not in (select tt.field_id from db2.table tt);

    -- then update every matching row with the current values
    begin
        for t in (select * from db1.table) loop
            update db2.table aa
            set aa.field1 = t.field1,
                aa.field2 = t.field2
            where aa.field_id = t.field_id;
        end loop;
    end;
END sync_Data;
Set IsIdentity to No under Identity Specification on the table into which you want to move data, and after executing your script, set it back to Yes.
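If the target were SQL Server, the scriptable equivalent of that setting would be SET IDENTITY_INSERT; a minimal sketch, reusing the question's placeholder table and column names:
-- Allow explicit values in the identity column for the duration of the load
SET IDENTITY_INSERT db2.table ON;

INSERT INTO db2.table (field_id, field1, field2)
SELECT t.field_id, t.field1, t.field2
FROM db1.table t;

SET IDENTITY_INSERT db2.table OFF;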
I ended up just removing the data in the new database and sending it again.
DELETE FROM db2.table WHERE db2.table.field_id != 0;
USE db1;
INSERT INTO db2.table(field_id,field1,field2)
SELECT table.field_id,table.field1,table.field2
FROM table;
It's not very efficient, but it gets the job done. I couldn't figure out the syntax to correctly do an UPDATE or to change the IsIdentity field within MariaDB, so I'm not sure whether those would work or not.
The overhead of deleting and replacing non-trivial amounts of data for an entire table will be prohibitive. That said, I'd prefer an update in place (merge) over delete/replace.
USE db1;
INSERT INTO db2.table (field_id, field1, field2)
SELECT t.field_id, t.field1, t.field2
FROM table t
ON DUPLICATE KEY UPDATE field1 = t.field1, field2 = t.field2;
This can be used inside a procedure and called every 5 minutes (not recommended), or you could build a trigger that fires on INSERT and UPDATE to keep the tables in sync; a sketch follows below.
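A hypothetical sketch of such a trigger, assuming MariaDB/MySQL and the question's placeholder names; a companion AFTER UPDATE trigger would have the same body:
DELIMITER //
-- Push every new row in db1 into db2, updating it if the key already exists
CREATE TRIGGER trg_sync_insert AFTER INSERT ON db1.`table`
FOR EACH ROW
    INSERT INTO db2.`table` (field_id, field1, field2)
    VALUES (NEW.field_id, NEW.field1, NEW.field2)
    ON DUPLICATE KEY UPDATE field1 = NEW.field1, field2 = NEW.field2//
DELIMITER ;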
INSERT INTO database1.tabledata SELECT * FROM database2.tabledata;
But you have to keep the varchar lengths larger than or equal to those in database2, and keep the same column names.

SQL update records with join table

Currently I need to move three columns from table A to table B, and I am using an update-with-join script to copy the existing data to the new columns. Afterwards the old columns in table A will be dropped.
Alter table NewB add columnA integer;
Alter table NewB add columnB integer;

Update NewB
Set NewB.columnA = OldA.columnA, NewB.columnB = OldA.columnB
From NewB
Join OldA on NewB.ID = OldA.ID;

Alter table OldA drop column columnA;
Alter table OldA drop column columnB;
These scripts add the new columns and update them with the existing data from the old table, then remove the old columns.
But due to the system structure, I am required to run the SQL script more than once to make sure the database is up to date.
I did wrap it in If (Columns Exist) Begin (Alter Add, Update, Alter Drop) End to ensure the required columns exist, but when the script runs the next time, it hits an error saying the columns were not found in the old table in the "update" query, because the columns were dropped the first time the script ran.
Is there other ways to solve?
You will not be able to update using a join, but you can do it like this:
Update NewB set NewB.columnA = (select OldA.columnA from OldA where NewB.ID = OldA.ID);
Update NewB set NewB.columnB = (select OldA.columnB from OldA where NewB.ID = OldA.ID);
I don't know which database you are using, but every database has system tables from which you can find out whether a column exists in a table. In Oracle, for example, ALL_TAB_COLUMNS contains the information about all table columns, so you can query it like below:
select 1 from ALL_TAB_COLUMNS where TABLE_NAME='OldA' and COLUMN_NAME in ('columnA','columnB');
If the result is empty, the specified columns are not present in the table, so you can skip your queries (a sketch follows below).
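A hedged PL/SQL sketch of that idea, assuming Oracle (where unquoted identifiers are stored in uppercase in ALL_TAB_COLUMNS); the dynamic UPDATE is only a placeholder for the queries above:
DECLARE
    v_cnt NUMBER;
BEGIN
    SELECT COUNT(*) INTO v_cnt
    FROM ALL_TAB_COLUMNS
    WHERE TABLE_NAME = 'OLDA'
      AND COLUMN_NAME IN ('COLUMNA', 'COLUMNB');

    IF v_cnt = 2 THEN
        -- both columns still exist, so it is safe to run the migration
        EXECUTE IMMEDIATE
            'UPDATE NewB SET NewB.columnA = (SELECT OldA.columnA FROM OldA WHERE NewB.ID = OldA.ID)';
    END IF;
END;
/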
There must be something wrong with your 'column exists' check. I have done similar DDL and DML operations many times. As you did not show how you are checking column existence, I am not able to tell you what's wrong.
Anyway, you are adding a new column to a table. We can check whether such a column exists: if not, run the script; if it does, skip the script. And here is the check:
IF NOT EXISTS(SELECT 1 FROM [sys].[columns] WHERE OBJECT_ID('[dbo].[NewB]') = [object_id] AND [name] = 'columnA')
BEGIN
BEGIN TRANSACTION;
....
COMMIT TRANSACTION;
END;

Create table with checksum of all tables in a database?

I'm trying to figure out how to determine whether a table has been affected by a number of processes that run in sequence, and I need to know the state of the table before and after each one runs. What I've been trying to do is run some SQL before all the processes start that saves a 'before' checksum of every table in the db to a table, then run it again as each process ends, updating the table row with an 'after' checksum. After all the processes are over, I compare the checksums and get all rows where before <> after.
Only problem is that I'm not the best guy for SQL, and am a little lost. Here's where I'm at right now:
select checksum_agg(binary_checksum(*)) from empcomp with (nolock)
create table Test_CheckSum_Record ( TableName varchar(max), CheckSum_Before int, CheckSum_After int)
SELECT name into #TempNames
FROM sys.Tables where is_ms_shipped = 0
And the pseudocode for what I want to do is something like
foreach(var name in #TempNames)
insert into Test_CheckSum_Record(name, ExecuteSQL N'select checksum_agg(binary_checksum(*)) from ' + name + ' with (nolock)', null)
But how does one do this?
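For reference, a minimal dynamic-SQL sketch of that pseudocode, assuming SQL Server and the Test_CheckSum_Record table defined above:
DECLARE @name sysname, @sql nvarchar(max);

DECLARE table_cursor CURSOR FOR
    SELECT name FROM sys.tables WHERE is_ms_shipped = 0;

OPEN table_cursor;
FETCH NEXT FROM table_cursor INTO @name;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- build and run the per-table checksum insert
    SET @sql = N'INSERT INTO Test_CheckSum_Record (TableName, CheckSum_Before, CheckSum_After) ' +
               N'SELECT ' + QUOTENAME(@name, '''') + N', CHECKSUM_AGG(BINARY_CHECKSUM(*)), NULL ' +
               N'FROM ' + QUOTENAME(@name) + N' WITH (NOLOCK);';
    EXEC sp_executesql @sql;

    FETCH NEXT FROM table_cursor INTO @name;
END;

CLOSE table_cursor;
DEALLOCATE table_cursor;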
Judging by the comments, you need to create a trigger that handles all CRUD operations and just places a flag.
The syntax is:
CREATE TRIGGER [TriggerName] ON [TableName]
AFTER INSERT, UPDATE, DELETE
In the trigger you can do a
select CHECKSUM_AGG([columns you want to compare against])
from [ParentTable]
then store that value in a variable and check it against the Before column in the checksum table. If no entry exists, you add a new entry with the DELETED table's CHECKSUM_AGG value as the Before entry.
Please note that the choice not to use the INSERTED table is just my preference when calculated columns are involved.
I will edit later when I have more time to add code
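In the meantime, a hypothetical sketch of the trigger described above, using the empcomp and Test_CheckSum_Record names from the question (the before/after bookkeeping is simplified to just writing the After value):
CREATE TRIGGER trg_empcomp_checksum ON empcomp
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @after int;

    -- recompute the table-level checksum after the change
    SELECT @after = CHECKSUM_AGG(BINARY_CHECKSUM(*)) FROM empcomp;

    UPDATE Test_CheckSum_Record
    SET CheckSum_After = @after
    WHERE TableName = 'empcomp';
END;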

Nested Loop in Where Statement killing performance

I am having serious performance issues when using a nested loop in a WHERE clause.
When I run the below code as is, it takes several minutes. The trick is I'm using the WHERE clause to pull ALL data if the report_id is NULL, but only certain report_id's if I set them in the parameter string.
The function [fn_Parse_List] turns a VARCHAR string such as '123,456,789' into a table where each row is each number in integer form, which is then used in the IN clause.
When I run the code below with report_id = '456' (the dashed out portion), the code takes seconds, but passing the temporary table and using the SELECT statement in the WHERE clause kills it.
alter procedure dbo.p_revenue
(@report_id varchar(max) = NULL)
as
select cast(value as int) Report_ID
into #report_ID_Temp
from [fn_Parse_List] (@report_id)

SELECT *
FROM BIGTABLE a
where @report_id is null
or a.report_id in (select Report_ID from #report_ID_Temp)
--Where @report_id is null or a.report_id in (456)

exec p_revenue @report_id = '456'
Is there a way to optimize this? I tried a JOIN with the table #report_ID_Temp, but it still takes just as long and doesn't work when the report_id is NULL.
You're breaking three different rules:
1. If you want two query plans, you need two queries: OR does not give you two query plans. IF does.
2. If you have a temporary table, make sure it has a primary key and any appropriate indexes. In your case, you need an ALTER TABLE statement to add the primary key clustered index, or you can use CREATE TABLE to declare the structure in the first place.
3. If you think fn_Parse_List is a good idea, you haven't read enough Sommarskog.
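A rough sketch of the first two rules applied to the original procedure, assuming #report_ID_Temp is built as in the question:
-- Rule 2: give the temp table a clustered primary key
ALTER TABLE #report_ID_Temp ALTER COLUMN Report_ID INT NOT NULL;
ALTER TABLE #report_ID_Temp ADD PRIMARY KEY CLUSTERED (Report_ID);

-- Rule 1: IF gives two plans; OR does not
IF @report_id IS NULL
    SELECT * FROM BIGTABLE;
ELSE
    SELECT b.*
    FROM BIGTABLE b
    WHERE b.report_id IN (SELECT Report_ID FROM #report_ID_Temp);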
If I were to write the Stored Procedure for your case, I would use a Table Valued Parameter (TVP) instead of passing multiple values as a comma-separated string.
Something like the following:
-- Create a type for the TVP
CREATE TYPE REPORT_IDS_PAR AS TABLE(
    report_id INT
);
GO
-- Use the TVP type instead of VARCHAR
CREATE PROCEDURE dbo.revenue
    @report_ids REPORT_IDS_PAR READONLY
AS
BEGIN
    SET NOCOUNT ON;
    IF NOT EXISTS(SELECT 1 FROM @report_ids)
        SELECT *
        FROM BIGTABLE;
    ELSE
        SELECT *
        FROM @report_ids AS ids
        INNER JOIN BIGTABLE AS bt ON bt.report_id = ids.report_id;
        -- OPTION(RECOMPILE) -- see remark below
END
GO
-- Execute the Stored Procedure
DECLARE @ids REPORT_IDS_PAR;
-- Empty table for all rows:
EXEC dbo.revenue @ids;
-- Specific report_id's for specific rows:
INSERT INTO @ids (report_id) VALUES (123), (456), (789);
EXEC dbo.revenue @ids;
GO
If you run this procedure with a TVP holding a lot of rows, or a wildly varying number of rows, I suggest adding OPTION(RECOMPILE) to the query.
I see 2 possible things that could help improve performance. Depends on which part is taking the longest. First off, SELECT INTO is a single threaded operation until SQL Server 2014. If this is taking a long time, create an explicitly defined temp table with CREATE TABLE. Secondly, depending on the number of records inserted into the temp table, you probably need an index on the Report_ID column. That can all be done in the body of the stored procedure. If you do end up using an explicitly defined temp table, I would create the index after the data is loaded.
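A minimal sketch of those two suggestions, reusing the names from the question's procedure:
-- Explicitly defined temp table instead of SELECT INTO
CREATE TABLE #report_ID_Temp (Report_ID INT NOT NULL);

INSERT INTO #report_ID_Temp (Report_ID)
SELECT CAST(value AS INT)
FROM fn_Parse_List(@report_id);

-- Index created after the data is loaded
CREATE CLUSTERED INDEX IX_ReportID ON #report_ID_Temp (Report_ID);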
If that doesn't help, first check that the report_id column on the BIGTABLE is indexed. Then try splitting the select into 2 and combining with a UNION ALL like this:
ALTER PROCEDURE dbo.p_revenue
(
    @report_id VARCHAR(MAX) = NULL
)
AS
SELECT CAST(value AS INT) Report_ID
INTO #report_ID_Temp
FROM fn_Parse_List(@report_id);

SELECT *
FROM BIGTABLE
WHERE @report_id IS NULL
UNION ALL
SELECT *
FROM BIGTABLE a
WHERE a.report_id IN ( SELECT Report_ID
                       FROM #report_ID_Temp );
GO
EXEC p_revenue @report_id = '456';
Are you saying I should have two queries: one that pulls everything when the report_id doesn't exist, and one where there is a list of report_ids?
Yes, yes, yes. The fact that it somehow works when you enter the numbers directly distracts you from the core problem. You need a table scan when @report_id is null and an index seek when it is not, and you cannot have both in one execution plan. Performance would inevitably suffer, one way or another.
I would prefer not to, as the table I'm pulling from is actually a view with 800 lines and an additional parameter not shown above.
I do not see where the problem is; SELECT * FROM BIGTABLE and SELECT * FROM BIGVIEW seem the same. If you need parameters, you can use an inline table-valued function. If you have more parameters with variable selectivity like @report_id, I guess you would end up with dynamic SQL sooner or later anyway.
UNION ALL as proposed by @db_brad would help, but one of those subqueries is executed even when there is no need for it.
As a quick patch, you can append OPTION(RECOMPILE) to the SELECT and get a table scan one time and an index seek the other time, but recompiling on every run induces nontrivial overhead.

select the rows affected by an update

If I have a table with these fields:
int:id_account
int:session
string:password
Now for a login statement I run this SQL UPDATE command:
UPDATE tbl_name
SET session = session + 1
WHERE id_account = 17 AND password = 'apple'
Then I check if a row was affected, and if one indeed was affected I know that the password was correct.
Next I want to retrieve all the info of this affected row, so I'll have the rest of the fields' info.
I can use a simple SELECT statement, but I'm sure I'm missing something here; there must be a neater way you gurus know and are going to tell me about (:
Besides, this has bothered me since the first login SQL statement I ever wrote.
Is there a performance-wise way to combine a SELECT into an UPDATE when the UPDATE did update a row?
Or am I better off keeping it simple with two statements? Atomicity isn't needed, so I should probably stay away from table locks, for example, no?
You should use the same WHERE clause for the SELECT. It will return the modified rows, because your UPDATE did not change any of the columns used for the lookup:
UPDATE tbl_name
SET session = session + 1
WHERE id_account = 17 AND password = 'apple';
SELECT *
FROM tbl_name
WHERE id_account = 17 AND password = 'apple';
One piece of advice: never store passwords as plain text! Use a hash function, like this:
MD5('apple')
There is ROW_COUNT() (do read about the details in the docs).
Following up with SQL is OK and simple (which is always good), but it might unnecessarily stress the system.
This won't work for statements such as...
Update Table
Set Value = 'Something Else'
Where Value is Null
Select Value From Table
Where Value is Null
You would have changed the value with the update and would be unable to recover the affected records unless you stored them beforehand.
Select * Into #TempTable
From Table
Where Value is Null
Update Table
Set Value = 'Something Else'
Where Value is Null
Select Value, UniqueValue
From #TempTable TT
Join Table T On TT.UniqueValue = T.UniqueValue
If you're lucky, you may be able to join the temp table's records to a unique field within Table to verify the update. This is just one small example of why it is important to enumerate records.
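If this is SQL Server, the OUTPUT clause offers another way to capture the affected rows directly from the UPDATE; a minimal sketch (the table variable's columns and types are assumptions based on the example above):
-- Capture the updated rows as part of the UPDATE itself
DECLARE @affected TABLE (Value VARCHAR(100), UniqueValue INT);

UPDATE [Table]
SET Value = 'Something Else'
OUTPUT inserted.Value, inserted.UniqueValue INTO @affected
WHERE Value IS NULL;

SELECT * FROM @affected;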
You can get the affected rows by just using @@ROWCOUNT:
select top (select @@ROWCOUNT) * from YourTable order by 1 desc