Diff / Delta script: ideas on streamlining it?

I'm pulling data from a remote DB into a local MS SQL Server DB, the criteria being whether the PK is higher than the one I have in my data warehouse or whether the edit date is within a range that I provide as an argument. That works super fast, so I am happy with it. However, when I attempt to sync this delta table into my data warehouse it takes quite a long time.
Here's my SPROC:
ALTER PROCEDURE [dbo].[sp_Sync_Delta_Table]
@tableName varchar(max)
AS
BEGIN
SET NOCOUNT ON;
DECLARE @sql as varchar(4000)
-- Delete rows in MIRROR database where ID exists in the DELTA database
SET @sql = 'Delete from [SERVER-WHS].[MIRROR].[dbo].[' + @tableName
+ '] Where [ID] in (Select [ID] from [SERVER-DELTA].[DELTAS].[dbo].[' + @tableName + '])'
EXEC(@sql)
-- Insert all deltas
SET @sql = 'Insert Into [SERVER-WHS].[MIRROR].[dbo].[' + @tableName
+ '] Select * from [SERVER-DELTA].[DELTAS].[dbo].[' + @tableName + ']'
EXEC(@sql)
END
It works, but I think it takes way too long. For example: inserting 3590 records from the DELTA table into the MIRROR table containing 3,600,761 rows took over 25 minutes.
Can anyone give me a hint on how I can make this job easier on SSMS? I'm using 2008 R2, btw.
Thanks again!
Nate

The issue is likely the time required to do a table scan on the 3,600,761-row table to see if the new records are unique.
First of all, let's confirm that the primary key (ID) on the target table is the clustered index and increasing.
SELECT s.name, o.name, i.name, i.type_desc, ic.key_ordinal, c.name
FROM sys.objects o
JOIN sys.columns c ON (c.object_id = o.object_id)
JOIN sys.schemas s ON (s.schema_id = o.schema_id)
JOIN sys.indexes i ON (i.object_id = o.object_id)
JOIN sys.index_columns ic ON (ic.object_id = i.object_id AND ic.index_id = i.index_id AND ic.column_id = c.column_id)
WHERE o.name = 'table_name' -- the bare table name; brackets here would be treated as part of the string
If the index is not an ascending integer, it is possible that the inserts are causing lots of page splits.
Second, what other objects does that insert affect? Are there triggers, materialized views, or non-clustered indexes?
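A quick way to check for those (a sketch; substitute your table for the hypothetical dbo.YourTable):
-- Triggers defined on the table
SELECT name FROM sys.triggers WHERE parent_id = OBJECT_ID('dbo.YourTable');
-- Non-clustered indexes on the table
SELECT name FROM sys.indexes WHERE object_id = OBJECT_ID('dbo.YourTable') AND type_desc = 'NONCLUSTERED';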
Third, do you have
My suggestion would be to stage the data on the mirror server in a local table. It can be as simple as:
SET @sql = 'SELECT * INTO [MIRROR].[dbo].[' + @tableName + '_Staging] FROM [SERVER-DELTA].[DELTAS].[dbo].[' + @tableName + ']'
EXEC(@sql)
After that add a clustered primary key to the table.
SET @sql = 'ALTER TABLE [MIRROR].[dbo].[' + @tableName + '_Staging] ADD CONSTRAINT [PK_' + @tableName + '] PRIMARY KEY CLUSTERED (Id ASC)'
EXEC(@sql)
At this point, try inserting the data into the real table. The optimizer should be much more helpful this time.
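Putting it all together, the procedure might look like the sketch below. It assumes it runs on the mirror server itself (hence the three-part names), that every table has an int Id primary key, and that a leftover staging table can safely be dropped:
ALTER PROCEDURE [dbo].[sp_Sync_Delta_Table]
@tableName varchar(max)
AS
BEGIN
SET NOCOUNT ON;
DECLARE @sql as varchar(max)
-- Drop any staging table left over from a previous run
SET @sql = 'IF OBJECT_ID(''MIRROR.dbo.' + @tableName + '_Staging'') IS NOT NULL DROP TABLE [MIRROR].[dbo].[' + @tableName + '_Staging]'
EXEC(@sql)
-- Pull the deltas across the linked server into a local staging table
SET @sql = 'SELECT * INTO [MIRROR].[dbo].[' + @tableName + '_Staging] FROM [SERVER-DELTA].[DELTAS].[dbo].[' + @tableName + ']'
EXEC(@sql)
-- Key the staging table so the delete and insert below can seek instead of scan
SET @sql = 'ALTER TABLE [MIRROR].[dbo].[' + @tableName + '_Staging] ADD CONSTRAINT [PK_' + @tableName + '_Staging] PRIMARY KEY CLUSTERED (Id ASC)'
EXEC(@sql)
-- Delete rows that are about to be replaced, joining on the key
SET @sql = 'DELETE tgt FROM [MIRROR].[dbo].[' + @tableName + '] tgt INNER JOIN [MIRROR].[dbo].[' + @tableName + '_Staging] src ON tgt.[Id] = src.[Id]'
EXEC(@sql)
-- Insert all deltas from the local staging table
SET @sql = 'INSERT INTO [MIRROR].[dbo].[' + @tableName + '] SELECT * FROM [MIRROR].[dbo].[' + @tableName + '_Staging]'
EXEC(@sql)
-- Clean up
SET @sql = 'DROP TABLE [MIRROR].[dbo].[' + @tableName + '_Staging]'
EXEC(@sql)
END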

Change the delete portion to:
SET @sql = 'Delete tbl1 from [SERVER-WHS].[MIRROR].[dbo].[' + @tableName
+ '] tbl1 inner join [SERVER-DELTA].[DELTAS].[dbo].[' + @tableName + '] tbl2 on tbl1.[ID] = tbl2.[ID]'
In future, use an INNER JOIN instead of IN with a subquery.
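If you want to avoid two passes over the target entirely, 2008 R2 also supports MERGE. A rough sketch only, since the UPDATE branch needs the table's real column list; [ColA] below is a hypothetical placeholder:
SET @sql = 'MERGE [SERVER-WHS].[MIRROR].[dbo].[' + @tableName + '] tgt '
+ 'USING [SERVER-DELTA].[DELTAS].[dbo].[' + @tableName + '] src ON tgt.[ID] = src.[ID] '
+ 'WHEN MATCHED THEN UPDATE SET tgt.[ColA] = src.[ColA] '
+ 'WHEN NOT MATCHED THEN INSERT ([ID], [ColA]) VALUES (src.[ID], src.[ColA]);'
EXEC(@sql)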

Add Primary Key constraint automatically

I have the following table InfoSchema which contains the SchemaName and the TableName of each table in my test database:
SchemaName   TableName
dbo          Employee
dbo          Department
Function     Company
Finance      Payslips
Sub          ProjectSub
I want to add, for each table, a primary key constraint on the column ending with ID or Id:
In dbo.Employee there is one column, EmployeeId, so the query will be like below:
ALTER TABLE dbo.Employee
ADD CONSTRAINT Employee_pk PRIMARY KEY (EmployeeId);
For Sub.ProjectSub there are 3 columns ending with Id:
ProjectId
CompanyId
SubId
The constraint should be added on the first such column appearing in the structure of the table.
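So for Sub.ProjectSub the expected statement would use ProjectId, the first of those columns:
ALTER TABLE Sub.ProjectSub
ADD CONSTRAINT ProjectSub_pk PRIMARY KEY (ProjectId);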
As I mention in my comment, you can use a dynamic statement to create the statements. I very strongly suggest looking over the generated SQL, however, which is why I do not include an EXEC sp_executesql statement here. PRINT or SELECT the value of @SQL and check it over first, then run the statements as you need:
DECLARE @SQL nvarchar(MAX),
@CRLF nchar(2) = NCHAR(13) + NCHAR(10);
SET @SQL = STUFF((SELECT @CRLF + @CRLF +
N'ALTER TABLE ' + QUOTENAME(s.[name]) + N'.' + QUOTENAME(t.[name]) + @CRLF +
N'ADD CONSTRAINT ' + QUOTENAME(CONCAT(t.[name],N'_PK')) + N' PRIMARY KEY (' + QUOTENAME(c.[name]) + N');'
FROM sys.schemas s
JOIN sys.tables t ON s.schema_id = t.schema_id
CROSS APPLY (SELECT TOP 1 *
FROM sys.columns c
WHERE c.object_id = t.object_id
AND c.name LIKE '%id'
ORDER BY c.column_id ASC) c
WHERE NOT EXISTS (SELECT 1
FROM sys.key_constraints k
WHERE k.[type] = 'PK'
AND k.parent_object_id = t.object_id)
FOR XML PATH(N''),TYPE).value('.','nvarchar(MAX)'),1,4,N'');
PRINT @SQL;
This assumes that the first column, ordinally, needs to be the PK, and it will not attempt to create a PK on a table that already has one.
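Once you have reviewed the printed output, executing it is one more statement, using the same @SQL variable built above:
EXEC sp_executesql @SQL;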
Not possible. There is no automatic mechanism for this in SQL Server, so it will require at least a script run over the db to identify tables and columns and issue the ALTER statements. Which is not "automatic", as it will not run automatically - you need to run it in a second step.

Convert multiple int primary keys from different tables into Identity

I have a dozen or so different databases with similar structure, with around 50 different tables each, and some of these tables used a sequential int [Id] as Primary Key and Identity.
At some point, these databases were migrated to a different remote infrastructure, namely from Azure to AWS, and somewhere in the process, the Identity property was lost, and as such, new automated inserts are not working as it fails to auto-increment the Id and generate a valid primary key.
I've tried multiple solutions, but am struggling to get any of them to work, as SQL Server seems extremely finicky about letting you mess with or alter the value of Identity columns in any way, and it's driving me insane.
I need to re-enable the Identity in multiple different tables, in multiple databases, but the solutions I've found so far are either extremely convoluted or impractical, for what seems to be a relatively simple problem.
tl;dr - How can I enable Identity for all my int primary keys in multiple different tables at the same time?
My approach so far:
CREATE PROC Fix_Identity @tableName varchar(50)
AS
BEGIN
IF NOT EXISTS(SELECT * FROM sys.identity_columns WHERE OBJECT_NAME(object_id) = @tableName)
BEGIN
DECLARE @keyName varchar(100) = 'PK_dbo.' + @tableName;
DECLARE @reName varchar(100) = @tableName + '.Id_new';
EXEC ('Alter Table ' + @tableName + ' DROP CONSTRAINT ['+ @keyName +']');
EXEC ('Alter Table ' + @tableName + ' ADD Id_new INT IDENTITY(1, 1) PRIMARY KEY');
EXEC ('SET IDENTITY_INSERT [dbo].[' + @tableName + '] ON');
EXEC ('UPDATE ' + @tableName + ' SET [Id_new] = [Id]');
EXEC ('SET IDENTITY_INSERT [dbo].[' + @tableName + '] OFF');
EXEC ('Alter Table ' + @tableName + ' DROP COLUMN Id');
EXEC sp_rename @reName, 'Id', 'Column';
END
END;
I tried creating this procedure, to be executed once per table, but I'm having problems with the UPDATE statement, which I need in order to guarantee that the new Identity column has the same values as the old Id column. This approach currently doesn't work because:
Cannot update identity column 'Id_new'.
There are assumptions made in that script that you might want to look out for, specifically the assumed PK constraint name. You might want to double-check that on all of your tables first. The rest of your script seemed to make sense to me, except that you will need to reseed the identity after updating the data in the new column.
See if this helps:
select t.name AS [Table],c.Name AS [Non-Identity PK],i.name AS [PK Constraint]
from sys.columns c
inner join sys.tables t On c.object_id=t.object_id
inner join sys.indexes i ON i.object_id=c.object_id
AND i.is_primary_key = 1
INNER JOIN sys.index_columns ic ON i.object_id=ic.object_id
AND i.index_id = ic.index_id
AND ic.column_id=c.column_id
WHERE c.Is_Identity=0
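For the reseed step mentioned above, DBCC CHECKIDENT with RESEED and no explicit value resets the counter from the column's current maximum (dbo.YourTable is a placeholder):
-- Reseed so the next identity value continues after the existing maximum Id
DBCC CHECKIDENT ('dbo.YourTable', RESEED);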
Instead of adding an identity, create a sequence and default constraint:
declare @table nvarchar(50) = N'dbo.T'
declare @sql nvarchar(max) = (N'select @maxID = max(Id) FROM ' + @table);
declare @maxID int
exec sp_executesql @sql, N'@maxID int output', @maxID=@maxID OUTPUT;
set @sql = concat('create sequence ', @table, '_sequence start with ', @maxID + 1, ' increment by 1')
exec(@sql)
set @sql = concat('alter table ', @table, ' add default(next value for ', @table, '_sequence) for ID ')
exec(@sql)
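Note that sequences require SQL Server 2012 or later. A quick sanity check afterwards (dbo.T as above; SomeColumn is a placeholder for a real column):
-- A new row should pick up its ID from the sequence via the default
insert into dbo.T (SomeColumn) values (N'test');
select max(ID) from dbo.T;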

SQL: Query many tables with same column name but different structure for specific value

I'm working on cleaning up an ERP and I need to get rid of references to unused users and user groups. There are many foreign key constraints and therefore I want to be sure to really get rid of all traces!
I found this tidy tidbit of code to find all tables in my db with a certain column name, in this case let's look at the user groups:
select table_name from information_schema.columns
where column_name = 'GROUP_ID'
With the results I can search through the 40+ tables for my unused ID... but this is tedious. So I'd like to automate this and create a query that loops through all these tables and deletes the rows where it finds Unused_Group in the GROUP_ID column.
Before deleting anything I'd like to visualize the existing data, so I started to build something like this using string concatenation:
declare @group varchar(50) = 'Unused_Group'
declare @table1 varchar(50) = 'TABLE1'
declare @table2 varchar(50) = 'TABLE2'
declare @tableX varchar(50) = 'TABLEX'
declare @query1 varchar(max), @query2 varchar(max), @queryX varchar(max)
select @query1 = 'SELECT ''' + rtrim(@table1) + ''' as ''Table'', '''
+ rtrim(@group) + ''' = CASE WHEN EXISTS (SELECT GROUP_ID FROM ' + rtrim(@table1)
+ ' WHERE GROUP_ID = ''' + rtrim(@group) + ''') then ''MATCH'' else ''-'' end FROM '
+ rtrim(@table1)
select @query2 = [REPEAT FOR @table2 to @tableX]...
EXEC(@query1 + ' UNION ' + @query2 + ' UNION ' + @queryX)
This gives me the results:
TABLE1 | Match
TABLE2 | -
TABLEX | Match
This works for my purposes and I can run it for any user group without changing any other code, and is of course easily adaptable to DELETE from these same tables, but is unmanageable for the 75 or so tables that I have to deal with between users and groups.
I ran into this link on dynamic SQL which was intense and dense enough to scare me away for the moment... but I think the solution might be in there somewhere.
I'm very familiar with FOR() loops in JS and other languages, where this would be a piece of cake with a well structured array, but apparently it's not so simple in SQL (I'm still learning, but found a lot of negative talk about the FOR and GOTO solutions available...). Ideally I'd have a script that queries to find tables with a certain column name, queries each table as above and spits out a list of matches, and then a second similar script to delete the rows.
Can anyone help point me in the right direction?
OK, try this. There are three variables: column, colValue, and preview. Column should be the column you're checking equality on (Group_ID), colValue the value you're looking for (Unused_Group), and preview should be 1 to view what you'll delete or 0 to delete it.
Declare @column Nvarchar(256),
@colValue Nvarchar(256),
@preview Bit
Set @column = 'Group_ID'
Set @colValue = 'Unused_Group'
Set @preview = 1 -- 1 = preview; 0 = delete
If Object_ID('tempdb..#tables') Is Not Null Drop Table #tables
Create Table #tables (tID Int, SchemaName Nvarchar(256), TableName Nvarchar(256))
-- Get all the tables with a column named [GROUP_ID]
Insert #tables
Select Row_Number() Over (Order By s.name, so.name), s.name, so.name
From sysobjects so
Join sys.schemas s
On so.uid = s.schema_id
Join syscolumns sc
On so.id = sc.id
Where so.xtype = 'u'
And sc.name = @column
Select *
From #tables
Declare @SQL Nvarchar(Max),
@schema Nvarchar(256),
@table Nvarchar(256),
@iter Int = 1
-- As long as there are tables to look at, keep looping
While Exists (Select 1
From #tables)
Begin
-- Get the next table record to look at
Select @schema = SchemaName,
@table = TableName
From #tables
Where tID = @iter
-- If the table we're going to look at has dependencies on tables we have not
-- yet looked at, move it to the end of the line and look at it after we look
-- at its dependent tables (handles foreign keys)
If Exists (Select 1
From sysobjects o
Join sys.schemas s1
On o.uid = s1.schema_id
Join sysforeignkeys fk
On o.id = fk.rkeyid
Join sysobjects o2
On fk.fkeyid = o2.id
Join sys.schemas s2
On o2.uid = s2.schema_id
Join #tables t
On o2.name = t.TableName Collate Database_Default
And s2.name = t.SchemaName Collate Database_Default
Where o.name = @table
And s1.name = @schema)
Begin
-- Move the table to the end of the list to retry later
Update t
Set tID = (Select Max(tID) From #tables) + 1
From #tables t
Where tableName = @table
And schemaName = @schema
-- Move on to the next table to look at
Set @iter = @iter + 1
End
Else
Begin
-- Delete the records we don't want anymore
Set @SQL = Case
When @preview = 1
Then 'Select * ' -- If preview is 1, select from the table
Else 'Delete t ' -- If preview is not 1, delete from the table
End +
'From [' + @schema + '].[' + @table + '] t
Where ' + @column + ' = ''' + @colValue + ''''
Exec sp_executesql @SQL;
-- After we've done the work, remove the table from our list
Delete t
From #tables t
Where tableName = @table
And schemaName = @schema
-- Move on to the next table to look at
Set @iter = @iter + 1
End
End
Turning this into a stored procedure would simply involve changing the variable declarations at the top to a sproc creation, so you would get rid of...
Declare @column Nvarchar(256),
@colValue Nvarchar(256),
@preview Bit
Set @column = 'Group_ID'
Set @colValue = 'Unused_Group'
Set @preview = 1 -- 1 = preview; 0 = delete
...
And replace it with...
Create Proc DeleteStuffFromManyTables (@column Nvarchar(256), @colValue Nvarchar(256), @preview Bit = 1)
As
...
And you'd call it with...
Exec DeleteStuffFromManyTables 'Group_ID', 'Unused_Group', 1
I commented the hell out of the code to help you understand what it's doing; good luck!
You're on the right track with INFORMATION_SCHEMA objects. Execute the below in a query editor, it produces SELECT and DELETE statements for tables that contain GROUP_ID column with 'Unused_Group' value.
-- build select DML to manually review data that will be deleted
SELECT 'SELECT * FROM [' + TABLE_SCHEMA + '].[' + TABLE_NAME + '] WHERE [GROUP_ID] = ''Unused_Group'';'
FROM INFORMATION_SCHEMA.COLUMNS
WHERE COLUMN_NAME = 'GROUP_ID';
-- build delete DML to remove data
SELECT 'DELETE FROM [' + TABLE_SCHEMA + '].[' + TABLE_NAME + '] WHERE [GROUP_ID] = ''Unused_Group'';'
FROM INFORMATION_SCHEMA.COLUMNS
WHERE COLUMN_NAME = 'GROUP_ID';
Since this seems to be a one-time cleanup effort, and especially since you need to review data before it is deleted, I don't see the value in making this more complicated.
Consider adding referential integrity and enforcing cascade delete, if you can. It won't help with visualizing the data before you delete it, but will help controlling orphaned rows.
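For the referential-integrity suggestion, the cascade looks something like this (a sketch with hypothetical table and column names):
-- Deleting a group now removes its child rows automatically
ALTER TABLE dbo.SomeChildTable
ADD CONSTRAINT FK_SomeChildTable_Groups
FOREIGN KEY (GROUP_ID) REFERENCES dbo.Groups (GROUP_ID)
ON DELETE CASCADE;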

Automatically Drop and Recreate current indexes

I'm working on providing a method to allow for bulk updating our tables (greater than 1M new or updated rows per update) and was interested in dropping the current indexes and recreating them after the updates.
I was wondering if anyone has a script to provide loose coupling of these operations so that if the indexes change over time, the update process does not change.
It seems like this is one of those things that the community has already probably solved.
I have a script that I use to query the system tables to capture all non-clustered indexes and disable them, then rebuild upon completion. The below is for use on Standard Edition; if you are on Enterprise I would add the ONLINE option.
Disable
DECLARE @sql AS VARCHAR(MAX);
SET @sql = '';
SELECT
@sql = @sql + 'ALTER INDEX [' + i.name + '] ON [' + o.name + '] DISABLE; '
FROM sys.indexes AS i
JOIN sys.objects AS o ON i.object_id = o.object_id
WHERE i.type_desc = 'NONCLUSTERED'
AND o.type_desc = 'USER_TABLE'
EXEC (@sql)
Rebuild
DECLARE @sql AS VARCHAR(MAX);
SET @sql = '';
SELECT
@sql = @sql + 'ALTER INDEX [' + i.name + '] ON [' + o.name + '] REBUILD WITH (FILLFACTOR = 80); '
FROM sys.indexes AS i
JOIN sys.objects AS o ON i.object_id = o.object_id
WHERE i.type_desc = 'NONCLUSTERED'
AND o.type_desc = 'USER_TABLE'
EXEC (@sql);
I like this method as it is very customizable: you can exclude or include certain tables based on conditions, and it avoids a cursor. You can also change the EXEC to a PRINT to see the code that will execute and run it manually.
Condition to exclude a table
AND o.name NOT IN ('tblTest','tblTest1');
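For example, a preview run of the disable script with that exclusion folded in might look like:
DECLARE @sql AS VARCHAR(MAX);
SET @sql = '';
SELECT
@sql = @sql + 'ALTER INDEX [' + i.name + '] ON [' + o.name + '] DISABLE; '
FROM sys.indexes AS i
JOIN sys.objects AS o ON i.object_id = o.object_id
WHERE i.type_desc = 'NONCLUSTERED'
AND o.type_desc = 'USER_TABLE'
AND o.name NOT IN ('tblTest','tblTest1');
PRINT @sql; -- review the generated commands instead of running them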
EXEC sp_MSforEachTable 'ALTER INDEX ALL ON ? DISABLE'
and
EXEC sp_MSforEachTable 'ALTER INDEX ALL ON ? REBUILD'
is all you need if you want to do it for all tables and every index. Note that ALL includes the clustered index, and a table with a disabled clustered index is inaccessible until it is rebuilt.

Automate INDEX rebuild based on fragmentation results?

Is it possible to add a maintenance job to check index fragmentation and, if it is greater than 50%, rebuild those indexes automatically?
Index size can vary from 100MB to 10GB.
SQL 2005.
Thank you.
I use this script. Please note, I would advise reading up on the DMVs I am using here; they are a hidden gem in SQL 2005+.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
CREATE TABLE #FragmentedIndexes
(
DatabaseName SYSNAME
, SchemaName SYSNAME
, TableName SYSNAME
, IndexName SYSNAME
, [Fragmentation%] FLOAT
)
INSERT INTO #FragmentedIndexes
SELECT
DB_NAME(DB_ID()) AS DatabaseName
, ss.name AS SchemaName
, OBJECT_NAME (s.object_id) AS TableName
, i.name AS IndexName
, s.avg_fragmentation_in_percent AS [Fragmentation%]
FROM sys.dm_db_index_physical_stats(db_id(),NULL, NULL, NULL, 'SAMPLED') s
INNER JOIN sys.indexes i ON s.[object_id] = i.[object_id]
AND s.index_id = i.index_id
INNER JOIN sys.objects o ON s.object_id = o.object_id
INNER JOIN sys.schemas ss ON ss.[schema_id] = o.[schema_id]
WHERE s.database_id = DB_ID()
AND i.index_id != 0
AND s.record_count > 0
AND o.is_ms_shipped = 0
DECLARE @RebuildIndexesSQL NVARCHAR(MAX)
SET @RebuildIndexesSQL = ''
SELECT
@RebuildIndexesSQL = @RebuildIndexesSQL +
CASE
WHEN [Fragmentation%] > 30
THEN CHAR(10) + 'ALTER INDEX ' + QUOTENAME(IndexName) + ' ON '
+ QUOTENAME(SchemaName) + '.'
+ QUOTENAME(TableName) + ' REBUILD;'
WHEN [Fragmentation%] > 10
THEN CHAR(10) + 'ALTER INDEX ' + QUOTENAME(IndexName) + ' ON '
+ QUOTENAME(SchemaName) + '.'
+ QUOTENAME(TableName) + ' REORGANIZE;'
END
FROM #FragmentedIndexes
WHERE [Fragmentation%] > 10
-- PRINT truncates long strings, so emit the generated commands in 4000-character chunks
DECLARE @StartOffset INT
DECLARE @Length INT
SET @StartOffset = 0
SET @Length = 4000
WHILE (@StartOffset < LEN(@RebuildIndexesSQL))
BEGIN
PRINT SUBSTRING(@RebuildIndexesSQL, @StartOffset, @Length)
SET @StartOffset = @StartOffset + @Length
END
PRINT SUBSTRING(@RebuildIndexesSQL, @StartOffset, @Length)
EXECUTE sp_executesql @RebuildIndexesSQL
DROP TABLE #FragmentedIndexes
Also keep in mind that this script can run a while and block access to your tables. Unless you have Enterprise Edition, SQL Server can lock the table when rebuilding an index, which blocks all queries that use the index until the rebuild is finished. Thus it is not advised to run index rebuilds during operational hours, only during maintenance windows. If you are running Enterprise Edition you can use the ONLINE = ON option to defrag indexes online. This will use more space, but your tables won't be blocked/locked during the defrag operation.
Shout if you need more information.
UPDATED:
If you are running this query on a smaller database you can probably use the 'DETAILED' parameter in the call to sys.dm_db_index_physical_stats, which performs a more thorough examination of the indexes. The discussion in the comments also points out that on much larger tables it is worth using a SAMPLED scan, as this reduces the time needed for the index scan.
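The only change is the mode argument in the DMV call, e.g.:
-- DETAILED reads every page; SAMPLED reads a sample, which is faster on large tables
SELECT * FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'DETAILED');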
In case you were thinking of avoiding temp tables and string parsing to build the list of SQL statements, here is an efficient way of accomplishing it:
USE databasename
GO
DECLARE @Queryresult NVARCHAR(4000)
SET @Queryresult=''
SELECT
@Queryresult=@Queryresult + 'ALTER INDEX ' + QUOTENAME(i.name) + ' ON '
+ QUOTENAME('dbo') + '.'
+ QUOTENAME(OBJECT_NAME(i.OBJECT_ID)) + ' REBUILD;'
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'SAMPLED') ss
INNER JOIN sys.indexes i ON i.OBJECT_ID = ss.OBJECT_ID AND i.index_id = ss.index_id
INNER JOIN sys.objects o ON ss.object_id = o.object_id
WHERE ss.avg_fragmentation_in_percent > 50
AND ss.record_count > 0
AND o.is_ms_shipped = 0 --Excludes any objects created as a part of SQL Server installation
AND ss.index_id > 0 --Excludes heaps
EXEC sp_executesql @Queryresult
Yes, you can.
You can get the fragmented indexes using this query:
SELECT OBJECT_NAME(i.OBJECT_ID) AS TableName,
i.name AS IndexName,
indexstats.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'DETAILED') indexstats
INNER JOIN sys.indexes i ON i.OBJECT_ID = indexstats.OBJECT_ID
AND i.index_id = indexstats.index_id
WHERE indexstats.avg_fragmentation_in_percent > 20
and based on the result, just build a command to recreate them.
I would wrap everything in a stored procedure and call it from a SQL Server job.
FYI, 50% is very high fragmentation. I would go with less.
You can use sys.dm_db_index_physical_stats to get information about your index fragmentation (see the avg_fragmentation_in_percent column). Then you can do an alter index with the rebuild clause whenever your threshold is reached.
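A minimal sketch of that approach, with hypothetical table and index names:
-- Rebuild a single index only when its fragmentation crosses the threshold
IF EXISTS (SELECT 1
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.YourTable'), NULL, NULL, 'SAMPLED')
WHERE avg_fragmentation_in_percent > 50 AND index_id > 0)
ALTER INDEX [IX_YourIndex] ON dbo.YourTable REBUILD;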