Update Query with NVARCHAR(max) in SQL Server

I have an issue with SQL Server performance and wanted to see if anyone can give some tips on improving the performance of an update query.
What I'm doing is updating one table with data from another table. Here's some of the basics:
SQL Server 2008 R2
Data is pumped into the WO table, originally from another system (pumped in using a DataReader and SqlBulkCopy in ADO.NET)
Additional data is pumped into TEMP_REMARKS (also using a DataReader and SqlBulkCopy in ADO.NET)
Unfortunately, combining WO and REMARKS in the originating system (via the reader query) is not possible (mainly for performance reasons)
The update to WO uses values from TEMP_REMARKS, where two columns are updated
Note that the REMARKS column being transferred from TEMP_REMARKS is nvarchar(max) and is being placed into another nvarchar(max) column (actually two - see query)
WO has 4m+ records
TEMP_REMARKS has 7m+ records
For the join between the two, the following is what is being used:
/* === UPDATE THE DESCRIPTION */
UPDATE WO
SET WO_DESCRIPTION = TEMP_REMARKS.REMARKS
FROM WO
INNER JOIN TEMP_REMARKS ON WO.WO_DESCRIPTION_ID = TEMP_REMARKS.REMARKS_ID;
/* === UPDATE THE FINDINGS */
UPDATE WO
SET FINDINGS = TEMP_REMARKS.REMARKS
FROM WO
INNER JOIN TEMP_REMARKS ON WO.FINDINGS_ID = TEMP_REMARKS.REMARKS_ID;
The problem at this point is that the update to the WO table is taking over two hours to complete. I've tried using the MERGE statement with no success. I've got other, more complicated procedures in the db that don't take nearly as long, so I'm convinced that it is not the configuration of the SQL Server itself.
Is there something that should be done when updating nvarchar(max) columns?
What can be done to improve the performance of this query?
Here are the table definitions:
CREATE TABLE [dbo].[WO](
[DOCUMENT_ID] [decimal](18, 0) NOT NULL,
[WO_DESCRIPTION_ID] [decimal](18, 0) NULL,
[WO_DESCRIPTION] [nvarchar](max) NULL,
[FINDINGS_ID] [decimal](18, 0) NULL,
[FINDINGS] [nvarchar](max) NULL,
.... bunch of other fields
CONSTRAINT [PK_WO] PRIMARY KEY CLUSTERED
(
[DOCUMENT_ID] ASC
)
This is the table definition for the TEMP_REMARKS:
CREATE TABLE [dbo].[TEMP_REMARKS](
[REMARKS_ID] [decimal](18, 0) NOT NULL,
[REMARKS] [nvarchar](max) NULL
) ON [PRIMARY]

I think, first of all, you should consider creating a primary key on TEMP_REMARKS, or at least some index on REMARKS_ID.
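A minimal sketch of that suggestion (the constraint and index names are hypothetical):
ALTER TABLE [dbo].[TEMP_REMARKS]
    ADD CONSTRAINT [PK_TEMP_REMARKS] PRIMARY KEY CLUSTERED ([REMARKS_ID]);
-- or, if a primary key cannot be added, at least a unique nonclustered index:
-- CREATE UNIQUE NONCLUSTERED INDEX [IX_TEMP_REMARKS_REMARKS_ID]
--     ON [dbo].[TEMP_REMARKS] ([REMARKS_ID]);
Either way the join on REMARKS_ID gets an index to seek on instead of scanning 7m+ rows for each update.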

Missed row when running SELECT with READCOMMITTEDLOCK

I have a T-SQL code that delta-copies data from the source table (SrcTable) to the destination table (DestTable). The data is inserted into the source table by multiple sessions and copied to the destination table by a SQL Server Agent job.
Here's the snippet which inserts the batch into the destination table:
...
WITH cte
AS (SELECT st.SrcTable_ID,
           st.SrcTable_CreatedDateTime
    FROM SrcTable st WITH (READCOMMITTEDLOCK, INDEX(PK_SrcTable))
    WHERE st.SrcTable_ID BETWEEN @FromID AND @ToID)
INSERT DestTable
(
    DestTable_SrcTableID
)
SELECT cte.SrcTable_ID
FROM cte;
...
Both tables are partitioned on the CreatedDateTime column, which defaults to SYSUTCDATETIME().
CREATE TABLE [dbo].[SrcTable](
[SrcTable_ID] [BIGINT] IDENTITY(1,1) NOT NULL,
[SrcTable_CreatedDateTime] [DATETIME2](3) NOT NULL,
CONSTRAINT [PK_SrcTable] PRIMARY KEY CLUSTERED
(
[SrcTable_ID] ASC,
[SrcTable_CreatedDateTime] ASC
) ON [ps_Daily]([SrcTable_CreatedDateTime])
) ON [ps_Daily]([SrcTable_CreatedDateTime])
GO
CREATE TABLE [dbo].[DestTable](
[DestTable_ID] [BIGINT] IDENTITY(1,1) NOT NULL,
[DestTable_CreatedDateTime] [DATETIME2](3) NOT NULL,
[DestTable_SrcTableID] [BIGINT] NOT NULL,
CONSTRAINT [PK_DestTable] PRIMARY KEY CLUSTERED
(
[DestTable_ID] ASC,
[DestTable_CreatedDateTime] ASC
) ON [ps_Daily]([DestTable_CreatedDateTime])
) ON [ps_Daily]([DestTable_CreatedDateTime])
GO
This code has been running for years copying millions of records a day with no issues.
Recently it started missing a single row every couple of weeks.
Here's an example of such a batch with @FromID = 2140 and @ToID = 2566 and one missing row (2140):
SELECT * FROM dbo.SrcTable st
LEFT JOIN dbo.DestTable dt ON st.SrcTable_ID=dt.DestTable_SrcTableID
WHERE st.SrcTable_ID BETWEEN 2140 AND 2566
ORDER BY st.SrcTable_ID ASC
The only plausible explanation that I can think of is that the allocation of identity values (SrcTable_ID) happens outside of the transaction which inserts into the source table (which I learned from an excellent answer by Paul White on a related question), but judging by the time stamps in both tables this scenario seems highly unlikely.
The question is:
How likely is it that the missing row was invisible to the SELECT statement because its identity was allocated outside of the inserting transaction and before the lock was acquired, given that the next row in the batch (2141) was inserted into the source table a couple of seconds later but was successfully picked up?
We're running on Microsoft SQL Server 2019 (RTM-CU16) (KB5011644) - 15.0.4223.1 (X64)

IF-THEN-ELSE: Create / Truncate Table - SQL Server

I am attempting to create a backup table without having to re-create it every single time. If the table already exists on the next run, then it should simply be truncated.
But it doesn't seem to be working. It says backup_reportsettings is already in the database. Can anyone assist me with this?
--Only re-create table if table does not exist otherwise truncate the existing table.
IF NOT EXISTS (SELECT * FROM [Misc].sys.tables where name= 'dbo.backup_reportsettings')
CREATE TABLE [MISC].dbo.backup_reportsettings
(
[datestamp] [datetime] NULL,
[reportsettingid] [char](8) NOT NULL,
[description] [char](30) NOT NULL,
[formname] [char](30) NOT NULL,
[usersid] [char](8) NOT NULL,
[settings] [text] NOT NULL,
[notes] [varchar](255) NOT NULL,
[userdefault] [char](1) NOT NULL
)
ELSE
TRUNCATE TABLE [Misc].dbo.backup_reportsettings;
What am I doing wrong? Note: this is done within a transaction.
Object names in sys.tables don't have the schema as part of the name. Remove the table schema when verifying whether the table exists:
IF NOT EXISTS (SELECT * FROM [Misc].sys.tables where name= 'backup_reportsettings')
Despite the use of IF, SQL Server needs to Parse/Compile all the statements in your script, so when it sees a CREATE TABLE statement it will give you a compilation error if the table already exists, even though the IF would prevent that code from being executed when that is the case.
The way to get around this is to put your CREATE TABLE statement in dynamic SQL, which will not be parsed/compiled before execution.
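A minimal sketch of the dynamic SQL approach, reusing the column list from the question (the exact EXEC string is illustrative):
IF NOT EXISTS (SELECT * FROM [Misc].sys.tables WHERE name = 'backup_reportsettings')
    EXEC('CREATE TABLE [Misc].dbo.backup_reportsettings
          (
              [datestamp]       [datetime] NULL,
              [reportsettingid] [char](8) NOT NULL,
              [description]     [char](30) NOT NULL,
              [formname]        [char](30) NOT NULL,
              [usersid]         [char](8) NOT NULL,
              [settings]        [text] NOT NULL,
              [notes]           [varchar](255) NOT NULL,
              [userdefault]     [char](1) NOT NULL
          )');
ELSE
    TRUNCATE TABLE [Misc].dbo.backup_reportsettings;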

Get value of PRIMARY KEY during SELECT in ORACLE

For a specific task I need to store the identity of a row in a table so I can access it later. Most of these tables do NOT have a numeric ID, and the primary key sometimes consists of multiple fields, VARCHAR & INT combined.
Background info:
The participating tables have a trigger storing delete, update and insert events in a general 'sync' table (Oracle v11). Every 15 minutes a script is launched to update the corresponding tables in a remote database (SQL Server 2012).
One solution I came up with was to use multiple columns in this 'sync' table, 3 INT columns and 3 VARCHAR columns. A table with 2 VARCHAR columns would then use 2 VARCHAR columns in this 'sync' table.
A better/nicer solution would be to 'select' the value of the primary key and store this in this table.
Example:
CREATE TABLE [dbo].[Workers](
[company] [nvarchar](50) NOT NULL,
[number] [int] NOT NULL,
[name] [nvarchar](50) NOT NULL,
CONSTRAINT [PK_Workers] PRIMARY KEY CLUSTERED ( [company] ASC, [number] ASC )
)
-- Fails:
SELECT [PK_Workers], [name] FROM [dbo].[Workers]
UPDATE [dbo].[Workers] SET [name]='new name' WHERE [PK_Workers]=@PKWorkers
-- Bad (?) but works:
SELECT ([company] + CAST([number] AS NVARCHAR)) PK, [name] FROM [dbo].[Workers];
UPDATE [dbo].[Workers] SET [name]='newname' WHERE ([company] + CAST([number] AS NVARCHAR))=@PK
The [PK_Workers] fails in these queries. Is there another way to get this value without manually combining and casting the index?
Or is there some other way to do this that I don't know?
For each table, create a function returning the concatenated primary key, and create a function-based index on that function too. Then use the function in your SELECT and WHERE clauses.
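A rough Oracle sketch of that idea (all names are hypothetical; an Oracle equivalent of the Workers table with columns COMPANY, NUM and NAME is assumed):
CREATE OR REPLACE FUNCTION workers_pk (p_company IN VARCHAR2, p_num IN NUMBER)
    RETURN VARCHAR2 DETERMINISTIC
IS
BEGIN
    RETURN p_company || '|' || TO_CHAR(p_num);
END;
/
-- function-based index so lookups on the concatenated key can use an index seek
CREATE INDEX idx_workers_pk ON workers (workers_pk(company, num));

SELECT workers_pk(company, num) AS pk, name FROM workers;

UPDATE workers SET name = 'new name' WHERE workers_pk(company, num) = :pk;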

Inner join between different database

I want to create a table using the following script in a database called DeltaDatabase:
CREATE TABLE [dbo].[OutStatus](
[Id] [bigint] IDENTITY(1,1) NOT NULL,
[OutId] [int] NOT NULL,
[StatusType] [varchar](255) NULL,
[StatusDate] [datetime] NULL)
I would then like to INNER JOIN a column into this table from another database called CoreDatabase.
The column name is sourceId from the table Client. So, in other words, OutId needs to be a foreign key referencing SourceId.
How do I join that column into my OutStatus table from the other database using the create table script?
The basic syntax to retrieve data would be:
SELECT *
FROM CoreDatabase.dbo.Client c
INNER JOIN DeltaDatabase.dbo.OutStatus os ON c.SourceId = os.OutId
You need to fully qualify the table names with: DatabaseName.Schema.TableName
You may wish to limit the columns or add a where clause to reduce the data that is returned.
As far as creating a foreign key across databases goes, it's not something you can do. You would have to use triggers or some other logic to maintain referential integrity between the primary and foreign keys.
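A rough sketch of such a trigger (hypothetical name, created in DeltaDatabase), rejecting rows whose OutId has no matching Client.SourceId:
CREATE TRIGGER dbo.TR_OutStatus_CheckOutId
ON dbo.OutStatus
AFTER INSERT, UPDATE
AS
BEGIN
    IF EXISTS (SELECT 1
               FROM inserted AS i
               WHERE NOT EXISTS (SELECT 1
                                 FROM CoreDatabase.dbo.Client AS c
                                 WHERE c.SourceId = i.OutId))
    BEGIN
        RAISERROR('OutId not found in CoreDatabase.dbo.Client.', 16, 1);
        ROLLBACK TRANSACTION;
    END;
END;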
Try the below query
Select * from DeltaDatabase.dbo.OutStatus OUS
Inner Join CoreDatabase.dbo.Client CL on OUS.OutId=CL.sourceId

How to maintain history of multiple tables in a single table without using CDC feature

Is it possible to consolidate the history of all the tables into a single table?
I tried to use the CDC feature provided by SQL server 2012 enterprise edition, but for that it creates a copy of every table, which increases the number of tables in the database.
Is it also possible to track the table name and column name in which the DML occurred and insert them into the history table? Will this cause any performance issues?
Here is one solution using triggers.
1 - Create a trigger for each table that you want history on.
2 - Copy the modified data (INSERT, UPDATE, DELETE) from the base table to the audit table as part of the action.
3 - Store all the data in XML format so that multiple tables can store data in the same audit table.
I did cover this in one of my blog articles. It is a great solution for auditing small amounts of data. There might be an overhead concern when dealing with thousands of record changes per second.
Please test before deploying to a production environment!
Here is the audit table that keeps track of the table name as well as the type of change.
/*
Create data level auditing - table.
*/
-- Remove table if it exists
IF EXISTS (SELECT * FROM sys.objects WHERE object_id =
OBJECT_ID(N'[ADT].[LOG_DML_CHANGES]') AND type in (N'U'))
DROP TABLE [ADT].[LOG_DML_CHANGES]
GO
CREATE TABLE [ADT].[LOG_DML_CHANGES]
(
[ChangeId] BIGINT IDENTITY(1,1) NOT NULL,
[ChangeDate] [datetime] NOT NULL,
[ChangeType] [varchar](20) NOT NULL,
[ChangeBy] [nvarchar](256) NOT NULL,
[AppName] [nvarchar](128) NOT NULL,
[HostName] [nvarchar](128) NOT NULL,
[SchemaName] [sysname] NOT NULL,
[ObjectName] [sysname] NOT NULL,
[XmlRecSet] [xml] NULL,
CONSTRAINT [pk_Ltc_ChangeId] PRIMARY KEY CLUSTERED ([ChangeId] ASC)
)
GO
Here is the article.
http://craftydba.com/?p=2060
The article includes an image showing a single [LOG_DML_CHANGES] table fed by multiple [TRG_TRACK_DML_CHGS_XXX] triggers.
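As a rough illustration of the pattern (not the article's exact code; the base table MY_TABLE and the trigger name are hypothetical), each trigger writes the affected rows into the audit table as XML:
CREATE TRIGGER [dbo].[TRG_TRACK_DML_CHGS_MY_TABLE]
ON [dbo].[MY_TABLE]
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;

    -- rows removed (DELETE, or the before-image of an UPDATE)
    INSERT INTO [ADT].[LOG_DML_CHANGES]
        ([ChangeDate], [ChangeType], [ChangeBy], [AppName], [HostName],
         [SchemaName], [ObjectName], [XmlRecSet])
    SELECT GETDATE(), 'DELETE', SUSER_SNAME(), APP_NAME(), HOST_NAME(),
           'dbo', 'MY_TABLE',
           (SELECT * FROM deleted FOR XML RAW('Row'), ROOT('Rows'), TYPE)
    WHERE EXISTS (SELECT 1 FROM deleted);

    -- rows added (INSERT, or the after-image of an UPDATE)
    INSERT INTO [ADT].[LOG_DML_CHANGES]
        ([ChangeDate], [ChangeType], [ChangeBy], [AppName], [HostName],
         [SchemaName], [ObjectName], [XmlRecSet])
    SELECT GETDATE(), 'INSERT', SUSER_SNAME(), APP_NAME(), HOST_NAME(),
           'dbo', 'MY_TABLE',
           (SELECT * FROM inserted FOR XML RAW('Row'), ROOT('Rows'), TYPE)
    WHERE EXISTS (SELECT 1 FROM inserted);
END;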
If you want to record more than the fact that user X inserted/updated/deleted row ID Z in table Y at time T, then this approach will cause problems.
Choose the tables you want to audit, create audit tables for them, and update those from triggers on the base tables. It's a lot of work, but it's the best way of doing it.