Missed row when running SELECT with READCOMMITTEDLOCK

I have T-SQL code that delta-copies data from a source table (SrcTable) to a destination table (DestTable). Data is inserted into the source table by multiple sessions and copied to the destination table by a SQL Server Agent job.
Here's the snippet which inserts the batch into the destination table:
...
WITH cte
AS (SELECT st.SrcTable_ID,
           st.SrcTable_CreatedDateTime
    FROM SrcTable st WITH (READCOMMITTEDLOCK, INDEX(PK_SrcTable))
    WHERE st.SrcTable_ID BETWEEN @FromID AND @ToID)
INSERT DestTable
(
    DestTable_SrcTableID
)
SELECT cte.SrcTable_ID
FROM cte;
...
Both tables are partitioned on their CreatedDateTime column, which defaults to SYSUTCDATETIME():
CREATE TABLE [dbo].[SrcTable](
[SrcTable_ID] [BIGINT] IDENTITY(1,1) NOT NULL,
[SrcTable_CreatedDateTime] [DATETIME2](3) NOT NULL,
CONSTRAINT [PK_SrcTable] PRIMARY KEY CLUSTERED
(
[SrcTable_ID] ASC,
[SrcTable_CreatedDateTime] ASC
) ON [ps_Daily]([SrcTable_CreatedDateTime])
) ON [ps_Daily]([SrcTable_CreatedDateTime])
GO
CREATE TABLE [dbo].[DestTable](
[DestTable_ID] [BIGINT] IDENTITY(1,1) NOT NULL,
[DestTable_CreatedDateTime] [DATETIME2](3) NOT NULL,
[DestTable_SrcTableID] [BIGINT] NOT NULL,
CONSTRAINT [PK_DestTable] PRIMARY KEY CLUSTERED
(
[DestTable_ID] ASC,
[DestTable_CreatedDateTime] ASC
) ON [ps_Daily]([DestTable_CreatedDateTime])
) ON [ps_Daily]([DestTable_CreatedDateTime])
GO
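The partition function and scheme behind [ps_Daily] are not shown in the question. For context, a hypothetical daily RANGE RIGHT setup might look like the sketch below (the function name, boundary values, and filegroup mapping are illustrative only):
CREATE PARTITION FUNCTION [pf_Daily] (DATETIME2(3))
AS RANGE RIGHT FOR VALUES (N'2024-01-01', N'2024-01-02', N'2024-01-03' /* ...one boundary per day... */);
GO
CREATE PARTITION SCHEME [ps_Daily]
AS PARTITION [pf_Daily]
ALL TO ([PRIMARY]);
GO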
This code has been running for years copying millions of records a day with no issues.
Recently it started missing a single row every couple of weeks.
Here's an example of such a batch with @FromID = 2140 and @ToID = 2566 and one missing row (2140):
SELECT * FROM dbo.SrcTable st
LEFT JOIN dbo.DestTable dt ON st.SrcTable_ID=dt.DestTable_SrcTableID
WHERE st.SrcTable_ID BETWEEN 2140 AND 2566
ORDER BY st.SrcTable_ID ASC
The only plausible explanation I can think of is that the allocation of identity values (SrcTable_ID) happens outside of the transaction which inserts into the source table (which I learned from an excellent answer by Paul White on a related question), but judging by the timestamps in both tables this scenario seems highly unlikely.
The question is:
How likely is it that the missing row was invisible to the SELECT statement because its identity was allocated outside of the inserting transaction and before the lock was acquired, given that the next row in the batch (2141) was inserted into the source table a couple of seconds later but was successfully picked up?
We're running on Microsoft SQL Server 2019 (RTM-CU16) (KB5011644) - 15.0.4223.1 (X64)
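For illustration, here is a minimal sketch of the interleaving the question is asking about (hypothetical session timeline; DEFAULT VALUES works here only because SrcTable_CreatedDateTime has a default). Identity values are allocated independently of transaction commit order, so a lower ID can become visible after a higher one:
-- Session 1 (hypothetical): insert inside a longer-running transaction
BEGIN TRANSACTION;
INSERT dbo.SrcTable DEFAULT VALUES;   -- allocates SrcTable_ID = 2140; not yet committed

-- Session 2 (hypothetical): insert that autocommits immediately
INSERT dbo.SrcTable DEFAULT VALUES;   -- allocates SrcTable_ID = 2141 and is visible right away

-- Session 1, a couple of seconds later
COMMIT;                               -- 2140 only now becomes visible, after 2141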

Related

create index clause on table variable

I need to create an index on two columns (within a table variable) which do not form a unique key.
The table structure is shown below:
DECLARE @Sample TABLE (
    [AssetSk] [int] NOT NULL,
    [DateSk] [int] NOT NULL,
    [Count] [numeric](38, 2) NULL
)
I am trying to add an index as shown below:
INDEX AD1 CLUSTERED([AssetSk],[DateSk])
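For reference, here is a sketch of the complete declaration that fragment belongs to (combining it with the table definition above):
DECLARE @Sample TABLE (
    [AssetSk] [int] NOT NULL,
    [DateSk] [int] NOT NULL,
    [Count] [numeric](38, 2) NULL,
    INDEX AD1 CLUSTERED ([AssetSk], [DateSk])
)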
However, it gives me the following error when running it on SQL Server 2012:
"Incorrect syntax near 'INDEX'. If this is intended as a part of a table hint, A WITH keyword and parenthesis are now required. See SQL Server Books Online for proper syntax."
However, this runs perfectly on SQL Server 2014. Is there any way I could run it on SQL Server 2012?
You can't create an index other than through a UNIQUE or PRIMARY KEY constraint on a table variable in SQL Server versions prior to 2014.
However, there is a trick: add one more column with an auto-incremented value and create a unique index covering the columns you need plus this new one.
DECLARE @Sample TABLE (
    [ID] bigint identity(1, 1),
    [AssetSk] [int] NOT NULL,
    [DateSk] [int] NOT NULL,
    [Count] [numeric](38, 2) NULL,
    UNIQUE NONCLUSTERED ([AssetSk], [DateSk], ID)
)
Update: in practice such an index on a table variable may be of little use. SQL Server normally estimates that a table variable contains a single row, so there is a good chance the optimizer will not use this index.
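As a side note (not part of the original answer), one common way to let the optimizer see the table variable's real row count is to add OPTION (RECOMPILE) to the statement that reads it; a minimal sketch with a hypothetical filter value:
SELECT s.[AssetSk], s.[DateSk], s.[Count]
FROM @Sample AS s
WHERE s.[AssetSk] = 42   -- hypothetical predicate
OPTION (RECOMPILE);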
As far as I know, in SQL Server 2012 and earlier you cannot add indexes to table variables. To add an index you must declare a temporary table instead, like this:
CREATE TABLE #Sample (
[AssetSk] [int] NOT NULL,
[DateSk] [int] NOT NULL,
[Count] [numeric](38, 2) NULL
)
Then you can create the index you need, like this:
CREATE CLUSTERED INDEX IX_MyIndex
ON #Sample ([AssetSk],[DateSk])
Of course, after you're done with the table in your function you can call:
DROP TABLE #Sample

Get value of PRIMARY KEY during SELECT in ORACLE

For a specific task I need to store the identity of a row in a table so I can access it later. Most of these tables do NOT have a numeric ID, and the primary key sometimes consists of multiple fields, VARCHAR & INT combined.
Background info:
The participating tables have a trigger storing delete, update and insert events in a general 'sync' table (Oracle v11). Every 15 minutes a script is then launched to update corresponding tables in a remote database (SQL Server 2012).
One solution I came up with was to use multiple columns in this 'sync' table, 3 INT columns and 3 VARCHAR columns. A table with 2 VARCHAR columns would then use 2 VARCHAR columns in this 'sync' table.
A better/nicer solution would be to 'select' the value of the primary key and store this in this table.
Example:
CREATE TABLE [dbo].[Workers](
[company] [nvarchar](50) NOT NULL,
[number] [int] NOT NULL,
[name] [nvarchar](50) NOT NULL,
CONSTRAINT [PK_Workers] PRIMARY KEY CLUSTERED ( [company] ASC, [number] ASC )
)
-- Fails:
SELECT [PK_Workers], [name] FROM [dbo].[Workers]
UPDATE [dbo].[Workers] SET [name]='new name' WHERE [PK_Workers]=@PKWorkers
-- Bad (?) but works:
SELECT ([company] + CAST([number] AS NVARCHAR)) PK, [name] FROM [dbo].[Workers];
UPDATE [dbo].[Workers] SET [name]='newname' WHERE ([company] + CAST([number] AS NVARCHAR))=@PK
The [PK_Workers] fails in these queries. Is there another way to get this value without manually combining and casting the index?
Or is there some other way to do this that I don't know?
For each table, create a function returning the concatenated primary key. Create a function-based index on this function too. Then use this function in the SELECT and WHERE clauses.
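The answer above is phrased in Oracle terms. On the SQL Server side (the dialect of the [Workers] example), the closest equivalent to a function-based index would be an indexed computed column; a minimal sketch, with a hypothetical column name:
-- Add a computed column that concatenates the key parts, then index it
ALTER TABLE [dbo].[Workers]
    ADD [PK_Concat] AS ([company] + N'|' + CAST([number] AS NVARCHAR(10))) PERSISTED;

CREATE NONCLUSTERED INDEX [IX_Workers_PK_Concat] ON [dbo].[Workers] ([PK_Concat]);

-- The queries from the question then become:
SELECT [PK_Concat], [name] FROM [dbo].[Workers];
UPDATE [dbo].[Workers] SET [name] = 'new name' WHERE [PK_Concat] = @PK;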

How to maintain history of multiple tables in a single table without using CDC feature

Is it possible to consolidate the history of all the tables into a single table?
I tried to use the CDC feature provided by SQL Server 2012 Enterprise Edition, but it creates a copy of every table, which increases the number of tables in the database.
Is it also possible to track & insert the table name & column name in which the DML occurred into the history table? Will this cause any issues with performance?
Here is one solution using triggers.
1 - Create a trigger for each table that you want history on.
2 - Copy the modified data (INS, UPD, DEL) from base table to audit table during the action.
3 - Store all the data in XML format so that multiple tables can store data in the same audit table.
I did cover this in one of my blog articles. It is a great solution for auditing small amounts of data. There might be an overhead concern when dealing with thousands of record changes per second.
Please test before deploying to a production environment!
Here is the audit table that keeps track of the table name as well as the type of change.
/*
Create data level auditing - table.
*/
-- Remove table if it exists
IF EXISTS (SELECT * FROM sys.objects WHERE object_id =
OBJECT_ID(N'[ADT].[LOG_DML_CHANGES]') AND type in (N'U'))
DROP TABLE [ADT].[LOG_DML_CHANGES]
GO
CREATE TABLE [ADT].[LOG_DML_CHANGES]
(
[ChangeId] BIGINT IDENTITY(1,1) NOT NULL,
[ChangeDate] [datetime] NOT NULL,
[ChangeType] [varchar](20) NOT NULL,
[ChangeBy] [nvarchar](256) NOT NULL,
[AppName] [nvarchar](128) NOT NULL,
[HostName] [nvarchar](128) NOT NULL,
[SchemaName] [sysname] NOT NULL,
[ObjectName] [sysname] NOT NULL,
[XmlRecSet] [xml] NULL,
CONSTRAINT [pk_Ltc_ChangeId] PRIMARY KEY CLUSTERED ([ChangeId] ASC)
)
GO
Here is the article.
http://craftydba.com/?p=2060
The image below shows a single [LOG_DML_CHANGES] table with multiple [TRG_TRACK_DML_CHGS_XXX] triggers.
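For illustration, here is a minimal sketch of what one such trigger might look like (this is not the article's code; the table name [dbo].[MyTable] and the trigger name are placeholders):
CREATE TRIGGER [dbo].[TRG_TRACK_DML_CHGS_MYTABLE]
ON [dbo].[MyTable]   -- hypothetical audited table
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;

    -- Ignore statements that affected zero rows
    IF NOT EXISTS (SELECT 1 FROM inserted) AND NOT EXISTS (SELECT 1 FROM deleted)
        RETURN;

    DECLARE @ChangeType VARCHAR(20) =
        CASE
            WHEN EXISTS (SELECT 1 FROM inserted) AND EXISTS (SELECT 1 FROM deleted) THEN 'UPDATE'
            WHEN EXISTS (SELECT 1 FROM inserted) THEN 'INSERT'
            ELSE 'DELETE'
        END;

    INSERT [ADT].[LOG_DML_CHANGES]
        ([ChangeDate], [ChangeType], [ChangeBy], [AppName], [HostName],
         [SchemaName], [ObjectName], [XmlRecSet])
    SELECT GETDATE(),
           @ChangeType,
           SUSER_SNAME(),
           APP_NAME(),
           HOST_NAME(),
           N'dbo',
           N'MyTable',
           -- new row image for INSERT/UPDATE, old row image for DELETE
           CASE WHEN @ChangeType = 'DELETE'
                THEN (SELECT * FROM deleted  FOR XML PATH('row'), ROOT('rows'))
                ELSE (SELECT * FROM inserted FOR XML PATH('row'), ROOT('rows'))
           END;
END
GO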
If you want to do more than record that user X updated/deleted/inserted table Y, row Z, at time T, then this approach will cause problems.
Choose the tables you want to audit, create audit tables for them, and update them from triggers on the base table. It is a lot of work, but it is the best way of doing it.

Update Query with NVARCHAR(max) in SQL Server

I have an issue with SQL Server performance and wanted to see if anyone can give some tips on improving the performance of an update query.
What I'm doing is updating one table with data from another table. Here's some of the basics:
SQL Server 2008 R2
Data is pumped into the WO table from another system (using a DataReader and SqlBulkCopy in ADO.NET)
Additional data is pumped into TEMP_REMARKS (also using a DataReader and SqlBulkCopy in ADO.NET)
Unfortunately, combining WO and REMARKS in the originating system (via the reader query) is not possible (mainly for performance reasons)
The update to WO uses values from TEMP_REMARKS, and two columns are updated
Note that the column being transferred (TEMP_REMARKS.REMARKS) is an nvarchar(max) and is being placed into another nvarchar(max) column (actually two - see query)
WO has 4m+ records
TEMP_REMARKS has 7m+ records
For the join between the two, the following is what is being used:
/* === UPDATE THE DESCRIPTION */
UPDATE WO
SET WO_DESCRIPTION = TEMP_REMARKS.REMARKS
FROM WO
INNER JOIN TEMP_REMARKS ON WO.WO_DESCRIPTION_ID = TEMP_REMARKS.REMARKS_ID;
/* === UPDATE THE FINDINGS */
UPDATE WO
SET FINDINGS = TEMP_REMARKS.REMARKS
FROM WO
INNER JOIN TEMP_REMARKS ON WO.FINDINGS_ID = TEMP_REMARKS.REMARKS_ID;
The problem at this point is that the update to the WO table is taking over two hours to complete. I've tried using the MERGE statement with no success. I've got other more complicated procedures in the db that don't take nearly as long, so I'm convinced that it is not the configuration of the SQL Server itself.
Is there something that should be done when updating nvarchar(max) columns?
What can be done to improve the performance of this query?
Here are the table definitions:
CREATE TABLE [dbo].[WO](
[DOCUMENT_ID] [decimal](18, 0) NOT NULL,
[WO_DESCRIPTION_ID] [decimal](18, 0) NULL,
[WO_DESCRIPTION] [nvarchar](max) NULL,
[FINDINGS_ID] [decimal](18, 0) NULL,
[FINDINGS] [nvarchar](max) NULL,
.... bunch of other fields
CONSTRAINT [PK_WO] PRIMARY KEY CLUSTERED
(
[DOCUMENT_ID] ASC
)
This is the table definition for the TEMP_REMARKS:
CREATE TABLE [dbo].[TEMP_REMARKS](
[REMARKS_ID] [decimal](18, 0) NOT NULL,
[REMARKS] [nvarchar](max) NULL
) ON [PRIMARY]
I think, first of all, you should consider creating a primary key on TEMP_REMARKS, or at least some index on REMARKS_ID.
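A minimal sketch of that suggestion, assuming REMARKS_ID uniquely identifies a row in TEMP_REMARKS (use the plain index instead if it does not):
ALTER TABLE [dbo].[TEMP_REMARKS]
    ADD CONSTRAINT [PK_TEMP_REMARKS] PRIMARY KEY CLUSTERED ([REMARKS_ID]);

-- or, if REMARKS_ID is not guaranteed unique:
CREATE NONCLUSTERED INDEX [IX_TEMP_REMARKS_REMARKS_ID]
    ON [dbo].[TEMP_REMARKS] ([REMARKS_ID]);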

How to update 2nd table with identity value of inserted rows into 1st table

I have the following table structures
CREATE TABLE [dbo].[WorkItem](
[WorkItemId] [int] IDENTITY(1,1) NOT NULL,
[WorkItemTypeId] [int] NOT NULL,
[ActionDate] [datetime] NOT NULL,
[WorkItemStatusId] [int] NOT NULL,
[ClosedDate] [datetime] NULL
)
CREATE TABLE [dbo].[RequirementWorkItem](
[WorkItemId] [int] NOT NULL,
[RequirementId] [int] NOT NULL
)
CREATE TABLE #RequirmentWorkItems
(
RequirementId int,
WorkItemTypeId int,
WorkItemStatusId int,
ActionDate datetime
)
I use the #RequirmentWorkItems table to create workitems for requirements. I then need to INSERT the workitems into the WorkItem table and use the identity values from the WorkItem table to create the cross-reference rows in the RequirementWorkItem table.
Is there a way to do this without cursoring through each row? And I can't put the RequirementId into the WorkItem table because, depending on the WorkItemTypeId, the WorkItem could be linked to a Requirement or a Notice or an Event.
So there are really 3 xref tables for WorkItems. Or would it be better to put RequirementId, NoticeId and EventId columns in the WorkItem table, where one of the columns would have a value and the other two would be NULL? Hopefully all this makes sense. Thanks.
You should read "MERGE and OUTPUT – the Swiss army knife of T-SQL" for more information about this.
Today I stumbled upon a different use for it, returning values using an OUTPUT clause from a table used as the source of data for an insertion. In other words, if I’m inserting from [tableA] into [tableB] then I may want some values from [tableA] after the fact, particularly if [tableB] has an identity. Observe my first attempt using a straight insertion where I am trying to get a field from #source.[id] that is not used in the insertion:
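As a minimal sketch (not the article's code), here is how the MERGE ... OUTPUT technique could be applied to the tables from the question; the @Mapping table variable is a hypothetical intermediate:
DECLARE @Mapping TABLE (WorkItemId int, RequirementId int);

MERGE [dbo].[WorkItem] AS tgt
USING #RequirmentWorkItems AS src
   ON 1 = 0                                     -- never matches, so every source row is inserted
WHEN NOT MATCHED THEN
    INSERT (WorkItemTypeId, ActionDate, WorkItemStatusId)
    VALUES (src.WorkItemTypeId, src.ActionDate, src.WorkItemStatusId)
OUTPUT inserted.WorkItemId, src.RequirementId   -- identity from the target, id from the source
INTO @Mapping (WorkItemId, RequirementId);

INSERT [dbo].[RequirementWorkItem] (WorkItemId, RequirementId)
SELECT WorkItemId, RequirementId
FROM @Mapping;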