CRM 2013 - Deadlocks due to internal update queries? - dynamics-crm-2013

Recently we upgraded our CRM 4 to CRM 2013. We have around 600 users. We are monitoring database performance with a tool and have observed a couple of deadlocks caused by the following internal queries:
Query 1: Updating QuoteNumber counter in OrganizationBase table
update OrganizationBase set @currentval = CurrentQuoteNumber, CurrentQuoteNumber = CurrentQuoteNumber + 1 where OrganizationId = @orgid
Query 2: Updating OrderNumber counter in OrganizationBase table
update OrganizationBase set @currentval = CurrentOrderNumber, CurrentOrderNumber = CurrentOrderNumber + 1 where OrganizationId = @orgid
I don't know what the impact of these deadlocks will be. So far the counters have been updated correctly and all quotes and orders have unique numbers, so there is no data issue.
My questions are,
1) Has anybody noticed such an issue?
2) Can customizations (like plugins/workflows) cause such issues in internal queries?

Related

Best incremental load method using SSIS with over 20 million records

What is needed: I need 25 million records from Oracle incrementally loaded into SQL Server 2012. The package will need to handle UPDATEs, DELETEs and NEW RECORDS. The Oracle data source is always changing.
What I have: I've done this many times before, but never with anything past 10 million records. First I have an [Execute SQL Task] that grabs the result set of the [Max Modified Date]. I then have a query that only pulls data from the [ORACLE SOURCE] > [Max Modified Date] and have that lookup against my destination table.
I have the [ORACLE Source] connecting to the [Lookup-Destination table]; the lookup is set to NO CACHE mode. I get errors if I use partial or full cache mode, because I assume the [ORACLE Source] is always changing. The [Lookup] then connects to a [Conditional Split] where I would input an expression like the one below.
(REPLACENULL(ORACLE.ID,"") != REPLACENULL(Lookup.ID,""))
|| (REPLACENULL(ORACLE.CASE_NUMBER,"")
!= REPLACENULL(Lookup.CASE_NUMBER,""))
I would then have the rows that the [Conditional Split] outputs go into a staging table. I then add an [Execute SQL Task] and perform an UPDATE of the DESTINATION-TABLE with the query below:
UPDATE SD
SET SD.CASE_NUMBER = UP.CASE_NUMBER,
SD.ID = UP.ID
FROM Destination SD
JOIN STAGING.TABLE UP
ON UP.ID = SD.ID
Problem: This becomes very slow and takes a very long time and it just keeps running. How can I improve the time and get it to work? Should I use a cache transformation? Should I use a merge statement instead?
How would I use the REPLACENULL expression in the conditional split when it is a date column? Would I use something like:
(REPLACENULL(ORACLE.LAST_MODIFIED_DATE,"01-01-1900 00:00:00.000")
!= REPLACENULL(Lookup.LAST_MODIFIED_DATE,"01-01-1900 00:00:00.000"))
A pattern that is usually faster for larger datasets is to load the source data into a local staging table, then use a query like the one below to identify the new records:
SELECT column1, column2
FROM StagingTable SRC
WHERE NOT EXISTS (
SELECT * FROM TargetTable TGT
WHERE TGT.MatchKey = SRC.MatchKey
)
Then you just feed that dataset into an insert:
INSERT INTO TargetTable (column1, column2)
SELECT column1, column2
FROM StagingTable SRC
WHERE NOT EXISTS (
SELECT * FROM TargetTable TGT
WHERE TGT.MatchKey = SRC.MatchKey
)
Updates look like this:
UPDATE TGT
SET
column1 = SRC.column1,
column2 = SRC.column2,
DTUpdated = GETDATE()
FROM TargetTable TGT
JOIN StagingTable SRC
ON TGT.MatchKey = SRC.MatchKey
Note the additional column DTUpdated. You should always have a 'last updated' column in your table to help with auditing and debugging.
This is an INSERT/UPDATE approach. There are other data load approaches, such as windowing (pick a trailing window of data to be fully deleted and reloaded), but the right approach depends on how your system works and whether you can make assumptions about the data (e.g. posted data in the source will never be changed).
You can squash the separate INSERT and UPDATE statements into a single MERGE statement, although it gets pretty huge, I've had performance issues with it, and there are other documented issues with MERGE.
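For reference, a rough sketch of the MERGE equivalent, reusing the same placeholder StagingTable/TargetTable/MatchKey names as above:

MERGE TargetTable AS TGT
USING StagingTable AS SRC
    ON TGT.MatchKey = SRC.MatchKey
WHEN MATCHED THEN
    UPDATE SET column1 = SRC.column1,
               column2 = SRC.column2,
               DTUpdated = GETDATE()
WHEN NOT MATCHED BY TARGET THEN
    INSERT (column1, column2, DTUpdated)
    VALUES (SRC.column1, SRC.column2, GETDATE());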
Unfortunately, there's not a good way to do what you're trying to do. SSIS has some controls and documented ways to do this, but as you have found they don't work as well when you start dealing with large amounts of data.
At a previous job, we had something similar that we needed to do. We needed to update medical claims from a source system to another system, similar to your setup. For a very long time, we just truncated everything in the destination and rebuilt every night. I think we were doing this daily with more than 25M rows. If you're able to transfer all the rows from Oracle to SQL in a decent amount of time, then truncating and reloading may be an option.
We eventually had to get away from this as our volumes grew, however. We tried to do something along the lines of what you're attempting, but never got anything we were satisfied with. We ended up with a sort of non-conventional process. First, each medical claim had a unique numeric identifier. Second, whenever the medical claim was updated in the source system, there was an incremental ID on the individual claim that was also incremented.
Step one of our process was to bring over any new medical claims, or claims that had changed. We could determine this quite easily, since the unique ID and the "change ID" column were both indexed in source and destination. These records would be inserted directly into the destination table.
The second step was our "deletes", which we handled with a logical flag on the records. For actual deletes, where records existed in the destination but were no longer in the source, I believe it was actually fastest to select the DISTINCT claim numbers from the source system and place them in a temporary table on the SQL side. Then we simply did a LEFT JOIN update to set the missing claims to logically deleted (sketched below).
We did something similar with our updates: if a newer version of the claim was brought over by our original Lookup, we would logically delete the old one. Every so often we would clean up the logical deletes and actually delete them, but since the logical delete indicator was indexed, this didn't need to be done too frequently. We never saw much of a performance hit, even when the logically deleted records numbered in the tens of millions.
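A minimal T-SQL sketch of that logical-delete step; the DestinationClaims table and the ClaimNumber/IsDeleted columns are hypothetical names standing in for whatever your schema uses:

-- Claim numbers still present in the source, copied into a temp table
CREATE TABLE #SourceClaims (ClaimNumber BIGINT PRIMARY KEY);
-- (populate #SourceClaims from the source system here)

-- Logically delete destination claims that no longer exist in the source
UPDATE dst
SET dst.IsDeleted = 1
FROM DestinationClaims dst
LEFT JOIN #SourceClaims src
    ON src.ClaimNumber = dst.ClaimNumber
WHERE src.ClaimNumber IS NULL
  AND dst.IsDeleted = 0;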
This process was always evolving as our server loads and data source volumes changed, and I suspect the same may be true for your process. Because every system and setup is different, some of the things that worked well for us may not work for you, and vice versa. I know our data center was relatively good and we were on some stupid fast flash storage, so truncating and reloading worked for us for a very, very long time. This may not be true on conventional storage, where your data interconnects are not as fast, or where your servers are not colocated.
When designing your process, keep in mind that deletes are one of the more expensive operations you can perform, followed by updates and by non-bulk inserts, respectively.
Incremental Approach using SSIS
Get Max(ID) and Max(ModifiedDate) from the destination table and store them in variables
Create a temporary staging table using an EXECUTE SQL TASK and store the temporary staging table name in a variable
Use a Data Flow Task with an OLEDB Source and OLEDB Destination to pull the data from the source system and load it into the temporary table named by the variable
Use two Execute SQL Tasks, one for the insert process and the other for the update
Drop the temporary table
INSERT INTO sales.salesorderdetails
(
salesorderid,
salesorderdetailid,
carriertrackingnumber,
orderqty,
productid,
specialofferid,
unitprice,
unitpricediscount,
linetotal,
rowguid,
modifieddate
)
SELECT sd.salesorderid,
sd.salesorderdetailid,
sd.carriertrackingnumber,
sd.orderqty,
sd.productid,
sd.specialofferid,
sd.unitprice,
sd.unitpricediscount,
sd.linetotal,
sd.rowguid,
sd.modifieddate
FROM ##salesdetails AS sd WITH (nolock)
WHERE NOT EXISTS
(
SELECT *
FROM sales.salesorderdetails sa
WHERE sa.salesorderdetailid = sd.salesorderdetailid)
AND sd.salesorderdetailid > ?
UPDATE sa
SET SalesOrderID = sd.salesorderid,
CarrierTrackingNumber = sd.carriertrackingnumber,
OrderQty = sd.orderqty,
ProductID = sd.productid,
SpecialOfferID = sd.specialofferid,
UnitPrice = sd.unitprice,
UnitPriceDiscount = sd.unitpricediscount,
LineTotal = sd.linetotal,
rowguid = sd.rowguid,
ModifiedDate = sd.modifieddate
FROM sales.salesorderdetails sa
JOIN ##salesdetails sd
ON sd.salesorderdetailid = sa.salesorderdetailid
WHERE sd.modifieddate > sa.modifieddate
AND sa.salesorderdetailid < ?
The entire process took 2 minutes to complete.
I am assuming you have some identity-like (PK) column in your Oracle table.
1 Get the max identity (business key) from the destination database (the SQL Server one)
2 Create two data flows (see the sketch below)
a) Pull only data > max identity from Oracle and put it into the destination directly (as these are new records).
b) Get all records < max identity with update date > last load and put them into a temp (staging) table (as this is updated data)
3 Update the destination table with the records from the temp table (created at step b)
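A rough sketch of the two source-side queries from step 2; SourceTable, ID and LastUpdateDate are hypothetical names for the Oracle side, and the bind variables stand in for the values captured in step 1:

-- Step 2a: brand-new rows, loaded straight into the destination
SELECT *
FROM SourceTable
WHERE ID > :MaxDestinationID;

-- Step 2b: existing rows changed since the last load, loaded into the staging table
SELECT *
FROM SourceTable
WHERE ID <= :MaxDestinationID
  AND LastUpdateDate > :LastLoadDate;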

SQL Query on table with 30mill records

I have been having problems building a table in my local SQL Server. Originally it was causing tempdb to become full and throw an exception. The query has a lot of joins and outer applies, so to find specifically where the problem lay I did a select on the first table in the query to determine how long it took. That was fast, so I added the next table from the first join and reran, and I continued doing this until I found the table that stalled.
I found the problem (or at least the first problem) was with the shipper_container table. This table is huge, and selecting from it alone raises a System.OutOfMemoryException just displaying the results (it has only 5 columns). The display cuts out at 16 million records, but the table has 30 million rows and is 1.2 GB in size. That doesn't seem so big to me that SQL Server Management Studio couldn't handle it.
Using a WHERE clause to collect values between 1 January and 10 January 2015 still resulted in a query that took over 5 minutes and was still executing when I cancelled it. I have also added indexes on each of the select parameters and this did not increase performance either.
Here is the SQL Query. You can see I have commented out the other parameters that have yet to be added in other joins and outer applies.
DECLARE @startDate DATETIME
DECLARE @endDate DATETIME
DECLARE @Shipper_Key INT = NULL
DECLARE @Part_Key INT = NULL
SET @startDate = '2015-01-01'
SET @endDate = '2015-01-10'
SET NOCOUNT ON;
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
INSERT Shipped_Container
(
Ship_Date,
Invoice_Quantity,
Shipper_No,
Serial_No,
Truck_Key,
Shipper_Key
)
SELECT
S.Ship_Date,
SC.Quantity,
S.Shipper_No,
SC.Serial_No,
S.Truck_Key,
S.Shipper_Key
FROM Shipper AS S
JOIN Shipper_Line AS SL
--ON SL.PCN = S.PCN
ON SL.Shipper_Key = S.Shipper_Key
JOIN Shipper_Container AS SC
--ON SC.PCN = SL.PCN
ON SC.Shipper_Line_Key = SL.Shipper_Line_Key
WHERE S.Ship_Date >= @startDate AND S.Ship_Date <= @endDate
AND S.Shipper_Key = ISNULL(@Shipper_Key, S.Shipper_Key)
AND SL.Part_Key = ISNULL(@Part_Key, SL.Part_Key)
The server instance is run on the local network - could this be an issue? I have minimal experience with this and would really appreciate help that is as detailed and clear as possible. Often in SQL forums people jump right into technical details I don't follow so well.
Don't do a SELECT ... FROM yourtable in SQL Server Management Studio when it returns hundreds of thousands or millions of rows. 1 GB of data gets a lot bigger when the system has to draw and display it in the Management Studio results grid.
The server instance is run on the local network
When you do a SELECT ... FROM yourtable in SSMS, the server must send all the data to your laptop/desktop. This is quite a lot of unneeded pressure on the network.
It should not be an issue when you insert because everything stays on the server. However, staying on the server does not mean it will be fast if your data model is not good enough.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
You may get dirty data if you use that... It may be better to remove it unless you know why it is there and why you need it.
I have also added indexes on each of the select parameters and this did not increase performance either
If you mean indexes on:
S.Ship_Date,
SC.Quantity,
S.Shipper_No,
SC.Serial_No,
S.Truck_Key,
S.Shipper_Key
What are their definitions?
If they are individual single-column indexes, you can drop the indexes on SC.Quantity, S.Shipper_No, SC.Serial_No and S.Truck_Key. They are not used.
Ship_Date and Shipper_Key may be useful. It all depends on your model and existing primary keys (which you need to describe, see below).
It will help to give a more accurate answer if you could tell us:
the relation between your 3 tables (which field links A to B and in which direction)
the primary key on your 3 tables
a complete list of all your indexes(and columns) on your 3 tables
If none of your indexes are useful, or if they are missing, it will most likely read all 3 tables in full and try to match them. Because that is pretty big, there is not enough memory to process it and tempdb is used to store intermediary data.
For now I will suppose that Shipper_Key + PCN is the primary key on each table.
I think you can try this:
You can create an index on S.Ship_Date:
CREATE INDEX IX_Shipper_Ship_Date ON Shipper (Ship_Date) -- subject to updates according to your primary key
The query optimizer may not use the indexes (if they exist) with such a WHERE clause:
AND S.Shipper_Key = ISNULL(@Shipper_Key, S.Shipper_Key)
AND SL.Part_Key = ISNULL(@Part_Key, SL.Part_Key)
Instead, you can use:
AND (S.Shipper_Key = @Shipper_Key OR @Shipper_Key IS NULL)
AND (SL.Part_Key = @Part_Key OR @Part_Key IS NULL)
It would help to have indexes on Shipper_Key and PCN
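A sketch of what those might look like; the index names are made up, and it assumes Shipper_Line and Shipper_Container carry the columns used in the joins (adjust to your actual tables and keys):

CREATE INDEX IX_Shipper_Line_Shipper_Key ON Shipper_Line (Shipper_Key);
CREATE INDEX IX_Shipper_Line_PCN ON Shipper_Line (PCN);
CREATE INDEX IX_Shipper_Container_Shipper_Line_Key ON Shipper_Container (Shipper_Line_Key);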
Finally
As I already said above, we need to know more about your data model (CREATE TABLE...), primary keys and indexes (CREATE INDEX...). You can create a model at http://sqlfiddle.com/ with all 3 CREATE TABLE statements and their indexes, then add the link here.
In SSMS, you can right click on a table and go to Script Table as / Create To / New Query Window and add it here or in http://sqlfiddle.com/. Only keep the CREATE TABLE ... part down to the first GO.
You can then do the same thing for all your indexes.
You should also add a copy of your query plan.
In SSMS, go to Query menu / Display Estimated Execution Plan and right click to save it as xml (xml is better). It is only an estimation and it won't execute the whole query. It should be pretty fast.

Performing SQL updates in single statements vs batches

I'm working with large databases and need advice on how to optimize my selects/updates. Here's an example:
create table Book (
BookID int,
Description nvarchar(max)
)
-- 8 million rows
create table #BookUpdates (
BookID int,
Description nvarchar(max)
)
-- 2 million rows
Let's assume there are 8 million Books and I have to update the description for 2 million of them.
Problem: these updates take a very long time to run. They occasionally cause blocking for users who are also trying to run statements against the database. I've come up with a solution but want to know if there's a better one out there. I have to prepare one-off updates like this a lot (for whatever reason).
-- normal update
update b set b.Description = bu.Description
from Book b
join #BookUpdates bu
on bu.BookID = b.BookID
-- batch update
DECLARE @BookID INT = (SELECT MIN(BookID) FROM #BookUpdates)
DECLARE @MaxBookID INT = (SELECT MAX(BookID) FROM #BookUpdates)
while (@BookID <= @MaxBookID)
begin
update b set b.Description = bu.Description
from Book b
join #BookUpdates bu
on bu.BookID = b.BookID
where bu.BookID >= @BookID
and bu.BookID < @BookID + 5000
set @BookID = @BookID + 5000
end
The second update works a lot faster. I like this solution because I can print status updates to myself on how long it has left, and it doesn't cause performance issues for our customers.
Question: am I missing something important here? Indexes on the temp tables?
I updated the EXAMPLE tables so I don't get more normalization comments. Only 1 description per book :)
You can prevent blocking on the query side by using NOLOCK or READUNCOMMITTED hints in the SQL queries.
The real issue with performance is probably the accumulation of changes in the log. Your method of batching the changes in groups of 5,000 is quite reasonable. Because you are setting up the updates in a batch table, you might as well calculate the batch number in the table and then do the looping based on that (see the sketch below).
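A minimal sketch of that idea, using a hypothetical #BookUpdateBatches helper table alongside the question's #BookUpdates:

-- Precompute which batch each BookID belongs to, about 5,000 rows per batch
SELECT BookID,
       (ROW_NUMBER() OVER (ORDER BY BookID) - 1) / 5000 AS BatchNo
INTO #BookUpdateBatches
FROM #BookUpdates;

-- The WHILE loop can then join to #BookUpdateBatches and step through
-- BatchNo = 0, 1, 2, ... instead of guessing at BookID ranges.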
I would try your own suggestion first and index the temp table before you run the update:
CREATE INDEX IDX_BookID ON #BookUpdates(BookID)
Try it with the index and without the index and see what the impact on the runtime is. If you want to avoid impacting your users for this test, run it outside working hours (if you can) or copy Book to another temp table first and test against that.
Regardless, given the volume, I expect you will still cause blocking for other processes. If you are unable to schedule your updates at a time when no other processes are running against this table (which would be the ideal solution), your existing batch update appears to be a perfectly valid solution. Indexing the temp table will likely help with that too so you may be able to increase the batch size without causing blocking.

Avoid SQL deadlocks on CurrentAccountBalance update

I'm evaluating PostgreSQL for some personal project.
I was inspired by its Multi-Version Concurrency Control (MVCC).
I simulated a basic need - insert a transaction record and update the vendor balance from many threads at the same time, running SQL commands like:
INSERT INTO
VendorAccountTransactions (VendorId, BalanceBefore, BalanceAfter)
VALUES (
1,
(SELECT CurrentBalance FROM VendorAccounts WHERE VendorId = 1),
(SELECT CurrentBalance FROM VendorAccounts WHERE VendorId = 1) + 19.99
);
UPDATE VendorAccounts SET CurrentBalance = CurrentBalance + 19.99 WHERE VendorId = 1;
Any idea how to avoid deadlocks in such a common case?
What is needed is simply to insert the transaction description with "balance before" / "balance after" and update the balance.
It will be used in high load application.
How to achieve the right result for this simple business need?
Thank you.
Update:
Maybe there is another solution - re-design the database to avoid deadlocks, or use some other approach that still satisfies the business need?
Put the update first and include both statements in a transaction. The update will take an update lock on the vendor row and prevent concurrent transactions from proceeding (they will wait until the first transaction completes, because the lock is not available).
This will effectively serialize access to a given vendor which will ensure consistency.
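A minimal sketch of that ordering, reusing the tables from the question (19.99 is just the example amount):

BEGIN;

-- Lock the vendor row first; concurrent transactions for the same vendor
-- block here until this transaction commits
UPDATE VendorAccounts
SET CurrentBalance = CurrentBalance + 19.99
WHERE VendorId = 1;

-- The row is locked, so reading the balance here is consistent
INSERT INTO VendorAccountTransactions (VendorId, BalanceBefore, BalanceAfter)
SELECT VendorId, CurrentBalance - 19.99, CurrentBalance
FROM VendorAccounts
WHERE VendorId = 1;

COMMIT;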

sql table cell modified by multiple threads at the same time

If you have a table BankAccount with a column Amount, and the value of this column for a specific row can be modified by multiple threads at the same time, then it can happen that the last one to set the value wins.
How do you usually handle this kind of situation?
UPDATE: I heard that in MSSQL there is an update lock, UPDLOCK, that locks the table or the row being updated. Could I use this here somehow?
An update statement which references the current value would prevent overwriting. So, instead of doing something like
SELECT Amount FROM BankAccount WHERE account_id = 1
(it comes back as 350 and you want to subtract 50)...
UPDATE BankAccount SET Amount = 300 WHERE account_id = 1
do
UPDATE BankAccount SET Amount = Amount - 50 WHERE account_id = 1
You cannot have several threads modifying the same data at exactly the same time: it will always be the last one to set the value that "wins".
If the problem is that several threads read and set the value at almost the same time, and the reads and writes don't arrive in the right order, the solution is to use transactions:
start a transaction
read the value
set the new value
commit the transaction
This ensures the read and the write are done consistently, and no other thread will be able to modify the data during the transaction.
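One way to implement those steps in T-SQL (a sketch; it leans on the UPDLOCK hint mentioned in the question so the row stays locked between the read and the write, and reuses the account_id = 1 example from above):

BEGIN TRANSACTION;

-- Read the current value and take an update lock on the row so no other
-- transaction can change it before we write
DECLARE @Amount INT;
SELECT @Amount = Amount
FROM BankAccount WITH (UPDLOCK, HOLDLOCK)
WHERE account_id = 1;

-- Write the new value based on what was just read
UPDATE BankAccount
SET Amount = @Amount - 50
WHERE account_id = 1;

COMMIT TRANSACTION;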
Quoting the wikipedia page about Database Transactions :
A database transaction comprises a unit of work performed within a database management system (or similar system) against a database, and treated in a coherent and reliable way independent of other transactions. Transactions in a database environment have two main purposes:
To provide reliable units of work that allow correct recovery from failures and keep a database consistent even in cases of system failure, when execution stops (completely or partially) and many operations upon a database remain uncompleted, with unclear status.
To provide isolation between programs accessing a database concurrently. Without isolation the programs' outcomes are typically erroneous.
You usually use transactions to overcome this.
Have a look at Database transaction
You should have a database function/procedure which performs operations on the Amount. This function/procedure should return whether the operation succeeded or failed (for example, you want to take $1000, but the current Amount is only $550, so the operation cannot proceed).
Example in T-SQL:
UPDATE BankAccount SET Amount = Amount - 1000 WHERE BankAccountID = 12345 AND Amount >= 1000
RETURN @@ROWCOUNT
If the amount was changed, the return value will be 1, otherwise 0.
Now you can safely run these functions/procedures (from several threads, too):
DECLARE @Result_01 int, @Result_02 int, @Result_03 int
EXEC @Result_01 = ChangeBankAccountAmount @BankAccountID = 12345, @ChangeAmount = 1000
EXEC @Result_02 = ChangeBankAccountAmount @BankAccountID = 12345, @ChangeAmount = 15
EXEC @Result_03 = ChangeBankAccountAmount @BankAccountID = 12345, @ChangeAmount = -2000, @MinAmount = 600
EDIT:
Whole procedure in T-SQL:
CREATE PROC ChangeBankAccountAmount
@BankAccountID int,
@ChangeAmount int,
@MinAmount int = 0
AS BEGIN
IF @ChangeAmount >= 0
UPDATE BankAccount SET Amount = Amount + @ChangeAmount WHERE BankAccountID = @BankAccountID
ELSE
UPDATE BankAccount SET Amount = Amount + @ChangeAmount WHERE BankAccountID = @BankAccountID AND Amount >= @MinAmount
RETURN @@ROWCOUNT
END
Of course, the int datatype is not good for money; you should change it to the datatype used in your table.