SQL Azure - One session locking entire DB for Update and Insert

SQL Azure issue.
I've got an issue that manifests as the following exception on our (asp.net) site:
Timeout expired. The timeout period elapsed prior to completion of
the operation or the server is not responding. The statement has been
terminated.
It also results in update and insert statements never completing in SSMS. There aren't any X or IX locks present when querying sys.dm_tran_locks, and there are no transactions when querying sys.dm_tran_active_transactions or sys.dm_tran_database_transactions.
The problem affects every table in the database, but other databases on the same instance are unaffected. The duration of the issue can be anywhere from 2 minutes to 2 hours, and it doesn't happen at any specific time of day.
The database is not full.
At one point this issue didn't resolve itself, but I was able to clear it by querying sys.dm_exec_connections, finding the longest-running session, and then killing it. The odd thing is that the connection was only 15 minutes old, while the lock issue had been present for over 3 hours.
Is there anything else I can check?
EDIT
As per Paul's answer below: I'd actually tracked down the problem before he answered. I'll post the steps I used to figure this out below, in case they help anyone else.
The following queries were run when a "timeout period" was present.
select * from sys.dm_exec_requests
As we can see, all the WAIT requests are waiting on session 1021 which is the replication request! The TM Request indicates a DTC transaction and we don't use distributed transactions. You can also see the wait_type of SE_REPL_COMMIT_ACK which again implicates replication.
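For reference, here is a trimmed-down version of that check, as a sketch showing only who is waiting and on whom (in our case session 1021 showed up as the blocker):
-- sketch: show only blocked/waiting requests and the session they wait on
SELECT session_id, status, command, wait_type, wait_time, blocking_session_id
FROM sys.dm_exec_requests
WHERE blocking_session_id <> 0 OR wait_type IS NOT NULL;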
select * from sys.dm_tran_locks
Again waiting on session 1021
SELECT * FROM sys.dm_db_wait_stats ORDER BY wait_time_ms desc
And yes, SE_REPL_CATCHUP_THROTTLE has a total wait time of 8,094,034 ms, which is 134.9 minutes!
Also see the following forum for details on this issue.
http://social.technet.microsoft.com/Forums/en-US/ssdsgetstarted/thread/c3003a28-8beb-4860-85b2-03cf6d0312a8
I've been given the following answer in my communication with
Microsoft (we've seen this issue with 4 of our 15 databases in the EU
data center):
Question: Have there been changes to these soft throttling limits in the last three weeks, i.e. since my problems started?
Answer: No, there have not.
Question: Are there ways we can prevent or be warned that we are approaching a limit?
Answer: No. The issue may not be caused by your application; it can be caused by other tenants relying on the same physical hardware. In other words, your application can have very little load and still run into the problem: your own traffic may be a cause, but it can just as well be caused by other tenants on the same physical hardware. There's no way to know beforehand that the issue will soon occur; it can occur at any time without warning. The SQL Azure operations team does not monitor this type of error, so they won't automatically try to solve the problem for you. If you run into it you have two options:
Create a copy of your database and use that, hoping the copy is placed on another server with less load (see the sketch below).
Contact Windows Azure Support, inform them about the problem, and let them do option 1 for you.
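For option 1, the copy can be made with T-SQL against the master database; a minimal sketch (database names are placeholders):
-- sketch: create a copy of the database (run against master; names are placeholders)
CREATE DATABASE MyDb_Copy AS COPY OF MyDb;
-- the copy is ready once its state is no longer COPYING
SELECT name, state_desc FROM sys.databases WHERE name = 'MyDb_Copy';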

You might be running into the SE_REPL* issues that are currently plaguing a lot of folks using Sql Azure (my company included).
When you experience the timeouts, try checking your wait requests for wait types of:
SE_REPL_SLOW_SECONDARY_THROTTLE
SE_REPL_COMMIT_ACK
Run the following to check your wait types on current connections:
SELECT TOP 10
    r.session_id, r.plan_handle, r.sql_handle, r.request_id,
    r.start_time, r.status, r.command, r.database_id,
    r.user_id, r.wait_type, r.wait_time, r.last_wait_type,
    r.wait_resource, r.total_elapsed_time, r.cpu_time,
    r.transaction_isolation_level, r.row_count
FROM sys.dm_exec_requests r
You can also check a history of sorts for this by running:
SELECT * FROM sys.dm_db_wait_stats
ORDER BY wait_time_ms desc
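To narrow that history down to just the replication waits mentioned above, a filtered sketch:
-- sketch: show only the SE_REPL* waits and how long they have accumulated
SELECT wait_type, waiting_tasks_count, wait_time_ms, max_wait_time_ms
FROM sys.dm_db_wait_stats
WHERE wait_type LIKE 'SE_REPL%'
ORDER BY wait_time_ms DESC;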
If you're seeing a lot of SE_REPL* wait types and these are staying set on your connections for any length of time, then basically you're screwed.
Microsoft are aware of the problem, but I've had a support ticket open for a week with them now and they're still working on it apparently.
The SE_REPL* waits happen when the Sql Azure replication slaves fall behind.
Basically the whole db suspends queries while replication catches up :/
So essentially the aspect that makes Sql Azure highly available is causing databases to become randomly unavailable.
I'd laugh at the irony if it wasn't killing us.
Have a look at this thread for details:
http://social.technet.microsoft.com/Forums/en-US/ssdsgetstarted/thread/c3003a28-8beb-4860-85b2-03cf6d0312a8

Related

Timeout expired SQL Server 2008

I have a SQL Server database in production and it has been live for 2 months. A while ago the web application associated with it started taking too long to load, and sometimes it reports that a timeout occurred.
I found a quick fix: running 'exec sp_updatestats' fixes the problem, but I need to run it constantly (every 5 minutes).
So I created a Windows service with a timer and started it on the server. My question is: what are the root causes and possible permanent solutions? Anyone?
Here is the most expensive query from Activity Monitor:
WAITFOR(RECEIVE TOP (1) message_type_name, conversation_handle, cast(message_body AS XML) as message_body from [SqlQueryNotificationService-2eea594b-f994-43be-a5ed-d9a47837a391]), TIMEOUT #p2;
To diagnose a poorly performing query you need to:
Identify the poorly performing query, e.g. via application logging, a SQL Profiler trace filtered to show only queries with a duration longer than a certain threshold, etc.
Get an execution plan for the query
At that point you can start to try to figure out what the performance issue is.
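For example, a sketch against the plan cache that covers both steps (finding the slow statements and pulling their plans), which should work on SQL Server 2005 and later:
-- sketch: top cached statements by average elapsed time, with their plans
SELECT TOP 10
    qs.total_elapsed_time / qs.execution_count AS avg_elapsed_microseconds,
    qs.execution_count,
    SUBSTRING(st.text, qs.statement_start_offset / 2 + 1,
        (CASE qs.statement_end_offset
             WHEN -1 THEN DATALENGTH(st.text)
             ELSE qs.statement_end_offset
         END - qs.statement_start_offset) / 2 + 1) AS statement_text,
    qp.query_plan
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) qp
ORDER BY avg_elapsed_microseconds DESC;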
Given that exec sp_updatestats fixes your issue, it could be that statistics on certain tables are out of date (this is a pretty common cause of performance issues). If that's the case then you might be able to tweak your statistics, or at least rebuild only the statistics that are causing issues.
It's worth noting that updating statistics also invalidates cached execution plans, so it's plausible that your issue is unrelated to statistics; you need to collect more information about the poorly performing queries before looking at solutions.
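If statistics do turn out to be the culprit, a sketch of checking and rebuilding them for a single suspect table rather than the whole database (the table name is a placeholder):
-- sketch: see when each statistic on a suspect table was last updated
SELECT s.name, STATS_DATE(s.object_id, s.stats_id) AS last_updated
FROM sys.stats s
WHERE s.object_id = OBJECT_ID('dbo.MyTable');
-- rebuild just that table's statistics instead of running sp_updatestats everywhere
UPDATE STATISTICS dbo.MyTable WITH FULLSCAN;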
Your most expensive query looks like it's waiting for a message, i.e. it's in your list of long-running queries because it's designed to run for a long time, not because it's expensive.
Thanks everyone, I found a solution for my issue. It's quite different: I had enabled the SQL dependency module on my SQL Server by setting ENABLE_BROKER on, and that was what caused the timing-out query. By setting it back off, everything is working fine now.
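For anyone else hitting this, a sketch of checking and turning off the broker setting described above (the database name is a placeholder; note this disables SqlDependency/query notifications):
-- sketch: check whether Service Broker is enabled for the database
SELECT name, is_broker_enabled FROM sys.databases WHERE name = 'MyDb';
-- turn it off, rolling back sessions that would block the change
ALTER DATABASE MyDb SET DISABLE_BROKER WITH ROLLBACK IMMEDIATE;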

SQL server 2008 replication without reinitialize

I have two databases in different servers - center_db on siglv01\sql2008 and center_db on sig\sql2008.
Can I restart replication without needing to reinitialize it? The connection dropped more than 3 days ago and is now too slow: so I want to start replication without a reinitialize.
Based on the brief conversation above, I don't think you can do this without a re-init. Specifically, the distribution database only keeps so many commands before it starts trimming. The default is 72 hours. If the last command delivered to all of your subscribers is older than that, the distribution database doesn't have what it needs to play forward all of the activity that has happened since then.
Your only hope would be if the distribution agent is still running (it knows when the above situation happens and will give you an error saying as much). If so, try to figure out why delivery is slow (troubleshoot this like any other "slow application"; replication isn't magic) and see if it can get caught up that way. Depending on how many commands remain undelivered, it may be faster to just re-init.
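If it helps, a sketch of checking the retention window and how much is still queued, run on the distributor (this assumes the distribution database has its default name):
-- sketch: show distribution database properties, including min/max retention in hours
EXEC sp_helpdistributiondb;
-- rough idea of how many replication commands are still held in the distribution database
SELECT COUNT(*) AS command_count
FROM distribution.dbo.MSrepl_commands;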

SQL Connection Issues... next steps

We've released a new game on Facebook that uses SQL Azure and we're getting intermittent connection timeouts.
I dealt with this earlier and implemented a 'retry' solution that seemed to have dealt with the transient connection issues.
However, now that the game is out I'm seeing it happen again. Not often, but it is happening. When it happens, I try logging into the SQL Azure Management web portal and I get a connection timeout there too. Same with trying SSMS.
The query itself is the first one of the game and it's a simple select on a table with 4 records.
After about 4 minutes, the timeouts stop and everything is good for a day or two.
Since these are players around the country, I don't have direct contact with the users.
I'm looking for any advice on how I can figure out what's going on.
Thanks,
Tim
FYI: http://apps.facebook.com/RelicBall/
Depending on how many compute instances you have in front of your database, I would put a limit on the connection pool that can be created, via the connection string.
Try setting, for example if you have 2 compute instances in front of the database:
Max Pool Size=70;
SQL Database can only handle 180 connections; this is a hard limit. When you are hitting the connection limit, a retry framework will make matters worse, as it will keep trying to connect for a period of time, leading to further downtime. This might be why you see the timeouts last several minutes: that's how long the compute retry frameworks take to give up.
http://msdn.microsoft.com/en-us/library/windowsazure/ff394114.aspx
Have a look with the following:
-- monitor connections
SELECT
e.connection_id,
s.session_id,
s.login_name,
s.last_request_end_time,
s.cpu_time
FROM
sys.dm_exec_sessions s
INNER JOIN sys.dm_exec_connections e
ON s.session_id = e.session_id
GO
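To see how close you are to the 180-connection limit, a rough per-login count (a sketch):
-- sketch: count open sessions per login against this database
SELECT s.login_name, COUNT(*) AS open_sessions
FROM sys.dm_exec_sessions s
GROUP BY s.login_name
ORDER BY open_sessions DESC;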
You should try to add caching to your application design; this can greatly reduce your application's overhead on the database and is recommended practice with SQL Azure, especially as you can have connection issues. I have seen this type of issue before and it was connection limits, so it may be worth investing a bit of time in that direction to see if that is the cause. If not, I would open a ticket with MS Support.
HTH, good luck.
EDIT: Premium databases obviously raise the limits on connections, so that's also worth investigating as a quick fix to this issue, and potentially a long-run one.
http://blogs.technet.com/b/dataplatforminsider/archive/2013/07/23/premium-preview-for-windows-azure-sql-database-now-live.aspx

Log of Queries executed on SQL Server

I ran a T-SQL query on a SQL Server 2005 DB that was supposed to run for 4 days, but when I checked after 4 days I found the machine had been shut down abruptly. So I want to trace what happened with the query, or in other words, what the status of the query was when the machine went down.
Is there a mechanism to check this?
It may be very difficult to do such a post-mortem. But while it may not be possible to determine what portion of the long-running query was active when the shutdown occurred, it is important to try to find the root cause of the problem!
Depending on the way the query is written, SQL will attempt to roll back any SQL that wasn't completed (or any uncommitted transaction, if transactions were used). This will happen the first time the server is restarted; if you have any desire to analyze the SQL transaction log (yuck!), make a copy first.
Before getting to the part of the query which may have been running at the time of the shutdown, it may be preferable to look into the SQL logs (I mean the application logs, with information messages, timestamps and such, not the SQL transaction log of a given database), in search of basic facts and possibly of an error message, happening some time prior to the shutdown, that could plausibly be the origin of the problem. Also check the Windows event logs, as they may indicate abnormal system-wide conditions that could be a cause or consequence of the SQL shutdown.
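Assuming the instance is back up, the SQL Server error log can also be searched from T-SQL; a sketch (xp_readerrorlog arguments: log file number, 1 for the error log, then an optional search string):
-- sketch: search the current and previous error logs for shutdown/error messages
EXEC xp_readerrorlog 0, 1, N'shutdown';
EXEC xp_readerrorlog 1, 1, N'error';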
The SQL transaction log can be "decompiled" and used to understand what was actually updated/inserted/deleted in the database; however, this type of info for a 4-day run may be buried quite deep, and possibly interspersed with unrelated queries. In fact, running out of disk space for the SQL log could itself cause the server to become erratic (I would expect a more elegant handling of such a situation, but if the drive in question was also the system drive, bad things could happen...).
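For what it's worth, the undocumented fn_dblog function can read the active portion of the current database's log; a sketch (expect an enormous number of rows for a 4-day run):
-- sketch: dump log records with their transaction IDs and begin times
SELECT [Current LSN], Operation, [Transaction ID], [Begin Time], [Transaction Name]
FROM fn_dblog(NULL, NULL);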
Finally, by analyzing the "4 days long" SQL script it may be possible to infer which parts completed, thanks to various side effects of the script. If nothing else, this review may allow putting the database back into a coherent state if necessary, or truncating the script to exclude the parts already completed, in preparation for a new run to finish the work.

How to avoid Sql Query Timeout

I have RO access on a SQL View. This query below times out. How to avoid this?
select
count(distinct Status)
from
[MyTable] with (NOLOCK)
where
MemberType=6
The error message I get is:
Msg 121, Level 20, State 0, Line 0
A transport-level error has occurred when receiving results from the server (provider: TCP Provider, error: 0 - The semaphore timeout period has expired.)
Your query is probably fine. "The semaphore timeout period has expired" is a Network error, not a SQL Server timeout.
There is apparently some sort of network problem between you and the SQL Server.
edit: However, apparently the query runs for 15-20 min before giving the network error. That is a very long time, so perhaps the network error could be related to the long execution time. Optimization of the underlying View might help.
If [MyTable] in your example is a View, can you post the View Definition so that we can have a go at optimizing it?
Although there is clearly some kind of network instability or something interfering with your connection (over 15 minutes it is possible that you are crossing a NAT boundary, or something in your network is dropping the session), I would think you want such a simple(?) query to return well within any anticipated timeout (like 1s).
I would talk to your DBA and get an index created on the underlying tables on (MemberType, Status). If there isn't a single underlying table, or the view is more complex or built from a UDF, and you are running SQL Server 2005 or above, have them consider indexing the view (basically materializing the view in an indexed fashion).
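A sketch of such an index, assuming the view reads mainly from a single base table (the table name here is a placeholder):
-- sketch: index on the filter column, covering the counted column
CREATE NONCLUSTERED INDEX IX_MyTable_MemberType
ON dbo.MyTable (MemberType)
INCLUDE (Status);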
You could put an index on MemberType.
Please check your Windows system event log for any errors specifically for the "Event Source: Dhcp". It's very likely a networking error related to DHCP. Address lease time expired or so. It shouldn't be a problem related to the SQL Server or the query itself.
Just search the internet for "The semaphore timeout period has expired" and you'll get plenty of suggestions for what might solve your problem. Unfortunately there doesn't seem to be a single definitive solution.
Do you have an index defined over the Status column and MemberType column?
How many records do you have? Are there any indexes on the table? Try this:
;with a as (
select distinct Status
from MyTable
where MemberType=6
)
select count(Status)
from a
My team were experiencing these issues intermittently with long running SSIS packages. This has been happening since Windows server patching.
Our SSIS and SQL servers are on separate VM servers.
Working with our Wintel Servers team we rebooted both servers and for the moment, the problem appears to have gone away.
The engineer has said that they're unsure if the issue is the patches or new VMTools that they updated at the same time. We'll monitor for now and if the timeout problems recur, they'll try rolling back the VMXNET3 driver, first, then if that doesn't work, take off the June Rollup patches.
So for us the issue is nothing to do with our SQL Queries (we're loading billions of new rows so it has to be long running).
This happens because another instance of SQL Server is running, so you need to kill it first; then you will be able to log in to SQL Server.
To do that, go to Task Manager and End Task the SQL Server service, then go to Services.msc and start the SQL Server service again.
While I would be tempted to blame my issues - I'm getting the same error with my query, which is much, much bigger and involves a lot of loops - on the network, I think this is not the case.
Unfortunately it's not that simple. The query runs for 3+ hours before getting that error, and apparently it fails at the same point whether it's run as a query in SSMS or as a job on SQL Server (I haven't looked into the details of that yet, so I'm not sure if it's the same error; definitely the same spot, though).
So just in case someone comes here with similar problem, this thread:
https://www.sqlservercentral.com/Forums/569962/The-semaphore-timeout-period-has-expired
suggests that it may equally well be a hardware issue or an actual timeout.
My loops aren't even in terms of the time required for each (they depend on the sales level in a given month), so a good month takes about 20 minutes to calculate (the query looks at 4 years).
So it's entirely possible I need to optimise my query. I would even say it's likely, as some of the changes I made introduced new tables, which are heaps... So another round of indexing my data before tearing into VM config and hardware tests.
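A quick way to spot those heaps before the indexing pass, as a sketch:
-- sketch: list user tables that have no clustered index (index_id 0 = heap)
SELECT SCHEMA_NAME(t.schema_id) AS schema_name, t.name AS table_name
FROM sys.tables t
JOIN sys.indexes i ON i.object_id = t.object_id AND i.index_id = 0;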
Being aware that this is old question: I'm on SQL Server 2012 SE, SSMS is 2018 Beta and VM the SQL Server runs on has exclusive use of 132GB of RAM (30% total), 8 cores, and 2TB of SSD SAN.