We are facing a SQL timeout issue. The error event ID is either 5586 or 3355 (unable to connect / network issue), and I can also see a few other DB-related error event IDs (3351 and 3760, permission issues) reported at different times.
What could be the reason? Any help would be appreciated.
Can you elaborate a little? When is this happening? Can you reproduce the behavior or is it sporadic?
It appears SharePoint is involved. Is it possible there is high demand for a large file?
You should check for blocking/locking that might be preventing your query from completing. Also, if you have lots of computed/calculated columns (or just LOTS of data), your query may take a long time to compute.
Finally, if you can't find something blocking your result or optimize your query, it's possible to increase the timeout duration (set it to "0" for no timeout). Do this in Enterprise Manager under the server or database settings.
Troubleshooting Kerberos Errors. It never fails.
Are some of your webapps running under either the Local Service or Network Service account? If so, and your databases are not on the same machine (i.e. SharePoint is on machine A and SQL on machine B), authentication will fail for some tasks (e.g. timer-job-related actions) but not all. For instance, it seems content databases are still accessible (weird, I know, but I've seen it happen).
We are currently testing an upgrade from CF11 to CF2018 for my company's intranet. To give you an idea how long this site has been running, our first version of CF was 3.1! It is still using application.cfm, and there is code from 1998, when I started writing this thing. Yes, 21 years -- I'm astonished, too. It is a hodgepodge of all kinds of older frameworks, too, including Fusebox.
Anyway, we're running a Win 2012 VM connected to a SQL 2016 farm. Everything looked OK initially, but in the week I've been testing, the server has slowed down once (a page took more than 5 seconds to run, something that usually takes 100ms, no DB involvement), and another time, the server came to a grinding halt. The only way I could restart the CF App service was by connecting to the server from another server via Services, because doing it via Remote Desktop was so slow.
Now keep in mind -- it's just me testing. This is a site that doesn't have a ton of users, but still, having 5 concurrent connections is normal and there are upwards of 200-400 users hitting this thing every day.
I have FusionReactor running on this thing now, so the next time a lockup happens, I will be able to take a closer look, but what do you think is the best way I can test this? Our site is mostly transactional, users going and filling out forms to put internal orders through. We also connect to XML web services and REST services; we also provide REST services, too. Obviously there's no way to completely replicate a production server's requests onto a test server, but I need to do more thorough testing. Any advice would be hugely appreciated.
I realize your focus for now is trying to recreate the problem on test. That may not be as easy as hoped. Instead, you should be able to understand and resolve it in production. FusionReactor can help, but the answer may well be in the cf logs.
You don't mention assessing the logs at the time of the hangup. See especially the coldfusion-error log, for outofmemory conditions.
You mention raising the heap, but the problem may be with the metaspace instead. If so, consider simply removing the maxmetaspace setting in the jvm args. That may be the sole and likely cause of such new and unexpected outages.
Or if it's not, and there's nothing in the logs at the time, THEN do consider FR. Does IT show anything happening at the time?
If not then consider a need to tune the cf/web server connector. I assume you're using iis. How many sites do you have? And how many connectors (folders in the cf config/wsconfig folder)? What are the settings in their workers.properties file? Are they optimized for the number of sites using that connector?
Also, have you updated cf2018? Are there any errors in the update error log? Did you update the web server connector also?
Are you running the cf2018 pmt (performance monitoring tool set)? Have you updated it?
There could be still more to consider, but let's see how it goes with those. I have blog posts on these and many more topics that would elaborate on things, both at my site (carehart.org) and the Adobe cf portal (coldfusion.adobe.com).
But let's hear if any of this gets you going.
I have inherited an Access database that has linked SQL tables. I need to test the network traffic that is caused by the execution of the Db. I need to ascertain which parts of the system cause the most Network traffic and therefore are the slowest.
I am not an Access guru, so I've struggled with what was suggested, which is: have Task Manager open at the Networking tab,
then step through the app and look for where there is a significant rise in network traffic. But this seems rather unreliable and time consuming.
Does anyone have any ideas how I can achieve my goal in Access?
If you really need to analyze the network traffic then you should probably get to know WireShark well enough to do a capture that is filtered on the traffic between the client and the SQL server.
Just had a bizarre issue with SQL Azure, and it's happened in a small phase just before full go live with some users doing some data entry.
"Database 'dbname' on server 'xxx' is not currently available. Please retry the connection later. If the problem persists, contact customer support."
When I tried to connect via SQL Azure database website I got:
"Firewall check failed.
Resource ID : 1. The request minimum guarantee is 0,
maximum limit is 180 and the current usage for the database is 0.
However, the server is currently too busy to support request greater than 0 for this database."
Looking at the databases section of the Azure Management website the site reported it couldn't access the DB, but I didn't capture the exact error message unfortunately.
Bizarrely, a couple of my users were still able to log in to our system website that accesses the DB, and view and save data. Eventually they lost connection too, however.
After an hour or so, the databases came back to life and we could fully access them again.
I have looked at the server's master db event table using queries from here, and there were a couple of connection failures but nothing interesting. No throttling or deadlocks, just a couple of failed connections that said "Client may have timed out when establishing connection. Try increasing the connection timeout." in the description.
Any ideas where else to look?
Business users have had a massive drop in confidence because of this.
What you're describing normally occurs because of:
1) The SQL connection limit being hit. Assuming you don't see this often, it is unlikely to be the cause, but it is worth checking; putting a limit on your connection pool can help.
2) Your neighbours being extremely noisy, so the node re-adjusts.
3) Hardware failure and Microsoft bringing your database back online in a different node. This can take some time.
Normally I have seen this when Microsoft has throttled a box or had problems with it and had to recover everyone over. Because you are on a shared system, you have to keep in mind that they are recovering everyone else on that node as well, and thus sometimes this takes time.
The best bet, if you are worried and need to get a resolution for the business, is to open a support ticket with MS and give them the time you saw this and the error message. They will investigate, and generally they have really good back-end telemetry that will point to a reason. This will allow you to give the business a resolution, and then you can make a call on future plans and contingencies. You have to keep in mind, though, that SQL Azure is a shared system and transient errors can happen, so you might need to design more failover into your designs.
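One common way to "design more failover" around transient errors like the one quoted above is to retry the operation with a growing delay. The following is only a minimal sketch of that idea: `TransientDatabaseError` and `run_with_retry` are hypothetical names made up for illustration, not part of any Azure or driver API, and a real application would need to map its driver's actual transient error codes onto something like this.

```python
import random
import time


class TransientDatabaseError(Exception):
    """Hypothetical stand-in for a driver's 'database not currently available' error."""


def run_with_retry(operation, max_attempts=5, base_delay=1.0, max_delay=30.0,
                   sleep=time.sleep):
    """Call operation(); on a transient error, wait and retry with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientDatabaseError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # Double the wait on each attempt, capped at max_delay
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Add up to 10% jitter so many clients do not retry in lockstep
            sleep(delay + random.uniform(0, delay * 0.1))
```

The `sleep` parameter is injectable only so the helper can be exercised without real waiting. Retrying hides short blips; it would not have covered an hour-long outage like the one described, which is where the support ticket comes in.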
Question: Is it possible to stop SSMS from monitoring the service status of registered servers?
Details:
SSMS 2008 monitors the service status of every registered server. From what I have seen, it seems to reach out to every registered server every minute or so to check its status; in my case that is over 100 servers. This process has raised issues with our Security and Network departments. Network identified it initially as suspicious traffic because it appeared as if an unknown utility was scanning the network for SQL Servers. Security was concerned because the Security Event Logs on each server are being filled up with my logon events.
I have looked all over for a setting but can't seem to find one. Am I missing it somewhere?
TIA,
Brian
I finally found an answer!!
While it is not possible (at least that I've found) to stop SSMS from checking the service status of registered servers it is possible to change the interval at which it checks it.
The short version is to create the following registry keys (DWORD):
(SQL Server 2008)
HKLM\Software\Microsoft\Microsoft SQL Server\100\Tools\Shell | PollingInterval = 600 (decimal)
(SQL Server 2005)
HKLM\Software\Microsoft\Microsoft SQL Server\90\Tools\Shell | PollingInterval = 600 (decimal)
This will make SSMS connect automatically every 10 minutes (600 seconds) instead of every few seconds.
See this MS Connect Post for details.
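For anyone who would rather import the setting than create it by hand, here is a small sketch that generates `.reg` file text for the key above. The helper name `polling_interval_reg` is made up for illustration, and it assumes the value is a plain DWORD as described in the answer; it only builds text and does not touch the registry.

```python
def polling_interval_reg(version_key, interval):
    """Return .reg file text that sets PollingInterval (DWORD) for one SSMS version.

    version_key is the version folder from the answer above: "100" for
    SQL Server 2008, "90" for SQL Server 2005.
    """
    key = "\\".join([
        "HKEY_LOCAL_MACHINE", "Software", "Microsoft",
        "Microsoft SQL Server", str(version_key), "Tools", "Shell",
    ])
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{key}]",
        # .reg files store DWORD values as 8 hex digits: 600 decimal is 00000258
        f'"PollingInterval"=dword:{interval:08x}',
        "",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(polling_interval_reg("100", 600))
```

Saving the printed text as a `.reg` file and importing it (with admin rights, since it is under HKLM) should have the same effect as creating the value manually.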
Since it doesn't appear that there's any way to stop these status checks by SSMS, can you focus on helping them to see their harmlessness?
Can the network group allow certain exceptions to this particular rule (pinging servers on port 1433) in their scanning software, which would allow you and your group to monitor SQL Server uptime? Even if you weren't using SSMS, this type of sweeping monitoring activity is pretty common, and you'll know the requests will only ever come from a handful of workstations.
I don't think these SQL status checks generate any more events in the security log than any other activity, so maybe they were just concerned because it was something they weren't expecting. Could the security group be convinced that these events aren't dangerous, again as long as they're coming from certain approved workstations?
If neither of these is an option (or even if it is), you could help mitigate the problem by not connecting to all your SQL servers at once. Maybe just connect to the ones you need at the time - it looks like loading the entire list actively connects to each of them, but just connecting to the ones you intend to use in that session might help reduce the number of network sessions open.
I hope this helps - if it doesn't, or you've got some additional input that might help find a workaround, please post it!
I have RO access on a SQL View. This query below times out. How to avoid this?
select
count(distinct Status)
from
[MyTable] with (NOLOCK)
where
MemberType=6
The error message I get is:
Msg 121, Level 20, State 0, Line 0
A transport-level error has occurred when receiving results from the server (provider: TCP Provider, error: 0 - The semaphore timeout period has expired.)
Your query is probably fine. "The semaphore timeout period has expired" is a Network error, not a SQL Server timeout.
There is apparently some sort of network problem between you and the SQL Server.
edit: However, apparently the query runs for 15-20 min before giving the network error. That is a very long time, so perhaps the network error could be related to the long execution time. Optimization of the underlying View might help.
If [MyTable] in your example is a View, can you post the View Definition so that we can have a go at optimizing it?
Although there is clearly some kind of network instability or something interfering with your connection (at 15 minutes, it is possible that you are crossing a NAT boundary or that something in your network is dropping the session), I would think you want such a (simple?) query to return well within any anticipated timeout (like 1s).
I would talk to your DBA and get an index created on the underlying tables on MemberType, Status. If there isn't a single underlying table or these are more complex and created by the view or UDF, and you are running SQL Server 2005 or above, have him consider indexing the view (basically materializing the view in an indexed fashion).
You could put an index on MemberType.
Please check your Windows system event log for any errors specifically for the "Event Source: Dhcp". It's very likely a networking error related to DHCP. Address lease time expired or so. It shouldn't be a problem related to the SQL Server or the query itself.
Just search the internet for "The semaphore timeout period has expired" and you'll get plenty of suggestions for what might solve your problem. Unfortunately, there doesn't seem to be one single solution for this problem.
Do you have an index defined over the Status column and MemberType column?
How many records do you have? Are there any indexes on the table? Try this:
;with a as (
select distinct Status
from MyTable
where MemberType=6
)
select count(Status)
from a
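To see that the CTE form above returns the same number as the original `count(distinct Status)`, and what the suggested index on MemberType, Status looks like, here is a small sketch using Python's built-in `sqlite3` as a stand-in. This is an assumption-heavy illustration: the table layout, sample rows and index name are invented, and SQLite's planner is not SQL Server's, so it says nothing about whether the rewrite is faster on the real view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Invented stand-in for the real table or view behind [MyTable]
conn.execute(
    "CREATE TABLE MyTable (Id INTEGER PRIMARY KEY, MemberType INTEGER, Status TEXT)"
)
rows = [(6, "Active"), (6, "Active"), (6, "Lapsed"), (6, None), (7, "Pending")]
conn.executemany("INSERT INTO MyTable (MemberType, Status) VALUES (?, ?)", rows)

# Composite index on the filter column followed by the counted column
conn.execute(
    "CREATE INDEX IX_MyTable_MemberType_Status ON MyTable (MemberType, Status)"
)

# Original form of the query
direct = conn.execute(
    "SELECT COUNT(DISTINCT Status) FROM MyTable WHERE MemberType = 6"
).fetchone()[0]

# CTE form suggested above
via_cte = conn.execute(
    """
    WITH a AS (SELECT DISTINCT Status FROM MyTable WHERE MemberType = 6)
    SELECT COUNT(Status) FROM a
    """
).fetchone()[0]

# SQLite's description of how it would execute the original query
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT COUNT(DISTINCT Status) FROM MyTable WHERE MemberType = 6"
).fetchall()

if __name__ == "__main__":
    print(direct, via_cte)
    for step in plan:
        print(step)
```

Both forms skip NULL statuses, so they agree. On SQL Server you would compare the actual execution plans of the two forms rather than rely on this.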
My team were experiencing these issues intermittently with long running SSIS packages. This has been happening since Windows server patching.
Our SSIS and SQL servers are on separate VM servers.
Working with our Wintel Servers team we rebooted both servers and for the moment, the problem appears to have gone away.
The engineer has said that they're unsure if the issue is the patches or new VMTools that they updated at the same time. We'll monitor for now and if the timeout problems recur, they'll try rolling back the VMXNET3 driver, first, then if that doesn't work, take off the June Rollup patches.
So for us the issue is nothing to do with our SQL Queries (we're loading billions of new rows so it has to be long running).
This happens because another instance of SQL Server is running. You need to kill it first; then you will be able to log in to SQL Server.
To do that, go to Task Manager and end the SQL Server process, then go to Services.msc and start the SQL Server service.
While I would be tempted to blame my issues - I'm getting the same error with my query, which is much, much bigger and involves a lot of loops - on the network, I think this is not the case.
Unfortunately it's not that simple. The query runs for 3+ hours before getting that error, and apparently it crashes at the same time whether it's run as a query in SSMS or as a job on SQL Server (I have not looked into the details of that yet, so I'm not sure if it's the same error; definitely the same spot, though).
So just in case someone comes here with similar problem, this thread:
https://www.sqlservercentral.com/Forums/569962/The-semaphore-timeout-period-has-expired
suggests that it may equally well be a hardware issue or an actual timeout.
My loops aren't even in terms of the time each one requires (they depend on the sales level in a given month), so a good month takes about 20 mins to calculate (the query looks at 4 years).
So it's entirely possible I need to optimise my query. I would even say it's likely, as some of the changes I made included new tables, which are heaps... So it's another round of indexing my data before tearing into VM config and hardware tests.
I'm aware that this is an old question. For reference: I'm on SQL Server 2012 SE, SSMS is 2018 Beta, and the VM that SQL Server runs on has exclusive use of 132GB of RAM (30% total), 8 cores, and 2TB of SSD SAN.