SQL Server response time increasing

I'm currently working on updating an ERP called Sage 1000 to a newer version. The main change that came with the upgrade is that the ERP application and its database now run on two different servers. Ever since, the response time has increased significantly.
The part related to SQL Server is the following: whenever the response-time problem occurs, I restart the SQL Server service and everything works just fine again, at least for a couple of days, until I have to restart it once more.
In Task Manager, before restarting the service, I can see that it consumes nearly 17 to 20 gigabytes of memory, which seems like an insane amount even for a server service; after the restart, the memory usage drops to a normal 2 to 3 gigabytes.
So my question is the following: in your experience, what could cause such abnormal behavior? For now I have created a scheduled task to restart the service every day at 3:00 am, but that is hardly a real solution.
Thanks in advance for your help,
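For context on the memory numbers: SQL Server deliberately grows its buffer pool up to the configured "max server memory" limit and does not hand the memory back, so 17-20 GB by itself may simply mean the limit was never set after the database moved to its own server. A minimal sketch of checking and capping it with sp_configure; the 12288 MB figure is only a placeholder, not a sizing recommendation:

    -- Show the current limit (run from SSMS)
    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;
    EXEC sp_configure 'max server memory (MB)';

    -- Cap the buffer pool, leaving headroom for the OS and other processes
    -- (12288 MB is a placeholder value; size it for your own server)
    EXEC sp_configure 'max server memory (MB)', 12288;
    RECONFIGURE;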

Related

Random delays on opening connection to SQL Server

In one of our systems we experience random delays when opening a connection to SQL Server.
The system is running Windows Server 2012 R2 Standard and SQL Server 2012, located on the same physical machine as our application.
Even when our application is idle, it still executes DB operations once every few seconds on average.
The DB operations our application executes usually consist of 3 steps:
open a connection to SQL server
run a stored procedure
close the connection
Normally the first step takes a tiny fraction of a second, while running a stored procedure may take much longer, depending on many factors.
The problem: opening a connection may randomly take 5-13 seconds. This happens only rarely, once every few hours, or even just once a day.
In other words, it happens about once in a few thousand DB operations. We have not detected any discernible pattern in the timing of these delays.
There is nothing suspicious in the SQL Server log files.
Running SQL Server Profiler does not seem practical, as the fault may not show up for 10-20 hours.
We have not seen this phenomenon on any other machine.
It looks like we've fixed the problem. Somewhere I read a recommendation to try SQL Server authentication instead of Windows authentication. The problem discussed there was not exactly the same as ours, but somewhat similar. Since the connection string is used in every open-connection operation, I decided to give it a try. As a result, our application has now been running for 3 days in a row without a single incident of a slow connection open. To put this in context, before this fix we averaged several incidents per 24 hours, and had not had a single incident-free 24-hour period in the last two months.
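For reference, the switch amounts to creating a SQL Server login for the application and replacing integrated security with explicit credentials in the connection string. A rough sketch, where app_user, AppDb and the password are placeholders rather than anything from the original setup:

    -- Hypothetical login and database user for the application
    CREATE LOGIN app_user WITH PASSWORD = 'StrongPasswordHere';
    USE AppDb;
    CREATE USER app_user FOR LOGIN app_user;
    EXEC sp_addrolemember 'db_datareader', 'app_user';
    EXEC sp_addrolemember 'db_datawriter', 'app_user';

    -- Connection string before (Windows authentication):
    --   Server=myServer;Database=AppDb;Integrated Security=SSPI;
    -- Connection string after (SQL Server authentication):
    --   Server=myServer;Database=AppDb;User Id=app_user;Password=StrongPasswordHere;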

SQL timeouts at 3 minutes past full hour

I have a problem that has puzzled me for a while now. Once in a while, say 4-5 times a week, we get timeouts from the database at HH:03 (or sometimes HH:02, I think).
I've been digging into the scheduled tasks on the server to see whether something brings the server to its knees performance-wise, but without any findings.
I've even gone so far as to build a watchdog for the application: when a query has only 1 second left of its maximum query time, it fetches the database's process list and emails it to me. That process list always contains just one entry, and it is the query that is about to get a timeout exception.
To add to the complexity, we have many customers using this application, but only one of them gets these timeouts. All customers run the same code but have different databases and different application pools with different application pool identities.
The application is an ASP.NET application. The database is Microsoft SQL Server 2008 R2 Express Edition.
Has anyone heard of something like this? Can anyone give me any pointers about what to investigate in order to resolve this issue?
Kind regards
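A detail the watchdog could additionally capture at that moment is whether the timed-out query is being blocked and what it is waiting on. A hedged sketch against the standard DMVs (available in SQL Server 2008 R2, including Express):

    -- Snapshot of current requests, their waits and any blocking session
    SELECT r.session_id,
           r.blocking_session_id,
           r.wait_type,
           r.wait_time,      -- milliseconds
           r.status,
           t.text AS running_sql
    FROM sys.dm_exec_requests AS r
    CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
    WHERE r.session_id <> @@SPID;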

SQL server 2008 replication without reinitialize

I have two databases on different servers - center_db on siglv01\sql2008 and center_db on sig\sql2008.
Can I restart replication without having to reinitialize it? The connection dropped more than 3 days ago and is now too slow, so I want to resume replication without a reinitialize.
Based on the brief conversation above, I don't think you can do this without a re-init. Specifically, the distribution database only keeps so many commands before it starts trimming. The default is 72 hours. If the last command delivered to all of your subscribers is older than that, the distribution database doesn't have what it needs to play forward all of the activity that has happened since then.
Your only hope would be if the distribution agent is still running (it knows when the above situation happens and will give you an error saying as much). If so, try to figure out why delivery is slow (troubleshoot this like any other "slow application"; replication isn't magic) and see if it can catch up that way. Depending on how many commands remain undelivered, it may be faster to just re-init.
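If it is unclear whether the retention window really has been exceeded, the distributor's retention settings can be checked and, for the future, widened before an outage eats through them. A hedged sketch, assuming the default distribution database name:

    -- On the distributor: show min/max transaction retention (in hours)
    EXEC sp_helpdistributiondb;

    -- Optionally raise the maximum retention, e.g. from 72 to 96 hours
    -- (only useful before the commands have been cleaned up, not after)
    EXEC sp_changedistributiondb
         @database = N'distribution',
         @property = N'max_distretention',
         @value    = N'96';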

What would cause SQL Server to stop writing to the error log?

The error log for our SQL Server instance gathers a large amount of data (250k records in a month) all day, then all of a sudden stops at roughly the same time of day (9:15 pm), though on different days of the week and at seemingly random intervals of days.
This corresponds with other issues on the server: 1) jobs that move files to shares on the database server fail, and 2) I am not able to access the server by any method (tried RDP and SSMS). Once the servers are rebooted, SQL Server comes up and error logging resumes.
Windows Event Viewer doesn't show any notable error messages for System (the other event logs have wrapped already).
The error logs are being written to the D:\ drive, which has over 100GB free currently. The error log files are in the range of tens of megabytes.
I'd appreciate any ideas on what might have caused this or how to troubleshoot it. Thanks!
The cause appears to have been a corrupted maintenance plan. I discovered this by correlating the timing of the lock-ups with the times the maintenance plan was running. The lack of logging made this difficult to confirm; my guess is that at least some parts of it ran normally but were rolled back on restart.
The current fix was to disable the maintenance plan and replace it with a collection of jobs that do the same tasks. I will likely recreate the original maintenance plan if the server remains stable for another week or two. If we stay stable past that point, it should solidly confirm the maintenance plan as the source of the problem.
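For anyone trying to do the same correlation: the run times of the maintenance plan's subplans (and of the replacement jobs) are recorded in msdb, so they can be lined up against the time the error log goes quiet. A rough sketch using the standard Agent history tables:

    -- When did each job run, how long did it take, and did it succeed?
    SELECT j.name AS job_name,
           h.run_date,             -- YYYYMMDD
           h.run_time,             -- HHMMSS
           h.run_duration,         -- HHMMSS
           h.run_status            -- 1 = succeeded, 0 = failed
    FROM msdb.dbo.sysjobhistory AS h
    JOIN msdb.dbo.sysjobs AS j ON j.job_id = h.job_id
    WHERE h.step_id = 0            -- job outcome rows only
    ORDER BY h.run_date DESC, h.run_time DESC;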

FirebirdSQL queries gets stuck at 12:00PM

I'm running Firebird 2.5 (and have also tried earlier versions) on Windows. Every day after 12:00 PM, insert/update queries on one specific table hang, but they complete successfully by 12:35 or so, no matter when they were started. It looks as if Firebird is doing some kind of maintenance on the table that takes half an hour to complete, during which time the table cannot be written to (though reads stay fast). The table itself is really small, some 10,000 rows, compared to the millions of rows in our other tables - and the other tables do not get stuck.
I haven't been able to find any reason or solution. I tried dumping the table and restoring it, which didn't help; I tried switching between SuperServer and Classic, and changed versions, with no success.
Has anyone experienced a problem like this?
No. Firebird doesn't have any internal maintenance procedure bound to a specific time of day. It seems there is some task on your server scheduled to run at 12:00 PM, or there are network users of the server who start some heavy access at 12:00 PM.
The only maintenance FB does is "garbage collection" (getting rid of old record versions), and this is done on a "when needed" basis (usually when records are selected; see GCPolicy in firebird.conf), not at some predefined time.
Do you experience this hang only during these particular hours, or is it always slow to insert into that table? Have you checked the server load during the slowdown (i.e. in Task Manager, is the CPU maxed out)? Anyway, here are some ideas to check:
What constraints / triggers do you have on the table? If they involve extensive checks (e.g. against the other tables which contain millions of rows), this could be the reason the inserts take so long (a query to list them is sketched after this list).
Perhaps there is some other service that is triggered at that time? E.g. do you have a cron job that backs up the DB at that time? Or perhaps some other system service running at that time with higher priority slows down the server?
Do you have the trace service active for the table? See fbtrace.conf in the Firebird root directory. If it is active, extensive logging might be the cause of the slowdown; if it isn't active, using it might help you find the cause.
What are the settings for ForcedWrites / UnflushedWrites (see firebird.conf)? Does changing them make a difference?
Is there something logged for this troublesome timeframe in firebird.log?
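For the first point in the list, the triggers and constraints on the table can be read straight from the system tables. A rough sketch, with MYTABLE standing in for the actual table name:

    -- Triggers defined on the table (RDB$TRIGGER_INACTIVE = 1 means disabled)
    SELECT RDB$TRIGGER_NAME, RDB$TRIGGER_TYPE, RDB$TRIGGER_INACTIVE
    FROM RDB$TRIGGERS
    WHERE RDB$RELATION_NAME = 'MYTABLE';

    -- Constraints defined on the table (PRIMARY KEY, FOREIGN KEY, UNIQUE, CHECK)
    SELECT RDB$CONSTRAINT_NAME, RDB$CONSTRAINT_TYPE
    FROM RDB$RELATION_CONSTRAINTS
    WHERE RDB$RELATION_NAME = 'MYTABLE';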
To me it looks like you have a process which starts at 12:00 and does something that locks the entire table. Use the monitoring tables or the trace manager to see if there is any connection or active transaction that looks suspicious.
I also suspect your own transactions are started with the WAIT option but without a LOCK TIMEOUT; you might want to change this to NO WAIT, or to WAIT with a LOCK TIMEOUT, so that your transactions fail either immediately or after the timeout.
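A hedged sketch of both suggestions, using the MON$ monitoring tables that Firebird 2.1+ provides; the 10-second timeout is only an example value:

    -- Which attachments exist, and which transactions have been open longest?
    SELECT a.MON$ATTACHMENT_ID, a.MON$REMOTE_ADDRESS, a.MON$USER,
           t.MON$TRANSACTION_ID, t.MON$TIMESTAMP, t.MON$STATE
    FROM MON$ATTACHMENTS a
    JOIN MON$TRANSACTIONS t ON t.MON$ATTACHMENT_ID = a.MON$ATTACHMENT_ID
    ORDER BY t.MON$TIMESTAMP;

    -- Start transactions so they fail after 10 seconds instead of waiting forever
    SET TRANSACTION WAIT LOCK TIMEOUT 10;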
My suggestion is to use the Trace API in 2.5 to track down what is happening near or around that time. That should give you more information about what is going on.
I use this for debugging: http://upscene.com/products.misc.fbtm.php - it is somewhat buggy itself, but when it works it is a godsend.
Are some client connections going down at 12:00 PM? I had a similar problem with a table of about 70,000 records:
Client "A" has a permanently open DB connection with something like "select * from TABLE". This is a "read only transaction", but it is reason enough for the server to generate record versions. Why?
Client "B" makes massive updates to this table, and the server tries to preserve the world as it was when "A" started her "select". This is normal for transactional DB servers, and it is implemented by keeping copies of the record data from before each update.
So in my case, 170,000 record versions existed for this TABLE. You can measure this with:
gstat -r -t TABLE db.fdb | grep versions
If client "B" goes down, the count of record versions stops growing. Client "A" is the guilty one: by freezing all these versions it forces the server to hold on to them. Finally, when client "A" goes down (or, for example, a firewall rule cuts all pending connections), Firebird happily starts getting rid of the now useless record versions.
This "sweep" is badly implemented (even in 2.5.2): CPU usage sits at about 3%, it removes fewer than 10,000 versions per minute, and while it runs this TABLE performs at roughly 2% of its normal speed.
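A quick way to check for this situation without running gstat is to compare the transaction counters in MON$DATABASE: if MON$OLDEST_ACTIVE lags far behind MON$NEXT_TRANSACTION, some long-lived transaction is pinning old record versions. A rough sketch:

    -- A large gap between OLDEST_ACTIVE and NEXT_TRANSACTION means
    -- record versions are piling up behind a long-running transaction
    SELECT MON$OLDEST_TRANSACTION,
           MON$OLDEST_ACTIVE,
           MON$NEXT_TRANSACTION
    FROM MON$DATABASE;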