Can highly fragmented indexes cause CPU spikes in SQL Server?

I am seeing CPU spikes on the database server every day, and I found out that the indexes have not been rebuilt for quite a while. Could that be the reason for these spikes?

Fragmentation might cause a little more CPU load, but it does not cause spikes. Why would it? Look elsewhere. Find out which queries are running during the spikes, and look for long-running queries that consume a lot of CPU.

Yes, but it depends entirely on the number of records in that particular table. Fragmentation can cause CPU spikes, and even 100% CPU utilization on the server under load.
This is because, when pages are located through an index under load, a query running on a CPU gets a 4-millisecond quantum. If the running query needs additional pages that are not in memory, fragmentation forces the storage engine to move back and forth through the B+ tree, where the pages are scattered. The CPU then has to move that query to the waiter list; once the data is available, the query moves to the runnable queue, where it waits for the CPU again.
Just imagine a table with a huge number of records: fragmentation would definitely have an impact on CPU.
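To make the scheduling cycle described in this answer concrete, here is a toy, single-task model in Python of the RUNNING, SUSPENDED (waiter list) and RUNNABLE states, assuming the 4 ms quantum above and a hypothetical fixed page-read latency. It is a simplified sketch, not how SQLOS is actually implemented; what it shows is how CPU time and elapsed time are counted separately.

```python
from enum import Enum

QUANTUM_MS = 4  # scheduler quantum mentioned in the answer above


class TaskState(Enum):
    RUNNING = "RUNNING"      # executing on a CPU (consumes CPU time)
    SUSPENDED = "SUSPENDED"  # on the waiter list, e.g. waiting for a page read
    RUNNABLE = "RUNNABLE"    # resource available, queued for a CPU


# Allowed transitions in this simplified model
TRANSITIONS = {
    (TaskState.RUNNABLE, "scheduled"): TaskState.RUNNING,
    (TaskState.RUNNING, "quantum_expired"): TaskState.RUNNABLE,
    (TaskState.RUNNING, "page_not_in_memory"): TaskState.SUSPENDED,
    (TaskState.SUSPENDED, "page_read_complete"): TaskState.RUNNABLE,
}


def next_state(state, event):
    """Return the state reached from `state` on `event`, or raise ValueError."""
    if (state, event) not in TRANSITIONS:
        raise ValueError(f"event {event!r} not allowed in state {state.name}")
    return TRANSITIONS[(state, event)]


def simulate(cpu_ms, miss_every_ms=0, io_wait_ms=0):
    """Run one task needing `cpu_ms` of CPU on an otherwise idle scheduler.

    Every `miss_every_ms` of CPU work the task needs a page that is not in
    memory and waits `io_wait_ms` for it (hypothetical fixed latency).
    Returns (cpu_ms, elapsed_ms, suspensions).
    """
    cpu = elapsed = suspensions = 0
    state = TaskState.RUNNABLE
    while cpu < cpu_ms:
        state = next_state(state, "scheduled")
        used = 0
        page_miss = False
        while cpu < cpu_ms and used < QUANTUM_MS and not page_miss:
            cpu += 1  # one 1 ms tick of CPU work
            elapsed += 1
            used += 1
            page_miss = bool(miss_every_ms) and cpu % miss_every_ms == 0 and cpu < cpu_ms
        if cpu >= cpu_ms:
            break  # finished while RUNNING
        if page_miss:
            state = next_state(state, "page_not_in_memory")
            suspensions += 1
            elapsed += io_wait_ms  # waiting costs wall-clock time, not CPU
            state = next_state(state, "page_read_complete")
        else:
            state = next_state(state, "quantum_expired")
    return cpu, elapsed, suspensions


print(simulate(20))                                  # (20, 20, 0)
print(simulate(20, miss_every_ms=5, io_wait_ms=10))  # (20, 50, 3)
```

With these assumptions both runs use the same 20 ms of CPU; the three page waits add 30 ms of elapsed time only. In this model, any extra CPU from fragmentation would have to come from reading and processing additional pages rather than from the waiting itself.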

Related

Buffer pool memory taken by a table

My server is a physical server with 250 GB of RAM, and max server memory is configured to 230 GB. When I ran a query against the DMV sys.dm_os_buffer_descriptors, joined with other DMVs, I found one table taking almost 50 GB of buffer pool space. My question is: is this an issue? If so, what's the best way to tackle it? My PLE (page life expectancy) is very high, and there are no reports of slowness. Thanks.
The most frequently and recently used data remains in the buffer pool cache, so it is expected that 50 GB of table data will be cached when the table and its data are used often. Since your PLE is acceptable, there may be no concern for now.
You may still want to take a look at query plans that use the table in question. It could be that more data than needed is brought into the buffer pool cache due to large scans when fewer pages are actually needed by queries. Query and index tuning may be in order in that case. Tuning will also reduce CPU and other resource utilization, providing headroom for growth and other queries in the workload.
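sys.dm_os_buffer_descriptors returns one row per 8 KB page in the buffer pool, so a per-table row count converts directly into memory. The helper below is a hypothetical sketch of that arithmetic; the page count in the example is made up to match the roughly 50 GB in the question, and the percentage is only a rough share of the configured limit.

```python
PAGE_SIZE_KB = 8  # SQL Server data pages are 8 KB


def buffer_pool_usage(page_count, max_server_memory_gb):
    """Return (size_gb, percent_of_max_memory) for a number of cached pages."""
    size_gb = page_count * PAGE_SIZE_KB / (1024 * 1024)  # KB -> GB
    return size_gb, 100.0 * size_gb / max_server_memory_gb


# Hypothetical page count: 6,553,600 pages x 8 KB = 50 GB
size_gb, share = buffer_pool_usage(6_553_600, 230)
print(f"{size_gb:.1f} GB, {share:.1f}% of max server memory")  # 50.0 GB, 21.7% of max server memory
```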

SQL Server CPU utilization vs. long-running queries

Correct me if I'm wrong:
In SQL Server, if a query takes more CPU time, it is considered a high-CPU-consuming query.
My question: are all long-running queries high-CPU-consuming queries?
Or give me a short description of how to tell the difference between them.
"My question: are all long-running queries high-CPU-consuming queries?"
Absolutely not. Queries that are not actively executing (consuming CPU) are waiting on a resource or an operation to complete. This is why analyzing wait statistics is a common performance tuning methodology, which is summarized in Brent Ozar's article.
Common scenarios where long-running queries are not CPU-bound include blocking, disk I/O and network waits, memory grant waits, and unproductive work such as hash and sort spills.
Query and index tuning can help mitigate the above problems but performance will ultimately be limited by hardware capabilities. The server must be sized for the expected workload.
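One rough way to tell the two cases apart is to compare CPU (worker) time with elapsed time for the same query, for example total_worker_time and total_elapsed_time from sys.dm_exec_query_stats, both reported in microseconds. The Python sketch below assumes those two numbers have already been fetched; the QueryStats class and the 0.8 threshold are illustrative assumptions, not SQL Server settings.

```python
from dataclasses import dataclass


@dataclass
class QueryStats:
    name: str
    worker_time_us: int   # CPU time, like total_worker_time (microseconds)
    elapsed_time_us: int  # wall-clock time, like total_elapsed_time (microseconds)


def classify(q, cpu_bound_ratio=0.8):
    """Label a query by the share of its elapsed time spent on CPU.

    cpu_bound_ratio is an arbitrary, illustrative threshold. A ratio above
    1.0 is possible for parallel plans, which use several CPUs at once.
    """
    if q.elapsed_time_us == 0:
        return "no data"
    ratio = q.worker_time_us / q.elapsed_time_us
    if ratio >= cpu_bound_ratio:
        return "mostly CPU"
    return "mostly waiting"  # blocking, I/O, memory grants...: check wait stats


for q in [
    QueryStats("report", 55_000_000, 60_000_000),
    QueryStats("blocked_update", 2_000_000, 60_000_000),
    QueryStats("parallel_scan", 120_000_000, 30_000_000),
]:
    print(q.name, classify(q))
```

Both example queries run for a minute, yet only the first spends most of that time on a CPU; the second is a candidate for wait-statistics analysis instead.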

SQL Server query performance slows over time

I've seen this question asked in many ways all over the Internet, but despite implementing the abundance of advice (and some voodoo), I'm still struggling. I have a 100 GB+ database that is constantly inserting and updating records in very large transactions (200+ statements per transaction). After a system restart, the performance is amazing (data is written to a large SATA III SSD connected via USB 3.0). The SQL Server instance is running on a VM under VMware Workstation. The host is set to hold the entire VM in memory. The VM itself has a paging cache of 5000 MB. The SQL Server user is granted 'Lock pages in memory'. I have 5 GB of RAM allocated to the VM, and the max memory of the SQL Server instance is set to half a gigabyte.
I have played with every single one of these parameters to try to maintain consistent performance, but slowly and surely the performance degrades to the point where the app begins to time out. Here's the kicker, though: if I stop the application that's loading the database and then execute the stored procedure in Management Studio, it runs like lightning, which clearly indicates it's not an issue with the query, and probably has nothing to do with memory management or paging. If I then restart the loader app, it still crawls. If I reboot the VM, however, the app once again runs like lightning... for a while.
Does anybody have any other suggestions based upon the symptoms presented?
Depending on how large your hot set (the frequently accessed data) is, 5 GB of memory may simply be too little for a 100+ GB database.
Check your indexes and query plans; we cannot help you without them. I bet you are missing some indexes, which is the most common performance issue people have.
Otherwise, once you have done your homework, head over to dba.stackexchange.com and ask there.
More generally, consider that 200 statements per transaction may simply indicate seriously suboptimal programming. For example, you could bulk-load the data into a temp table and then merge it into the final one.
Actually, I may have a working theory. I added some logic to the app so that when it times out, it sits for two minutes and then tries again, and voilà! Back to full speed. I rubber-ducked the problem with my co-worker, and we came up with the idea that my perceived SSD write speeds were actually the write speeds to the VMware host's virtual USB 3 buffer, and that the actual SSD write speeds were slower. I'm probably hitting the limit of the host's buffer size, and by forcing the app to wait 2 minutes, I give the host a chance to flush its buffered data to the SSD. Elementary, Watson :)
If this approach also fails to be sustainable, I'll report in.
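The timeout-and-wait workaround described above could be sketched as a small retry wrapper. This assumes the database call raises Python's built-in TimeoutError; a real driver will raise its own exception type, and run_with_retry, max_attempts and the injectable sleep parameter are hypothetical names for illustration.

```python
import time


def run_with_retry(operation, wait_seconds=120, max_attempts=3, sleep=time.sleep):
    """Call `operation`; on TimeoutError, pause and try again.

    wait_seconds mirrors the two-minute pause described above. max_attempts
    and the use of TimeoutError are assumptions for this sketch. `sleep` is
    injectable so the logic can be tested without actually waiting.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TimeoutError:
            if attempt == max_attempts:
                raise  # give up and surface the last timeout
            sleep(wait_seconds)  # give the host time to flush its write buffer
```

Passing a fake sleep, such as a list's append method, lets the retry logic be exercised without waiting two minutes per attempt.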
Try executing this to determine your problem queries:
SELECT TOP 20
    qs.sql_handle,
    qs.execution_count,
    qs.total_worker_time AS Total_CPU,
    total_CPU_inSeconds =           -- converted from microseconds
        qs.total_worker_time / 1000000.0,
    average_CPU_inSeconds =         -- 1000000.0 avoids integer division truncating the average
        qs.total_worker_time / 1000000.0 / qs.execution_count,
    qs.total_elapsed_time,
    total_elapsed_time_inSeconds =  -- converted from microseconds
        qs.total_elapsed_time / 1000000.0,
    st.text,
    qp.query_plan
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
ORDER BY qs.total_worker_time DESC;
Then check your estimated and actual execution plans on the queries this command helps you pinpoint.
Source: "How do I find out what is hammering my SQL Server?" and the bottom of the page at http://technet.microsoft.com/en-us/magazine/2007.11.sqlquery.aspx
Beyond the excellent indexing suggestions already given, be sure to read up on parameter sniffing. That could be the cause of the problem.
SQL Server - parameter sniffing
http://www.sommarskog.se/query-plan-mysteries.html#compileps
As a result, a bad query plan could be reused, or SQL Server's buffer could be filling up and writing pages out of memory to disk (in your case, that may be other allocated memory).
You could run DBCC FREEPROCCACHE and DBCC FREESYSTEMCACHE ('ALL') to empty the caches and see whether you get a performance boost.
You should also give SQL Server more memory: as much as you can while leaving room for the OS and other critical programs. You might have 5 GB of RAM on the VM, but SQL Server only gets to play with 0.5 GB, which seems really small for what you're describing.
If those things don't move you in the right direction, set up the SQL Server Management Data Warehouse so you can see exactly what is happening when the slowdown begins. Running it takes up additional memory, but it will give the DBAs more to go on.
In the end, I did a combination of two things: I added logic to recover when timeouts occur, and I set the VM's core count to reflect only the host's physical cores, not its logical cores. For example, the host has 2 hyper-threaded cores. When I set my VM to use 4 cores, it occasionally hangs in some kind of infinite loop, but when I set it to 2 cores, it runs without fail. Still, aberrant behavior like this is difficult to mitigate reliably.

Improving Solr performance

I have deployed a 5-shard infrastructure where:
shard1 has 3124422 docs
shard2 has 920414 docs
shard3 has 602772 docs
shard4 has 2083492 docs
shard5 has 11915639 docs
Total index size: 100 GB
The OS is Linux x86_64 (Fedora release 8) with vMem equal to 7872420, and I run the server using Jetty (from the Solr example download) with:
java -Xmx3024M -Dsolr.solr.home=multicore -jar start.jar
The response time for a query is around 2-3 seconds. However, if I execute several queries at the same time, performance degrades immediately:
1 simultaneous query: 2516 ms
2 simultaneous queries: 4250, 4469 ms
3 simultaneous queries: 5781, 6219, 6219 ms
4 simultaneous queries: 6484, 7203, 7719, 7781 ms...
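A quick way to read these numbers is to convert them to throughput. The sketch below assumes each batch of simultaneous queries starts together and completes when its slowest query finishes; with the measurements from the question it prints roughly 0.4, 0.45, 0.48 and 0.51 queries per second.

```python
# Latencies (ms) from the question, keyed by number of simultaneous queries
measurements = {
    1: [2516],
    2: [4250, 4469],
    3: [5781, 6219, 6219],
    4: [6484, 7203, 7719, 7781],
}


def throughput_qps(latencies_ms):
    """Queries completed per second for a batch that starts together.

    Assumes the batch finishes when its slowest query does.
    """
    return len(latencies_ms) / (max(latencies_ms) / 1000)


for n, latencies in measurements.items():
    print(n, round(throughput_qps(latencies), 2))
```

Quadrupling concurrency raises throughput by only about 30% here, which is consistent with some shared resource (the disk, for example) being saturated even though heap and CPU look fine.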
Using JConsole to monitor the server's Java process, I checked that heap memory and CPU usage don't reach their upper limits, so the server shouldn't be behaving as if it were overloaded. Can anyone suggest an approach to tuning the instance so that it is not so heavily dependent on the number of simultaneous queries?
Thanks in advance
You may want to consider creating slaves for each shard so that you can support more reads (see http://wiki.apache.org/solr/SolrReplication). However, the performance you're getting isn't very reasonable.
With the response times you're seeing, it feels like your disk must be the bottleneck. It might be cheaper for you to simply load up each shard with enough memory to hold its full index (20 GB each?). You could look at disk access using the 'sar' utility from the sysstat package. If you're consistently getting over 30% disk utilization on any platter while searches are ongoing, that's a good sign that you need to add some memory and let the OS cache the index.
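As far as I know, sar -d and iostat -x compute %util from the same kernel counter exposed in /proc/diskstats: field 10, the number of milliseconds the device spent doing I/O. The Python sketch below does that calculation by hand from two snapshots, so you can check the 30% rule of thumb on a box without sysstat; the device name you pass (for example sda) and the sampling interval are assumptions about your system.

```python
import time


def io_ticks(diskstats_text, device):
    """Return field 10 of /proc/diskstats (ms spent doing I/O) for `device`."""
    for line in diskstats_text.splitlines():
        parts = line.split()
        # parts[0:3] are major, minor and device name, so field 10 is parts[12]
        if len(parts) > 12 and parts[2] == device:
            return int(parts[12])
    raise KeyError(f"device {device!r} not found")


def utilization_percent(before, after, device, interval_ms):
    """Percentage of the interval during which the device was busy."""
    busy_ms = io_ticks(after, device) - io_ticks(before, device)
    return 100.0 * busy_ms / interval_ms


def sample_utilization(device, interval_s=1.0, path="/proc/diskstats"):
    """Take two snapshots `interval_s` apart and return %util (Linux only)."""
    with open(path) as f:
        before = f.read()
    start = time.monotonic()
    time.sleep(interval_s)
    with open(path) as f:
        after = f.read()
    elapsed_ms = (time.monotonic() - start) * 1000
    return utilization_percent(before, after, device, elapsed_ms)
```

For example, calling sample_utilization("sda", 5) repeatedly while searches are running would show whether the disk stays above the 30% mark mentioned in the answer.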
Has it been a while since you've run an optimize? Perhaps part of the long lookup times is a result of a heavily fragmented index spread all over the platter.
As I stated on the Solr mailing list, where you asked the same question 3 days ago, Solr/Lucene benefits tremendously from SSDs. While sharding across more machines or adding boatloads of RAM will work for I/O, the SSD option is comparatively cheap and extremely easy.
Buy an Intel X25 G2 ($409 at NewEgg for 160 GB) or one of the new SandForce-based SSDs. Put your existing 100 GB of indexes on it and see what happens. That's half a day's work, tops. If it bombs, scavenge the drive for your workstation. You'll be very happy with the performance boost it gives you.

Is defragging tough on replication?

I've been told that defragging causes the log to grow tremendously. Is this true? If so, is there something better to do than defragging that will not impact the log as much? We are running SQL Server 2005 replicating between 2 sites.
There is no 'defrag' in SQL Server. You may be talking about an index reorganize operation or an index rebuild operation. Reorganize is light on log, but index rebuild creates as much log as the size of the index multiplied by a factor. For a large index the rebuild operation may result in log growth.
Having a large log will impact the transactional replication Log Reader Agent simply because it will have more log records to scan through for a period. Eventually the Log Reader Agent will catch up. The exact numbers (duration of latency, latency size, etc.) will differ based on a number of factors; your best option is trial and measurement.
As for alternatives:
Did you measure the index fragmentation factor?
Do you have evidence that performance is affected by fragmentation? Many loads don't care about fragmentation.
Did you analyze the root cause in the schema design that leads to fragmentation?
If the answers are yes, yes and yes, and the conclusion is that a periodic index rebuild is unavoidable, then there is no alternative: you're going to have to bite the bullet and take this operation into account when sizing the hardware requirements.
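For the first question, sys.dm_db_index_physical_stats reports avg_fragmentation_in_percent and page_count per index. The function below sketches the commonly cited rule of thumb (reorganize above 5% and up to 30% fragmentation, rebuild above 30%) in Python; the 1,000-page cutoff for skipping small indexes is an assumption, and the thresholds are a starting point rather than a substitute for the performance evidence the second question asks for.

```python
def maintenance_action(avg_fragmentation_pct, page_count, min_pages=1000):
    """Suggest an index maintenance action from sys.dm_db_index_physical_stats values.

    Thresholds follow the commonly cited 5% / 30% rule of thumb.
    min_pages (skip small indexes) is an assumption, not a SQL Server setting.
    """
    if page_count < min_pages or avg_fragmentation_pct <= 5:
        return "none"
    if avg_fragmentation_pct <= 30:
        return "REORGANIZE"  # described in the answer as light on the log
    return "REBUILD"  # per the answer, log volume scales with index size


print(maintenance_action(12, 50_000))  # REORGANIZE
print(maintenance_action(45, 50_000))  # REBUILD
```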