Degree of parallelism on an Oracle index affects the DB load - sql

I have 4 RAC databases on a server.
Executing a query that uses a particular index causes a huge load and affects the other RAC databases as well. Below are the parallelism settings of the index:
DEGREE : 10
INSTANCES : 1
Will decreasing the degree of this index fix the problem? Please advise!

Without having all the details of your case, my guess would be that you have AMM (Automatic Memory Management) disabled. With AMM disabled, parallel queries put stress on your shared pool, which can noticeably stall other queries, and the amount of stress grows exponentially with the degree of parallelism (for example, parallel 2 might be fine, while parallel 90 might stall all nodes for 10 seconds each time the parallel-90 query is started).
So, based on this limited knowledge of your problem, my first two suggestions would be: check whether you have AMM on, and if not, either switch it on or try to reduce the parallelism of your queries to e.g. 4.
PS: Parallelism of statistics gathering is sometimes overlooked as a source of a large number of parallel queries with high parallelism. HTH
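As a rough sketch of how you might check and adjust this (the index name below is made up), assuming you can query v$parameter and alter the index:
-- Check whether AMM is enabled: a non-zero memory_target usually means AMM is on
SELECT name, value FROM v$parameter WHERE name IN ('memory_target', 'memory_max_target');
-- Lower the index's default degree of parallelism (hypothetical index name)
ALTER INDEX my_schema.my_index PARALLEL 4;
-- Or disable parallel execution on the index entirely
ALTER INDEX my_schema.my_index NOPARALLEL;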

Related

How do I compare two SQL queries to run on Postgres

I need to compare two queries that will run in my Postgres database.
How do I know the execution time and any other statistics of them so I can produce a reliable benchmark between them?
I can think of two interesting data points to collect and compare:
The execution time.
For that, simply execute the query using psql connected via UNIX sockets (to factor out the network) and use psql's \timing command to measure the execution time as seen on the client.
Do not use EXPLAIN (ANALYZE) for that since that would add notable overhead which affects your measurements.
Make sure to run the query several times to get a reliable number. That number will correspond to the execution time with a warm cache.
If you want to measure execution time with a cold cache, restart PostgreSQL and empty the file system cache.
The number of blocks touched by the query.
For that, run EXPLAIN (ANALYZE, BUFFERS) once for each query.
The number of blocks touched is significant for performance: the fewer blocks a query touches, the faster it will (often) be. This number is particularly significant for performance with a cold cache; the fewer blocks, the less execution time will depend on caching.
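For illustration, a minimal psql session covering both measurements might look like this (the query and table are invented placeholders):
\timing on
-- Run the statement several times; psql prints "Time: ... ms", which is the warm-cache execution time
SELECT count(*) FROM orders WHERE customer_id = 42;
-- Run this once per query and compare the "Buffers: shared hit=... read=..." lines in the plan output
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM orders WHERE customer_id = 42;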

SQL Server query performance slows over time

I've seen this question asked in many ways all over the Internet but despite implementing the abundance of advice (and some voodoo), I'm still struggling. I have a 100GB+ database that is constantly inserting and updating records in very large transactions (200+ statements per transaction). After a system restart, the performance is amazing (data is written to a large SATA III SSD connected via USB 3.0). The SQL Server instance is running on a VM under VMware Workstation. The host is set to hold the entire VM in memory. The VM itself has a paging cache of 5000 MB. The SQL Server user is set to 'hold pages in memory'. I have 5 GB of RAM allocated to the VM, and the max memory of the SQL Server instance is set to half a gig.
I have played with every single one of these parameters to attempt to maintain consistent performance, but sure enough, the performance eventually degrades to the point where it begins to time out. Here's the kicker though: if I stop the application that's loading the database and then execute the stored proc in Management Studio, it runs like lightning, clearly indicating it's not an issue with the query, and probably nothing to do with memory management or paging. If I then restart the loader app, it still crawls. If I reboot the VM, however, the app once again runs like lightning... for a while...
Does anybody have any other suggestions based upon the symptoms presented?
Depending on how large your hot set is, 5 GB of memory may simply be too little for a 100+ GB database.
Check indices and query plans. We cannot help you without them, and I bet you are missing some indices, which is the standard performance issue people have.
Otherwise, once you've done your homework, head over to dba.stackexchange.com and ask there.
Generally, consider that 200 statements per transaction may simply indicate seriously sub-optimal programming. For example, you could bulk-load the data into a temp table and then merge it into the final one.
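As a rough illustration of that temp-table approach (table and column names are invented), the flow might look like this:
-- Stage the incoming rows in one bulk operation instead of 200+ single-row statements
CREATE TABLE #staging (Id INT PRIMARY KEY, Payload NVARCHAR(200));
-- Fill #staging in bulk, e.g. via SqlBulkCopy from the loader app or BULK INSERT from a file
-- Then apply everything with a single set-based merge
MERGE dbo.TargetTable AS t
USING #staging AS s ON t.Id = s.Id
WHEN MATCHED THEN UPDATE SET t.Payload = s.Payload
WHEN NOT MATCHED THEN INSERT (Id, Payload) VALUES (s.Id, s.Payload);
DROP TABLE #staging;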
Actually, I may have a working theory. What I did was add some logic to the app so that when it times out, it waits for two minutes and then tries again, and voila! Back to full speed. I rubber-ducked my co-worker and we came up with the theory that my perceived SSD write speeds were actually the write speed to the VMware host's virtual USB 3 buffer, and that the actual SSD write speeds were slower. I'm probably running up against the host's buffer size, and by forcing the app to wait 2 minutes, the host gets a chance to flush its back-buffered data to the SSD. Elementary, Watson :)
If this approach also fails to be sustainable, I'll report in.
Try executing this to determine your problem queries:
SELECT TOP 20
    qs.sql_handle,
    qs.execution_count,
    qs.total_worker_time AS Total_CPU,
    total_CPU_inSeconds = -- converted from microseconds
        qs.total_worker_time / 1000000,
    average_CPU_inSeconds = -- converted from microseconds
        (qs.total_worker_time / 1000000) / qs.execution_count,
    qs.total_elapsed_time,
    total_elapsed_time_inSeconds = -- converted from microseconds
        qs.total_elapsed_time / 1000000,
    st.text,
    qp.query_plan
FROM
    sys.dm_exec_query_stats AS qs
    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
    CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
ORDER BY qs.total_worker_time DESC
Then check your estimated and actual execution plans on the queries this command helps you pinpoint.
Source: How do I find out what is hammering my SQL Server? and the bottom of the page at http://technet.microsoft.com/en-us/magazine/2007.11.sqlquery.aspx
Beyond the excellent indexing suggestions already given, be sure to read up on parameter sniffing. That could be the cause of the problem.
SQL Server - parameter sniffing
http://www.sommarskog.se/query-plan-mysteries.html#compileps
As a result you could have a bad query plan being re-used, or SQL's buffer could be getting full and writing pages out of memory to disk (maybe that's other allocated memory in your case).
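If parameter sniffing does turn out to be the culprit, one common mitigation is to force a fresh plan per execution; a minimal sketch against a hypothetical procedure and table:
CREATE PROCEDURE dbo.GetOrdersByCustomer
    @CustomerId INT
AS
BEGIN
    SELECT OrderId, OrderDate
    FROM dbo.Orders
    WHERE CustomerId = @CustomerId
    OPTION (RECOMPILE);  -- compile a plan for the actual parameter value on every call
END;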
You could run DBCC FreeProcCache and DBCC FreeSystemCache to empty it and see if you get a performance boost.
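For reference, those commands look roughly like this (note that FREESYSTEMCACHE takes an argument); be careful on a busy instance, since every plan will have to be recompiled afterwards:
DBCC FREEPROCCACHE;            -- drop all cached query plans
DBCC FREESYSTEMCACHE ('ALL');  -- clear the other system caches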
You should give SQL Server more memory too - as much as you can while leaving room for other critical programs and the OS. You might have 5 GB of RAM on the VM, but SQL Server is only getting to play with half a GB, which seems REALLY small for what you're describing.
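Assuming you decide to give SQL Server, say, 3 GB of the VM's 5 GB, the change would look something like this:
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max server memory (MB)', 3072;  -- 3 GB; leave room for the OS and other programs
RECONFIGURE;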
If those things don't move you in the right direction, set up SQL Server's Management Data Warehouse so you can see exactly what is happening when your slowdown begins. Running it takes up additional memory, but it will give the DBAs more to go on.
In the end, what I did was a combination of two things: putting in logic to recover when timeouts occurred, and setting the VM's core count to reflect only physical cores, not logical cores. For example, the host has 2 cores that are hyper-threaded; when I set my VM to use 4 cores, it occasionally gets hung in some infinite loop, but when I set it to 2 cores, it runs without fail. Still, aberrant behavior like this is difficult to mitigate reliably.

Why is the n+1 selects pattern slow?

I'm rather inexperienced with databases and have just read about the "n+1 selects issue". My follow-up question: Assuming the database resides on the same machine as my program, is cached in RAM and properly indexed, why is the n+1 query pattern slow?
As an example let's take the code from the accepted answer:
SELECT * FROM Cars;
/* for each car */
SELECT * FROM Wheel WHERE CarId = ?
With my mental model of the database cache, each of the SELECT * FROM Wheel WHERE CarId = ? queries should need:
1 lookup to reach the "Wheel" table (one hashmap get())
1 lookup to reach the list of k wheels with the specified CarId (another hashmap get())
k lookups to get the wheel rows for each matching wheel (k pointer dereferences)
Even if we multiply that by a small constant factor for an additional overhead because of the internal memory structure, it still should be unnoticeably fast. Is the interprocess communication the bottleneck?
Edit: I just found this related article via Hacker News: Following a Select Statement Through Postgres Internals. - HN discussion thread.
Edit 2: To clarify, I do assume N to be large. A non-trivial overhead will add up to a noticeable delay then, yes. I am asking why the overhead is non-trivial in the first place, for the setting described above.
You are correct that avoiding n+1 selects is less important in the scenario you describe. If the database is on a remote machine, communication latencies of > 1 ms are common, i.e. the CPU would spend millions of clock cycles waiting for the network.
If we are on the same machine, the communication delay is several orders of magnitude smaller, but synchronous communication with another process necessarily involves a context switch, which commonly costs > 0.01 ms (source), which is tens of thousands of clock cycles.
In addition, both the ORM tool and the database will have some overhead per query.
To conclude, avoiding n+1 selects is far less important if the database is local, but still matters if n is large.
Assuming the database resides on the same machine as my program
Never assume this. Thinking about special cases like this is never a good idea. It's quite likely that your data will grow and you will need to put your database on another server. Or you will want redundancy, which involves (you guessed it) another server. Or, for security, you might not want your app server on the same box as the DB.
why is the n+1 query pattern slow?
The reason you don't expect it to be slow is that your mental model of performance is probably all wrong.
1) RAM is horribly slow. Your CPU wastes around 200-400 CPU cycles each time it needs to read something from RAM. CPUs have a lot of tricks to hide this (caches, pipelining, hyperthreading).
2) Reading from RAM is not "Random Access". It's like a hard drive: sequential reads are faster.
See this article about how accessing RAM in the right order is 76.6% faster http://lwn.net/Articles/255364/ (Read the whole article if you want to know how horrifyingly complex RAM actually is.)
CPU cache
In your "N+1 query" case, the "loop" for each N includes many megabytes of code (on client and server) swapping in and out of caches on each iteration, plus context switches (which usually dumps the caches anyway).
The "1 query" case probably involves a single tight loop on the server (finding and copying each row), then a single tight loop on the client (reading each row). If those loops are small enough, they can execute 10-100x faster running from cache.
RAM sequential access
The "1 query" case will read everything from the DB to one linear buffer, send it to the client who will read it linearly. No random accesses during data transfer.
The "N+1 query" case will be allocating and de-allocating RAM N times, which (for various reasons) may not be the same physical bit of RAM.
Various other reasons
The networking subsystem only needs to read one or two TCP headers, instead of N.
Your DB only needs to parse one query instead of N.
When you throw in multi-users, the "locality/sequential access" gets even more fragmented in the N+1 case, but stays pretty good in the 1-query case.
Lots of other tricks that the CPU uses (e.g. branch prediction) work better with tight loops.
See: http://blogs.msdn.com/b/oldnewthing/archive/2014/06/13/10533875.aspx
Having the database on a local machine reduces the problem; however, most applications and databases will be on different machines, where each round trip takes at least a couple of milliseconds.
A database will also need a lot of locking and latching checks for each individual query. Context switches have already been mentioned by meriton. If you don't use a surrounding transaction, it also has to build implicit transactions for each query. Some query parsing overhead is still there, even with a parameterized, prepared query or one remembered by string equality (with parameters).
If the database gets filled up, query times may increase, compared to an almost empty database in the beginning.
If your database is to be used by other applications, you will likely hammer it: even if your application works, others may slow down or even see an increasing number of failures, such as timeouts and deadlocks.
Also, consider having more than two levels of data. Imagine three levels: blogs, entries, comments, with 100 blogs, each with 10 entries and 10 comments per entry (on average). That's a 1 + N + (N×M) selects situation: it will take 100 queries to retrieve the blog entries, and another 1000 to get all the comments. With somewhat more complex data, you'll quickly run into the tens or hundreds of thousands of queries.
Of course, bad programming may work in some cases and to some extent. If the database will always be on the same machine, nobody else uses it and the number of cars is never much more than 100, even a very sub-optimal program might be sufficient. But beware of the day any of these preconditions changes: refactoring the whole thing will take you much more time than doing it correctly in the beginning. And likely, you'll try some other workarounds first: a few more IF clauses, memory cache and the like, which help in the beginning, but mess up your code even more. In the end, you may be stuck in a "never touch a running system" position, where the system performance is becoming less and less acceptable, but refactoring is too risky and far more complex than changing correct code.
Also, a good ORM offers you ways around N+1: (N)Hibernate, for example, allows you to specify a batch-size (merging many SELECT * FROM Wheels WHERE CarId=? queries into one SELECT * FROM Wheels WHERE CarId IN (?, ?, ..., ?) ) or use a subselect (like: SELECT * FROM Wheels WHERE CarId IN (SELECT Id FROM Cars)).
The simplest option to avoid N+1 is a join, with the disadvantage that each car row is multiplied by the number of wheels, and multiple child/grandchild collections can end up producing a huge cartesian product of join results.
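For the Cars/Wheel example above, the single-join alternative might look like this (the non-key columns are invented for illustration):
SELECT c.Id, c.Name, w.Id AS WheelId, w.Position
FROM Cars c
LEFT JOIN Wheel w ON w.CarId = c.Id
ORDER BY c.Id;  -- each car row is repeated once per wheel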
There is still overhead, even if the database is on the same machine, cached in RAM and properly indexed. The size of this overhead will depend on what DBMS you're using, the machine it's running on, the amount of users, the configuration of the DBMS (isolation level, ...) and so on.
When retrieving N rows, you can choose to pay this cost once or N times. Even a small cost can become noticeable if N is large enough.
One day someone might want to put the database on a separate machine or to use a different DBMS. This happens frequently in the business world (to be compliant with some ISO standard, to reduce costs, to change vendors, ...).
So, sometimes it's good to plan for situations where the database isn't lightning fast.
All of this depends very much on what the software is for. Avoiding the "select n+1 problem" isn't always necessary; it's just a rule of thumb to avoid a commonly encountered pitfall.

Query Cost vs. Execution Speed + Parallelism

My department was recently reprimanded (nicely) by our IT department for running queries with very high costs, on the premise that our queries have a real possibility of destabilizing and/or crashing the database. None of us are DBAs; we're just researchers who write and execute queries against the database, and I'm probably the only one who had ever looked at an explain plan before the reprimand.
We were told that query costs over 100 should be very rare, and queries with costs over 1000 should never be run. The problems I am running into are that cost seems to have no correlation with execution time, and I'm losing productivity while trying to optimize my queries.
As an example, I have a query that executes in under 5 seconds with a cost of 10844. I rewrote the query to use a view that contains most of the information I need, and got the cost down to 109, but the new query, which retrieves the same results, takes 40 seconds to run. I found a question here with a possible explanation:
Measuring Query Performance : "Execution Plan Query Cost" vs "Time Taken"
That question led me to parallelism hints. I tried using /*+ no_parallel */ in the cost-10844 query, but the cost did not change, nor did the execution time, so I'm not sure that parallelism is the explanation for the faster execution time despite the higher cost. Then I tried the /*+ parallel(n) */ hint, and found that the higher the value of n, the lower the cost of the query. In the case of the cost-10844 query, I found that /*+ parallel(140) */ dropped the cost to 97, with a very minor increase in execution time.
This seemed like an ideal "cheat" to meet the requirements that our IT department set forth, but then I read this:
http://www.oracle.com/technetwork/articles/datawarehouse/twp-parallel-execution-fundamentals-133639.pdf
The article contains this sentence:
Parallel execution can enable a single operation to utilize all system resources.
So, my questions are:
Am I actually placing more strain on the server resources by using the /*+ parallel(n)*/ hint with a very high degree of parallelism, even though I am lowering the cost?
Assuming no parallelism, is execution speed a better measure of resources used than cost?
The rule your DBAs gave you doesn't make a lot of sense. Worrying about the cost that is reported for a query is very seldom productive.
First, you cannot directly compare the cost of two different queries: one query with a cost in the millions may run very quickly and consume very few system resources, while another query with a cost in the hundreds may run for hours and bring the server to its knees.
Second, cost is an estimate. If the optimizer made an accurate estimate of the cost, that strongly implies it has come up with the optimal query plan, which would mean it is unlikely that you'd be able to modify the query to return the same results while using fewer resources. If the optimizer made an inaccurate estimate of the cost, that strongly implies it has come up with a poor query plan, in which case the reported cost would have no relationship to any useful metric you'd come up with. Most of the time, the queries you're trying to optimize are the queries where the optimizer generated an incorrect query plan because it incorrectly estimated the cost of various steps.
Tricking the optimizer by using hints that may or may not actually change the query plan (depending on how parallelism is configured, for example) is very unlikely to solve a problem-- it's much more likely to cause the optimizer's estimates to be less accurate and make it more likely that it chooses a query plan that consumes far more resources than it needs to. A parallel hint with a high degree of parallelism, for example, would tell Oracle to drastically reduce the cost of a full table scan which makes it more likely that the optimizer would choose that over an index scan. That is seldom something that your DBAs would want to see.
If you're looking for a single metric that tells you whether a query plan is reasonable, I'd use the amount of logical I/O. Logical I/O is correlated pretty well with actual query performance and with the amount of resources your query consumes. Looking at execution time can be problematic because it varies significantly based on what data happens to be cached (which is why queries often run much faster the second time they're executed) while logical I/O doesn't change based on what data is in cache. It also lets you scale your expectations as the number of rows your queries need to process change. If you're writing a query that needs to aggregate data from 1 million rows, for example, that should consume far more resources than a query that needs to return 100 rows of data from a table with no aggregation. If you're looking at logical I/O, you can easily scale your expectations to the size of the problem to figure out how efficient your queries could realistically be.
In Christian Antognini's "Troubleshooting Oracle Performance" (page 450), for example, he gives a rule of thumb that is pretty reasonable:
5 logical reads per row returned/aggregated is probably very good
10 logical reads per row returned/aggregated is probably adequate
20+ logical reads per row returned/aggregated is probably inefficient and needs to be tuned
Different systems with different data models may merit tweaking the buckets a bit but those are likely to be good starting points.
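To see where one of your queries falls against those buckets, you can pull logical reads (buffer gets) and row counts from the shared pool; a rough sketch (the marker comment used to find the statement is made up):
SELECT sql_id,
       executions,
       buffer_gets,
       rows_processed,
       ROUND(buffer_gets / NULLIF(rows_processed, 0), 1) AS gets_per_row
FROM   v$sql
WHERE  sql_text LIKE '%my_report_tag%'  -- hypothetical marker comment embedded in the query
ORDER  BY buffer_gets DESC;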
My guess is that if you're researchers that are not developers, you're probably running queries that need to aggregate or fetch relatively large data sets, at least in comparison to those that application developers are commonly writing. If you're scanning a million rows of data to generate some aggregate results, your queries are naturally going to consume far more resources than an application developer whose queries are reading or writing a handful of rows. You may be writing queries that are just as efficient from a logical I/O per row perspective, you just may be looking at many more rows.
If you are running queries against the live production database, you may well be in a situation where it makes sense to start segregating workload. Most organizations reach a point where running reporting queries against the live database starts to create issues for the production system. One common solution to this sort of problem is to create a separate reporting database that is fed from the production system (either via a nightly snapshot or by an ongoing replication process) where reporting queries can run without disturbing the production application. Another common solution is to use something like Oracle Resource Manager to limit the amount of resources available to one group of users (in this case, report developers) in order to minimize the impact on higher priority users (in this case, users of the production system).

Improving Solr performance

I have deployed a 5-sharded infrastructure where:
shard1 has 3124422 docs
shard2 has 920414 docs
shard3 has 602772 docs
shard4 has 2083492 docs
shard5 has 11915639 docs
Indexes total size: 100GB
The OS is Linux x86_64 (Fedora release 8) with vMem equal to 7872420 and I run the server using Jetty (from Solr example download) with:
java -Xmx3024M -Dsolr.solr.home=multicore -jar start.jar
The response time for a query is around 2-3 seconds. Nevertheless, if I execute several queries at the same time the performance goes down immediately:
1 simultaneous query: 2516ms
2 simultaneous queries: 4250,4469 ms
3 simultaneous queries: 5781, 6219, 6219 ms
4 simultaneous queries: 6484, 7203, 7719, 7781 ms...
Using JConsole to monitor the server Java process, I checked that heap memory and CPU usage don't reach their upper limits, so the server shouldn't be overloaded. Can anyone suggest an approach to tuning the instance so it isn't so heavily dependent on the number of simultaneous queries?
Thanks in advance
You may want to consider creating slaves for each shard so that you can support more reads (see http://wiki.apache.org/solr/SolrReplication); however, the performance you're getting isn't really reasonable to begin with.
With the response times you're seeing, it feels like your disk must be the bottleneck. It might be cheaper for you just to load up each shard with enough memory to hold its full index (20 GB each?). You could look at disk access using the 'sar' utility from the sysstat package. If you're consistently getting over 30% disk utilization on any platter while searches are ongoing, that's a good sign that you need to add some memory and let the OS cache the index.
Has it been a while since you've run an optimize? Perhaps part of the long lookup times is the result of a heavily fragmented index spread all over the platter.
As I stated on the Solr mailing list, where you asked the same question 3 days ago, Solr/Lucene benefits tremendously from SSDs. While sharding onto more machines or adding boatloads of RAM will also address the I/O problem, the SSD option is comparatively cheap and extremely easy.
Buy an Intel X25 G2 ($409 at NewEgg for 160GB) or one of the new SandForce-based SSDs. Put your existing 100GB of indexes on it and see what happens. That's half a day's work, tops. If it bombs, scavenge the drive for your workstation. You'll be very happy with the performance boost it gives you.