CPU Usage to 100% - sql

I have Oracle 11g R2 running on an M-4000 machine (supposedly a powerful machine). Recently I noticed that my application has become slow and is spending a lot of time querying the database. To my shock, when I looked at the statistics of the DB machine I found CPU usage at 100%.
Here is the ASH report.
Can someone advise me on what I should be doing to avoid such a situation?

Those queries that are doing a 'table access full' may be your problem: a full table scan will kill a query's performance and can usually be resolved by adding a simple index. You can profile your queries, and tools will recommend indexes to add in order to improve the execution of certain queries. I think I did this with Squirrel on an Oracle DB.
Also, your IDs seem to be strings and you're doing a 'lower(id) like :3'. Ideally these would be integers; at the very least, get rid of the lower() and match directly against the bind variable :3.
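If the column has to stay a string and the lower() cannot be dropped, one option is a function-based index so the lower(id) predicate can still use an index. A minimal sketch, with made-up table and index names:
-- Oracle: hypothetical table and index names
CREATE INDEX app_user_lower_id_ix ON app_user (LOWER(id));
-- a predicate such as LOWER(id) LIKE :3 can then use this index,
-- as long as the bound pattern does not start with a wildcard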


Different execution plan for the same query

I have two identical SQL Databases that contain nearly the same records in each of their tables. The only difference between them is that one lives on my local machine and the other one is in Azure. Yet, after investigating a performance issue I found out that the two databases produce different execution plans for some of the queries. To give you an example, here is a simple query that takes approximately 1 second to run.
select count(*) from Deposits
inner join households on households.id = deposits.HouseholdId
where CashierId = 'd89c8029-4808-4cea-b505-efd8279dc66d'
It is obvious that the inner join could be omitted, as it doesn't contribute to the end result. Indeed, it is omitted on my local machine, but that is not the case on Azure. Here are two pictures visualizing the execution plans for my local machine and Azure, respectively.
Just to give you some background on what happened: everything worked perfectly until I scaled down my Azure database to Basic 5 DTUs. Afterwards, some queries became extremely slow and I had no idea why. I scaled up the DB instance again but saw no improvement. Then I rebuilt the indexes and noticed that if I rebuild them in the correct order, the queries once again start performing as expected. Yet I have absolutely no idea why I need to rebuild them in some specific order and, even less, how to determine the correct order. Now I have issues with virtually all queries related to the Deposits table. I tried rebuilding the indexes but saw no improvement whatsoever. I suspect it has something to do with the PK index, but I am not quite sure. The table contains approximately 300k rows.
Your databases may have the same schemas and approximate number of records, but it's tough to make them identical. Are you sure your databases are identical?
SELECT SERVERPROPERTY(N'ProductVersion');
What about the hardware they are running on? CPU? Memory? Disks? I mean, it's Azure, right? It's hard to know what actual server hardware you are using, and SQL Server's query optimizer will adjust for hardware differences. Additionally, even if the hardware and software were identical, simply the fact that the databases are used differently can leave them with different statistics. The first time you run a query, it is evaluated and optimized using statistics; all subsequent calls of that query will reuse that initially cached query plan. Tables change over time: they get taller, the shape of the data changes, and an old cached query plan can eventually fall out of favor. Certain things, such as rebuilding your indexes, can reshape the data and change the statistics, which in turn invalidates the cached plan. Try this. To force a fresh query plan on each statement, add an
OPTION (RECOMPILE)
hint to the bottom of your queries. Does that help or stabilize performance? Furthermore, this is a stretch, but can I assume you aren't running that exact same query over and over? It seems more likely that you haven't hardcoded that GUID, and that the cached plan was really created for a statement that takes the cashier ID as a parameter. If so, your existing query plan could be a victim of parameter sniffing, where the plan you're pulling was optimized for some specific GUID and does poorly when you pass in anything else.
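As a sketch, assuming the GUID is really passed in as a parameter (the name @CashierId here is assumed), the hinted query would look something like this:
select count(*) from Deposits
inner join households on households.id = deposits.HouseholdId
where CashierId = @CashierId
option (recompile)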
For more info about what that statement does, have a look here. For more understanding on why it's hard to have identical databases take a look here and here.
Good luck! Hope you can get it sorted.

PostgreSQL decrease performance after migration

After migrating a PostgreSQL database server from v9.3 to v9.6 I noticed a decrease in the performance of the entire system. The config parameters are the same as in v9.3, in particular the following:
shared_buffers = 10000MB
work_mem = 64MB
maintenance_work_mem = 1024MB
Also I tried to monitor some resources, and this is the result
              total        used        free      shared  buff/cache   available
Mem:            31G        385M        4.5G         10G         26G         19G
Swap:          3.0G          0B        3.0G
Also, when I run some queries, the server internally issues statements like these:
select typname from pg_type where oid=1043
set search_path to public
deallocate pdo_stmt_0000000e
And then it runs my query, but I'm afraid these extra statements have some performance impact after the migration. I have another 9.6 server with a fresh install (no migration) and it does not show that problem (response time). It seems to be spending too much time in those queries.
Do you have any tip or advice on how to fix this?
I did the migration with pg_upgrade, but I noticed that in the process some data didn't migrate to the v9.6 server. After that I did a dump/restore and a VACUUM ANALYZE.
In our case, we neglected to:
ANALYZE the database
Postgres specifically might need it after a larger migration, for example when upgrading Django 2.2 to 3.2, where all of the id field types change from AutoField to BigAutoField.
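If in doubt, a manual ANALYZE after the migration is cheap and safe; a minimal sketch:
-- refresh planner statistics for the current database
ANALYZE VERBOSE;
-- or reclaim dead space and analyze in one pass
VACUUM (ANALYZE, VERBOSE);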
You could install the pg_stat_statements extension on both the slow and the fast system and compare the performance of the top queries on each. When there are major differences in time per execution, you can check the execution plans (using EXPLAIN ANALYZE).
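A minimal sketch, assuming you can edit postgresql.conf and restart (column names as of 9.6):
-- postgresql.conf: shared_preload_libraries = 'pg_stat_statements', then restart
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

SELECT calls, total_time, mean_time, query
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 10;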
Sometimes new features have a major performance impact after an upgrade. If my memory serves me well, parallel sequential scan - https://blog.2ndquadrant.com/postgresql96-parallel-sequential-scan/ - was added in 9.6. Though this is basically a great feature, there are some situations in which its use may result in a slowdown of queries. This could be a reason to set parallel_setup_cost (or other planner parameters) to a different value to avoid inefficient parallel sequential scans.
Edited later: As I see in https://www.postgresql.org/docs/9.6/release-9-6.html, parallel query execution is not activated by default in 9.6, so it's probably not the reason for the slowdown in your situation. Still, I think only an analysis of the top queries and their plans will shed light on the issue.
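If you want to rule parallelism out anyway, a quick session-level test might look like this (9.6 parameter names; the table name is made up):
-- default in 9.6 is 0, i.e. parallel query disabled
SHOW max_parallel_workers_per_gather;
-- enable (or set back to 0 to disable) for the current session and compare plans
SET max_parallel_workers_per_gather = 2;
SET parallel_setup_cost = 10000;  -- makes parallel plans less attractive
EXPLAIN ANALYZE SELECT count(*) FROM big_table;  -- hypothetical table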

Should I move to NoSQL? (big data)

I'm currently analyzing a very large table (~100 million rows, 35 columns). It's currently stored in a SQL DB, but the queries I'm running (and they're varied) are very, very slow.
So I gather I should probably move to a NoSQL DB. My questions are:
How can I tell which (NoSQL) db is best for me?
How can I move my current SQL table to the new NoSQL scheme?
Or should I stay with SQL and just fine-tune it?
A few more details: rows will not be added or removed; this is historical data and all of the analysis will be done on that table. I plan to run various queries on it. The data is numerical.
I routinely work with a SQL Server 2012 table that has 900 million rows. This table has rows being added to it about every 2 minutes with a total of about 200K per day. I can query this table and get rows back in a couple seconds (using the clustered index / PK). I can also query on one of the other indexes and get results back in seconds or less.
So, it's all a matter of making sure your indexes are set up correctly, AND BEING USED!! Check your queries against the query plan being generated and make sure seeks are being done.
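For example (table and column names here are made up), turn on statistics and the actual execution plan in SSMS, then confirm the plan uses an Index Seek rather than a Scan:
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
-- in SSMS, also enable "Include Actual Execution Plan" before running
SELECT SensorId, AVG(Reading) AS AvgReading
FROM dbo.SensorReadings
WHERE ReadingDate >= '2023-01-01'
GROUP BY SensorId;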
There could be good reasons for moving to NoSQL, or something similar. But moving to NoSQL because you think you can't get good performance in SQL Server, before making sure you've done everything you can do to improve performance first, is not a good reason.
Some food for thought:
100M rows is well within SQL's "sweet spot". You can grow by x10 and still be assured that SQL will be able to support you with fairly trivial effort.
NoSQL is not a silver bullet for solving performance problems at scale. It offers a set of tradeoffs which, with careful planning, can provide better results. But it sounds like you don't fully understand your performance issues in SQL, and without that understanding your chances of making the correct design decisions in a NoSQL environment are slim.
One of the common tradeoffs in NoSQL systems is that they typically provide less flexibility in querying, in return for greater flexibility in schema management. You mentioned your queries are "various": if they are truly varied, or more importantly frequently changing, then moving to a NoSQL system can put you in a world of pain, especially if you are not familiar with the technology yet.
Bottom line: you aren't doing anything which is clearly "beyond" the capabilities of SQL, and your problems are probably caused more by inefficient implementation than by any inherent platform limitations. Moving to a NoSQL system won't magically solve any of your problems, and will probably introduce new ones.
If you are running queries on columns that are not indexed, they will be very slow. You can add more indexes to speed them up. If your DB is static, this should work.
One major speed up is the usage of map-reduce queries, where aggregations are carried out by multiple processes or computers. NoSQL databases like MongoDB can be used in such ways. But even MySQL has Cluster capabilities nowadays: http://www.mysql.de/products/cluster/scalability.html. SQL Server can be clustered as well.
So I guess the best first shot would be to optimize the table's indexes for your queries. Each column the query filters, compares, or aggregates on should be indexed, as in the sketch below.
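In SQL Server syntax (table and column names are made up), index the column you filter or group on and cover the aggregated column:
CREATE NONCLUSTERED INDEX IX_Measurements_Region
    ON dbo.Measurements (Region)     -- column used in WHERE / GROUP BY
    INCLUDE (Value);                 -- column being aggregated

SELECT Region, COUNT(*) AS RowsPerRegion, SUM(Value) AS Total
FROM dbo.Measurements
GROUP BY Region;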
If this doesn't help, you are probably counting and calculating a lot, and you should use map-reduce jobs and a DB which can handle them, like MongoDB: http://docs.mongodb.org/manual/aggregation/
I hope this helps

Slow SQL query processing 5000 rows, which SQL Server tool can help me identify the issue?

I'm writing a heavy SQL query using SQL Server 2008. When the query processes 100 rows, it finishes instantly. When it processes 5000 rows, it takes about 1.1 minutes.
I used the Actual Execution Plan to check its performance while processing 5000 rows. The query contains 18 sub-queries, and no sub-query shows a significantly higher percentage of the query cost in the plan; they are all around 0%, 2%, 5%, 7%, with the highest at 11%.
The screenshot below shows the most expensive operation in the query (94% of that 11%).
I also used the Client Statistics tool; Trial 10 shows the run processing 5000 rows, Trial 9 the run processing 100 rows.
Can anybody tell me where (or with which SQL Server tool) I can find the data/details that indicate what is slow when the query processes 5000 rows?
Update:
Indexes and keys have been added.
The actual execution plan shows no warnings and no high percentage on any sub-query.
I just found that 'Activity Monitor' shows one sub-query's 'Average Duration' as 40000 ms under 'Recent Expensive Queries', while the actual plan shows this query takes only 5% of the total cost.
Thanks
For looking at performance, using the Database Engine Tuning Advisor and/or the missing index DMVs, and then examining the execution plan either in Management Studio or with something like SQL Sentry Plan Explorer, should be enough to give you an indication of where you need to make modifications.
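For example, the missing-index DMVs can be queried directly (treat the output as suggestions to evaluate, not automatic fixes):
SELECT TOP (10)
       mid.statement AS table_name,
       mid.equality_columns,
       mid.inequality_columns,
       mid.included_columns,
       migs.user_seeks,
       migs.avg_user_impact
FROM sys.dm_db_missing_index_details AS mid
JOIN sys.dm_db_missing_index_groups AS mig
     ON mig.index_handle = mid.index_handle
JOIN sys.dm_db_missing_index_group_stats AS migs
     ON migs.group_handle = mig.index_group_handle
ORDER BY migs.user_seeks * migs.avg_user_impact DESC;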
Understanding the execution plan and how the physical operators relate to your logical operations is the biggest key to performance tuning, that and a good understanding of indexes and statistics.
I don't think there is any tool that will just automagically fix performance for you.
While I do believe that learning the underpinnings of the execution plan and how the SQL Server query optimizer operates is essential to being a good database developer, and humans are still far better at diagnosing and adjusting SQL than most tools, native or third-party, there is in fact a tool in SQL Server Management Studio which can (sometimes) "automagically" fix performance for you:
Database Engine Tuning Advisor
You can access it via the menu under Query -> Analyze Query Using Database Engine Tuning Advisor, or (more helpfully) by selecting your query, right-clicking the selection, and choosing Analyze Query using Database Engine Tuning Advisor, which has the added bonus of automatically filtering down to only the database objects used by your query.
All the Tuning Advisor actually does is investigate whether there are any indexes or statistics that could be added to your objects. It then "recommends" them, and you can apply none, some, or all of them as you choose.
Caveat emptor! All of its recommendations are geared towards making that particular query run faster, so what it definitely does not do is help you make good decisions about the consequences of adding an index that only gets used by maybe one or two queries but has to be updated constantly as you add data to your database. This is a SQL anti-pattern known as "index shotgunning" and is generally frowned upon by DBAs, who would rather see a query rewritten to take advantage of more useful indexes.

What is the purpose for using OPTION(MAXDOP 1) in SQL Server?

I have never clearly understood the usage of MAXDOP. I do know that it makes the query faster and that it is the last item that I can use for Query Optimization.
However, my question is: when and where is it best suited for use in a query?
As Kaboing mentioned, MAXDOP(n) actually controls the number of CPU cores that are being used in the query processor.
On a completely idle system, SQL Server will attempt to pull the tables into memory as quickly as possible and join between them in memory. It could be that, in your case, it's best to do this with a single CPU. This might have the same effect as using OPTION (FORCE ORDER), which forces the query optimizer to use the order of joins you have specified. In some cases, I have seen OPTION (FORCE ORDER) reduce a query from 26 seconds to 1 second of execution time.
Books Online goes on to say that possible values for MAXDOP are:
0 - Uses the actual number of available CPUs depending on the current system workload. This is the default value and recommended setting.
1 - Suppresses parallel plan generation. The operation will be executed serially.
2-64 - Limits the number of processors to the specified value. Fewer processors may be used depending on the current workload. If a value larger than the number of available CPUs is specified, the actual number of available CPUs is used.
I'm not sure what the best usage of MAXDOP is; however, I would take a guess and say that if you have a table with 8 partitions, you would want to specify MAXDOP(8) due to I/O limitations, but I could be wrong.
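For illustration, the hint goes at the end of the statement (the table and column names here are made up):
SELECT CustomerId, COUNT(*) AS OrderCount
FROM dbo.Orders
GROUP BY CustomerId
OPTION (MAXDOP 1);   -- force a serial plan for this statement only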
Here are a few quick links I found about MAXDOP:
Books Online: Degree of Parallelism
General guidelines to use to configure the MAXDOP option
This is a general rambling on Parallelism in SQL Server, it might not answer your question directly.
From Books Online, on MAXDOP:
Sets the maximum number of processors the query processor can use to execute a single index statement. Fewer processors may be used depending on the current system workload.
See Rickie Lee's blog on parallelism and CXPACKET wait type. It's quite interesting.
Generally, in an OLTP database, my opinion is that if a query is so costly it needs to be executed on several processors, the query needs to be re-written into something more efficient.
Why do you get better results adding MAXDOP(1)? Hard to tell without the actual execution plans, but it might be as simple as the execution plan being totally different from the one without the OPTION, for instance using a different index or (more likely) JOINing differently, using MERGE or HASH joins.
As something of an aside, MAXDOP can apparently be used as a workaround to a potentially nasty bug:
Returned identity values not always correct
There are a couple of parallelization bugs in SQL Server with abnormal input. OPTION(MAXDOP 1) will sidestep them.
EDIT: This is old; my testing was done largely on SQL 2005. Most of these bugs seem to not exist anymore, but every once in a while we question that assumption when SQL 2014 does something dumb, and we go back to the old way and it works. We never managed to demonstrate that it wasn't just bad plan generation in the more recent cases, though, since SQL Server can be relied on to get the old way right in newer versions. Since all the cases were IO-bound queries, MAXDOP 1 doesn't hurt.
Adding my two cents, based on a performance issue I observed.
If simple queries are getting parallelized unnecessarily, it can bring more problems than it solves. However, before adding MAXDOP to a query as a "knee-jerk" fix, there are some server settings to check.
In Jeremiah Peschka - Five SQL Server Settings to Change, MAXDOP and "COST THRESHOLD FOR PARALLELISM" (CTFP) are mentioned as important settings to check.
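For reference, both are server-wide settings changed via sp_configure; the values below are placeholders, not recommendations:
EXEC sys.sp_configure N'show advanced options', 1;
RECONFIGURE;
EXEC sys.sp_configure N'cost threshold for parallelism', 50;
EXEC sys.sp_configure N'max degree of parallelism', 4;
RECONFIGURE;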
Note: Paul White also mentioned max server memory as a setting to check, in a response to Performance problem after migration from SQL Server 2005 to 2012. A good KB article to read is Using large amounts of memory can result in an inefficient plan in SQL Server.
Jonathan Kehayias - Tuning ‘cost threshold for parallelism’ from the Plan Cache helps to find a good value for CTFP.
Why is cost threshold for parallelism ignored?
Aaron Bertrand - Six reasons you should be nervous about parallelism has a discussion of some scenarios where MAXDOP is the solution.
Parallelism-Inhibiting Components are mentioned in Paul White - Forcing a Parallel Query Execution Plan