I'm trying to work out the cause of high DTU usage on a database (its tier is S2, and it is also geo-replicated), on a server where I'm unsure whether it's V12 or the older version (a separate problem).
Friday last week and this Friday we have a spike that looks like this:
Looking at the resource stats:
SELECT TOP 1000 *
FROM sys.dm_db_resource_stats
ORDER BY end_time DESC
avg CPU hovers around 3-5% during the peak,
but most significantly, avg_data_io_percent is roaming about 72% - 90%.
How can I track down the IO further?
Query Performance Insight is quite useful, but couldn't execution count and CPU be misleading in this case?
TOP 5 queries per CPU consumption
top 5 during that odd period:
Are the likely offenders the queries that appear differently in those top five?
Is there a better way to see the IO graph or data? Am I looking at the wrong thing? :D
Thanks in advance.
You can use SSMS and the built-in reports for Query Performance Insight / Query Store to look at IO-intensive queries. I suggest connecting to the database with SSMS and looking at the most resource-intensive queries by the logical reads, logical writes, and physical reads metrics. You should find your offender in one of these.
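If you prefer to query it directly, something along these lines against sys.dm_exec_query_stats should surface the IO-heavy statements (just a sketch; the DMV only covers plans currently in cache):
-- Top statements by cumulative logical reads (adjust the ORDER BY for writes or physical reads)
SELECT TOP 10
    qs.total_logical_reads,
    qs.total_logical_writes,
    qs.total_physical_reads,
    qs.execution_count,
    st.text AS query_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_logical_reads DESC;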
Thanks,
Torsten
Related
We have had a single user in our dedicated SQL pool spike the CPU to 100% on her own previously. We believe this is due to her query having multiple subqueries and running in the medium resource class. However, we cannot see the execution plan. We typically operate at 500 DWU, which has 20 concurrency slots, with the medium resource class getting two slots.
If the query has what looks like 4 or 5 subqueries, should we expect this query to take a total of 10 concurrency slots? Also, how do we look at the execution plan? It doesn't seem to be the same as in normal SQL.
Thank you!
It's very hard to tell whether the subqueries are the real culprits. To me, the data itself plays a bigger role. For the execution plan, I'm not sure if you have gone through this doc, but it should help.
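As a rough sketch, in a dedicated SQL pool you can prefix the statement with EXPLAIN to get the distributed plan as XML, or look up a request that already ran through the PDW DMVs (the query and the request_id below are hypothetical):
-- Get the distributed plan for a statement without running it
EXPLAIN
SELECT CustomerId, COUNT(*) FROM dbo.Sales GROUP BY CustomerId;

-- Or inspect requests that have already run, then drill into the steps of one of them
SELECT request_id, command, resource_class, total_elapsed_time
FROM sys.dm_pdw_exec_requests
ORDER BY total_elapsed_time DESC;

SELECT *
FROM sys.dm_pdw_request_steps
WHERE request_id = 'QID1234';   -- hypothetical request_id taken from the query above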
I have an application running on a Postgres database. Sometimes, when I have about 8-10 people working on the application, the CPU usage soars to somewhere between 99-100%. The application was built on the CodeIgniter framework, which I believe makes provision for closing connections to the database whenever they are not needed. What could be the solution to this problem? I would appreciate any suggestions. Thank you
Basically, what the people do on the application is run insert queries, but at a very fast rate; a person could run between 70 and 90 insert queries in a minute.
I came across a similar kind of issue. The reason was that some transactions were getting stuck and running for a long time, so CPU utilization increased to 100%. The following command helped to find the connections that had been running the longest:
SELECT max(now() - xact_start) FROM pg_stat_activity
WHERE state IN ('idle in transaction', 'active');
This command shows the amount of time a connection has been running. This time should not be greater than an hour. So killing the connections that had been running for a very long time, or that were stuck at some point, worked for me. I followed this post for monitoring and solving my issue; the post includes lots of useful commands for monitoring this situation.
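For example, something along these lines (the pid is hypothetical, and pg_terminate_backend needs the appropriate privileges):
-- List the longest-running transactions with their pids
SELECT pid, now() - xact_start AS duration, state, query
FROM pg_stat_activity
WHERE state IN ('idle in transaction', 'active')
ORDER BY duration DESC NULLS LAST;

-- Terminate a stuck backend by pid
SELECT pg_terminate_backend(12345);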
You need to find out what PostgreSQL is doing. Relevant resources:
Monitoring in general
Monitoring queries
Finding slow queries
Once you find what the slow or the most common queries are, use EXPLAIN to make sure they are being executed efficiently.
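As a sketch, if the pg_stat_statements extension is installed, the most expensive queries can be listed like this (on PostgreSQL 12 and older the columns are total_time/mean_time instead):
-- Top queries by cumulative execution time
SELECT query, calls, total_exec_time, mean_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;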
Here are some cases we have run into that cause high CPU usage in Postgres.
Incorrect or missing indexes for the query
Check the query plan - through EXPLAIN we can check the query plan; if an index is used by the query, an Index Scan shows up in the plan result.
Solution: add the corresponding index for the query SQL to reduce CPU usage, as in the sketch below.
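A minimal sketch (the table and column names are made up):
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
-- A "Seq Scan on orders" here suggests a missing index; adding one turns it into an Index Scan
CREATE INDEX idx_orders_customer_id ON orders (customer_id);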
Query with sort operation
Check EXPLAIN (analyze, buffers) - if there is not enough memory to do the sorting in RAM, a temporary file is used for the sort, and high CPU usage comes with it.
Note: DO NOT run "EXPLAIN (analyze)" on a busy production system, as it actually executes the query behind the scenes to provide more accurate planner information, and its impact is significant.
Solution: tune up work_mem and the sorting operations (see the sketch below).
Sample: Tune sorting operations in PostgreSQL with work_mem
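A rough sketch of what that looks like (the table is hypothetical, and the work_mem value is only an example):
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM orders ORDER BY created_at;
-- "Sort Method: external merge  Disk: ..." in the output means the sort spilled to disk
SET work_mem = '64MB';   -- session-level; re-run the EXPLAIN and look for "Sort Method: quicksort  Memory: ..."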
Long-running transactions
Find long-running transactions through
SELECT pid
, now() - pg_stat_activity.query_start AS duration, query, state
FROM pg_stat_activity
WHERE (now() - pg_stat_activity.query_start) > interval '2 minutes';
Solution:
Kill the long-running transaction through select pg_terminate_backend(pid)
Optimize the transaction or query SQL through corresponding indexes.
I am new to SQL/RDBMS.
I have an application which adds rows with 10 columns to a PostgreSQL server using the libpq library. Right now, my server is running on the same machine as my Visual C++ application.
I have added around 15-20 million records. The simple query of getting total count is taking 4-5 minutes using select count(*) from <tableName>;.
I have indexed my table on the time at which I enter the data (timecode). Most of the time I need the count with different WHERE / AND clauses added.
Is there any way to make things faster? I need to make it as fast as possible, because once the server moves onto the network, things will become much slower.
Thanks
I don't think network latency will be a large factor in how long your query takes. All the processing is being done on the PostgreSQL server.
The PostgreSQL MVCC design means each row in the table - not just the index(es) - must be walked to calculate the count(*) which is an expensive operation. In your case there are a lot of rows involved.
There is a good wiki page on this topic here http://wiki.postgresql.org/wiki/Slow_Counting with suggestions.
Two suggestions from this link, one is to use an index column:
select count(index-col) from ...;
... though this only works under some circumstances.
If you have more than one index see which one has the least cost by using:
EXPLAIN ANALYZE select count(index-col) from ...;
If you can live with an approximate value, another is to use a Postgres-specific catalog query for an approximate count, like:
select reltuples from pg_class where relname='mytable';
How good this approximation is depends on how often autovacuum is set to run and many other factors; see the comments.
Consider pg_relation_size('tablename') and divide it by the seconds spent in
select count(*) from tablename
That will give the throughput of your disk(s) when doing a full scan of this table. If it's too low, you want to focus on improving that in the first place.
Having a good I/O subsystem and well performing operating system disk cache is crucial for databases.
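A quick sketch of that calculation in psql (the table name is a placeholder; \timing is a psql meta-command):
SELECT pg_relation_size('mytable');   -- table size in bytes
\timing on
SELECT count(*) FROM mytable;         -- note the elapsed time psql reports
-- throughput ≈ size in bytes / elapsed seconds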
The default Postgres configuration is meant not to consume too many resources, so it plays nicely with other applications. Depending on your hardware and the overall utilization of the machine, you may want to adjust several performance parameters way up, such as shared_buffers, effective_cache_size or work_mem. See the docs for your specific version and the wiki's performance optimization page.
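For illustration only, a postgresql.conf starting point on a machine with, say, 8 GB of RAM mostly dedicated to the database might look like this (the numbers are assumptions, not recommendations):
shared_buffers = 2GB          # commonly around 25% of RAM on a dedicated box
effective_cache_size = 6GB    # planner hint: roughly the RAM left for the OS disk cache
work_mem = 64MB               # per sort/hash operation, per connection - raise with care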
Also note that the speed of select count(*)-style queries have nothing to do with libpq or the network, since only one resulting row is retrieved. It happens entirely server-side.
You don't state what your data is, but normally the way to handle tables with a very large amount of data is to partition the table. http://www.postgresql.org/docs/9.1/static/ddl-partitioning.html
This will not speed up your select count(*) from <tableName>; query, and might even slow it down, but if you are normally only interested in a portion of the data in the table this can be helpful.
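For example, a sketch of range partitioning on the timecode column (this uses the declarative syntax available from PostgreSQL 10 onward; the 9.1 docs linked above describe the older inheritance-based approach, and all names here are hypothetical):
CREATE TABLE measurements (
    timecode timestamptz NOT NULL,
    payload  text                    -- stand-in for the other columns
) PARTITION BY RANGE (timecode);

CREATE TABLE measurements_2024_01 PARTITION OF measurements
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');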
I understand there are limitations to using SQLite, but I'd like to know whether it should be able to handle this scenario.
My table has over 300 million records and the db is about 12 gigs. The data import util with sqlite is nice and fast. But then I added an index to a string column in this table, and it ran all night to complete this operation. I haven't compared this to other db's, but seemed quite slow to me.
Now that my index is added, I'm wanting to look for duplicates in the data. So I'm trying to run a "having count(*) > 1" query, and it seems to be taking hours as well. My query looks like:
select col1, count(*)
from table1
group by col1
having count(*) > 1
I would assume this query would use my index on col1, but the slow query execution makes me wonder whether it does.
Would SQL Server perhaps handle this kind of thing better?
SQLite's count() isn't optimized - it does a full table scan even if indexed. Here is the recommended approach to speed things up. Run EXPLAIN QUERY PLAN to verify and you'll see:
EXPLAIN QUERY PLAN SELECT COUNT(FIELD_NAME) FROM TABLE_NAME;
I get something like this:
0|0|0|SCAN TABLE TABLE_NAME (~1000000 rows)
But then I added an index to a string column in this table, and it ran all night to complete this operation. I haven't compared this to other db's, but seemed quite slow to me.
I hate to tell you, but what does your server look like? Not arguing, but that is a potentially very resource-intensive operation that may require a lot of IO, and normal computers or cheap web servers with a slow hard disc are not suited for significant database work. I run hundreds-of-gigabytes database projects, and my smallest "large data" server has 2 SSDs and 8 Velociraptors for data and log. The largest one has 3 storage nodes with a total of 1000 GB of SSD discs - simply because IO is what a db server lives and breathes on.
So I'm trying to run a "having count > 0" query and it seems to be taking hours as well
How much RAM? Enough to fit it all in memory, or a low-memory virtual server where the missing memory turns into bad IO? How much memory can / does SQLite use? How is temp storage set up? In memory? SQL Server would possibly use a lot of memory / tempdb space for this type of check.
Increase the SQLite cache via PRAGMA cache_size=<number of pages>. The memory used is <number of pages> times <size of page> (which can be set via PRAGMA page_size=<size of page>).
By setting those values to 16000 and 32768 respectively (or about 512MB), I was able to get one program's bulk load down from 20 minutes to 2 minutes (although I think that if the disk on that system wasn't so slow, this might not have had as much effect).
You might not have this extra memory available on lesser embedded platforms, so I don't recommend increasing it as much as I did there, but for desktop- or laptop-level systems it can help greatly.
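A minimal sketch of those two pragmas (the values are the ones from above; page_size only takes effect on a new database or after a VACUUM):
PRAGMA page_size = 32768;    -- bytes per page
PRAGMA cache_size = 16000;   -- pages; 16000 * 32768 bytes ≈ 512 MB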
How can I configure the maximum memory that a query (a select query) can use in SQL Server 2008?
I know there is a way to set the minimum value, but how about the max value? I would like to use this because I have many processes running in parallel. I know about the MAXDOP option, but that is for processors.
Update:
What I am actually trying to do is run a data load continuously. This data load is in ETL form (extract, transform and load). While the data is being loaded, I want to run some queries (selects). All of them are expensive queries (containing group by). The most important process for me is the data load. I obtained an average speed of 10000 rows/sec, and when I run the queries in parallel it drops to 4000 rows/sec and even lower. I know that a few more details should be provided, but this is a more complex product that I work on and I cannot detail it further. Another thing that I can guarantee is that my load speed does not drop due to lock problems, because I monitored those and removed them.
There isn't any way of setting a maximum memory at a per query level that I can think of.
If you are on Enterprise Edition you can use resource governor to set a maximum amount of memory that a particular workload group can consume which might help.
In SQL 2008 you can use Resource Governor to achieve this. There you can set request_max_memory_grant_percent to cap the memory (this is the percentage relative to the pool size specified by the pool's max_memory_percent value). This setting is not query specific; it is session specific.
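For illustration, a hedged sketch of what that setup looks like (the pool, group and percentages are made up, and Resource Governor also needs a classifier function to route sessions into the group):
-- Cap memory for a pool, then cap per-request grants within a workload group in that pool
CREATE RESOURCE POOL ReportingPool WITH (MAX_MEMORY_PERCENT = 50);

CREATE WORKLOAD GROUP ReportingGroup
    WITH (REQUEST_MAX_MEMORY_GRANT_PERCENT = 25)
    USING ReportingPool;

ALTER RESOURCE GOVERNOR RECONFIGURE;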
In addition to Martin's answer
If your queries are all the same or similar, working on the same data, then they will be sharing memory anyway.
Example:
A busy web site with 100 concurrent connections running 6 different parametrised queries between them on broadly the same range of data.
6 execution plans
100 user contexts
one buffer pool with assorted flags and counters to show usage of each data page
If you have 100 different queries or they are not parametrised then fix the code.
Memory per query is something I haven't thought about or cared about since the last millennium.
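To illustrate the "parametrised" point above, a hedged sketch using sp_executesql (the table and columns are made up):
-- One cached plan is reused across all parameter values
EXEC sp_executesql
    N'SELECT OrderId, Total FROM dbo.Orders WHERE CustomerId = @CustomerId',
    N'@CustomerId int',
    @CustomerId = 42;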