After migrating a PostgreSQL database server from v9.3 to v9.6 I noticed a decrease in the performance of the entire system. The config parameters are the same as in v9.3, including the following:
shared_buffers = 10000MB
work_mem = 64MB
maintenance_work_mem = 1024MB
I also monitored some resources; this is the result:
total used free shared buff/cache available
Mem: 31G 385M 4.5G 10G 26G 19G
Swap: 3.0G 0B 3.0G
Also, when I run queries, the server internally issues statements like these:
select typname from pg_type where oid=1043
set search_path to public
deallocate pdo_stmt_0000000e
It then runs my query, but I'm afraid there is some performance impact after the migration. I have another 9.6 server with a fresh install (no migration) and it does not show that problem (response time). The migrated server seems to be spending too much time in those queries.
Do you have any tip or advice on how to fix this?
I did the migration with pg_upgrade, but I noticed that in the process some data didn't make it to the v9.6 server. After that I did a dump/restore and a VACUUM ANALYZE.
In our case, we had neglected to:
ANALYZE the database
Postgres in particular may need it after a larger migration, for example when upgrading Django 2.2 to 3.2, where all of the id field types change from AutoField to BigAutoField.
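A minimal sketch of what that looks like in PostgreSQL (the single table name is hypothetical):
-- refresh planner statistics for the whole database after the migration;
-- plain ANALYZE is enough for statistics, VACUUM ANALYZE also reclaims dead tuples
VACUUM ANALYZE;
-- or target a single table
ANALYZE myapp_score;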
You could install the pg_stat_statements extension on both the slow and the fast system and compare the performance of the top queries on each. Where there are major differences in time per execution, check the execution plans (using EXPLAIN ANALYZE).
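For example, something along these lines (it assumes pg_stat_statements is already listed in shared_preload_libraries and the server has been restarted):
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
-- top 10 statements by total time spent (column names as of 9.6)
SELECT calls, total_time, total_time / calls AS avg_ms, query
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 10;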
Sometimes new features have a major performance impact after an upgrade. If my memory serves me well, parallel sequential scan - https://blog.2ndquadrant.com/postgresql96-parallel-sequential-scan/ - was added in 9.6. Though it is basically a great feature, there are some situations in which its use may result in a slowdown of queries. This could be a reason to set parallel_setup_cost (or other planner parameters) to a different value to avoid inefficient parallel sequential scans.
Edited later: As I see in https://www.postgresql.org/docs/9.6/release-9-6.html, parallel query execution is not enabled by default, so it's probably not the reason for the slowdown in your situation. Still, I think only an analysis of the performance of the top queries and their plans will shed light on the issue.
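If parallel query does become a suspect (for example after a later upgrade), a quick session-level experiment is to change the planner settings and compare plans; this is only a hedged sketch, and the table name is a placeholder:
SET max_parallel_workers_per_gather = 0;  -- disable parallel plans for this session
-- or make starting workers look more expensive instead of disabling them:
SET parallel_setup_cost = 10000;          -- default is 1000
EXPLAIN ANALYZE SELECT count(*) FROM myapp_score;  -- re-check one of the slow queries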
Related
I tried to use the 'PARALLEL' hint for the first time, but when I looked at the execution plan, parallel processing didn't happen.
Here is the code and execution plan.
SELECT /*+ PARALLEL(SCORE 4) */
*
FROM SCORE;
I learned that if parallel processing happens successfully, a 'PX COORDINATOR' operation has to appear in the execution plan, but as you can see in the image, there is no 'PX COORDINATOR' operation.
In this situation, why does parallel processing not happen even if I wrote the parallel hint?
And how can I make parallel processing happen successfully?
Any advice would be really appreciated.
(I'm using Oracle 11g.)
What edition are you using? XE does not have access to parallelism (which is sensible given that 11g XE can only use a single core anyway).
The latest version, 18c XE, can use two CPU cores, but the parallelism restriction remains. Luckily, your query probably won't benefit much from parallelism - the table is small, so it is quick to read, and the data transfer to the client needs to be single-threaded anyway (otherwise you would need multiple client connections).
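One way to confirm whether you actually got a parallel plan is to format it with DBMS_XPLAN, for example:
EXPLAIN PLAN FOR
SELECT /*+ PARALLEL(SCORE 4) */ * FROM SCORE;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
-- a parallel plan contains PX COORDINATOR / PX BLOCK ITERATOR steps;
-- a serial plan shows only a plain TABLE ACCESS FULL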
For 2601 (expected) rows it is not worth processing the query in parallel. Splitting the query, running the pieces in parallel and combining the results afterwards would take longer.
Apart from that, Oracle has to know where and how to split the operation. Do you have any index on this table? If not, how can Oracle know where the operation could be split for parallel processing?
I'm hoping to catch the eye of someone with experience in both SQL Server and DB2. I thought I'd ask to see if anyone could comment on these from the top of their head. The following is a list of features with SQL Server, that I'd like to do with DB2 as well.
Configuration option "optimize for ad hoc workloads", which saves first-time query plans as stubs, to avoid memory pressure from heavy-duty one-time queries (especially helpful with an extreme number of parameterized queries). What - if any - is the equivalent for this with DB2?
On a similar note, what would be the equivalents of the SQL Server configuration options auto create statistics, auto update statistics and auto update statistics async, all of which are fundamental for creating and maintaining proper statistics without causing too much overhead during business hours?
Indexes. MSSQL standard for index maintenance is REORGANIZE when fragmentation is between 5 - 35%, REBUILD (technically identical to DROP & RECREATE) when over 35%. As importantly, MSSQL supports ONLINE index rebuilds which keeps the associated data accessible by read / write operations. Anything similar with DB2?
Statistics. In SQL Server the standard statistics update procedure is all but useless in larger DB's, as the sample ratio is far too low. Is there an equivalent to UPDATE STATISTICS X WITH FULLSCAN in DB2, or a similarly functioning consideration?
In MSSQL, REBUILD index operations also fully recreate the underlying statistics, which is important to consider with maintenance operations in order to avoid overlapping statistics maintenance. The best method for statistics updates in larger DB's also involves targeting them on a per-statistic basis, since full table statistics maintenance can be extremely heavy when for example only a few of the dozens of statistics on a table actually need to be updated. How would this relate to DB2?
Show execution plan is an invaluable tool for analyzing specific queries and potential index / statistic issues with SQL Server. What would be the best similar method to use with DB2 (Explain tools? Or something else)?
Finding the bottlenecks: SQL Server has system views such as sys.dm_exec_query_stats and sys.dm_exec_sql_text, which make it extremely easy to see the most run, and most resource-intensive (number of logical reads, for instance) queries that need tuning, or proper indexing. Is there an equivalent query in DB2 you can use to instantly recognize problems in a clear and easy to understand manner?
All these questions represent a big chunk of where many of the problems are with SQL Server databases. I'd like to take that know-how, and translate it to DB2.
I'm assuming this is about DB2 for Linux, Unix and Windows.
Configuration option "optimize for ad hoc workloads", which saves first-time query plans as stubs, to avoid memory pressure from heavy-duty one-time queries (especially helpful with an extreme number of parameterized queries). What - if any - is the equivalent for this with DB2?
There is no equivalent; DB2 will evict least recently used plans from the package cache. One can enable automatic memory management for the package cache, where DB2 will grow and shrink it on demand (taking into account other memory consumers of course).
what would be the equivalents for SQL Server configuration options auto create statistics, auto update statistics and auto update statistics async.
Database configuration parameters auto_runstats and auto_stmt_stats
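A hedged sketch of turning these on from the DB2 command line (the database name SAMPLE is a placeholder; the parent automatic-maintenance switches also need to be on):
UPDATE DB CFG FOR SAMPLE USING AUTO_MAINT ON AUTO_TBL_MAINT ON
UPDATE DB CFG FOR SAMPLE USING AUTO_RUNSTATS ON AUTO_STMT_STATS ON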
MSSQL standard for index maintenance is REORGANIZE when fragmentation is between 5 - 35%, REBUILD (technically identical to DROP & RECREATE) when over 35%. As importantly, MSSQL supports ONLINE index rebuilds
You have an option of automatic table reorganization (which includes indexes); the trigger threshold is not documented. Additionally you have a REORGCHK utility that calculates and prints a number of statistics that allow you to decide what tables/indexes you want to reorganize manually. Both table and index reorganization can be performed online with read-only or full access.
Is there an equivalent to UPDATE STATISTICS X WITH FULLSCAN in DB2, or a similarly functioning consideration? ... The best method for statistics updates in larger DB's also involves targeting them on a per-statistic basis, since full table statistics maintenance can be extremely heavy when for example only a few of the dozens of statistics on a table actually need to be updated.
You can configure automatic statistics collection to use sampling or not (configuration parameter auto_sampling). When updating statistics manually using the RUNSTATS utility you have full control over the sample size and what statistics to collect.
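For example, a manual RUNSTATS call through the ADMIN_CMD procedure; this is only a sketch, with placeholder schema/table names and a 10% sample:
CALL SYSPROC.ADMIN_CMD(
  'RUNSTATS ON TABLE MYSCHEMA.MYTABLE
   WITH DISTRIBUTION AND DETAILED INDEXES ALL
   TABLESAMPLE SYSTEM (10)'
);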
Show execution plan is an invaluable tool for analyzing specific queries and potential index / statistic issues with SQL Server. What would be the best similar method to use with DB2
You have both GUI (Data Studio, Data Server Manager) and command-line (db2expln, db2exfmt) tools to generate query plans, including plans for statements that are in the package cache or are currently executing.
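A rough command-line sketch, assuming the explain tables already exist (created via EXPLAIN.DDL or SYSINSTALLOBJECTS) and that MYDB and the query are placeholders:
EXPLAIN PLAN FOR SELECT COUNT(*) FROM MYSCHEMA.MYTABLE;
-- then format the captured plan from the shell, e.g.:
--   db2exfmt -d MYDB -1 -o plan.txt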
Finding the bottlenecks: SQL Server has system views such as sys.dm_exec_query_stats and sys.dm_exec_sql_text, which make it extremely easy to see the most run, and most resource-intensive (number of logical reads, for instance) queries that need tuning
There is an extensive set of monitor procedures, views and table functions, e.g. MONREPORT.DBSUMMARY(), TOP_DYNAMIC_SQL, SNAP_GET_DYN_SQL, MON_CURRENT_SQL, MON_CONNECTION_SUMMARY etc.
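As a hedged starting point, the SYSIBMADM.TOP_DYNAMIC_SQL view can be queried directly, for example:
SELECT num_executions,
       average_execution_time_s,
       stmt_text
FROM   SYSIBMADM.TOP_DYNAMIC_SQL
ORDER  BY num_executions DESC
FETCH  FIRST 10 ROWS ONLY;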
I have Oracle 11g R2 running on an M-4000 machine (supposedly a powerful machine). Recently, I noticed that my application has become slow and is taking a lot of time querying the database. To my shock, when I looked at the statistics of the DB machine I found the CPU usage at 100%.
Here is the ASH report.
Can someone advise me on what I should be doing to avoid such a situation?
Those queries that are doing a 'table access full' may be your problem... any full table scan will kill a query and can usually be resolved by adding a simple index. You can profile your queries, and tools will recommend indexes to add in order to improve execution of certain queries. I think I did this with Squirrel on an Oracle DB.
Also, your IDs seem to be strings and you're doing a lower(id) like :3. These should be changed to integers, or at the very least get rid of the lower() and match directly against the bind variable :3.
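If switching to integer keys isn't possible, a function-based index at least lets that predicate use an index; the table and index names below are hypothetical, only illustrating the pattern:
-- lets WHERE lower(id) LIKE :3 use an index range scan instead of a full scan
-- (only helps when the bind value does not start with a wildcard)
CREATE INDEX idx_orders_lower_id ON orders (LOWER(id));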
I am switching to PostgreSQL from SQLite for a typical Rails application.
The problem is that running specs became slow with PG.
On SQLite it took ~34 seconds, on PG it's ~76 seconds which is more than 2x slower.
So now I want to apply some techniques to bring the performance of the specs on par with SQLite with no code modifications (ideally just by setting the connection options, which is probably not possible).
A couple of obvious things off the top of my head are:
RAM Disk (good setup with RSpec on OSX would be good to see)
Unlogged tables (can this be applied to the whole database so I don't have to change all the scripts?)
As you may have understood I don't care about reliability and the rest (the DB is just a throwaway thingy here).
I need to get the most out of the PG and make it as fast as it can possibly be.
Best answer would ideally describe the tricks for doing just that, setup and the drawbacks of those tricks.
UPDATE: fsync = off + full_page_writes = off only decreased time to ~65 seconds (~-16 secs). Good start, but far from the target of 34.
UPDATE 2: I tried to use RAM disk but the performance gain was within an error margin. So doesn't seem to be worth it.
UPDATE 3:
I found the biggest bottleneck and now my specs run as fast as the SQLite ones.
The issue was the database cleanup that did the truncation. Apparently SQLite is way too fast there.
To "fix" it I open a transaction before each test and roll it back at the end.
Some numbers for ~700 tests.
Truncation: SQLite - 34s, PG - 76s.
Transaction: SQLite - 17s, PG - 18s.
2x speed increase for SQLite.
4x speed increase for PG.
First, always use the latest version of PostgreSQL. Performance improvements are always coming, so you're probably wasting your time if you're tuning an old version. For example, PostgreSQL 9.2 significantly improves the speed of TRUNCATE and of course adds index-only scans. Even minor releases should always be followed; see the version policy.
Don'ts
Do NOT put a tablespace on a RAMdisk or other non-durable storage.
If you lose a tablespace the whole database may be damaged and hard to use without significant work. There's very little advantage to this compared to just using UNLOGGED tables and having lots of RAM for cache anyway.
If you truly want a ramdisk-based system, initdb a whole new cluster on the ramdisk, so you have a completely disposable PostgreSQL instance.
PostgreSQL server configuration
When testing, you can configure your server for non-durable but faster operation.
This is one of the only acceptable uses for the fsync=off setting in PostgreSQL. This setting pretty much tells PostgreSQL not to bother with ordered writes or any of that other nasty data-integrity-protection and crash-safety stuff, giving it permission to totally trash your data if you lose power or have an OS crash.
Needless to say, you should never enable fsync=off in production unless you're using Pg as a temporary database for data you can re-generate from elsewhere. If and only if you're going to turn fsync off, you can also turn full_page_writes off, as it no longer does any good then. Beware that fsync=off and full_page_writes apply at the cluster level, so they affect all databases in your PostgreSQL instance.
For production use you can possibly use synchronous_commit=off and set a commit_delay, as you'll get many of the same benefits as fsync=off without the giant data corruption risk. You do have a small window of loss of recent data if you enable async commit - but that's it.
If you have the option of slightly altering the DDL, you can also use UNLOGGED tables in Pg 9.1+ to completely avoid WAL logging and gain a real speed boost at the cost of the tables getting erased if the server crashes. There is no configuration option to make all tables unlogged, it must be set during CREATE TABLE. In addition to being good for testing this is handy if you have tables full of generated or unimportant data in a database that otherwise contains stuff you need to be safe.
Check your logs and see if you're getting warnings about too many checkpoints. If you are, you should increase your checkpoint_segments. You may also want to tune your checkpoint_completion_target to smooth writes out.
Tune shared_buffers to fit your workload. This is OS-dependent, depends on what else is going on with your machine, and requires some trial and error. The defaults are extremely conservative. You may need to increase the OS's maximum shared memory limit if you increase shared_buffers on PostgreSQL 9.2 and below; 9.3 and above changed how they use shared memory to avoid that.
If you're using just a couple of connections that do lots of work, increase work_mem to give them more RAM to play with for sorts etc. Beware that too high a work_mem setting can cause out-of-memory problems because it's per-sort, not per-connection, so one query can have many nested sorts. You only really have to increase work_mem if you can see sorts spilling to disk in EXPLAIN or logged with the log_temp_files setting (recommended), but a higher value may also let Pg pick smarter plans.
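Pulling the non-durable settings above together, a postgresql.conf sketch for a throwaway test cluster might look like this; the values are illustrative, not recommendations:
fsync = off                          # test-only: the data is disposable
synchronous_commit = off
full_page_writes = off               # only meaningful once fsync is off
checkpoint_segments = 32             # removed in 9.5; use max_wal_size on newer versions
checkpoint_completion_target = 0.9
shared_buffers = 512MB               # size to your machine and workload
work_mem = 32MB                      # per sort/hash operation, not per connection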
As said by another poster here it's wise to put the xlog and the main tables/indexes on separate HDDs if possible. Separate partitions is pretty pointless, you really want separate drives. This separation has much less benefit if you're running with fsync=off and almost none if you're using UNLOGGED tables.
Finally, tune your queries. Make sure that your random_page_cost and seq_page_cost reflect your system's performance, ensure your effective_cache_size is correct, etc. Use EXPLAIN (BUFFERS, ANALYZE) to examine individual query plans, and turn the auto_explain module on to report all slow queries. You can often improve query performance dramatically just by creating an appropriate index or tweaking the cost parameters.
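For example (the query is a placeholder, and the auto_explain settings go in postgresql.conf):
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM users WHERE email = 'x@example.com';
-- in postgresql.conf, roughly:
--   shared_preload_libraries = 'auto_explain'
--   auto_explain.log_min_duration = '250ms'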
AFAIK there's no way to set an entire database or cluster as UNLOGGED. It'd be interesting to be able to do so. Consider asking on the PostgreSQL mailing list.
Host OS tuning
There's some tuning you can do at the operating system level, too. The main thing you might want to do is convince the operating system not to flush writes to disk aggressively, since you really don't care when/if they make it to disk.
In Linux you can control this with the virtual memory subsystem's dirty_* settings, like dirty_writeback_centisecs.
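A hedged /etc/sysctl.conf sketch; the numbers are only starting points to experiment with, not recommendations:
# allow more dirty data to accumulate and wake the flusher less often
vm.dirty_background_bytes = 268435456
vm.dirty_bytes = 1073741824
vm.dirty_writeback_centisecs = 6000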
The only issue with tuning writeback settings to be too slack is that a flush by some other program may cause all PostgreSQL's accumulated buffers to be flushed too, causing big stalls while everything blocks on writes. You may be able to alleviate this by running PostgreSQL on a different file system, but some flushes may be device-level or whole-host-level not filesystem-level, so you can't rely on that.
This tuning really requires playing around with the settings to see what works best for your workload.
On newer kernels, you may wish to ensure that vm.zone_reclaim_mode is set to zero, as it can cause severe performance issues with NUMA systems (most systems these days) due to interactions with how PostgreSQL manages shared_buffers.
Query and workload tuning
These are things that DO require code changes; they may not suit you. Some are things you might be able to apply.
If you're not batching work into larger transactions, start. Lots of small transactions are expensive, so you should batch stuff whenever it's possible and practical to do so. If you're using async commit this is less important, but still highly recommended.
Whenever possible use temporary tables. They don't generate WAL traffic, so they're lots faster for inserts and updates. Sometimes it's worth slurping a bunch of data into a temp table, manipulating it however you need to, then doing an INSERT INTO ... SELECT ... to copy it to the final table. Note that temporary tables are per-session; if your session ends or you lose your connection then the temp table goes away, and no other connection can see the contents of a session's temp table(s).
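A rough sketch of that pattern (all table names here are made up):
CREATE TEMPORARY TABLE staging_events (LIKE events INCLUDING DEFAULTS);
-- load and massage the data with cheap, WAL-free writes...
INSERT INTO staging_events SELECT * FROM external_feed WHERE received_on = current_date;
-- ...then copy it into the real table in one statement
INSERT INTO events SELECT * FROM staging_events;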
If you're using PostgreSQL 9.1 or newer you can use UNLOGGED tables for data you can afford to lose, like session state. These are visible across different sessions and preserved between connections. They get truncated if the server shuts down uncleanly so they can't be used for anything you can't re-create, but they're great for caches, materialized views, state tables, etc.
In general, don't DELETE FROM blah;. Use TRUNCATE TABLE blah; instead; it's a lot quicker when you're dumping all rows in a table. Truncate many tables in one TRUNCATE call if you can. There's a caveat if you're doing lots of TRUNCATES of small tables over and over again, though; see: Postgresql Truncation speed
If you don't have indexes on foreign keys, DELETEs involving the primary keys referenced by those foreign keys will be horribly slow. Make sure to create such indexes if you ever expect to DELETE from the referenced table(s). Indexes are not required for TRUNCATE.
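For example (names hypothetical): PostgreSQL indexes the referenced primary key automatically, but not the referencing column, so:
-- without this, each row deleted from customers forces a scan of orders
-- for the foreign key check
CREATE INDEX idx_orders_customer_id ON orders (customer_id);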
Don't create indexes you don't need. Each index has a maintenance cost. Try to use a minimal set of indexes and let bitmap index scans combine them rather than maintaining too many huge, expensive multi-column indexes. Where indexes are required, try to populate the table first, then create indexes at the end.
Hardware
Having enough RAM to hold the entire database is a huge win if you can manage it.
If you don't have enough RAM, the faster storage you can get the better. Even a cheap SSD makes a massive difference over spinning rust. Don't trust cheap SSDs for production though, they're often not crashsafe and might eat your data.
Learning
Greg Smith's book, PostgreSQL 9.0 High Performance remains relevant despite referring to a somewhat older version. It should be a useful reference.
Join the PostgreSQL general mailing list and follow it.
Reading:
Tuning your PostgreSQL server - PostgreSQL wiki
Number of database connections - PostgreSQL wiki
Use different disk layout:
different disk for $PGDATA
different disk for $PGDATA/pg_xlog
different disk for temp files (per database $PGDATA/base//pgsql_tmp) (see note about work_mem)
postgresql.conf tweaks:
shared_buffers: 30% of available RAM but not more than 6 to 8GB. It seems to be better to have less shared memory (2GB - 4GB) for write-intensive workloads
work_mem: mostly for SELECT queries with sorts/aggregations. This is a per-connection setting and a query can allocate that value multiple times. If the data can't fit, disk is used (pgsql_tmp). Check EXPLAIN ANALYZE to see how much memory you need
fsync and synchronous_commit: Default values are safe, but if you can tolerate data loss then you can turn them off
random_page_cost: if you have SSD or fast RAID array you can lower this to 2.0 (RAID) or even lower (1.1) for SSD
checkpoint_segments: you can go higher, 32 or 64, and change checkpoint_completion_target to 0.9. A lower value allows faster after-crash recovery
I have never clearly understood the usage of MAXDOP. I do know that it makes the query faster and that it is the last item that I can use for Query Optimization.
However, my question is, when and where it is best suited to use in a query?
As Kaboing mentioned, MAXDOP(n) actually controls the number of CPU cores that are being used in the query processor.
On a completely idle system, SQL Server will attempt to pull the tables into memory as quickly as possible and join between them in memory. It could be that, in your case, it's best to do this with a single CPU. This might have the same effect as using OPTION (FORCE ORDER), which forces the query optimizer to use the order of joins that you have specified. In some cases, I have seen OPTION (FORCE PLAN) reduce a query from 26 seconds to 1 second of execution time.
Books Online goes on to say that possible values for MAXDOP are:
0 - Uses the actual number of available CPUs depending on the current system workload. This is the default value and recommended setting.
1 - Suppresses parallel plan generation. The operation will be executed serially.
2-64 - Limits the number of processors to the specified value. Fewer processors may be used depending on the current workload. If a value larger than the number of available CPUs is specified, the actual number of available CPUs is used.
I'm not sure what the best usage of MAXDOP is, however I would take a guess and say that if you have a table with 8 partitions on it, you would want to specify MAXDOP(8) due to I/O limitations, but I could be wrong.
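As a syntax reminder, the hint goes in the OPTION clause of the individual statement; the table below is just an example:
SELECT CustomerID, SUM(TotalDue) AS TotalSpend
FROM dbo.SalesOrderHeader
GROUP BY CustomerID
OPTION (MAXDOP 4);   -- cap this query at 4 schedulers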
Here are a few quick links I found about MAXDOP:
Books Online: Degree of Parallelism
General guidelines to use to configure the MAXDOP option
This is a general rambling on Parallelism in SQL Server, it might not answer your question directly.
From Books Online, on MAXDOP:
Sets the maximum number of processors the query processor can use to execute a single index statement. Fewer processors may be used depending on the current system workload.
See Rickie Lee's blog on parallelism and CXPACKET wait type. It's quite interesting.
Generally, in an OLTP database, my opinion is that if a query is so costly it needs to be executed on several processors, the query needs to be re-written into something more efficient.
Why do you get better results adding MAXDOP(1)? Hard to tell without the actual execution plans, but it might be as simple as the execution plan being totally different from the one without the OPTION, for instance using a different index or (more likely) joining differently, using MERGE or HASH joins.
As something of an aside, MAXDOP can apparently be used as a workaround to a potentially nasty bug:
Returned identity values not always correct
There are a couple of parallelization bugs in SQL Server with abnormal input. OPTION(MAXDOP 1) will sidestep them.
EDIT: Old. My testing was done largely on SQL 2005. Most of these seem to not exist anymore, but every once in a while we question the assumption when SQL 2014 does something dumb, and we go back to the old way and it works. We never managed to demonstrate that it wasn't just bad plan generation in the more recent cases, though, since SQL Server can be relied on to get the old way right in newer versions. Since all cases were IO-bound queries, MAXDOP 1 doesn't hurt.
Adding my two cents, based on a performance issue I observed.
If simple queries are getting parallelized unnecessarily, it can bring more problems than it solves. However, before adding MAXDOP to the query as a "knee-jerk" fix, there are some server settings to check.
In Jeremiah Peschka - Five SQL Server Settings to Change, MAXDOP and "COST THRESHOLD FOR PARALLELISM" (CTFP) are mentioned as important settings to check.
Note: Paul White also mentioned max server memory as a setting to check, in a response to Performance problem after migration from SQL Server 2005 to 2012. A good KB article to read is Using large amounts of memory can result in an inefficient plan in SQL Server
Jonathan Kehayias - Tuning ‘cost threshold for parallelism’ from the Plan Cache helps find a good value for CTFP.
Why is cost threshold for parallelism ignored?
Aaron Bertrand - Six reasons you should be nervous about parallelism has a discussion about some scenario where MAXDOP is the solution.
Parallelism-Inhibiting Components are mentioned in Paul White - Forcing a Parallel Query Execution Plan
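For reference, both of the server-wide settings discussed above (MAXDOP and cost threshold for parallelism) are changed with sp_configure; the values below are placeholders to be derived from your own workload, not recommendations:
EXEC sys.sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sys.sp_configure 'cost threshold for parallelism', 50;
EXEC sys.sp_configure 'max degree of parallelism', 8;
RECONFIGURE;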