I'm hoping to catch the eye of someone with experience in both SQL Server and DB2. I thought I'd ask to see if anyone could comment on these off the top of their head. The following is a list of SQL Server features and practices that I'd like to carry over to DB2.
Configuration option "optimize for ad hoc workloads", which saves first-time query plans as stubs, to avoid memory pressure from heavy-duty one-time queries (especially helpful with an extreme number of parameterized queries). What - if any - is the equivalent for this with DB2?
On a similar note, what would be the equivalents for the SQL Server configuration options auto create statistics, auto update statistics and auto update statistics async, all of which are fundamental for creating and maintaining proper statistics without causing too much overhead during business hours?
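For reference, the SQL Server side of both of these points looks roughly like the following (MyDb is a placeholder database name):

    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;
    EXEC sp_configure 'optimize for ad hoc workloads', 1;  -- cache only a plan stub on first execution
    RECONFIGURE;

    ALTER DATABASE [MyDb] SET AUTO_CREATE_STATISTICS ON;
    ALTER DATABASE [MyDb] SET AUTO_UPDATE_STATISTICS ON;
    ALTER DATABASE [MyDb] SET AUTO_UPDATE_STATISTICS_ASYNC ON;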
Indexes. The MSSQL standard for index maintenance is REORGANIZE when fragmentation is between 5 and 35%, and REBUILD (technically identical to DROP & RECREATE) when it is over 35%. Just as importantly, MSSQL supports ONLINE index rebuilds, which keep the associated data accessible to read/write operations. Is there anything similar in DB2?
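To anchor the comparison, a minimal sketch of that MSSQL routine (object names are hypothetical):

    -- Find fragmented indexes in the current database
    SELECT OBJECT_NAME(ips.object_id) AS table_name,
           i.name AS index_name,
           ips.avg_fragmentation_in_percent
    FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
    JOIN sys.indexes AS i
      ON i.object_id = ips.object_id AND i.index_id = ips.index_id
    WHERE ips.avg_fragmentation_in_percent > 5;

    ALTER INDEX IX_Orders_Date ON dbo.Orders REORGANIZE;                  -- moderate fragmentation
    ALTER INDEX IX_Orders_Date ON dbo.Orders REBUILD WITH (ONLINE = ON);  -- heavy fragmentation, Enterprise Edition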
Statistics. In SQL Server the standard statistics update procedure is all but useless in larger DB's, as the sample ratio is far too low. Is there an equivalent to UPDATE STATISTICS X WITH FULLSCAN in DB2, or a similarly functioning consideration?
In MSSQL, REBUILD index operations also fully recreate the underlying statistics, which is important to consider with maintenance operations in order to avoid overlapping statistics maintenance. The best method for statistics updates in larger DB's also involves targeting them on a per-statistic basis, since full table statistics maintenance can be extremely heavy when for example only a few of the dozens of statistics on a table actually need to be updated. How would this relate to DB2?
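For completeness, the per-statistic approach on the MSSQL side might look like this (table and statistic names are made up; sys.dm_db_stats_properties needs a reasonably recent SQL Server build):

    -- Check how stale each statistic on a table is
    SELECT s.name, sp.last_updated, sp.modification_counter
    FROM sys.stats AS s
    CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
    WHERE s.object_id = OBJECT_ID('dbo.Orders');

    UPDATE STATISTICS dbo.Orders WITH FULLSCAN;                 -- every statistic on the table
    UPDATE STATISTICS dbo.Orders IX_Orders_Date WITH FULLSCAN;  -- a single targeted statistic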
Show execution plan is an invaluable tool for analyzing specific queries and potential index / statistic issues with SQL Server. What would be the best similar method to use with DB2 (Explain tools? Or something else)?
Finding the bottlenecks: SQL Server has system views such as sys.dm_exec_query_stats and sys.dm_exec_sql_text, which make it extremely easy to see the most run, and most resource-intensive (number of logical reads, for instance) queries that need tuning, or proper indexing. Is there an equivalent query in DB2 you can use to instantly recognize problems in a clear and easy to understand manner?
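The kind of query I mean, for reference:

    SELECT TOP (20)
           qs.execution_count,
           qs.total_logical_reads,
           qs.total_logical_reads / qs.execution_count AS avg_logical_reads,
           SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
                     ((CASE qs.statement_end_offset WHEN -1 THEN DATALENGTH(st.text)
                       ELSE qs.statement_end_offset END - qs.statement_start_offset) / 2) + 1) AS statement_text
    FROM sys.dm_exec_query_stats AS qs
    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
    ORDER BY qs.total_logical_reads DESC;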
These questions cover a big chunk of where the typical problems lie in SQL Server databases. I'd like to take that know-how and translate it to DB2.
I'm assuming this is about DB2 for Linux, Unix and Windows.
Configuration option "optimize for ad hoc workloads", which saves first-time query plans as stubs, to avoid memory pressure from heavy-duty one-time queries (especially helpful with an extreme number of parameterized queries). What - if any - is the equivalent for this with DB2?
There is no equivalent; DB2 will evict least recently used plans from the package cache. One can enable automatic memory management for the package cache, where DB2 will grow and shrink it on demand (taking into account other memory consumers of course).
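If I understand correctly, that would be something along these lines from the CLP, assuming the self-tuning memory manager is in use (MYDB is a placeholder):

    UPDATE DB CFG FOR MYDB USING SELF_TUNING_MEM ON;
    UPDATE DB CFG FOR MYDB USING PCKCACHESZ AUTOMATIC;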
what would be the equivalents for SQL Server configuration options auto create statistics, auto update statistics and auto update statistics async.
The database configuration parameters auto_runstats and auto_stmt_stats.
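Presumably enabled like this; the automatic maintenance parameters form a hierarchy, so the parent switches need to be ON as well (MYDB is a placeholder):

    UPDATE DB CFG FOR MYDB USING AUTO_MAINT ON AUTO_TBL_MAINT ON AUTO_RUNSTATS ON AUTO_STMT_STATS ON;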
MSSQL standard for index maintenance is REORGANIZE when fragmentation is between 5 - 35%, REBUILD (technically identical to DROP & RECREATE) when over 35%. As importantly, MSSQL supports ONLINE index rebuilds
You have an option of automatic table reorganization (which includes indexes); the trigger threshold is not documented. Additionally you have a REORGCHK utility that calculates and prints a number of statistics that allow you to decide what tables/indexes you want to reorganize manually. Both table and index reorganization can be performed online with read-only or full access.
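A rough sketch of the manual path from the CLP (schema and table names are placeholders):

    REORGCHK UPDATE STATISTICS ON TABLE ALL;                          -- report which tables/indexes look like reorg candidates
    REORG TABLE MYSCHEMA.MYTABLE INPLACE ALLOW WRITE ACCESS;          -- online, in-place table reorg
    REORG INDEXES ALL FOR TABLE MYSCHEMA.MYTABLE ALLOW WRITE ACCESS;  -- online index reorg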
Is there an equivalent to UPDATE STATISTICS X WITH FULLSCAN in DB2, or a similarly functioning consideration? ... The best method for statistics updates in larger DB's also involves targeting them on a per-statistic basis, since full table statistics maintenance can be extremely heavy when for example only a few of the dozens of statistics on a table actually need to be updated.
You can configure automatic statistics collection to use sampling or not (configuration parameter auto_sampling). When updating statistics manually using the RUNSTATS utility you have full control over the sample size and what statistics to collect.
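For example, something along these lines (names and sample size are placeholders):

    RUNSTATS ON TABLE MYSCHEMA.MYTABLE WITH DISTRIBUTION AND DETAILED INDEXES ALL;        -- full statistics, no sampling
    RUNSTATS ON TABLE MYSCHEMA.MYTABLE ON COLUMNS (CUSTOMER_ID) TABLESAMPLE SYSTEM (10);  -- targeted columns, 10% page sample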
Show execution plan is an invaluable tool for analyzing specific queries and potential index / statistic issues with SQL Server. What would be the best similar method to use with DB2
You have both GUI (Data Studio, Data Server Manager) and command-line (db2expln, db2exfmt) tools to generate query plans, including plans for statements that are in the package cache or are currently executing.
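A typical command-line flow would be roughly as follows; the database name, table and statement are placeholders, the explain tables only need to be created once, and the EXPLAIN.DDL path depends on the instance home:

    db2 -tf ~/sqllib/misc/EXPLAIN.DDL
    db2 "EXPLAIN PLAN FOR SELECT * FROM MYSCHEMA.MYTABLE WHERE CUSTOMER_ID = 42"
    db2exfmt -d MYDB -1 -o plan.txt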
Finding the bottlenecks: SQL Server has system views such as sys.dm_exec_query_stats and sys.dm_exec_sql_text, which make it extremely easy to see the most run, and most resource-intensive (number of logical reads, for instance) queries that need tuning
There is an extensive set of monitor procedures, views and table functions, e.g. MONREPORT.DBSUMMARY(), TOP_DYNAMIC_SQL, SNAP_GET_DYN_SQL, MON_CURRENT_SQL, MON_CONNECTION_SUMMARY etc.
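For example, a quick look at the most frequently executed dynamic statements might be (column names taken from the 9.7 documentation of the view; the statement monitor data must be available):

    SELECT NUM_EXECUTIONS,
           AVERAGE_EXECUTION_TIME_S,
           STMT_TEXT
    FROM   SYSIBMADM.TOP_DYNAMIC_SQL
    ORDER  BY NUM_EXECUTIONS DESC
    FETCH  FIRST 20 ROWS ONLY;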
Related
Is it possible to somehow optimize the performance of queries (apart from playing with hardware and OS settings) under these conditions:
1) You can't add indexes.
2) You can't alter the queries themselves.
This is a common constraint when benchmarking the performance of a database.
I understand that the DBMS has a query optimizer that plays a numbers game with all the statistics pertaining to accessing the tables touched by the query. Are there cases where the query optimizer comes up with suboptimal plans? I know that you can force the optimizer to use a particular query plan; I'm just not sure how to cache it without altering the query itself. The DB in question is Sybase.
Independent of the specific case here (Sybase), there are multiple ways to optimize a query under the given conditions. Syntax is system-specific.
Most systems rely on statistics to find the best query plan, so updating the statistics could help improve performance.
Many systems allow you to set an optimization level independently of the application. This can have a positive impact on performance.
Many systems allow query plans to be reused for similar ad-hoc queries (dynamic SQL). Usually this has a positive impact.
Allowing the database system (independently of the OS) to assign more memory to bottlenecks can also help.
What privileges do you have, what are the benchmark rules?
Data Henrik mentions optimisation level - you can set this system-wide for Sybase, or per session.
You can even have a flexible method that sets the level according to application name or login ID (see Rob Verschoor's Sybase site - login triggers). I'd guess that if you're not allowed to change queries or indexes, you're not likely to be allowed to do this either.
As far as I can tell you don't have a specific problem - you just mention benchmarking.
You should be sure all tables have UPDATE INDEX STATISTICS run on them, and you could then do your benchmarks with the 3 Sybase optimisation levels - OLTP, MIX, DSS.
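A sketch of what that could look like, assuming ASE 15 or later (the table name is hypothetical):

    update index statistics my_table
    go
    set plan optgoal allrows_oltp   -- or allrows_mix / allrows_dss
    go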
If you have specific problems, that's another subject.
I want to make sure the stress on the server is minimal while running queries from a read-only schema (a user can select data and create temp tables and variables, but can't execute SPs, write data, or do other more advanced stuff). What DB hints or other tricks could I use in this situation?
Currently I am:
Using the WITH (NOLOCK) hint for every table
Setting the DEADLOCK_PRIORITY for the whole batch to -10 (although I am not sure it's really needed, since I am using NOLOCK)
My goal is to use as few server resources as possible and let other, more important work be processed by the server freely. The queries I am going to send to the server are local (they can't be saved as SPs) and there will be many of them, coming from various users every 5 minutes. They are generally simple SELECTs and are cheap in isolation. Are there any other ways to make them even less expensive?
EDIT:
I am not the owner of the server I am connecting to, so I can only use the SQL query I am passing to the server to achieve what I want.
The two measures you have taken will have little impact. They are mostly used out of superstition, although they can have an impact in rare cases. Practically, READ UNCOMMITTED (which is 100% identical to NOLOCK) enables allocation-order scans on B-trees. That is only important for tables that are not already in memory anyway.
If you want to minimize locking and blocking, snapshot isolation can be a simple and very effective solution.
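A sketch, assuming the server owner is willing to flip the database options; the session-level statement is all the querying user needs afterwards (MyDb is a placeholder):

    -- database level, needs ALTER DATABASE permission
    ALTER DATABASE [MyDb] SET ALLOW_SNAPSHOT_ISOLATION ON;
    ALTER DATABASE [MyDb] SET READ_COMMITTED_SNAPSHOT ON;

    -- per session, once snapshot isolation is allowed
    SET TRANSACTION ISOLATION LEVEL SNAPSHOT;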
In order to truly minimize the impact of a certain workload you need to use Resource Governor. Everything else is a partial solution or workaround.
Consider limiting CPU usage, memory, IO and parallelism.
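A minimal Resource Governor sketch, assuming a dedicated login named report_user for these read-only queries; all names are hypothetical and the classifier function must live in master:

    CREATE RESOURCE POOL ReportingPool
        WITH (MAX_CPU_PERCENT = 20, MAX_MEMORY_PERCENT = 20);

    CREATE WORKLOAD GROUP ReportingGroup
        WITH (MAX_DOP = 1)
        USING ReportingPool;
    GO

    -- classifier function routes sessions into the workload group
    CREATE FUNCTION dbo.rg_classifier() RETURNS sysname
    WITH SCHEMABINDING
    AS
    BEGIN
        DECLARE @grp sysname = 'default';
        IF SUSER_SNAME() = 'report_user'
            SET @grp = 'ReportingGroup';
        RETURN @grp;
    END;
    GO

    ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION = dbo.rg_classifier);
    ALTER RESOURCE GOVERNOR RECONFIGURE;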
I'm writing a heavy SQL query using SQL Server 2008. When the query processes 100 rows, it finishes instantly. When it processes 5000 rows, it takes about 1.1 minutes.
I used the Actual Execution Plan to check its performance while it was processing 5000 rows. The query contains 18 sub-queries, and no significantly higher query cost percentage shows up in the plan; they are all around 0%, 2%, 5%, 7%, with the highest being 11%.
The screenshot below shows the most expensive operation in the query (94% of that 11% sub-query).
I also used the Client Statistics tool; Trial 10 shows the run with 5000 rows, and Trial 9 shows the run with 100 rows.
Can anybody tell me where (or with which SQL Server tool) I can find the data/detail that shows what makes the query slow when it processes 5000 rows?
Add:
Indexes and keys are in place.
The actual execution plan shows nothing remarkable and no high cost percentage on any sub-query.
I just found that 'Activity Monitor' shows one sub-query's 'Average Duration' as 40000 ms under 'Recent Expensive Queries', while the actual plan shows this query takes only 5% of the total cost.
Thanks
For looking at performance, using the Database Engine Tuning Advisor and/or the missing index DMVs, and then examining the execution plan either in Management Studio or with something like SQL Sentry Plan Explorer, should be enough to give you an indication of where you need to make modifications.
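For example, the missing index DMVs can be queried directly (treat the output as suggestions, not gospel):

    SELECT TOP (20)
           mid.statement AS table_name,
           mid.equality_columns,
           mid.inequality_columns,
           mid.included_columns,
           migs.user_seeks,
           migs.avg_user_impact
    FROM sys.dm_db_missing_index_details AS mid
    JOIN sys.dm_db_missing_index_groups AS mig
      ON mig.index_handle = mid.index_handle
    JOIN sys.dm_db_missing_index_group_stats AS migs
      ON migs.group_handle = mig.index_group_handle
    ORDER BY migs.user_seeks * migs.avg_user_impact DESC;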
Understanding the execution plan and how the physical operators relate to your logical operations is the biggest key to performance tuning; that, and a good understanding of indexes and statistics.
I don't think there is any tool that will just automagically fix performance for you.
While I do believe that learning the underpinnings of the execution plan and how the SQL Server query optimizer operates is essential to being a good database developer, and that humans are still far better at diagnosing and fiddling with SQL to get it right than most tools, native or third-party, there is in fact a tool in SQL Server Management Studio which can (sometimes) "automagically" fix performance for you:
Database Engine Tuning Advisor
You can access it from the menu under Query -> Analyze Query Using Database Engine Tuning Advisor, or (more helpfully) by selecting your query, right-clicking on the selection, and choosing Analyze Query using Database Engine Tuning Advisor, which gives the added bonus of automatically filtering down to only the database objects used by your query.
All the Tuning Advisor actually does is investigate whether there are any indexes or statistics that could be added to your objects. It then "recommends" them, and you can apply none, some, or all of them as you choose.
Caveat emptor alert! All of its recommendations are geared towards making that particular query run faster, so what it definitely does not do is help you make good decisions about the consequences of adding an index that only gets used by maybe one or two queries but has to be updated constantly as you add data to your database. This is a SQL anti-pattern known as "index shotgunning" and is generally frowned upon by DBAs, who would rather see a query rewritten to take advantage of more useful indexes.
I am looking for a way to get statistics from both Oracle and DB2 databases for the number of select/update/insert/delete operations performed on every table. To put it another way, I would like to know how many scan operations were performed on a given table versus how many modifying operations were executed.
I have found that it is possible to do this in MS SQL Server, as described in http://msdn.microsoft.com/en-us/library/dd894051%28v=sql.100%29.aspx
The reason I need it is that it provides a reasonable indication of whether it is worthwhile to apply compression to a given table. The better the scan/update ratio, the better a candidate the table is. I think this also holds true for other databases.
So, is it possible to get these statistics in Oracle and/or DB2? Thanks in advance.
In Oracle you can see how many updates/deletes/inserts have been performed on a table in sys.dba_tab_modifications. The data is flushed to the table every 4 hours. For reads you can use dba_hist_seg_stat, which is part of AWR; use of this is licensed.
sys.dba_tab_modifications is reset once a table gets new optimizer statistics.
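Putting those together, a rough Oracle sketch (the schema name is a placeholder, and the AWR query requires the Diagnostics Pack license):

    SELECT table_owner, table_name, inserts, updates, deletes, timestamp
    FROM   sys.dba_tab_modifications
    WHERE  table_owner = 'MYSCHEMA';

    -- reads per segment, from AWR snapshots
    SELECT o.object_name,
           SUM(s.logical_reads_delta)  AS logical_reads,
           SUM(s.physical_reads_delta) AS physical_reads
    FROM   dba_hist_seg_stat s
    JOIN   dba_objects o ON o.object_id = s.obj#
    WHERE  o.owner = 'MYSCHEMA'
    GROUP  BY o.object_name;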
My answer applies to the DB2 database engines for Linux, UNIX, and Windows (LUW) platforms, not DB2 for iSeries (AS/400) or DB2 for z/OS, which have significantly different engine internals than the LUW platforms. All of the documentation links I've included reference version 9.7 of DB2 for LUW.
DB2 for LUW provides extensive performance and utilization statistics in every version of the data engine, including the no-cost DB2 Express-C product. The collection of these statistics is governed by a series of database engine settings called system monitor switches. The statistics you seek involve the table monitor switch, and possibly also the statement and UOW (unit of work) monitor switches. When those system monitor switches are enabled, you can retrieve running totals of various performance gauges and counters from snapshot monitors or by selecting from administrative SQL views (in the SYSIBMADM schema) that present the same snapshot monitor output as SQL result sets. The snapshot monitors incur less system overhead than event monitors, which run in the background as a trace and store a stream of detailed information to special tables or files.
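As a sketch with placeholder names: turn on the table monitor switch at the instance level from the CLP, then read the per-table counters from the SYSIBMADM.SNAPTAB administrative view:

    db2 update dbm cfg using DFT_MON_TABLE ON

    SELECT TABSCHEMA, TABNAME, ROWS_READ, ROWS_WRITTEN
    FROM   SYSIBMADM.SNAPTAB
    ORDER  BY ROWS_READ DESC;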
Compression is a licensed feature that alters the internal storage of tables and indexes all the way from the tablespace to the buffer pool (RAM cache) to the transaction log file. In most cases, the additional CPU overhead of compression and decompression is more than offset by the overall reduction in I/O. The deep row compression feature compresses rows in tables by building and using a 12-bit dictionary of multi-byte patterns that can even cross column boundaries. Enabling deep row compression for a table typically reduces its size by 40% or more before DBA intervention. Indexes are compressed through a shorthand algorithm that exploits their sorted nature by omitting common leading bytes between the current and previous index keys.
What are the best options/recommendations and optimizations that can be performed when working with a large SQL Server 2005 table that contains anywhere from 100-200 Million records?
Since you didn't state the purpose of the database, or the requirements, here are some general things, in no particular order:
Small clustered index on each table. Consider making this your primary key on each table. This will be very efficient and save on space in the main table and dependent tables.
Appropriate non-clustered indexes (covering indexes where possible)
Referential Integrity
Normalized Tables
Consistent naming on all database objects for easier maintenance
Appropriate Partitioning (table and index) if you have the Enterprise Edition of SQL Server
Appropriate check constraints on tables if you are going to allow direct data manipulation in the database.
Decide where your business rules are going to reside and don't deviate from that. In most cases they do not belong in the database.
Run Query Analyzer on your heavily used queries (at least) and look for table scans, which will kill performance.
Be prepared to deal with deadlocks. With a database of this size, especially if there will be heavy writing, deadlocks could very well be a problem.
Take ample advantage of views to hide query join complexity, open up potential for query optimization, and allow a flexible security implementation.
Consider using schemas to better organize data and allow a flexible security implementation.
Get familiar with Profiler. With a database of this size, you will more than likely be spending some time trying to determine query bottlenecks. Profiler can help you here.
A rule of thumb is if a table will contain more than 25 million records, you should consider table (and index) partitioning, but this feature is only available in the Enterprise edition of SQL Server (and correspondingly, developer edition).
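As an illustration of the partitioning point, a bare-bones SQL Server 2005 sketch (all object names and boundary values are made up):

    -- monthly range partitioning on a datetime key
    CREATE PARTITION FUNCTION pf_OrderDate (datetime)
    AS RANGE RIGHT FOR VALUES ('2005-01-01', '2005-02-01', '2005-03-01');

    CREATE PARTITION SCHEME ps_OrderDate
    AS PARTITION pf_OrderDate ALL TO ([PRIMARY]);

    CREATE TABLE dbo.Orders
    (
        OrderId    bigint    NOT NULL,
        OrderDate  datetime  NOT NULL,
        CustomerId int       NOT NULL,
        CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED (OrderDate, OrderId)
    ) ON ps_OrderDate (OrderDate);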