Do any open-source SQL database servers offer a way to find out (perhaps even graphically) what actually happened inside during a query, step by step, e.g. whether a table scan was used, or whether and which indexes were used? It would be useful for database optimization.
Most servers have some way to display a query execution plan: EXPLAIN in MySQL, for instance. Which server are you using?
Almost all of them have tools or commands to describe query plans; the graphical part you may have to pay for.
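To make the answers above concrete, here is a minimal sketch using SQLite through Python's built-in sqlite3 module, chosen only because it runs without a server. The table orders and the index idx_orders_customer are made up for illustration. MySQL and PostgreSQL expose the same idea through their own EXPLAIN commands, with different output formats.

```python
import sqlite3


def plan_details(conn, sql):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # Each row has four columns; the last one describes the plan step.
    return [row[3] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(1000)],
)

query = "SELECT total FROM orders WHERE customer_id = 42"

# Without an index on customer_id the only option is to scan the table.
before = plan_details(conn, query)

# After adding a (hypothetical) index, the plan names that index instead.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = plan_details(conn, query)

print("before:", before)
print("after: ", after)
```

The exact wording of the plan text varies between SQLite versions, so treat the printed strings as something to read, not to parse.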
I am writing a paper about query optimization in different DBMSs, and I am trying to find the core differences between them.
Both Oracle and MySQL use a CBO (cost-based optimizer) in the same way: parse the query -> generate plans -> pick the best one given statistics about the database.
I'm still researching those two engines, but if someone knows how they differ (or not), it would be appreciated.
Not a comprehensive answer at all, but wanted to give you my insight. In short, Oracle has a much more developed SQL optimizer.
For starters, Oracle has many more algorithms to choose from. This means that Oracle sometimes distinguishes between subtle differences and offers, let's say, three algorithms, where MySQL (under the same circumstances) has only one to choose from. Therefore, Oracle has better options for particular cases.
Another difference is that MySQL's execution plans are not very readable. I'm not saying they are bad internally, just that EXPLAIN EXTENDED doesn't tell you many specifics. Oracle makes a very clear distinction between access and filter predicates, while in MySQL you don't really know what's going on.
Oracle has many algorithms suitable for parallel processing across multiple servers, while MySQL is limited to multiple threads on the same machine. This can make a difference for highly parallelisable queries that benefit from multiple servers.
Oracle still has an RBO (Rule-Based Optimizer) that can be useful on some occasions. MySQL doesn't. Oracle recommends not using it, but it's still there if you need it.
Oracle offers a myriad of "hints" to the optimizer in the form of special comments (/*+ ... */) with which you can tweak the execution plan to suit your needs. MySQL has fewer "clauses" for this.
We have our data marts/warehouse on Oracle 11g, implemented as a star schema. Business reports are designed using OBIEE. I come from an ETL background and have very little knowledge of OBIEE.
Once the OBIEE RPD is designed, I see that OBIEE starts generating SELECT queries in the background to feed data into the reports. On many occasions, I have noticed that the SELECT queries are not optimized (big fact table is fully scanned more than once in separate WITH clauses).
When the report performance is bad, the OBIEE queries are sent to the ETL team for performance tuning. I'm confused about how I can tune them because they are auto generated. I know there is an option to write custom sql in OBIEE (without going via RPD) for each report, but our standards do not allow that and I also think it does not leverage the benefits of OBIEE.
Has anyone faced a problem like the one above? How can such queries be tuned?
Firstly, you're right that custom SQL (known as direct database query) is not a good idea in principle, though it is useful on occasion. But it's not the solution to your problem.
Tuning the OBI queries generated is an OBI RPD task, for the OBI developer; tuning the database for the OBI queries generated is a database/ETL task. But you can't really do one without the other: OBI needs to be designed so as to generate suitable queries, and the database needs to be designed in such a way that good queries can be generated to answer the question being asked.
OBI is basically a SQL generator, and if the RPD model is suboptimal, then the resulting query will be suboptimal. OBI will generate SQL based on the information it has in the RPD about the layout and structure of the data and database.
You're obviously coming at it from the database side, and so to you the SQL is bad because it isn't what you'd write. It's also possible that the database design is bad for getting an answer to the question that OBI is being asked.
As jackohug says, OBIEE is a SQL generator, and the general approach is to optimize for the query OBIEE generates rather than to try to change that query. However, depending on the performance problem, you can try some tricks.
First of all, is your table partitioned, and can your reports benefit from the partitioning?
Second, add indexes on the fact table so that any filter on the dimensions can speed up access to the fact table.
Third, build aggregate tables that summarize the fact table, so that reports that don't show much detail read the aggregate table, which holds far less data. Only as users drill down through the structure (applying filters to the data they are interested in as they go) do they reach the detailed fact table, and by then the filters avoid full scans.
You could also tell OBIEE to use hints when accessing the table, although, as with Direct Database Query, I wouldn't recommend it; I would first try optimizing with the three approaches above.
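The aggregate-table idea can be sketched roughly as follows. This uses SQLite via Python's sqlite3 module purely so that it runs anywhere; the tables sales_fact and sales_agg and their columns are invented for illustration, and how the aggregate is made known to OBIEE in the RPD is not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_fact ("
    "sale_id INTEGER PRIMARY KEY, region TEXT, month TEXT, amount INTEGER)"
)

# 2 regions x 2 months x 500 detail rows each = 2000 fact rows (made-up data).
detail_rows = [
    (region, month, 10)
    for region in ("EU", "US")
    for month in ("2024-01", "2024-02")
    for _ in range(500)
]
conn.executemany(
    "INSERT INTO sales_fact (region, month, amount) VALUES (?, ?, ?)",
    detail_rows,
)

# The aggregate table summarizes the fact table at (region, month) grain.
conn.execute(
    "CREATE TABLE sales_agg AS "
    "SELECT region, month, SUM(amount) AS amount, COUNT(*) AS n_rows "
    "FROM sales_fact GROUP BY region, month"
)

# A summary-level report gets the same answer from either table,
# but the aggregate holds far fewer rows to read.
summary_sql = "SELECT region, SUM(amount) FROM {table} GROUP BY region ORDER BY region"
from_fact = conn.execute(summary_sql.format(table="sales_fact")).fetchall()
from_agg = conn.execute(summary_sql.format(table="sales_agg")).fetchall()

fact_rows = conn.execute("SELECT COUNT(*) FROM sales_fact").fetchone()[0]
agg_rows = conn.execute("SELECT COUNT(*) FROM sales_agg").fetchone()[0]

print(from_fact == from_agg, fact_rows, agg_rows)
```

In a real warehouse the aggregate would be refreshed by the ETL load, and the grain would be chosen from the levels the reports actually use.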
Regards
If you have Diagnostics and Tuning Pack licenses, you can run the SQL Tuning Advisor. The SQL Tuning Advisor runs the optimizer in tuning mode, and it may be able to generate a SQL Profile with a better execution plan. Sometimes the advisor recommends indexes as well. Neither SQL Profiles nor indexes require a change to the application.
I've yet to have much success with the SQL tuning advisor. Some experience in SQL tuning and a bit of research can typically produce a far better plan.
If all the layers are built well and all you need is a final tweak then add a hidden column to the start of the report (Answer/Analysis) with a SQL hint.
I'd be very careful about adding hints through the RPD layers because of the many different and unexpected ways that others will join and use the tables.
I've found a number of resources that talk about tuning the database server, but I haven't found much on the tuning of the individual queries.
For instance, in Oracle, I might try adding hints to ignore indexes or to use sort-merge vs. correlated joins, but I can't find much on tuning Postgres other than using explicit joins and recommendations when bulk loading tables.
Do any such guides exist so I can focus on tuning the most run and/or underperforming queries, hopefully without adversely affecting the currently well-performing queries?
I'd even be happy to find something that compared how certain types of queries performed relative to other databases, so I had a better clue of what sort of things to avoid.
update:
I should've mentioned, I took all of the Oracle DBA classes along with their data modeling and SQL tuning classes back in the 8i days ... so I know about 'EXPLAIN', but that's more to tell you what's going wrong with the query, not necessarily how to make it better. (eg, are 'where var=1 or var=2' and 'where var in (1,2)' considered the same when generating an execution plan? What if I'm doing it with 10 permutations? When are multi-column indexes used? Are there ways to get the planner to optimize for fastest start vs. fastest finish? What sort of 'gotchas' might I run into when moving from MySQL, Oracle or some other RDBMS?)
I could write any complex query dozens if not hundreds of ways, and I'm hoping to not have to try them all and find which one works best through trial and error. I've already found that 'SELECT count(*)' won't use an index, but 'SELECT count(primary_key)' will ... so maybe what I'm after is a 'PostgreSQL for experienced SQL users' sort of document that explains the sorts of queries to avoid, and how best to re-write them, or how to get the planner to handle them better.
update 2:
I found a Comparison of different SQL Implementations which covers PostgreSQL, DB2, MS-SQL, MySQL, Oracle and Informix, and explains whether and how you can do various things, along with the gotchas. Its references section linked to Oracle / SQL Server / DB2 / Mckoi /MySQL Database Equivalents (which is what its title suggests) and to the wikibook SQL Dialects Reference, which covers whatever people contribute (includes some DB2, SQLite, MySQL, PostgreSQL, Firebird, Virtuoso, Oracle, MS-SQL, Ingres, and Linter).
As for badly performing queries: run EXPLAIN ANALYZE and read the output.
You can put the EXPLAIN ANALYZE output on a site like explain.depesz.com; it will help you find the elements that really take the most time.
There is a nice online tool that takes the output of EXPLAIN ANALYZE, and graphically shows you critical parts (e.g. wrong estimates, hot spots, etc)
http://explain.depesz.com/help
Btw, I think posted queries become public, and the "previous explains" link has been hit by spambots.
http://www.postgresql.org/docs/current/static/indexes-examine.html
You can steer the planner with settings: SET enable_indexscan TO false; would make PostgreSQL try to avoid index scans for the current session.
To address your point, unfortunately the only way to tune a query in Postgres is pretty much to tune the database underlying it. In Oracle, you can set all of those options on a query-by-query basis and override the optimizer's plan in the process, but in Postgres you're pretty much at the mercy of the optimizer, for good and ill.
The PGAdmin3 tool includes a graphical explanation tool for breaking down how a query is handled. It also is especially helpful for showing where table scans occur.
The best I've seen are in here: http://wiki.postgresql.org/wiki/Using_EXPLAIN, but the latest PDF in there is from 2008, so there may be something more recent. I'm interested to hear other users' answers.
Also, something's brewing in the contrib packages: http://www.sai.msu.su/~megera/wiki/plantuner
I'm looking for a tool that would help with creating complex SQL queries. Sometimes it's difficult even to verify whether the results of a query are correct. It's especially easy to get queries joining several tables to return too little or too much data.
The tool should at least enable the creation of test tables, provide some kind of visualization of how the queries gather their data, and hopefully report errors better than, for example, Oracle does.
Are there tools like this, or do I have to stick with creating test tables manually, filling them with test data and running all kinds of queries with SQuirreL SQL?
When you have a very complex query it is usually easiest to validate by breaking it up into multiple queries that populate temp tables. These intermediary results can be individually verified and then you bring them together to produce the final result set. Depending on performance needs you can stick with the temp table approach or you can then rewrite to a single statement. Typically when I have a huge query it is for background processing so I stick with the temp table approach.
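A minimal sketch of that staged approach, using SQLite through Python's sqlite3 module so it runs without a server. The tables customers and orders and the temp tables order_totals and customer_report are made up for illustration; the same pattern applies on any RDBMS with temp tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO orders VALUES (1, 1, 50), (2, 1, 70), (3, 2, 20);
    """
)

# Step 1: aggregate on its own, so the totals can be checked by hand.
conn.execute(
    "CREATE TEMP TABLE order_totals AS "
    "SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id"
)
step1 = conn.execute(
    "SELECT customer_id, total FROM order_totals ORDER BY customer_id"
).fetchall()

# Step 2: join the verified intermediate result to the customers.
# LEFT JOIN keeps customers without orders, a classic "too little data" trap.
conn.execute(
    "CREATE TEMP TABLE customer_report AS "
    "SELECT c.name, COALESCE(t.total, 0) AS total "
    "FROM customers c LEFT JOIN order_totals t ON t.customer_id = c.id"
)
step2 = conn.execute(
    "SELECT name, total FROM customer_report ORDER BY name"
).fetchall()

# Optional rewrite as a single statement, checked against the staged result.
single = conn.execute(
    "SELECT c.name, COALESCE(SUM(o.amount), 0) AS total "
    "FROM customers c LEFT JOIN orders o ON o.customer_id = c.id "
    "GROUP BY c.id, c.name ORDER BY c.name"
).fetchall()

print(step1)
print(step2)
print(step2 == single)
```

Keeping the staged version around gives you a reference result to compare against whenever the single-statement version is changed.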
What RDBMS are you using? All of the major ones have some type of console available (e.g. SSMS for SQL Server, Toad for Oracle, MySQL Query Browser/Administrator for MySQL, etc.), and they all have query execution plans where you can see how the query will actually run. So, the answer to your question is that it's entirely dependent on what RDBMS you're using, but the safe bet answer is: Yes.
I recommend trying SQL Server 2008 Management Studio Express (SSMSE) if you are working with SQL Server. I have used it at work and I believe it does everything you are looking for.
You can get it and SQL Server (express editions) here.
Certainly not a free, open-source solution, but I believe Quest Software's TOAD will fit your requirements. Quest seems to offer a lot of tools in that space. They have tools for modeling and analysis, although I've never used the modeler or analyzer.
I personally have experience with the commercial version of TOAD for Oracle. Its GUI is overwhelming at first, but after you mentally filter out all of the extra buttons that you'll never use, it's manageable.
In the never-ending search for performance (and my own bludgeoning experience), I've learnt a few things that could drag down the performance of a SQL statement.
Obsessive Compulsive Subqueries Disorder
Doing crazy type conversions (and nesting those into oblivion)
Group By on aggregate functions of said crazy type conversions
Where fldID in (select EVERYTHING from my 5mil record table)
I typically work with MSSQL. What tools are available to test the performance of a SQL statement? Are these tools built in and specific to each type of DB server? Or are there general tools available?
SQL Profiler (built-in): Monitoring with SQL Profiler
SQL Benchmark Pro (Commercial)
SQL Server 2008 has the new Data Collector
SQL Server 2005 (onwards) has a missing indexes Dynamic Management View (DMV) which can be quite useful (but only for query plans currently in the plan cache): About the Missing Indexes Feature.
There is also the SQL Server Database Engine Tuning Advisor which does a reasonable job (just don't implement everything it suggests!)
I mostly just use Profiler and the execution plan viewer
Execution Plans are one of the first things to look at when debugging query performance problems. An execution plan will tell you how much time is roughly spent in each portion of your query, and can be used to quickly identify if you are missing indexes or have expensive joins or loops.
MSSQL has a database tuning advisor that will often recommend indexes for tables based upon common queries run during the tuning period; however, it won't rewrite a query for you.
In my opinion, experience and experimentation are the best tools for writing good SQL queries.
In MySQL (and maybe in other databases too) you can EXPLAIN your query to see what the database server thinks about it. This is usually used to decide which indexes should be created. It is built in, so you can use it without installing additional software.
Adam Machanic has a simple tool called SqlQueryStress that might be of use. It is designed to be used to "run a quick performance test against a single query, in order to test ideas or validate changes".
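If you just want the "quick performance test against a single query" idea without installing anything, a rough single-threaded sketch looks like this. It is not how SqlQueryStress works internally; the helper time_query, the table t and its data are invented for illustration, and SQLite via Python's sqlite3 module stands in for the real server.

```python
import sqlite3
import statistics
import time


def time_query(conn, sql, params=(), runs=20):
    """Run one query repeatedly and summarize wall-clock timings in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        conn.execute(sql, params).fetchall()  # fetch so the work is really done
        timings.append(time.perf_counter() - start)
    return {
        "runs": runs,
        "median_s": statistics.median(timings),
        "max_s": max(timings),
    }


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v INTEGER)")
conn.executemany("INSERT INTO t (v) VALUES (?)", [(i % 50,) for i in range(10000)])

# Time the same (made-up) query before and after a change, e.g. adding an index.
result = time_query(conn, "SELECT COUNT(*) FROM t WHERE v = ?", (7,))
print(result)
```

The median is reported instead of the mean because the first run or two often include caching effects that would skew an average.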