How to rewrite SQL queries who give better performance in MySQL

How to rewrite SQL queries who give better performance in MySQL - sql

I have a project where all SQL queries was already written. Most of them can be improved, and I want to do that.
But how I can check and analyze the both queries [pre-written and I write same queries using different way].
I want to analyze the queries if I rewrite them then I can check if I did right or if old query is better than mine.
I need a queries analyzer tool for Mysql based RDBMS.
[if any mistake goes then edit it]

MySQL has an EXPLAIN statement, which tells you what execution path the engine will follow. You can use that to determine what changes to make:
EXPLAIN SELECT foo,bar from glurch WHERE baz > 1 ORDER BY foo;
See: http://dev.mysql.com/doc/refman/5.0/en/explain-output.html
Pay special attention to the rows column, which shows how many table rows need to be examined to execute that part of the query, as well as the Extra column, which show more importantly, how the query gets executed.
For example, if you see "Using temporary" in the Extra column, that usually means that the DB will need to write the query results out to a temporary table (possibly on disk), sort them, and then re-read them (or at least some of them).
You can avoid temporary tables and other nasty performance killers by providing appropriate indexes. And not just enough indexes, but rather the correct indexes. Just because you've indexed a column doesn't mean that index can be used in your query.

When you have a good RDMBS engine, the query parser should determine the most performant execution plan for your query, no matter how it is written.
I mean: the order of the tables in your from and join clauses, or the order of your filter-criteria should not matter.
The first thing to do, is look at the execution plan and analyze it. Check wether you've to add indexes, or modify indexes for instance.
Your question is also a bit vague. Can you give a sample of a query which you'd like to refactor ?

Related

Does SQLite have an eval command?

I have a query referenced in Why is SQLite refusing to use available indexes when adding a JOIN? that is a compound query. When the segments of the query are evaluated individually, the query plan generated applies the relevant indicies and runs smoothly. However, when run together (via a JOIN) it fails to do so. Therefore, I was wondering if there was a way to create a query that runs 'eval' on the subquery and passes that to the outer query to force SQLite to use the query plans that would have been generated had they been done individually.

The answer to your other question tells you why already: indexes are not used when they're not useful.
In essence:
If it's cheapest to hop back and forth on disk pages to fetch a handful of rows that match a query, an index gets used.
If it's cheapest to just read the entire mess and filter out uneeded rows, an index is not used.
Some databases (e.g. Postgres) offer an intermediary level between the two in the form of a bitmap index scan: it amounts to the second with a pre-flight check based on the index, to avoid visiting disk pages that contain no matching rows.
That's all there is to it, really: a few rows, index; lots of rows, no index.
Naturally, poorly written queries don't use indexes either, but that's for different reasons: they just confuse the query planner, and while smart the latter is not all-knowing. Joining on a union or an aggregate, in particular, are a prime recipe for not using indexes. (And that is what you are doing.)

Per usual you should write your queries and indexes that way so that Sqlite's query optimizer recognizes the optimal indexes and just uses them.
But as your question in this case is more specific it seems you look for an equivalent of SQL Server's FORCE(INDEX) clause.
As I have read about it in Sqlite there is the clause INDEXED BY, though it seems Sqlite's community's opinions about it are split (probably because of what I mentioned in my first sentence)
link 1 sqlite.org's documentation about it
link 2 for a tutorial on that

Using temp table for sorting data in SQL Server

Recently, I came across a pattern (not sure, could be an anti-pattern) of sorting data in a SELECT query. The pattern is more of a verbose and non-declarative way for ordering data. The pattern is to dump relevant data from actual table into temporary table and then apply orderby on a field on the temporary table. I guess, the only reason why someone would do that is to improve the performance (which I doubt) and no other benefit.
For e.g. Let's say, there is a user table. The table might contain rows in millions. We want to retrieve all the users whose first name starts with 'G' and sorted by first name. The natural and more declarative way to implement a SQL query for this scenario is:
More natural and declarative way
SELECT * FROM Users
WHERE NAME LIKE 'G%'
ORDER BY Name
Verbose way
SELECT * INTO TempTable
FROM Users
WHERE NAME LIKE 'G%'
SELECT * FROM TempTable
ORDER BY Name
With that context, I have few questions:
Will there be any performance difference between two ways if there is no index on the first name field. If yes, which one would be better.
Will there be any performance difference between two ways if there is an index on the first name field. If yes, which one would be better.
Should not the SQL Server optimizer generate same execution plan for both the ways?
Is there any benefit in writing a verbose way from any other persective like locking/blocking?
Thanks in advance.

Reguzlarly: Anti pattern by people without an idea what they do.
SOMETIMES: ok, because SQL Server has a problem that is not resolvable otherwise - not seen that one in yeas, though.
It makes things slower because it forces the tmpddb table to be fully populated FIRST, while otherwise the query could POSSIBLY be resoled more efficiently.
last time I saw that was like 3 years ago. We got it 3 times as fast by not being smart and using a tempdb table ;)
Answers:
1: No, it still needs a table scan, obviously.
2: Possibly - depends on data amount, but an index seek by index would contain the data in order already (as the index is ordered by content).
3: no. Obviously. Query plan optimization is statement by statement. By cutting the execution in 2, the query optimizer CAN NOT merge the join into the first statement.
4: Only if you run into a query optimizer issue or a limitation of how many tables you can join - not in that degenerate case (degenerate in a technical meaning - i.e. very simplistic). BUt if you need to join MANY MANY tables it may be better to go with an interim step.

If the field you want to do an order by on is not indexed, you could put everything into a temp table and index it and then do the ordering and it might be faster. You would have to test to make sure.

There is never any benefit of the second approach that I can think of.
It means if the data is available pre-ordered SQL Server can't take advantage of this and adds an unnecessary blocking operator and additional sort to the plan.
In the case that the data is not available pre-ordered SQL Server will sort it in a work table either in memory or tempdb anyway and adding an explicit #temp table just adds an unnecessary additional step.
Edit
I suppose one case where the second approach could give an apparent benefit might be if the presence of the ORDER BY caused SQL Server to choose a different plan that turned out to be sub optimal. In which case I would resolve that in a different way by either improving statistics or by using hints/query rewrite to avoid the undesired plan.

how to tune this view?fetching time takes 9.968 but i want in .5. so how to give better performance

SELECT
/*+ INDEX(ID_BL_REF_NO REF_number_BL_idx*/ DECODE(BL_TYPE,'E',BL_ORIGIN_NAME,'I',BL_FINAL_NAME) FROM_PORT,
DECODE(BL_TYPE,'I',BL_ORIGIN_NAME,'E',BL_FINAL_NAME) TO_PORT,
(BL_VESSEL_CONNECT||'/'||BL_VOYAGE_CONNECT||'/'||BL_PORT_CONNECT) Mother_vessel_voyage_port,
SUM(BLC_SIZE) No_of_20s,
SUM(BLC_SIZE) No_of_40s,
SUM(DECODE(BLC_SIZE,'20',1,'40',2)) Teus,
SUM(BLC_GROSSWT) GrossWt,
round((BLC_GROSSWT/SUM(DECODE(BLC_SIZE,'20',1,'40',2))),2) AverageWt,
SUM(DECODE(BLF_MODE,'P',BLF_LOCAL_AMOUNT)) PREPAID,
SUM(DECODE(BLF_MODE,'C',BLF_LOCAL_AMOUNT)) COLLECT,
SUM(DECODE(BLF_MODE,'E',BLF_LOCAL_AMOUNT)) ELSEWHERE,
(SUM(DECODE(BLF_MODE,'P',BLF_LOCAL_AMOUNT)+DECODE(BLF_MODE,'C',BLF_LOCAL_AMOUNT)+DECODE(BLF_MODE,'E',BLF_LOCAL_AMOUNT))/SUM(DECODE(BLC_SIZE,'20',1,'40',2))) AVERAGE
FROM ID_BL_DETAILS,id_bl_containers,ID_BL_FREIGHT
WHERE BL_REFNO=BLC_REFNO
AND BLF_REFNO=BLC_REFNO
GROUP BY BL_VESSEL_CONNECT,BL_VOYAGE_CONNECT,BL_PORT_CONNECT,BL_ORIGIN_NAME,BL_LODPORT,BL_DISPORT,BL_FINAL_NAME,BLC_GROSSWT,BL_TYPE

Your WHERE clause contains only joins. There are no filters. This means your query needs to consider all the rows in at least one table. From this it follows that your query should execute a FULL TABLE SCAN of at least one of your tables, not an indexed read. A full table scan is the most efficient way of getting all the rows in a table.
So don't fix the syntax of your INDEX hint, get rid of it.
Next, figure out which table should drive your query. This is business logic. Probably your requirement is something like
"Summarise BL_DETAILS and BL_FREIGHT
for every row in BL_CONTAINERS."
In which case you might think you need a full table scan of BL_CONTAINERS. But if BL_FREIGHT has more rows than BL_CONTAINERS and every BLF_REF_NO matches a BL_REF_NO (i.e. there is a foreign key on BL_FREIGHT.BLF_REF_NO referencing BL_CONTAINERS.BL_REF_NO) it would probably be better to drive from BL_FREIGHT.
Note that this is true if you are only interested in BL_CONTAINERS which have matching BL_FREIGHT rows. But if you want to include containers which have not been used (i.e. they have no matching no BL_FREIGHT records) you need to use outer joins and drive off the BL_CONTAINERS table.
These considerations get more complicated when you throw BL_DETAILS into the mix. Your report seems to be based around the BL_DETAILS categories (as Jeffrey observes it is hard for us to understand your query without aliases or describes). So perhaps BL_DETAILS is the right candidate for driving table.
As you can see, tuning requires insight into the business logic and the details of the data model. You have that local knowledge, we do not.
There are tools which can help you. Oracle has EXPLAIN PLAN which will show you how the database will execute the query. The query optimizer gets better with each release so it matters which version of the database you're using. Here is the documentation for 10g.
The important thing to note is that you need to give the database accurate statistics, in order for it to come up with a good plan. Find out more.

run explain on your query and make sure the proper indexes are set up. Will improve the speed of the query
http://www.sql.org/sql-database/postgresql/manual/sql-explain.html

Your question states that the query takes 9.968 seconds and you want it to be 0.5 seconds or less. This can only be done effectively (if possible at all) when you know where those 9.968 seconds are spent on. And to know where time in a query is being spent, you'll not only want to explain the statement, you'll want to trace an execution of that query. The latter will give you a breakdown of how the time in your query is spent.
There are two threads on OTN that describe how you can do that.
If you want to do the bare minimum, please follow this one:
http://forums.oracle.com/forums/thread.jspa?messageID=1812597
And if you want to give full details, please follow this one:
http://forums.oracle.com/forums/thread.jspa?threadID=863295
Happy tracing!
Regards,
Rob.

Cost of logic in a query

I have a query that looks something like this:
select xmlelement("rootNode",
(case
when XH.ID is not null then
xmlelement("xhID", XH.ID)
else
xmlelement("xhID", xmlattributes('true' AS "xsi:nil"), XH.ID)
end),
(case
when XH.SER_NUM is not null then
xmlelement("serialNumber", XH.SER_NUM)
else
xmlelement("serialNumber", xmlattributes('true' AS "xsi:nil"), XH.SER_NUM)
end),
/*repeat this pattern for many more columns from the same table...*/
FROM XH
WHERE XH.ID = 'SOMETHINGOROTHER'
It's ugly and I don't like it, and it is also the slowest executing query (there are others of similar form, but much smaller and they aren't causing any major problems - yet). Maintenance is relatively easy as this is mostly a generated query, but my concern now is for performance. I am wondering how much of an overhead there is for all of these case expressions.
To see if there was any difference, I wrote another version of this query as:
select xmlelement("rootNode",
xmlforest(XH.ID, XH.SER_NUM,...
(I know that this query does not produce exactly the same, thing, my plan was to move the logic for handling the renaming and xsi:nil attribute to XSL or maybe to PL/SQL)
I tried to get execution plans for both versions, but they are the same. I'm guessing that the logic does not get factored into the execution plan. My gut tells me the second version should execute faster, but I'd like some way to prove that (other than writing a PL/SQL test function with timing statements before and after the query and running that code over and over again to get a test sample).
Is it possible to get a good idea of how much the case-when will cost?
Also, I could write the case-when using the decode function instead. Would that perform better (than case-statements)?

Just about anything in your SELECT list, unless it is a user-defined function which reads a table or view, or a nested subselect, can usually be neglected for the purpose of analyzing your query's performance.
Open your connection properties and set the value SET STATISTICS IO on. Check out how many reads are happening. View the query plan. Are your indexes being used properly? Do you know how to analyze the plan to see?

For the purposes of performance tuning you are dealing with this statement:
SELECT *
FROM XH
WHERE XH.ID = 'SOMETHINGOROTHER'
How does that query perform? If it returns in markedly less time than the XML version then you need to consider the performance of the functions, but I would astonished if that were the case (oh ho!).
Does this return one row or several? If one row then you have only two things to work with:
is XH.ID indexed and, if so, is the index being used?
does the "many more columns from the same table" indicate a problem with chained rows?
If the query returns several rows then ... Well, actually you have the same two things to work with. It's just the emphasis is different with regards to indexes. If the index has a very poor clustering factor then it could be faster to avoid using the index in favour of a full table scan.
Beyond that you would need to look at physical problems - I/O bottlenecks, poor interconnects, a dodgy disk. The reason why your scope for tuning the query is so restricted is because - as presented - it is a single table, single column read. Most tuning is about efficient joining. Now if XH transpires to be a view over a complex query then it is a different matter.

You can use good old tkprof to analyze statistics. One of the many forms of ALTER SESSION that turn on stats gathering. The DBMS_PROFILER package also gathers statistics if your cursor is in a PL/SQL code block.

Does the way you write sql queries affect performance?

say i have a table
Id int
Region int
Name nvarchar
select * from table1 where region = 1 and name = 'test'
select * from table1 where name = 'test' and region = 1
will there be a difference in performance?
assume no indexes
is it the same with LINQ?

Because your qualifiers are, in essence, actually the same (it doesn't matter what order the where clauses are put in), then no, there's no difference between those.
As for LINQ, you will need to know what query LINQ to SQL actually emits (you can use a SQL Profiler to find out). Sometimes the query will be the simplest query you can think of, sometimes it will be a convoluted variety of such without you realizing it, because of things like dependencies on FKs or other such constraints. LINQ also wouldn't use an * for select.
The only real way to know is to find out the SQL Server Query Execution plan of both queries. To read more on the topic, go here:
SQL Server Query Execution Plan Analysis

Should it? No. SQL is a relational algebra and the DBMS should optimize irrespective of order within the statement.
Does it? Possibly. Some DBMS' may store data in a certain order (e.g., maintain a key of some sort) despite what they've been told. But, and here's the crux: you cannot rely on it.
You may need to switch DBMS' at some point in the future. Even a later version of the same DBMS may change its behavior. The only thing you should be relying on is what's in the SQL standard.
Regarding the query given: with no indexes or primary key on the two fields in question, you should assume that you'll need a full table scan for both cases. Hence they should run at the same speed.

I don't recommend the *, because the engine should look for the table scheme before executing the query. Instead use the table fields you want to avoid unnecessary overhead.
And yes, the engine optimizes your queries, but help him :) with that.
Best Regards!

For simple queries, likely there is little or no difference, but yes indeed the way you write a query can have a huge impact on performance.
In SQL Server (performance issues are very database specific), a correlated subquery will usually have poor performance compared to doing the same thing in a join to a derived table.
Other things in a query that can affect performance include using SARGable1 where clauses instead of non-SARGable ones, selecting only the fields you need and never using select * (especially not when doing a join as at least one field is repeated), using a set-bases query instead of a cursor, avoiding using a wildcard as the first character in a a like clause and on and on. There are very large books that devote chapters to more efficient ways to write queries.
1 "SARGable", for those that don't know, are stage 1 predicates in DB2 parlance (and possibly other DBMS'). Stage 1 predicates are more efficient since they're parts of indexes and DB2 uses those first.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas