How to Performance tune a query that has Between statement for range of dates - sql

I am working on performance tuning all the slow running queries. I am new to Oracle have been using sql server for a while. Can someone help me tune the query to make it run faster.
Select distinct x.a, x.b from
from xyz_view x
where x.date_key between 20101231 AND 20160430
Appreciate any help or suggestions

First, I'd start by looking at why the DISTINCT is there. In my experience many developers tack on the DISTINCT because they know that they need unique results, but don't actually understand why they aren't already getting them.
Second, a clustered index on the column would be ideal for this specific query because it puts all of the rows right next to each other on disk and the server can just grab them all at once. The problem is, that might not be possible because you already have a clustered index that's good for other uses. In that case, try a non-clustered index on the date column and see what that does.
Keep in mind that indexing has wide-ranging effects, so using a single query to determine indexing isn't a good idea.

I would also add if you are pulling from a VIEW, you should really investigate the design of the view. It typically has a lot of joins that may not be necessary for your query. In addition, if the view is needed, you can look at creating an indexed view which can be very fast.

There is not much more you can do to optimize this query so long as you have established that the DISTINCT is really needed.
You can add a [NOLOCK] to the FROM clause if reading uncommitted pages is not an issue.
However you can analyze if the time is being inserted as well, and if so, is it really relevant, if not set the time to midnight this will improve indexes.
Biggest improvements I've seen is dividing the date field in the table into 3 fields, 1 for each date part. This can really improve performance.

Related

SQL Optimizing the Query

I am using the following query,
SELECT qTermine.REISENR,
qTermine.REISEVON,
qTermine.REISEBIS,
MIN(qPreise.PREIS) AS PREIS
FROM qTermine,
qPreise
WHERE qPreise.REISENR = qTermine.REISENR
AND qPreise.PROVKLASSE = 1
AND qPreise.VERFUEGBAR <> 0
GROUP BY qTermine.REISENR,
qTermine.REISEVON,
qTermine.REISEBIS
I only want to select those rows where the Price is minimum.
Is it OK or we can optimize it?
The query itself is good as it is. It's hard to come at this question, as for most uses, what you've written is fine. Are you currently experiencing performance issues, or is this query being executed extreme amounts of time? Or are you just cautious and want to avoid introducing a big bottleneck? In case of the latter, I don't think you need to be worried; again, in most uses, that's fine.
Make sure you have an index on REISENR on both tables and on qPreise.PREIS. As for PROVKLASSE and VERFUEGBAR I have a feeling that they won't be unique enough for indexes to actually ever be used if you were to add them (you could check that in your execution plan), but it's hard to tell without intimately knowing your data.
You query is good and You can have a better performance if you have index on "qPreise.PREIS"

How to improve query performance

I have a lot of records in table. When I execute the following query it takes a lot of time. How can I improve the performance?
SET ROWCOUNT 10
SELECT StxnID
,Sprovider.description as SProvider
,txnID
,Request
,Raw
,Status
,txnBal
,Stxn.CreatedBy
,Stxn.CreatedOn
,Stxn.ModifiedBy
,Stxn.ModifiedOn
,Stxn.isDeleted
FROM Stxn,Sprovider
WHERE Stxn.SproviderID = SProvider.Sproviderid
AND Stxn.SProviderid = ISNULL(#pSProviderID,Stxn.SProviderid)
AND Stxn.status = ISNULL(#pStatus,Stxn.status)
AND Stxn.CreatedOn BETWEEN ISNULL(#pStartDate,getdate()-1) and ISNULL(#pEndDate,getdate())
AND Stxn.CreatedBy = ISNULL(#pSellerId,Stxn.CreatedBy)
ORDER BY StxnID DESC
The stxn table has more than 100,000 records.
The query is run from a report viewer in asp.net c#.
This is my go-to article when I'm trying to do a search query that has several search conditions which might be optional.
http://www.sommarskog.se/dyn-search-2008.html
The biggest problem with your query is the column=ISNULL(#column, column) syntax. MSSQL won't use an index for that. Consider changing it to (column = #column AND #column IS NOT NULL)
You should consider using the execution plan and look for missing indexes. Also, how long it takes to execute? What is slow for you?
Maybe you could also not return so many rows, but that is just a guess. Actually we need to see your table and indexes plus the execution plan.
Check sql-tuning-tutorial
For one, use SELECT TOP () instead of SET ROWCOUNT - the optimizer will have a much better chance that way. Another suggestion is to use a proper inner join instead of potentially ending up with a cartesian product using the old style table,table join syntax (this is not the case here but it can happen much easier with the old syntax). Should be:
...
FROM Stxn INNER JOIN Sprovider
ON Stxn.SproviderID = SProvider.Sproviderid
...
And if you think 100K rows is a lot, or that this volume is a reason for slowness, you're sorely mistaken. Most likely you have really poor indexing strategies in place, possibly some parameter sniffing, possibly some implicit conversions... hard to tell without understanding the data types, indexes and seeing the plan.
There are a lot of things that could impact the performance of query. Although 100k records really isn't all that many.
Items to consider (in no particular order)
Hardware:
Is SQL Server memory constrained? In other words, does it have enough RAM to do its job? If it is swapping memory to disk, then this is a sure sign that you need an upgrade.
Is the machine disk constrained. In other words, are the drives fast enough to keep up with the queries you need to run? If it's memory constrained, then disk speed becomes a larger factor.
Is the machine processor constrained? For example, when you execute the query does the processor spike for long periods of time? Or, are there already lots of other queries running that are taking resources away from yours...
Database Structure:
Do you have indexes on the columns used in your where clause? If the tables do not have indexes then it will have to do a full scan of both tables to determine which records match.
Eliminate the ISNULL function calls. If this is a direct query, have the calling code validate the parameters and set default values before executing. If it is in a stored procedure, do the checks at the top of the s'proc. Unless you are executing this with RECOMPILE that does parameter sniffing, those functions will have to be evaluated for each row..
Network:
Is the network slow between you and the server? Depending on the amount of data pulled you could be pulling GB's of data across the wire. I'm not sure what is stored in the "raw" column. The first question you need to ask here is "how much data is going back to the client?" For example, if each record is 1MB+ in size, then you'll probably have disk and network constraints at play.
General:
I'm not sure what "slow" means in your question. Does it mean that the query is taking around 1 second to process or does it mean it's taking 5 minutes? Everything is relative here.
Basically, it is going to be impossible to give a hard answer without a lot of questions asked by you. All of these will bear out if you profile the queries, understand what and how much is going back to the client and watch the interactions amongst the various parts.
Finally depending on the amount of data going back to the client there might not be a way to improve performance short of hardware changes.
Make sure Stxn.SproviderID, Stxn.status, Stxn.CreatedOn, Stxn.CreatedBy, Stxn.StxnID and SProvider.Sproviderid all have indexes defined.
(NB -- you might not need all, but it can't hurt.)
I don't see much that can be done on the query itself, but I can see things being done on the schema :
Create an index / PK on Stxn.SproviderID
Create an index / PK on SProvider.Sproviderid
Create indexes on status, CreatedOn, CreatedBy, StxnID
Something to consider: When ROWCOUNT or TOP are used with an ORDER BY clause, the entire result set is created and sorted first and then the top 10 results are returned.
How does this run without the Order By clause?

how to tune this view?fetching time takes 9.968 but i want in .5. so how to give better performance

SELECT
/*+ INDEX(ID_BL_REF_NO REF_number_BL_idx*/ DECODE(BL_TYPE,'E',BL_ORIGIN_NAME,'I',BL_FINAL_NAME) FROM_PORT,
DECODE(BL_TYPE,'I',BL_ORIGIN_NAME,'E',BL_FINAL_NAME) TO_PORT,
(BL_VESSEL_CONNECT||'/'||BL_VOYAGE_CONNECT||'/'||BL_PORT_CONNECT) Mother_vessel_voyage_port,
SUM(BLC_SIZE) No_of_20s,
SUM(BLC_SIZE) No_of_40s,
SUM(DECODE(BLC_SIZE,'20',1,'40',2)) Teus,
SUM(BLC_GROSSWT) GrossWt,
round((BLC_GROSSWT/SUM(DECODE(BLC_SIZE,'20',1,'40',2))),2) AverageWt,
SUM(DECODE(BLF_MODE,'P',BLF_LOCAL_AMOUNT)) PREPAID,
SUM(DECODE(BLF_MODE,'C',BLF_LOCAL_AMOUNT)) COLLECT,
SUM(DECODE(BLF_MODE,'E',BLF_LOCAL_AMOUNT)) ELSEWHERE,
(SUM(DECODE(BLF_MODE,'P',BLF_LOCAL_AMOUNT)+DECODE(BLF_MODE,'C',BLF_LOCAL_AMOUNT)+DECODE(BLF_MODE,'E',BLF_LOCAL_AMOUNT))/SUM(DECODE(BLC_SIZE,'20',1,'40',2))) AVERAGE
FROM ID_BL_DETAILS,id_bl_containers,ID_BL_FREIGHT
WHERE BL_REFNO=BLC_REFNO
AND BLF_REFNO=BLC_REFNO
GROUP BY BL_VESSEL_CONNECT,BL_VOYAGE_CONNECT,BL_PORT_CONNECT,BL_ORIGIN_NAME,BL_LODPORT,BL_DISPORT,BL_FINAL_NAME,BLC_GROSSWT,BL_TYPE
Your WHERE clause contains only joins. There are no filters. This means your query needs to consider all the rows in at least one table. From this it follows that your query should execute a FULL TABLE SCAN of at least one of your tables, not an indexed read. A full table scan is the most efficient way of getting all the rows in a table.
So don't fix the syntax of your INDEX hint, get rid of it.
Next, figure out which table should drive your query. This is business logic. Probably your requirement is something like
"Summarise BL_DETAILS and BL_FREIGHT
for every row in BL_CONTAINERS."
In which case you might think you need a full table scan of BL_CONTAINERS. But if BL_FREIGHT has more rows than BL_CONTAINERS and every BLF_REF_NO matches a BL_REF_NO (i.e. there is a foreign key on BL_FREIGHT.BLF_REF_NO referencing BL_CONTAINERS.BL_REF_NO) it would probably be better to drive from BL_FREIGHT.
Note that this is true if you are only interested in BL_CONTAINERS which have matching BL_FREIGHT rows. But if you want to include containers which have not been used (i.e. they have no matching no BL_FREIGHT records) you need to use outer joins and drive off the BL_CONTAINERS table.
These considerations get more complicated when you throw BL_DETAILS into the mix. Your report seems to be based around the BL_DETAILS categories (as Jeffrey observes it is hard for us to understand your query without aliases or describes). So perhaps BL_DETAILS is the right candidate for driving table.
As you can see, tuning requires insight into the business logic and the details of the data model. You have that local knowledge, we do not.
There are tools which can help you. Oracle has EXPLAIN PLAN which will show you how the database will execute the query. The query optimizer gets better with each release so it matters which version of the database you're using. Here is the documentation for 10g.
The important thing to note is that you need to give the database accurate statistics, in order for it to come up with a good plan. Find out more.
run explain on your query and make sure the proper indexes are set up. Will improve the speed of the query
http://www.sql.org/sql-database/postgresql/manual/sql-explain.html
Your question states that the query takes 9.968 seconds and you want it to be 0.5 seconds or less. This can only be done effectively (if possible at all) when you know where those 9.968 seconds are spent on. And to know where time in a query is being spent, you'll not only want to explain the statement, you'll want to trace an execution of that query. The latter will give you a breakdown of how the time in your query is spent.
There are two threads on OTN that describe how you can do that.
If you want to do the bare minimum, please follow this one:
http://forums.oracle.com/forums/thread.jspa?messageID=1812597
And if you want to give full details, please follow this one:
http://forums.oracle.com/forums/thread.jspa?threadID=863295
Happy tracing!
Regards,
Rob.

How to use Explain Plan to optimize queries?

I have been tasked to optimize some sql queries at work. Everything I have found points to using Explain Plan to identify problem areas. The problem I can not find out exactly what explain plan is telling me. You get Cost, Cardinality, and bytes.
What do this indicate, and how should I be using this as a guide. Are low numbers better? High better? Any input would be greatly appreciated.
Or if you have a better way to go about optimizing a query, I would be interested.
I also assume you are using Oracle. And I also recommend that you check out the explain plan web page, for starters. There is a lot to optimization, but it can be learned.
A few tips follow:
First, when somebody tasks you to optimize, they are almost always looking for acceptable performance rather than ultimate performance. If you can reduce a query's running time from 3 minutes down to 3 seconds, don't sweat reducing it down to 2 seconds, until you are asked to.
Second, do a quick check to make sure the queries you are optimizing are logically correct. It sounds absurd, but I can't tell you the number of times I've been asked for advice on a slow running query, only to find out that it was occasionally giving wrong answers! And as it turns out, debugging the query often turned out to speed it up as well.
In particular, look for the phrase "Cartesian Join" in the explain plan. If you see it there, the chances are awfully good that you've found an unintentional cartesian join. The usual pattern for an unintentional cartesian join is that the FROM clause lists tables separated by comma, and the join conditions are in the WHERE clause. Except that one of the join conditions is missing, so that Oracle has no choice but to perform a cartesian join. With large tables, this is a performance disaster.
It is possible to see a Cartesian Join in the explain plan where the query is logically correct, but I associate this with older versions of Oracle.
Also look for the unused compound index. If the first column of a compound index is not used in the query, Oracle may use the index inefficiently, or not at all. Let me give an example:
The query was:
select * from customers
where
State = #State
and ZipCode = #ZipCode
(The DBMS was not Oracle, so the syntax was different, and I've forgotten the original syntax).
A quick peek at the indexes revealed an index on Customers with the columns
(Country, State, ZipCode) in that order. I changed the query to read
select * from customers
where Country = #Country
and State = #State
and ZipCode = #ZipCode
and now it ran in about 6 seconds instead of about 6 minutes, because the optimizer was able to use the index to good advantage. I asked the application programmers why they had omitted the country from the criteria, and this was their answer: they knew that all the addresses had country equal to 'USA' so they figured they could speed up the query by leaving that criterion out!
Unfortunately, optimizing database retrieval is not really the same as shaving microseconds off of computing time. It involves understanding the database design, especially indexes, and at least an overview of how the optimizer does its job.
You generally get better results from the optimizer when you learn to collaborate with it instead of trying to outsmart it.
Good luck coming up to speed at optimization!
You get more than that actually depending on what you are doing. Check out this explain plan page. I'm assuming a little bit here that you are using Oracle and know how to run the script to display the plan output. What may be more important to start with is looking at the left hand side for the use of a particular index or not and how that index is being utilized. You should see things like "(Full)", "(By Index Rowid)", etc if you are doing joins. The cost would be the next thing to look at with lower costs being better and you will notice that if you are doing a join that is not using an index you may get a very large cost. You may also want to read details about the explain plan columns.
You got the fuzzy end of the lollipop.
There is absolutely no way, in isolation, without a ton of additional information and experience, to look at an explain plan and determine what (if anything) is causing less than optimum performance. If query tuning could be reduced to a 10 step process it would be done by an automated process. I was about to list all of the things you need to understand to be effective at this but that would be a very long list.
the only short answer I can think of... is look for steps in the plan that are going through way more bytes than you'd guess. Then think about how you can reduce that number... via an index or partitioning.
Seriously, get Jonathan's Lewis book on Cost Based Oracle Fundementals
Get Tom Kyte's book on Oracle database Architecture and rent a cabin in the woods for a few weeks.
This is a massive area of expertise (aka a black art).
The approach I generally take is:
Run the SQL statement in question,
Get the actual plan (look up dbms_xplan),
Compare the estimated number of rows (cardinality) vs actual number of rows. A big difference indicates a problem to be fixed (e.g. index, histogram)
Consider if you can create an index to speed part of the process (generally where you conceptually think the plan should go first). Try some indexes.
You need to understand the O() impacts of different indexes in the context of what you are asking the database. It helps you understand data structures like b-trees, hash tables etc. Then, create an index that might work and repeat the process.
If Oracle decides not to use your index, apply an INDEX() hint and look at the new plan. The cost will be greater than the plan it did choose - this is why it didn't pick your index. The hinted plan might lead to some insight about why your index is not good.

How do you optimize tables for specific queries?

What are the patterns you use to determine the frequent queries?
How do you select the optimization factors?
What are the types of changes one can make?
This is a nice question, if rather broad (and none the worse for that).
If I understand you, then you're asking how to attack the problem of optimisation starting from scratch.
The first question to ask is: "is there a performance problem?"
If there is no problem, then you're done. This is often the case. Nice.
On the other hand...
Determine Frequent Queries
Logging will get you your frequent queries.
If you're using some kind of data access layer, then it might be simple to add code to log all queries.
It is also a good idea to log when the query was executed and how long each query takes. This can give you an idea of where the problems are.
Also, ask the users which bits annoy them. If a slow response doesn't annoy the user, then it doesn't matter.
Select the optimization factors?
(I may be misunderstanding this part of the question)
You're looking for any patterns in the queries / response times.
These will typically be queries over large tables or queries which join many tables in a single query. ... but if you log response times, you can be guided by those.
Types of changes one can make?
You're specifically asking about optimising tables.
Here are some of the things you can look for:
Denormalisation. This brings several tables together into one wider table, so in stead of your query joining several tables together, you can just read one table. This is a very common and powerful technique. NB. I advise keeping the original normalised tables and building the denormalised table in addition - this way, you're not throwing anything away. How you keep it up to date is another question. You might use triggers on the underlying tables, or run a refresh process periodically.
Normalisation. This is not often considered to be an optimisation process, but it is in 2 cases:
updates. Normalisation makes updates much faster because each update is the smallest it can be (you are updating the smallest - in terms of columns and rows - possible table. This is almost the very definition of normalisation.
Querying a denormalised table to get information which exists on a much smaller (fewer rows) table may be causing a problem. In this case, store the normalised table as well as the denormalised one (see above).
Horizontal partitionning. This means making tables smaller by putting some rows in another, identical table. A common use case is to have all of this month's rows in table ThisMonthSales, and all older rows in table OldSales, where both tables have an identical schema. If most queries are for recent data, this strategy can mean that 99% of all queries are only looking at 1% of the data - a huge performance win.
Vertical partitionning. This is Chopping fields off a table and putting them in a new table which is joinned back to the main table by the primary key. This can be useful for very wide tables (e.g. with dozens of fields), and may possibly help if tables are sparsely populated.
Indeces. I'm not sure if your quesion covers these, but there are plenty of other answers on SO concerning the use of indeces. A good way to find a case for an index is: find a slow query. look at the query plan and find a table scan. Index fields on that table so as to remove the table scan. I can write more on this if required - leave a comment.
You might also like my post on this.
That's difficult to answer without knowing which system you're talking about.
In Oracle, for example, the Enterprise Manager lets you see which queries took up the most time, lets you compare different execution profiles, and lets you analyze queries over a block of time so that you don't add an index that's going to help one query at the expense of every other one you run.
Your question is a bit vague. Which DB platform?
If we are talking about SQL Server:
Use the Dynamic Management Views. Use SQL Profiler. Install the SP2 and the performance dashboard reports.
After determining the most costly queries (i.e. number of times run x cost one one query), examine their execution plans, and look at the sizes of the tables involved, and whether they are predominately Read or Write, or a mixture of both.
If the system is under your full control (apps. and DB) you can often re-write queries that are badly formed (quite a common occurrance), such as deep correlated sub-queries which can often be re-written as derived table joins with a little thought. Otherwise, you options are to create covering non-clustered indexes and ensure that statistics are kept up to date.
For MySQL there is a feature called log slow queries
The rest is based on what kind of data you have and how it is setup.
In SQL server you can use trace to find out how your query is performing. Use ctrl + k or l
For example if u see full table scan happening in a table with large number of records then it probably is not a good query.
A more specific question will definitely fetch you better answers.
If your table is predominantly read, place a clustered index on the table.
My experience is with mainly DB2 and a smattering of Oracle in the early days.
If your DBMS is any good, it will have the ability to collect stats on specific queries and explain the plan it used for extracting the data.
For example, if you have a table (x) with two columns (date and diskusage) and only have an index on date, the query:
select diskusage from x where date = '2008-01-01'
will be very efficient since it can use the index. On the other hand, the query
select date from x where diskusage > 90
would not be so efficient. In the former case, the "explain plan" would tell you that it could use the index. In the latter, it would have said that it had to do a table scan to get the rows (that's basically looking at every row to see if it matches).
Really intelligent DBMS' may also explain what you should do to improve the performance (add an index on diskusage in this case).
As to how to see what queries are being run, you can either collect that from the DBMS (if it allows it) or force everyone to do their queries through stored procedures so that the DBA control what the queries are - that's their job, keeping the DB running efficiently.
indices on PKs and FKs and one thing that always helps PARTITIONING...
1. What are the patterns you use to determine the frequent queries?
Depends on what level you are dealing with the database. If you're a DBA or a have access to the tools, db's like Oracle allow you to run jobs and generate stats/reports over a specified period of time. If you're a developer writing an application against a db, you can just do performance profiling within your app.
2. How do you select the optimization factors?
I try and get a general feel for how the table is being used and the data it contains. I go about with the following questions.
Is it going to be updated a ton and on what fields do updates occur?
Does it have columns with low cardinality?
Is it worth indexing? (tables that are very small can be slowed down if accessed by an index)
How much maintenance/headache is it worth to have it run faster?
Ratio of updates/inserts vs queries?
etc.
3. What are the types of changes one can make?
-- If using Oracle, keep statistics up to date! =)
-- Normalization/De-Normalization either one can improve performance depending on the usage of the table. I almost always normalize and then only if I can in no other practical way make the query faster will de-normalize. A nice way to denormalize for queries and when your situation allows it is to keep the real tables normalized and create a denormalized "table" with a materialized view.
-- Index judiciously. Too many can be bad on many levels. BitMap indexes are great in Oracle as long as you're not updating the column frequently and that column has a low cardinality.
-- Using Index organized tables.
-- Partitioned and sub-partitioned tables and indexes
-- Use stored procedures to reduce round trips by applications, increase security, and enable query optimization without affecting users.
-- Pin tables in memory if appropriate (accessed a lot and fairly small)
-- Device partitioning between index and table database files.
..... the list goes on. =)
Hope this is helpful for you.