Query optimization from query execution plan - SQL

I have not found any suitable way to share the query plan other than an image, so I added one. The image shows the execution plan, and I want to reduce the cost of the full outer joins. If anyone can suggest a way of reducing that cost and getting a better query plan, it would be great.
WITH cte AS
(
SELECT
coalesce(fact_connect_hours.dimProviderId, fact_connect_hour_hum_shifts.dimProviderId, fact_connect_hour_clock_times.dimProviderId) as dimProviderId
,coalesce(fact_connect_hours.dimScribeId, fact_connect_hour_hum_shifts.dimScribeId, fact_connect_hour_clock_times.dimScribeId) as dimScribeId
,coalesce(fact_connect_hours.dimDateId, fact_connect_hour_hum_shifts.dimDateId, fact_connect_hour_clock_times.dimDateId) as dimDateId
,factConnectHourId
,totalProviderLogTime
,providerFirstJoinTime
,providerLastEndTime
,scribeFirstLogin
,scribeLastLogout
,totalScribeLogTime
,totalScopeTime
,totalStreamTime
,firstScopeJoinTime
,lastScopeEndTime
,scopeLastActivityTime
,firstStreamJoinTime
,lastStreamEndTime
,streamLastActivityTime
,fact_connect_hour_hum_shifts.shiftStartTime
,fact_connect_hour_hum_shifts.shiftEndTime
,fact_connect_hour_hum_shifts.totalShiftTime
,fact_connect_hour_clock_times.ClockStartTimestamp
,fact_connect_hour_clock_times.ClockEndTimestamp
,fact_connect_hour_clock_times.totalClockTime
,fact_connect_hour_hum_shifts.shiftTitle
,fact_connect_hours.dimStatusId
,dim_status.status
FROM fact_connect_hours
INNER JOIN dim_status
    ON fact_connect_hours.dimStatusId = dim_status.dimStatusId
FULL OUTER JOIN fact_connect_hour_hum_shifts
    ON (fact_connect_hour_hum_shifts.dimDateId = fact_connect_hours.dimDateId
    AND fact_connect_hour_hum_shifts.dimProviderId = fact_connect_hours.dimProviderId
    AND fact_connect_hour_hum_shifts.dimScribeId = fact_connect_hours.dimScribeId)
FULL OUTER JOIN fact_connect_hour_clock_times
    ON (fact_connect_hours.dimDateId = fact_connect_hour_clock_times.dimDateId
    AND fact_connect_hours.dimProviderId = fact_connect_hour_clock_times.dimProviderId
    AND fact_connect_hours.dimScribeId = fact_connect_hour_clock_times.dimScribeId)
WHERE coalesce(fact_connect_hours.dimDateId,fact_connect_hour_hum_shifts.dimDateId,fact_connect_hour_clock_times.dimDateId)>=732
) SELECT cte.*
,dim_date.tranDate
,dim_date.tranMonth
,dim_date.tranMonthName
,dim_date.tranYear
,dim_date.tranWeek
,dim_scribe.scribeUId
,dim_scribe.scribeFirstname
,dim_scribe.scribeFullname
,dim_scribe.scribeLastname
,dim_scribe.location
,dim_scribe.partner
,dim_scribe.beta
,dim_scribe.currentStatus
,dim_scribe.scribeEmail
,dim_scribe.augmedixEmail
,dim_provider.scribeManager
,dim_provider.clinicalAccountManagerName
,dim_provider.providerUId
,dim_provider.beta
,dim_provider.accountName
,dim_provider.accountGroup
,dim_provider.accountType
,dim_provider.goLiveDate
,dim_provider.siteName
,dim_provider.churnDate
,dim_provider.providerFullname
,dim_provider.providerEmail
FROM cte
INNER JOIN dim_date ON cte.dimDateId = dim_date.dimDateId
INNER JOIN aug_bi_dw.dbo.dim_provider AS dim_provider ON cte.dimProviderId = dim_provider.dimProviderId
INNER JOIN aug_bi_dw.dbo.dim_scribe AS dim_scribe ON cte.dimScribeId = dim_scribe.dimScribeId
WHERE dim_date.dimDateId >= 732

Based on the table names (dim* and fact*), I'll assume you are doing reporting of sorts over a data warehouse schema. If so, likely the best thing you can do to improve performance is to use Columnstore indexes (and batch mode execution, which comes implicitly once you enable Columnstores). These indexes are heavily compressed and often give significant performance gains on IO-bound workloads. Fact tables are the usual candidates, as they are the largest tables and often don't fit in the buffer pool.
Columnstores are supported in all editions from SQL Server 2016 SP1 onwards, and they go faster in Enterprise Edition (more parallelism, faster internal operations such as SIMD instructions, etc.). Please note that they don't directly support primary keys, so this may impact how you lay out the tables a bit. You can still create keys (stored internally as b-tree secondary indexes), so some of the space savings are lost if you use primary keys. Fact tables with columnstores often also use partitioning to get another layer of filtering without secondary indexes.
Please consider trying your query again with columnstores replacing the fact tables (perhaps on a copy of your database, as an experiment). When you look at the resulting query plans, also check whether the operators are running in batch mode. Batch mode operators are different from their row mode counterparts: they are optimized for modern CPU architectures to minimize memory traffic in and out of the CPU. As a rough rule of thumb, a 10x-100x difference is possible with columnstores + batch mode.
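For example, converting one of the fact tables to a clustered columnstore is a single statement (a sketch only; the index name is made up, and on a table that already has a clustered rowstore index you would recreate it as a columnstore WITH (DROP_EXISTING = ON) instead):

-- Illustrative: replace the fact table's rowstore storage with a columnstore.
CREATE CLUSTERED COLUMNSTORE INDEX cci_fact_connect_hours
    ON dbo.fact_connect_hours;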

The only filter that can help you is 'where dim_date.dimDateId >= X'.
That filter arrives via a join to the CTE, and the CTE column it applies to is composed (via COALESCE) from three tables full-outer-joined together. For best performance I would tell SQL Server what to do step by step; otherwise it is risky to trust it to find the best plan as-is:
Run 3 statements against fact_connect_hours, fact_connect_hour_hum_shifts and fact_connect_hour_clock_times, applying the filter, and put the results (just the primary keys, or all the columns you need) into 3 temp tables: #fact_connect_hours, #fact_connect_hour_hum_shifts and #fact_connect_hour_clock_times.
Use the statement as-is but with the temp tables substituted in, or join the temps back to the real tables if the temps hold only the PKs.
Add indexes (if not already present) on fact_connect_hours.dimDateId, fact_connect_hour_hum_shifts.dimDateId and fact_connect_hour_clock_times.dimDateId.
This way you make sure you filter directly for what you need in the simplest possible steps; the complicated query then runs over a preset number of rows, so good performance is effectively guaranteed - the difference between a very good and a very bad plan applied to a few rows is practically unimportant. A sketch follows below.
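A minimal sketch of those staging steps, assuming dimDateId exists on all three fact tables and reusing the >= 732 filter from the question:

-- Step 1: filter each fact table independently into a temp table.
SELECT * INTO #fact_connect_hours
FROM fact_connect_hours WHERE dimDateId >= 732;

SELECT * INTO #fact_connect_hour_hum_shifts
FROM fact_connect_hour_hum_shifts WHERE dimDateId >= 732;

SELECT * INTO #fact_connect_hour_clock_times
FROM fact_connect_hour_clock_times WHERE dimDateId >= 732;

-- Step 2: run the original query unchanged, with the temp tables substituted
-- for the base fact tables.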
A lesser detail: pay attention to the INNER JOIN to dim_status - if there is no FK constraint, the cardinality estimator may mis-estimate the number of returned rows, being unable to understand the relationship between the tables.
I can also see an optimization attempt in that the filter has been pushed up into the CTE. That yields a plan similar to what I propose, only with a weaker restriction; following my plan forces the row filtering all the way down to the root source tables.

Related

Self-Joins: is there a way to improve the performance of this query?

The purpose of all this is to create a lookup table to avoid a self join down the road, which would involve joins for the same data against much bigger data sets.
In this instance a sales order may have one or both of a bill-to and a ship-to customer ID.
The tables here are aggregates of data from 5 different servers, differentiated by the box_id. The customer table is ~1.7M rows, and sales_order is ~55M. The end result is ~52M records and takes on average about 80 minutes to run.
The query:
SELECT DISTINCT sog.box_id,
       sog.sales_order_id,
       cb.cust_id AS bill_to_customer_id,
       cb.customer_name AS bill_to_customer_name,
       cs.cust_id AS ship_to_customer_id,
       cs.customer_name AS ship_to_customer_name
FROM sales_order sog
LEFT JOIN customer cb ON cb.cust_id = sog.bill_to_id AND cb.box_id = sog.box_id
LEFT JOIN customer cs ON cs.cust_id = sog.ship_to_id AND cs.box_id = sog.box_id
The execution plan:
https://www.brentozar.com/pastetheplan/?id=SkjhXspEs
All of this is happening on SQL Server.
I've tried reproducing the bill to and ship to customer sets as CTEs and joining to those, but found no performance benefit.
The only indexes on these tables are the primary keys (which are synthetic IDs). Somewhat curiously the execution plan analyzer is not recommending adding any indexes to either table; it usually wants me to slap indexes on almost everything.
I don't know that there necessarily IS a way to make this run faster, but I am trying to improve my query optimization and have hit the limit of my knowledge. Any insight is much appreciated.
When you run queries like yours -- queries with no WHERE filters -- the DBMS often decides it has to scan entire tables. (In SQL Server execution plans, "clustered index scan" means it is scanning the whole table.) It certainly has to wrangle all the data in the tables. The lookup table you want to create is often called a "materialized view." (One of the online versions of SQL Server has built-in support for materialized views, but other versions still don't.)
Depending on how you will use your data, you may be better off avoiding this materialized lookup table. If all your uses of the proposed lookup table involve filtering out a small subset of rows with WHERE clauses, an ordinary non-materialized view may be a good choice. When you run queries involving ordinary views, the query planner folds those views into the query and may recommend helpful indexes.
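As an illustration of that alternative, the query above can be wrapped in an ordinary view, with a supporting index for the filtered lookups (view and index names here are invented):

CREATE VIEW dbo.v_sales_order_customers
AS
SELECT sog.box_id,
       sog.sales_order_id,
       cb.cust_id       AS bill_to_customer_id,
       cb.customer_name AS bill_to_customer_name,
       cs.cust_id       AS ship_to_customer_id,
       cs.customer_name AS ship_to_customer_name
FROM dbo.sales_order sog
LEFT JOIN dbo.customer cb ON cb.cust_id = sog.bill_to_id AND cb.box_id = sog.box_id
LEFT JOIN dbo.customer cs ON cs.cust_id = sog.ship_to_id AND cs.box_id = sog.box_id;
GO

-- Hypothetical index so queries that filter the view can seek into customer:
CREATE NONCLUSTERED INDEX ix_customer_box_cust
    ON dbo.customer (box_id, cust_id) INCLUDE (customer_name);

Queries that then SELECT from the view with a WHERE clause let the planner fold the view in and use indexes like this one.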

Materialized view Vs Temp tables in Oracle

I have a base transaction table. Then I have around 15 intermediate steps where I combine dimension tables, perform some aggregation and implement business logic. The way I handle this currently is to create temporary tables for the intermediate stages and, after these 15 steps, populate the final result into a physical table. Is this a better approach, or is using materialized views instead of the intermediate temp tables better? If materialized views are the better approach for the intermediate steps, can you kindly let me know why?
I have already tried scripting both approaches: the 15 intermediate steps as global temporary tables as well as materialized views. I found a marginal performance improvement with MVs compared to temp tables, but it comes at the cost of extra physical storage. I'm not sure which is best practice, and why.
Temporary tables write to disk, so there are I/O costs for both reading and writing. Also, most sites don't manage their temporary tables properly, and they end up on the default temporary tablespace - the same TEMP tablespace everybody uses for sorting, etc. So there's potential for resource contention there.
Materialized views are intended for materializing aspects of our data set which are commonly reused by many different queries. That's why the most common use case is for storing higher level aggregates of low level data. That doesn't sound like the use case you have here. And lo!
I'm doing a complete refresh of MVs and not a incremental refresh
So nope.
Then I have around 15 intermediate steps, where I'm combining dimension tables, performing some aggregation and implementing business logic.
This is a terribly procedural way of querying data. Sometimes there's no way of avoiding it, especially in certain data warehouse scenarios. However, it doesn't follow that we need to materialize the outputs of those queries. An alternative approach is to use WITH clauses: the output of one WITH subquery can feed into later subqueries.
with sq1 as (
    select whatever
         , count(*) as t1_tot
    from   t1
    group  by whatever
), sq2 as (
    -- group by added: max() mixed with a plain column requires it
    select sq1.whatever
         , max(t2.blah) as max_blah
    from   sq1
    join   t2 on t2.whatever = sq1.whatever
    group  by sq1.whatever
), sq3 as (
    select sq2.whatever
         , (t3.meh + t3.huh) as qty
    from   sq2
    join   t3 on t3.whatever = sq2.whatever
    where  t3.something >= sq2.max_blah
)
select sq1.whatever
     , sq1.t1_tot
     , sq2.max_blah
     , sq3.qty
from   sq1
join   sq2 on sq2.whatever = sq1.whatever
join   sq3 on sq3.whatever = sq1.whatever
Not saying it won't be a monstrous query, the terror of the department. But it will probably perform way better than your MViews or GTTs. (Oracle may choose to materialize those intermediate result sets anyway, but we can use hints to affect that.)
You may even find from taking this approach that some of your steps are unnecessary and that you can combine several steps into one query. Certainly in real life I would write my toy statement above as one query, not a join of three subqueries.
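For instance, Oracle's (undocumented but widely used) MATERIALIZE and INLINE hints control whether a WITH subquery is spooled to a temporary segment or folded into the main query:

with sq1 as (
    -- MATERIALIZE forces a temp-segment spool; INLINE forces folding
    select /*+ MATERIALIZE */ whatever
         , count(*) as t1_tot
    from   t1
    group  by whatever
)
select * from sq1;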
From what you have said, I'd say that using (global or private, depending on the database version) temporary tables is the better choice. Why? Because you are "calculating" something, storing the results of those calculations in tables, and reusing them for additional processing. All of that - if it can't be done without intermediate storage - is a job for tables.
A materialized view is, as its name says, a view. It is the result of some query, but - as opposed to "normal" views - it actually takes up space. It can be refreshed (on demand, when source data changes, or on a schedule). Yes, it has its advantages, though I can't see any in what you are currently doing.

Performance for big query in SQL Server view

I have a big query for a view that takes a couple of hours to run, and I feel like it may be possible to work on its performance "a bit".
The problem is that I am not sure what I should do. The query SELECTs 39 values, LEFT OUTER JOINs 25 tables, and each table could have up to a couple of million rows.
Any tip is good. Is there a good way to attack this problem? I tried to look at the actual execution plan on a test with less data (it took about 10 minutes to run) but it's crazy big. Are there any general things I could do to make this faster? Do I have to tackle one small part at a time?
Maybe there is just one join that slows down everything - how do I detect it? In short, how do I work on a query like this?
As I said, all feedback is good. If there is more information I need to show, tell me!
The query looks something like this:
SELECT DISTINCT
A.something,
A.somethingElse,
B.something,
C.somethingElse,
ISNULL(C.somethingElseElse, ''),
C.somethingElseElseElse,
CASE WHEN *** THEN D.something ELSE 0 END,
E.something,
...
U.something
FROM
TableA A
JOIN
TableB B on ...
JOIN
TableC C on ...
JOIN
TableD D on ...
JOIN
TableE E on ...
JOIN
TableF F on ...
JOIN
TableG G on ...
...
JOIN
TableU U on ...
Break your problem into manageable pieces. If the execution plan is too large for you to analyze, start with a smaller part of the query, check its execution plan and optimize it.
There is no general answer on how to optimize a query, since there is a whole bunch of possible reasons why a query can be slow. You have to check the execution plan.
Generally the most promising ways to improve performance are:
Indexing:
When you see a Clustered Index Scan or - even worse (because it means you don't have a clustered index) - a Table Scan in your query plan for a table that you join, you need an index for your JOIN predicate. This is especially true if you have tables with millions of entries and you select only a small subset of them. Also check the index suggestions in the execution plan.
You know the index works when your Clustered Index Scan turns into an Index Seek.
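A sketch of such an index; the join column a_id is a placeholder, since the skeleton query above elides its ON clauses:

-- Hypothetical: supports a join like TableB B ON B.a_id = A.id, turning a
-- Clustered Index Scan on TableB into an Index Seek.
CREATE NONCLUSTERED INDEX ix_TableB_a_id
    ON TableB (a_id);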
Index includes:
You are probably displaying columns from your joined tables that differ from the fields you join on (otherwise, why would you need the join?). SQL Server has to fetch those fields from the table, which shows up in the execution plan as a Key Lookup.
Since you are taking 39 values from 25 tables, there will be very few fields per table that you need (mostly one or two), yet SQL Server must load entire pages of the respective table to get the values from them.
In this case, you should INCLUDE the column(s) you want to display in your index to avoid the key lookups. This comes at the cost of increased index size, but considering you only include a few columns, that cost should be negligible compared to the size of your tables.
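Continuing the placeholder example above, covering the displayed column removes the Key Lookup:

-- Hypothetical: same join key, now with the displayed column included so the
-- query can be answered from the index alone.
CREATE NONCLUSTERED INDEX ix_TableB_a_id_covering
    ON TableB (a_id)
    INCLUDE (something);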
Checking views that you join:
When you join VIEWs, be aware that doing so essentially extends your query (and therefore the execution plan). Apply the same performance optimizations to the view as to your main query. Also check whether the view joins tables that you already join in the main query; those joins might be unnecessary.
Indexed views (maybe):
In general, you can add indexes to views you are joining to your query, or create one or more indexed views for parts of your query. There are some caveats though:
Indexed views take storage space in your DB, because you store parts of the data multiple times.
There are a lot of restrictions on indexed views, most notably in your case that OUTER JOINs are forbidden. If you can transform at least some of your OUTER JOINs into INNER JOINs, this might be an option.
When you join indexed views, don't forget to use WITH (NOEXPAND) in your join, otherwise they might be ignored.
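A minimal indexed-view sketch under those restrictions (SCHEMABINDING and two-part names are mandatory, inner joins only; all names here are invented):

CREATE VIEW dbo.v_AB
WITH SCHEMABINDING
AS
SELECT a.id AS a_id, b.id AS b_id, a.something, b.something AS b_something
FROM dbo.TableA a
INNER JOIN dbo.TableB b ON b.a_id = a.id;
GO

-- The unique clustered index is what materializes the view's data.
CREATE UNIQUE CLUSTERED INDEX ix_v_AB ON dbo.v_AB (a_id, b_id);
GO

-- Join it with WITH (NOEXPAND) so the stored data is actually used:
-- SELECT ... FROM dbo.v_AB WITH (NOEXPAND) JOIN ...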
Partitioned tables (maybe):
If you are running the Enterprise Edition of SQL Server, you can partition your tables. That can be useful if the rows you join are always selected from a small subset of the available rows: you can create a partition for that subset and increase performance.
Summary:
Divide and conquer. Analyze your query bit by bit to optimize it. The most promising options are indexes and index includes. If you still have trouble, go from there.

How to increase SQL Query performance without changing the query

I have a project with very complex queries (a legacy project).
There are a lot of queries, stored procedures, etc. in the project.
Queries have anywhere from 10 to 30 joins and slow filtering.
The project cannot be modified right away (it will take at least a year's worth of work).
Is there a hardware way to increase performance? Can I use some smart Azure setup with increased computing power to increase speed?
Or what should I look for in a physical server?
Avoid multiple joins in a single query.
Eliminate cursors from the query.
Avoid non-correlated scalar subqueries.
Avoid multi-statement table-valued functions (TVFs).
Create and use indexes.
Understand the data.
Create highly selective indexes.
Filter the data from each huge table into a temp (hash) table first, then join the temp tables rather than the actual tables in the stored procedure to combine the result.
Hardware approaches:
More RAM on motherboard and disk controller.
Additional processors (may require a different SQL Server license).
Faster storage devices.
If data is on an external SAN device, consider switching to a device with a faster connection type (Fibre Channel vs iSCSI, ATA over Ethernet (AoE), or HyperSCSI)
You can scale your performance in Azure using the Standard and Premium tiers. So if you have slow queries, you can always throw more hardware at them, with the benefit of doing so only when needed. You can set your database to scale automatically, so when your demand is low you pay less, and when it's high your workload doesn't suffer.
Azure provides Query Performance Insight (QPI), which identifies your most expensive queries so you can optimize them first.
Azure also provides various advisors, like the index advisor, which learns how you use your database and recommends indexes to create or drop - the best thing being that it can do this automatically for you.
If you're considering an on-prem solution, you should also factor in the operating system and hardware costs, plus the time and cost of setting it up and configuring it properly. Geo-replicated setups add another level of complexity. So if you can start fresh and your business requirements allow cloud services, I'd say Azure is the way to go, since it provides rich telemetry and all kinds of smart database capabilities (with more coming month after month). Also don't forget that Azure is updated roughly every month, while boxed editions get cumulative update packages after half a year or more.
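Scaling a single Azure SQL database up or down is one statement (tier and objective names here are examples; pick from the current Azure catalog):

-- Example only: move a database to the Premium P2 service objective.
ALTER DATABASE [MyDatabase]
    MODIFY (EDITION = 'Premium', SERVICE_OBJECTIVE = 'P2');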
Hardware is very important: the better, the faster.
But you also need to look at your queries from a performance perspective.
For example:
Check for missing indexes with actual execution plans; if any are suggested, you should add them.
Learn how to read execution plans and STATISTICS; CPU cost is important, and table scans are deadly :) - avoid them.
Rebuild indexes frequently.
If you are using MS SQL, consider the NOLOCK hint after table names; otherwise your SELECTs take shared locks on the table while reading (note that NOLOCK permits dirty reads).
When you are joining tables, try to put your conditions in the JOIN, not the WHERE clause.
For example:
SELECT * FROM TableA A WITH (NOLOCK)
INNER JOIN TableB B WITH (NOLOCK) ON A.ID = B.ID
WHERE B.SOMECOLUMN IS NOT NULL

SELECT * FROM TableA A WITH (NOLOCK)
INNER JOIN TableB B WITH (NOLOCK) ON A.ID = B.ID AND B.SOMECOLUMN IS NOT NULL

The second one is better.
avoid "ORDER BY", "DISTINCT", if not necessary.
:)

Slow SQL Queries, Order Table by Date?

I have a SQL Server 2008 database that I query regularly, and it has over 30 million entries (joy!). Unfortunately this database cannot be drastically changed because it is still in use for R/D.
When I query this database, it takes FOREVER. By that I mean I haven't been patient enough to wait for results (after 2 minutes I have to cancel to avoid locking the R/D department out). Even with a short date range (no more than a few months), it is basically impossible to get any results from it. I am filtering on 4 of the columns and unfortunately have to use an inner join to another table (which I've been told is very costly in terms of query efficiency, but it's unavoidable). The inner-joined table has fewer than 100k entries.
What I was wondering: is it possible to organize the table so that it is ordered by date by default, to reduce the number of rows it has to search through?
If this is not possible, is there anything I can do to reduce query times? Is there any other useful information that could help in coming up with a solution?
I have included a sample of the query that I use:
SELECT DISTINCT N.TestName
FROM [DalsaTE].[dbo].[ResultsUut] U
INNER JOIN [DalsaTE].[dbo].[ResultsNumeric] N
ON N.ModeDescription = 'Mode 8: Low Gain - Green-Blue'
AND N.ResultsUutId = U.ResultsUutId
WHERE U.DeviceName = 'BO-32-3HK60-00-R'
AND U.StartDateTime > '2011-11-25 01:10:10.001'
ORDER BY N.TestName
Any help or suggestions are appreciated!
It sounds like the datetime may be stored in a text-based field, and consequently an index isn't being used?
Could you try the following to see if you get any speed improvement:
select distinct N.TestName
from [DalsaTE].[dbo].[ResultsUut] U
inner join [DalsaTE].[dbo].[ResultsNumeric] N
on N.ModeDescription = 'Mode 8: Low Gain - Green-Blue'
and N.ResultsUutId = U.ResultsUutId
where U.DeviceName = 'BO-32-3HK60-00-R'
and U.StartDateTime > cast('2011-11-25 01:10:10.001' as datetime)
order by N.TestName
It would also be worth trying changing your inner join to a left outer join, as those occasionally perform faster for no obvious reason (at least none that I'm aware of).
You can add an index on your date column, which should improve your query time. You can create it with a T-SQL statement or via the table designer.
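For the sample query above, that could look like this (the index name is made up):

CREATE NONCLUSTERED INDEX ix_ResultsUut_StartDateTime
    ON [DalsaTE].[dbo].[ResultsUut] (StartDateTime);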
Is the sole purpose of the join to provide sorting? If so, a quick thing to try would be to remove this, and see how much of a difference it makes - at least then you'll know where to focus your attention.
Finally, SQL server management studio has some useful tools such as execution plans that can help diagnose performance issues. Good luck!
There are a number of problems which may be causing delays in the execution of your query.
Indexes (except the clustered primary key) do not reorder the data; they merely create an index (think of a phonebook) which orders a number of values and points back to the primary key.
Without seeing the type of data or the existing indexes, it's difficult to say, but at the very least the following ASCENDING indexes might help:
[DalsaTE].[dbo].[ResultsNumeric]: ModeDescription, ResultsUutId, TestName
[DalsaTE].[dbo].[ResultsUut]: StartDateTime, DeviceName, ResultsUutId
With the indexes above, the sample query you gave can be completed without performing a single lookup on the actual table data.
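Those suggestions expressed as DDL (index names invented; column order follows the list above):

CREATE NONCLUSTERED INDEX ix_ResultsNumeric_mode_uut_test
    ON [DalsaTE].[dbo].[ResultsNumeric] (ModeDescription, ResultsUutId, TestName);

CREATE NONCLUSTERED INDEX ix_ResultsUut_start_device_uut
    ON [DalsaTE].[dbo].[ResultsUut] (StartDateTime, DeviceName, ResultsUutId);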