Optimize query when updating - sql

I have the following query that takes too long to execute.
How can I optimize it?
UPDATE Fact_IU_Lead
SET
    Fact_IU_Lead.Latitude_Point_Vente = Adr.Latitude,
    Fact_IU_Lead.Longitude_Point_Vente = Adr.Longitude
FROM Dim_IU_PointVente
INNER JOIN
    Data_I_Adresse AS Adr ON Dim_IU_PointVente.Code_Point_Vente = Adr.Code_Point_Vente
INNER JOIN
    Fact_IU_Lead ON Dim_IU_PointVente.Code_Point_Vente = Fact_IU_Lead.Code_Point_Vente
WHERE
    Latitude_Point_Vente IS NULL
    OR Longitude_Point_Vente IS NULL AND Adr.[Error] = 0

A couple of things I would look at here to help.
How many records are in each table? If it's millions, you may need to cycle through them in batches (see the sketch after this list).
Are the columns you're joining or filtering on indexed in each table? If not, add the indexes - that's typically a huge speed difference at little cost.
Are the coordinate columns stored as text rather than as a geo-spatial type? I've had much better performance out of geo-spatial data types in this scenario; just make sure your SRIDs are the same across tables.
Are the columns you're updating indexed, or is the table being updated heavy with indexes? Tons of indexes on a large table can be great for lookups, but they kill update/insert speeds.
Take a look at those first.
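On the first point, here is a minimal sketch of the batching idea, assuming SQL Server; the batch size of 10,000 is arbitrary, and it assumes Adr.Latitude/Longitude are never null, so each updated row drops out of the filter and the loop terminates:

DECLARE @BatchSize int = 10000;

WHILE 1 = 1
BEGIN
    -- Update only a limited slice of the matching rows per iteration.
    UPDATE TOP (@BatchSize) fl
    SET Latitude_Point_Vente = adr.Latitude,
        Longitude_Point_Vente = adr.Longitude
    FROM Fact_IU_Lead AS fl
    INNER JOIN Dim_IU_PointVente AS pv ON pv.Code_Point_Vente = fl.Code_Point_Vente
    INNER JOIN Data_I_Adresse AS adr ON pv.Code_Point_Vente = adr.Code_Point_Vente
    WHERE (fl.Latitude_Point_Vente IS NULL OR fl.Longitude_Point_Vente IS NULL)
      AND adr.[Error] = 0;

    -- Once a batch touches fewer rows than the batch size, we're done.
    IF @@ROWCOUNT < @BatchSize BREAK;
END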
I've also lightly cleaned up your code with regard to aliases.
Also, take a look at the WHERE clause and choose one of the two readings below.
When you mix ANDs and ORs, the best thing you can do is add parentheses.
At a minimum, there will be zero question about what you meant when you wrote it.
At most, you'll know that SQL is executing your logic the way you intended.
UPDATE fl --Update through the alias so the join in the FROM is actually used
SET
    Latitude_Point_Vente = adr.Latitude --Note the table prefix is removed
    , Longitude_Point_Vente = adr.Longitude --Note the table prefix is removed
FROM Dim_IU_PointVente AS pv --Added alias
INNER JOIN
    Data_I_Adresse AS adr ON pv.Code_Point_Vente = adr.Code_Point_Vente --carried alias
INNER JOIN
    Fact_IU_Lead AS fl ON pv.Code_Point_Vente = fl.Code_Point_Vente --added/carried alias
WHERE
    (fl.Latitude_Point_Vente IS NULL OR fl.Longitude_Point_Vente IS NULL) AND adr.[Error] = 0 --option one for the WHERE change; the null columns belong to the fact table being updated
    --fl.Latitude_Point_Vente IS NULL OR (fl.Longitude_Point_Vente IS NULL AND adr.[Error] = 0) --option two for the WHERE change; keep exactly one of the two

Joins are usually expensive. The best approach in your case may be to put the update into a stored procedure, split the work into selects feeding a simpler update, and use a transaction to keep everything consistent (if needed) - see the sketch below.
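A minimal sketch of that idea, assuming SQL Server (the procedure name and temp table are illustrative): the two-table join is resolved once by a SELECT into a temp table, and the update itself then needs only a single join.

CREATE PROCEDURE dbo.usp_Fill_Lead_Coordinates --hypothetical name
AS
BEGIN
    SET NOCOUNT ON;

    -- Step 1: resolve the dimension/address join once, into a temp table.
    SELECT pv.Code_Point_Vente, adr.Latitude, adr.Longitude
    INTO #Coords
    FROM Dim_IU_PointVente AS pv
    INNER JOIN Data_I_Adresse AS adr ON pv.Code_Point_Vente = adr.Code_Point_Vente
    WHERE adr.[Error] = 0;

    BEGIN TRANSACTION;

    -- Step 2: a much simpler single-join update.
    UPDATE fl
    SET Latitude_Point_Vente = c.Latitude,
        Longitude_Point_Vente = c.Longitude
    FROM Fact_IU_Lead AS fl
    INNER JOIN #Coords AS c ON c.Code_Point_Vente = fl.Code_Point_Vente
    WHERE fl.Latitude_Point_Vente IS NULL
       OR fl.Longitude_Point_Vente IS NULL;

    COMMIT TRANSACTION;
END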
Hope this answer points you in the right direction :)

Related

SQL - Make Join Query Faster

I've made a query that selects 2 values from 2 tables. I need to run this query about 32 times when a visitor visits my website. This makes the page quite slow (it takes over 5 seconds to fully load).
The query looks like this:
SELECT tmdb.name, patch.sfo_title
FROM tmdb
RIGHT JOIN patch
ON tmdb.titleid = CONCAT(patch.cusa, '_00')
WHERE cusa = :titleid
LIMIT 1
Is there any way to make this query faster? The query isn't a big operation as far as I can tell, so I'm not really sure why it's so slow.
I would write the query as a left join (out of preference, not performance):
SELECT t.name, p.sfo_title
FROM patch p
LEFT JOIN tmdb t
    ON t.titleid = CONCAT(p.cusa, '_00')
WHERE p.cusa = :titleid
LIMIT 1;
Then for performance, I would recommend indexes on patch(cusa, sfo_title) and tmdb(titleid).
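Assuming MySQL, those would be created like this (index names are illustrative):

CREATE INDEX idx_patch_cusa_sfo_title ON patch (cusa, sfo_title);
CREATE INDEX idx_tmdb_titleid ON tmdb (titleid);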
Note that the use of LIMIT without ORDER BY is suspicious, although you might have reasons for it.
You are joining two tables on a CALCULATED field? No wonder it is slow. I have no idea how your tables get maintained, but you need to store that concat'ed value of cusa in the database as a separate, indexed field, as Gordon Linoff suggested. You could even maintain it through ON INSERT and ON UPDATE triggers. Personally, I would examine why you have similar-but-different keys in your two tables and try to rationalize them down to one. That CONCAT(cusa, '_00') looks suspiciously like an opportunity to simplify the application.
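If this is MySQL 5.7 or later, a stored generated column is one way to keep that value maintained without writing the triggers yourself; a sketch, where the column name and length are illustrative:

ALTER TABLE patch
    ADD COLUMN titleid_full VARCHAR(20) AS (CONCAT(cusa, '_00')) STORED,
    ADD INDEX idx_patch_titleid_full (titleid_full);

-- The join then becomes a plain, indexable equality:
SELECT t.name, p.sfo_title
FROM patch p
LEFT JOIN tmdb t ON t.titleid = p.titleid_full
WHERE p.cusa = :titleid
LIMIT 1;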

Performance of JOINS in SAP HANA Calculation View

For Example:
I have 4 columns (A,B,C,D).
I thought that instead of joining on each and every column, I could create a concatenated column in both projections (CA_CONCAT -> A+B+C+D) and join on that, just to check which method performs better.
It worked faster at first, but in a few CVs this method is sometimes slower, especially when filtering!
Can anyone suggest which method is more efficient?
I don't think JOIN conditions on concatenated fields will perform better.
Although we generally say there is no need for indexes on column tables in a HANA database, column tables have a structure that effectively works like an index on every column.
So if you concatenate 4 columns into a new calculated field, you first lose the option to use those per-column structures on the 4 columns and the corresponding join columns.
I did not check the execution plan, but it will probably do a full scan on these columns.
In fact, I'm surprised you mention that it worked faster at all, and that you hit problems on only a few views.
Concatenation, or applying any function to a database column, is by itself extra workload on top of the SELECT. It might also involve an implicit type cast, which can add more overhead than expected.
First, I would suggest setting your table to column store and checking the new performance.
After that, I would suggest separating the JOIN into multiple JOINs if you are using an OR condition in your join (see the sketch after this list).
Third, an INNER JOIN will give you better performance compared to a LEFT JOIN or LEFT OUTER JOIN.
Another thing about JOINs and performance: you are better off joining on PRIMARY KEYs rather than on arbitrary columns.
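To make the OR-splitting point concrete, here is a generic sketch with hypothetical tables t1 and t2. Splitting the OR lets each branch use its own column structure; note that UNION also de-duplicates, which matches the OR join only when the combined rows are otherwise distinct.

-- A join with an OR in its condition...
SELECT t1.id, t2.val
FROM t1
INNER JOIN t2 ON t1.key1 = t2.key1 OR t1.key2 = t2.key2;

-- ...rewritten as two separate joins combined with UNION:
SELECT t1.id, t2.val
FROM t1 INNER JOIN t2 ON t1.key1 = t2.key1
UNION
SELECT t1.id, t2.val
FROM t1 INNER JOIN t2 ON t1.key2 = t2.key2;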
For me, the join on multiple fields performed faster both times than the join on a concatenated field. For the filtering scenario, PlanViz shows that when I join on multiple fields, the filter gets pushed down to both tables; when I join on a concatenated field, only one table gets filtered.
However, if you put a filter on both fields (like PRODUCT from Tab1 and MATERIAL from Tab2), then the filter can be pushed down to both tables.
Like:
Select * from CalculationView where PRODUCT = 'A' and MATERIAL = 'A'

Poor performance with stacked joins

I'm not sure I can provide enough details for an answer, but my company is having a performance issue with an older MSSQL view. I've narrowed it down to the right outer joins, but I'm not familiar with the structure of joins stacked after joins without an ON for each one, as in the code snippet below.
How do I rewrite the joins below to either improve performance or at least reach the simpler JOIN TableName ON Field1 = Field2 format?
FROM dbo.tblObject AS tblObject_2
JOIN dbo.tblProspectB2B PB ON PB.Object_ID = tblObject_2.Object_ID
RIGHT OUTER JOIN dbo.tblProspectB2B_CoordinatorStatus
RIGHT OUTER JOIN dbo.tblObject
INNER JOIN dbo.vwDomain_Hierarchy
INNER JOIN dbo.tblContactUser
INNER JOIN dbo.tblProcessingFile WITH ( NOLOCK )
LEFT OUTER JOIN dbo.enumRetentionRealization AS RR ON RR.RetentionRealizationID = dbo.tblProcessingFile.RetentionLeadTypeID
INNER JOIN dbo.tblLoan
INNER JOIN dbo.tblObject AS tblObject_1 WITH ( NOLOCK ) ON dbo.tblLoan.Object_ID = tblObject_1.Object_ID ON dbo.tblProcessingFile.Loan_ID = dbo.tblLoan.Object_ID ON dbo.tblContactUser.Object_ID = dbo.tblLoan.ContactOwnerID ON dbo.vwDomain_Hierarchy.Object_ID = tblObject_1.Domain_ID ON dbo.tblObject.Object_ID = dbo.tblLoan.ContactOwnerID ON dbo.tblProspectB2B_CoordinatorStatus.Object_ID = dbo.tblLoan.ReferralSourceContactID ON tblObject_2.Object_ID = dbo.tblLoan.ReferralSourceContactID
Your last INNER JOIN has a number of ON statements. Per this question and answer, such syntax is equivalent to a nested subquery.
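To make the nesting concrete, here is a minimal sketch with hypothetical tables a, b and c:

-- This "stacked" form...
SELECT *
FROM a
INNER JOIN b
INNER JOIN c ON b.id = c.id ON a.id = b.id;

-- ...pairs each ON with the nearest unmatched JOIN, so it is equivalent to
-- joining b to c first, then joining a to that result:
SELECT *
FROM a
INNER JOIN (b INNER JOIN c ON b.id = c.id) ON a.id = b.id;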
That is one of the worst queries I have ever seen. Since I cannot figure out how it is supposed to work without the underlying data, this is what I suggest to you.
First, find a good sample loan and write a query against this view returning WHERE loan_id = ... Now you have a data set you can check your changes against more easily than the possibly millions of records the full view returns. Make sure these results make sense (that right join to tblObject is bothering me, as it makes no sense to return all the object records).
Now start writing your query with what you think should be the first table (I would suggest that loan is the first table; if it is not, then the first table is Object left joined to loan), plus the WHERE clause for the loan id.
Check your results: did you get the same loan information as the view query with the WHERE clause added?
Then add each join one at a time and see how it affects the query and whether the results appear to be going off track. Once you have a query that gives the same results with all the tables added in, try several other loan ids to check. Once those check out, run the whole query with no WHERE clause and compare against the view results (if it is a large number of rows, you may need to just check that the record counts match and visually spot-check; use ORDER BY on both so the results are in the same order). In the process, try to use only left joins, not that combination of right and left joins (it's fine to leave the inner ones alone).
I make it a habit in complex queries to do all the inner joins first and then the left joins. I never use right joins in production code.
Now you are ready to performance tune.
I would guess the right join to objects is causing a problem in that it returns the whole table; the nature of that table name, and the other joins to the same table, lead me to believe the author probably wanted a left join. Without knowing the meaning of the data, it is hard to be sure. So first, if you are returning too many records for one loan id, consider whether the real problem is that, as the tables have grown, returning too many records has become problematic.
Also consider that you can often take the view and replace it with code that gets the same results. Views calling views are a poor technique that often leads to performance issues. Often the views stacked on top of other views call the same tables, so you end up joining to them multiple times when you don't need to.
Check your EXPLAIN plan or execution plan, depending on which database backend you have. Analyzing it should show where you might have missing indexes.
Also make sure that every table in the query is needed. This is especially true when you join to a view: the view may join to 12 other tables, but you may only need the data from one of them, and it can join directly to one of your tables. Make sure you are not using SELECT * but returning only the fields the query actually needs. You have inner joins, so, by definition, SELECT * is returning fields you don't need.
If the SELECT part of the view has a DISTINCT in it, consider whether you can weed out the multiple records that made DISTINCT necessary by changing to a derived table or adding a WHERE clause. To see what is causing the multiples, you may need to temporarily use SELECT * to see all the columns and find out which one is not unique and is causing the issue.
This whole process is not going to be easy or fun. Just take it slowly, work carefully and methodically and you will get there and have a query that is understandable and maintainable in the end.

How to improve the performance of multiple joins

I have a query with multiple joins in it. When I execute the query, it takes too long. Can you please suggest how to improve this query?
ALTER View [dbo].[customReport]
As
SELECT DISTINCT ViewUserInvoicerReport.Owner,
ViewUserAll.ParentID As Account , ViewContact.Company,
Payment.PostingDate, ViewInvoice.Charge, ViewInvoice.Tax,
PaymentProcessLog.InvoiceNumber
FROM
ViewContact
Inner Join ViewUserInvoicerReport on ViewContact.UserID = ViewUserInvoicerReport.UserID
Inner Join ViewUserAll on ViewUserInvoicerReport.UserID = ViewUserAll.UserID
Inner Join Payment on Payment.UserID = ViewUserAll.UserID
Inner Join ViewInvoice on Payment.UserID = ViewInvoice.UserID
Inner Join PaymentProcessLog on ViewInvoice.UserID = PaymentProcessLog.UserID
GO
Work on removing the DISTINCT.
That is not a join issue. The problem is that ALL rows have to go into a temp table to find out which are duplicated - if you analyze the query plan (programmers 101: learn to use that, fast), you will see that the join is likely not the big problem; the DISTINCT is.
And IIRC that DISTINCT is USELESS, because all rows are unique anyway... not 100% sure, but the field list seems to indicate it (a quick check is sketched below).
Use DISTINCT very rarely, please ;)
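One way to test that hunch before touching the view is to list the duplicate combinations the DISTINCT would be collapsing; if this returns no rows, the DISTINCT is doing nothing and can be dropped:

SELECT ViewUserInvoicerReport.Owner, ViewUserAll.ParentID, ViewContact.Company,
       Payment.PostingDate, ViewInvoice.Charge, ViewInvoice.Tax,
       PaymentProcessLog.InvoiceNumber, COUNT(*) AS dup_count
FROM ViewContact
INNER JOIN ViewUserInvoicerReport ON ViewContact.UserID = ViewUserInvoicerReport.UserID
INNER JOIN ViewUserAll ON ViewUserInvoicerReport.UserID = ViewUserAll.UserID
INNER JOIN Payment ON Payment.UserID = ViewUserAll.UserID
INNER JOIN ViewInvoice ON Payment.UserID = ViewInvoice.UserID
INNER JOIN PaymentProcessLog ON ViewInvoice.UserID = PaymentProcessLog.UserID
GROUP BY ViewUserInvoicerReport.Owner, ViewUserAll.ParentID, ViewContact.Company,
         Payment.PostingDate, ViewInvoice.Charge, ViewInvoice.Tax,
         PaymentProcessLog.InvoiceNumber
HAVING COUNT(*) > 1;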
You should see the Query Execution Plan and optimize the query section by section.
The overall optimization process consists of two main steps:
Isolate long-running queries.
Identify the cause of long-running queries.
See - How To: Optimize SQL Queries for step by step instructions.
It's difficult to say how to improve the performance of a query without knowing things like how many rows of data are in each table, which columns are indexed, what performance you're looking for and which database you're using.
Most important:
1. Make sure that all columns used in joins are indexed (illustrative DDL below)
2. Make sure that the query execution plan indicates that you are using the indexes you expect
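For point 1, the DDL would look something like this; the index names are illustrative, and since the View* objects are presumably views, their indexes belong on the UserID columns of the underlying base tables:

CREATE INDEX IX_Payment_UserID ON Payment (UserID);
CREATE INDEX IX_PaymentProcessLog_UserID ON PaymentProcessLog (UserID);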

Is there an alternative to joining 3 or more tables?

Is it a good idea to join three or more tables together, as in the following example? I'm trying to focus on performance. Is there any way to rewrite this query to be more efficient and faster? I've tried to make it as simple as possible.
select * from a
join b on a.id = b.id
join c on a.id = c.id
join d on c.id = d.id
where a.property1 = 50
and b.property2 = 4
and c.property3 = 9
and d.property4 = 'square'
If you want faster performance, make sure that all of the joins are covered by an index (either clustered or non-clustered). It looks like this could all be done in your query above by creating an index on the id and the appropriate property column of each table.
You could make it faster if you only selected a subset of the columns; at the moment you're selecting everything from all four tables.
Performance wise, I think it really depends on the number of records in each table, and making sure that you have the proper indexes defined. (I'm also assuming that SELECT * is a placeholder; you should avoid wildcards)
I'd start off by checking your execution plan, and start optimizing there. If you're still getting suboptimal performance, you could try using temp tables to break up the 4-table join into separate smaller joins, as sketched below.
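A minimal sketch of the temp-table approach, assuming SQL Server: filter and join two of the tables first, then join the intermediate result to the rest.

-- Stage the a/b portion first (in practice, list only the columns you need).
SELECT a.*
INTO #ab
FROM a
JOIN b ON a.id = b.id
WHERE a.property1 = 50
  AND b.property2 = 4;

-- Then join the smaller intermediate result to c and d.
SELECT ab.*
FROM #ab AS ab
JOIN c ON ab.id = c.id
JOIN d ON c.id = d.id
WHERE c.property3 = 9
  AND d.property4 = 'square';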
Assuming a normalized database, this is the best you can do, in terms of structuring a query and the joins in place.
There are other options to look at, including adding indexes on the join and filter columns, denormalizing the table structures, and narrowing the result set.
Adding indexes on the join columns (which appear to be primary keys, so they may already be indexed) will help with the join performance; indexing the columns used in the WHERE clause (property1 through property4) will help speed up the filtering on each table.
If you denormalize, you get a structure with duplicate data with all the implications of duplicate data (data maintenance issues mostly), but you gain performance as you no longer need to join.
When selecting columns, you should specify which ones you want - using * is generally a bad idea. This way you only transfer the data that the application really needs.