Optimize right joins - sql

The following query is working as expected. But I guess there is enough room for optimization. Any help?
SELECT a.cond_providentid,
b.flag1
FROM c_master a
WHERE a.cond_status = 'OnService'
ORDER BY a.cond_providentid,
a.rto_number;

May I suggest placing the query inside your left join into a database view - that way, the code can be much cleaner and easier to maintain.
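Something along these lines, for example (only a sketch - the table b side of your join isn't shown in the post, so this just wraps the visible part, and the view name is a placeholder):
CREATE VIEW vw_cmaster_onservice AS
    SELECT a.cond_providentid,
           a.rto_number,
           a.cond_status
    FROM c_master a
    WHERE a.cond_status = 'OnService';
Your left join can then reference the view instead of repeating the filter everywhere.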
Also, check the columns that you use most often - they could be candidates for indexing so that when you run your query, it can be faster.
You also might check your column data types... I see that you have this type of code:
(CASE
WHEN b.tray_type IS NULL
THEN 1
ELSE 0
END) flag2
If you have a chance to change the design of your tables (e.g. change b.Tray_Type to bit, or use a computed column to determine the flag), the query would run faster because you wouldn't have to use CASE statements to determine the flag. You could just select it as another column in your query.
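For instance, the computed-column route might look like this (just a sketch - the real table name behind alias b isn't shown, so your_b_table is a placeholder):
-- Sketch only: your_b_table stands in for the real table behind alias b
ALTER TABLE your_b_table
    ADD flag2 AS (CASE WHEN tray_type IS NULL THEN 1 ELSE 0 END) PERSISTED;
With that in place the flag is stored with the row (and can even be indexed), so your query just selects flag2 directly.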
Hope this helps! :)
Ann

Related

Optimize query when updating

I have the following query, which takes too much time to execute.
How can I optimize it?
Update Fact_IU_Lead
set
Fact_IU_Lead.Latitude_Point_Vente = Adr.Latitude,
Fact_IU_Lead.Longitude_Point_Vente = Adr.Longitude
FROM Dim_IU_PointVente
INNER JOIN
Data_I_Adresse AS Adr ON Dim_IU_PointVente.Code_Point_Vente = Adr.Code_Point_Vente
INNER JOIN
Fact_IU_Lead ON Dim_IU_PointVente.Code_Point_Vente = Fact_IU_Lead.Code_Point_Vente
WHERE
Latitude_Point_Vente is null
or Longitude_Point_Vente is null and Adr.[Error]=0
A couple of things I would look at on this to help.
How many records are on each table? If it's millions, then you may need to cycle through them.
Are the columns you're joining on or filtering on indexed on each table? If not, add them in (see the sketch after this list) - indexing typically makes a huge speed difference at relatively little cost.
Are the columns you're joining on stored as text instead of geo-spatial? I've had much better performance out of geo-spatial data types in this scenario. Just make sure your SRIDs are the same across tables.
Are the columns you're updating indexed, or is the table that's being updated heavy with indexes? Tons of indexes on a large table can be great for looking things up, but kills update/insert speeds.
Take a look at those first.
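On the indexing point, the sketch below is roughly what I mean (the index names are made up - confirm the columns against your actual joins and the execution plan before creating anything):
CREATE INDEX IX_PointVente_Code ON Dim_IU_PointVente (Code_Point_Vente);
CREATE INDEX IX_Adresse_Code    ON Data_I_Adresse (Code_Point_Vente);
CREATE INDEX IX_Lead_Code       ON Fact_IU_Lead (Code_Point_Vente);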
I've done a bit of light cleanup on your code in regard to aliases.
Also, take a look at the two WHERE clause options at the bottom of the code below (the second is commented out) and choose one of them.
When you mix ANDs and ORs, the best thing you can do is add parentheses.
At a minimum, you'll have no question about what you intended when you wrote it.
At most, you'll know that SQL is executing your logic correctly.
UPDATE fl --Update via the alias, since Fact_IU_Lead is aliased in the FROM clause below
SET
    Latitude_Point_Vente = adr.Latitude --Note the table prefix is removed
  , Longitude_Point_Vente = adr.Longitude --Note the table prefix is removed
FROM Dim_IU_PointVente AS pv --Added alias
INNER JOIN
    Data_I_Adresse AS adr ON pv.Code_Point_Vente = adr.Code_Point_Vente --Carried alias
INNER JOIN
    Fact_IU_Lead AS fl ON pv.Code_Point_Vente = fl.Code_Point_Vente --Added/carried alias
WHERE
    (fl.Latitude_Point_Vente IS NULL OR fl.Longitude_Point_Vente IS NULL) AND adr.[Error] = 0 --Option one for the WHERE change; qualified with fl since these columns belong to the table being updated
--  fl.Latitude_Point_Vente IS NULL OR (fl.Longitude_Point_Vente IS NULL AND adr.[Error] = 0) --Option two for the WHERE change
Making joins is usually expensive. The best approach in your case would be to place the update into a stored procedure, split your update into selects, and use a transaction to keep everything consistent (if needed).
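If the tables really are in the millions of rows, a batched version along these lines may help (a rough sketch only - the 50,000 batch size is arbitrary, and it assumes rows with NULL coordinates are the ones still to be processed):
-- Rough sketch: process the update in chunks so each transaction stays small.
-- Skips address rows that have no coordinates, so the loop can terminate.
DECLARE @rows int = 1;

WHILE @rows > 0
BEGIN
    BEGIN TRANSACTION;

    UPDATE TOP (50000) fl
    SET Latitude_Point_Vente  = adr.Latitude,
        Longitude_Point_Vente = adr.Longitude
    FROM Fact_IU_Lead AS fl
    INNER JOIN Dim_IU_PointVente AS pv ON pv.Code_Point_Vente = fl.Code_Point_Vente
    INNER JOIN Data_I_Adresse AS adr ON pv.Code_Point_Vente = adr.Code_Point_Vente
    WHERE (fl.Latitude_Point_Vente IS NULL OR fl.Longitude_Point_Vente IS NULL)
      AND adr.[Error] = 0
      AND adr.Latitude IS NOT NULL
      AND adr.Longitude IS NOT NULL;

    SET @rows = @@ROWCOUNT;

    COMMIT TRANSACTION;
END;
Wrapping that in a stored procedure keeps the run repeatable, and the small transactions keep locking and log growth under control.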
Hope this answer points you in the right direction :)

Can I alias and reuse my subqueries?

I'm working with a data warehouse doing report generation. As the name would suggest, I have a LOT of data. One of the queries that pulls a LOT of data is getting to take longer than I like (these aren't performed ad-hoc, these queries run every night and rebuild tables to cache the reports).
I'm looking at optimizing it, but I'm a little limited on what I can do. I have one query that's written along the lines of...
SELECT column1, column2,... columnN, (subQuery1), (subquery2)... and so on.
The problem is, the sub queries are repeated a fair amount because each statement has a case around them such as...
SELECT
column1
, column2
, columnN
, (SELECT
CASE
WHEN (subQuery1) > 0 AND (subquery2) > 0
THEN CAST((subQuery1)/(subquery2) AS decimal)*100
ELSE 0
END) AS "longWastefulQueryResults"
Our data comes from multiple sources and there are occasional data entry errors, so this prevents potential errors when dividing by a zero. The problem is, the sub-queries can repeat multiple times even though the values won't change. I'm sure there's a better way to do it...
I'd love something like what you see below, but I get errors about needing sq1 and sq2 in my group by clause. I'd provide an exact sample, but it'd be painfully tedious to go over.
SELECT
column1
, column2
, columnN
, (subQuery1) as sq1
, (subquery2) as sq2
, (SELECT
CASE
WHEN (sq1) > 0 AND (sq2) > 0
THEN CAST((sq1)/(sq2) AS decimal)*100
ELSE 0
END) AS "lessWastefulQueryResults"
I'm using Postgres 9.3 but haven't been able to get a successful test yet. Is there anything I can do to optimize my query?
Yup, you can create a temp table to store your results and query it again in the same session.
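A minimal sketch of that idea in Postgres (every name here is a placeholder for whatever subQuery1/subquery2 actually compute):
-- Placeholder names throughout
CREATE TEMP TABLE tmp_subtotals AS
SELECT some_columns,
       SUM(amount) AS sq1
FROM   other_table
GROUP  BY some_columns;

ANALYZE tmp_subtotals;  -- give the planner stats on the freshly built table

-- ...then join tmp_subtotals into the main report query within the same session.
The temp table disappears at the end of the session, so it fits nicely into a nightly rebuild job.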
I'm not sure how good the Postgres optimizer is, so I'm not sure whether optimizing in this way will do any good. (In my opinion, it shouldn't because the DBMS should be taking care of this kind of thing; but it's not at all surprising if it isn't.) OTOH if your current form has you repeating query logic, then you can benefit from doing something different whether or not it helps performance...
You could put the subqueries in with clauses up front, and that might help.
with subquery1 as (select ...)
, subquery2 as (select ...)
select ...
This is similar to putting the subqueries in the FROM clause as Allen suggests, but may offer more flexibility if your queries are complex.
If you have the freedom to create a temp table as Andrew suggests, that too might work but could be a double-edged sword. At this point you're limiting the optimizer's options by insisting that the temp tables be populated first and then used in the way that makes sense to you, which may not always be the way that actually gets the most efficiency. (Again, this comes down to how good the optimizer is... it's often folly to try to outsmart a really good one.) On the other hand, if you do create temp or working tables, you might be able to apply useful indexes or stats (if they contain large datasets) that would further improve downstream steps' performance.
It looks like many of your subqueries might return single values. You could put the queries into a procedure and capture those individual values as variables. This is similar to the temp table approach, but doesn't require creation of objects (as you may not be able to do that) and will have less risk of confusing the optimizer by making it worry about a table where there's really just one value.
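A rough sketch of that idea in PL/pgSQL (the subqueries and table names are placeholders; in your nightly job this would more likely live inside the function that rebuilds the cached report table):
-- Sketch only: capture each single-value subquery once, then reuse the variables
DO $$
DECLARE
    sq1 numeric;
    sq2 numeric;
BEGIN
    SELECT count(*) INTO sq1 FROM other_table;        -- placeholder for subQuery1
    SELECT count(*) INTO sq2 FROM yet_another_table;  -- placeholder for subquery2

    IF sq1 > 0 AND sq2 > 0 THEN
        RAISE NOTICE 'lessWastefulQueryResults = %', (sq1 / sq2) * 100;
    END IF;
END $$;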
Sub-queries in the column list tend to be a questionable design. The first approach I'd take to solving this is to see if you can move them down to the from clause.
In addition to allowing you to use the result of those queries in multiple columns, doing this often helps the optimizer to come up with a better plan for your query. This is because the queries in the column list have to be executed for every row, rather than merged into the rest of the result set.
Since you only included a portion of the query in your question, I can't demonstrate this particularly well, but what you should be looking for would look more like:
SELECT column1,
column2,
columnN,
subquery1.sq1,
subquery2.sq2,
(SELECT CASE
WHEN (subquery1.sq1) > 0 AND (subquery2.sq2) > 0 THEN
CAST ( (subquery1.sq1) / (subquery2.sq2) AS DECIMAL) * 100
ELSE
0
END)
AS "lessWastefulQueryResults"
FROM some_table
JOIN (SELECT *
FROM other_table
GROUP BY some_columns) subquery1
ON some_table.some_columns = subquery1.some_columns
JOIN (SELECT *
FROM yet_another_table
GROUP BY more_columns) subquery2
ON some_table.more_columns = subquery2.more_columns

Can a SQL Case statement be used to test if a Join statement should be used?

What I am trying to do, using java, is:
access a database
read a record from table "Target_stats"
if the field "threat_level" = 0, doAction1
if the field "threat_level" > 0, get additional fields from another table "Attacker_stats" and doAction2
read the next record
Now I have everything I need except a well-thought-out SQL statement that will allow me to go through the database only once; if this does not work, I suspect I will need to use two separate SQL statements and go through the database a second time. I do not have a clear understanding of case statements, so I will just provide pseudo code using an if statement.
SELECT A.1, A.2, A.3
if(A.3 > 0){
SELECT A.1, A.2, A.3, B.1, B.3
FROM A
JOIN B
ON A.1 = B.1
}
FROM A
Can anyone shed any light on my situation?
EDIT: Thank you both for your time and effort. I understand both of your comments and I believe I am headed more towards the right direction; however, I'm still having some trouble. I didn't know about SQLFiddle before, so I have now gone ahead and made a sample DB to try to demonstrate my purpose. Here is the link: http://sqlfiddle.com/#!3/ea108/1 What I want to do here is select target_stats.server_id, target_stats.target, target_stats.threat_level where interval_id=3, and if the threat_level>0 I also want to retrieve attack_stats.attacker, attack_stats.sig_name where interval_id=3. Again, thank you for your time and effort; it is very useful to me.
EDIT: After some tinkering around, I figured it out. Thank you so much for your help.
As #Ocelot20 said, SQL is not procedural code. It is based on set-based operations, not per row operations. One immediate consequence of this is that the SELECT in your pseudo-example is wrong as it relies on rows in the same result set having different column lists.
That said, you can get pretty close to your pseudo-code example, if you can tolerate NULL values where the join is not possible.
Here's an example that (to me anyway) seems to be close to what you are driving at:
select *
from A
left outer join B
on A.a = B.d and A.a > 2
You can see it in action in this SQLFiddle, which should show you what sort of output to expect.
Note that what this is actually saying is something like this:
Fetch all the records from table A and also fetch any records from
table B that have their d column the same as the a column in table
A, provided the value of A.a is greater than 2.
(This was picked for convenience. In my rather contrived example, shifting the conditional column does not affect the output, as can be seen here.)
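Applied to the tables in your fiddle, the same pattern would look roughly like this (a sketch - I've assumed the two tables relate through interval_id, so swap in whatever column actually links them):
SELECT ts.server_id,
       ts.target,
       ts.threat_level,
       ats.attacker,   -- NULL whenever threat_level = 0
       ats.sig_name    -- NULL whenever threat_level = 0
FROM target_stats AS ts
LEFT OUTER JOIN attack_stats AS ats
       ON ats.interval_id = ts.interval_id
      AND ts.threat_level > 0
WHERE ts.interval_id = 3;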

Why does my query take 2 minutes to run?

Note - There are about 2-3 million records in the db.
SELECT
route_date, stop_exception_code, unique_id_no,
customer_reference, stop_name, stop_comment,
branch_id, customer_no, stop_expected_pieces,
datetime_updated, updated_by, route_code
FROM
cops_reporting.distribution_stop_information distribution_stop_information
WHERE
(stop_exception_code <> 'null') AND
(datetime_updated >= { ts '2011-01-25 00:00:01' })
ORDER BY datetime_updated DESC
If you posted the indexes you already have on this table, or maybe a query execution plan, it would be easier to know. As it is, I'm going to guess that you could improve performance if you create a combined index that contains stop_exception_code and datetime_updated. And I can't promise this will actually work, but it might be worth a shot. I can't say much more than that without any other information...
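For reference, the combined index I have in mind would look something like the sketch below (the index name and column order are my guesses - test against your real data before committing to it):
CREATE INDEX ix_dsi_exception_updated
    ON cops_reporting.distribution_stop_information (stop_exception_code, datetime_updated);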
Some rules of thumb:
Index on columns that JOIN.
Index on columns used in WHERE clauses.
'Not equals' is always slower than an 'Equals' condition. Consider splitting the table into the rows that are null and those that are not, or hiving the column off into a joined table with an index.
Using proper JOIN syntax, i.e. being explicit about where joins are by writing INNER JOIN, speeds things up on some databases (I've seen a 10min+ query get down to 30 secs on MySQL just from this change alone).
Use aliases for each table and prefix them on each column.
Store it as a function/procedure and it will be precompiled and run much quicker.
stop_exception_code <> 'null'
Please tell me that 'null' isn't a string in your database. Standard SQL would be
stop_exception_code IS NOT NULL
or
stop_exception_code is NULL
I'm not sure what a NULL stop_exception_code might mean to you. But if it means something like "I don't know", then using a specific value for "I don't know" might let your server use an index on that column, an index that it might not be able to use for NULL. (Or maybe you've already done that by using the string 'null'.)
Without seeing your DDL, actual query, and execution plan, that's about all I can tell you.

Why is my SQL Server cursor very slow?

I am using a cursor in my stored procedure. It works on a database that has a huge amount of data. For every item in the cursor I do an update operation. This is taking a huge amount of time to complete - almost 25 min. :( Is there any way I can reduce the time this takes?
When you need to do a more complex operation on each row than a simple update would allow, you can try:
Write a User Defined Function and use that in the update (probably still slow)
Put data in a temporary table and use that in an UPDATE ... FROM:
Did you know about the UPDATE ... FROM syntax? It is quite powerful when things get more complex:
UPDATE
    MyTable
SET
    Col1 = CASE WHEN b.Foo = 'Bar' THEN LOWER(b.Baz) ELSE '' END,
    Col2 = ISNULL(c.Bling, 0) * 100 / Col3
FROM
    MyTable
INNER JOIN MySecondTable AS b ON b.Id = MyTable.SecondId
LEFT JOIN ##MyTempTable AS c ON c.Id = b.ThirdId
WHERE
    MyTable.Col3 > 0
    AND b.Foo IS NOT NULL
    AND MyTable.TheDate > GETDATE() - 10
The example is completely made-up and may not make much sense, but you get the picture of how to do a more complex update without having to use a cursor. Of course, a temp table would not necessarily be required for it to work. :-)
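If you do go the temp-table route, filling it is just an ordinary SELECT ... INTO done once up front. Again, everything here is made up to match the example above - the point is that whatever expensive per-row work the cursor was doing gets materialized once:
SELECT b.ThirdId AS Id,
       b.SomeExpensiveExpression AS Bling  -- made-up column standing in for the cursor's per-row work
INTO   ##MyTempTable
FROM   MySecondTable AS b;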
The quick answer is not to use a cursor. The most efficient way to update lots of records is to use an update statement. There are not many cases where you have to use a cursor rather than an update statement; you just have to get clever about how you write the update statement.
If you posted a snapshot of your SQL you might get some help to achieve what you're after.
I would avoid using a cursor, and work with views or materialized views if possible. Cursors are something that Microsoft doesn't optimize much in SQL Server, because most of the time you should be using a more general SQL statement (SELECT, INSERT, UPDATE, DELETE) rather than a cursor.
If you cannot achieve the same end result even with views or subqueries, you may want to use a temp table or look at improving the data model.
You don't provide much specific information, so all that I can do is give some general tips.
Are you updating the same data that the cursor is operating over?
What type of cursor? forward only? static? keyset? dynamic?
The UPDATE...FROM syntax mentioned above is the preferred method. You can also do a subquery such as the following.
UPDATE t1
SET t1.col1 = (SELECT top 1 col FROM other_table WHERE t1_id = t1.ID AND ...)
WHERE ...
Sometimes this is the only way to do it, as each column update may depend on different criteria (or a different table), and there may be a "best case" that you want to preserve by using the ORDER BY clause.
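For example (a sketch - the priority column is made up purely to illustrate the "best case" ordering):
UPDATE t1
SET t1.col1 = (SELECT TOP 1 col
               FROM other_table
               WHERE other_table.t1_id = t1.ID
               ORDER BY other_table.priority DESC)  -- "priority" stands in for whatever ranks your "best case"
WHERE ...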
Can you post more information about the type of update you are doing?
Cursors can be very useful in the right context (I use plenty of them), but if you have a choice between a cursor and a set-based operation, set-based is almost always the way to go.
But if you don't have a choice, you don't have a choice. Can't tell without more detail.